Some of you might remember the great MongoDB saga, which ended with me migrating from MongoDB to SQLite after losing my data more often than not. After the Nth loss, I'd had enough: I migrated to SQLite and decided against using MongoDB for historious, as I had originally planned. I wanted to share my story with the world, just to give people another use case.
This is that story.
For reference, everything below ran on a Core 2 Duo MacBook (Santa Rosa) with 2 GB of RAM and an Intel X25-M 160 GB SSD.
Shortly after deciding to migrate, I wrote a simple Python program to fetch documents from MongoDB and insert them into an SQLite database using Elixir (and, by extension, SQLAlchemy). To speed things up, I used transactions and committed once every 1,000 documents.
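The loop itself was nothing fancy. Here is a minimal sketch of the idea, written against plain SQLAlchemy rather than Elixir (which is just a declarative layer on top of it), with a made-up users collection and made-up field names:

```python
from pymongo import MongoClient
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

engine = create_engine("sqlite:///users.db")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

users = MongoClient().mydb.users  # hypothetical database/collection names

for i, doc in enumerate(users.find(), start=1):
    session.add(User(name=doc.get("name"), email=doc.get("email")))
    if i % 1000 == 0:   # one commit per 1,000 documents
        session.commit()
session.commit()        # commit the final partial batch
```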
After my first run, however, I discovered that 1,000 insertions took ages to complete. It wasn’t unbearably slow, but it was slow enough that the 3,000,000 user rows I had would take two days to insert, not to mention the 70,000,000 relationship rows that had to follow.
For a short while I considered leaving my data in MongoDB and just making many, many backups, but then I decided to persevere and see whether PostgreSQL was any faster for my single-user use case. In short, it wasn’t: insertions took about the same time on both databases. I did notice, however, that the script was CPU-bound, which didn’t seem right. Most of the time should have been spent on I/O rather than computation, so I tried bypassing the ORM and issuing raw SQL statements instead.
That sped up my program by more than ten times: where MongoDB took about 5 seconds for 1,000 insertions, SQLite took 0.3. That is more than an entire order of magnitude! To be fair to MongoDB, however, I was using pymongo, which I’m sure took a fair chunk of CPU to run. I would expect pymongo to be faster than Elixir, since it has less work to do, but I might be way off on this.
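For the curious, the raw-SQL fast path looked roughly like this (again with the hypothetical users table from above; executemany sends each batch of 1,000 rows inside a single transaction):

```python
import sqlite3
from pymongo import MongoClient

users = MongoClient().mydb.users      # hypothetical names, as before
conn = sqlite3.connect("users.db")

batch = []
for doc in users.find():
    batch.append((doc.get("name"), doc.get("email")))
    if len(batch) == 1000:
        conn.executemany(
            "INSERT INTO users (name, email) VALUES (?, ?)", batch)
        conn.commit()                 # one commit per 1,000 rows, as before
        batch = []
if batch:                             # don't drop the last partial batch
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)", batch)
    conn.commit()
```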
During the course of the migration, I killed SQLite many a time while it was writing. My database did get corrupted once (because I killed the process and then deleted the journal, so you can’t really blame SQLite for it), but all I had to do was dump and restore it, and all my data was back (or at least most of it; enough that I didn’t notice anything missing). This is in contrast to MongoDB, which, over the time I used it, lost more data than it stored (literally, by my count).
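Recovery, for what it’s worth, is a one-liner with the sqlite3 shell (`sqlite3 corrupt.db .dump | sqlite3 fresh.db`). A rough Python equivalent, with made-up file names:

```python
import sqlite3

old = sqlite3.connect("corrupt.db")   # the damaged database
new = sqlite3.connect("fresh.db")     # a clean replacement

# iterdump() yields the whole database as SQL statements; replaying
# them into the new file recovers whatever is still readable.
new.executescript("\n".join(old.iterdump()))
```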
In summary, I am very, very impressed with SQLite. 3,000 insertions a second is nothing to sneer at (and that is with checking for duplicates and failing if they exist, so they aren’t pure inserts), and it provides strong durability in a single, self-contained file. I would still use MongoDB, but only if I really, really needed a schemaless database, had only a few thousand rows, and didn’t care about the data at all. For anything else, I would use an ACID-compliant database, and SQLite is a great fit for single-user workloads. I can’t say how it fares with multiple concurrent users, because I haven’t tested that, but for everything else, and for data gathering in particular, it has exceeded even my most optimistic expectations.
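As an aside, “checking for duplicates and failing” here just means inserting against a UNIQUE constraint and letting SQLite raise on a collision; a toy illustration, with a hypothetical email column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")  # hypothetical schema

conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
except sqlite3.IntegrityError:
    print("duplicate rejected")       # the constraint does the checking
```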