Some of you might remember the great MongoDB saga, which ended with me migrating from MongoDB to SQLite after losing my data more often than not. After the Nth loss, I'd had enough: I migrated to SQLite and decided against using MongoDB for historious, as I had originally planned. I wanted to share my story with the world, just to give people another use case.
This is that story.
For reference, everything below ran on a Core 2 Duo MacBook (Santa Rosa) with 2 GB of RAM and an Intel X25-M 160 GB SSD.
Shortly after deciding to migrate, I wrote a simple Python program to fetch documents from MongoDB and insert them into an SQLite database using Elixir (and, by extension, SQLAlchemy). To speed things up, I used transactions and committed once every 1,000 documents.
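The loop itself was nothing fancy. Here is a minimal sketch of the idea, written against plain SQLAlchemy rather than Elixir (which is just a declarative layer on top of it), with a made-up users collection and made-up field names:

```python
from pymongo import MongoClient
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

engine = create_engine("sqlite:///users.db")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

users = MongoClient().mydb.users  # hypothetical database/collection names

for i, doc in enumerate(users.find(), start=1):
    session.add(User(name=doc.get("name"), email=doc.get("email")))
    if i % 1000 == 0:   # one commit per 1,000 documents
        session.commit()
session.commit()        # commit the final partial batch
```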
After my first run, however, I discovered that 1,000 insertions took ages to complete. It wasn’t unbearably slow, but it was slow enough that the 3,000,000 user rows I had would take two days to insert, not to mention the 70,000,000 relationship rows that had to follow.
For a short while I considered leaving my data in MongoDB and just making many, many backups, but then I decided to persevere and see whether PostgreSQL was any faster for my single-user use case. In short, it wasn’t: insertions took about the same time on both databases. I did notice, however, that the script was CPU-bound, which didn’t seem right. Most of the time should have been spent on I/O rather than computation, so I tried bypassing the ORM and issuing raw SQL statements instead.
That sped up my program by more than ten times: where MongoDB took about 5 seconds for 1,000 insertions, SQLite took 0.3. That is more than an entire order of magnitude! To be fair to MongoDB, however, I was using pymongo, which I’m sure took a fair chunk of CPU to run. I would expect pymongo to be faster than Elixir, since it has less work to do, but I might be way off on this.
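For the curious, the raw-SQL fast path looked roughly like this (again with the hypothetical users table from above; executemany sends each batch of 1,000 rows inside a single transaction):

```python
import sqlite3
from pymongo import MongoClient

users = MongoClient().mydb.users      # hypothetical names, as before
conn = sqlite3.connect("users.db")

batch = []
for doc in users.find():
    batch.append((doc.get("name"), doc.get("email")))
    if len(batch) == 1000:
        conn.executemany(
            "INSERT INTO users (name, email) VALUES (?, ?)", batch)
        conn.commit()                 # one commit per 1,000 rows, as before
        batch = []
if batch:                             # don't drop the last partial batch
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)", batch)
    conn.commit()
```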
During the course of the migration, I killed SQLite many a time while it was writing. My database did get corrupted once (because I killed the process and then deleted the journal, so you can’t really blame SQLite for it), but all I had to do was dump and restore it, and all my data was back (or at least most of it; enough that I didn’t notice anything missing). This is in contrast to MongoDB, which, over the time I used it, lost more data than it stored (literally, by my count).
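Recovery, for what it’s worth, is a one-liner with the sqlite3 shell (`sqlite3 corrupt.db .dump | sqlite3 fresh.db`). A rough Python equivalent, with made-up file names:

```python
import sqlite3

old = sqlite3.connect("corrupt.db")   # the damaged database
new = sqlite3.connect("fresh.db")     # a clean replacement

# iterdump() yields the whole database as SQL statements; replaying
# them into the new file recovers whatever is still readable.
new.executescript("\n".join(old.iterdump()))
```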
In summary, I am very, very impressed with SQLite. 3,000 insertions a second is nothing to sneer at (and that is with checking for duplicates and failing if they exist, so they aren’t pure inserts), and it provides strong durability in a single, self-contained file. I would still use MongoDB, but only if I really, really needed a schemaless database, had only a few thousand rows, and didn’t care about the data at all. For anything else, I would use an ACID-compliant database, and SQLite is a great fit for single-user workloads. I can’t say how it fares with multiple concurrent users, because I haven’t tested that, but for everything else, and for data gathering in particular, it has exceeded even my most optimistic expectations.
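As an aside, “checking for duplicates and failing” here just means inserting against a UNIQUE constraint and letting SQLite raise on a collision; a toy illustration, with a hypothetical email column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")  # hypothetical schema

conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
except sqlite3.IntegrityError:
    print("duplicate rejected")       # the constraint does the checking
```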