So the problem was to make the following little algorithm really really fast:
- loop over many items
- for each item, do a couple of million database queries
- each query was looking for a different rowid
My initial solution was slow. Very slow.
Using SQLite, the super-fast and super-lightweight SQL server, and the standard practice of using prepared statements and persistent DB connections didn’t help at all. I was just doing an insane amount of queries and it couldn’t handle it. The creation of a DB cursor was killing me (among other things). The thing is, most of the time I was expecting it to return an empty result set–that is, I was wasting all this time creating DB cursors and empty result sets when I was mostly just testing for the existence of a record.
My first attempt at speeding this up was to create a set (as in a std::set) in memory to hold all my key values. I thought that searching a set would be much quicker since it has less overhead than a DB query. Well, unfortunately, it seems that STL sets do have a lot of overhead. I replaced DB cursors with iterators. It did speed things up slightly, but not nearly enough.
Going into more complex data structures revealed Judy Arrays, a sort of trie on steroids. Since I only needed to test for existence with it, I was able to use the Judy1 variant, which is essentially a bit-set. I was quite impressed with the space requirements, at 1,000,000 entries, each entry was using (on average) 2 bytes. Although this is larger than a standard bit-set, the Judy array lets me resize it on-the-fly which I cannot do with a standard bit-set. So, for small data-sets, it uses very little space, and for larger data-sets it scales up nicely.
So, the space is good, but what really sold me was the speed. These things were fast. The official site says it is optimized to avoid cache-line fills. Their basic rational is that if, for example, a cache-miss takes at least 50 cycles and you can avoid it with more code that runs in at most 50 cycles, you win. This does come at a cost though since the code to do this is horribly complex. At 20,000 lines, some critics say that it is too complex. My thought is that I don’t care about the code in a library so long as it works and works fast–also, when’s the last time you looked at STLs code and said “wow, that’s so simple and short?”
So, in the end, using a Judy array in my project increased speed by a few orders of magnitude. For more information about the library or the technical rational, check out the official site: http://judy.sourceforge.net/.