Index Speedups in ArangoDB 2.8: Enhancements

November 27 2015,/Performance

The upcoming 2.8 version of ArangoDB will provide several improvements in the area of index usage and query optimization.

First of all, hash and skiplist indexes can now index individual array values. A dedicated post on this will follow shortly. Second, the query optimizer can make use multiple indexes per collection for queries with OR-combined filter conditions. This again is a subject for another post. Third, there have been some speed improvements due to changes in the general index handling code. This is what this post is about.

In order to assess the speedups in 2.8, I have run some already existing performance tests that I initially ran when comparing ArangoDB 2.5 with 2.6. The test cases and methodology are detailed in this earlier blog post.

For measuring the index-related performance improvements, I simply re-ran the index related tests in 2.7 and in 2.8 / devel. I did not bother re-running all tests from the original blog article because only some are index-related. In particular, I only ran these tests again:

join-key: for each document in the collection, perform a join on the _key attribute on the collection itself (i.e. FOR c1 IN @@c FOR c2 IN @@c FILTER c1._key == c2._key RETURN c1)
join-id: ditto, but perform the join using the _id attribute
join-hash-number and join-hash-string: ditto, but join using a hash index on a numeric or string attribute
join-skiplist-number and join-skiplist-string: ditto, but join using a skiplist index on a numeric or string attribute lookup-key, lookup-hash-number, lookup-hash-string, lookup-skiplist-number, lookup-skiplist-string: compile an IN-list of 10,000 lookup values and search these 10,000 documents in the collection using either the primary index (_key attribute), a hash index or a skiplist index. The latter two are tested on numeric and string attributes.

The test queries were run 5 times each on collections containing 10,000, 100,000 and 1,000,000 documents.

Here are the query execution times from 2.7 and 2.8 for the individual tests, in seconds (less is better):

test name                collection     avg 2.7 (s)      avg 2.8 (s)
--------------------------------------------------------------------
join-key                 10k                 0.0997           0.0612
join-key                 100k                1.0611           0.6538
join-key                 1000k              10.2975           6.2507
join-id                  10k                 0.1088           0.0606
join-id                  100k                1.1126           0.6694
join-id                  1000k              10.8180           6.5336
join-hash-number         10k                 0.1044           0.0679
join-hash-number         100k                1.1137           0.7430
join-hash-number         1000k              10.7923           7.1534
join-hash-string         10k                 0.1193           0.0712
join-hash-string         100k                1.2656           0.7915
join-hash-string         1000k              12.3075           7.6667
join-skiplist-number     10k                 0.1387           0.1030
join-skiplist-number     100k                1.6693           1.3062
join-skiplist-number     1000k              18.0406          15.5203
join-skiplist-string     10k                 0.1928           0.1469
join-skiplist-string     100k                2.3166           1.8997
join-skiplist-string     1000k              27.1513          23.2058
lookup-key               10k                 1.4996           1.5245
lookup-key               100k                1.4951           1.5189
lookup-key               1000k               1.5545           1.5662
lookup-hash-number       10k                 1.5572           1.5526
lookup-hash-number       100k                1.5595           1.5435
lookup-hash-number       1000k               1.5436           1.5648
lookup-hash-string       10k                 1.6023           1.5623
lookup-hash-string       100k                1.5892           1.5741
lookup-hash-string       1000k               1.5841           1.5770
lookup-skiplist-number   10k                 1.5978           1.6145
lookup-skiplist-number   100k                1.5782           1.6269
lookup-skiplist-number   1000k               1.5891           1.6258
lookup-skiplist-string   10k                 1.6443           1.6840
lookup-skiplist-string   100k                1.6787           1.6985
lookup-skiplist-string   1000k               1.7319           1.7562
in-key                   10k                 0.1076           0.0754
in-key                   100k                0.1083           0.0775
in-key                   1000k               0.1104           0.0773
in-hash-number           10k                 0.0898           0.0696
in-hash-number           100k                0.0889           0.0674
in-hash-number           1000k               0.0877           0.0675
in-hash-string           10k                 0.1174           0.0867
in-hash-string           100k                0.1188           0.0878
in-hash-string           1000k               0.1174           0.0853
in-skiplist-number       10k                 0.1095           0.0849
in-skiplist-number       100k                0.1110           0.0873
in-skiplist-number       1000k               0.1106           0.0910
in-skiplist-string       10k                 0.1744           0.1315
in-skiplist-string       100k                0.1990           0.1594
in-skiplist-string       1000k               0.2369           0.1968

It looks like joins and IN list lookups got significantly faster in 2.8, whereas the performance for point lookups is more or less the same as in 2.7.

Note that the changes to the index code in 2.8 only affect how indexes are accessed from within AQL queries and how filtering works. No changes have been made for other index operations such as insert, updates, and removals.

Jan Steemann

After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, being there from version 0.1. He is mostly working on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

November 27 2015,Jan Steemann

1 Comments

Haseb Ansari on January 14 2016, at 3:47 pm

multiple attributes indexing ???? HOWWWW??????

Reply

Fireside Chat – Powering GenAI: The Critical Foundations for Scale. Watch Now

Index Speedups in ArangoDB 2.8: Enhancements

Jan Steemann

1 Comments

Leave a Comment Cancel Reply

Tags

Quick Links

Info

About Us

Stay In Touch