Making Subqueries fast

Subqueries in AQL are an expressive and useful tool when writing queries. And now with a rewrite of the query execution for release 3.6, they are also pretty fast.

After laying the foundation with a major rework of the AQL internals in 3.5, we now improved the way subqueries are executed. This was achieved by allowing batching beyond the boundaries of a single subquery. We call it subquery splicing.

How much of a benefit this is depends, of course, a lot on the actual query. If a subquery produces several hundred or thousands of result rows, the effect will probably be negligible. On the other hand, for several subqueries that produce smaller amount of result rows, the difference is pretty significant. This is particularly true in a cluster, because a lot of HTTP requests and round trips can be saved by effective batching.

Benchmarks

One very common example for using subqueries are (left outer) joins like this one:

FOR doc IN someCollection
    LET others =(
        FOR x IN otherCollection
            FILTER doc.foreignKey == x._key 
            RETURN x
    )
    RETURN {doc, others}

Such queries are now noticeably faster on a single server because the subquery can be inlined. In a cluster, it can be up to 30 times faster! For some nested subqueries, this can be even higher, though we’d like to restrain ourselves to more common query patterns for this test.

In the following, three examples are presented that we benchmarked which show how significantly the new execution scheme improves subquery performance. The same queries are run with and without the optimization rule splice-subqueries. Disabling the rule forces the pre-3.6 execution scheme.

Self-join

The following query iterates over a collection, and for every document does a subquery which looks up the documents with the same attribute value via an index. For each document, the number of documents with a matching value is returned.

FOR c IN coll
    LET sub = (FOR s IN coll FILTER s.value1 == c.value1 RETURN s)
    RETURN LENGTH(sub)

Side-by-side joins

This is similar to the above query, but additionally looks up all documents where another field matches for each document, and returns, per document, the summed number of matches.

FOR c IN coll
    LET sub1 = (FOR s IN coll FILTER s.value1 == c.value1 RETURN s)
    LET sub2 = (FOR s IN coll FILTER s.value2 == c.value1 RETURN s)
    RETURN LENGTH(sub1) + LENGTH(sub2)

Nested joins

The last query also iterates over a whole collection, and for each document executes a subquery, which in turn does five different lookups in separate subqueries.

FOR c IN coll
    LET sub = (
        FOR s IN 1..5
            LET subsub = (FOR t IN coll FILTER t.value1 == c.value1 + s RETURN t)
            FILTER LENGTH(subsub) > 0
            RETURN s
    )
    RETURN LENGTH(sub)

To put the results into perspective, note that each query returns one result per document in the test collection. The three queries make 2, 3, and 6 document lookups times the collection size, respectively – that’s e.g. 200k, 300k and 600k lookups for a collection with 100k documents. The result set will, for all three queries, be of the same size as the whole collection, and the collection will be read multiple times in full. Thus the execution times are relatively high; this is so we get reliable and comparable results for the performance test. In principle, speedups are similar for smaller result sets as well.

Single server tests

All single server tests are run against a collection with 100.000 documents.

testname	collection	runs	avg (s)	avg speedup
Self join, without splicing	100k	5	0.5680
Self join, with splicing	100k	5	0.4875	1.17
Side-by-side joins, without splicing	100k	5	1.6362
Side-by-side joins, with splicing	100k	5	0.9170	1.78
Nested joins, without splicing	100k	5	3.2399
Nested joins, with splicing	100k	5	2.7808	1.17

Cluster tests

The cluster tests are done on a cluster with three machines, each running one Agent, Coordinator and DB-Server. The test collection contains 10.000 documents. Note that the lookups done in the collection cannot make use of any document locality, so every lookup has to be done on every DB-Server.

testname	collection	runs	avg (s)	avg speedup
Self join, without splicing	10k	5	36.4326
Self join, with splicing	10k	5	1.2663	28.8
Side-by-side joins, without splicing	10k	5	76.1388
Side-by-side joins, with splicing	10k	5	2.7961	27.2
Nested joins, without splicing	10k	5	196.8784
Nested joins, with splicing	10k	5	7.7322	25.5

Performance tests

If you would like to verify the performance yourself, the tests quoted above are
part of the ArangoDB simple performance tests repository. These tests were done with ArangoDB v3.6.0-rc1.

What do I have to do as a user

Nothing, apart from upgrading to 3.6. The optimizer will enable the new subqueries automatically (but see caveats below). Look out for the optimizer rule splice-subqueries in the explain or profile output.

As always, make sure to take a backup before upgrading!

Caveats

The new execution scheme for subqueries cannot yet be used in some cases, particularly in conjunction with LIMIT or COLLECT WITH COUNT INTO operations. See the description of the optimizer rule splice-subqueries in the AQL optimizer rules documentation for details.

We are already working on lifting this restriction, and you can expect it to be publicly available in the next minor version of ArangoDB.

Fireside Chat – Powering GenAI: The Critical Foundations for Scale. Watch Now