Enhanced AQL in ArangoDB 2.5: Improved Query Capabilities
Contained in 2.5 are some small but useful AQL language improvements plus several AQL optimizer improvements.
We are working on further AQL improvements for 2.5, but work is still ongoing. This post summarizes the improvements that are already completed and will be shipped with the initial ArangoDB 2.5 release.
Language improvements
Dynamic attribute names
Often the need arises to dynamically name object attributes in return values. In AQL this was not directly possible so far, though there were some workarounds available to achieve about the same result. The recipe about dynamic attribute names summarizes the options that are available to pre-ArangoDB 2.5 users.
With ArangoDB 2.5, dynamic attribute names can be constructed much more easily and flexibly. Object attribute names in ArangoDB 2.5 can be specified using static string literals, bind parameters, and dynamic expressions.
Dynamic expressions are most interesting, and to disambiguate them from other regular string literal attribute names, dynamic attribute names need to be enclosed in square brackets ([
and ]
). I have written about that before in this blog.
Here is an example query that uses the new syntax:
FOR i IN [ 17, 23, 42, 83 ]
RETURN { [ CONCAT('value-of-', i, ' * ', i) ] : i * i }
This will produce:
[
{
"value-of-17 * 17" : 289
},
{
"value-of-23 * 23" : 529
},
{
"value-of-42 * 42" : 1764
},
{
"value-of-83 * 83" : 6889
}
]
Functions added
The following AQL functions have been added in 2.5:
MD5(value)
: produces the MD5 hash ofvalue
SHA1(value)
: produces the SHA1 hash ofvalue
RANDOM_TOKEN(length)
: produces a pseudo-random string of the specified length. Such strings can be used for id or token generation. Tokens consist only of letters (lower and upper case) plus digits, so they are also URL-safe
Optimizer improvements
Optimizer rules
The following AQL optimizer rules have been added in ArangoDB 2.5:
propagate-constant-attributes
This rule will look inside FILTER
conditions for constant value equality comparisons, and insert the constant values in other places in FILTER
s. For example, the rule will insert 42
instead of i.value
in the second FILTER
of the following query:
FOR i IN c1
FOR j IN c2
FILTER i.value == 42
FILTER j.value == i.value
RETURN 1
move-calculations-down
This rule moves calculations down in the execution plan as far as possible. The intention is to move calculations beyond filters, in order to avoid calculations and computations for documents that will be filtered away anyway.
If a query contains a lot of computations and a lot of documents will be skipped because of filters, this rule might provide a big benefit.
A more detailed example is provided in this post.
The already existing optimizer rule use-index-for-sort
was also improved in the following way:
- the rule can now remove
SORT
s also in case a non-sorted index (i.e. a hash index) is used for an equality lookup and all sort attributes are covered by the index. - the rule can also remove
SORT
s in case the sort critieria excludes the left-most index attributes, but the left-most index attributes are used in aFILTER
for equality-only lookups.
Here is an example that will use an existing skiplist index on [ value1
, value2
] for sorting, removing the extra SORT
:
FOR doc IN collection
FILTER doc.value1 == 1
SORT doc.value2
RETURN doc
Index usage
The AQL optimizer now supports sparse indexes, a feature added in 2.5.
It will use them automatically in queries when appropriate and when safe. Sparse indexes do exclude certain documents purposely, so the optimizer always has to figure out whether it can use a sparse index to satisfy a given FILTER
condition.
The optimizer will also take into account index selectivity estimates when there are multiple index candidates.
Estimates
The optimizer estimates for the number of documents to be returned by a query or a subquery are more accurate now for several types of queries. For example, if the optimizer can use a primary key, an edge index, or a hash index in a given query part, it will use the index selectivity estimates for calculating the number of return documents.
These estimates will be a lot more accurate than the previoulsy hard-coded filtering factors, and can lead to better optimizer decisions and reporting (because estimates are returned in explain
results, too).
Memory savings
Finally, the optimizer will now detect if the data-modification part in a data-modification query can be executed in lockstep with the data-retrieval part of the same query. Previously, a data-modification query always executed its data-retrieval part first, and then executed its data-modification part. This could have resulted in big intermediate result sets which to retrieval part constructed in order to pass them to the modification part of the query.
Here’s an example query:
FOR doc IN test
INSERT doc INTO backup
In the above query, the FOR
loop is the retrieval part, and the INSERT
is the modification part. The optimizer in 2.5 will check if the two parts of the query are independent, and if it turns out they are, will execute them in lockstep instead of sequentially.
The execution in lockstep is not necessarily faster than sequential execution, but it can save lots of memory if the data-retrieval part constructed big intermediate result sets.
Miscellaneous changes
The AQL query execution statistics now also provide an attribute filtered
. Its value indicates how many documents were filtered by FilterNodes
in the AQL query. This can be used as an indicator for whether indexes should be added, and for how effective indexes are used for filtering.
Get the latest tutorials, blog posts and news: