Mastering AQL: Return Distinct Values | ArangoDB Blog
Last week saw the addition of RETURN DISTINCT to AQL queries. This is a new shortcut syntax for making result sets unique.
For this purpose, it can be used as an easier-to-memorize alternative to the existing COLLECT statement. COLLECT is very flexible and can be used for multiple purposes, but it is syntactic overkill for simply making a result set unique.
The new RETURN DISTINCT syntax makes such queries easier to write and understand.
Here’s a non-scientific proof for this claim. Compare the following two queries, both of which return each distinct age attribute value from the collection.
With COLLECT:
FOR doc IN collection
  COLLECT age = doc.age
  RETURN age
With RETURN DISTINCT:
FOR doc IN collection
  RETURN DISTINCT doc.age
Clearly, the query using RETURN DISTINCT is more intuitive, especially for AQL beginners. It also saves a bit of typing compared to the longer COLLECT-based query.
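To try this out without a stored collection, here is a minimal sketch that runs the query over an inline array. The sample documents are made up for illustration and do not come from any real dataset.

```aql
// Hypothetical sample data standing in for a collection
LET docs = [ { age: 30 }, { age: 25 }, { age: 30 }, { age: 41 } ]

// Each distinct age value is returned once
FOR doc IN docs
  RETURN DISTINCT doc.age
```

This should yield the three values 25, 30 and 41, each exactly once and in no particular order. Replacing the last line with COLLECT age = doc.age followed by RETURN age should produce the same set of values.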
Internally, both COLLECT and RETURN DISTINCT work by creating an AggregateNode. The optimizer will try the sorted and the hashed variants for both, so they should perform about the same.
However, the result of a RETURN DISTINCT does not have any guaranteed order, so the optimizer will not insert a post-SORT for it. It may do so for a regular COLLECT.
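If the distinct values are needed in a particular order, one option is to state the sort explicitly instead of relying on what the optimizer happens to do. A minimal sketch, over made-up inline data rather than a real collection:

```aql
// Hypothetical sample data standing in for a collection
LET docs = [ { age: 30 }, { age: 25 }, { age: 30 }, { age: 41 } ]

FOR doc IN docs
  COLLECT age = doc.age   // one group per distinct age
  SORT age                // explicit ascending order on the grouped value
  RETURN age
```

The explicit SORT makes the ordering part of the query itself. The same idea should also work with RETURN DISTINCT by placing it in a subquery and sorting in the outer loop.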
As mentioned before, COLLECT is more flexible than RETURN DISTINCT. Notably, COLLECT is the better choice when the result set should be made unique using more than one criterion, e.g.:
FOR doc IN collection
  COLLECT status = doc.status, age = doc.age
  RETURN { status, age }
This is currently not achievable via RETURN DISTINCT, as it only works with a single criterion.