ArangoDB 2.8: New Features and Enhancements
We welcome 2016 with our first big news of the year: the release of ArangoDB 2.8!
You can now use new AQL keywords to traverse a graph even more conveniently, a big deal for those who like to get the maximum out of their connected data. ArangoDB is getting faster with every iteration: in this release we have implemented several AQL functions and arithmetic operations in super-fast C++ code, and improved optimizer rules and indexing further to help you get things done faster. Download ArangoDB 2.8 here.
Array Indexes
The new array indexes are a major improvement to ArangoDB that you will love and never want to be without again. Hash indexes and skiplist indexes can now be defined for array values as well, so it’s freaking fast to access documents by individual array values. Let's assume you want to retrieve articles that are tagged with “graphdb”. You can now use an index on the tags array:
{
text: "Here's what I want to retrieve...",
tags: [ "graphdb", "ArangoDB", "multi-model" ]
}
An added hash index on tags (ensureHashIndex("tags[*]")) can be used for finding all documents having "graphdb" somewhere in their tags array using the following AQL query:
FOR doc IN documents
FILTER "graphdb" IN doc.tags[*]
RETURN doc
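To illustrate why such a lookup is fast, here is a minimal conceptual sketch in Python. It is not ArangoDB's implementation; it only models the idea that an array index stores one entry per individual array value. The function name `build_array_index`, the document keys, and the second and third sample documents are assumptions made up for this illustration.

```python
from collections import defaultdict


def build_array_index(documents, attribute):
    """Map every individual array value to the keys of the documents containing it."""
    index = defaultdict(set)
    for key, doc in documents.items():
        # One index entry per array member, similar in spirit to indexing "tags[*]".
        for value in doc.get(attribute, []):
            index[value].add(key)
    return index


# Hypothetical sample data; only the first document comes from the article.
documents = {
    "a1": {
        "text": "Here's what I want to retrieve...",
        "tags": ["graphdb", "ArangoDB", "multi-model"],
    },
    "a2": {"text": "Another article", "tags": ["ArangoDB"]},
    "a3": {"text": "An article without tags"},
}

tags_index = build_array_index(documents, "tags")

# A single lookup replaces a scan over every document's tags array.
matches = sorted(tags_index.get("graphdb", set()))
print(matches)
```

The point of the sketch is the access pattern: the query asks for one value and the index answers directly, instead of inspecting the tags array of each document.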
Have fun with these new indexes!
AQL Graph Traversal
Next, the AQL graph traversals mentioned above. The query language AQL adds the keywords GRAPH, OUTBOUND, INBOUND and ANY for use in graph traversals. Using plain AQL in ArangoDB 2.8, you can create a shopping list for your friends' birthday gifts: up to 5 ideas per friend, related to products they already own and ordered by price.
FOR friend IN OUTBOUND @me isFriendOf
  LET toBuy = (
    FOR bought IN OUTBOUND friend hasBought
      FOR combinedProduct IN OUTBOUND bought combinedProducts
        SORT combinedProduct.price
        LIMIT 5
        RETURN combinedProduct
  )
  RETURN { friend, toBuy }
You can improve this list by limiting the result to friends that have a birthday within the next 2 months (assuming birthdays are stored as birthday: "1970-01-15").
LET maxDate = DATE_ADD(DATE_NOW(), 2, 'months')
...
FILTER DATE_ISO8601(DATE_YEAR(DATE_NOW()),DATE_MONTH(friend.birthday),DATE_DAY(friend.birthday)) < maxDate
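The date logic can be sketched in plain Python to make the intent explicit. This is an illustration under assumptions, not a translation of the query: the helper names `add_months`, `next_birthday` and `has_birthday_soon` are made up here, and the sketch additionally skips birthdays that have already passed this year and rolls over into the next year, which the FILTER above does not do on its own.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_birthday(birthday_iso, today):
    """Next occurrence of the birthday on or after today (Feb 29 falls back to Feb 28)."""
    born = date.fromisoformat(birthday_iso)

    def in_year(year):
        day = min(born.day, calendar.monthrange(year, born.month)[1])
        return date(year, born.month, day)

    this_year = in_year(today.year)
    return this_year if this_year >= today else in_year(today.year + 1)


def has_birthday_soon(birthday_iso, today, months=2):
    """True if the next birthday falls before today + months."""
    return next_birthday(birthday_iso, today) < add_months(today, months)


# Example with a fixed "today" so the result is reproducible.
print(has_birthday_soon("1970-01-15", date(2016, 1, 4)))
```

Passing `today` explicitly instead of reading the clock keeps the function easy to test, for example around the turn of the year.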
Using Dave as a bind parameter for @me, we get the following result for our shopping tour:
[
{
"friend": {
"name": "Julia",
"_id": "users/Julia",
"_rev": "1868379126",
"_key": "Julia"
},
"toBuy": [
{
"price": 12,
"name": "SanDisk Extreme SDHC UHS-I/U3 16GB Memory Card",
"_id": "products/SanDisk16",
"_rev": "2012820470",
"_key": "SanDisk16"
},
{
"price": 21,
"name": "Lightweight Tripod 60-Inch with Bag",
"_id": "products/Tripod",
"_rev": "2003514358",
"_key": "Tripod"
},
{
"price": 99,
"name": "Apple Pencil",
"_id": "products/ApplePencil",
"_rev": "2019177462",
"_key": "ApplePencil"
},
{
"price": 169,
"name": "Smart Keyboard",
"_id": "products/SmartKeyboard",
"_rev": "2020160502",
"_key": "SmartKeyboard"
}
]
},
{
"friend": {
"name": "Debby",
"city": "Dallas",
"_id": "users/Debby",
"_rev": "1928803318",
"_key": "Debby"
},
"toBuy": [
{
"price": 12,
"name": "Lixada Bag for Self Balancing Scooter",
"_id": "products/LixadaScooterBag",
"_rev": "2018194422",
"_key": "LixadaScooterBag"
}
]
}
]
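For readers who want to see the semantics of the nested traversal spelled out, here is a small Python model of the shopping list query. It is only a sketch: edge collections are modelled as lists of (from, to) pairs, the helper `outbound` and the function `shopping_list` are invented for this illustration, and the edges as well as the owned products "products/iPad" and "products/Scooter" are hypothetical. Only the three product prices are taken from the result above.

```python
def outbound(edges, start):
    """Return the targets of all edges leaving `start` (a one-step OUTBOUND traversal)."""
    return [target for source, target in edges if source == start]


# Hypothetical edge collections as (from, to) pairs.
is_friend_of = [("users/Dave", "users/Julia"), ("users/Dave", "users/Debby")]
has_bought = [
    ("users/Julia", "products/iPad"),
    ("users/Debby", "products/Scooter"),
]
combined_products = [
    ("products/iPad", "products/SmartKeyboard"),
    ("products/iPad", "products/ApplePencil"),
    ("products/Scooter", "products/LixadaScooterBag"),
]
prices = {
    "products/ApplePencil": 99,
    "products/SmartKeyboard": 169,
    "products/LixadaScooterBag": 12,
}


def shopping_list(me, limit=5):
    """For each friend, collect products combined with what they own, cheapest first."""
    result = []
    for friend in outbound(is_friend_of, me):
        ideas = [
            combined
            for bought in outbound(has_bought, friend)
            for combined in outbound(combined_products, bought)
        ]
        ideas.sort(key=lambda product: prices[product])  # SORT combinedProduct.price
        result.append({"friend": friend, "toBuy": ideas[:limit]})  # LIMIT 5
    return result


print(shopping_list("users/Dave"))
```

Each `outbound` call corresponds to one `FOR ... IN OUTBOUND` step, and the sort and slice are applied per friend, just as the subquery in the AQL version is evaluated once per friend.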
Usage of these new keywords as collection names, variable names or attribute names in AQL queries will not be possible without quoting. For example, the following AQL query will still work as it uses a quoted collection name and a quoted attribute name:
FOR doc IN `OUTBOUND`
RETURN doc.`any`
Please have a look at the documentation for further details.
Syntax for managed graphs:
FOR vertex[, edge[, path]] IN MIN[..MAX] OUTBOUND|INBOUND|ANY startVertex GRAPH graphName
Working on collection sets:
FOR vertex[, edge[, path]] IN MIN[..MAX] OUTBOUND|INBOUND|ANY startVertex edgeCollection1, .., edgeCollectionN
AQL COLLECT … AGGREGATE
Additionally, there is a cool new aggregation feature that was added after the beta releases. AQL introduces the keyword AGGREGATE for use in AQL COLLECT statements.
Using AGGREGATE allows more efficient aggregation (incrementally while building the groups) than previous versions of AQL, which built group aggregates afterwards from the total of all group values.
AGGREGATE can be used inside a COLLECT statement only. If used, it must follow the declaration of grouping keys:
FOR doc IN collection
COLLECT gender = doc.gender AGGREGATE minAge = MIN(doc.age), maxAge = MAX(doc.age)
RETURN { gender, minAge, maxAge }
or, if no grouping keys are used, it can follow the COLLECT keyword:
FOR doc IN collection
COLLECT AGGREGATE minAge = MIN(doc.age), maxAge = MAX(doc.age)
RETURN { minAge, maxAge }
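The difference between aggregating afterwards and aggregating incrementally can be sketched in Python. This models the idea only and says nothing about how ArangoDB implements it internally; the function names and the sample documents are assumptions for illustration.

```python
# Hypothetical sample documents.
docs = [
    {"gender": "f", "age": 31},
    {"gender": "m", "age": 25},
    {"gender": "f", "age": 44},
    {"gender": "m", "age": 52},
]


def aggregate_afterwards(docs):
    """Collect all values of each group first, then aggregate every group."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc["gender"], []).append(doc["age"])
    return [
        {"gender": gender, "minAge": min(ages), "maxAge": max(ages)}
        for gender, ages in sorted(groups.items())
    ]


def aggregate_incrementally(docs):
    """Keep only a running (min, max) per group while the groups are being built."""
    state = {}
    for doc in docs:
        gender, age = doc["gender"], doc["age"]
        if gender not in state:
            state[gender] = (age, age)
        else:
            low, high = state[gender]
            state[gender] = (min(low, age), max(high, age))
    return [
        {"gender": gender, "minAge": low, "maxAge": high}
        for gender, (low, high) in sorted(state.items())
    ]


print(aggregate_incrementally(docs))
```

Both functions return the same result, but the incremental variant never materializes the full list of values per group, which is where the efficiency gain comes from.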
Only specific expressions are allowed on the right-hand side of each AGGREGATE assignment:
- on the top level the expression must be a call to one of the supported aggregation functions LENGTH, MIN, MAX, SUM, AVERAGE, STDDEV_POPULATION, STDDEV_SAMPLE, VARIANCE_POPULATION, or VARIANCE_SAMPLE
- the expression must not refer to variables introduced in the COLLECT itself
Over the last few weeks we have already published blog posts on several new features and enhancements in ArangoDB 2.8. Have a look at the posts on AQL function speedups and automatic deadlock detection (which is backported to 2.7.5 as well). The blog post about using multiple indexes per collection is worth reading, as is the index speedups article. In the web interface you can now use bind parameters in the AQL editor.
There is a lot more to read in the changelog of ArangoDB 2.8, and we will continue to present individual features in detailed blog posts. You can find the latest documentation on docs.arangodb.com.
4 Comments
Does the new traversal syntax (INBOUND, OUTBOUND) support any options? E.g. how to treat duplicates, visitor function and such?
Hi, the intention of the new traversal syntax is to be simpler and use fewer options than the function-style traversal. This simplification allows us to use several shortcuts and optimizations internally and gives users an easier entry point. We also think the new syntax covers a lot of use cases already.
Therefore there is not (yet) any plan for further options except the direction and the steps for these traversals. The “visitor” function should be implemented in the later AQL statements (which covers every visitor that just returns or counts attributes). Duplicates can be removed using the DISTINCT modifier.
However, if these features are not powerful enough, you can still use the function-style traversals, which are indeed more generic and powerful but also more complicated to configure.
About the visitor function, just to make it clear, are you saying that one would e.g. have code like:
let vertices = (AQL using outbound/inbound/any with optional DISTINCT and filter etc)
Then just run a regular for v in vertices VISITOR(v)
something like that?
If you want to use DISTINCT, this is exactly the way to go. DISTINCT is executed in the return step, so you need to save the distinct set of vertices before executing the VISITOR on all of them.
If you do not want to use DISTINCT you can also write:
FOR v IN OUTBOUND @start @@edge FILTER … RETURN VISITOR(v)
which will be more efficient.