Comparing ArangoDB AQL to Neo4j Cypher
ArangoDB is a multi-model database, and one of its supported data models is the graph model. If you come from Neo4j, this comparison should help you get started with ArangoDB’s graph-related features, and also demonstrate what else you can do with a native multi-model database like ArangoDB.
Language Models
Cypher is a query language solely focused on graphs, created by and primarily used in Neo4j. As you might already know, the pattern you want to find in the full graph is described in a visual way, like ASCII art. Around that, clauses inspired by SQL like WHERE, ORDER BY and others are used to process the data. It also covers data definition with the CREATE keyword. The language can be classified as declarative, but less structured than SQL. There are also functions which can be called, like shortestPath().
In comparison, AQL is a full multi-model query language – encompassing document, relational, search and graph query capabilities. It was invented to overcome the limitations of SQL for dealing with schemaless data and the JSON document model. It enables multi-model queries with one language backed by a single database core.
What that means is that you can do, for example, a prefix search over multiple collections and fields (ArangoSearch), then a traversal from the found documents to neighbor nodes at a variable depth, then resolve values in the found documents by using a join, and all that in a single query at high speed. AQL is declarative, but also borrows concepts from programming languages. A lot of core functionality is based around the FOR loop construct. There are also plenty of functions. CRUD operations are supported via the INSERT, UPDATE, REPLACE and REMOVE constructs, but collections and indices can’t be created or managed through AQL. It can be done in the server’s web interface, arangosh (an interactive shell we ship) or through the HTTP API instead.
Graph Database Concepts in ArangoDB
Naming convention comparison
Here is a quick overview of terms which describe similar concepts:
AQL | Cypher |
---|---|
vertex | node |
edge | relationship |
collection | (group of nodes) |
document | (node with properties) |
document collection | node label |
edge collection | relationship type |
attribute | property |
depth | hops |
array | list |
object | map |
While you can use arbitrary labels and types in Neo4j in an ad-hoc fashion, it is necessary to create collections in ArangoDB before you can insert vertices and edges into them. In ArangoDB you may create secondary indices on collections for faster lookup speeds. Collections can be organized in databases for multi-tenancy.
Keyword Comparison
The basic language constructs and their keywords in comparison:
AQL | Cypher |
---|---|
FOR … IN … RETURN | MATCH … RETURN |
FOR … IN | UNWIND |
FILTER | WHERE |
SORT | ORDER BY |
LIMIT count | LIMIT count |
LIMIT offset, count | SKIP offset LIMIT count |
OUTBOUND* | --> |
INBOUND* | <-- |
ANY* | -- |
INSERT … INTO | CREATE |
UPDATE … IN | SET |
REPLACE … IN | SET |
REMOVE … IN | DELETE |
* In Cypher, you express the edge direction as stored, or you use two hyphens to traverse an edge either way. In AQL, you provide a start vertex and control the traversal direction with a keyword: OUTBOUND to follow in edge direction, INBOUND to follow in reverse direction, or ANY to traverse the edge regardless of the direction.
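To make the three direction keywords concrete, here is a minimal Python sketch of a one-hop lookup over a list of edge documents. It only models the semantics described above and is not how ArangoDB evaluates traversals; the `nodes/a`, `nodes/b` and `nodes/c` identifiers and the `neighbors` helper are made up for illustration.

```python
# Hypothetical edges a -> b -> c, stored with a direction like ArangoDB edges.
EDGES = [
    {"_from": "nodes/a", "_to": "nodes/b"},
    {"_from": "nodes/b", "_to": "nodes/c"},
]

def neighbors(start, direction):
    """Return the vertices one hop away from `start` for a direction keyword."""
    found = []
    for edge in EDGES:
        # OUTBOUND follows the edge in its stored direction.
        if direction in ("OUTBOUND", "ANY") and edge["_from"] == start:
            found.append(edge["_to"])
        # INBOUND follows the edge against its stored direction.
        if direction in ("INBOUND", "ANY") and edge["_to"] == start:
            found.append(edge["_from"])
    return found

print(neighbors("nodes/b", "OUTBOUND"))  # ['nodes/c']
print(neighbors("nodes/b", "INBOUND"))   # ['nodes/a']
print(neighbors("nodes/b", "ANY"))       # ['nodes/a', 'nodes/c']
```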
Example Data
We use a simple company graph for our comparison:
As you can see, edges point from superior to subordinate in our demonstration. Besides a name, each employee also gets a job title (role) and an age:
Name | Role | Age |
---|---|---|
Ann | Boss | 42 |
Tracey | Developer Lead | 35 |
Josefina | Marketing Lead | 29 |
Sammy | Programmer | 35 |
Eryn | Frontend Developer | 51 |
Quinn | Graphics Designer | 42 |
Mark | Marketing Operations | 35 |
To store this data in ArangoDB, we use a trivial model:
- Our employee nodes will be stored in a document collection Employee
- The relations will be stored in an edge collection manages
Data Model in ArangoDB
A few remarks:
- Collections need to be created before data can be inserted into them. You can use ArangoDB’s web interface to do so.
- Every collection has a primary index on a special property, the _key attribute. This index is automatically created and cannot be removed. The _key attribute stores the document key as a string, which is unique within a collection.
- There is a virtual attribute _id for stored documents, which is the concatenation of the collection name, a forward slash and the document key. It uniquely identifies a document within a database.
- Edges are also documents in ArangoDB, but with special _from and _to attributes which reference other documents (nodes). Because documents are JSON objects, you may store arbitrary attributes on edges, including nested objects.
- Edge collections have a special edge index built-in, which enables fast graph traversals. It indexes the _from and _to attributes, which reference other documents using _id values.
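The relationship between _key, _id, _from and _to can be sketched in a few lines of Python. This is a toy model under our own assumptions, not ArangoDB's storage engine: the `Collection` class and `resolve` helper are hypothetical, the `since` attribute on the edge is invented to show a nested object, and _id is stored here for simplicity although it is a virtual attribute in ArangoDB.

```python
class Collection:
    """Toy document collection: a dict keyed by _key plays the primary index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def insert(self, doc):
        key = doc["_key"]
        if key in self.docs:
            # _key must be unique within a collection.
            raise KeyError(f"unique constraint violated for _key {key!r}")
        # _id is derived: collection name, forward slash, document key.
        self.docs[key] = {**doc, "_id": f"{self.name}/{key}"}
        return self.docs[key]

def resolve(collections, doc_id):
    """Look up a document by its _id, e.g. the _from or _to value of an edge."""
    collection_name, key = doc_id.split("/", 1)
    return collections[collection_name].docs[key]

employee = Collection("Employee")
employee.insert({"_key": "ann", "name": "Ann"})
employee.insert({"_key": "tracey", "name": "Tracey"})
collections = {"Employee": employee}

# An edge is a document too; `since` is a made-up nested attribute.
edge = {"_from": "Employee/ann", "_to": "Employee/tracey", "since": {"year": 2015}}

print(resolve(collections, edge["_from"])["name"])  # Ann
print(resolve(collections, edge["_to"])["_id"])     # Employee/tracey
```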
To try out the AQL queries presented below, get ArangoDB if you don’t have it already, then open its web interface, go to COLLECTIONS and create a document collection Employee and an edge collection manages. Then click on QUERIES and run the following query:
LET temp = (FOR e IN [ {"_key":"ann", name:"Ann", "role":"boss", "age":42}, {"_key":"tracey", name:"Tracey", "role":"lead developer", "age":35}, {"_key":"josefina", name:"Josefina", "role":"marketing manager", "age":29}, {"_key":"sammy", name:"Sammy", "role":"programmer", "age":35}, {"_key":"eryn", name:"Eryn", "role":"frontend developer", "age":51}, {"_key":"quinn", name:"Quinn", "role":"graphics designer", "age":42}, {"_key":"mark", name:"Mark", "role":"marketing operator", "age":35} ] INSERT e INTO Employee) FOR m IN [ {"_from": "Employee/ann", "_to": "Employee/tracey"}, {"_from": "Employee/ann", "_to": "Employee/josefina"}, {"_from": "Employee/tracey", "_to": "Employee/sammy"}, {"_from": "Employee/tracey", "_to": "Employee/eryn"}, {"_from": "Employee/josefina", "_to": "Employee/quinn"}, {"_from": "Employee/josefina", "_to": "Employee/mark"} ] INSERT m INTO manages
This will create the regular documents and the edge documents in the two collections.
Basic Traversals
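The basic syntax for traversals in AQL is as follows:
FOR vertex[, edge[, path]] IN [min[..max]] OUTBOUND|INBOUND|ANY startVertex edgeCollection1, ..., edgeCollectionN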
Let us compare some queries so that you understand how it works.
Get the employees directly managed by Ann:
FOR v IN OUTBOUND "Employee/ann" manages RETURN v.name
MATCH (:Employee {name:'Ann'})-[:MANAGES]->(e:Employee) RETURN e.name
Result of AQL query:
[ "Tracey", "Josefina" ]
Find the superior of Tracey:
FOR v IN INBOUND "Employee/tracey" manages RETURN v.name
MATCH (e:Employee)-[:MANAGES]-> (:Employee {name:'Tracey'}) RETURN e.name
Result of AQL query:
[ "Ann" ]
Get the employees managed by Ann, directly and indirectly (up to two levels, which means the entire graph in our example):
FOR e IN 1..2 OUTBOUND "Employee/ann" manages RETURN e.name
MATCH (:Employee {name:'Ann'})-[:MANAGES*1..2]-> (e:Employee) RETURN e.name
Result of AQL query:
[ "Tracey", "Sammy", "Eryn", "Josefina", "Quinn", "Mark" ]
Traversals in AQL default to a depth of 1, so FOR … IN OUTBOUND … means the minimum and maximum number of hops will be 1. If you write FOR … IN 2 … then the minimum as well as the maximum will be 2. To specify different values, write a range as shown in the query above. Traversals with an unlimited depth, as in Cypher using an asterisk (*), are not supported in AQL, but you may set a very high maximum.
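The depth rules can be modelled with a small depth-first walk in Python. This is a sketch of the semantics only, assuming the example graph (which has no cycles) and subordinates visited in the order they were inserted; the `traverse` function and the `MANAGES` dictionary are our own illustration, and ArangoDB itself may return vertices in a different order.

```python
# Example graph as adjacency lists, superior -> subordinates (document keys).
MANAGES = {
    "ann": ["tracey", "josefina"],
    "tracey": ["sammy", "eryn"],
    "josefina": ["quinn", "mark"],
}

def traverse(start, min_depth=1, max_depth=None):
    """Depth-first OUTBOUND walk yielding vertices with min_depth <= depth <= max_depth."""
    if max_depth is None:
        max_depth = min_depth  # a single number sets both minimum and maximum

    def visit(vertex, depth):
        if depth >= min_depth:
            yield vertex
        if depth < max_depth:
            for child in MANAGES.get(vertex, []):
                yield from visit(child, depth + 1)

    yield from visit(start, 0)

print(list(traverse("ann")))        # ['tracey', 'josefina']
print(list(traverse("ann", 2)))     # ['sammy', 'eryn', 'quinn', 'mark']
print(list(traverse("ann", 1, 2)))  # ['tracey', 'sammy', 'eryn', 'josefina', 'quinn', 'mark']
```

The default arguments mirror the behavior described above: no depth means exactly one hop, one number means exactly that many hops, and two numbers form a range.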
Pattern Matching
In ArangoDB, we call traversals with conditions pattern matching. Without conditions it would be a simple traversal, even though in Cypher every search may be considered pattern matching.
Using the previous query, let us extend it with filter conditions. In the example below, we want to find employees who are at least 30 but younger than 40, managed by Ann directly or indirectly:
FOR e IN 1..2 OUTBOUND "Employee/ann" manages FILTER e.age >= 30 AND e.age < 40 RETURN {name: e.name, age: e.age}
MATCH (:Employee {name:'Ann'})-[:MANAGES*1..2]-> (e:Employee) WHERE e.age >= 30 AND e.age < 40 RETURN e.name
Result of AQL query:
[ { "name": "Tracey", "age": 35 }, { "name": "Sammy", "age": 35 }, { "name": "Mark", "age": 35 } ]
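Conceptually, the query walks the graph and keeps only the vertices that pass the condition. The following Python sketch reproduces that idea with the FILTER of the AQL query (age >= 30 and age < 40). It is an illustration under our own assumptions rather than ArangoDB's execution plan; `managed_by`, `AGES` and `MANAGES` are hypothetical names, and the visiting order is simply chosen to follow insertion order.

```python
AGES = {"ann": 42, "tracey": 35, "josefina": 29, "sammy": 35,
        "eryn": 51, "quinn": 42, "mark": 35}
MANAGES = {
    "ann": ["tracey", "josefina"],
    "tracey": ["sammy", "eryn"],
    "josefina": ["quinn", "mark"],
}

def managed_by(start, max_depth):
    """Yield (key, depth) for every employee up to max_depth levels below start."""
    stack = [(start, 0)]
    while stack:
        vertex, depth = stack.pop()
        if depth > 0:
            yield vertex, depth
        if depth < max_depth:
            # Push in reverse so the first subordinate is popped first.
            for child in reversed(MANAGES.get(vertex, [])):
                stack.append((child, depth + 1))

# Traversal plus condition: keep only employees aged 30 to 39.
matches = [
    {"name": key.capitalize(), "age": AGES[key]}
    for key, _depth in managed_by("ann", 2)
    if 30 <= AGES[key] < 40
]
print(matches)
```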
Shortest Path
We can determine the official channel for Quinn to pass a message on to Eryn by finding the shortest path between them. We follow edges in any direction, because the edge orientation changes midway at Ann. If you know that the direction doesn’t change on the paths you are interested in, then use a directed traversal instead, that is INBOUND or OUTBOUND in AQL.
FOR e IN ANY SHORTEST_PATH "Employee/quinn" TO "Employee/eryn" manages RETURN e.name
MATCH (quinn:Employee {name:'Quinn'}),(eryn:Employee {name:'Eryn'}), p=shortestPath((quinn)-[*]-(eryn)) UNWIND nodes(p) as n RETURN n.name
Result of AQL query:
[ "Quinn", "Josefina", "Ann", "Tracey", "Eryn" ]
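For intuition, a shortest path that ignores edge direction can be found with a breadth-first search. The Python sketch below runs on the example edges and is only a model of the idea, not a description of the algorithm ArangoDB or Neo4j use internally; `shortest_path` and `EDGES` are names we made up.

```python
from collections import deque

# Example edges as (superior, subordinate) pairs of document keys.
EDGES = [
    ("ann", "tracey"), ("ann", "josefina"),
    ("tracey", "sammy"), ("tracey", "eryn"),
    ("josefina", "quinn"), ("josefina", "mark"),
]

def shortest_path(start, goal):
    """Breadth-first search that follows edges in ANY direction."""
    adjacency = {}
    for superior, subordinate in EDGES:
        adjacency.setdefault(superior, []).append(subordinate)
        adjacency.setdefault(subordinate, []).append(superior)

    previous = {start: None}  # remembers how each vertex was reached
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk back to the start and reverse the result.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None  # no path between the two vertices

print(shortest_path("quinn", "eryn"))
# ['quinn', 'josefina', 'ann', 'tracey', 'eryn']
```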
Aggregation
AQL comes with a broad aggregation framework to group by one or multiple values. It can also be used to calculate things like an average value on the fly. The following example could also use a graph traversal, but for simplicity we just use all employee records we have to calculate the average age, rounded to a full number:
FOR e IN Employee COLLECT AGGREGATE avg = AVG(e.age) RETURN ROUND(avg)
MATCH (e:Employee) RETURN ROUND(AVG(e.age))
Result of AQL query:
[ 38 ]
Here is a simple example of how to group by age and count how many employees are of the same age:
FOR e IN Employee COLLECT age = e.age WITH COUNT INTO count RETURN {age, count}
MATCH (e:Employee) RETURN e.age as age, COUNT(e.age) as count
Result of AQL query:
[ { "age": 29, "count": 1 }, { "age": 35, "count": 3 }, { "age": 42, "count": 2 }, { "age": 51, "count": 1 } ]
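Both aggregations are easy to check by hand. The Python sketch below recomputes them from the ages in the example table; it is only a cross-check of the numbers, with `AGES`, `average` and `groups` as our own names. The groups are sorted by age here to match the result listing.

```python
from collections import Counter

# Ages of the seven employees from the example data.
AGES = [42, 35, 29, 35, 51, 42, 35]

# Average age, rounded to a full number (269 / 7 is roughly 38.43).
# Python's round() rounds exact halves to even, which may differ from a
# database ROUND function, but no half occurs here.
average = round(sum(AGES) / len(AGES))
print(average)  # 38

# Group by age and count the members of each group.
groups = [
    {"age": age, "count": count}
    for age, count in sorted(Counter(AGES).items())
]
print(groups)
```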