Comparing ArangoDB AQL to Neo4j Cypher

ArangoDB is a multi-model database, and graphs are one of its supported data models. If you come from Neo4j, this comparison should help you get started with ArangoDB’s graph-related features and also demonstrate what else you can do with a native multi-model database like ArangoDB.

Language Models

Cypher is a query language solely focused on graphs, created by and primarily used in Neo4j. As you might already know, the pattern you want to find in the full graph is described in a visual way, like ASCII art. Around that, clauses inspired by SQL like WHERE, ORDER BY and others are used to process the data. It also covers data definition with the CREATE keyword. The language can be classified as declarative, but less structured than SQL. There are also functions which can be called, like shortestPath().

In comparison, AQL is a full multi-model query language – encompassing document, relational, search and graph query capabilities. It was invented to overcome the limitations of SQL for dealing with schemaless data and the JSON document model. It enables multi-model queries with one language backed by a single database core.

What that means is that you can, for example, run a prefix search over multiple collections and fields (ArangoSearch), then traverse from the found documents to neighbor nodes at a variable depth, then resolve values in the found documents by using a join, and all that in a single query at high speed. AQL is declarative, but also borrows concepts from programming languages. A lot of core functionality is based around the FOR loop construct. There are also plenty of functions. CRUD operations are supported via the INSERT, UPDATE, REPLACE and REMOVE constructs, but collections and indices can’t be created or managed through AQL. These tasks can be done in the server’s web interface, in arangosh (an interactive shell we ship) or through the HTTP API instead.

Graph Database Concepts in ArangoDB

Naming convention comparison

Here is a quick overview of terms which describe similar concepts:

AQL	Cypher
vertex	node
edge	relationship
collection	(group of nodes)
document	(node with properties)
document collection	node label
edge collection	relationship type
attribute	property
depth	hops
array	list
object	map

While you can use arbitrary labels and types in Neo4j in an ad-hoc fashion, it is necessary to create collections in ArangoDB before you can insert vertices and edges into them. In ArangoDB you may create secondary indices on collections for faster lookup speeds. Collections can be organized in databases for multi-tenancy.

Keyword Comparison

The basic language constructs and their keywords in comparison:

AQL	Cypher
FOR … IN … RETURN	MATCH … RETURN
FOR … IN	UNWIND
FILTER	WHERE
SORT	ORDER BY
LIMIT count	LIMIT count
LIMIT offset, count	SKIP offset LIMIT count
OUTBOUND *	-->
INBOUND *	<--
ANY	--
INSERT … INTO	CREATE
UPDATE … IN	SET
REPLACE … IN	SET
REMOVE … IN	DELETE

* In Cypher, you express the edge direction as stored, or you use two hyphens to traverse an edge either way. In AQL, you provide a start vertex and control the traversal direction with a keyword: OUTBOUND to follow the edge direction, INBOUND to follow the reverse direction, or ANY to traverse the edge regardless of its direction.
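The direction keywords can be pictured with a small in-memory model. The following Python sketch is not ArangoDB code: the `neighbors` helper, the vertex names and the tuple-based edge list are assumptions made purely for illustration. It shows which vertices a single hop reaches under each keyword.

```python
# Illustrative model only: each edge is a (from, to) pair, as stored.
EDGES = [
    ("a", "b"),
    ("b", "c"),
    ("b", "d"),
]

def neighbors(edges, start, direction):
    """Return the vertices one hop away from `start` (hypothetical helper)."""
    if direction not in ("OUTBOUND", "INBOUND", "ANY"):
        raise ValueError(f"unknown direction: {direction}")
    found = []
    for source, target in edges:
        # OUTBOUND follows the stored direction: start -> target.
        if direction in ("OUTBOUND", "ANY") and source == start:
            found.append(target)
        # INBOUND follows the edge in reverse: source -> start.
        if direction in ("INBOUND", "ANY") and target == start:
            found.append(source)
    return found

print(neighbors(EDGES, "b", "OUTBOUND"))  # ['c', 'd']
print(neighbors(EDGES, "b", "INBOUND"))   # ['a']
print(neighbors(EDGES, "b", "ANY"))       # ['a', 'c', 'd']
```

Note that the start vertex is always given explicitly, which mirrors how an AQL traversal begins at a start vertex rather than at a matched pattern.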

Example Data

We use a simple company graph for our comparison:

In our demonstration, edges point from superior to subordinate. Besides their names, we also give the employees a job title (role) and an age:

Name	Role	Age
Ann	Boss	42
Tracey	Developer Lead	35
Josefina	Marketing Lead	29
Sammy	Programmer	35
Eryn	Frontend Developer	51
Quinn	Graphics Designer	42
Mark	Marketing Operations	35

To store this data in ArangoDB, we use a trivial model:

Our employee nodes will be stored in a document collection Employee
The relations will be stored in an edge collection manages

Data Model in ArangoDB

A few remarks:

Collections need to be created before data can be inserted into them. You can use ArangoDB’s web interface to do so.
Every collection has a primary index on a special property, the _key attribute. This index is created automatically and cannot be removed. The _key attribute stores the document key as a string, which is unique within a collection.
There is a virtual attribute _id for stored documents, which is the concatenation of the collection name, a forward slash and the document key. It uniquely identifies a document within a database.
Edges are also documents in ArangoDB, but with special _from and _to attributes which reference other documents (nodes). Because documents are JSON objects you may store arbitrary attributes on edges, including nested objects.
Edge collections have a special edge index built-in, which enables fast graph traversals. It indexes the _from and _to attributes, which reference other documents using _id values.
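To make the relationship between _key, _id, _from and _to more concrete, here is a minimal Python sketch. It is a plain in-memory model and says nothing about how ArangoDB implements its indexes internally. The helper names `make_id` and `build_edge_index` are hypothetical, and the `since` attribute is made up to show that edges can carry nested data.

```python
from collections import defaultdict

def make_id(collection, key):
    """Build an _id value: collection name, a forward slash, document key."""
    return f"{collection}/{key}"

def build_edge_index(edges):
    """Index edge documents by _from and by _to (illustrative only)."""
    by_from = defaultdict(list)
    by_to = defaultdict(list)
    for edge in edges:
        by_from[edge["_from"]].append(edge)
        by_to[edge["_to"]].append(edge)
    return by_from, by_to

# Edge documents are JSON-like objects with _from and _to plus any extras.
edges = [
    {"_from": make_id("Employee", "ann"), "_to": make_id("Employee", "tracey"),
     "since": {"year": 2015}},  # nested attribute, invented for this sketch
    {"_from": make_id("Employee", "ann"), "_to": make_id("Employee", "josefina")},
]

by_from, by_to = build_edge_index(edges)
print([e["_to"] for e in by_from["Employee/ann"]])
# ['Employee/tracey', 'Employee/josefina']
print([e["_from"] for e in by_to["Employee/tracey"]])
# ['Employee/ann']
```

Looking up a vertex in `by_from` answers an outbound step and looking it up in `by_to` answers an inbound step, which is why indexing both attributes helps traversals in either direction.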

To try out the AQL queries presented below, get ArangoDB if you don’t have it already, then open its web interface, go to COLLECTIONS and create a document collection Employee and an edge collection manages. Then click on QUERIES and run the following query:

LET temp = (FOR e IN [
 {"_key":"ann", name:"Ann", "role":"boss", "age":42},
 {"_key":"tracey", name:"Tracey", "role":"lead developer", "age":35},
 {"_key":"josefina", name:"Josefina", "role":"marketing manager", "age":29},
 {"_key":"sammy", name:"Sammy", "role":"programmer", "age":35},
 {"_key":"eryn", name:"Eryn", "role":"frontend developer", "age":51},
 {"_key":"quinn", name:"Quinn", "role":"graphics designer", "age":42},
 {"_key":"mark", name:"Mark", "role":"marketing operator", "age":35}
] INSERT e INTO Employee)
 
FOR m IN [
 {"_from": "Employee/ann", "_to": "Employee/tracey"},
 {"_from": "Employee/ann", "_to": "Employee/josefina"},
 {"_from": "Employee/tracey", "_to": "Employee/sammy"},
 {"_from": "Employee/tracey", "_to": "Employee/eryn"},
 {"_from": "Employee/josefina", "_to": "Employee/quinn"},
 {"_from": "Employee/josefina", "_to": "Employee/mark"}
] INSERT m INTO manages

This will create the regular documents and the edge documents in the two collections.

Basic Traversals

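The basic syntax for traversals in AQL is as follows:

FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  edgeCollection1, ..., edgeCollectionN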

Let us compare some queries so that you understand how it works.

Get the employees directly managed by Ann:

AQL query:

FOR v IN OUTBOUND "Employee/ann" manages
  RETURN v.name

Cypher query:

MATCH (:Employee {name:'Ann'})-[:MANAGES]->(e:Employee)
RETURN e.name

Result of AQL query:

[ "Tracey", "Josefina" ]

Find the superior of Tracey:

AQL query:

FOR v IN INBOUND "Employee/tracey" manages
  RETURN v.name

Cypher query:

MATCH (e:Employee)-[:MANAGES]->(:Employee {name:'Tracey'})
RETURN e.name

Result of AQL query:

[ "Ann" ]

Get the employees managed by Ann, directly and indirectly (up to two levels, which means the entire graph in our example):

AQL query:

FOR e IN 1..2 OUTBOUND "Employee/ann" manages
  RETURN e.name

Cypher query:

MATCH (:Employee {name:'Ann'})-[:MANAGES*1..2]->(e:Employee)
RETURN e.name

Result of AQL query:

[
   "Tracey",
   "Sammy",
   "Eryn",
   "Josefina",
   "Quinn",
   "Mark"
]

Traversals in AQL default to a depth of 1, so FOR … IN OUTBOUND … means the minimum and maximum number of hops will both be 1. If you write FOR … IN 2 … then the minimum as well as the maximum will be 2. To specify different values, write a range as in the query above (1..2). Traversals with an unlimited depth, as in Cypher using an asterisk (*), are not supported in AQL, but you may set a very high maximum.
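The depth rules can be illustrated with a short Python sketch over the example graph. This is an assumption-laden model, not ArangoDB's traversal engine: the `traverse` function and the `MANAGES` dictionary are hypothetical, the walk is a simple depth-first one, and it assumes an acyclic graph such as our company hierarchy.

```python
# Edges of the example graph, superior -> subordinate.
MANAGES = {
    "ann": ["tracey", "josefina"],
    "tracey": ["sammy", "eryn"],
    "josefina": ["quinn", "mark"],
}

def traverse(graph, start, min_depth=1, max_depth=None):
    """Depth-first OUTBOUND walk collecting vertices within the depth range."""
    if max_depth is None:
        max_depth = min_depth  # a single number means min == max
    visited = []

    def walk(vertex, depth):
        if depth >= min_depth:
            visited.append(vertex)
        if depth < max_depth:  # stop descending at the maximum depth
            for child in graph.get(vertex, []):
                walk(child, depth + 1)

    walk(start, 0)
    return visited

print(traverse(MANAGES, "ann"))        # depth 1: ['tracey', 'josefina']
print(traverse(MANAGES, "ann", 2))     # depth 2: ['sammy', 'eryn', 'quinn', 'mark']
print(traverse(MANAGES, "ann", 1, 2))  # depth 1..2: all six subordinates
```

The last call yields the vertices in the same order as the AQL result shown above, although that match depends on the depth-first strategy and the edge order chosen in this sketch.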

Pattern Matching

In ArangoDB, we call traversals with conditions pattern matching. Without conditions, it would be a simple traversal, even though in Cypher every search may be considered pattern matching.

Using the previous query, let us extend it with filter conditions. In the example below, we want to find employees who are at least 30 but younger than 40, managed by Ann directly or indirectly:

AQL query:

FOR e IN 1..2 OUTBOUND "Employee/ann" manages
  FILTER e.age >= 30 AND e.age < 40
  RETURN {name: e.name, age: e.age}

Cypher query:

MATCH (:Employee {name:'Ann'})-[:MANAGES*1..2]->(e:Employee)
WHERE e.age >= 30 AND e.age < 40
RETURN e.name

Result of AQL query:

[
  { "name": "Tracey", "age": 35 },
  { "name": "Sammy", "age": 35 },
  { "name": "Mark", "age": 35 }
]

Shortest Path

We can determine the official channel for Quinn to pass a message on to Eryn by finding the shortest path between them. We follow edges in any direction, because the edge orientation changes midway at Ann. If you know that the direction doesn’t change on the paths you are interested in, use a directed traversal instead, that is, INBOUND or OUTBOUND in AQL.

AQL query:

FOR e IN ANY SHORTEST_PATH "Employee/quinn" TO "Employee/eryn" manages
  RETURN e.name

Cypher query:

MATCH (quinn:Employee {name:'Quinn'}), (eryn:Employee {name:'Eryn'}),
  p=shortestPath((quinn)-[*]-(eryn))
UNWIND nodes(p) as n
RETURN n.name

Result of AQL query:

[
  "Quinn",
  "Josefina",
  "Ann",
  "Tracey",
  "Eryn"
]
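For readers who want to see why the path runs through Ann, the idea can be sketched as a breadth-first search that ignores edge direction. The Python below is only an illustration of the general technique; the `shortest_path_any` function is hypothetical and is not a description of the algorithm ArangoDB or Neo4j use internally.

```python
from collections import deque

# Example graph edges, stored as (superior, subordinate).
MANAGES = [
    ("ann", "tracey"), ("ann", "josefina"),
    ("tracey", "sammy"), ("tracey", "eryn"),
    ("josefina", "quinn"), ("josefina", "mark"),
]

def shortest_path_any(edges, start, goal):
    """Breadth-first search that ignores edge direction (like ANY)."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, []).append(source)

    previous = {start: None}  # also serves as the set of seen vertices
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None  # no path exists

print(shortest_path_any(MANAGES, "quinn", "eryn"))
# ['quinn', 'josefina', 'ann', 'tracey', 'eryn']
```

If the adjacency list were built from the stored direction only, no path from Quinn to Eryn would be found, which is the reason the queries above traverse in any direction.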

Aggregation

AQL comes with a broad aggregation framework to group by one or multiple values. It can also be used to calculate things like an average value on the fly. The following example could also use a graph traversal, but for simplicity we just use all the employee records we have to calculate the average age, rounded to the nearest whole number:

AQL query:

FOR e IN Employee
  COLLECT AGGREGATE avg = AVG(e.age)
  RETURN ROUND(avg)

Cypher query:

MATCH (e:Employee)
RETURN ROUND(AVG(e.age))

Result of AQL query:

[ 38 ]

Here is a simple example of how to group by age and count how many employees are of the same age:

AQL query:

FOR e IN Employee
  COLLECT age = e.age WITH COUNT INTO count
  RETURN {age, count}

Cypher query:

MATCH (e:Employee)
RETURN e.age as age, COUNT(e.age) as count

Result of AQL query:

[
  { "age": 29, "count": 1 },
  { "age": 35, "count": 3 },
  { "age": 42, "count": 2 },
  { "age": 51, "count": 1 }
]
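As a cross-check of both aggregation results, the same calculations can be done on the example data in plain Python. This sketch only mirrors what the queries compute; the variable names are made up, and Python's `round` treats exact .5 ties differently from some database rounding functions, which does not matter for this data.

```python
from collections import Counter
from statistics import mean

# Names and ages from the example data table.
EMPLOYEES = [
    {"name": "Ann", "age": 42},
    {"name": "Tracey", "age": 35},
    {"name": "Josefina", "age": 29},
    {"name": "Sammy", "age": 35},
    {"name": "Eryn", "age": 51},
    {"name": "Quinn", "age": 42},
    {"name": "Mark", "age": 35},
]

ages = [employee["age"] for employee in EMPLOYEES]

# Average age over all employees: 269 / 7 is about 38.43, rounded to 38.
average_age = round(mean(ages))

# Group by age and count the members of each group, sorted by age.
age_counts = [
    {"age": age, "count": count}
    for age, count in sorted(Counter(ages).items())
]

print(average_age)  # 38
print(age_counts)
```

The grouped output lists one entry per distinct age, matching the four groups in the AQL result above.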
