Introducing ArangoDB 3.8 – Graph Analytics at Scale

July 29, 2021/General, Releases

Estimated reading time: 5 minutes

We are proud to announce the GA release of ArangoDB 3.8!

With this release, we improve many analytics use cases we have been seeing – both from our customers and open-source users – with the addition of new features such as AQL window operations, graph and Geo analytics, as well as new ArangoSearch functionality.

If you want to get your hands on ArangoDB 3.8, you can either download the Community or Enterprise Edition, pull our Docker images, or start a free trial of our managed service ArangoGraph.

As with any release, ArangoDB 3.8 comes with many improvements, bug fixes, and features. Feel free to browse through the complete feature list in the release notes to appreciate all the work which has gone into this release.

In this blog post, we want to focus on some of the highlights including AQL Window Operations, Weighted Graph Traversals, Pipeline Analyzer and Geo Support in ArangoSearch.

AQL Window Operations

The WINDOW keyword can be used for aggregations over related rows, usually preceding and / or following rows.

The WINDOW operation performs a COLLECT AGGREGATE-like operation on a set of query rows. However, whereas a COLLECT operation groups multiple query rows into a single result group, a WINDOW operation produces a result for each query row:

The row for which function evaluation occurs is called the current row
The query rows related to the current row over which function evaluation occurs comprise the window frame for the current row

There are two syntax variants for WINDOW operations:

Row-based (evaluated across adjacent documents)
Range-based (evaluated across value or duration range)

Weighted Graph Traversals

Graph traversals in ArangoDB 3.8 support a new traversal type, "weighted", which enumerates paths by increasing weights.

The cost of an edge can be read from an attribute which can be specified with the weightAttribute option.

FOR x, v, p IN 0..10 OUTBOUND "places/York" GRAPH "kShortestPathsGraph"
  OPTIONS {
    order: "weighted",
    weightAttribute: "travelTime",
    uniqueVertices: "path"
  }

As the previous traversal option bfs was deprecated, the new preferred way to start a breadth-first search from now on is with order: "bfs". The default remains depth-first search if no order is specified, but can also be explicitly requested with order: "dfs".

ArangoSearch Pipeline & AQL Analyzers

ArangoSearch added a new Analyzer type, "pipeline", for chaining effects of multiple Analyzers into one. This allows for example to combine text normalization for a case insensitive search with n-gram tokenization, or to split text at multiple delimiting characters followed by stemming.

Furthermore, the new Analyzer type “aql"is capable of running an AQL query (with some restrictions) to perform data manipulation/filtering. For example, a user can define a soundex analyzer for phonetically similar term search:

arangosh> var a = analyzers.save("soundex", "aql", { queryString: "RETURN SOUNDEX(@param)" }, ["frequency", "norm", "position"]);

Note that the query must not access the storage engine. This means no FOR loops over collections or Views, no use of the DOCUMENT() function and no graph traversals.

Enhanced Geo support in ArangoSearch

While AQL has supported Geo indexing and functions for a long time, ArangoDB 3.8 adds Geo support also to ArangoSearch with the GeoJSON and GeoPoint analyzer and respective ArangoSearch Geo functions:

Geo_Contains()
Geo_Distance()
Geo_In_Range()
Geo_Intersects()

NB: Check out the community ArangoBnB project to learn more about Geo capabilities in ArangoSearch.

Improved Replication Protocol

For collections created with ArangoDB 3.8, a new internal data format is used that allows for a very fast synchronization of differences between the leader and a follower that is trying to reconnect.

The new format used in 3.8 is based on Merkle trees, making it more efficient to pin-point the data differences between the leader and a follower that is trying to reconnect.

The algorithmic complexity of the new protocol is determined by the amount of differences between the leader and follower shard data, meaning that if there are no or very few differences, the getting-in-sync protocol will run very fast. In previous versions of ArangoDB, the complexity of the protocol was determined by the number of documents in the shard, and the protocol required a scan over all documents in the shard on both the leader and the follower to find the differences.

The new protocol is used automatically for all collections/shards created with ArangoDB 3.8. Collections/shards created with earlier versions will use the old protocol, which is still fully supported. Note that such “old” collections will only benefit from the new protocol if the collections are logically dumped and recreated/restored using arangodump and arangorestore.

Other notable features

UI and Visualizer Improvements
Default per-query and optional global memory limits
Metrics 2.0 version and user specific Grafana dashboards
Arangodump improvements
JavaScript security
AQL bit functions
Sorting performance improvements
Agency Memory consumption
Hardware acceleration for Encryption

Upgrade

Upgrading to ArangoDB 3.8 can be performed with zero downtime following the upgrade instructions for your respective deployment option. Please note our recent update advisory and update either to a newer 3.6/3.7 version or 3.8 if you are running an affected version.

ArangoGraph

The easiest way to give ArangoDB 3.8 a spin is ArangoGraph, ArangoDB’s managed service in the cloud.

Feedback

Feel free to provide any feedback either via our Slack channel or mailing list.

Special Edition Lunch Session

Join Simran Spiller on August 4th for a special Graph and Beyond Lunch Session #15.5 – Aggregating Time-Series Data with AQL.

The new WINDOW operation added to AQL in ArangoDB 3.8 allows you to compute running totals, rolling averages, and other statistical properties of your sensor, log, and other data. You can aggregate adjacent documents (or rows if you will), as well as documents in value or duration ranges with a sliding window.

In this lunch and learn session, we will take a look at the two syntax variants of the WINDOW operation and go over a few examples queries with visual explanations.

Hear More from the Author

Graph Analytics with ArangoDB

ArangoML

Continue Reading

Introducing Developer Deployments on ArangoDB ArangoGraph

ArangoBnB: Data Preparation Case Study

C++ Memory Model: Migrating from X86 to ARM

Joerg Schad

Jörg Schad is our CTO. In a previous life, he worked on Machine Learning Infrastructure in health care, distributed systems at Mesosphere, implemented distributed and in-memory databases, and conducted research in the Hadoop and Cloud area. He’s a frequent speaker at meetups, international conferences, and lecture halls.

July 29, 2021,Joerg Schad