ArangoDB 3.12 – Performance for all Your Data Models

March 27, 2024 / ArangoGraphML, Graphs


We are proud to announce the GA release of ArangoDB 3.12!

Congrats to the team and community on the latest ArangoDB release, 3.12! ArangoDB 3.12 focuses on greatly improving performance and observability, both for the core database and for our search offering. In this blog post, we go through some of the most important changes to ArangoDB and give you an idea of how you can use them in your products.

Just in case you prefer to try ArangoDB 3.12 directly rather than just reading about it, you can either download the Community Version or Enterprise Trial, pull our Docker images, or head over to our Managed Service ArangoGraph for a free trial.

Improved memory accounting and usage

Version 3.12 features multiple improvements to the observability of ArangoDB deployments. Memory usage is more accurately tracked and additional metrics have been added for monitoring the memory consumption.

Note that AQL queries may now report a higher memory usage and thus run into memory limits sooner.

This AQL query efficiently identifies accounts involved in a suspicious chain of transactions originating from a flagged account, considering the rapidity and sequence of these transactions.

The RocksDB block cache metric rocksdb_block_cache_usage now also includes the memory used for table building, table reading, file metadata, flushing and compactions by default.

Furthermore, the memory usage of some subsystems has been optimized. When dropping a database, all contained collections are now marked as dropped immediately. Ongoing operations on these collections can be stopped earlier, and memory for the underlying collections and indexes can be reclaimed sooner. Memory used for index selectivity estimates is now also released early. ArangoSearch now has a smaller memory footprint for removal operations.

Together, these changes make 3.12 much more stable and resilient against out-of-memory situations, in particular in resource-constrained environments such as containerized deployments, since memory scarcity is detected earlier and handled more gracefully.
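Since queries may now reach a memory limit sooner, it can be useful to set the limit explicitly per query. The following is a minimal sketch, assuming the `memoryLimit` attribute (in bytes) of the `POST /_api/cursor` request body. The helper name, the `transactions` collection, and the 64 MiB value are hypothetical. The code only builds the request body and sends nothing to a server.

```python
import json


def build_cursor_request(query: str, bind_vars: dict, memory_limit_bytes: int) -> dict:
    """Build a body for POST /_api/cursor with a per-query memory limit.

    The limit is given in bytes; a value of 0 means no limit.
    """
    if memory_limit_bytes < 0:
        raise ValueError("memory limit must not be negative")
    return {
        "query": query,
        "bindVars": bind_vars,
        "memoryLimit": memory_limit_bytes,
    }


# Hypothetical collection and threshold, for illustration only.
body = build_cursor_request(
    "FOR t IN @@coll FILTER t.amount > @min RETURN t",
    {"@coll": "transactions", "min": 1000},
    64 * 1024 * 1024,
)
print(json.dumps(body, indent=2))
```

A query that exceeds the limit is aborted with a resource limit error instead of consuming more memory, so the value is best chosen after observing real memory usage under 3.12.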

Parallel execution within an AQL query

The new async-prefetch optimizer rule allows certain operations of a query to asynchronously prefetch the next batch of data while processing the current batch, allowing parts of the query to run in parallel. This leads to performance improvements if the scheduler still has spare capacity.

The new Par column in a query explain output shows which nodes of a query are eligible for asynchronous prefetching. Write queries, graph execution nodes, nodes inside subqueries, LIMIT nodes and their dependencies above, as well as all query parts that include a RemoteNode are not eligible.

The profiling output for queries includes a new Par column as well, but it shows the number of successful parallel asynchronous prefetch calls.
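To see the effect of the rule on a given query, one option is to explain the query twice, once with the rule enabled and once with it disabled, and compare the Par column. The sketch below assumes the `options.optimizer.rules` attribute of the `POST /_api/explain` request body, where a `+` or `-` prefix switches a rule on or off. The helper name and the `orders` collection are hypothetical, and no request is sent.

```python
import json

ASYNC_PREFETCH_RULE = "async-prefetch"


def build_explain_request(query: str, async_prefetch: bool) -> dict:
    """Build a body for POST /_api/explain that switches one optimizer rule.

    A leading "+" enables a rule and a leading "-" disables it.
    """
    sign = "+" if async_prefetch else "-"
    return {
        "query": query,
        "options": {"optimizer": {"rules": [sign + ASYNC_PREFETCH_RULE]}},
    }


# Hypothetical read-only query; write queries are not eligible for prefetching.
query = "FOR doc IN orders FILTER doc.total > 100 RETURN doc.customer"
with_rule = build_explain_request(query, True)
without_rule = build_explain_request(query, False)
print(json.dumps(without_rule, indent=2))
```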

Improved joins

The AQL optimizer now automatically recognizes opportunities to improve local joins (for example, joins that use SmartJoins or SatelliteCollections) by using the merge join algorithm. Queries containing segments of two or more index scans that are local to a single database server can now be optimized if the filter conditions are eligible.
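For readers unfamiliar with the technique, here is a minimal sketch of a generic merge join over two inputs that are already sorted by the join key, which is the order an index scan delivers. It illustrates the algorithm in general and is not ArangoDB's implementation. The function name and the sample data are hypothetical.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are both sorted by `key`.

    Each input is walked once from front to back, so no hash table
    and no nested full scan is needed.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows on the right side that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # Pair every left row with this key with every row of that run.
            while i < len(left) and left[i][key] == left_key:
                for row in right[j:j_end]:
                    result.append((left[i], row))
                i += 1
            j = j_end
    return result


orders = [
    {"customer": 1, "item": "apple"},
    {"customer": 2, "item": "pear"},
    {"customer": 2, "item": "plum"},
]
customers = [
    {"customer": 2, "name": "Ada"},
    {"customer": 3, "name": "Bob"},
]
joined = merge_join(orders, customers, "customer")
print(joined)
```

Because both sides advance in key order, the join cost grows with the combined size of the inputs plus the number of matches, which is why sorted index scans on the same server are a natural fit.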

Multi-dimensional indexes

The previously experimental ZKD index type is now stable and has been renamed to MDI. Existing indexes keep the ZKD type.

Multi-dimensional indexes can now be declared as sparse to exclude documents from the index that do not have the defined attributes or that have them explicitly set to null. If a value other than null is set, it still needs to be numeric.

Multi-dimensional indexes now support storedValues to cover queries for better performance.

An additional MDI-prefixed index variant has been added that lets you specify additional attributes for the index to narrow down the search space using equality checks. This can, for example, be used as a vertex-centric index for graph traversals, if created on an edge collection with the first attribute in prefixFields set to _from or _to.
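The options described above could be combined in index definitions such as the following. This is a sketch that assumes the index types `mdi` and `mdi-prefixed` and the attributes `fieldValueTypes`, `sparse`, `storedValues`, and `prefixFields` as accepted by `POST /_api/index`. The helper names and the attribute names `x`, `y`, `label`, `start`, and `end` are hypothetical. The code only builds the definitions.

```python
import json


def mdi_index(fields, sparse=False, stored_values=None):
    """Build a definition for a multi-dimensional (mdi) index."""
    definition = {
        "type": "mdi",
        "fields": list(fields),
        "fieldValueTypes": "double",  # indexed values must be numeric
        "sparse": sparse,
    }
    if stored_values:
        # Extra attributes kept in the index so it can cover a query.
        definition["storedValues"] = list(stored_values)
    return definition


def mdi_prefixed_index(prefix_fields, fields):
    """Build a definition for the prefixed variant (equality on the prefix)."""
    return {
        "type": "mdi-prefixed",
        "prefixFields": list(prefix_fields),
        "fields": list(fields),
        "fieldValueTypes": "double",
    }


# Sparse index on two numeric attributes that also stores a label.
point_index = mdi_index(["x", "y"], sparse=True, stored_values=["label"])

# Vertex-centric variant for an edge collection: equality on _from,
# range conditions on two numeric edge attributes.
edge_index = mdi_prefixed_index(["_from"], ["start", "end"])

print(json.dumps([point_index, edge_index], indent=2))
```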

WAND optimization (Enterprise Edition)

For ArangoSearch Views and inverted indexes (and by extension search-alias Views), you can define a list of sort expressions you want to optimize. This is also known as WAND optimization.

If you query a View with the SEARCH operation in combination with a SORT and LIMIT operation, search results can be retrieved faster if the SORT expression matches one of the optimized expressions.

Only sorting by highest rank is supported, that is, sorting by the result of a scoring function in descending order (DESC).
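As an illustration, the sketch below pairs a View definition that lists an optimized sort expression with a query whose SORT matches it. It assumes the `optimizeTopK` View property with expressions written in the form `BM25(@doc) DESC`. The View name `products_view`, the `products` collection, the `description` field, and the use of the `text_en` Analyzer are hypothetical choices. Nothing is sent to a server.

```python
import json

# Hypothetical arangosearch View definition (Enterprise Edition feature).
view_definition = {
    "name": "products_view",
    "type": "arangosearch",
    "links": {
        "products": {
            "fields": {"description": {"analyzers": ["text_en"]}},
        }
    },
    # Sort expressions to optimize; only descending scores are supported.
    "optimizeTopK": ["BM25(@doc) DESC"],
}

# SEARCH combined with SORT and LIMIT, where the SORT expression matches
# the optimized expression above.
top_k_query = (
    "FOR doc IN products_view "
    "SEARCH ANALYZER(doc.description IN TOKENS(@phrase, 'text_en'), 'text_en') "
    "SORT BM25(doc) DESC "
    "LIMIT 10 "
    "RETURN doc"
)

print(json.dumps(view_definition, indent=2))
print(top_k_query)
```

A query that sorts by a different scoring expression, or in ascending order, would still run but would not benefit from the optimization.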

SEARCH parallelization

Search queries can now be parallelized across segments using multiple threads. This helps to speed up many queries. The effect is particularly spectacular if not all search data is cached in RAM, since then reading the data from disk or SSD is the bottleneck for the query. We have seen speedups of 16x in such situations because the parallelization helps to better use the available I/O bandwidth.
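A per-query thread limit can be sketched as follows, assuming the `parallelism` option of the SEARCH operation. The helper name, the View name `products_view`, and the `category` attribute are hypothetical. The function only assembles the query string.

```python
def search_with_parallelism(view: str, threads: int) -> str:
    """Build an AQL SEARCH query that caps the number of search threads."""
    if threads < 1:
        raise ValueError("at least one thread is required")
    return (
        f"FOR doc IN {view} "
        "SEARCH doc.category == @category "
        f"OPTIONS {{ parallelism: {threads} }} "
        "RETURN doc"
    )


parallel_query = search_with_parallelism("products_view", 4)
print(parallel_query)
```

How many threads actually help depends on the number of segments and on the available I/O bandwidth, so the value is worth measuring rather than guessing.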

Other notable features

Wildcard Analyzer
multi_delimiter Analyzer
External versioning support
Filter matching syntax for UPSERT operations
readOwnWrites option for UPSERT operations
Added AQL functions:
- PARSE_COLLECTION()
- PARSE_KEY()
- REPEAT()
- TO_CHAR()
- RANDOM()
Improved late document materialization
Transparent compression of requests and responses between ArangoDB servers and client tools (to save network bandwidth between availability zones)

Why are changes being made?

ArangoDB constantly focuses on enhancing the product and platform offering and rationalizing the product feature set based on usage, supportability, and long-term applicability to the ecosystem. We enhance and maintain the features and functions that add the most value to our community and customers.

To this end, we have conducted an extensive analysis of the core and supplementary system components and have strategically decided to improve stability, performance, sustainability, and reliability while simultaneously reducing the product’s complexity.

What is being changed?

The following is a summary of changes in the upcoming 3.12 release¹.

Platform Support: Linux will be the primary supported platform for native binaries. Any other platforms, like Windows and macOS, can continue to leverage ArangoDB via containerization.
Deployment options: Deployment options will be simplified to exclude Active Failover mode using multiple single servers. High availability via a cluster deployment (3 or more servers) will be the standard deployment route for failover needs.
Pregel: ArangoDB’s Data Science Service will replace Pregel.
Supplementary Features: DC2DC (ArangoSync) and LDAP will no longer be supported due to low demand, while JavaScript transactions will be deprecated.
VelocyStream (VST) Support: HTTP/2 support will replace VelocyStream.

I actively use ArangoDB on Windows and/or macOS for development. What should I do?

ArangoDB continues to support Linux containers on Windows and macOS. You can continue to use the ArangoDB server and clients by running Linux Docker containers: on Windows via Windows Subsystem for Linux (WSL2) with Docker or Podman Desktop, and on macOS with Docker or Podman.

Customers using the ArangoDB Starter on Windows or macOS can use kube-arangodb to orchestrate the operation and management of ArangoDB containers.

Are there alternatives, available now or coming in the future, to some of the server-side changes?

Currently, the following alternatives exist:

Active Failover: Customers are recommended to move to a OneShard cluster deployment available in the Enterprise Edition.
Pregel: ArangoDB is currently working on a separate Data Science Service that will replace the features provided by Pregel, with higher performance and enhanced scalability. The Data Science Service is expected to be in limited availability until the end of Q1 2024.
DC2DC: Currently, no immediate replacement for DC2DC is planned, though we anticipate an improved alternative in the future. However, we do not have an ETA for it at this time. In the meantime, customers can use third-party solutions for host-based, hypervisor-based, or storage-array-based replication.
JavaScript Transactions: You can use Stream Transactions and, in some cases, AQL.
LDAP: Use the built-in authentication.

What should customers who use VelocyStream (VST) do?

Customers currently using VelocyStream should migrate to using HTTP/2 in their respective SDK.

Summary of changes:

#	Feature	Alternatives
Platform Support
1.	Native² Windows support (both client and server)	Containerization
2.	Native² macOS support (both client and server)	Containerization
3.	Starter Support for Docker and arangod	kube-arangodb orchestration
Server Side
4.	Active Failover	OneShard / Cluster
5.	Pregel	Data Science Service
6.	DC2DC (ArangoSync)	Third-party host or disk array solutions
7.	LDAP	Internal authentication and authorization
SDK / Drivers / Client Libraries
8.	VelocyStream (VST)	HTTP/2
Transaction Support
9.	JavaScript Transactions	AQL and Stream Transactions

Note:

These changes do not impact the support and availability of features on any existing and supported (non-EoL) versions of ArangoDB prior to 3.12.
Only native support is being dropped on Windows and macOS. You can use containers to run and deploy ArangoDB on Windows, macOS, or Linux.

Learn more

Watch our release webinar to learn more about ArangoDB 3.12.

Carsten Tang
