ArangoDB 3.2: RocksDB, Pregel, Fault Tolerant Foxx, Satellite Collections

July 20, 2017 General, Releases

We are pleased to announce the release of ArangoDB 3.2. Get it here. After an unusually long hackathon, we eliminated two large roadblocks, added a long overdue feature and integrated an interesting new one into this release. Furthermore, we’re proud to report that we increased performance of ArangoDB on average by 35%, while at the same time reduced the memory footprint compared to version 3.1. In combination with a greatly improved cluster management, we think ArangoDB 3.2 is by far our best work. (see release notes for more details)

One key goal of ArangoDB has always been to provide a rock solid platform for building ideas. Our users should always feel safe to try new things with minimal effort by relying on ArangoDB. Todays 3.2 release is an important milestone towards this goal. We’re excited to release such an outstanding product today.

RocksDB

With the integration of Facebook's RocksDB, as a first pluggable storage engine in our architecture, users can now work with as much data as fits on disk. Together with the better locking behavior of RocksDB (i.e., document-level locks), write intensive applications will see significant performance improvements. With no memory limit and only document-level locks, we have eliminated two roadblocks for many users. If one chooses RocksDB as the storage engine, everything, including indexes will persist on disk. This will significantly reduce start-up time.
See this how-to on “Comparing new RocksDB and mmfiles engine” to test the new engine for your operating system and use case.

Pregel

Distributed graph processing was a missing feature in ArangoDB’s graph toolbox. We’re willing to admit that, especially since we managed to fill this need by implementing the Pregel computing model.

With PageRank, Community Detection, Vertex Centrality Measures and further algorithms, ArangoDB can now be used to gain high-level insights into the hidden characteristics of graphs. For instance, you can use graph processing capabilities to detect communities. You can then use the results to shard your data efficiently to a cluster and thereby enable SmartGraph usage to its full potential. We’re confident that with the integration of distributed graph processing, users will now have one of the most complete graph toolsets available in a single database.

Test the new pregel integration with this Community Detection Tutorial and further sharpen advanced graph skills with this new tutorial about Using SmartGraphs in ArangoDB.

Fault-Tolerant Foxx Services

Many people already enjoy using our Foxx JavaScript framework for data-centric microservices. Defining your own highly configurable HTTP routes with full access to the ArangoDB core on the C++ level can be pretty handy. In version 3.2, our Foxx team completely rewrote the management internals to support fault-tolerant Foxx services. This ensures multi-coordinator clusters will always keep their services in sync, and new coordinators are fully initialized, even when all existing coordinators are unavailable.

Test the new fault-tolerant Foxx yourself or learn Foxx by following the brand new Foxx tutorial.

Powerful Graph Visualization

Managing and processing graph data may not be enough, causing visualizing insights to be important. No worries. With ArangoDB 3.2, this can be handled easily. You can use the open-source option via arangoexport to export the data and then import it into Cytoscape (check out the tutorial).

Or you can just plug in the brand new Keylines 3.5 via Foxx and install an on-demand connection. With this option, you will always have the latest data visualized neatly in Keylines without any export/import hassle. Just follow this tutorial to get started with ArangoDB and Keylines.

Read-Only Users

To enhance basic user management in ArangoDB, we added Read-Only Users. The rights of these users can be defined on database and collection levels. On the database level, users can be given administrator rights, read access or denied access. On the collection level, within a database, users can be given read/write, read only or denied access. If a user is not given access to a database or a collection, the databases and collections won’t be shown to that user. Take the tutorial about new User Management.

We also improved geo queries since this is becoming more important to our community. With geo_cursor, it’s now possible to sort documents by distance to a certain point in space (Take the tutorial). This makes queries simple like, “Where can I eat vegan in a radius of one mile around Times Square?” We plan to add support for other geo-spatial functions (e.g., polygons, multi-polygons) in the next minor release. So watch for that.

ArangoDB 3.2 Enterprise Edition: More Room for Secure Growth

The Enterprise Edition of ArangoDB is focused on solving enterprise-scale problems and secure work with data. In version 3.1, we introduced SmartGraphs to bring fast traversal response times to sharded datasets in a cluster. We also added auditing and enhanced encryption control. Download ArangoDB Enterprise Edition (forever free evaluation).

Working closely with one of our larger clients, we further explored and improved an idea we had about a year ago. Satellite Collections is the exciting result of this collaboration. It’s designed to enable faster join operations when working with sharded datasets. To avoid expensive network hops during join processing among machines, one has ‘only’ to find a solution to enable joins locally.

With Satellite Collections, you can define collections to shard to a cluster, as well as set collections to replicate to each machine. The ArangoDB query optimizer knows where each shard is located and sends requests to the DBServers involved, which then execute the query locally. The DBservers will then send the partial results back to the Coordinator which puts together the final result. With this approach, network hops during join operations on sharded collections can be avoided, hence query performance is increased and network traffic reduced. This can be more easily understood with an example. In the schema below, collection C is sharded to multiple machines, while the smaller satellites (i.e., S1 - S5) are replicated to each machine, orbiting the shards of C.

Use cases for Satellite Collection are plentiful. In this more in-depth blog post, we use the example of an IoT case. Personalized patient treatment based on genome sequencing analytics is another excellent example where efficient join operations involving large datasets, can help improve patient care and save infrastructure costs.

Security Enhancements

From the very beginning of ArangoDB, we have been concerned with security. AQL is already protected from injections. By using Foxx, sensitive data can be contained within in a database, with only the results being passed to other systems, thus minimizing security exposure. But this is not always enough to meet enterprise scale-security requirements. With version 3.1, we introduced Auditing and Enhanced Encryption Control and with ArangoDB 3.2, we added even more protection to safeguard data.

Encryption at Rest

With RocksDB, you can encrypt the data stored on disk using a highly secure AES algorithm. Even if someone steals one of your disks, they won’t be able to access the data. With this upgrade, ArangoDB takes another big step towards HIPAA compliance.

Enhanced Authentication with LDAP

Conclusion & You

The entire ArangoDB team is proud to release version 3.2 of ArangoDB -- this should not be a surprise considering all of the improvements we made. We hope you will enjoy the upgrade. We invite you to take ArangoDB 3.2 for a spin and to let us know what you think. We look forward to your feedback!
Download ArangoDB 3.2

More info...

ArangoDB 3.2 Beta: RocksDB Storage Engine & Distributed Graph Cluster

June 13, 2017 General, Releases

We’re excited to release today the beta of ArangoDB 3.2. It’s feature rich, well tested and hopefully plenty of fun for all of you. Keen to take it for a spin? Get ArangoDB 3.2 beta here.

With ArangoDB 3.2, we’re introducing the long-awaited pluggable storage engine and its first new citizen, RocksDB from Facebook

RocksDB: You can now use as much data in ArangoDB as you can fit on your disk. Plus, you can enjoy performance boosts on writes by having only document-level locks (more info below).
Pregel: Furthermore, we implemented distributed graph processing with Pregel for discovering hidden patterns, identify communities and perform in-depth analytics of large graph data sets.
ClusterFoxx: Another important upgrade is what we internally and playfully call the ClusterFoxx. The Foxx management internals have been rewritten from the ground up to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialised even when all existing coordinators are unavailable.
Enterprise: Working with some of our largest customers, we’ve added further security and scalability features to ArangoDB Enterprise like LDAP integration, Encryption at Rest, and the brand new Satellite Collections.

The goal of the whole ArangoDB 3 release cycle has been to scale the multi-model idea to new heights. Getting ‘ready’ for large scale applications is not done overnight and it’s definitely not possible without the help of a strong community. We’d like to invite all of you to lend us a helping hand to make ArangoDB 3.2 the best release ever. Please push this beta to its limits: test it for your use cases and compare the performance of the new features like RocksDB. Let us know on Github any bug that you find. Don’t worry about hurting our feelings: we want to fix any problems.

Join the Beta Bug Hunt Challenge and win a $200 Amazon Gift Card as first prize. You can find more details about this reward program at the end of this post.

New Storage Engine RocksDB

ArangoDB now comes with two storage engines: mmfiles and RocksDB. If you want to compare the engines, you can use arangodump to export data from either engine and arangorestore to import into the other. MMFILES are generally well suited for use-cases that fit into main memory, while RocksDB allows larger than memory work-sets.

RocksDB has plenty of configuration options; we have selected the general purpose options. Please let us know how it works for your use case so that we can further optimize the implementation. Also notice that we do many tests under Linux, Windows and macOS. However, we optimize for Linux. Any feedback regarding other operating systems is very welcome. Check out the step by step guide to compare both storage engines for your use case and OS!

Benefits of RocksDB Storage Engine:

Document-level locks: performance boost for write intensive applications. Writes don’t block reads, and reads don’t block writes
Support for large datasets: go beyond the limit of main memory and stay performant
Persistent indexes: faster index build after restart

Things to consider before switching to RocksDB

RocksDB allows concurrent writes: Write conflicts can be raised. Applications switching from MMFiles must be prepared for exceptions
Transaction Limit in RocksDB: The total size of transactions is limited in RocksDB. Modifying statements on large amounts of documents have to commit in between -- with AQL this is done by default.
Engine Selection on Server/Cluster Level: It’s not possible to mix both storage engines within a single instance or cluster installation. Transaction handling and write ahead log formats are different.

Find all important details about RocksDB in the storage engine documentation, as well as answers to common questions about our RocksDB integration in the FAQ.

Please note that ArangoDB 3.2 beta is fully tested, but not yet fully optimized (known-issues RocksDB). If you find something that is much slower with RocksDB compared to your current queries with the MMFiles engine, please create a github ticket. Please check the comparison guide here.

New Distributed Graph Processing

With the new implementation of distributed graph processing, you are now able to analyze even very large graph data sets as a whole. Internally, we implemented the pregel computing model to enable ArangoDB to support arbitrary graph algorithms, which will scale with your data -- or with the size of your database cluster.

Pregel Messages

You can already use a number of well-known graph algorithms:

PageRank
Weakly Connected Components
Strongly Connected Components
HITS (hubs and authorities)
Single-Source Shortest Path
Community Detection via Label Propagation
Vertex Centrality measures
- Closeness Centrality via Effective Closeness
- Betweenness Centrality via LineRank

By using these new capabilities, you are now able, for example, to detect communities within your graph, shard your data according to these communities and leverage ArangoDB SmartGraphs to perform highly efficient graph traversals even in a cluster setup.

New ClusterFoxx

Managing your JavaScript microservices is now easier and more reliable than ever before. The Foxx management internals have been rewritten to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialised even when all existing coordinators are unavailable.

Additionally, the new fully documented REST API for managing Foxx services enables you to install, upgrade and configure your services using your existing devops processes. And if your service only consists of a single JavaScript file, you can now forego the manifest and upload that file directly, instead of creating a full bundle.

Further useful new features included in ArangoDB 3.2 beta

geo_cursor: Get documents sorted by distance to a certain point in space. You can also apply filters and limits to geo_cursor.
arangoexport: Export data as JSON, JSONL and even graphs as XGMML for visualisation in Cytoscape. You can find details in the Alpha2 release post.

Download ArangoDB 3.2 beta Community

New Enterprise Edition Features in 3.2

The Enterprise Edition of ArangoDB is focused to solve enterprise-scale problems and meet high security standards. In version 3.1, we introduced SmartGraphs to bring fast traversal times to sharded graph datasets. With SatelliteCollections, we enable the same performance boost to join operations at scale.

SatelliteCollections

From genome-sequencing projects to massive online games and beyond, we see the need for join operations including sharded collections and sub-second response times.

With SatelliteCollections, you can define collections to shard to a cluster and collections to replicate to each machine. The ArangoDB query optimizer knows where each shard is located and sends the requests to the DBServers involved, which then executes the query, locally. With this approach, network hops during join operations on sharded collections can be avoided and response times can be close to that of a single instance.

In the example below, collection C is large and sharded to multiple machines while the smaller satellites (S1-S5) are replicated to each machine.

We are super excited to see what you will create with this new feature and welcome any feedback you can provide. The Enterprise Edition of ArangoDB is forever free for evaluation. So feel free to take it for a spin.