Speeding Up Array Operations: ArangoDB Performance Tips
Last week a further optimization slipped into 2.6. It can provide significant speedups for AQL queries that use huge array or object bind parameters and pass them into V8-based functions.
It started with an ArangoDB user reporting that a specific query ran unexpectedly slowly. The part of the query that caused the problem was simple and looked like this:
FOR doc IN collection
FILTER doc.attribute == @value
RETURN TRANSLATE(doc.from, translations, 0)
In the original query, translations was a big, constant object literal. Think of something like the following, but with a lot more values:
{
"p1" : 1,
"p2" : 2,
"p3" : 40,
"p4" : 9,
"p5" : 12
}
The translations were used to replace an attribute value in existing documents with a value from a lookup table computed outside the AQL query.
The number of values in the translations object varied from query to query, with no upper bound. It was possible for the query to run with 50,000 lookup values in the translations object.
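To make the lookup concrete, here is a minimal sketch in plain JavaScript of what the query asks for on each document. The translate function is a hypothetical stand-in that mirrors the documented behavior of AQL's TRANSLATE (return the mapped value, else the default, else the input itself); it is not ArangoDB's implementation, and the table contents and sample documents are made up.

```javascript
// Hypothetical stand-in for AQL's TRANSLATE(); not ArangoDB's implementation.
function translate(value, lookup, defaultValue) {
  if (Object.prototype.hasOwnProperty.call(lookup, value)) {
    return lookup[value];
  }
  // Without a default, the input value is returned unchanged.
  return defaultValue === undefined ? value : defaultValue;
}

// Build a lookup table of the size mentioned above (values are made up).
var translations = {};
for (var i = 1; i <= 50000; ++i) {
  translations["p" + i] = i;
}

// Made-up documents standing in for the filtered collection.
var docs = [{ from: "p3" }, { from: "p50000" }, { from: "unknown" }];

// Equivalent of RETURN TRANSLATE(doc.from, translations, 0) per document.
var result = docs.map(function (doc) {
  return translate(doc.from, translations, 0);
});
```

The lookup itself is cheap for each document. The size of the table only becomes a concern when the whole object has to be handed over to the function again and again.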
Performance Comparison: ArangoDB vs MongoDB, Neo4j, OrientDB
My recent blog post “Native multi-model can compete” has sparked considerable interest on HN and other channels. As expected, the community has immediately suggested improvements to the published code base and I have already published updated results several times (special thanks go to Hans-Peter Grahsl, Aseem Kishore, Chris Vest and Michael Hunger).
Please note: An update is available (June ’15), and a new performance test including PostgreSQL has been added.
Here are the latest figures and diagrams:
The aim of the exercise was to show that a multi-model database can successfully compete with special players on their own turf with respect to performance and memory consumption. Therefore it is not surprising that quite a few interested readers have asked whether I could include OrientDB, the other prominent native multi-model database.
Working with ArangoDB: Insights from Francis at Boostport
As an open-source project we are always happy to learn about new projects that use ArangoDB, and we are thankful for any feedback on how working with ArangoDB, or interacting with the team, has helped your projects develop. If you have a story you want to share, please get in touch.
Recently we received some nice feedback from Francis (Boostport), which reached us with the launch of his new product. Parts of Boostport are realized using Foxx JavaScript extensions on ArangoDB:
“I really enjoyed working on ArangoDB. It’s very stable, well-documented and the API docs are very clear, as I use the REST api. Support is also top-notch. Bugs were often fixed hours or days after discovery and in one case, after I submitted an enhancement request for custom AQL functions, Jan implemented them over the next few days.
For me, the most important feature is being able to build custom AQL functions in javascript. This allowed me to easily perform analytics on social data, generate the appropriate views and send them back to the client for consumption. Finally, I also really liked how minimal configuration is needed to get it up and running. As a plus, I like how you can set up replication using the REST interface. :)”
Public Key Infrastructure: Setup Guide for Debian & Ubuntu
We want a full chain of trust for our Debian packages. Therefore the SUSE Open Build Service (OBS) signs them, and we publish the key alongside the repository.
However, one can do better and do the validation right on apt-get install arangodb. Here’s how:
Multi-Model Benchmark: Assessing ArangoDB’s Versatility
Claudius Weinberger, CEO ArangoDB
TL;DR Native multi-model databases combine different data models like documents or graphs in one tool and even allow mixing them in a single query. How can this concept compete with a pure document store like MongoDB or a graph database like Neo4j? I and a lot of folks in the community have asked that question.
So here are some benchmark results: 100k reads → competitive; 100k writes → competitive; friends-of-friends → superior; shortest-path → superior; aggregation → superior.
Feel free to comment, join the discussion on HN and contribute: it’s all on GitHub.
Getting Unique Values: Efficient Data Retrieval in ArangoDB
While paging through the issues in the ArangoDB issue tracker I came across issue #987, titled “Trying to get distinct document attribute values from a large collection fails”.
The issue was opened around 10 months ago, when ArangoDB 2.2 was current. We have improved AQL performance somewhat since then, so I was eager to see how the query would perform in ArangoDB 2.6, especially compared to 2.2.
For reproduction I quickly put together some example data to run the query on:
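var db = require("org/arangodb").db;
var c = db._create("test");
// insert 4 million documents with 100 distinct values (0 to 99)
for (var i = 0; i < 4 * 1000 * 1000; ++i) {
  c.save({ _key: "test" + i, value: (i % 100) });
}
// flush the write-ahead log before running the query
require("internal").wal.flush(true, true);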
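As an illustration of what the query in the issue is expected to return for this data, here is a plain JavaScript sketch that needs no ArangoDB. The distinctValues helper is hypothetical, and the document count is reduced from 4,000,000 to keep the example light; it only shows the expected result, not how AQL computes it.

```javascript
// Plain JavaScript stand-in for the collection; no ArangoDB required.
// The document count is reduced from 4,000,000 for illustration.
var docCount = 4000;
var docs = [];
for (var i = 0; i < docCount; ++i) {
  docs.push({ _key: "test" + i, value: i % 100 });
}

// Hypothetical helper: collect distinct attribute values in first-seen order.
// Values are used as object keys, so this sketch suits numbers and strings.
function distinctValues(documents, attribute) {
  var seen = Object.create(null);
  var values = [];
  documents.forEach(function (doc) {
    var v = doc[attribute];
    if (!(v in seen)) {
      seen[v] = true;
      values.push(v);
    }
  });
  return values;
}

var unique = distinctValues(docs, "value");
```

Because value is computed as i % 100, the result holds exactly 100 entries regardless of how many documents the collection contains.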
ArangoDB 2.6 Alpha3: Testing New Features & Performance
The 2.6 release preparations are on track, with a third alpha release available for testing today. Please download the latest alpha build and give us your valuable feedback.
We put great effort into speeding up core ArangoDB functionality to make AQL queries perform much better than in earlier versions of ArangoDB.
The queries that improved most in 2.6 over 2.5 include:
- FILTER conditions: simple FILTER conditions we’ve tested are 3 to 5 times faster
- simple joins using the primary index (_key attribute), hash index or skiplist index are 2 to 3.5 times faster
- sorting on a string attribute is 2.5 to 3 times faster
- extracting the _key or other top-level attributes from documents is 4 to 5 times faster
- COLLECT statements: simple COLLECT statements we’ve tested are 7 to 15 times faster
More details on the performance improvements and the test setup will be published in a follow-up blog post. For now, try out the 2.6 alpha3 version: we’ve done our very best to make ArangoDB a lot faster. ;)
What’s new in ArangoDB 2.6
For a full list of changes and improvements please consult the changelog. Over the next week we might also add some more functionality to 2.6, mainly some improvements in the shortest-path implementation and other graph-related AQL queries.
MERII Hummingbird A80 Optimus Cluster: ArangoDB Deployment
For running ArangoDB in clusters and doing performance tests, we wanted a non-virtualized set of decent hardware with a fast Ethernet connection, enough RAM (since that’s what ArangoDB needs) and a multicore CPU. Since you need a bunch of them, cheap ARM development boards come to mind. The original Raspberry Pi (we have those) is out of the game because V8 no longer supports it. The now available Pi 2 doesn’t cut it, since its Ethernet NIC is connected via USB (as on the original Pi). The Odroid series offers only one of the two: fast Ethernet or enough RAM. The Cubieboard 4 wasn’t available yet, but its Allwinner A80 SoC seemed a good choice. Then we met the Merii Optimus board, which seems to be almost the same as the PCDuino (now renamed to Arches) with the A80. While we got a bunch of them for a decent price over at Pollin, the upstream support wasn’t that good.
However, with some help from the SunXi-Linux Project we started flashing OS images to replace the preloaded Android image with the Merii Linux image. Since the userland of the Merii image is pretty sparse, we wanted something more usable. There is already a how-to on running Ubuntu, which requires a Windows host. We prefer a Linux host and want to run Debian. Since the new Pi 2 is also able to run regular Debian on ARMv7, we pick the root fs from sjoerd.