ArangoDB 2.6 New Release: Enhanced Features & Performance
We are proud to announce the latest release of ArangoDB, with lots of improvements and many new features. ArangoDB 2.6 is available for download for many different operating systems. The focus of the new release is on performance improvements: for instance, sorting on a string attribute is up to 3 times faster. There are also improvements in the shortest-path implementation and other graph-related AQL queries.
Have a look at some of our previous blog posts, such as Reusable Foxx Apps with Configurations, Document your Foxx Apps with Swagger 2, or Improved System User Authentication, to learn more about ArangoDB 2.6, and check the manual for a deeper dive into specific features.
Claudius, CEO: “The performance improvements in every area of ArangoDB 2.6 make ArangoDB an effective alternative to other databases. I am very proud of the product and the team, and we can expect much more in the next few months.”
Some of the performance improvements:
- simple `FILTER` conditions we've tested are 3 to 5 times faster
- simple joins using the primary index (`_key` attribute), hash index or skiplist index are 2 to 3.5 times faster
- extracting the `_key` or other top-level attributes from documents is 4 to 5 times faster
- simple `COLLECT` statements we've tested are 7 to 15 times faster
Max, Software Architect: “With ArangoDB 2.6 we accelerate some of our key areas significantly. Everybody benefits from these improvements, especially people who like to use more complex and ambitious queries. Our latest benchmark already shows this because we have used a preview of ArangoDB 2.6.”
Please give ArangoDB 2.6 a try and provide us with your valuable feedback.
Features and Improvements
- front-end: display of query execution time
- front-end: added demo page (only works if demo data is available)
- front-end: renamed the query "submit" button to "execute"
- front-end: added query explain feature
- removed startup option `--log.severity`
- added optional `limit` parameter for AQL function `FULLTEXT`
- make the fulltext index also index text values contained in direct sub-objects of the indexed attribute.
Previous versions of ArangoDB only indexed the attribute value if it was a string; sub-attributes of the index attribute were ignored during fulltext indexing.
Now, if the index attribute value is an object, the object's values will each be included in the fulltext index if they are strings. If the index attribute value is an array, the array's values will each be included in the fulltext index if they are strings.
For example, with a fulltext index present on the `translations` attribute, the following text values will now be indexed:
```js
var c = db._create("example");
c.ensureFulltextIndex("translations");
c.insert({ translations: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } });
c.insert({ translations: "Fox is the English translation of the German word Fuchs" });
c.insert({ translations: [ "ArangoDB", "document", "database", "Foxx" ] });
```

```js
c.fulltext("translations", "лиса").toArray();       // returns only the first document
c.fulltext("translations", "Fox").toArray();        // returns the first and second documents
c.fulltext("translations", "prefix:Fox").toArray(); // returns all three documents
```
- added batch document removal and lookup commands:

```js
collection.lookupByKeys(keys);
collection.removeByKeys(keys);
```

These commands can be used to perform multi-document lookup and removal operations efficiently from the ArangoShell. The argument to these operations is an array of document keys.
Also added HTTP APIs for batch document commands:
- PUT /_api/simple/lookup-by-keys
- PUT /_api/simple/remove-by-keys
- properly prefix document address URLs with the current database name for calls to the API method GET `/_api/document?collection=...` (that method returns partial URLs for all documents in the collection). Previous versions of ArangoDB returned the URLs starting with `/_api/` but without the current database name, e.g. `/_api/document/mycollection/mykey`. Starting with 2.6, the response URLs will include the database name as well, e.g. `/_db/_system/_api/document/mycollection/mykey`.
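The new URL scheme can be captured in a small helper function (an illustration of the format described above, not code from ArangoDB itself):

```javascript
// Build a 2.6-style document URL that is prefixed with the database name.
function documentUrl(database, collection, key) {
  return "/_db/" + encodeURIComponent(database) +
         "/_api/document/" + encodeURIComponent(collection) +
         "/" + encodeURIComponent(key);
}

// documentUrl("_system", "mycollection", "mykey")
// yields "/_db/_system/_api/document/mycollection/mykey"
```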
- subquery optimizations for AQL queries. This optimization avoids copying intermediate results into subqueries when they are not required by the subquery.
- return value optimization for AQL queries. This optimization avoids copying the final query result inside the query's main `ReturnNode`.
- allow `@` and `.` characters in document keys, too. This change also leads to document keys being URL-encoded when returned in HTTP `location` response headers.
- added an alternative implementation for AQL `COLLECT`. The alternative method uses a hash table for grouping and does not require its input elements to be sorted. It will be taken into account by the optimizer for `COLLECT` statements that do not use an `INTO` clause.
In case a `COLLECT` statement can use the hash table variant, the optimizer will create an extra plan for it at the beginning of the planning phase. In this plan, no extra `SORT` node will be added in front of the `COLLECT`, because the hash table variant of `COLLECT` does not require sorted input. Instead, a `SORT` node will be added after it to sort its output. This `SORT` node may be optimized away again in later stages. If the sort order of the result is irrelevant to the user, adding an extra `SORT null` after a hash `COLLECT` operation will allow the optimizer to remove the sorts altogether.
In addition to the hash table variant of `COLLECT`, the optimizer will modify the original plan to use the regular `COLLECT` implementation. As this implementation requires sorted input, the optimizer will insert a `SORT` node in front of the `COLLECT`. This `SORT` node may be optimized away in later stages. The created plans will then be shipped through the regular optimization pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost, as usual. The hash table variant does not require an up-front sort of the input and will thus be preferred over the regular `COLLECT` if the optimizer estimates many input elements for the `COLLECT` node and cannot use an index to sort them.
The optimizer can be explicitly told to use the regular sorted variant of `COLLECT` by suffixing a `COLLECT` statement with `OPTIONS { "method" : "sorted" }`. This will override the optimizer's guesswork and only produce the sorted variant of `COLLECT`.
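The difference between the two grouping strategies can be sketched in plain JavaScript (a simplified model for illustration, not ArangoDB's actual executor):

```javascript
// Hash variant: group with a hash table (input order is irrelevant),
// then sort the groups afterwards (the SORT node placed after the COLLECT).
function collectHash(values) {
  var groups = new Map();
  values.forEach(function (v) {
    groups.set(v, (groups.get(v) || 0) + 1);
  });
  return Array.from(groups.keys()).sort().map(function (k) {
    return { value: k, count: groups.get(k) };
  });
}

// Sorted variant: requires sorted input (the SORT node placed in front of
// the COLLECT), then groups adjacent equal values in a single pass.
function collectSorted(values) {
  var sorted = values.slice().sort();
  var result = [];
  sorted.forEach(function (v) {
    var last = result[result.length - 1];
    if (last && last.value === v) {
      last.count++;
    } else {
      result.push({ value: v, count: 1 });
    }
  });
  return result;
}
```

Both produce the same groups; they differ only in where the sorting work happens, which is exactly what the optimizer's cost estimate decides between.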
- refactored the HTTP REST API for cursors (`/_api/cursor`) to improve its performance and use less memory. A post showing some of the performance improvements can be found here: http://jsteemann.github.io/blog/2015/04/01/improvements-for-the-cursor-api/
- simplified return value syntax for data-modification AQL queries. Since version 2.4, ArangoDB allows returning results from data-modification AQL queries. The syntax for this was quite limited and verbose:
```aql
FOR i IN 1..10
  INSERT { value: i } IN test
  LET inserted = NEW
  RETURN inserted
```
The `LET inserted = NEW RETURN inserted` was required literally to return the inserted documents; no calculations could be made using the inserted documents. This is now more flexible: after a data-modification clause (e.g. `INSERT`, `UPDATE`, `REPLACE`, `REMOVE`, `UPSERT`), any number of `LET` calculations can follow. These calculations can refer to the pseudo-values `OLD` and `NEW` that are created by the data-modification statements. This allows returning projections of inserted or updated documents, e.g.:
```aql
FOR i IN 1..10
  INSERT { value: i } IN test
  RETURN { _key: NEW._key, value: i }
```
Still, not every construct is allowed after a data-modification clause. For example, functions that may access documents cannot be called.
More information can be found here: http://jsteemann.github.io/blog/2015/03/27/improvements-for-data-modification-queries/
- added AQL `UPSERT` statement. This adds an `UPSERT` statement to AQL that is a combination of both `INSERT` and `UPDATE`/`REPLACE`. The `UPSERT` will search for a matching document using a user-provided example. If no document matches the example, the insert part of the `UPSERT` statement will be executed. If there is a match, the update/replace part will be carried out:

```aql
UPSERT { page: 'index.html' }                /* search example */
INSERT { page: 'index.html', pageViews: 1 }  /* insert part */
UPDATE { pageViews: OLD.pageViews + 1 }      /* update part */
IN pageViews
```

`UPSERT` can be used with an `UPDATE` or `REPLACE` clause. The `UPDATE` clause will perform a partial update of the found document, whereas the `REPLACE` clause will replace the found document entirely. The `UPDATE` or `REPLACE` parts can refer to the pseudo-value `OLD`, which contains all attributes of the found document.

`UPSERT` statements can optionally return values. In the following query, the return attribute `found` will contain the found document before the `UPDATE` was applied. If no document was found, `found` will contain a value of `null`. The `updated` result attribute will contain the inserted / updated document:

```aql
UPSERT { page: 'index.html' }                /* search example */
INSERT { page: 'index.html', pageViews: 1 }  /* insert part */
UPDATE { pageViews: OLD.pageViews + 1 }      /* update part */
IN pageViews
RETURN { found: OLD, updated: NEW }
```
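The control flow of `UPSERT` can be sketched in plain JavaScript (a simplified in-memory model for illustration only, not ArangoDB's implementation; `store` is assumed to be an array of documents):

```javascript
// example: the search example; insertDoc: the INSERT part;
// updateFn: the UPDATE part, receiving OLD (the found document).
function upsert(store, example, insertDoc, updateFn) {
  var match = store.find(function (doc) {
    return Object.keys(example).every(function (k) {
      return doc[k] === example[k];
    });
  });
  if (match === undefined) {
    store.push(insertDoc);                      // INSERT part
    return { found: null, updated: insertDoc }; // OLD is null on insert
  }
  var old = Object.assign({}, match);           // OLD: document before update
  Object.assign(match, updateFn(old));          // UPDATE part (partial update)
  return { found: old, updated: match };
}
```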
A more detailed description of `UPSERT` can be found here: http://jsteemann.github.io/blog/2015/03/27/preview-of-the-upsert-command/
- adjusted the default value of the `--server.backlog-size` startup option from 10 to 64.
- issue #1231: bug xor feature in AQL: LENGTH(null) == 4. This changes the behavior of the AQL
`LENGTH` function as follows:
  - if the single argument to `LENGTH()` is `null`, the result will now be `0`. In previous versions of ArangoDB, the result of `LENGTH(null)` was `4`.
  - if the single argument to `LENGTH()` is `true`, the result will now be `1`. In previous versions of ArangoDB, the result of `LENGTH(true)` was `4`.
  - if the single argument to `LENGTH()` is `false`, the result will now be `0`. In previous versions of ArangoDB, the result of `LENGTH(false)` was `5`.
The results of `LENGTH()` with string, numeric, array or object argument values do not change.
- issue #1298: bulk import if data already exists
This change extends the HTTP REST API for bulk imports as follows:
When documents are imported and the `_key` attribute is specified for them, the import can be used for inserting and updating/replacing documents. Previously, the import could be used for inserting new documents only, and re-inserting a document with an existing key would have failed with a unique key constraint violation error.
The above behavior is still the default. However, the API now allows controlling the behavior in case of a unique key constraint error via the optional URL parameter `onDuplicate`. This parameter can have one of the following values:
- `error`: when a unique key constraint error occurs, do not import or update the document but report an error. This is the default.
- `update`: when a unique key constraint error occurs, try to (partially) update the existing document with the data specified in the import. This may still fail if the document would violate secondary unique indexes. Only the attributes present in the import data will be updated; other attributes already present will be preserved. The number of updated documents will be reported in the `updated` attribute of the HTTP API result.
- `replace`: when a unique key constraint error occurs, try to fully replace the existing document with the data specified in the import. This may still fail if the document would violate secondary unique indexes. The number of replaced documents will be reported in the `updated` attribute of the HTTP API result.
- `ignore`: when a unique key constraint error occurs, ignore this error. There will be no insert, update or replace for the particular document. Ignored documents will be reported separately in the `ignored` attribute of the HTTP API result.
The result of the HTTP import API will now contain the attributes `ignored` and `updated`, which contain the number of ignored and updated documents respectively. These attributes will contain a value of zero unless the `onDuplicate` URL parameter is set to either `update` or `replace` (in which case the `updated` attribute may contain non-zero values) or `ignore` (in which case the `ignored` attribute may contain a non-zero value).
To support the feature, arangoimp also has a new command-line option `--on-duplicate`, which can have one of the values `error`, `update`, `replace`, or `ignore`. The default value is `error`.
A few examples for using arangoimp with the `--on-duplicate` option can be found here: http://jsteemann.github.io/blog/2015/04/14/updating-documents-with-arangoimp/
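The four `onDuplicate` modes can be sketched with a small client-side model (an illustration of the documented semantics, not server code; `store` is assumed to be a plain object keyed by `_key`):

```javascript
// Simulate importing docs into store under the given onDuplicate mode.
function importDocs(store, docs, onDuplicate) {
  var result = { created: 0, updated: 0, ignored: 0, errors: 0 };
  docs.forEach(function (doc) {
    var existing = store[doc._key];
    if (existing === undefined) {
      store[doc._key] = doc;
      result.created++;
    } else if (onDuplicate === "update") {
      // partial update: attributes not present in the import are preserved
      Object.keys(doc).forEach(function (k) { existing[k] = doc[k]; });
      result.updated++;
    } else if (onDuplicate === "replace") {
      store[doc._key] = doc;  // full replacement
      result.updated++;       // replacements are reported as "updated" too
    } else if (onDuplicate === "ignore") {
      result.ignored++;       // leave the existing document untouched
    } else {
      result.errors++;        // "error" (the default): report, don't touch
    }
  });
  return result;
}
```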
- changed behavior of `db._query()` in the ArangoShell: if the command's result is printed in the shell, the first 10 results will be printed. Previously, only a basic description of the underlying query result cursor was printed. Additionally, if the cursor result contains more than 10 results, the cursor is assigned to a global variable `more`, which can be used to iterate over the cursor result. Example:

```
arangosh [_system]> db._query("FOR i IN 1..15 RETURN i")
[object ArangoQueryCursor, count: 15, hasMore: true]
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
type 'more' to show more documents

arangosh [_system]> more
[object ArangoQueryCursor, count: 15, hasMore: false]
[ 11, 12, 13, 14, 15 ]
```
- Breaking Changes:
- AQL command `GRAPH_SHORTEST_PATH` now only returns IDs and does not extract data any more. It offers an additional option `includeData`, an object taking exactly two keys: `edges` (set to `true` to extract all data stored alongside the edges of each path) and `vertices` (set to `true` to extract all data stored alongside the vertices of each path). The default value of both parameters is `false`.
- JS module general-graph: all graph measurements that return exactly one value used to return an array containing exactly that one value. Now they return the value directly. Modified functions are:
  - `graph._absoluteEccentricity`
  - `graph._eccentricity`
  - `graph._absoluteCloseness`
  - `graph._closeness`
  - `graph._absoluteBetweenness`
  - `graph._betweenness`
  - `graph._radius`
  - `graph._diameter`
- ArangoDB instances started for the first time will create the '_graph' collection without waitForSync, so the default behaviour of create and delete operations on whole graphs changes:
  - POST /_api/graph used to return HTTP 201 Created by default and will now return 202 Accepted
  - DELETE /_api/graph/graph-name used to return HTTP 200 by default and will now return 202 Accepted, unless waitForSync is specified as a parameter or the '_graph' collection's waitForSync attribute was set
- improved GRAPH_SHORTEST_PATH computation. This involved a change in the default behaviour: the default setting will now only return the distance and the ids of nodes. We have added an optional boolean parameter `includeData`; if this is set to `true`, all documents and edges in the result will be fully expanded. We have also added an optional parameter `includePath` of type object. It has two optional sub-attributes, `vertices` and `edges`, both of type boolean. Both can be set individually, and the result will include all vertices on the path if `includePath.vertices == true` and all edges if `includePath.edges == true`, respectively. To get exactly the old result back, use:

```aql
GRAPH_SHORTEST_PATH(<graph>, <source>, <target>,
  { includeData: true, includePath: { edges: true, vertices: true } })
```

The default behaviour is now independent of the size of the documents, as the extraction part could be optimized. The internal algorithm to find all paths from one source to several targets has also been massively improved.
- added support for HTTP push aka chunked encoding
- issue #1051: add info whether server is running in service or user mode. This adds a "mode" attribute to the result of HTTP GET `/_api/version?details=true`. "mode" can have the following values:
  - `standalone`: the server was started manually (e.g. on the command line)
  - `service`: the server is running as a Windows service, in daemon mode, or under the supervisor
- increased the default value of `--server.request-timeout` from 300 to 1200 seconds for client tools (arangosh, arangoimp, arangodump, arangorestore)
- increased the default value of `--server.connect-timeout` from 3 to 5 seconds for client tools (arangosh, arangoimp, arangodump, arangorestore)
- added startup option `--server.foxx-queues-poll-interval`. This startup option controls the frequency with which the Foxx queues manager checks the queue (or queues) for jobs to be executed. The default value is 1 second. Lowering this value will result in the queue manager waking up and checking the queues more frequently, which may increase the server's CPU usage. When not using Foxx queues, this value can be raised to save some CPU time.
- added startup option `--server.foxx-queues-system-only`. This startup option controls whether the Foxx queue manager will check queue and job entries in the `_system` database only. Restricting the Foxx queue manager to the `_system` database means it only has to check the queues collection of a single database, whereas making it check the queues of all databases might result in more work to be done and more CPU time used by the queue manager. The default value is `true`, so the queue manager will only check the queues in the `_system` database.
- make Foxx queues really database-specific. Foxx queues were and are stored in a database-specific collection `_queues`. However, a global cache variable for the queues led to queue names being treated database-independently, which was wrong. Since 2.6, Foxx queue names are truly database-specific, so the same queue name can be used in two different databases for two different queues. Until then, it is advisable to think of queues as already being database-specific and to use the database name as a queue name prefix to avoid name conflicts, e.g.:

```js
var queueName = "myQueue";
var Foxx = require("org/arangodb/foxx");
Foxx.queues.create(db._name() + ":" + queueName);
```
- fixed issue #1247: debian init script problems
- multi-threaded index creation on collection load. When a collection contains more than one secondary index, the indexes can be built in memory in parallel when the collection is loaded. The number of threads used for parallel index creation is determined by the new configuration parameter `--database.index-threads`. If this is set to 0, indexes are built sequentially by the opening thread only, which is equivalent to the behavior in 2.5 and before.
- speed up building the primary index when loading collections
- added a `count` attribute to the `parameters.json` file. This attribute indicates the number of live documents in the collection at unload. It is read when the collection is (re)loaded to determine the initial size of the collection's primary index.
- removed remainders of the MRuby integration; removed arangoirb
- simplified the `controllers` property in Foxx manifests. You can now specify a filename directly if you only want to use a single file mounted at the base URL of your Foxx app.
- simplified the `exports` property in Foxx manifests. You can now specify a filename directly if you only want to export variables from a single file in your Foxx app.
- added support for Node.js-style exports in Foxx exports. Your Foxx exports file can now export arbitrary values using the `module.exports` property instead of adding properties to the `exports` object.
- added a `scripts` property to Foxx manifests. You should now specify the `setup` and `teardown` files as properties of the `scripts` object in your manifests.
- updated the `joi` package to 6.0.8.
- added the `extendible` package.
- added Foxx model lifecycle events to repositories. See #1257.
4 Comments
Hi,
I am wondering if the roadmap is up to date. I don't see all the points listed here https://www.arangodb.com/roadmap/ in this changelog. Any B-tree indexes, lock-free cursors?
Great job by the way!
We have added sparse cursors and moved some of the cursor stuff into the right direction. I’ve updated the roadmap.
Thank you, I thought I had missed something.
Do you know in which version the lock-free cursors will be available? I won’t take it badly if you prefer not to answer. I know how features can be postponed in a big project 😉
You might also want to update the documentation. This page at least has old roadmap references: https://docs.arangodb.com/Sharding/StatusOfImplementation.html (“Our software architecture is fully prepared for replication, automatic fail-over and recovery of a cluster, which will be implemented for version 2.3 (see our road map).”)
The plan is to finish that for 2.7. We have started it but did not finish in time for 2.6. It is a major project and, as you wrote, sometimes more urgent features need to be implemented.