ArangoDB 2.6 New Release: Enhanced Features & Performance
We are proud to announce the latest release of ArangoDB, with lots of improvements and many new features. ArangoDB 2.6 is available for download for many different operating systems. The focus of the new release is on performance improvements: for instance, sorting on a string attribute is up to 3 times faster. There are also improvements in the shortest-path implementation and other graph-related AQL queries.
Have a look at some of our previous blog posts, such as Reusable Foxx Apps with Configurations, Document your Foxx Apps with Swagger 2, or Improved System User Authentication, to learn more about ArangoDB 2.6, and check the manual for a deeper dive into specific features.
Claudius, CEO: “The performance improvements in every area of ArangoDB 2.6 make ArangoDB an effective alternative to other databases. I am very proud of the product and the team, and we can expect much more in the next few months.”
Some of the performance improvements:
- simple `FILTER` conditions we've tested are 3 to 5 times faster
- simple joins using the primary index (`_key` attribute), hash index or skiplist index are 2 to 3.5 times faster
- extracting the `_key` or other top-level attributes from documents is 4 to 5 times faster
- simple `COLLECT` statements we've tested are 7 to 15 times faster
Max, Software Architect: “With ArangoDB 2.6 we accelerate some of our key areas significantly. Everybody benefits from these improvements, especially people who like to use more complex and ambitious queries. Our latest benchmark already shows this because we have used a preview of ArangoDB 2.6.”
Please give ArangoDB 2.6 a try and provide us with your valuable feedback.
Features and Improvements
- front-end: display of query execution time
- front-end: added demo page (only works if demo data is available)
- front-end: renamed the query "submit" button to "execute"
- front-end: added query explain feature
- removed startup option `--log.severity`
- added optional `limit` parameter for AQL function `FULLTEXT`
- make the fulltext index also index text values contained in direct sub-objects of the indexed attribute.
Previous versions of ArangoDB only indexed the attribute value if it was a string; sub-attributes of the index attribute were ignored during fulltext indexing.
Now, if the index attribute value is an object, the object's values will each be included in the fulltext index if they are strings. If the index attribute value is an array, the array's values will each be included in the fulltext index if they are strings.
For example, with a fulltext index present on the `translations` attribute, the following text values will now be indexed:
```js
var c = db._create("example");
c.ensureFulltextIndex("translations");
c.insert({ translations: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } });
c.insert({ translations: "Fox is the English translation of the German word Fuchs" });
c.insert({ translations: [ "ArangoDB", "document", "database", "Foxx" ] });
```

```js
c.fulltext("translations", "лиса").toArray();       // returns only the first document
c.fulltext("translations", "Fox").toArray();        // returns the first and second documents
c.fulltext("translations", "prefix:Fox").toArray(); // returns all three documents
```
- added batch document removal and lookup commands:

```js
collection.lookupByKeys(keys);
collection.removeByKeys(keys);
```

These commands can be used to perform multi-document lookup and removal operations efficiently from the ArangoShell. The argument to these operations is an array of document keys.
Also added HTTP APIs for batch document commands:
- PUT /_api/simple/lookup-by-keys
- PUT /_api/simple/remove-by-keys
- properly prefix document address URLs with the current database name for calls to the API method GET `/_api/document?collection=...` (that method returns partial URLs for all documents in the collection). Previous versions of ArangoDB returned the URLs starting with `/_api/` but without the current database name, e.g. `/_api/document/mycollection/mykey`. Starting with 2.6, the response URLs will include the database name as well, e.g. `/_db/_system/_api/document/mycollection/mykey`.
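The new URL scheme can be captured in a small helper function (an illustration of the format described above, not code from ArangoDB itself):

```javascript
// Build a 2.6-style document URL that is prefixed with the database name.
function documentUrl(database, collection, key) {
  return "/_db/" + encodeURIComponent(database) +
         "/_api/document/" + encodeURIComponent(collection) +
         "/" + encodeURIComponent(key);
}

// documentUrl("_system", "mycollection", "mykey")
// yields "/_db/_system/_api/document/mycollection/mykey"
```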
- subquery optimizations for AQL queries. This optimization avoids copying intermediate results into subqueries when they are not required by the subquery.
- return value optimization for AQL queries. This optimization avoids copying the final query result inside the query's main `ReturnNode`.
- allow `@` and `.` characters in document keys, too. This change also leads to document keys being URL-encoded when returned in HTTP `location` response headers.
- added an alternative implementation for AQL `COLLECT`. The alternative method uses a hash table for grouping and does not require its input elements to be sorted. It will be taken into account by the optimizer for `COLLECT` statements that do not use an `INTO` clause.
In case a `COLLECT` statement can use the hash table variant, the optimizer will create an extra plan for it at the beginning of the planning phase. In this plan, no extra `SORT` node will be added in front of the `COLLECT`, because the hash table variant of `COLLECT` does not require sorted input. Instead, a `SORT` node will be added after it to sort its output. This `SORT` node may be optimized away again in later stages. If the sort order of the result is irrelevant to the user, adding an extra `SORT null` after a hash `COLLECT` operation will allow the optimizer to remove the sorts altogether.
In addition to the hash table variant of `COLLECT`, the optimizer will modify the original plan to use the regular `COLLECT` implementation. As this implementation requires sorted input, the optimizer will insert a `SORT` node in front of the `COLLECT`. This `SORT` node may be optimized away in later stages. The created plans will then be shipped through the regular optimization pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost, as usual. The hash table variant does not require an up-front sort of the input and will thus be preferred over the regular `COLLECT` if the optimizer estimates many input elements for the `COLLECT` node and cannot use an index to sort them.
The optimizer can be explicitly told to use the regular sorted variant of `COLLECT` by suffixing a `COLLECT` statement with `OPTIONS { "method" : "sorted" }`. This will override the optimizer's guesswork and only produce the sorted variant of `COLLECT`.
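The difference between the two grouping strategies can be sketched in plain JavaScript (a simplified model for illustration, not ArangoDB's actual executor):

```javascript
// Hash variant: group with a hash table (input order is irrelevant),
// then sort the groups afterwards (the SORT node placed after the COLLECT).
function collectHash(values) {
  var groups = new Map();
  values.forEach(function (v) {
    groups.set(v, (groups.get(v) || 0) + 1);
  });
  return Array.from(groups.keys()).sort().map(function (k) {
    return { value: k, count: groups.get(k) };
  });
}

// Sorted variant: requires sorted input (the SORT node placed in front of
// the COLLECT), then groups adjacent equal values in a single pass.
function collectSorted(values) {
  var sorted = values.slice().sort();
  var result = [];
  sorted.forEach(function (v) {
    var last = result[result.length - 1];
    if (last && last.value === v) {
      last.count++;
    } else {
      result.push({ value: v, count: 1 });
    }
  });
  return result;
}
```

Both produce the same groups; they differ only in where the sorting work happens, which is exactly what the optimizer's cost estimate decides between.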
- refactored the HTTP REST API for cursors (`/_api/cursor`) to improve its performance and use less memory. A post showing some of the performance improvements can be found here: http://jsteemann.github.io/blog/2015/04/01/improvements-for-the-cursor-api/
- simplified return value syntax for data-modification AQL queries. Since version 2.4, ArangoDB allows returning results from data-modification AQL queries. The syntax for this was quite limited and verbose:
```aql
FOR i IN 1..10
  INSERT { value: i } IN test
  LET inserted = NEW
  RETURN inserted
```
The `LET inserted = NEW RETURN inserted` was required literally to return the inserted documents; no calculations could be made using the inserted documents. This is now more flexible: after a data-modification clause (e.g. `INSERT`, `UPDATE`, `REPLACE`, `REMOVE`, `UPSERT`), any number of `LET` calculations can follow. These calculations can refer to the pseudo-values `OLD` and `NEW` that are created by the data-modification statements. This allows returning projections of inserted or updated documents, e.g.:
```aql
FOR i IN 1..10
  INSERT { value: i } IN test
  RETURN { _key: NEW._key, value: i }
```
Still, not every construct is allowed after a data-modification clause. For example, functions that may access documents cannot be called.
More information can be found here: http://jsteemann.github.io/blog/2015/03/27/improvements-for-data-modification-queries/
- added AQL `UPSERT` statement. This adds an `UPSERT` statement to AQL that is a combination of both `INSERT` and `UPDATE`/`REPLACE`. The `UPSERT` will search for a matching document using a user-provided example. If no document matches the example, the insert part of the `UPSERT` statement will be executed. If there is a match, the update/replace part will be carried out:

```aql
UPSERT { page: 'index.html' }                /* search example */
INSERT { page: 'index.html', pageViews: 1 }  /* insert part */
UPDATE { pageViews: OLD.pageViews + 1 }      /* update part */
IN pageViews
```

`UPSERT` can be used with an `UPDATE` or `REPLACE` clause. The `UPDATE` clause will perform a partial update of the found document, whereas the `REPLACE` clause will replace the found document entirely. The `UPDATE` or `REPLACE` parts can refer to the pseudo-value `OLD`, which contains all attributes of the found document.

`UPSERT` statements can optionally return values. In the following query, the return attribute `found` will contain the found document before the `UPDATE` was applied. If no document was found, `found` will contain a value of `null`. The `updated` result attribute will contain the inserted / updated document:

```aql
UPSERT { page: 'index.html' }                /* search example */
INSERT { page: 'index.html', pageViews: 1 }  /* insert part */
UPDATE { pageViews: OLD.pageViews + 1 }      /* update part */
IN pageViews
RETURN { found: OLD, updated: NEW }
```
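The control flow of `UPSERT` can be sketched in plain JavaScript (a simplified in-memory model for illustration only, not ArangoDB's implementation; `store` is assumed to be an array of documents):

```javascript
// example: the search example; insertDoc: the INSERT part;
// updateFn: the UPDATE part, receiving OLD (the found document).
function upsert(store, example, insertDoc, updateFn) {
  var match = store.find(function (doc) {
    return Object.keys(example).every(function (k) {
      return doc[k] === example[k];
    });
  });
  if (match === undefined) {
    store.push(insertDoc);                      // INSERT part
    return { found: null, updated: insertDoc }; // OLD is null on insert
  }
  var old = Object.assign({}, match);           // OLD: document before update
  Object.assign(match, updateFn(old));          // UPDATE part (partial update)
  return { found: old, updated: match };
}
```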
A more detailed description of `UPSERT` can be found here: http://jsteemann.github.io/blog/2015/03/27/preview-of-the-upsert-command/
- adjusted the default value of the `--server.backlog-size` startup option from 10 to 64.
- issue #1231: bug xor feature in AQL: LENGTH(null) == 4. This changes the behavior of the AQL
`LENGTH` function as follows:
  - if the single argument to `LENGTH()` is `null`, the result will now be `0`. In previous versions of ArangoDB, the result of `LENGTH(null)` was `4`.
  - if the single argument to `LENGTH()` is `true`, the result will now be `1`. In previous versions of ArangoDB, the result of `LENGTH(true)` was `4`.
  - if the single argument to `LENGTH()` is `false`, the result will now be `0`. In previous versions of ArangoDB, the result of `LENGTH(false)` was `5`.
The results of `LENGTH()` with string, numeric, array or object argument values do not change.
- issue #1298: bulk import if data already exists
This change extends the HTTP REST API for bulk imports as follows:
When documents are imported and the `_key` attribute is specified for them, the import can be used for inserting and updating/replacing documents. Previously, the import could be used for inserting new documents only, and re-inserting a document with an existing key would have failed with a unique key constraint violation error.
The above behavior is still the default. However, the API now allows controlling the behavior in case of a unique key constraint error via the optional URL parameter `onDuplicate`. This parameter can have one of the following values:
- `error`: when a unique key constraint error occurs, do not import or update the document but report an error. This is the default.
- `update`: when a unique key constraint error occurs, try to (partially) update the existing document with the data specified in the import. This may still fail if the document would violate secondary unique indexes. Only the attributes present in the import data will be updated; other attributes already present will be preserved. The number of updated documents will be reported in the `updated` attribute of the HTTP API result.
- `replace`: when a unique key constraint error occurs, try to fully replace the existing document with the data specified in the import. This may still fail if the document would violate secondary unique indexes. The number of replaced documents will be reported in the `updated` attribute of the HTTP API result.
- `ignore`: when a unique key constraint error occurs, ignore this error. There will be no insert, update or replace for the particular document. Ignored documents will be reported separately in the `ignored` attribute of the HTTP API result.
The result of the HTTP import API will now contain the attributes `ignored` and `updated`, which contain the number of ignored and updated documents respectively. These attributes will contain a value of zero unless the `onDuplicate` URL parameter is set to either `update` or `replace` (in which case the `updated` attribute may contain non-zero values) or `ignore` (in which case the `ignored` attribute may contain a non-zero value).
To support the feature, arangoimp also has a new command-line option `--on-duplicate`, which can have one of the values `error`, `update`, `replace`, or `ignore`. The default value is `error`.
A few examples for using arangoimp with the `--on-duplicate` option can be found here: http://jsteemann.github.io/blog/2015/04/14/updating-documents-with-arangoimp/
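The four `onDuplicate` modes can be sketched with a small client-side model (an illustration of the documented semantics, not server code; `store` is assumed to be a plain object keyed by `_key`):

```javascript
// Simulate importing docs into store under the given onDuplicate mode.
function importDocs(store, docs, onDuplicate) {
  var result = { created: 0, updated: 0, ignored: 0, errors: 0 };
  docs.forEach(function (doc) {
    var existing = store[doc._key];
    if (existing === undefined) {
      store[doc._key] = doc;
      result.created++;
    } else if (onDuplicate === "update") {
      // partial update: attributes not present in the import are preserved
      Object.keys(doc).forEach(function (k) { existing[k] = doc[k]; });
      result.updated++;
    } else if (onDuplicate === "replace") {
      store[doc._key] = doc;  // full replacement
      result.updated++;       // replacements are reported as "updated" too
    } else if (onDuplicate === "ignore") {
      result.ignored++;       // leave the existing document untouched
    } else {
      result.errors++;        // "error" (the default): report, don't touch
    }
  });
  return result;
}
```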
- changed behavior of `db._query()` in the ArangoShell: if the command's result is printed in the shell, the first 10 results will be printed. Previously, only a basic description of the underlying query result cursor was printed. Additionally, if the cursor result contains more than 10 results, the cursor is assigned to a global variable `more`, which can be used to iterate over the cursor result. Example:

```
arangosh [_system]> db._query("FOR i IN 1..15 RETURN i")
[object ArangoQueryCursor, count: 15, hasMore: true]
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
type 'more' to show more documents

arangosh [_system]> more
[object ArangoQueryCursor, count: 15, hasMore: false]
[ 11, 12, 13, 14, 15 ]
```
- Breaking Changes:
- AQL command `GRAPH_SHORTEST_PATH` now only returns IDs and does not extract data any more. It offers an additional option `includeData`, an object taking exactly two keys: `edges` (set to `true` to extract all data stored alongside the edges of each path) and `vertices` (set to `true` to extract all data stored alongside the vertices of each path). The default value of both parameters is `false`.
- JS module general-graph: all graph measurements that return exactly one value used to return an array containing exactly that one value. Now they return the value directly. Modified functions are:
  - `graph._absoluteEccentricity`
  - `graph._eccentricity`
  - `graph._absoluteCloseness`
  - `graph._closeness`
  - `graph._absoluteBetweenness`
  - `graph._betweenness`
  - `graph._radius`
  - `graph._diameter`
- ArangoDB instances started for the first time will create the '_graph' collection without waitForSync, so the default behaviour of create and delete operations on whole graphs changes:
  - POST /_api/graph used to return HTTP 201 Created by default and will now return 202 Accepted
  - DELETE /_api/graph/graph-name used to return HTTP 200 by default and will now return 202 Accepted, unless waitForSync is specified as a parameter or the '_graph' collection's waitForSync attribute was set
- improved GRAPH_SHORTEST_PATH computation. This involved a change in the default behaviour: the default setting will now only return the distance and the ids of nodes. We have added an optional boolean parameter `includeData`; if this is set to `true`, all documents and edges in the result will be fully expanded. We have also added an optional parameter `includePath` of type object. It has two optional sub-attributes, `vertices` and `edges`, both of type boolean. Both can be set individually, and the result will include all vertices on the path if `includePath.vertices == true` and all edges if `includePath.edges == true`, respectively. To get exactly the old result back, use:

```aql
GRAPH_SHORTEST_PATH(<graph>, <source>, <target>,
  { includeData: true, includePath: { edges: true, vertices: true } })
```

The default behaviour is now independent of the size of the documents, as the extraction part could be optimized. The internal algorithm to find all paths from one source to several targets has also been massively improved.
- added support for HTTP push aka chunked encoding
- issue #1051: add info whether server is running in service or user mode. This adds a "mode" attribute to the result of HTTP GET `/_api/version?details=true`. "mode" can have the following values:
  - `standalone`: the server was started manually (e.g. on the command line)
  - `service`: the server is running as a Windows service, in daemon mode, or under the supervisor
- increased the default value of `--server.request-timeout` from 300 to 1200 seconds for client tools (arangosh, arangoimp, arangodump, arangorestore)
- increased the default value of `--server.connect-timeout` from 3 to 5 seconds for client tools (arangosh, arangoimp, arangodump, arangorestore)
- added startup option `--server.foxx-queues-poll-interval`. This startup option controls the frequency with which the Foxx queues manager checks the queue (or queues) for jobs to be executed. The default value is 1 second. Lowering this value will result in the queue manager waking up and checking the queues more frequently, which may increase the server's CPU usage. When not using Foxx queues, this value can be raised to save some CPU time.
- added startup option `--server.foxx-queues-system-only`. This startup option controls whether the Foxx queue manager will check queue and job entries in the `_system` database only. Restricting the Foxx queue manager to the `_system` database means it only has to check the queues collection of a single database, whereas making it check the queues of all databases might result in more work to be done and more CPU time used by the queue manager. The default value is `true`, so the queue manager will only check the queues in the `_system` database.
- make Foxx queues really database-specific. Foxx queues were and are stored in a database-specific collection `_queues`. However, a global cache variable for the queues led to queue names being treated database-independently, which was wrong. Since 2.6, Foxx queue names are truly database-specific, so the same queue name can be used in two different databases for two different queues. Until then, it is advisable to think of queues as already being database-specific and to use the database name as a queue name prefix to avoid name conflicts, e.g.:

```js
var queueName = "myQueue";
var Foxx = require("org/arangodb/foxx");
Foxx.queues.create(db._name() + ":" + queueName);
```
- fixed issue #1247: debian init script problems
- multi-threaded index creation on collection load. When a collection contains more than one secondary index, the indexes can be built in memory in parallel when the collection is loaded. The number of threads used for parallel index creation is determined by the new configuration parameter `--database.index-threads`. If this is set to 0, indexes are built sequentially by the opening thread only, which is equivalent to the behavior in 2.5 and before.
- speed up building the primary index when loading collections
- added a `count` attribute to the `parameters.json` file. This attribute indicates the number of live documents in the collection at unload. It is read when the collection is (re)loaded to determine the initial size of the collection's primary index.
- removed remainders of the MRuby integration; removed arangoirb
- simplified the `controllers` property in Foxx manifests. You can now specify a filename directly if you only want to use a single file mounted at the base URL of your Foxx app.
- simplified the `exports` property in Foxx manifests. You can now specify a filename directly if you only want to export variables from a single file in your Foxx app.
- added support for Node.js-style exports in Foxx exports. Your Foxx exports file can now export arbitrary values using the `module.exports` property instead of adding properties to the `exports` object.
- added a `scripts` property to Foxx manifests. You should now specify the `setup` and `teardown` files as properties of the `scripts` object in your manifests.
- updated the `joi` package to 6.0.8.
- added the `extendible` package.
- added Foxx model lifecycle events to repositories. See #1257.
4 Comments
Hi,
I am wondering if the roadmap is up to date. I don't see all the points listed here https://www.arangodb.com/roadmap/ in this changelog. Any B-tree indexes, lock-free cursors?
Great job by the way!
We have added sparse cursors and moved some of the cursor stuff into the right direction. I’ve updated the roadmap.
Thank you, I thought I had missed something.
Do you know in which version the lock-free cursors will be available? I won’t take it badly if you prefer not to answer. I know how features can be postponed in a big project 😉
You might also want to update the documentation. This page at least has old roadmap references: https://docs.arangodb.com/Sharding/StatusOfImplementation.html (“Our software architecture is fully prepared for replication, automatic fail-over and recovery of a cluster, which will be implemented for version 2.3 (see our road map).”)
The plan is to finish that for 2.7. We have started it but did not finish in time for 2.6. It is a major project and, as you wrote, sometimes more urgent features need to be implemented.