
GA of ArangoDB 2.7 – Big + for Indexes, Throughput, AQL and Foxx

Long awaited, and now it is finished: ArangoDB 2.7, our new major release, is ready for download. First of all, a big thanks to our community for your great support! We've implemented a lot of your ideas. After your feedback on RC1 and RC2, we are happy to bring this release to the world. With ArangoDB 2.7 we increased performance even further and improved query handling a lot.

What big improvements are in store for you?

Index buckets

  • The primary indexes and hash indexes of collections can now be split into multiple index buckets.

Throughput Enhancements

  • A lot is not enough. Throughput is another key requirement for a premium database. Again we pushed our throughput a big step forward with 2.7.

AQL Improvements – Ease of Use and Performance

  • Our goal was to further shorten and ease the writing of statements. AQL has always been an efficient and intuitive query language similar to SQL but with ArangoDB 2.7 AQL got even better.

Find a detailed overview in our blogpost about RC1.

Furthermore, we fixed some issues and enabled Foxx apps to be installed underneath the URL path /_open/, so they can be (intentionally) accessed without authentication. The extensibility for your data-centric microservices got even bigger.

Please find all details below or on GitHub.

Changes from 2.7 RC2 to 2.7

  • fixed request statistics aggregation

    When arangod was started in supervisor mode, the request statistics always showed 0 requests, as the statistics aggregation thread did not run then.

  • read server configuration files before dropping privileges. This ensures that the SSL keyfile specified in the configuration can be read with the server's start privileges (i.e. root when using a standard ArangoDB package).
  • fixed replication with a 2.6 replication configuration and issues with a 2.6 master
  • raised default value of --server.descriptors-minimum to 1024
  • allow Foxx apps to be installed underneath URL path /_open/, so they can be (intentionally) accessed without authentication.
  • added allowImplicit sub-attribute in collections declaration of transactions. The allowImplicit attribute allows making transactions fail should they read-access a collection that was not explicitly declared in the collections array of the transaction.
  • added "special" password ARANGODB_DEFAULT_ROOT_PASSWORD. If you pass ARANGODB_DEFAULT_ROOT_PASSWORD as the password, the actual password will be read from the environment variable ARANGODB_DEFAULT_ROOT_PASSWORD.
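
As an illustration of the allowImplicit attribute, a transaction description might look like the following minimal sketch. The collection name users and the body of the action are assumptions made for the example; in arangosh the object would be passed to db._executeTransaction().

```javascript
// Sketch of a transaction description for arangosh (ArangoDB 2.7).
// "users" is a hypothetical collection name used only for this example.
const transaction = {
  collections: {
    read: ["users"],       // collections explicitly declared for reading
    allowImplicit: false   // fail if the action reads any undeclared collection
  },
  action: function () {
    // Runs inside the server; only "users" may be read here.
    const db = require("org/arangodb").db;
    return db.users.count();
  }
};

// In arangosh: db._executeTransaction(transaction);
console.log(JSON.stringify(transaction.collections));
```

With allowImplicit set to false, an action that also read from a second, undeclared collection would be expected to fail instead of silently acquiring access to it.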

Changes in 2.7 RC2 and RC1

RC2

  • fix over-eager datafile compaction

    This should reduce the need to compact directly after loading a collection when a collection datafile contained many insertions and updates for the same documents. It should also prevent re-compacting of already merged datafiles in case not many changes were made. Compaction will also make fewer index lookups than before.

  • added syncCollection() function in module org/arangodb/replication

    This allows synchronizing the data of a single collection from a master to a slave server. Synchronization can either restore the whole collection by transferring all documents from the master to the slave, or work incrementally by only transferring documents that differ. The incremental mode partitions the collection's entire key space into smaller chunks and compares the data chunk-wise between master and slave. Only chunks that are different will be re-transferred.

    The syncCollection() function can be used as follows:

    require("org/arangodb/replication").syncCollection(collectionName, options);

    e.g.

    require("org/arangodb/replication").syncCollection("myCollection", {
      endpoint: "tcp://127.0.0.1:8529",  /* master */
      username: "root",                  /* username for master */
      password: "secret",                /* password for master */
      incremental: true                  /* use incremental mode */
    });

  • additionally allow the following characters in document keys: ( ) + , = ; $ ! * ' %
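
The chunk-wise comparison that syncCollection() performs in incremental mode can be pictured with the following self-contained sketch. It is not ArangoDB's implementation: the helper names, the use of SHA-1 as the chunk checksum, and the in-memory key/revision maps are assumptions made purely for illustration.

```javascript
const nodeCrypto = require("crypto");

// Checksum over all key/revision pairs whose key lies in [low, high].
function rangeChecksum(docs, low, high) {
  const parts = Object.keys(docs)
    .filter((key) => key >= low && key <= high)
    .sort()
    .map((key) => key + ":" + docs[key]);
  return nodeCrypto.createHash("sha1").update(parts.join("|")).digest("hex");
}

// Partition the master's sorted key space into chunks and report the
// key ranges whose checksums differ between master and slave.
function chunksToTransfer(master, slave, chunkSize) {
  const keys = Object.keys(master).sort();
  const differing = [];
  for (let i = 0; i < keys.length; i += chunkSize) {
    const low = keys[i];
    const high = keys[Math.min(i + chunkSize, keys.length) - 1];
    if (rangeChecksum(master, low, high) !== rangeChecksum(slave, low, high)) {
      differing.push({ low, high });
    }
  }
  return differing;
}

// Hypothetical data: document key -> revision id.
const master = { a: "r1", b: "r2", c: "r3", d: "r4" };
const slave = { a: "r1", b: "r2", c: "r3-old", d: "r4" };

const stale = chunksToTransfer(master, slave, 2);
console.log(stale); // only the range c..d needs to be re-transferred
```

The sketch ignores documents that exist only on the slave outside the master's key ranges; it is meant to show why unchanged chunks cost a checksum comparison rather than a full transfer.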

RC1

  • upgraded Swagger to version 2.0 for the Documentation

    This gives the user better prepared test request structures. More conversions will follow so that client libraries can eventually be auto-generated.

  • added extra AQL functions for date and time calculation and manipulation. These functions were contributed by GitHub users @CoDEmanX and @friday. A big thanks for their work!

    The following extra date functions are available from 2.7 on:
    • DATE_DAYOFYEAR(date): Returns the day of year number of date. The return values range from 1 to 365, or 366 in a leap year respectively.
    • DATE_ISOWEEK(date): Returns the ISO week date of date. The return values range from 1 to 53. Monday is considered the first day of the week. There are no fractional weeks, thus the last days in December may belong to the first week of the next year, and the first days in January may be part of the previous year’s last week.
    • DATE_LEAPYEAR(date): Returns whether the year of date is a leap year.
    • DATE_QUARTER(date): Returns the quarter of the given date (1-based):
      • 1: January, February, March
      • 2: April, May, June
      • 3: July, August, September
      • 4: October, November, December
    • DATE_DAYS_IN_MONTH(date): Returns the number of days in date's month (28..31).
    • DATE_ADD(date, amount, unit): Adds amount given in unit to date and returns the calculated date.

    unit can be either of the following to specify the time unit to add or subtract (case-insensitive):

    • y, year, years
    • m, month, months
    • w, week, weeks
    • d, day, days
    • h, hour, hours
    • i, minute, minutes
    • s, second, seconds
    • f, millisecond, milliseconds

    amount is the number of units to add (positive value) or subtract (negative value).

    • DATE_SUBTRACT(date, amount, unit): Subtracts amount given in unit from date and returns the calculated date.

    It works the same as DATE_ADD(), except that it subtracts. It is equivalent to calling DATE_ADD() with a negative amount, except that DATE_SUBTRACT() can also subtract ISO durations. Note that negative ISO durations are not supported (i.e. starting with -P, like -P1Y).

    • DATE_DIFF(date1, date2, unit, asFloat): Calculate the difference between two dates in given time unit, optionally with decimal places. Returns a negative value if date1 is greater than date2.
    • DATE_COMPARE(date1, date2, unitRangeStart, unitRangeEnd): Compare two partial dates and return true if they match, false otherwise. The parts to compare are defined by a range of time units.

    The full range is: years, months, days, hours, minutes, seconds, milliseconds. Pass the unit to start from as unitRangeStart, and the unit to end with as unitRangeEnd. All units in between will be compared. Leave out unitRangeEnd to only compare unitRangeStart.

    • DATE_FORMAT(date, format): Format a date according to the given format string. It supports the following placeholders (case-insensitive):
    • %t: timestamp, in milliseconds since midnight 1970-01-01
    • %z: ISO date (0000-00-00T00:00:00.000Z)
    • %w: day of week (0..6)
    • %y: year (0..9999)
    • %yy: year (00..99), abbreviated (last two digits)
    • %yyyy: year (0000..9999), padded to length of 4
    • %yyyyyy: year (-009999 .. +009999), with sign prefix and padded to length of 6
    • %m: month (1..12)
    • %mm: month (01..12), padded to length of 2
    • %d: day (1..31)
    • %dd: day (01..31), padded to length of 2
    • %h: hour (0..23)
    • %hh: hour (00..23), padded to length of 2
    • %i: minute (0..59)
    • %ii: minute (00..59), padded to length of 2
    • %s: second (0..59)
    • %ss: second (00..59), padded to length of 2
    • %f: millisecond (0..999)
    • %fff: millisecond (000..999), padded to length of 3
    • %x: day of year (1..366)
    • %xxx: day of year (001..366), padded to length of 3
    • %k: ISO week date (1..53)
    • %kk: ISO week date (01..53), padded to length of 2
    • %l: leap year (0 or 1)
    • %q: quarter (1..4)
    • %a: days in month (28..31)
    • %mmm: abbreviated English name of month (Jan..Dec)
    • %mmmm: English name of month (January..December)
    • %www: abbreviated English name of weekday (Sun..Sat)
    • %wwww: English name of weekday (Sunday..Saturday)
    • %&: special escape sequence for rare occasions
    • %%: literal %
    • %: ignored
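
To make the semantics of the simpler calendar functions above concrete, here is a plain JavaScript sketch that mirrors DATE_QUARTER(), DATE_LEAPYEAR(), DATE_DAYOFYEAR() and DATE_DAYS_IN_MONTH() as described. These are stand-ins written for illustration, not ArangoDB's implementation; the JavaScript function names are hypothetical, and dates are interpreted in UTC.

```javascript
// Quarter of the year, 1-based (mirrors DATE_QUARTER).
function dateQuarter(date) {
  return Math.floor(new Date(date).getUTCMonth() / 3) + 1;
}

// Gregorian leap year rule (mirrors DATE_LEAPYEAR).
function dateLeapYear(date) {
  const year = new Date(date).getUTCFullYear();
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Day of year, 1..365 or 1..366 in a leap year (mirrors DATE_DAYOFYEAR).
function dateDayOfYear(date) {
  const d = new Date(date);
  const startOfDay = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  const startOfYear = Date.UTC(d.getUTCFullYear(), 0, 1);
  return (startOfDay - startOfYear) / 86400000 + 1;
}

// Number of days in the month of the date, 28..31 (mirrors DATE_DAYS_IN_MONTH).
function dateDaysInMonth(date) {
  const d = new Date(date);
  // Day 0 of the following month is the last day of the current month.
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + 1, 0)).getUTCDate();
}

console.log(dateQuarter("2015-10-09"));      // 4
console.log(dateLeapYear("2016-01-01"));     // true
console.log(dateDayOfYear("2015-12-31"));    // 365
console.log(dateDaysInMonth("2016-02-10"));  // 29
```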
  • new WAL logfiles and datafiles are now created non-sparse

    This prevents SIGBUS signals being raised when memory of a sparse datafile is accessed while the disk is full and the accessed file part is not actually disk-backed. In this case the mapped memory region is not necessarily backed by physical memory, and accessing the memory may raise SIGBUS and crash arangod.

  • the internal.download() function and the module org/arangodb/request used an internal library function that handled the sending of HTTP requests from inside of ArangoDB. This library unconditionally set an HTTP header Accept-Encoding: gzip in all outgoing HTTP requests.

    This has been fixed in 2.7, so Accept-Encoding: gzip is not set automatically anymore. Additionally, the header User-Agent: ArangoDB is not set automatically either. If client applications want to send these headers, they are free to add them when constructing the requests using the download function or the request module.
  • fixed issue #1436: org/arangodb/request advertises deflate without supporting it
  • added template string generator function aqlQuery for generating AQL queries

    This can be used to generate safe AQL queries with JavaScript parameter variables or expressions easily:

    var name = 'test';
    var attributeName = '_key';
    var query = aqlQuery`FOR u IN users FILTER u.name == ${name} RETURN u.${attributeName}`;
    db._query(query);
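
To show why a template string function helps with safe query building, the following sketch implements a simplified tag function that replaces each interpolated value with a bind parameter. It is not the aqlQuery implementation: the name aqlSketch, the bind parameter naming, and the shape of the returned object are assumptions for illustration.

```javascript
// Hypothetical, simplified tag function: every interpolated value becomes
// a bind parameter instead of being concatenated into the query string.
function aqlSketch(strings, ...values) {
  const bindVars = {};
  let query = strings[0];
  values.forEach((value, i) => {
    const paramName = "value" + i;
    bindVars[paramName] = value;
    query += "@" + paramName + strings[i + 1];
  });
  return { query, bindVars };
}

const userName = "test";
const built = aqlSketch`FOR u IN users FILTER u.name == ${userName} RETURN u`;

console.log(built.query);    // FOR u IN users FILTER u.name == @value0 RETURN u
console.log(built.bindVars); // { value0: 'test' }
```

Because the value travels separately from the query text, a name containing quotes or AQL keywords cannot change the structure of the query.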

  • report memory usage for document header data (revision id, pointer to data etc.) in db.collection.figures(). The memory used for document headers will now show up in the already existing attribute indexes.size. Due to that, the index sizes reported by figures() in 2.7 will be higher than those reported by 2.6, but the 2.7 values are more accurate.
  • IMPORTANT CHANGE: the filenames in dumps created by arangodump now contain not only the name of the dumped collection, but also an additional 32-digit hash value. This is done to prevent overwriting dump files in case-insensitive file systems when there exist multiple collections with the same name (but with different cases).

    For example, if a database has two collections, test and Test, previous versions of ArangoDB created the files
    • test.structure.json and test.data.json for collection test
    • Test.structure.json and Test.data.json for collection Test

    This did not work for case-insensitive filesystems, because the files for the second collection would have overwritten the files of the first. arangodump in 2.7 will create the following filenames instead:

    • test_098f6bcd4621d373cade4e832627b4f6.structure.json and test_098f6bcd4621d373cade4e832627b4f6.data.json
    • Test_0cbc6611f5540bd0809a388dc95a615b.structure.json and Test_0cbc6611f5540bd0809a388dc95a615b.data.json

    These filenames will be unambiguous even in case-insensitive filesystems.
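
The 32-digit value shown for the collection test above equals the hexadecimal MD5 digest of the collection name. Assuming that this is how the suffix is derived (the text itself only calls it a hash value), the new filenames can be reproduced with the sketch below; the function name dumpFileNames is made up for the example.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical helper: build the 2.7-style dump filenames for a collection,
// assuming the suffix is the hex MD5 digest of the collection name.
function dumpFileNames(collectionName) {
  const hash = nodeCrypto.createHash("md5").update(collectionName).digest("hex");
  const base = collectionName + "_" + hash;
  return { structure: base + ".structure.json", data: base + ".data.json" };
}

console.log(dumpFileNames("test").structure);
// test_098f6bcd4621d373cade4e832627b4f6.structure.json
console.log(dumpFileNames("Test").data);
```

Since the digest is computed from the case-sensitive name, test and Test end up with different suffixes, which is what keeps the files apart on a case-insensitive filesystem.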

  • IMPORTANT CHANGE: make arangod actually close lingering client connections when idle for at least the duration specified via --server.keep-alive-timeout. In previous versions of ArangoDB, connections were not closed by the server when the timeout was reached and the client was still connected. Now the connection is properly closed by the server in case of timeout. Client applications relying on the old behavior may now need to reconnect to the server when their idle connections time out and get closed (note: connections being idle for a long time may be closed by the OS or firewalls anyway – client applications should be aware of that and try to reconnect).
  • IMPORTANT CHANGE: when starting arangod, the server will drop the process privileges to the specified values in options --server.uid and --server.gid instantly after parsing the startup options.

    That means when either --server.uid or --server.gid is set, the privilege change will happen earlier. This may prevent binding the server to an endpoint with a port number lower than 1024 if the arangodb user has no privileges for that. Previous versions of ArangoDB changed the privileges later, so some startup actions were still carried out under the invoking user (i.e. likely root when started via init.d or system scripts), and in particular binding to low port numbers was still possible there.

    The default privileges for user arangodb will not be sufficient for binding to port numbers lower than 1024. To have ArangoDB 2.7 bind to a port number lower than 1024, it needs to be started with a different, privileged user, or the privileges of the arangodb user have to be raised manually beforehand.

  • added AQL optimizer rule patch-update-statements
  • Linux startup scripts and systemd configuration for arangod now try to adjust the NOFILE (number of open files) limits for the process. The limit value is set to 131072 (128k) when ArangoDB is started via start/stop commands
  • When ArangoDB is started/stopped manually via the start/stop commands, the main process will wait for up to 10 seconds after it forks the supervisor and arangod child processes. If the startup fails within that period, the start/stop script will fail with an exit code other than zero. If the startup of the supervisor or arangod is still ongoing after 10 seconds, the main program will still return with exit code 0. The limit of 10 seconds is arbitrary because the time required for a startup is not known in advance.
  • added startup option --database.throw-collection-not-loaded-error

    Accessing a not-yet loaded collection will automatically load the collection on first access. This flag controls what happens in case an operation would need to wait for another thread to finalize loading a collection. If set to true, then the first operation that accesses an unloaded collection will load it. Further threads that try to access the same collection while it is still loading immediately fail with an error (1238, collection not loaded). This is to prevent all server threads from being blocked while waiting on the same collection to finish loading. When the first thread has completed loading the collection, the collection becomes regularly available, all operations from that point on can be carried out normally, and error 1238 will not be thrown anymore for that collection.

    If set to false, the first thread that accesses a not-yet loaded collection will still load it. Other threads that try to access the collection while loading will not fail with error 1238 but instead block until the collection is fully loaded. This configuration might lead to all server threads being blocked because they are all waiting for the same collection to complete loading. Setting the option to true will prevent this from happening, but requires clients to catch error 1238 and react to it (maybe by scheduling a retry for later).

    The default value is false.
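
Clients that run with the option set to true need to handle error 1238 themselves. The following sketch shows one possible retry wrapper. The function name, the attempt limit and the pause callback are assumptions, and the sketch relies on the thrown error carrying the error number in an errorNum attribute.

```javascript
const ERROR_COLLECTION_NOT_LOADED = 1238;

// Run `operation`; retry only when it fails with error 1238.
function retryWhileLoading(operation, maxAttempts, pause) {
  for (let attempt = 1; ; attempt++) {
    try {
      return operation();
    } catch (err) {
      const retryable = Boolean(err) && err.errorNum === ERROR_COLLECTION_NOT_LOADED;
      if (!retryable || attempt >= maxAttempts) {
        throw err; // other errors, or too many attempts: give up
      }
      pause(attempt); // caller decides how long to wait before the next try
    }
  }
}

// Demo with a fake operation that fails twice before succeeding.
let calls = 0;
function fakeCount() {
  calls++;
  if (calls < 3) {
    const err = new Error("collection not loaded");
    err.errorNum = ERROR_COLLECTION_NOT_LOADED;
    throw err;
  }
  return 42;
}

const pauses = [];
const count = retryWhileLoading(fakeCount, 5, (attempt) => pauses.push(attempt));
console.log(count, calls, pauses); // 42 3 [ 1, 2 ]
```

Errors other than 1238 are re-thrown immediately, so the wrapper does not hide unrelated failures.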

  • added better control-C support in arangosh

    When CTRL-C is pressed in arangosh, it will now print a ^C first. Pressing CTRL-C again will reset the prompt if something was entered before, or quit arangosh if no command was entered directly before.

    This affects only the arangosh version built with Readline support (Linux and MacOS).

    The MacOS version of ArangoDB for Homebrew now depends on Readline, too. The Homebrew formula has been changed accordingly. When self-compiling ArangoDB on MacOS without Homebrew, Readline now is a prerequisite.

  • increased default value for collection-specific indexBuckets value from 1 to 8

    Collections created from 2.7 on will use the new default value of 8 if not overridden on collection creation or later using collection.properties({ indexBuckets: ... }).

    The indexBuckets value determines the number of buckets to use for indexes of type primary, hash and edge. Having multiple index buckets allows splitting an index into smaller components, which can be filled in parallel when a collection is loading. Additionally, resizing and reallocation of indexes are faster and less intrusive if the index uses multiple buckets, because resize and reallocation will affect only data in a single bucket instead of all index values.

    The index buckets will be filled in parallel when loading a collection if the collection has an indexBuckets value greater than 1 and the collection contains a significant amount of documents/edges (the current threshold is 256K documents but this value may change in future versions of ArangoDB).
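
In arangosh, the number of buckets might be set as in the sketch below. The db object is passed in as a parameter so the snippet does not depend on a particular environment. The collection name and the bucket counts are arbitrary example values, and passing indexBuckets to db._create() is assumed to be the way of overriding the default on collection creation.

```javascript
// Create a collection with a non-default bucket count, then change it later.
// `db` is the arangosh database object; "orders" is a hypothetical name.
function configureIndexBuckets(db) {
  const collection = db._create("orders", { indexBuckets: 16 });
  // Change the value afterwards via collection.properties().
  collection.properties({ indexBuckets: 32 });
  return collection.properties().indexBuckets;
}

// In arangosh: configureIndexBuckets(db);
```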

  • changed HTTP client to use poll instead of select on Linux and MacOS

    This affects the ArangoShell and user-defined JavaScript code running inside arangod that initiates its own HTTP calls.

    Using poll instead of select allows using arbitrarily high file descriptors (bigger than the compiled-in FD_SETSIZE). Server connections are still handled using epoll, which has never been affected by FD_SETSIZE.

  • implemented AQL LIKE function using ICU regexes
  • added RETURN DISTINCT for AQL queries to return unique results:

    FOR doc IN collection RETURN DISTINCT doc.status

    This change also introduces DISTINCT as an AQL keyword.

  • removed createNamedQueue() and addJob() functions from org/arangodb/tasks
  • use fewer locks and more atomic variables in the internal dispatcher and V8 context handling implementations. This leads to improved throughput in some ArangoDB internals and allows for higher HTTP request throughput for many operations.

    A short overview of the improvements can be found here:

    http://www.arangodb.com/2015/08/throughput-enhancements/

  • added shorthand notation for attribute names in AQL object literals:

    LET name = "Peter" LET age = 42 RETURN { name, age }

    The above is the shorthand equivalent of the generic form

    LET name = "Peter" LET age = 42 RETURN { name : name, age : age }
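
JavaScript users will recognize this notation: ES2015 object literals support the same kind of shorthand, as this small comparison shows. It only illustrates the JavaScript counterpart and does not execute any AQL.

```javascript
const name = "Peter";
const age = 42;

const shorthand = { name, age };           // shorthand property names
const generic = { name: name, age: age };  // equivalent generic form

console.log(JSON.stringify(shorthand) === JSON.stringify(generic)); // true
```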

  • removed configure option --enable-timings

    This option did not have any effect.

  • removed configure option --enable-figures

    This option previously controlled whether HTTP request statistics code was compiled into ArangoDB or not. The previous default value was true, so statistics code was available in official packages. Setting the option to false led to compile errors, so it is doubtful the default value was ever changed. By removing the option some internal statistics code was also simplified.

  • removed run-time manipulation methods for server endpoints:
    • db._removeEndpoint()
    • db._configureEndpoint()
    • HTTP POST /_api/endpoint
    • HTTP DELETE /_api/endpoint
  • AQL query result cache

    The query result cache can optionally cache the complete results of all or selected AQL queries. It can be operated in the following modes:
    • off: the cache is disabled. No query results will be stored
    • on: the cache will store the results of all AQL queries unless their cache attribute flag is set to false
    • demand: the cache will store the results of AQL queries that have their cache attribute set to true, but will ignore all others

    The mode can be set at server startup using the --database.query-cache-mode configuration option and later changed at runtime.

    The following HTTP REST APIs have been added for controlling the query cache:

    • HTTP GET /_api/query-cache/properties: returns the global query cache configuration
    • HTTP PUT /_api/query-cache/properties: modifies the global query cache configuration
    • HTTP DELETE /_api/query-cache: invalidates all results in the query cache

    The following JavaScript functions have been added for controlling the query cache:

    • require("org/arangodb/aql/cache").properties(): returns the global query cache configuration
    • require("org/arangodb/aql/cache").properties(properties): modifies the global query cache configuration
    • require("org/arangodb/aql/cache").clear(): invalidates all results in the query cache
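
The three cache modes listed above boil down to a small decision rule. The function below is only a model of that rule as described in this post, not server code. Its name and parameters are made up for the example, and any further conditions the server may apply before caching a result are not modeled.

```javascript
// mode: "off" | "on" | "demand"; cacheFlag: the query's cache attribute
// (true, false, or undefined when the query does not set it).
function wouldStoreResult(mode, cacheFlag) {
  switch (mode) {
    case "on":
      return cacheFlag !== false; // everything unless explicitly opted out
    case "demand":
      return cacheFlag === true;  // only queries that opt in
    default:
      return false;               // "off" (or unknown): nothing is stored
  }
}

console.log(wouldStoreResult("on", undefined));     // true
console.log(wouldStoreResult("on", false));         // false
console.log(wouldStoreResult("demand", undefined)); // false
console.log(wouldStoreResult("demand", true));      // true
console.log(wouldStoreResult("off", true));         // false
```

In arangosh the mode itself would be changed through the properties() function listed above, for example with an object such as { mode: "demand" }; the attribute name mode is an assumption here and should be checked against the configuration that properties() returns.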
  • do not link arangoimp against V8
  • AQL function call arguments optimization

    This will lead to arguments in function calls inside AQL queries not being copied but passed by reference. This may speed up calls to functions with bigger argument values or queries that call functions many times.

  • upgraded V8 version to 4.3.61
  • removed deprecated AQL SKIPLIST function.

    This function was introduced in older versions of ArangoDB with a less powerful query optimizer to retrieve data from a skiplist index using a LIMIT clause. It was marked as deprecated in ArangoDB 2.6.

    Since ArangoDB 2.3 the behavior of the SKIPLIST function can be emulated using regular AQL constructs, e.g.

    FOR doc IN @@collection FILTER doc.value >= @value SORT doc.value DESC LIMIT 1 RETURN doc

  • the skip() function for simple queries does not accept negative input any longer. This feature was deprecated in 2.6.0.
  • fix exception handling

    In some cases JavaScript exceptions would re-throw without information of the original problem. Now the original exception is logged for failure analysis.

  • based REST API method PUT /_api/simple/all on the cursor API and make it use AQL internally.

    The change speeds up this REST API method and will lead to additional query information being returned by the REST API. Clients can use this extra information or ignore it.

  • Foxx Queue job success/failure handlers arguments have changed from (jobId, jobData, result, jobFailures) to (result, jobData, job).
  • added Foxx Queue job options repeatTimes, repeatUntil and repeatDelay to automatically re-schedule jobs when they are completed.
  • added Foxx manifest configuration type password to mask values in the web interface.
  • fixed default values in Foxx manifest configurations sometimes not being used as defaults.
  • fixed optional parameters in Foxx manifest configurations sometimes not being cleared correctly.
  • Foxx dependencies can now be marked as optional using a slightly more verbose syntax in your manifest file.
  • converted Foxx constructors to ES6 classes so you can extend them using class syntax.
  • updated aqb to 2.0.
  • updated chai to 3.0.
  • Use more madvise calls to speed up things when memory is tight, in particular at load time but also for random accesses later.
  • Overhauled web interface

    The web interface now has a new design.

    The API documentation for ArangoDB has been moved from "Tools" to "Links" in the web interface.

    The "Applications" tab in the web interface has been renamed to "Services".

Julie Ferrario
