Efficient Lock-Free Data Structure Protection | ArangoDB Blog

Motivation

In multi-threaded applications running on multi-core systems, it is common to have data structures that are read frequently but changed rarely. For example, a database server keeps a list of databases that changes rarely but must be consulted for every single query hitting the database. In such situations, one needs to guarantee fast read access as well as protection against inconsistencies, use-after-free errors, and memory leaks.

We therefore seek a lock-free protection mechanism that scales to many threads on modern machines and uses only the C++11 standard library. The mechanism should be easy to use, easy to understand, and easy to prove correct. This article presents a solution that is probably not new, but which we have not found described anywhere else.

The concrete challenge at hand

Assume a global data structure on the heap and a single atomic pointer P to it. If (fast) readers access it without any protection, a (slow) writer can build a completely new copy of the data structure and then switch P to the new copy with a single atomic operation. Since writing is not time-critical, a mutex can easily ensure that there is only one writer at any given time. The only problem is deciding when it is safe to destroy the old value, because the writer cannot easily know whether some reader is still accessing it. The challenge is aggravated by the fact that, without thread synchronization, it is unclear when a reader actually sees the new pointer value, in particular on a multi-core machine with a complex cache hierarchy.
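To make the setup concrete, here is a minimal C++11 sketch of the read-mostly pattern described above. It is not the solution this article develops: it swaps in a well-known alternative, reference counting through the std::atomic_load / std::atomic_store overloads for std::shared_ptr, so that whoever drops the last reference frees the old version. These overloads are generally not lock-free (common implementations use an internal lock), and every read updates a shared reference count, which can limit scalability with many reader threads. The names DatabaseList, readDatabases, and addDatabase are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload: the rarely changed list of database names.
using DatabaseList = std::vector<std::string>;

// The global pointer "P". It is only ever read or replaced through the
// C++11 std::atomic_load / std::atomic_store overloads for shared_ptr.
std::shared_ptr<const DatabaseList> P = std::make_shared<DatabaseList>();

// Serializes writers; readers never touch this mutex.
std::mutex writerMutex;

// Reader: take a snapshot. Holding the returned shared_ptr keeps that
// version alive, even if a writer publishes a newer one meanwhile.
std::shared_ptr<const DatabaseList> readDatabases() {
  return std::atomic_load(&P);
}

// Writer: copy the current version, modify the copy, then publish it.
// The old version is destroyed when its last snapshot is released.
void addDatabase(const std::string& name) {
  std::lock_guard<std::mutex> guard(writerMutex);
  std::shared_ptr<const DatabaseList> current = std::atomic_load(&P);
  auto next = std::make_shared<DatabaseList>(*current);
  next->push_back(name);
  std::atomic_store(&P, std::shared_ptr<const DatabaseList>(std::move(next)));
}
```

A reader would call readDatabases() once per query and use the returned snapshot for the whole query. Note that these free functions are deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>.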

If you want to see our solution directly, scroll down to “Source code links”. We first present a classic approach that works well and then try to improve on it.

Throughput Enhancements: Boosting ArangoDB Performance

We’ve recently been working on improving ArangoDB’s throughput, especially when using ArangoDB’s HTTP interface.

In this post, I will show some of the improvements already achieved, though the work is not yet finished. Therefore, the results shown here are still somewhat preliminary.

We wanted to measure improvements for ArangoDB’s HTTP interface, and so we used wrk as an external HTTP load generator.

During the tests, wrk requested some specific URLs from a local ArangoDB instance on an otherwise idle machine. The tests were run with ArangoDB 2.6 and with the devel branch, both started with their default configuration.

wrk was invoked from bash with varying numbers of client connections and threads, so the tests cover both serial and concurrent/parallel requests:

wrk -c $CONNECTIONS -t $THREADS -d 10 $URL

The number of connections ($CONNECTIONS) and the number of threads ($THREADS) were each varied from 1 to 8, and each run lasted 10 seconds (-d 10). wrk requires at least as many connections as threads.
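Assuming every valid combination of the two parameters was run (the text does not say so explicitly), the test matrix can be enumerated as in the small C++ sketch below. wrkCommands is a hypothetical helper, and url stands for whichever URL was being tested.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one wrk command line per (connections, threads) pair, with both
// values ranging from 1 to maxValue. Pairs with fewer connections than
// threads are skipped, because wrk requires connections >= threads.
std::vector<std::string> wrkCommands(int maxValue, const std::string& url) {
  std::vector<std::string> commands;
  for (int threads = 1; threads <= maxValue; ++threads) {
    for (int connections = threads; connections <= maxValue; ++connections) {
      commands.push_back("wrk -c " + std::to_string(connections) +
                         " -t " + std::to_string(threads) +
                         " -d 10 " + url);
    }
  }
  return commands;
}
```

With maxValue set to 8, this yields 8 + 7 + ... + 1 = 36 valid runs per URL.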

ArangoDB 2.6.4: Maintenance Release Overview

ArangoDB version 2.6.4 comes with an upgraded V8 engine (4.1.0.27) and is available for download now. We have also published a 2.5.7 maintenance release in the 2.5 branch.

arangodb.com/download

Running V8 Isolates in Multi-Threaded ArangoDB

ArangoDB allows running user-defined JavaScript code in the database. This can be used for more complex, stored-procedure-like database operations. Additionally, ArangoDB’s Foxx framework can be used to make any database functionality available via an HTTP REST API. It’s easy to build data-centric microservices with it, using the scripting functionality for tasks like access control, data validation, sanitization, etc.

We are often asked how the scripting functionality is implemented under the hood. Additionally, several people have asked how ArangoDB’s JavaScript functionality relates to Node.js.

This post tries to explain that in detail.

Arango Weekly 31: Official Docker Repo & New Release 2.6.3

ArangoDB is now an official repository on Docker Hub, one of just four additions in the last two months. Please try it and tell your friends! ArangoDB 2.6 is known as a performance release, and we have continued to improve the core by removing locks and optimizing code. It looks like we will be able to show some impressive performance boosts soon. Furthermore, last week Mike Williamson wrote a blog post on modeling data with ArangoDB that is worth reading.

Follow ArangoDB on LinkedIn and add ArangoDB as a skill. We would appreciate your help. Keep an eye on our blog or follow us on Twitter for news about ArangoDB.

AQL Object Literal Simplification: ArangoDB Query Optimization

ArangoDB’s devel branch recently saw a change that makes writing some AQL queries a bit simpler.

The change introduces an optional shorthand notation for object attributes in the style of ES6’s enhanced object literal notation.

For example, consider the following query, which groups documents by their age attribute and counts the number of documents per distinct age value:

FOR doc IN collection
  COLLECT age = doc.age WITH COUNT INTO length
  RETURN { age: age, length: length } 

The object declaration in the last line of the query is somewhat redundant because one has to type identical attribute names and values:

RETURN { age: age, length: length } 

In this case, the new shorthand notation simplifies the RETURN to:

RETURN { age, length }

In general, the shorthand notation can be used for all object literals when there is an attribute name that refers to a query variable of the same name.

It can also be mixed with the longer notation, e.g.:

RETURN { age, length, dateCreated: DATE_NOW() }

ArangoDB 2.6.3: Maintenance Release for Stability & Performance

A maintenance release of ArangoDB is available. It fixes an issue with NULL bytes inside attribute values (#1409) that occurred when fetching a document via the REST API.

Releases 2.5.6 and 2.6.3 can be downloaded from arangodb.com/download now.

Mastering AQL: Return Distinct Values

Last week saw the addition of RETURN DISTINCT to AQL. This is a new shortcut syntax for making result sets unique.

For this purpose, it is an easier-to-remember alternative to the existing COLLECT statement. COLLECT is very flexible and can be used for multiple purposes, but it is syntactic overkill for making a result set unique.

The new RETURN DISTINCT syntax makes queries easier to write and understand.

Here’s a non-scientific proof for this claim:

Compare the following queries, which both return each distinct age attribute value from the collection:

FOR doc IN collection
  COLLECT age = doc.age
  RETURN age

With RETURN DISTINCT:

FOR doc IN collection
  RETURN DISTINCT doc.age

Clearly, the query using RETURN DISTINCT is more intuitive, especially for AQL beginners. Apart from that, using RETURN DISTINCT will save a bit of typing compared to the longer COLLECT-based query.

Internally, both COLLECT and RETURN DISTINCT work by creating an AggregateNode. The optimizer will try both the sorted and the hashed variant for each, so they should perform about the same.

However, the result of a RETURN DISTINCT does not have any guaranteed order, so the optimizer will not insert a post-SORT for it. It may do so for a regular COLLECT.
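The difference between the two variants can be illustrated generically in C++. This is a sketch of the general techniques, not ArangoDB’s AggregateNode code, and the function names are hypothetical. The sorted variant produces ordered output as a by-product, whereas the hashed variant’s output order is unspecified, which is consistent with a RETURN DISTINCT result having no guaranteed order.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// "Sorted" variant: sort a copy, then drop adjacent duplicates.
// The result comes out in ascending order as a side effect.
std::vector<int> distinctSorted(std::vector<int> values) {
  std::sort(values.begin(), values.end());
  values.erase(std::unique(values.begin(), values.end()), values.end());
  return values;
}

// "Hashed" variant: collect the values into a hash set, which discards
// duplicates. The iteration order of an unordered_set is unspecified,
// so callers must not rely on the order of the returned values.
std::vector<int> distinctHashed(const std::vector<int>& values) {
  std::unordered_set<int> seen(values.begin(), values.end());
  return std::vector<int>(seen.begin(), seen.end());
}
```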

As mentioned before, COLLECT is more flexible than RETURN DISTINCT. Notably, COLLECT is superior to RETURN DISTINCT when the result set should be made unique using more than one criterion, e.g.

FOR doc IN collection
  COLLECT status = doc.status, age = doc.age
  RETURN { status, age }

This is currently not achievable via RETURN DISTINCT, as it only works with a single criterion.

ArangoDB Nightly Travis Builds: Continuous Integration Updates

Great news for driver maintainers who want access to the latest developments in ArangoDB: many of you have asked us for a nightly build of ArangoDB to improve CI test automation on Travis CI. Travis builds for ArangoDB 2.6, 2.7, and devel will be generated and published every night shortly after midnight (GMT).

Arango Weekly 30: New Performance Results & O’Reilly Article

Maybe you’ve noticed that there was no ArangoDB newsletter last week. So here is the news from the last two weeks, along with the announcement that our newsletter will be published biweekly during the summer. 🙂

In the meantime, we have significantly improved the performance of the shortest-path implementation and rerun the Multi-Model performance tests. The article “Data modeling with multi-model databases” – a use case for multi-model databases – was a huge success on O’Reilly Radar last week; it had the most page views of all Radar articles. It’s worth reading.

Finally, Mesosphere launched its SDK and developer program, and we are proud to be one of the first partners to integrate with DCOS. Stay tuned; there will be more to come.

Keep an eye on our blog or follow us on Twitter for news about ArangoDB.
