Efficient Lock-Free Data Structure Protection | ArangoDB Blog
Motivation
In multi-threaded applications running on multi-core systems, it often happens that certain data structures are read frequently but changed only rarely. An example is a database server that keeps a list of databases: the list rarely changes, but it must be consulted for every single query hitting the server. In such situations, one needs to guarantee fast read access as well as protection against inconsistencies, use-after-free errors, and memory leaks.
We therefore seek a lock-free protection mechanism that scales to many threads on modern machines and uses only facilities from the C++11 standard library. The mechanism should be easy to use, and easy to understand and prove correct. This article presents such a solution, which is probably not new, but which we have not found described anywhere else.
The concrete challenge at hand
Assume a global data structure on the heap and a single atomic pointer P to it. If the (fast) readers access it completely unprotected, then a (slow) writer can create a completely new data structure and then switch the pointer to the new structure with a single atomic operation. Since writing is not time-critical, one can simply use a mutex to ensure that there is only a single writer at any given time. The only problem is deciding when it is safe to destroy the old value, because the writer cannot easily know whether some reader is still accessing it. The challenge is aggravated by the fact that, without thread synchronization, it is unclear when a reader actually sees the new pointer value, in particular on a multi-core machine with a complex cache hierarchy.
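For comparison, here is a minimal C++11 sketch of the setup described above. It uses a well-known baseline, not the solution presented in this article: the pointer is a std::shared_ptr accessed through the std::atomic_load/std::atomic_store overloads, so reference counting decides when the old structure may be freed. The class and member names are hypothetical. Note that on common standard library implementations these overloads use an internal lock rather than being lock-free, and every reader touches the shared reference count, which is exactly the kind of overhead the article wants to avoid; they are also deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example data: the list of database names every query consults.
using DatabaseList = std::vector<std::string>;

class DatabaseRegistry {
 public:
  DatabaseRegistry() : current_(std::make_shared<DatabaseList>()) {}

  // Reader: atomically grab a reference-counted snapshot. The list stays alive
  // as long as this shared_ptr (or any copy of it) exists, even if a writer
  // publishes a new list in the meantime.
  std::shared_ptr<const DatabaseList> snapshot() const {
    return std::atomic_load(&current_);
  }

  // Writer: copy the current list, modify the copy, then publish it with one
  // atomic store. The mutex only serializes writers; readers never take it.
  // The old list is destroyed automatically when its last holder releases it.
  void addDatabase(const std::string& name) {
    std::lock_guard<std::mutex> guard(writerMutex_);
    auto copy = std::make_shared<DatabaseList>(*std::atomic_load(&current_));
    copy->push_back(name);
    std::atomic_store(&current_, std::shared_ptr<const DatabaseList>(std::move(copy)));
  }

 private:
  std::shared_ptr<const DatabaseList> current_;
  std::mutex writerMutex_;
};
```

A reader would call snapshot() once per query and work with the returned list until the query finishes.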
If you want to see our solution directly, scroll down to “Source code links”. We first present a good classical approach and then try to improve on it.
Throughput Enhancements: Boosting ArangoDB Performance
We’ve recently been working on improving ArangoDB’s throughput, especially when using ArangoDB’s HTTP interface.
In this post, I will show some of the improvements already achieved, though the work is not yet finished. Therefore, the results shown here are still somewhat preliminary.
We wanted to measure improvements for ArangoDB’s HTTP interface, and so we used wrk as an external HTTP load generator.
During the tests, wrk called some specific URLs on a local ArangoDB instance on an otherwise idle machine. The tests were run with ArangoDB 2.6 and devel. The ArangoDB instances were started with their default configuration.
wrk was invoked with varying numbers of client connections and threads, so the tests cover both serial and concurrent/parallel requests:
wrk -c $CONNECTIONS -t $THREADS -d 10 $URL
The numbers of connections ($CONNECTIONS) and threads ($THREADS) were both varied from 1 to 8. wrk requires at least as many connections as threads.
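The parameter sweep can be written down as a small C++ helper that builds the resulting command lines. The function name wrkCommands is hypothetical and not part of the original benchmark setup; the URL is left as a parameter because the post does not list the exact URLs tested. With both values ranging from 1 to 8 and threads never exceeding connections, the sweep yields 36 combinations.

```cpp
#include <string>
#include <vector>

// Builds every wrk command line for the sweep: connections and threads each
// range from 1 to maxValue, and combinations with more threads than
// connections are skipped because wrk requires at least as many connections
// as threads.
std::vector<std::string> wrkCommands(const std::string& url, int maxValue = 8) {
  std::vector<std::string> commands;
  for (int connections = 1; connections <= maxValue; ++connections) {
    for (int threads = 1; threads <= connections; ++threads) {
      commands.push_back("wrk -c " + std::to_string(connections) +
                         " -t " + std::to_string(threads) +
                         " -d 10 " + url);
    }
  }
  return commands;
}
```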
ArangoDB 2.6.4: Maintenance Release Overview
ArangoDB 2.6.4 comes with an upgraded V8 engine (4.1.0.27) and is ready to download now. For the 2.5 branch, we’ve published the 2.5.7 maintenance release as well.
Running V8 Isolates in Multi-Threaded ArangoDB
ArangoDB allows running user-defined JavaScript code in the database. This can be used for more complex, stored-procedure-like database operations. Additionally, ArangoDB’s Foxx framework can be used to make any database functionality available via an HTTP REST API. It’s easy to build data-centric microservices with it, using the scripting functionality for tasks like access control, data validation, sanitization, etc.
We often get asked how the scripting functionality is implemented under the hood. Additionally, several people have asked how ArangoDB’s JavaScript functionality relates to Node.js.
This post explains both in detail.
Arango Weekly 31: Official Docker Repo & New Release 2.6.3
ArangoDB is now an official repository on Docker Hub, one of just four additions in the last two months. Please try it and tell your friends! ArangoDB 2.6 is known as a performance release, and we’ve continued to improve the core by killing locks and optimizing code. It looks like we will be able to show some impressive performance boosts soon. Furthermore, last week Mike Williamson wrote a blog post on modeling data with ArangoDB that is worth reading.
Follow ArangoDB on LinkedIn and add ArangoDB as a skill. We would appreciate your help. Keep an eye on our blog or follow us on Twitter for news about ArangoDB.
AQL Object Literal Simplification: ArangoDB Query Optimization
ArangoDB’s devel branch recently saw a change that makes writing some AQL queries a bit simpler.
The change introduces an optional shorthand notation for object attributes in the style of ES6’s enhanced object literal notation.
For example, consider the following query that groups documents by their age attribute and counts the number of documents per distinct age value:
FOR doc IN collection
COLLECT age = doc.age WITH COUNT INTO length
RETURN { age: age, length: length }
The object declaration in the last line of the query is somewhat redundant because one has to type identical attribute names and values:
RETURN { age: age, length: length }
In this case, the new shorthand notation simplifies the RETURN to:
RETURN { age, length }
In general, the shorthand notation can be used for all object literals when there is an attribute name that refers to a query variable of the same name.
It can also be mixed with the longer notation, e.g.:
RETURN { age, length, dateCreated: DATE_NOW() }
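To make the expansion rule concrete, the following hypothetical C++ helper rewrites an object literal written in the shorthand style into the long form. It only illustrates the rule (a bare identifier `name` stands for `name: name`) and is not how ArangoDB's AQL parser implements it; it assumes well-formed input and does not handle string literals that contain commas or brackets.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified identifier check: ASCII letters, digits and '_', not starting with a digit.
static bool isIdentifier(const std::string& s) {
  if (s.empty() || std::isdigit(static_cast<unsigned char>(s[0]))) return false;
  for (char c : s) {
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') return false;
  }
  return true;
}

static std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t\n");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t\n");
  return s.substr(first, last - first + 1);
}

// Expands e.g. "{ age, length }" into "{ age: age, length: length }".
std::string expandShorthand(const std::string& literal) {
  const auto open = literal.find('{');
  const auto close = literal.rfind('}');
  const std::string body = literal.substr(open + 1, close - open - 1);

  // Split the body on commas that are not nested inside (), [] or {}.
  std::vector<std::string> entries;
  std::string current;
  int depth = 0;
  for (char c : body) {
    if (c == '(' || c == '[' || c == '{') ++depth;
    if (c == ')' || c == ']' || c == '}') --depth;
    if (c == ',' && depth == 0) {
      entries.push_back(trim(current));
      current.clear();
    } else {
      current += c;
    }
  }
  if (!trim(current).empty()) entries.push_back(trim(current));

  // Reassemble, expanding each bare identifier into "name: name".
  std::string result = "{ ";
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (i > 0) result += ", ";
    result += isIdentifier(entries[i]) ? entries[i] + ": " + entries[i] : entries[i];
  }
  return result + " }";
}
```

Entries that already use the long notation, such as `dateCreated: DATE_NOW()`, pass through unchanged, which matches the mixed form shown above.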
ArangoDB 2.6.3: Maintenance Release for Stability & Performance
A maintenance release of ArangoDB is available. It fixes an issue with NULL bytes inside attribute values (#1409) that occurred when fetching a document via the REST API.
Releases 2.5.6 and 2.6.3 can be downloaded from arangodb.com/download now.
Mastering AQL: Return Distinct Values
Last week saw the addition of RETURN DISTINCT to AQL. It is a new shortcut syntax for making result sets unique.
For this purpose, it can be used as an easier-to-remember alternative to the already existing COLLECT statement. COLLECT is very flexible and can be used for multiple purposes, but it is syntactic overkill for just making a result set unique.
The new RETURN DISTINCT syntax makes queries easier to write and understand.
Here’s a non-scientific proof for this claim:
Compare the following queries, which both return each distinct age attribute value from the collection:
FOR doc IN collection
COLLECT age = doc.age
RETURN age
With RETURN DISTINCT:
FOR doc IN collection
RETURN DISTINCT doc.age
Clearly, the query using RETURN DISTINCT is more intuitive, especially for AQL beginners. Apart from that, using RETURN DISTINCT will save a bit of typing compared to the longer COLLECT-based query.
Internally, both COLLECT and RETURN DISTINCT work by creating an AggregateNode. The optimizer will try the sorted and the hashed variants for both, so they should perform about the same.
However, the result of a RETURN DISTINCT does not have any guaranteed order, so the optimizer will not insert a post-SORT for it. It may do so for a regular COLLECT.
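The difference in ordering follows from how the two strategies work in general. The C++ sketch below is not ArangoDB's AggregateNode code; it only contrasts a sort-based and a hash-based way of making a list of values unique, using hypothetical function names. The sorted variant produces ascending output as a by-product, while iterating a hash set yields an unspecified order, so a caller that needs sorted output has to sort afterwards.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Sorted variant: sort a copy, then drop adjacent duplicates.
// The result is in ascending order as a side effect of the sort.
std::vector<int> distinctSorted(std::vector<int> values) {
  std::sort(values.begin(), values.end());
  values.erase(std::unique(values.begin(), values.end()), values.end());
  return values;
}

// Hashed variant: insert everything into a hash set, which discards duplicates.
// The iteration order of an unordered_set is unspecified.
std::vector<int> distinctHashed(const std::vector<int>& values) {
  std::unordered_set<int> seen(values.begin(), values.end());
  return std::vector<int>(seen.begin(), seen.end());
}
```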
As mentioned before, COLLECT is more flexible than RETURN DISTINCT. Notably, COLLECT is superior to RETURN DISTINCT when the result set should be made unique using more than one criterion, e.g.:
FOR doc IN collection
COLLECT status = doc.status, age = doc.age
RETURN { status, age }
This is currently not achievable via RETURN DISTINCT, as it only works with a single criterion.
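Making a result unique by two criteria amounts to treating the combination of values as a single key. The C++ sketch below illustrates that idea with a hypothetical Doc type and a std::set of (status, age) pairs; it mirrors the semantics of the COLLECT query above, not ArangoDB's implementation.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical document shape with the two attributes used in the query.
struct Doc {
  std::string status;
  int age;
};

// Returns each distinct (status, age) combination exactly once.
// std::set orders the pairs lexicographically: by status, then by age.
std::vector<std::pair<std::string, int>> distinctStatusAge(const std::vector<Doc>& docs) {
  std::set<std::pair<std::string, int>> unique;
  for (const Doc& d : docs) unique.emplace(d.status, d.age);
  return std::vector<std::pair<std::string, int>>(unique.begin(), unique.end());
}
```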
ArangoDB Nightly Travis Builds: Continuous Integration Updates
Great news for driver maintainers who want access to the latest developments in ArangoDB. Many of you have asked us whether we could provide nightly builds of ArangoDB to improve CI test automation with Travis CI. The Travis builds for ArangoDB 2.6, 2.7 and devel will be generated and published shortly after midnight (GMT).
Arango Weekly 30: New Performance Results & O’Reilly Article
Maybe you’ve noticed that there was no ArangoDB newsletter last week. So here’s the news from the last two weeks, along with the announcement that our newsletter will be published biweekly during the summer. 🙂
In the meantime, we’ve significantly improved the performance of the shortest path implementation and rerun the Multi-Model performance tests. The article Data modeling with multi-model databases – a use case for multi-model databases – was a huge success on O’Reilly Radar last week: it had the most page views of all Radar articles. It’s worth reading.
Finally, Mesosphere launched its SDK and developer program, and we are proud to be one of the first partners to integrate with DCOS. Stay tuned, there will be more to come.
Keep an eye on our blog or follow us on Twitter for news about ArangoDB.