ArangoDB 3.3: DC2DC Replication, Encrypted Backup
Just in time for the holidays we have a nice present for you all – ArangoDB 3.3. This release focuses on replication, resilience, stability and thus on general readiness for your production small and very large use cases. There are improvements for the community as well as for the Enterprise Edition. We sincerely hope to have found the right balance between them.
In the Community Edition there are:
- Easier server-level replication
- A resilient active/passive mode for single server instances with automatic failover
- RocksDB throttling for increased guaranteed write performance
- Faster collection and shard creation in the cluster
- Lots of bug fixes (most of them have been backported to 3.2)
In the Enterprise Edition there are:
- Datacenter to datacenter replication for clusters
- Encrypted backup and restore
That is, this is all about improving replication and resilience. For us, the two new exciting features are datacenter to datacenter replication and the resilient active-passive mode for single-servers.
Datacenter to datacenter replication
Every company needs a disaster recovery plan for all important systems. This is true from small units like single processes running in some container to the largest distributed architectures. For databases in particular this usually involves a mixture of fault-tolerance, redundancy, regular backups and emergency plans. The larger a data store, the more difficult it is to come up with a good strategy.
Therefore, it is desirable to be able to run a distributed database in one datacenter and replicate all transactions to another datacenter in some way. Often, transaction logs are shipped over the network to replicate everything in another, identical system in the other datacenter. Some distributed data stores have built-in support for multiple datacenter awareness and can replicate between datacenters in a fully automatic fashion.
ArangoDB 3.3 takes an evolutionary step forward by introducing multi-datacenter support, which is asynchronous datacenter to datacenter replication. Our solution is asynchronous and scales to arbitrary cluster sizes, provided your network link between the datacenters has enough bandwidth. It is fault-tolerant without a single point of failure and includes a lot of metrics for monitoring in a production scenario.
Read more on the Datacenter to Datacenter Replication and follow generic installation instructions.
This is a feature available only in the Enterprise Edition.
Server-level replication
We have had asynchronous replication functionality in ArangoDB since release 1.4. But setting it up was admittedly too hard. One major design flaw in the existing asynchronous replication was that the replication is for a single database only.
Replicating from a leader server that has multiple databases required manual fiddling on the follower for each individual database to replicate. When a new database was created on the leader, one needed to take action on the follower to ensure that data for that database got actually replicated. Replication on the follower also was not aware of when a database was dropped on the leader.
This is now fixed in 3.3. In order to set up replication on a 3.3 follower for all databases of a given 3.3 leader, there is now the so-called `globalApplier`. It has the same interface as the existing `applier`, but it will replicate from all database on the leader and not just a single one.
As a consequence, server-global replication can now be set up permanently with a single JavaScript command or API call.
A resilient active/passive mode for single server instances with automatic failover
While it was always possible to set up two servers and connect them via asynchronous replication, the replication setup was not straightforward (see above), and it also did not handle automatic failover. In case of the leader having died, one needed to have some machinery in place to stop replication on the follower and make it the leader. ArangoDB did not provide this machinery, and left it to client applications to solve the failover problem.
With 3.3, this has become much easier. There is now a mode to start two arangod
instances as a pair of connected servers with automatic failover.
The two servers are connected via asynchronous replication. One of the servers is the elected leader, and the other one is made a follower automatically. At startup, the two
servers race for leadership. The follower will automatically start replication from the leader for all databases, using the server-global replication (see above).
When the leader goes down, this is automatically detected by an agency instance, which
is also started in this mode. This instance will make the previous follower stop its replication and make it the new leader.
The follower will automatically deny all read and write requests from client applications. Only the replication is allowed to access the follower’s data until the follower becomes a new leader.
The arangodb starter does support starting two servers with asynchronous replication and failover out of the box, making the setup even easier.
The arangojs driver for JavaScript, GO, PHP Java drivers for ArangoDB are also in the making to support automatic failover in case the currently used server endpoint responds with HTTP 503. Read more details on the Java driver.
Encrypted backup
This feature allows to create an encrypted backup using arangodump
. We use AES256 for the encryption. The encryption key can be read from a file or from a generator program. It works in single server and cluster mode. Together with the encryption at rest this allows to keep all your sensible data encrypted whenever it is not in memory.
Here is an example for encrypted backup:
arangodump --collection "secret" dump --encryption.keyfile ~/SECRET-KEY
As you can see, in order to create an encrypted backup, simply add the --encryption.keyfile
option when invoking arangodump
. Needless to say, restore is equally easy using arangorestore
.
The key must be exactly 32 bytes long (this is a requirement of the AES block cipher we are using). For details see the documentation in the manual.
Note that encrypted backups can be used together with the already existing RocksDB encryption-at-rest feature, but they can also be used for the MMFiles engine, which does not have encryption-at-rest.
This is a feature available only in the Enterprise Edition.
RocksDB throttling
While throttling may sound bad at first, the RocksDB throttling is there for a good reason. It throttles write operations to RocksDB in the RocksDB storage engine, in order to prevent total stalls. The throttling is adaptive, meaning that it automatically adapts to the write rate. Read more about RocksDB throttling.
Faster shard creation in cluster
Creating collections is what all ArangoDB users do. It’s one of the first steps carried out. So it should be as quick as possible.
When using the cluster, users normally want resilience, so replicationFactor
is set to at least 2
. The number of shards is often set to pretty high values (collections with 100 shards).
Internally this will first store the collection metadata in the agency, and then the assigned shard leaders will pick up the change and will begin creating the shards. When the shards are set up on the leader, the replication is kicked off, so every data modification will not only become effective on the leader, but also on the followers. This process has got some shortcuts for the initial creation of shards in 3.3.
Conclusion
The entire ArangoDB team is proud to release version 3.3 of ArangoDB just in time for the holidays! We hope you will enjoy the upgrade. We invite you to take ArangoDB 3.3 for a spin and to let us know what you think via our Community Slack channel or hacker@arangodb.com. We look forward to your feedback!
Get the latest tutorials, blog posts and news: