Setting up Datacenter to Datacenter Replication in ArangoDB

October 12, 2017 / Architecture, cluster, General, how to, Releases, Replication

Please note that this tutorial is valid for the ArangoDB 3.3 milestone 1 version of DC to DC replication!

This milestone release contains data-center to data-center replication as an enterprise feature. This is a preview of the upcoming 3.3 release and is not considered production-ready.

In order to prepare for a major disaster, you can set up a backup data center that will take over operations if the primary data center goes down. For the failure of a single server, the resilience features of ArangoDB can be used. Data center to data center replication is used to handle the failure of a complete data center.

Data is transported between data-centers using a message queue. The current implementation uses Apache Kafka as the message queue. Apache Kafka is a commonly used open source message queue which is capable of handling multiple data-centers. However, the ArangoDB replication is not tied to Apache Kafka. We plan to support other message queue systems in the future.

The following is a high-level description of how to set up data-center to data-center replication. Detailed instructions for specific operating systems will follow shortly.

The main components are:

Apache Kafka
Mirror Maker
ArangoDB Sync
ArangoDB Cluster

Installation

Kafka

Kafka is the main data transport channel between the datacenters. Please follow the instructions on https://kafka.apache.org/ and https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 to set up Apache Kafka in both data-centers and connect the two using Mirror Maker.

Kafka operates on port `9092`, and every data center must be able to connect to all brokers of the other data centers on this port.

Furthermore, the brokers need to contact each other internally in each datacenter.

Besides brokers, a `mirror maker` process is used to transport Kafka messages between different data centers. The `mirror maker` processes need access to all Kafka brokers in both data centers (port `9092`).
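
Before continuing, it can help to verify that the brokers are actually reachable on port `9092`. The following is a minimal sketch and not part of the ArangoDB or Kafka tooling: the function names `split_endpoints` and `check_endpoints` are made up for this example, and it assumes a netcat (`nc`) that supports the `-z` and `-w` options. The broker list is given as a comma-separated `host:port` list.

```shell
#!/bin/sh
# Sketch only: split_endpoints and check_endpoints are hypothetical helper names.

# Split a comma-separated "host:port" list into one "host port" pair per line.
split_endpoints() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        if [ -n "$host" ]; then
            printf '%s %s\n' "$host" "$port"
        fi
    done
}

# Try a TCP connection to every endpoint and report the result.
# Assumes nc (netcat) with -z (connect only) and -w (timeout) is installed.
check_endpoints() {
    split_endpoints "$1" | while read -r host port; do
        if nc -z -w 3 "$host" "$port" 2>/dev/null; then
            echo "ok   $host:$port"
        else
            echo "FAIL $host:$port"
        fi
    done
}

# Example (host names are placeholders):
#   check_endpoints "async1:9092,async2:9092,async3:9092"
```

Running the check from a machine that hosts a `mirror maker` process, once against the local brokers and once against the remote ones, covers both directions described above.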

ArangoDB Cluster

Install the enterprise release on your system using the normal install procedure. Stop and disable the default single server using the appropriate Linux commands (for example, `systemctl stop arangodb3.service` followed by `systemctl disable arangodb3.service` on RedHat).

ArangoDB Sync

The two components of ArangoDB Sync are part of the enterprise package.

Syncmaster(s)

The Syncmasters are the main control component. They handle the administrative work between the datacenters.

They use TCP port `8629`, and every data center needs to allow incoming traffic on this port from the other datacenter as well as outgoing traffic to it. Inside a data center there will be multiple masters (typically 3) to make the setup fail-safe. However, direct communication between them is not required and can be turned off.

Instead they will coordinate themselves using the normal ArangoDB agency. That means they have to reach all possible agents in their datacenter on port `8531` and all possible coordinators on port `8529`.

The Syncmasters will coordinate the work and delegate it to workers. These workers listen on port `8729` and need to be reachable from the Syncmasters.

Furthermore, they have to contact the datacenter local Kafka brokers on port `9092`. They do not need to contact the brokers of the other datacenter directly.

Syncworker

These will execute the actual synchronization work.

They listen on port `8729` and must be able to talk to all coordinators & DBServers in the cluster (ports `8529` & `8530`).

They also have to be able to contact the Syncmasters of their datacenter directly on port `8629`.

Furthermore, they have to contact the datacenter local Kafka brokers on port `9092`. They do not need to contact the brokers of the other datacenter directly.
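
The port requirements above can be condensed into a small lookup, which is handy when writing firewall rules. This sketch only restates the ports listed in this section; the function name `outbound_ports`, the role names, and the target labels are chosen for this example.

```shell
#!/bin/sh
# Sketch only: outbound_ports is a hypothetical helper summarizing the text above.

# Print one "target port" line for every connection a role must be able to open.
outbound_ports() {
    case "$1" in
        syncmaster)
            echo "agents 8531"
            echo "coordinators 8529"
            echo "syncworkers 8729"
            echo "local-kafka-brokers 9092"
            echo "remote-syncmasters 8629"
            ;;
        syncworker)
            echo "coordinators 8529"
            echo "dbservers 8530"
            echo "local-syncmasters 8629"
            echo "local-kafka-brokers 9092"
            ;;
        *)
            echo "unknown role: $1" >&2
            return 1
            ;;
    esac
}

# Example:
#   outbound_ports syncmaster
```

In addition to these outgoing connections, Syncmasters accept incoming connections on port `8629` and Syncworkers on port `8729`.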

Preparation

Create Certificates

The synchronization system needs 2 CA certificates, which must be
shared on all machines.

To create them, run the following as root on one machine only (in the folder containing the `Makefile`):

arangosync create client-auth ca --cert=/etc/arango/certificates/client-auth-ca.crt --key=/etc/arango/certificates/client-auth-ca.key

arangosync create tls ca --cert=/etc/arango/certificates/tls-ca.crt --key=/etc/arango/certificates/tls-ca.key

Then distribute the generated files to all machines in all data centers (in `/etc/arango/certificates`).
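
One way to distribute the files is a small `scp` loop. The sketch below assumes root SSH access to every machine and uses placeholder host names; adapt both to your environment. The `run` and `distribute_certs` helpers are names invented for this example, and setting `DRY_RUN=1` only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: run and distribute_certs are hypothetical helpers.
CERT_DIR=/etc/arango/certificates

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy both CA certificates and their keys to each host given as an argument.
# Assumes root SSH access to every target machine.
distribute_certs() {
    for host in "$@"; do
        run ssh "root@$host" mkdir -p "$CERT_DIR"
        run scp -p \
            "$CERT_DIR/client-auth-ca.crt" "$CERT_DIR/client-auth-ca.key" \
            "$CERT_DIR/tls-ca.crt" "$CERT_DIR/tls-ca.key" \
            "root@$host:$CERT_DIR/"
    done
}

# Example (host names are placeholders):
#   DRY_RUN=1 distribute_certs osync2 osync3 async1 async2 async3
```

Because the `.key` files are private keys, they should only be copied over an encrypted channel such as SSH.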

Start the ArangoDB Clusters

Create an ArangoDB cluster in each data-center. For example, follow the instructions given in https://www.arangodb.com/2016/12/starting-arangodb-cluster-easy-way/ to start a cluster using the ArangoDB starter. The cluster must use RocksDB and must run on the standard ports, i.e. Coordinators on port 8529, DBServers on port 8530 and Agents on port 8531. Therefore it is important that the default single server instances are stopped.

Configuration Files

Create `/etc/arangodb.env`. This file contains environment variables with the following semantics (values are examples).

# Name of this cluster:
CLUSTERNAME=dc1
# Name of other cluster (for mirror maker):
OTHERCLUSTERNAME=dc2
# Addresses of all starters in the cluster. (see `--starter.join` option in starter):
STARTERENDPOINTS=osync1:8528,osync2:8528,osync3:8528
# Endpoints of the ArangoDB cluster coordinators:
CLUSTERENDPOINTS=http://osync1:8529,http://osync2:8529,http://osync3:8529
# JWT secret used to communicate from sync workers to sync master:
MASTERSECRET=foo_o
# Directory containing certificates (when changing here, make sure to change them in Makefile also):
CERTIFICATEDIR=/etc/arango/certificates 
# JWT secret used to communicate to ArangoDB servers:
CLUSTERSECRET=cluster_o
# Full file path containing ${CLUSTERSECRET}:
CLUSTERSECRETPATH=/etc/arango/jwtsecret
# Port of sync master. (make sure to keep this the same as the port in MASTERENDPOINTS):
MASTERPORT=8629
# Endpoints of the sync masters given to the sync workers:
MASTERENDPOINTS=https://osync1:8629,https://osync2:8629,https://osync3:8629
# Host name or (internal) IP of first zookeeper host:
ZOOKEEPERHOST1=osync1
# Host name or (internal) IP of second zookeeper host:
ZOOKEEPERHOST2=osync2
# Host name or (internal) IP of third zookeeper host:
ZOOKEEPERHOST3=osync3
# Ports that zookeeper peers use to run their protocol:
ZOOKEEPERPORTS=2888:3888
# Zookeeper client port:
ZOOKEEPERCLIENTPORT=2181
# Addresses of all Kafka brokers in the cluster (see `--mq.kafka-addr` option in arangosync)
KAFKAENDPOINTS=osync1:9092,osync2:9092,osync3:9092
# Port used by the Kafka brokers:
KAFKABROKERPORT=9092
# Kafka endpoints for the other data center (for mirror maker):
KAFKAREMOTEENDPOINTS=async1:9092,async2:9092,async3:9092

Create `/etc/arangodb.env.local` with the following content:


# Host name for this machine
HOST=
# Host identifier, for zookeeper, these must be positive integers 1-n
HOSTID=
# Public IP (reachable from the other DC, important for syncmaster)
PUBLICIP=
# Private IP (used within this DC, can be same as above)
PRIVATEIP=
# Advertised ports in Kafka, which need to be reachable from the other DC
KAFKAADVERTISEDPORT=
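
Since the service definitions in the following sections rely on these variables, a quick check for missing or empty values can save some debugging. The following sketch uses a hypothetical helper `missing_vars`, which prints the name of every listed variable that has no value in the given file. It only looks for `NAME=value` lines and does not validate the values themselves.

```shell
#!/bin/sh
# Sketch only: missing_vars is a hypothetical helper, not part of ArangoDB.

# Print every variable from the argument list that is missing or empty in FILE.
missing_vars() {
    file=$1
    shift
    for var in "$@"; do
        if ! grep -Eq "^${var}=.+" "$file"; then
            echo "$var"
        fi
    done
}

# Example:
#   missing_vars /etc/arangodb.env.local HOST HOSTID PUBLICIP PRIVATEIP KAFKAADVERTISEDPORT
```

An empty output means that every listed variable has a value in the file.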

Start the Syncmaster(s)

For systemd-based systems, use

[Unit]

Description=Run ArangoSync in master mode
After=network.target

[Service]


Restart=on-failure
EnvironmentFile=/etc/arangodb.env 
EnvironmentFile=/etc/arangodb.env.local
ExecStartPre=/usr/bin/sh -c "mkdir -p ${CERTIFICATEDIR}"
ExecStartPre=/usr/sbin/arangosync create tls keyfile \
    --cacert=${CERTIFICATEDIR}/tls-ca.crt \
    --cakey=${CERTIFICATEDIR}/tls-ca.key \
    --keyfile=${CERTIFICATEDIR}/tls.keyfile \
    --host=${PUBLICIP} \
    --host=${PRIVATEIP} \
    --host=${HOST}
ExecStart=/usr/sbin/arangosync run master \
    --log.level=debug \
    --cluster.endpoint=${CLUSTERENDPOINTS} \
    --cluster.jwtSecret=${CLUSTERSECRET} \
    --server.keyfile=${CERTIFICATEDIR}/tls.keyfile \
    --server.client-cafile=${CERTIFICATEDIR}/client-auth-ca.crt \
    --server.endpoint=https://${PUBLICIP}:${MASTERPORT} \
    --server.port=${MASTERPORT} \
    --master.jwtSecret=${MASTERSECRET} \
    --mq.type=kafka \
    --mq.kafka-addr=${KAFKAENDPOINTS} \
    --mq.transport-topic=${CLUSTERNAME}-1 \
    --mq.transport-topic=${CLUSTERNAME}-2 \
    --mq.transport-topic=${CLUSTERNAME}-3
TimeoutStopSec=60

[Install]

WantedBy=multi-user.target

to start the Syncmaster on at least one server.

Start the Syncworkers

Again, for systemd-based systems, use

[Unit]

Description=Run ArangoSync in worker mode 
After=network.target

[Service]

Restart=on-failure
EnvironmentFile=/etc/arangodb.env 
EnvironmentFile=/etc/arangodb.env.local
Environment=PORT=8729
ExecStart=/usr/sbin/arangosync run worker \
    --log.level=debug \
    --server.endpoint=https://${PRIVATEIP}:${PORT} \
    --master.endpoint=${MASTERENDPOINTS} \
    --master.jwtSecret=${MASTERSECRET}
TimeoutStopSec=60

[Install]

WantedBy=multi-user.target

to start the Syncworkers on the servers.
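
Assuming the two unit definitions above were saved as `/etc/systemd/system/arangosync-master.service` and `/etc/systemd/system/arangosync-worker.service` (file names chosen for this example, the tutorial does not prescribe them), they could be activated roughly as follows. `run` and `enable_sync_units` are hypothetical helper names; with `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Sketch only: unit names and helper names are assumptions for this example.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reload systemd, then enable (start at boot) and start each given unit.
enable_sync_units() {
    run systemctl daemon-reload
    for unit in "$@"; do
        run systemctl enable "$unit"
        run systemctl start "$unit"
    done
}

# Example (as root):
#   enable_sync_units arangosync-master.service arangosync-worker.service
```

Afterwards, `systemctl status` and `journalctl -u` with the unit name show whether the service came up and what it logged.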

Starting the Synchronization

Once you’ve completed the setup of the 2 data centers using the above
instructions, you have to connect them to each other.

(Note: replace all IP addresses, DNS names, usernames & passwords with appropriate ones)

To initiate the sync process, you first have to create a client certificate using the client authentication CA that we created earlier:

/usr/sbin/arangosync create client-auth keyfile \
  --cacert=/etc/arango/certificates/client-auth-ca.crt \
  --cakey=/etc/arango/certificates/client-auth-ca.key \
  --keyfile=/etc/arango/certificates/client.key \
  --host=$IP --host=$HOST

This certificate can be used to authenticate against the sync master in
this datacenter. Note that this is for example needed by the sync master
in the other datacenter (see below).

Please make sure that at this point the firewalls are configured such
that the following connections are available:

The sync masters in the two DCs have to be able to reach each other.
The mirror makers have to be able to reach all Kafka brokers in the other data center.

We now want to set up synchronization from datacenter A to datacenter B. To this end, we contact the sync master in datacenter B and tell it to start synchronization from datacenter A. The following command can be executed from any machine that can reach the sync masters in datacenter B (option `--master.endpoint`). Note that we give all three instances, because we do not know which one is currently in charge. The endpoints given in the `--source.endpoint` option are the endpoints of the sync masters in datacenter A. The CLI tool authenticates with the sync master in datacenter B via user name and password:

/usr/sbin/arangosync configure sync \
  --master.endpoint=https://54.245.21.18:8629,https://54.201.250.3:8629,https://54.213.236.136:8629 \
  --master.keyfile=/etc/arango/certificates/client.key \
  --source.endpoint https://vpna.arangodb.biz:8629,https://vpna.arangodb.biz:8630,https://vpna.arangodb.biz:8631 \
  --source.cacert=/etc/arango/certificates/tls-ca.crt \
  --auth.user=root --auth.password=

During this initial sync activation the generated client certificate
will be sent to the master of datacenter B, who in turn uses it to
authenticate itself with the sync master of datacenter A.

After that you should be able to get the status:

/usr/sbin/arangosync get status \
  --master.endpoint=https://54.245.21.18:8629,https://54.201.250.3:8629,https://54.213.236.136:8629 \
  --auth.user=root --auth.password=

Finally, if you want to stop synchronization, use:

/usr/sbin/arangosync stop sync

Frank Celler

Frank is both an entrepreneur and a backend developer who has been developing mostly in-memory databases for two decades. He is the CTO and co-founder of ArangoDB. Try challenging Frank with questions on C, C++ and MRuby. Besides that, Frank organizes Cologne’s NoSQL group and is an active member of the NoSQL community.

