Building a Mini Database Cluster for Fun – LEGO Edition


ArangoDB is a native multi-model database that can be deployed as a single database, in active failover mode, or as a full-blown database cluster in the cloud. To try things out, I can even run a cluster on my local development machine.

Well, yes… but I wanted to be more real, 24×7, with the opportunity to manipulate all the things…

Inspired by other Raspberry Pi and mini-cluster projects, I thought I could build my own bare-metal, desk-compatible Mini PC database cluster.

Interested in trying out ArangoDB? Fire up your cluster in just a few clicks with ArangoDB ArangoGraph: the Cloud Service for ArangoDB. Start your free 14-day trial here

Read more

ArangoDB 3.4: Enhancements in RocksDB Storage Engine

With ArangoDB 3.4 we finally made the RocksDB storage engine the default. This decision was made after a year of constant improvements to the engine to make it suitable for all our customers’ use cases. Read more


Sharding: freedom, when you need it least?

“I must have a prodigious amount of mind;
it takes me as much as a week, sometimes, to make it up!”
― Mark Twain

How many shards should one choose, when creating collections in ArangoDB clusters?

TLDR: Don’t be too shy about sharding your data into many shards across your cluster. Be mindful, however, that AQL-heavy applications might not profit as much from heavy distribution. Read more


Run multiple versions of ArangoDB in parallel using the .tar.gz distribution

This post uses the new `.tar.gz` binary distribution of ArangoDB to run multiple versions of ArangoDB alongside each other on the same machines. We will do a production-ready deployment on 3 cloud instances with authentication, TLS encryption, (self-signed) certificates and a `systemd` service. At the end, we show how to perform a rolling upgrade of one of the clusters to a new version.

Read more


Using The Linux Kernel and Cgroups to Simulate Starvation

When using a database like ArangoDB, it is also important to explore how it behaves once it reaches system bottlenecks, or which KPIs (Key Performance Indicators) it can achieve in your benchmarks under certain limitations. One way to do this is to torture the system by saturating its resources with arbitrary load-generating processes.

This, however, effectively drowns your whole system: it may hinder you from capturing statistics, debugging, and doing all the other things you're used to on a normally running system. The more clever way is to tell your system to limit the available resources for processes belonging to a certain cgroup.

So we will put an ArangoDB server process (arangod) into a cgroup of its own, while the rest of your system stays outside it.

Linux Cgroups - What’s That?

Definition from Wikipedia:

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

Cgroups were introduced in 2006. Their first real-world use case was compiling a Linux kernel with many parallel compilation processes without sacrificing the snappiness of the user interface: you could continue browsing, emailing, etc. while your system compiled with all available resources.

Cgroups are available wherever you run a recent Linux kernel, including Docker Machine on Mac and Windows if you have root access to the host VM.

I/O Saturation

A basic resource you can run out of is disk I/O. The available bandwidth to your storage can be defined by several bottlenecks:

  • the bus your storage is connected to - SATA, FC-AL, or even a VM where the hypervisor controls your available bandwidth
  • the physical medium, be it spinning disk, SSD, or be it abstracted away from you by a VM

In a cooperative cloud environment you may find completely different behavior compared to bare metal infrastructure which is not virtualized or shared. The available bandwidth is shared between you and other users of this cloud. For example, AWS has a system of Burst Credits where you are allowed to have a certain amount of high speed I/O operations. However, once these credits dry up, your system comes to a grinding halt.

I/O Throttling via Cgroups

Since it may be hard to reach the physical limits of the system under test (SUT), and, as discussed above, other odd behavior may occur when pushing the machine to its limits, simply lowering the limit for the processes in question is the better approach.

To access these cgroups you most likely need root access to your system. Either log in as root for the following commands, or use sudo.

Linux cgroups can limit I/O bandwidth per physical device as a whole (not per partition), and then split that budget further among individual processes. So the easiest way forward is to add a second storage device to be used for the ArangoDB database files.

First you need to configure the bandwidth of the "physical" device; look up its major and minor device numbers by listing its device file:

ls -l /dev/sdc
brw-rw---- 1 root disk 8, 32 Apr 18 11:16 /dev/sdc

(We picked the third disk here; your names may be different. Check the output of mount to find out.)
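The cgroup files used further down expect the two numbers in `MAJ:MIN` form. As a small sketch, the pair can be pulled out of the `ls -l` line shown above; the function name `majmin_from_ls` is made up for this example and is not part of any tool.

```shell
# Turn the "8, 32" pair of an `ls -l` line for a block device
# into the MAJ:MIN form (hypothetical helper, reads the line from stdin).
majmin_from_ls() {
    # For device files, fields 5 and 6 hold "major," and "minor".
    awk '{ gsub(",", "", $5); print $5 ":" $6 }'
}

# Example with the listing from above:
echo 'brw-rw---- 1 root disk 8, 32 Apr 18 11:16 /dev/sdc' | majmin_from_ls

# On a live system, `lsblk -dno MAJ:MIN /dev/sdc` should report the same pair.
```

For the third SCSI/SATA disk this prints 8:32, the value used in the echo commands below.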

We now mount a partition from sdc so we can access it with arangod:

/dev/sdc1 on /limitedio type ext4 (rw,relatime)
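The line above is what `mount` reports once the partition is in place. A minimal sketch of how to get there could look like the following. It assumes that /dev/sdc1 already carries an ext4 file system and that arangod runs as user and group `arangodb`; both are assumptions about your setup, and `prepare_limitedio` is a hypothetical helper name.

```shell
# Hypothetical helper: mount the partition and prepare the database directory.
# Assumptions: /dev/sdc1 is already formatted with ext4, and arangod runs
# as user/group "arangodb". Adjust device, mount point and owner as needed.
prepare_limitedio() {
    mkdir -p /limitedio &&
    mount -t ext4 /dev/sdc1 /limitedio &&
    mkdir -p /limitedio/db &&
    chown arangodb:arangodb /limitedio/db
}

# Run as root, then check the result:
#   prepare_limitedio
#   mount | grep limitedio
```

The function is only defined here, not called, so nothing is mounted until you invoke it yourself.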

Now we alter the /etc/arangodb3/arangod.conf so it will create its database directory on this disk:

[database]
directory = /limitedio/db/

Here we pick the major number (8) and minor number (32) from the physical device file:


echo "8:32  1073741824" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
echo "8:32  1073741824" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device

This permits a full gigabyte per second for the complete device.

We can now sub-license part of the I/O quota for sdc to a cgroup we name limit1M, which will get 1 MB/s:


mkdir -p /sys/fs/cgroup/blkio/limit1M/
echo "8:32  1048576" > /sys/fs/cgroup/blkio/limit1M/blkio.throttle.write_bps_device 
echo "8:32  1048576" > /sys/fs/cgroup/blkio/limit1M/blkio.throttle.read_bps_device 
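The byte values in these commands are plain powers of 1024: 1073741824 is 1024 × 1024 × 1024 and 1048576 is 1024 × 1024. As a sketch, a small helper can build the lines for other limits; `throttle_line` is a hypothetical name chosen for this example.

```shell
# Print a "MAJ:MIN BYTES_PER_SECOND" line for a limit given in MiB/s
# (hypothetical helper; it only prints, it does not touch any cgroup file).
throttle_line() {
    # $1 = device as MAJ:MIN, $2 = limit in MiB per second
    echo "$1 $(( $2 * 1024 * 1024 ))"
}

throttle_line 8:32 1024   # the whole-device limit used above
throttle_line 8:32 1      # the limit of the limit1M cgroup
```

The output can be redirected into the same blkio.throttle.*_bps_device files to try other limits, for example 3 MiB/s.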

We want to jail one arangod process in the limit1M cgroup, so we inspect its welcome message for its PID:

2019-01-10T18:00:00Z [13716] INFO ArangoDB (version 3.4.2 [linux]) is ready for business. Have fun!
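The PID is the number in square brackets. If you would rather script this step, the following sketch extracts it from a log line of the form shown above; `arangod_pid_from_log` is a hypothetical helper name, and the pattern assumes the log format of this example.

```shell
# Extract the PID in square brackets from arangod's
# "ready for business" log line (hypothetical helper, reads stdin).
arangod_pid_from_log() {
    sed -n 's/^[^[]*\[\([0-9][0-9]*\)\].*is ready for business.*/\1/p'
}

# Example with the welcome message from above:
echo '2019-01-10T18:00:00Z [13716] INFO ArangoDB (version 3.4.2 [linux]) is ready for business. Have fun!' | arangod_pid_from_log

# With a single arangod on the machine, `pidof arangod` is a simpler alternative.
```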

We add this process with the PID 13716 to the cgroup limit1M by invoking:

echo 13716 > /sys/fs/cgroup/blkio/limit1M/tasks
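To double-check that the process ended up where it should, you can look at its entry in /proc. The sketch below parses the `hierarchy:controllers:path` lines of /proc/<pid>/cgroup as used by cgroup v1; `blkio_cgroup` is a hypothetical helper name, and the sample input is made up for illustration.

```shell
# Print the cgroup path of the blkio controller from
# /proc/<pid>/cgroup-style input (hypothetical helper, cgroup v1 layout assumed).
blkio_cgroup() {
    awk -F: '$2 ~ /(^|,)blkio(,|$)/ { print $3 }'
}

# Illustration with made-up sample lines:
printf '%s\n' '5:blkio:/limit1M' '4:cpu,cpuacct:/' | blkio_cgroup

# On the live system (as root):
#   blkio_cgroup < /proc/13716/cgroup
#   cat /sys/fs/cgroup/blkio/limit1M/cgroup.procs
```

Note that in cgroup v1 the tasks file takes individual thread IDs. For a multi-threaded server like arangod, writing the PID to cgroup.procs in the same directory moves all of its threads at once, which may be what you want here.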

Now this arangod process will be permitted to read and write at 1 MB/s on any partition of sdc. You may want to compare the throughput you get using, e.g., arangobench or arangoimport.

Real numbers

Depending on pricing and scaling, cloud providers give you varying throughput limits. It appears the worst case is Google at 3 MB/s (as of this posting).

So you can use your notebook with a high-end M.2 SSD to estimate whether certain cloud instances can handle the load of your application.


Happy Holidays from ArangoDB!

2018 has been a fantastic year for the ArangoDB project. The community has welcomed many new members, customers, supporters and friends. Together we’ve reached new “heights” – accomplished goals, shipped a big brand-new release and improved ArangoDB on all fronts. Read more


Deploying ArangoDB 3.4 on Kubernetes

It has been a few months since we first released the Kubernetes operator for ArangoDB and started to brag about it. Since then, quite a few things have happened.

For example, we have done a lot of testing, fixed bugs, and by now the operator is declared to be production ready for three popular public Kubernetes offerings, namely Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS) and Pivotal Kubernetes Service (PKS) (see here for the current state of affairs). Read more


ArangoDB 3.4 GA
Full-text Search, GeoJSON, Streaming & More

The ability to see your data from various perspectives is the idea of a multi-model database. Having the freedom to combine these perspectives into a single query is the idea behind native multi-model in ArangoDB. Extending this freedom is the main thought behind the release of ArangoDB 3.4.

We’re always excited to put a new version of ArangoDB out there, but this time it’s something special. This new release includes two huge features: a C++ based full-text search and ranking engine called ArangoSearch; and largely extended capabilities for geospatial queries by integrating the Google™ S2 Geometry Library and GeoJSON. Read more


RC1 ArangoDB 3.4 – What’s new?

For ArangoDB 3.4 we already added 100,000 lines of code, happily deleted 50,000 lines and changed over 13,000 files until today. We merged countless PRs, invested months of problem solving, hacking, testing, hacking and testing again and are super excited to share the feature complete RC1 of ArangoDB 3.4 with you today. Read more


Gartner Report: Top-Rated Operational Database Management Systems

Firstly, a huge thank you to all our customers that took the time to review ArangoDB for the Gartner Peer Insights “Voice of the Customer”: Operational Database Management Systems Market report. Without your help and assistance, the continued improvements and enhancements we make to our software wouldn’t be possible. Read more

