Using The Linux Kernel and Cgroups to Simulate Starvation
When using a database like ArangoDB it is also important to explore how it behaves once it reaches system bottlenecks, or which KPIs (Key Performance Indicators) it can achieve in your benchmarks under certain limitations. One can achieve this by torturing the system by effectively saturating the resources using random processes.
This, however, effectively drowns your system: it can keep you from capturing statistics, debugging, and doing all the other things you are used to on a normally running system. A cleverer approach is to tell the kernel to limit the resources available to processes that belong to a certain cgroup.
So we will put an ArangoDB server process (arangod) into a cgroup, while the rest of your system stays outside it.
Cgroups – What’s That?
Definition from Wikipedia:
cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
Cgroups were introduced in 2006, and their first real usage example was compiling a Linux kernel with many parallel compilation processes without sacrificing the snappiness of the user interface: you could continue browsing, emailing, etc. while your system compiled with all available resources.
Cgroups are available wherever you run a recent Linux kernel, including Docker Machine on Mac and Windows if you have root access to the host VM.
A basic resource you can run out of is disk I/O. The available bandwidth to your storage can be defined by several bottlenecks:
- the bus your storage is connected to – SATA, FC-AL, or even a VM where the hypervisor controls your available bandwidth
- the physical medium, be it spinning disk, SSD, or be it abstracted away from you by a VM
In a cooperative cloud environment you may find completely different behavior compared to bare metal infrastructure which is not virtualized or shared. The available bandwidth is shared between you and other users of this cloud. For example, AWS has a system of Burst Credits where you are allowed to have a certain amount of high speed I/O operations. However, once these credits dry up, your system comes to a grinding halt.
I/O Throttling via Cgroups
Since it may be hard to reach the physical limits of the system under test (SUT), and, as already discussed, other odd behavior may occur when loading the machine to its limits, simply lowering the limit for the processes in question is the better option.
To access these cgroups you most likely need root access to your system. Either log in as root for the following commands, or use
Linux cgroups may limit I/O bandwidth per physical device in total (not partitions), and then split that further for individual processes. So the easiest way ahead is to add a second storage device to be used for the ArangoDB database files.
First you need to configure the bandwidth of the “physical” device. Look up its major and minor device numbers by listing its device file:
ls -l /dev/sdc
brw-rw---- 1 root disk 8, 32 Apr 18 11:16 /dev/sdc
(We picked the third disk here; your names may be different. Check the output of mount to find out.)
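As a small sketch of the lookup above, the listing can be turned into the MAJOR:MINOR pair mechanically. The function name majmin_from_ls is a hypothetical helper introduced here, not part of any tool, and it assumes the `ls -l` column layout shown in the example.

```shell
# Hypothetical helper: turn one line of `ls -l` output for a block device
# into the MAJOR:MINOR form used by the blkio throttle files.
majmin_from_ls() {
    # Field 5 is the major number with a trailing comma, field 6 the minor.
    printf '%s\n' "$1" | awk '{ gsub(",", "", $5); print $5 ":" $6 }'
}

majmin_from_ls 'brw-rw---- 1 root disk 8, 32 Apr 18 11:16 /dev/sdc'   # prints: 8:32

# On a real system, lsblk can report the same pair directly:
#   lsblk -dno MAJ:MIN /dev/sdc
```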
We now mount a partition from sdc so we can access it with
/dev/sdc1 on /limitedio type ext4 (rw,relatime)
Now we alter /etc/arangodb3/arangod.conf so it will create its database directory on this disk:
[database]
directory = /limitedio/db/
Here we pick the major number (8) and minor number (32) from the physical device file:
echo "8:32 1073741824" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
echo "8:32 1073741824" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
This permits a full gigabyte per second for the complete device.
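The byte values written to the throttle files are plain bytes per second, so 1073741824 is 1024 × 1024 × 1024. A minimal sketch of that arithmetic follows; throttle_line is a hypothetical helper name, and the rate is assumed to be given in MiB/s.

```shell
# Hypothetical helper: build a "MAJOR:MINOR BYTES_PER_SECOND" line from a
# device ID and a rate given in MiB/s (1 MiB = 1024 * 1024 bytes).
throttle_line() {
    printf '%s %s\n' "$1" "$(( $2 * 1024 * 1024 ))"
}

throttle_line 8:32 1024   # prints: 8:32 1073741824
throttle_line 8:32 1      # prints: 8:32 1048576
```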
We can now sub-license I/O quota for sdc into a cgroup we name limit1M, which will get 1 MB/s:
mkdir -p /sys/fs/cgroup/blkio/limit1M/
echo "8:32 1048576" > /sys/fs/cgroup/blkio/limit1M/blkio.throttle.write_bps_device
echo "8:32 1048576" > /sys/fs/cgroup/blkio/limit1M/blkio.throttle.read_bps_device
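The three commands above could be wrapped in one function, sketched here under a few assumptions: the cgroup v1 blkio hierarchy is mounted at /sys/fs/cgroup/blkio as in the text, and the names CGROUP_ROOT and limit_group are introduced purely for illustration. Making the root directory a variable lets you dry-run the function against a scratch directory before touching the real hierarchy as root.

```shell
# Assumption: CGROUP_ROOT is a variable introduced for this sketch only; it
# defaults to the blkio hierarchy path used in the text.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup/blkio}"

# Usage: limit_group NAME MAJOR:MINOR BYTES_PER_SECOND
limit_group() {
    group_dir="$CGROUP_ROOT/$1"
    mkdir -p "$group_dir" || return 1
    echo "$2 $3" > "$group_dir/blkio.throttle.write_bps_device" || return 1
    echo "$2 $3" > "$group_dir/blkio.throttle.read_bps_device" || return 1
    # Read both values back to confirm what was written.
    cat "$group_dir/blkio.throttle.write_bps_device" \
        "$group_dir/blkio.throttle.read_bps_device"
}

# As root, on the system described in the text:
#   limit_group limit1M 8:32 1048576
```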
We want to jail one arangod process in the limit1M cgroup, so we inspect its welcome message for its PID:
2019-01-10T18:00:00Z [13716] INFO ArangoDB (version 3.4.2 [linux]) is ready for business. Have fun!
We add this process with the PID 13716 to the cgroup limit1M by invoking:
echo 13716 > /sys/fs/cgroup/blkio/limit1M/tasks
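To double-check that the process really landed in the group, you can look at /proc/<pid>/cgroup, where each cgroup v1 line has the form hierarchy-ID:controller-list:path. The following is a sketch of such a check; in_blkio_group is a hypothetical helper name, and it reads the file content from standard input so it can be tried on sample text first.

```shell
# Hypothetical helper: succeed if the /proc/<pid>/cgroup text on stdin lists
# the given group for the blkio controller (cgroup v1 line format:
# hierarchy-ID:controller-list:path).
in_blkio_group() {
    awk -F: -v want="/$1" '
        $2 ~ /(^|,)blkio(,|$)/ && $3 == want { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# With the PID from the text:
#   in_blkio_group limit1M < /proc/13716/cgroup && echo "13716 is throttled"
# Instead of reading the log, `pgrep -x arangod` lists the PIDs of running
# arangod processes.
```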
This arangod process will now be permitted to read and write at 1 MB/s to any partition on sdc. You may want to compare the throughput you get using, e.g.
Depending on pricing and scaling, cloud providers give you varying throughput limits. It appears the worst case is Google at 3 MB/s (as of this posting).
So you can use your notebook with a high-end M.2 SSD and estimate whether certain cloud instances can handle the load of your application.