Using The Linux Kernel and Cgroups to Simulate Starvation
When using a database like ArangoDB it is also important to explore how it behaves once it reaches system bottlenecks, or which KPIs (Key Performance Indicators) it can achieve in your benchmarks under certain limitations. One way to achieve this is to saturate the system's resources with random background processes.
This, however, will effectively drown your system – it may keep you from capturing statistics, debugging, and doing all the other things you're used to on a normally running system. The cleverer way is to tell your system to limit the available resources for processes belonging to a certain cgroup.
So we will put an ArangoDB server process (arangod) into a cgroup that the rest of your system won't be in.
Cgroups – What’s That?
Definition from Wikipedia:
cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
Cgroups were introduced in 2006, and an early showcase was compiling a Linux kernel with many parallel compilation processes without sacrificing the snappiness of the user interface – you could continue browsing, emailing etc. while your system compiled with all available resources.
Cgroups are available wherever you run a recent Linux kernel, including Docker Machine on Mac and Windows if you have root access to the host VM.
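This walkthrough uses the cgroup v1 blkio controller. Newer distributions that default to the unified cgroup v2 hierarchy may not mount the /sys/fs/cgroup/blkio tree at all (v2 replaces it with the io controller), so it is worth checking your system first:

```shell
# List the controllers the kernel knows about; a 1 in the "enabled"
# column for blkio means the v1 controller is usable:
cat /proc/cgroups

# Check whether the blkio hierarchy is actually mounted:
ls /sys/fs/cgroup/blkio 2>/dev/null || echo "blkio hierarchy not mounted (cgroup v2?)"
```
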
I/O Saturation
A basic resource you can run out of is disk I/O. The available bandwidth to your storage can be defined by several bottlenecks:
- the bus your storage is connected to – SATA, FC-AL, or even a VM where the hypervisor controls your available bandwidth
- the physical medium, be it spinning disk, SSD, or be it abstracted away from you by a VM
In a shared cloud environment you may find completely different behavior compared to bare-metal infrastructure that is not virtualized or shared: the available bandwidth is split between you and the other users of that cloud. For example, AWS has a system of Burst Credits that allows you a certain amount of high-speed I/O operations. Once these credits dry up, however, your system comes to a grinding halt.
I/O Throttling via Cgroups
Since it may be hard to reach the physical limitations of the SUT, and – as we already discussed – other odd behavior may occur when pushing the machine to its limits, simply lowering the limit for the processes in question is a good approach.
To manipulate these cgroups you most likely need root access to your system. Either log in as root for the following commands, or use sudo.
Linux cgroups may limit I/O bandwidth per physical device in total (not partitions), and then split that further for individual processes. So the easiest way ahead is to add a second storage device to be used for the ArangoDB database files.
At first you need to configure the bandwidth of the “physical” device; search its major and minor node ID by listing its device file:
ls -l /dev/sdc
brw-rw---- 1 root disk 8, 32 Apr 18 11:16 /dev/sdc
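Instead of parsing the ls output, the major:minor pair can also be read with stat; its %t and %T specifiers print the numbers in hexadecimal. Demonstrated here on /dev/null (which is always 1:3) – run the same command on /dev/sdc, where it would print 8:20, i.e. major 8 and minor 0x20 = 32 decimal:

```shell
# Print the hexadecimal major:minor of a device node.
stat -c '%t:%T' /dev/null   # prints 1:3
```
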
(We picked the third disk here; your names may be different. Check the output of mount to find out.)
We now mount a partition from sdc (e.g. mount /dev/sdc1 /limitedio) so we can access it with arangod; mount then shows:
/dev/sdc1 on /limitedio type ext4 (rw,relatime)
Now we alter /etc/arangodb3/arangod.conf so arangod will create its database directory on this disk (restart the service afterwards so the change takes effect):
[database]
directory = /limitedio/db/
Here we use the major number (8) and minor number (32) from the physical device file:
echo "8:32 1073741824" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
echo "8:32 1073741824" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
This permits a full gigabyte per second for the complete device.
We can now carve an I/O quota for sdc out into a cgroup we name limit1M, which will get 1 MB/s:
mkdir -p /sys/fs/cgroup/blkio/limit1M/
echo "8:32 1048576" > /sys/fs/cgroup/blkio/limit1M/blkio.throttle.write_bps_device
echo "8:32 1048576" > /sys/fs/cgroup/blkio/limit1M/blkio.throttle.read_bps_device
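The throttle files take plain bytes per second, so the two magic numbers used here are simply powers of two; a quick sanity check of the arithmetic, assuming binary units:

```shell
# 1 GiB/s for the whole device, 1 MiB/s for the limit1M cgroup:
echo $((1024 * 1024 * 1024))   # prints 1073741824
echo $((1024 * 1024))          # prints 1048576
```
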
We want to jail one arangod process into the limit1M cgroup. To find its PID, we inspect its welcome message:
2019-01-10T18:00:00Z [13716] INFO ArangoDB (version 3.4.2 [linux]) is ready for business. Have fun!
We add this process with the PID 13716 to the cgroup limit1M by invoking:
echo 13716 > /sys/fs/cgroup/blkio/limit1M/tasks
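You can verify that the move worked: every process lists its controller membership under /proc/&lt;pid&gt;/cgroup. Shown here for the current shell – substitute the arangod PID (13716 in this example) and look for the limit1M path in the blkio line:

```shell
# Show the cgroup membership of the current process; for the jailed
# arangod, the blkio line should end in /limit1M:
cat /proc/self/cgroup
```
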
Now this arangod process will only be permitted to read and write at 1 MB/s on any partition of sdc. You may want to compare the throughput you get using e.g. arangobench or arangoimport.
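For a quick, dependency-free check before reaching for the ArangoDB tools, dd reports a throughput figure directly; point the output file at the throttled mount (/limitedio in this setup, /tmp below is only a placeholder) and the 1 MB/s cap should be clearly visible:

```shell
# Write 8 MiB and let dd report the achieved throughput; conv=fsync
# ensures the data actually reaches the device before timing stops.
dd if=/dev/zero of=/tmp/throttle-test bs=1M count=8 conv=fsync
rm -f /tmp/throttle-test
```
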
Real Numbers
Depending on pricing and scaling, cloud providers give you varying throughput limits. As of this writing, the worst case appears to be Google at 3 MB/s:
- https://cloud.google.com/compute/docs/disks/performance
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
So you can use your notebook with a high-end M.2 SSD and still get an estimate of whether certain cloud instances can handle the load of your application.