Static binaries for a C++ application
ArangoDB is a multi-model database written in C++. It is a sizable application with an executable size of 38MB (stripped) and quite a few library dependencies. We provide binary packages for Linux, Windows and macOS, and for Linux we cover all major distributions and their different versions, which makes our build and delivery pipeline extremely cluttered and awkward. At the beginning of this story, we needed approximately 12 hours just to build and publish a release, provided everything went well. This article is the beginning of a tale about attacking this problem.
Motivation and rationale
Motivated by what we see in the world of Go, where one can easily produce completely static binaries with no external dependencies that run on any Linux variant out there, we asked ourselves whether a similar feat could be pulled off for a C++ program as well. The benefits of such a universal Linux executable are manifold:
- the same executable runs on any variant of Linux
- there are no external dependencies on libraries
- we only build the executable once and can wrap the same binary into multiple binary packages, which greatly speeds up the build and publication process
- the build environment can be confined to a single Docker image
- our customer support can reproduce issues much more easily because there are fewer variations in deployments
- core files can be analyzed across Linux variants
- we can provide smaller and more secure Docker images in the end
- the whole process is more robust and breaks less often, because we no longer depend on multiple distributions refraining from breaking changes
There are a few disadvantages, which we should not fail to disclose:
- security upgrades in any library we use are not applied automatically; instead, we have to build and publish a new release ourselves
- we might end up testing with fewer different compilers and thus see fewer of their warnings
- binaries are slightly larger
- processes running our executable cannot share physical RAM for library code with other processes
If you do not care about a discussion of these arguments, you can just jump to the next section to read about the technical details.
I do not want to spend much time discussing the benefits, because I find them quite compelling without further comment. So let me just give the list of Linux distribution versions we currently support: Docker image, CentOS 6 (= RedHat 6), CentOS 7 (= RedHat 7), Fedora 25, Debian 8, Debian 9, Ubuntu 17.04 (= Ubuntu 17.10), Ubuntu 16.04, Ubuntu 14.04, Ubuntu 12.04, OpenSuse 13.2. For each of them we build two binary packages (Community and Enterprise Edition) – that is 22 packages per release! And imagine the benefits for our support team, who will know from “Version 3.3.3” *exactly* which binary is running, regardless of the Linux distribution. Note that we only supply packages for the `x86_64` (or `amd64`) architecture.
Anyway, a few words about the above-mentioned disadvantages are in order, and why I think they are insignificant for us in comparison to the benefits. Argument 1: rollout of security updates in any library. Yes, if there is an important security update of a library we are using (for example `libssl`), then we have to act, build a new version and release it. However, we release patch-level upgrades much more frequently (approximately twice per month) than the libraries we use. Furthermore, if someone upgrades a library, it is not guaranteed that they restart all processes using it! In particular, a database server might remain running for a long time. So it is actually beneficial when we release an update and the automatic upgrade procedure catches it and restarts the database server with the new version. And releasing a new update has become much less painful and faster thanks to the static binaries. This covers argument 1.
Argument 2: compiler testing. If all developers always compiled their test binaries statically, all with the same Docker image and C++ compiler, then argument 2 would actually carry weight. It is beneficial to compile and test on a wide range of different compilers and versions to spot problems early. But the developers can continue to do exactly that. I think it is a good idea to use a consistent build environment for all CI tests and the released packages, and to leave it to the developers to try different compiler versions.
Argument 3: binaries are slightly larger. This actually turns out to be nearly irrelevant. Our main database server executable is some 38MB when linked against shared libraries. The static one is larger by less than one MB, which is negligible. For the smaller executables the difference is more prominent: some smaller tools use 4MB with shared libraries and 6MB when statically linked. I think what helps here is that `libmusl` is generally smaller and that the rest of our code (due to RocksDB and V8) is much larger than the external libraries. You might argue now that we should link against shared libraries for RocksDB and V8, but this is nearly impossible due to the frequent changes in the API, at least for V8. It is much more robust to control the exact version we bundle. And anyway, who cares about a few MB in executable size these days, when we are messing around with multi-hundred-MB Docker images?
This brings me to Argument 4: processes running our executable cannot share physical RAM for library code with other processes, which is not very strong here, either. Since the shared libraries constitute only a small part of our executable size anyway, the RAM benefits from sharing pages with other processes are pretty slim. In particular for a database server process, which regularly uses hundreds of megabytes if not gigabytes of RAM, those few MBs of shared libraries do not actually play any significant role.
Obviously, depending on the type of program you want to deploy, your mileage may vary, but for us as a database manufacturer the case is pretty clear. A final argument might be that we will never make it into one of the prominent Linux distributions like Debian with this policy. However, as a small and agile team we release updates so often that any version in a stable distribution is outdated very quickly anyway.
`glibc` cannot be used, `libmusl` and Alpine Linux to the rescue
Interestingly, although one can build a completely static binary when linking against `glibc`, doing so is pretty pointless. The reason is that `glibc` loads certain modules dynamically at runtime in any case. It does this to support pluggable authentication modules (PAM) and the system-wide name service switch (`nsswitch.conf`) used for host lookups, which comes into play when calling functions like `gethostbyname`. Therefore, even if your executable is completely static, you still need the correct version of `glibc` installed on your system to run it successfully.
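To illustrate the point, here is a minimal sketch (not taken from the ArangoDB sources): even an innocent host lookup like the following makes a statically linked `glibc` executable load its NSS plugins (`libnss_*.so`) dynamically at runtime, which is exactly why the GNU linker warns about calls like `getaddrinfo` in static links.

    // Minimal sketch: a host lookup that, with glibc, triggers dynamic loading
    // of NSS plugins at runtime even in a fully static executable; with libmusl
    // the resolver is built in and nothing is loaded dynamically.
    #include <sys/socket.h>
    #include <netdb.h>
    #include <cstdio>

    int main() {
      struct addrinfo hints{}, *result = nullptr;
      hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
      hints.ai_socktype = SOCK_STREAM;
      if (getaddrinfo("example.org", "443", &hints, &result) == 0) {
        std::puts("lookup succeeded");
        freeaddrinfo(result);
      }
      return 0;
    }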
All of this makes it necessary to use an alternative C library like `libmusl`. However, this is not as easy as it seems. You then need not only all the C++ libraries (the STL and friends) built against that other C library, but also a complete toolchain with compiler, linker and binutils. Fortunately, there is a Linux distribution that does all the heavy lifting for us: Alpine Linux. So the basic idea is simple: build your executable on Alpine and add `-static` to the final link step for the executables. What could possibly go wrong?
In fact, quite a lot…
Challenges when building on Alpine Linux with `libmusl`
One quickly creates a Docker image based on Alpine Linux, adds the first few obviously needed packages and hopes for the best. My initial package list was:
apk update && apk add g++ bison flex make cmake ccache python \
libldap openssl-dev git linux-vanilla-dev linux-headers vim \
boost-dev ctags man
The following subsections describe the challenges and how I overcame them.
It does not compile
Compiling with a different C and C++ library should be seamless, but it is not. The prevalence of `glibc` on Linux has led to a situation in which one sometimes uses `glibc`-specific extensions of the various standards without even noticing. Compiling with an alternative library brings these cases to light. In this section I will describe the concrete issues we found when compiling ArangoDB with `libmusl` on Alpine Linux.
The first issue was a case in which something that is a global variable in `glibc` turns out to be a macro in `libmusl`. Namely, we had code like this:
#define SYSLOG_NAMES
#include <syslog.h>
and then we were using `facilitynames` for a list of strings naming the syslog facilities. What is worse, we were using it in a way that broke when `facilitynames` turned out to be a macro in `libmusl`. This was easy to fix, but it already shows that trouble is ahead if one uses undocumented features and, on top of that, makes assumptions about whether something is a macro or not.
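Here is a hedged sketch of the portable pattern (the function name is mine, not from the ArangoDB sources): walk the list via its NULL-name sentinel instead of assuming that `facilitynames` is an array object: under `glibc` it is a global array, under `libmusl` a macro expanding to a compound literal.

    // Sketch: iterate facilitynames without caring whether it is an array
    // (glibc) or a macro expanding to a pointer to a compound literal (libmusl).
    #define SYSLOG_NAMES  // makes <syslog.h> expose facilitynames/prioritynames
    #include <syslog.h>
    #include <cstdio>

    void listSyslogFacilities() {
      for (const CODE* c = facilitynames; c->c_name != nullptr; ++c) {
        std::printf("%-10s -> %d\n", c->c_name, c->c_val);
      }
    }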
The second issue was that we had used the pthread mutex attribute `PTHREAD_MUTEX_ERRORCHECK_NP`, which is – as its name suffix “`_NP`” suggests – not portable. And indeed it broke compilation under `libmusl`, where it simply does not exist. This could also be fixed easily by removing the call into the pthreads library; it was a rarely used debugging feature anyway, not worth the loss in portability. Second lesson: do not use non-portable code if at all possible.
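For reference, here is a sketch of the standardized spelling that compiles with `glibc` and `libmusl` alike (we simply dropped the feature instead; the helper function below is made up):

    // Sketch: an error-checking mutex via the portable POSIX mutex type,
    // instead of the glibc-specific PTHREAD_MUTEX_ERRORCHECK_NP.
    #include <pthread.h>

    void initErrorCheckingMutex(pthread_mutex_t* mutex) {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);  // POSIX
      pthread_mutex_init(mutex, &attr);
      pthread_mutexattr_destroy(&attr);
    }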
The last compilation problem I found was the use of `mallinfo`, which is a `glibc` extension to obtain memory allocation information. As useful as `mallinfo` can be, it probably does not help if one uses an alternative allocation library like `tcmalloc` or `jemalloc`. Furthermore, it does not exist in `libmusl` and thus cannot be used in our static executable. The obvious idea here is to make its use dependent on the C library being used. Since we already cannot use it under Windows and macOS, this is not a big deal. I fixed this by putting an `#ifdef` around its use that checks whether we are on Linux and using `glibc` (that is, whether the macro `__GLIBC__` is defined).
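A sketch of how such a guard can look (the helper function and the choice of the `uordblks` field are mine, not the exact ArangoDB code):

    // Sketch: only call mallinfo() when building against glibc on Linux; other
    // C libraries (libmusl, Windows, macOS) simply report nothing.
    #include <cstddef>
    #if defined(__linux__) && defined(__GLIBC__)
    #include <malloc.h>
    #endif

    size_t heapBytesInUse() {
    #if defined(__linux__) && defined(__GLIBC__)
      struct mallinfo info = mallinfo();
      return static_cast<size_t>(info.uordblks);  // bytes currently allocated
    #else
      return 0;  // no equivalent information available
    #endif
    }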
Interestingly, there is intentionally no macro to test for `libmusl`! The idea is that `libmusl` does not contain any specific extensions but only standardized calls, and therefore it should never be necessary to write `libmusl`-specific code. I have to admit that I indeed never needed such a macro.
This fixed all the compiler errors. There are still a few compiler warnings coming from the version of `libboost` we compile with, but these can safely be ignored.
It does not link
After these compiler errors were fixed, there were problems in the link stage.
A rather trivial one was that `-lssl` had to be specified a second time in the final link stage. This happens with static libraries, since the order in which such library arguments are given to the linker matters. So this was easily fixed. It might not even have anything to do with Alpine; I did not investigate the details.
Slightly more difficult was that the version of `libldap` (`openldap`) bundled with Alpine 3.7 is linked against `libressl` (instead of `libssl` from `openssl`), but we needed `libssl` itself for other reasons. So I had to remove the `libressl` and `libldap` packages from the system and compile `openldap` myself, linked against `libssl`. I do this when preparing the build Docker image, so it does not increase the build time at all in the end. The lesson learned here is that different distributions sometimes use different libraries for the same purpose, and thus linking can be a challenge. Well, in general, linking seems to be a dark art.
The final problem at the link stage was our use of libraries to produce backtraces. I noticed that we used the `glibc` built-in backtrace functions without proper `cmake` detection of the required libraries. Since `libmusl` needs a separate library (`libexecinfo`) for backtraces, the solution was simply to do what we should have done in the first place: use `cmake` to detect a backtrace library and add the necessary libraries in the link stage. This had the additional benefit that I could remove a hack we had for Windows. Again, the lesson is to use the proper tools to detect the necessary libraries.
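The interface itself is identical on both sides; here is a minimal sketch of the calls in question (the helper function is mine): `<execinfo.h>` comes straight from `glibc`, whereas on Alpine the same functions live in the separate `libexecinfo` package and require `-lexecinfo` at link time, which is exactly what the `cmake` detection has to figure out.

    // Sketch: classic execinfo-style backtrace, provided by glibc directly or
    // by libexecinfo (link with -lexecinfo) when building against libmusl.
    #include <execinfo.h>
    #include <cstdio>
    #include <cstdlib>

    void logBacktrace() {
      void* frames[64];
      int depth = backtrace(frames, 64);
      char** symbols = backtrace_symbols(frames, depth);
      if (symbols != nullptr) {
        for (int i = 0; i < depth; ++i) {
          std::fprintf(stderr, "#%d %s\n", i, symbols[i]);
        }
        std::free(symbols);  // backtrace_symbols() allocates with malloc()
      }
    }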
It does not run
Finally, I had a completely static binary and thought that all was good and I was done. I could not have been more wrong! I fired up the executable – and it immediately crashed. What?
Using the debugger, I found out that it never reached my `main` function. It actually already crashed during the relocation phase!
The investigation that followed took me nearly two days, but finally I got to the bottom of this first problem: we have some hand-crafted assembler code for the `x86_64` architecture to compute CRC32 checksums quickly using Intel’s SSE4.2 extensions. When it first runs, it uses the CPUID instruction to check at runtime whether the processor actually supports these extensions, to cover the case of very old processors that do not. The assembler code contained a few “absolute” addresses, for jump tables and the like. The assembler translates this into an object file containing addresses relative to its beginning, together with relocation information. At runtime – even in a completely static executable – this relocation information is used to adjust the relative addresses to the actual absolute addresses, which depend on the position at which the executable is loaded.
This worked beautifully under Ubuntu, but failed miserably under Alpine, with a crash during the runtime relocation. It turned out that this had nothing to do with static linking itself. The only difference was that the `gcc` compiler in Alpine by default creates executables with `-pie` (position independent executable). This is good for security, because it allows address space randomization, but bad in our case, since something in the generation of the relocation table for assembler code that uses absolute addresses does not work. I found two workarounds: one is to simply switch off `-pie` by giving `-nopie` to the final link stage of the executables; the other is to change the assembler code to use only relative addressing modes. I went for the latter, because it was easy and it keeps `-pie` and address space randomization working.
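For reference, here is a hedged sketch of how such a runtime dispatch can be written in plain C++ with `gcc` builtins (our real code uses the hand-written assembler mentioned above; the function names and the generic fallback are made up for illustration):

    #include <cstddef>
    #include <cstdint>

    // Assumed to be provided by the hand-written SSE4.2 assembler code:
    extern "C" uint32_t crc32cSse42(const void* data, size_t len, uint32_t crc);

    // Portable bit-by-bit fallback for the CRC-32C polynomial (reflected form):
    static uint32_t crc32cGeneric(const void* data, size_t len, uint32_t crc) {
      const uint8_t* p = static_cast<const uint8_t*>(data);
      for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
          crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
      }
      return crc;
    }

    uint32_t crc32c(const void* data, size_t len, uint32_t crc) {
      // __builtin_cpu_supports consults the CPUID feature bits gathered by the
      // gcc runtime at startup, so very old processors use the generic code.
      static const bool haveSse42 = __builtin_cpu_supports("sse4.2") != 0;
      return haveSse42 ? crc32cSse42(data, len, crc)
                       : crc32cGeneric(data, len, crc);
    }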
The second problem was that, for whatever reason, compiling against the `jemalloc` memory allocator did not work. I circumvented this by using the standard `malloc`/`free` implementation of `libmusl`. I might get back to this problem and see whether it can be fixed in a more satisfying way. But since we had several issues with `jemalloc` anyway, this might not even be a bad thing.
During my various experiments with static executables I also tried the `clang` compiler bundled with Alpine 3.7 (`clang` 5.0.0 at the time) but I never got it to produce working fully static executables, so I gave up on this.
The final problem at runtime was also very puzzling for a day or so. My executables finally ran, but whenever I opened the web-based UI of ArangoDB, the database server crashed with a segfault, somewhere deep in V8-executed JavaScript code, which is notoriously difficult to debug. I should have noticed the cause a lot quicker, but this is how open-ended investigations go: one investigates for a long time, only to find in the end that one could have figured out what was going on much faster.
It turned out that the default stack sizes in `libmusl` are way too small for our purposes. `glibc` by default allocates 8MB of stack for each thread. In `libmusl`, this has been lowered to 80kB per thread. This works for many programs, but ArangoDB embeds Google’s V8 JavaScript engine, and when we call JavaScript code from within C++ we quite often end up with pretty deep stacks. The workaround was to add some code like this to our main thread initialization routine, basically as the first thing in `main`:
#ifndef _WIN32
#ifndef __APPLE__
#ifndef __GLIBC__
// Increase default stack size for libmusl:
pthread_attr_t a;
memset(&a, 0, sizeof(pthread_attr_t));
pthread_attr_setstacksize(&a, 8*1024*1024); // 8MB as in glibc
pthread_attr_setguardsize(&a, 4096); // one page
pthread_setattr_default_np(&a);
#endif
#endif
#endif
This sets the default stack size for all further threads to 8MB. We do not execute JavaScript in the main thread, therefore it is OK to leave its stack size small. This is of course another use of non-portable code (`pthread_setattr_default_np`), this time deployed to make ArangoDB as an application more portable, what an irony!
After this adjustment everything just worked fine and I had completely static executables.
Finally, I chose a sensible explicit setting for processor-architecture-specific optimizations. After all, the executable is supposed to run on as many systems as possible, so I do not want to use the latest and greatest Intel extensions. However, I do want SSE4.2 support, so I went for compiling with `-march=nehalem`, which seems to be a good compromise. Yes, there are older 64-bit processors out there which do not support this, but we still have the runtime detection for our special assembler code, and it is unlikely that the compiler will ever use SSE4.2-specific optimizations elsewhere. If it does, I can always compile a separate, completely generic binary for these processors.
Implementation with a Docker image
To make building convenient, I created a Docker image based on Alpine Linux. I ended up with this package list, which adds some convenience tools:
apk update && apk add groff g++ bison flex make cmake ccache python \
libldap openssl-dev git linux-vanilla-dev linux-headers vim \
boost-dev ctags man gdb fish openssh db-dev libexecinfo-dev libexecinfo
I then compiled the `openldap` library `libldap` myself using the installed `openssl` library and installed it in the Docker image.
Furthermore, I added some convenience build scripts to run `cmake`, `make` and `make install` and my “static binary factory” is up and running.
I mount a work directory into the Docker container which contains the source code as well as another directory to keep the `ccache` compiler cache data, such that subsequent builds can benefit from earlier ones.
That is, I can now build with the following Docker command:
docker run -it -v $(pwd):/work \
-e UID=$(id -u) \
-e GID=$(id -g) \
neunhoef/alpinebuildarangodb /scripts/buildAlpine.fish
This assumes you are running `bash` and contains one more trick: I pass my current user ID and group ID into the container via environment variables. The reason for this is that inside the container the compilation is executed as user `root`. Therefore, all the generated or touched files would be owned by `root` in the end, which is rather inconvenient. So my compilation scripts do a `chown -R $UID:$GID` on the work directory at the end, such that after the termination of the Docker container all files belong to me.
Note that I learned to use and love the fish shell recently, so all my build scripts are written in `fish`. Similar to the rest of the content of this article, this is probably rather controversial. My sole argument is: I can finally remember how to write shell programs with `fish`, contrary to my long experience with writing `bash` and related scripts.
Finally here is the part of the compile script which sets up the `ccache` in the Docker container:
cd /work
mkdir -p .ccache.alpine
set -x CCACHE_DIR /work/.ccache.alpine
ccache -M 30G
The `set -x` sets an environment variable. Then all that is needed is to tell `cmake` to use `/usr/lib/ccache/bin/g++` as the C++ compiler.
All the code (and a lot more for our test runs and so on) is public in this repository. The stuff to build the Alpine Docker image is in the `buildAlpine.docker` directory and the actual build script is in `scripts/buildAlpine.fish`.
An unforeseen benefit: Windows Subsystem for Linux
After all this I got bold: I tried to run the static Linux executable on Windows with the Windows Subsystem for Linux. To my great surprise, this actually worked. There will be more information on this in a subsequent article.
Outlook
The next step is now to simplify our build and release process by attacking the binary packages. This is already ongoing and I will report on it in a later article. The plan is to create “universal” deb packages, which can be installed on *any* Debian-based variant of Linux. Furthermore, I want to create a universal RPM package and a binary tar ball which can be run from wherever it is unpacked, as well as a relatively small Docker image based on Alpine Linux.
There are challenges ahead, for example the various init systems like System V init and systemd, but so far things look good. As a teaser, I actually managed to install my universal Debian package on the Ubuntu variant of the Windows Subsystem for Linux!
Stay tuned, this will be a lot of fun and probably provoke a lively discussion.
2 Comments
Hello,
I am surprised you did not get issues with the libstdc++ that uses glibc extensions, including the shared libgcc for the stack unwinding. The g++ you install and use in your Alpine image uses it. Do they end up in your static binary? I am looking for the same portability, but for a shared library, and got stuck at that point. I am heading towards using the libc++ from LLVM to try to get rid of these.
Hi,
As far as I understand, the Alpine Linux people have created packages for gcc and libstdc++ which work with libmusl and thus without using glibc extensions. We are also not using glibc extensions. Therefore, we seem to be fine. The static linking installs everything we need into the static executable.
When this article was written, I did some experiments with clang++ on Alpine to produce static executables, but this did not work at the time. Therefore I went the gcc route, and this has worked beautifully so far.
Does this answer your question?
Cheers,
Max.