Bulk Insert Benchmark Tool | ArangoDB 2012

To make it easy to run bulk insert benchmarks against different NoSQL databases, we put together a small benchmark tool in PHP. The tool measures the time it takes to bulk upload data into MongoDB, CouchDB, and ArangoDB using each database's bulk document API.

The tool can also measure datafile sizes after the bulk load. It uploads documents to the databases in chunks, without concurrency (remember, this is PHP). It reports the total time needed, plus the time spent in database operations only. Some of the total time may be spent on data generation and similar work, so the database time is reported separately.
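The measurement approach described above can be sketched in a few lines. The following is a minimal Python illustration, not the tool's PHP code: the names `chunked`, `run_benchmark`, and `bulk_insert` are hypothetical, and `bulk_insert` stands in for whatever call sends one chunk to a database's bulk API.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(docs: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most chunk_size documents."""
    chunk: List[dict] = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # last, possibly smaller, chunk
        yield chunk


def run_benchmark(
    docs: Iterable[dict],
    bulk_insert: Callable[[List[dict]], None],  # hypothetical stand-in for a bulk API call
    chunk_size: int = 1000,
) -> Dict[str, float]:
    """Upload docs one chunk at a time (no concurrency) and time the run."""
    db_time = 0.0
    count = 0
    total_start = time.perf_counter()
    # Pulling from `docs` may generate data lazily; that cost lands in
    # total_time but not in db_time.
    for chunk in chunked(docs, chunk_size):
        start = time.perf_counter()
        bulk_insert(chunk)
        db_time += time.perf_counter() - start
        count += len(chunk)
    total_time = time.perf_counter() - total_start
    return {
        "documents": count,
        "total_time": total_time,
        "db_time": db_time,
        "other_time": total_time - db_time,
    }


if __name__ == "__main__":
    stored: List[dict] = []  # a plain list plays the role of the database
    generated = ({"id": i, "value": i * 2} for i in range(2500))
    print(run_benchmark(generated, stored.extend, chunk_size=1000))
```

Timing only the `bulk_insert` call keeps the cost of generating documents out of the database figure, which is the distinction between the two reported times.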

The package can be downloaded from https://github.com/jsteemann/BulkInsertBenchmark. It contains the necessary database adapters for MongoDB, CouchDB, and ArangoDB. Adapters for other databases can be added easily by writing a new specialised class and adding it to the package. The package also contains some data generators that read data from a JSON file or randomly generate data. Again, adding new generators should be easy if the ones supplied don't fit your needs.
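To give an idea of what "one specialised class per database" and a pluggable data generator can look like, here is a rough Python sketch. It does not reproduce the package's actual PHP classes: `Adapter`, `InMemoryAdapter`, and `random_documents` are made-up names, and the in-memory adapter only stands in for a real database driver.

```python
import random
import string
from abc import ABC, abstractmethod
from typing import Iterator, List


class Adapter(ABC):
    """Hypothetical adapter interface: one subclass per database."""

    @abstractmethod
    def bulk_insert(self, docs: List[dict]) -> None:
        """Send one chunk of documents through the database's bulk API."""


class InMemoryAdapter(Adapter):
    """Stand-in 'database' that simply keeps the documents in a list."""

    def __init__(self) -> None:
        self.docs: List[dict] = []

    def bulk_insert(self, docs: List[dict]) -> None:
        self.docs.extend(docs)


def random_documents(count: int, seed: int = 42) -> Iterator[dict]:
    """Hypothetical generator: yields `count` documents with random names.

    A fixed seed makes every run produce the same data, so results for
    different databases stay comparable.
    """
    rng = random.Random(seed)
    for i in range(count):
        name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        yield {"id": i, "name": name}


if __name__ == "__main__":
    adapter = InMemoryAdapter()
    adapter.bulk_insert(list(random_documents(5)))
    print(len(adapter.docs))
```

With a shared interface like this, the benchmark loop never needs to know which database it is talking to; supporting another one means adding a subclass.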

You’ll need PHP version 5.3 or higher to run it, plus the native PHP driver for MongoDB if you want to include MongoDB in the results. The PHP driver for MongoDB can be installed with the following command:

sudo pecl install mongo

The package needs a configuration file to run. That file tells it which databases to test and their configuration parameters. Furthermore, the configuration file contains the test cases, i.e. what data to test with. An example configuration is bundled and can be found in the file config-example.php.

As a first step, you can copy the file config-example.php to config.php and adjust it. When you’re done, run the benchmark by typing

php run.php

on the command line. By default, the benchmark tool will print its output on screen and also write it to a CSV file named “results.csv”, but this can easily be adjusted by editing config.php.

Jan Steemann

After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, being there from version 0.1. He is mostly working on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).
