Bulk Inserts Comparison: MongoDB, CouchDB, ArangoDB ’12

September 4, 2012 / General, Performance

In the last couple of posts, we looked at ArangoDB’s performance for individual document insert, delete, and update operations. This time we’ll be looking at batched inserts. For reference, we’ll compare ArangoDB’s results to what can be achieved with CouchDB and MongoDB.

Test setup

We used the bulk insert benchmark tool to generate results for MongoDB, CouchDB, and ArangoDB. The benchmark tool uses the HTTP bulk document APIs for CouchDB and ArangoDB, and the binary protocol for MongoDB (as MongoDB does not have an HTTP bulk API). The benchmark tool was run on the same machine as the database servers, so network latency can be ruled out as a factor. The test machine specifications are:
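As a rough illustration of what a bulk request looks like for the two HTTP-based databases, the sketch below builds (but does not send) one batch for CouchDB’s `_bulk_docs` endpoint and one for ArangoDB’s document import endpoint. The host names, ports, the name `bench`, and the two helper functions are assumptions made up for this example; the benchmark tool itself may construct its requests differently.

```python
import json
from urllib.request import Request


def couchdb_bulk_request(base_url, database, docs):
    """Build a POST for CouchDB's _bulk_docs endpoint: one JSON object with a "docs" array."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return Request(
        f"{base_url}/{database}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def arangodb_import_request(base_url, collection, docs):
    """Build a POST for ArangoDB's import endpoint: one JSON document per line."""
    body = "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")
    return Request(
        f"{base_url}/_api/import?type=documents&collection={collection}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical batch and server addresses, for illustration only.
batch = [{"value": 1}, {"value": 2}]
couch_req = couchdb_bulk_request("http://localhost:5984", "bench", batch)
arango_req = arangodb_import_request("http://localhost:8529", "bench", batch)
```

Passing either request object to `urllib.request.urlopen` would send it to a running server. MongoDB is left out here because it is accessed through its binary protocol via a driver rather than over HTTP.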

Linux Kernel 2.6.37.6-0.11, cfq scheduler
64-bit OS
8x Intel(R) Core(TM) i7 CPU, 2.67 GHz
12 GB total RAM
SATA II hard drive (7,200 RPM, 32 MB cache)

The total “net insert time” is reported for several datasets in the following charts. It is the time spent in the benchmark tool sending the request to the database and waiting for the database response, i.e. excluding the time needed to generate the document data.
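To make the definition of net insert time concrete, here is a minimal sketch of how such a measurement could be taken. It is not the benchmark tool’s actual code: the function name `net_insert_time` and the `send` callback are assumptions for illustration. Only the time spent inside `send` is added up, so producing the next batch is not counted.

```python
import time


def net_insert_time(batches, send):
    """Return the total seconds spent inside send(); batch generation is not timed."""
    total = 0.0
    for batch in batches:  # a lazy iterable builds the next batch here, outside the timer
        start = time.perf_counter()
        send(batch)  # send the request and wait for the database response
        total += time.perf_counter() - start
    return total


# Stand-in for a real database call: just collect the batches in memory.
received = []
batches = ([{"value": i} for i in range(s, s + 1000)] for s in range(0, 3000, 1000))
elapsed = net_insert_time(batches, received.append)
```

With a real client, `send` would be the function that performs the bulk request and blocks until the server has answered.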

The database versions used for tests were:

MongoDB 2.1.3, with preallocation
CouchDB 1.2, with delayed_commits, without compression
ArangoDB 1.1-alpha, with waitForSync=false

The datasets tested can be categorised in three groups: small, medium, and big. The small datasets tested were:

Dataset name	Description	Number of documents
uniform_1000	One attribute plus unique “_id” value	1,000
uniform_10000	same, but 10,000 documents	10,000
names_10000	person records containing names and addresses, artificially created with source data from the US Census Bureau, ZIP code and state lists	10,000
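For a sense of how small the uniform documents are, the following sketch generates documents of that shape. The attribute name `value` and the `test<i>` id format are placeholders invented for this example; the post does not state the actual attribute name or id scheme.

```python
def uniform_documents(count):
    """Yield `count` documents, each with a unique _id and a single attribute."""
    for i in range(count):
        # "test<i>" and "value" are made-up placeholders, not the benchmark's real names.
        yield {"_id": f"test{i}", "value": i}


docs = list(uniform_documents(1000))
```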

The medium datasets tested were:

Dataset name	Description	Number of documents
enron	Enron e-mail corpus, published by the Federal Energy Regulatory Commission	41,299
names_100000	person records containing names and addresses, artificially created with source data from the US Census Bureau, ZIP code and state lists	100,000
names_300000	same, but 300,000 documents	300,000
wiki_50000	Wikipedia articles	50,000

The big datasets tested consisted of:

Dataset name	Description	Number of documents
uniform_1000000	One attribute plus unique “_id” value	1,000,000
uniform_10000000	same, but 10,000,000 documents	10,000,000
aol	search engine queries published by AOL	3,459,421
accesslogs	Apache web server access logs	1,357,246

Results, small datasets

For the smallest dataset (uniform_1000), the results were almost on par, with CouchDB being slightly faster than MongoDB and ArangoDB. For the other small datasets tested, MongoDB was slightly faster than ArangoDB, and both were notably faster than CouchDB.

[Chart: net insert times, small datasets]

Results, medium datasets

For the medium datasets, MongoDB was fastest for the first two sets tested, and ArangoDB was fastest for the other two sets. CouchDB was slightly slower for two of the datasets, and substantially slower for the other two.

[Chart: net insert times, medium datasets]

Results, big datasets

With the big datasets tested, ArangoDB had the lowest bulk insert times. MongoDB was slightly slower for three of the cases tested, and substantially slower for the other case (uniform_10000000). CouchDB consistently had the highest insertion time.

[Chart: net insert times, big datasets]

Conclusion

With the datasets tested, ArangoDB was on par with MongoDB (with MongoDB being slightly faster in some cases and ArangoDB in others). CouchDB was notably slower than MongoDB and ArangoDB, except in one case.

Caveats

These are benchmarks for specific datasets. The dataset volumes and types might or might not be realistic, depending on what you plan to do with a database. Results might look completely different for other datasets.

In addition, the benchmarks compare the HTTP API of CouchDB and ArangoDB against the binary protocol of MongoDB, which gives MongoDB a slight efficiency advantage. However, real-world applications will also use MongoDB’s binary protocol, so this is an advantage that MongoDB does have in real life (though it comes with the disadvantage that the protocol is not human-readable).

Furthermore, other aspects such as datafile size and memory usage also deserve attention. They have not been examined in this post.

So please be sure to run your own tests in your own environment before adopting the results.

Jan Steemann

After more than 30 years of playing around with 8-bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, having been there since version 0.1. He mostly works on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

