Comparing ArangoDB with CouchDB and MongoDB
The folks over at MongoDB have an article on their site comparing MongoDB and CouchDB.
They write:
“We are getting a lot of questions “how are mongo db and couch different?” It’s a good question: both are document-oriented databases with schemaless JSON-style object data storage. Both products have their place — we are big believers that databases are specializing and “one size fits all” no longer applies.”
The same applies to ArangoDB. We meet a lot of people, especially developers using MongoDB, who are interested in ArangoDB and ask us how it differs from other popular NoSQL databases. In this article we would like to add our own answers to these questions.
ArangoDB 1.1 Feature Preview: Batch Request API
Clients normally send individual operations to ArangoDB in individual HTTP requests. This is straightforward and simple, but has the disadvantage that the network overhead can be significant if many small requests are issued in a row.
To mitigate this problem, ArangoDB 1.1 offers a batch request API that clients can use to send multiple operations in one batch to ArangoDB. This method is especially useful when the client has to send many HTTP requests with a small body/payload and the individual request results do not depend on each other.
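To give an idea of what this looks like on the wire, here is a minimal Python sketch that wraps two document inserts into one multipart batch request. The endpoint, the part content type, and the collection name reflect our reading of the 1.1 API and should be treated as assumptions:

```python
# Minimal sketch: two document inserts sent as one batch request.
# Boundary string and collection name are made up for the example.
import requests

BOUNDARY = "XXXsubpartXXX"

def batch_part(method, path, body):
    # Each part embeds one complete HTTP request of its own.
    return ("--" + BOUNDARY + "\r\n"
            "Content-Type: application/x-arango-batchpart\r\n\r\n"
            + method + " " + path + " HTTP/1.1\r\n\r\n"
            + body + "\r\n")

payload = (
    batch_part("POST", "/_api/document?collection=test", '{"a": 1}')
    + batch_part("POST", "/_api/document?collection=test", '{"a": 2}')
    + "--" + BOUNDARY + "--\r\n"
)

response = requests.post(
    "http://localhost:8529/_api/batch",
    data=payload,
    headers={"Content-Type": "multipart/form-data; boundary=" + BOUNDARY},
)
print(response.status_code)   # status of the outer request
print(response.text)          # multipart response, one part per operation
```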
Gain Factor of 5 with Batch Updates
ArangoDB 1.1 will come with a new API for batch requests. This batch request API allows clients to send multiple requests to the ArangoDB server inside one multipart HTTP request. The server will then decompose the multipart request into the individual parts and process them as if they had been sent individually. The communication layer can sustain up to 800,000 requests/second, but absolute numbers strongly depend on the number of cores, the type of the requests, network connections, and other factors. More important are the relative numbers: depending on your use case, you can reduce insert/update times by 80%.
ArangoDB 1.01 Released: What’s New?
Quick note: ArangoDB 1.01 is available. This is a bugfix release. Check the ArangoDB Google group for the changelog. By the way, a lot of interesting discussions on ArangoDB, its feature roadmap, and how it works in detail are taking place there. Binaries are always available in the download section.
Performance Across Different Journal Sizes
As promised in one of the previous posts, here are some performance results that show the effect of different journal sizes for insert, update, delete, and get operations in ArangoDB.
Why journal size could matter
The journal file size determines how large a single datafile in ArangoDB is. The smaller that parameter is, the more datafiles need to be created, initially prefilled, closed, compacted, and so on. These operations have some overhead per file, and they occur more often the more datafiles are in use.
The journal size can be configured at startup with the parameter "--database.maximal-journal-size". It will then affect new collections only. It can also be set on a per-collection level when a collection is created, and will then only affect that particular collection.
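For illustration, here is a short sketch of both ways to set the journal size; the exact attribute name (journalSize, in bytes) and the collection name are assumptions for the example:

```python
# Sketch: overriding the journal size for a single collection.
# The server-wide default would be set at startup, e.g.:
#   arangod --database.maximal-journal-size 16777216
import requests

# Create a collection whose datafiles are 4 MB instead of the default 32 MB.
response = requests.post(
    "http://localhost:8529/_api/collection",
    json={"name": "mycollection", "journalSize": 4 * 1024 * 1024},
)
print(response.json())
```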
The test setup
The results shown in the charts were generated by a test that started a new ArangoDB database instance with a specific journal size (using the --database.maximal-journal-size startup option), and then used the ArangoDB HTTP API to insert a number of documents from a file into the collection in the database. The inserts were done individually and not in batches. That means to insert 10,000 documents, 10,000 individual HTTP requests were executed. These inserts were done with varying concurrency levels from 1 (no concurrency) to 64 clients. The total test time on the client side was recorded.
After the data was inserted, the documents were retrieved individually, again using the HTTP API and with varying concurrency levels, and the retrieval time was recorded as well. Next, all existing documents were updated individually via HTTP calls, and the total update time was recorded. Finally, document deletion time was measured by importing the documents again and then deleting them individually via the HTTP API.
The ArangoDB version used in these tests was 1.1-alpha. The waitForSync option was turned off, so inserted data was not forced to disk via msync after each insert but flushed to disk asynchronously.
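The actual tests used httpress (see below), but the following Python sketch illustrates the shape of such a test run: individual HTTP inserts fanned out over a configurable number of concurrent clients, with the total client-side time recorded. Endpoint and collection name are assumptions:

```python
# Sketch: timing 10,000 individual inserts at a given concurrency level.
import json
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8529/_api/document?collection=test"  # assumed
CONCURRENCY = 8        # varied from 1 to 64 in the tests
NUM_DOCS = 10_000

def insert(i):
    # One HTTP request per document -- no batching.
    requests.post(URL, data=json.dumps({"value": i}))

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(insert, range(NUM_DOCS)))  # consume to surface errors
print("total client-side time: %.2fs" % (time.time() - start))
```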
We used a slightly modified version of httpress as the HTTP test client. ArangoDB and the test client were running on the same physical server. The test machine had the following specs:
- Linux Kernel 2.6.37.6-0.11, cfq scheduler
- 64 bit OS
- 8x Intel(R) Core(TM) i7 CPU, 2.67 GHz
- 12 GB total RAM
- SATA II hard drive (7,200 RPM, 32 MB cache)
Performance, 10,000 operations
When working with only 10,000 documents, the effects of using different journal sizes were minimal and well within the margin of error. With only 10,000 operations, there were also a few outliers (deviations of 0.1 seconds) at higher concurrency that occurred during some of the tests but were not reliably reproducible. They shouldn't be taken too seriously given the short overall duration of the tests and the test setup (client and server on the same machine, competing for the same resources).
Insert performance
Total execution times were well below one second for all tested concurrency levels when inserting 10,000 documents individually. Different journal sizes did not have any substantial effect here.
Delete performance
The same can be said for delete operations: with only 10,000 documents, the effect of different journal sizes is too small to be relevant. The total execution time is below 0.7 seconds in all cases as delete is a relatively cheap operation.
Update performance
With just 10,000 update operations, the total execution time is again below 1 second in all cases, and there is no clear indication that journal size matters.
Get performance
When retrieving documents, the journal size should not matter at all, provided the OS has buffered the data in RAM already. And indeed, no substantial difference can be observed when executing 10,000 get operations with different journal sizes.
Performance, 100,000 operations
The picture changes slightly when increasing the number of operations. When working with 100,000 documents instead of just 10,000, it can be observed that smaller journal sizes lead to slightly longer execution times.
Insert performance
It can be seen that bigger journal sizes lead to slightly better execution times, though the difference is not very large. In some cases it might well be worth trading a little performance for some saved disk space.
Delete performance
When executing 100,000 delete operations, using the default journal size of 32 MB has a slight performance benefit over using smaller journals. The benefit is smaller than what was observed for insert operations, and this can be explained by the fact that delete is a relatively cheap operation that writes only very small amounts of data (basically just a deletion marker).
Update performance
When performing 100,000 individual update operations, we again see that journal size matters a bit and that smaller journals lead to slightly longer execution times. The distribution is about the same as in the insert case.
Get performance
Finally, looking at get operations, it can be seen that different journal sizes still do not have any effect on the overall execution time. This is expected, as get operations do not need to access the disk if the data is already in RAM.
Conclusion
Different journal sizes in ArangoDB don't have an effect on data-retrieval operations (i.e. get), but they can have some performance impact on data-modification operations (i.e. insert, delete, update). You can use the journal size parameter to trade a little data-modification performance for some disk space savings.
Please note that the performance results shown above were measured with some particular datasets. As not all data is created equal, the performance impact may vary with the data. So please be sure to measure with your own datasets.
To conclude: whether you should adjust the journal size, and for which collections, depends on your workload, performance requirements, and available hardware. Please also be sure to check the previous post, which shows the disk space usage. Note that the maximum journal size also limits the maximum size of documents that can be saved in a collection: picking a very small journal size is obviously not a good idea if you plan on saving big documents. But as the journal size can be adjusted per collection, you can still fine-tune the settings according to your needs.
Get 20% Off: NoSQL Matters Barcelona
We are on the road again and have been invited to give a talk at “NoSQL matters” in Barcelona, a one-day conference in an amazing-looking venue (a UNESCO World Heritage site).
Now the conference team has offered us a couple of promo codes for “NoSQL matters” on October 6th. Katja, one of the organizers, writes:
“there might be some friends, colleagues, contacts or even your followers on twitter who are interested in hearing your talk at NoSQL matters Barcelona. Therefore we would like to give them the opportunity to buy price reduced tickets. With the promotion code BCNSchoenert_7959 you can give 5 of them the chance to buy a ticket with 20% discount.”
So, here we are. Grab the code and get your ticket. We are looking forward to meeting you in Spain.
Bulk Inserts Comparison: MongoDB, CouchDB, ArangoDB
In the last couple of posts, we have been looking at ArangoDB's performance for individual document insert, delete, and update operations. This time we'll be looking at batched inserts. For reference, we'll compare ArangoDB's results with what can be achieved with CouchDB and MongoDB.
Bulk Insert Benchmark Tool
To easily conduct bulk insert benchmarks with different NoSQL databases, we put together a small benchmark tool in PHP. The tool can be used to measure the time it takes to bulk upload data into MongoDB, CouchDB, and ArangoDB using the databases’ bulk documents APIs.
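For a rough idea of what the tool measures, here is a minimal sketch of a bulk insert against ArangoDB's and CouchDB's HTTP bulk documents APIs (MongoDB's bulk insert goes over its binary protocol via a driver). URLs, parameters, and collection names are assumptions for the example:

```python
# Sketch: bulk-uploading the same documents into ArangoDB and CouchDB.
import json
import requests

docs = [{"value": i} for i in range(1000)]

# ArangoDB import API: one JSON document per line.
requests.post(
    "http://localhost:8529/_api/import?collection=test&type=documents",
    data="\n".join(json.dumps(d) for d in docs),
)

# CouchDB bulk documents API: all documents in one JSON object.
requests.post(
    "http://localhost:5984/test/_bulk_docs",
    json={"docs": docs},
)
```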
Additional Results for Mixed Workload
In a comment on the last post, there was a request to conduct some benchmarks with a mixed workload that does not test insert/delete/update/get operations in isolation, but in combination.
To do this, I put together a quick benchmark that inserts 10,000 documents, and after each insert either
- directly fetches the inserted document (i.e. insert / get),
- updates the inserted document and retrieves it (i.e. insert / update / get), or
- deletes it (i.e. insert / delete)
The three cases are alternated deterministically, meaning each case occurs with the same frequency and in the same order (see the sketch below). It’s probably still not the best test case ever, but at least it reflects a mixed read and write workload.
The document ids used in the test were monotonically increasing integers, starting from some base value. That means no random values were used.
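Here is a Python sketch of the loop described above; the endpoint and the update payload are assumptions, and the real benchmark again used an HTTP load-testing client rather than Python:

```python
# Sketch: deterministic mixed workload. After each insert, either get,
# update + get, or delete the inserted document, cycling through the
# three cases so each occurs equally often and in the same order.
import json
import requests

SERVER = "http://localhost:8529"
INSERT_URL = SERVER + "/_api/document?collection=test"  # assumed endpoint

BASE_VALUE = 100_000   # monotonically increasing values, no randomness
for i in range(10_000):
    doc = {"value": BASE_VALUE + i}
    handle = requests.post(INSERT_URL, data=json.dumps(doc)).json()["_id"]
    doc_url = SERVER + "/_api/document/" + handle
    case = i % 3
    if case == 0:                 # insert / get
        requests.get(doc_url)
    elif case == 1:               # insert / update / get
        requests.put(doc_url, data=json.dumps({"value": -1}))
        requests.get(doc_url)
    else:                         # insert / delete
        requests.delete(doc_url)
```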
The test was repeated for 100,000 documents as well. The dataset still fully fits in RAM. The tests were run in the same environment as the previous tests so one can compare them.
The results are in line with those shown in the previous post. Here’s the chart with the results of the 10,000 documents benchmark:
And here are the test results for the 100,000 documents benchmark:
Data Modeling in a Schema-Free Environment
We just came back from FrOSCon, a large, international open source conference near Bonn, Germany. Jan Steemann, one of the core developers of ArangoDB, gave a talk on modelling data in a schema-free world. Jan was given the largest room of the conference for this talk; fortunately, a lot of people showed up and even stayed ;-).
You can find Jan’s presentation below.