
ArangoDB 1.1 Feature Preview: Batch Request API | ArangoDB 2012

Clients normally send individual operations to ArangoDB in individual HTTP requests. This is straightforward and simple, but has the disadvantage that the network overhead can be significant if many small requests are issued in a row.

To mitigate this problem, ArangoDB 1.1 offers a batch request API that clients can use to send multiple operations in one batch to ArangoDB. This method is especially useful when the client has to send many HTTP requests with a small body/payload and the individual request results do not depend on each other.

Clients can use ArangoDB’s batch API by issuing a multipart HTTP POST request to the /_api/batch handler. The handler will accept the request if the Content-Type is multipart/form-data and a boundary string is specified. ArangoDB will then decompose the batch request into its individual parts using this boundary. This also means that the boundary string itself must not be contained in any of the parts. Once ArangoDB has split the multipart request into its individual parts, it will process the parts sequentially, handling each one as if it were a standalone request. When all parts are processed, ArangoDB will generate a multipart HTTP response that contains one part for each part operation result. For example, if you send a multipart request with 5 parts, ArangoDB will send back a multipart response with 5 parts as well.
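Because the boundary must not appear inside any part, a client may want to generate it rather than hard-code it. The following is a minimal Python sketch of that idea; the function name pick_boundary and the "batch-" prefix are made up for illustration and are not part of ArangoDB:

```python
import uuid

def pick_boundary(payloads):
    """Return a boundary string that occurs in none of the given payloads."""
    while True:
        # Hypothetical naming scheme: a fixed prefix plus 32 random hex digits.
        candidate = "batch-" + uuid.uuid4().hex
        if all(candidate not in payload for payload in payloads):
            return candidate

payloads = ['{"a":1,"b":2,"c":3}', '{"text":"contains XXXpartXXX"}']
boundary = pick_boundary(payloads)
```

A random boundary makes a collision with the payload very unlikely, and the explicit check covers the remaining case.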

The server expects each message part to start with exactly the following literal: Content-Type: application/x-arango-batchpart, followed by two Windows linebreaks (i.e. \r\n\r\n). Any deviation will lead to the part being rejected or incorrectly interpreted. The part request payload, formatted as a regular HTTP request, must follow this literal directly.

Note that the literal Content-Type: application/x-arango-batchpart technically is the header of the MIME part, and the HTTP request (including its headers) is the body part of the MIME part.

An actual part request should start with the HTTP method, the called URL, and the HTTP protocol version as usual, followed by arbitrary HTTP headers. Its body should follow after the usual \r\n\r\n literal. Part requests are therefore regular HTTP requests, only embedded inside a multipart message. This might sound complicated at first, but it has the advantage that any HTTP request can transparently be embedded as a part request inside a multipart message.
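The layout described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name build_batch_body is hypothetical, each part is sent without additional HTTP headers, and \r\n is used for every line break because that is what the description of the part header calls for:

```python
CRLF = "\r\n"
PART_HEADER = "Content-Type: application/x-arango-batchpart"

def build_batch_body(boundary, part_requests):
    """Wrap each (method, url, body) tuple as a MIME part and join them."""
    chunks = []
    for method, url, body in part_requests:
        if boundary in body:
            raise ValueError("boundary must not occur inside a part")
        chunks.append("--" + boundary + CRLF)
        # MIME part header, followed by an empty line.
        chunks.append(PART_HEADER + CRLF + CRLF)
        # Embedded HTTP request line (no extra headers), then an empty line.
        chunks.append(f"{method} {url} HTTP/1.1" + CRLF + CRLF)
        chunks.append(body + CRLF)
    # The closing delimiter carries two trailing dashes.
    chunks.append("--" + boundary + "--" + CRLF)
    return "".join(chunks)

body = build_batch_body(
    "XXXpartXXX",
    [("POST", "/_api/document?collection=xyz", '{"a":1,"b":2,"c":3}')],
)
```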

The following example will send a batch with 3 individual document creation operations. The boundary used in this example is XXXpartXXX. The complete request is:

curl -X POST \
     --data-binary @- \
     --header "Content-Type: multipart/form-data; boundary=XXXpartXXX" \
     http://localhost:8529/_api/batch
--XXXpartXXX
Content-Type: application/x-arango-batchpart

POST /_api/document?collection=xyz&createCollection=true HTTP/1.1

{"a":1,"b":2,"c":3}
--XXXpartXXX
Content-Type: application/x-arango-batchpart

POST /_api/document?collection=xyz HTTP/1.1

{"a":1,"b":2,"c":3,"d":4}
--XXXpartXXX
Content-Type: application/x-arango-batchpart

POST /_api/document?collection=xyz HTTP/1.1

{"a":1,"b":2,"c":3,"d":4,"e":5}
--XXXpartXXX--
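The same request can also be prepared from a script instead of curl. Here is a rough sketch using only Python's standard library; the helper name make_batch_request is an assumption for illustration, and the server address is the one from the curl example. The request is only constructed here, and the commented-out lines show how it could be sent to a running server:

```python
import urllib.request

def make_batch_request(base_url, boundary, body):
    """Build (but do not send) a POST request for the batch handler."""
    return urllib.request.Request(
        base_url + "/_api/batch",
        data=body.encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# An empty batch body is used here only to keep the sketch short.
req = make_batch_request("http://localhost:8529", "XXXpartXXX", "--XXXpartXXX--\r\n")

# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```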

The server will then respond with one multipart message, containing the overall status and the individual results for the part operations. The overall status is 200 unless an error occurred while inspecting and processing the multipart message. The overall status therefore does not indicate the success of each part operation, but only indicates whether the multipart message could be handled successfully.

Each part operation will return its own status value. As the part operation results are regular HTTP responses (just included in one multipart response), the part operation status is returned as an HTTP status code. The status codes of the part operations are exactly the same as if you called the individual operations standalone. Each part operation might also return arbitrary HTTP headers and a body/payload:

HTTP/1.1 200 OK
connection: Keep-Alive
content-type: multipart/form-data; boundary=XXXpartXXX
content-length: 1055

--XXXpartXXX
Content-Type: application/x-arango-batchpart

HTTP/1.1 202 Accepted
location: /_api/document/101059/9514299
content-type: application/json; charset=utf-8
etag: "9514299"
content-length: 53

{"error":false,"_id":"101059/9514299","_rev":9514299}
--XXXpartXXX
Content-Type: application/x-arango-batchpart

HTTP/1.1 202 Accepted
location: /_api/document/101059/9579835
content-type: application/json; charset=utf-8
etag: "9579835"
content-length: 53

{"error":false,"_id":"101059/9579835","_rev":9579835}
--XXXpartXXX
Content-Type: application/x-arango-batchpart

HTTP/1.1 202 Accepted
location: /_api/document/101059/9645371
content-type: application/json; charset=utf-8
etag: "9645371"
content-length: 53

{"error":false,"_id":"101059/9645371","_rev":9645371}
--XXXpartXXX--

In the above example, the server returned an overall status code of 200, and each part response contains its own status value (202 in the example).

When constructing the multipart HTTP response, the server will use the same boundary that the client supplied. If any of the part responses has a status code of 400 or greater, the server will also return an HTTP header x-arango-errors containing the overall number of part requests that produced errors.
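Since the overall status does not reflect the individual results, a client has to look at each part. The following Python sketch shows one simple way to do that, assuming the response body has already been read as a string; the function name part_statuses and the shortened sample body are made up for illustration:

```python
def part_statuses(response_body, boundary):
    """Return the HTTP status code of each part in a batch response body."""
    statuses = []
    for chunk in response_body.split("--" + boundary):
        for line in chunk.splitlines():
            # The first line starting with "HTTP/" is the part's status line.
            if line.startswith("HTTP/"):
                statuses.append(int(line.split()[1]))
                break
    return statuses

# Shortened, hand-written sample in the shape of a batch response body.
sample = (
    "--XXXpartXXX\r\n"
    "Content-Type: application/x-arango-batchpart\r\n\r\n"
    "HTTP/1.1 404 Not Found\r\n\r\n"
    '{"error":true,"code":404}\r\n'
    "--XXXpartXXX\r\n"
    "Content-Type: application/x-arango-batchpart\r\n\r\n"
    "HTTP/1.1 202 Accepted\r\n\r\n"
    '{"error":false}\r\n'
    "--XXXpartXXX--"
)

statuses = part_statuses(sample, "XXXpartXXX")
# Parts with a status of 400 or greater count as errors.
errors = sum(1 for status in statuses if status >= 400)
```

The computed error count can then be compared with the value of the x-arango-errors header, if the server sent one.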

Here’s a batch request that will produce an error:

curl -X POST \
     --data-binary @- \
     --header "Content-Type: multipart/form-data; boundary=XXXpartXXX" \
     http://localhost:8529/_api/batch
--XXXpartXXX
Content-Type: application/x-arango-batchpart

POST /_api/document?collection=nonexisting HTTP/1.1

{"a":1,"b":2,"c":3}
--XXXpartXXX
Content-Type: application/x-arango-batchpart

POST /_api/document?collection=xyz HTTP/1.1

{"a":1,"b":2,"c":3,"d":4}
--XXXpartXXX--

In this example, the overall response code is 200, but because one of the part requests failed (with status code 404), the x-arango-errors header of the overall response is set to 1:

HTTP/1.1 200 OK
x-arango-errors: 1
content-type: multipart/form-data; boundary=XXXpartXXX
content-length: 711

--XXXpartXXX
Content-Type: application/x-arango-batchpart

HTTP/1.1 404 Not Found
content-type: application/json; charset=utf-8
content-length: 111

{"error":true,"code":404,"errorNum":1203,"errorMessage":"collection \/_api\/collection\/nonexisting not found"}
--XXXpartXXX
Content-Type: application/x-arango-batchpart

HTTP/1.1 202 Accepted
location: /_api/document/101059/9841979
content-type: application/json; charset=utf-8
etag: "9841979"
content-length: 53

{"error":false,"_id":"101059/9841979","_rev":9841979}
--XXXpartXXX--

Please note that the feature is available in ArangoDB version 1.1, which is still in development. If you want to, you can give it a try already, but it should not be used in production until 1.1 is officially released.

Jan Steemann


After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, being there from version 0.1. He is mostly working on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

1 Comments

  1. FRANKMAYER.NET on October 7 2012, at 4:49 pm

    Very useful addition! Will try it out soon. Thanks!
