
Diffing Two Documents in AQL: ArangoDB Data Comparison

I just stumbled upon a comment in the ArangoDB blog asking how to create a diff of two documents with AQL.

Although there is no built-in AQL function for diffing two documents, it is easy to build your own, as the query in the full post shows.

Read more on how to diff two documents in AQL.

Jan Steemann

After more than 30 years of playing around with 8-bit computers, assembler and scripting languages, Jan decided to move on to database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team and has been there since version 0.1. He mostly works on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

2 Comments

  1. CoDEmanX on May 26 2015, at 6:21 pm

    I wonder if a custom AQL function written in JS for document diffing would be slower than your pure AQL query… BTW: there’s a json-patch format https://tools.ietf.org/html/rfc6902 and a diff format used here: https://github.com/benjamine/jsondiffpatch

    • jsteemann on May 26 2015, at 6:59 pm

      I was aware of JSON-patch, but I wasn’t sure what the intention of the to-be-generated diffs was. For simply detecting whether or not two documents differ and to what extent, a simple solution as demonstrated may already be sufficient. If the goal however is to create patches that can be sent somewhere via HTTP PATCH, JSON-patch will be the natural choice.
      I wasn’t aware of jsondiffpatch yet, and after looking into it, I still prefer JSON-patch.

      But which way to go really depends on what is to be achieved with the generated diffs.
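
For the HTTP PATCH case, a simple diff can be turned into JSON-patch (RFC 6902) operations with little effort. The following is only a minimal sketch: it assumes a diff result with `missing`, `changed` and `added` members, it handles top-level attributes only, and the helper names `escapePointer` and `toJsonPatch` are made up for illustration and are not part of ArangoDB or any library.

```javascript
// Hypothetical helper: escapes an attribute name for use in a JSON Pointer
// (RFC 6901), where "~" is written as "~0" and "/" as "~1".
function escapePointer(key) {
  return String(key).replace(/~/g, "~0").replace(/\//g, "~1");
}

// Hypothetical helper: converts a { missing, changed, added } diff into a list
// of top-level JSON-patch operations that would turn doc1 into doc2.
function toJsonPatch(diff) {
  var ops = [];
  // Attributes only present in doc1 must be removed
  Object.keys(diff.missing).forEach(function (key) {
    ops.push({ op: "remove", path: "/" + escapePointer(key) });
  });
  // Attributes present in both documents with different values are replaced
  Object.keys(diff.changed).forEach(function (key) {
    ops.push({
      op: "replace",
      path: "/" + escapePointer(key),
      value: diff.changed[key]["new"]
    });
  });
  // Attributes only present in doc2 must be added
  Object.keys(diff.added).forEach(function (key) {
    ops.push({ op: "add", path: "/" + escapePointer(key), value: diff.added[key] });
  });
  return ops;
}

// Example input, invented for this sketch
var patch = toJsonPatch({
  missing: { a: 1 },
  changed: { b: { old: 2, "new": 3 } },
  added: { "c/d": 4 }
});
console.log(JSON.stringify(patch));
```

Nested attributes are replaced as a whole in this sketch; a finer-grained patch would require descending into sub-documents.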

      Performance-wise, a custom AQL function may be faster if the documents are small. For example, the following custom function was about twice as fast as the AQL-only solution for the two example documents I used in the post:

      // register a user-defined AQL function named my::diff
      require("org/arangodb/aql/functions").register("my::diff", function (doc1, doc2) {
        var result = {
          missing: { },
          changed: { },
          added: { }
        };
        // attributes of doc1 that are absent from doc2 or have a different value
        Object.keys(doc1).forEach(function (key) {
          if (! doc2.hasOwnProperty(key)) {
            result.missing[key] = doc1[key];
          }
          else if (JSON.stringify(doc1[key]) !== JSON.stringify(doc2[key])) {
            result.changed[key] = { old: doc1[key], 'new': doc2[key] };
          }
        });
        // attributes that are present in doc2 only
        Object.keys(doc2).forEach(function (key) {
          if (! doc1.hasOwnProperty(key)) {
            result.added[key] = doc2[key];
          }
        });
        return result;
      });

      The JavaScript solution can save one iteration over all attributes because it can use if/then/else, which AQL does not provide.
      The picture may be different for other types of documents (especially bigger ones), and it may change if new types of AQL optimizations are added.
