Agile development vs. schema enforcement – a paradox resolved

Fans of modern, agile software development usually propose using schemaless database engines to allow for greater flexibility, in particular during the early rapid-prototyping phase of IT projects. The more traditionally minded insist that a strict schema, enforced by the persistence layer throughout the lifetime of a project, is necessary to ensure quality and security.
In this post I would like to explain briefly why I believe that both groups are completely right, and why this is not as paradoxical as it sounds at first glance. I am one of the developers of ArangoDB, a multi-model NoSQL database, by which I mean an engine that is a document store, a key/value store and a graph database, with a query language that allows you to use, and indeed mix, all three data models in queries.

As a document store, ArangoDB is schemaless, which is usually very convenient at the beginning of a software project, when the actual schema is not yet completely clear and is subject to frequent changes. Obviously, at any given time in a project, the developers have a concrete schema in mind. The only problem is that it changes frequently, in particular with a more agile development style. With a schemaless database one can tackle these changes in several different ways:

  • one can migrate (or indeed erase) the data for every change
  • one can make the application client code aware of multiple versions of the schema and teach it to work well with different document types
  • one can migrate the data lazily with each update or replacement of a document.

None of these approaches is “right” or “wrong”; different approaches may be best in different situations.
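
To make the second and third approaches concrete, here is a minimal sketch in plain JavaScript. The `schemaVersion` field, the three versions of the user document and the `Map` standing in for a collection are all assumptions made for illustration; none of this is ArangoDB API.

```javascript
// Hypothetical schema history of a "user" document (illustration only):
//   v1: { name: "Ada Lovelace" }
//   v2: { firstName, lastName }
//   v3: v2 plus an "emails" array
const CURRENT_VERSION = 3;

// One upgrade step per old version; each step returns a new document.
const upgrades = {
  1: (doc) => {
    const [firstName, ...rest] = doc.name.split(' ');
    const { name, ...others } = doc; // drop the old "name" field
    return { ...others, firstName, lastName: rest.join(' '), schemaVersion: 2 };
  },
  2: (doc) => ({ ...doc, emails: [], schemaVersion: 3 }),
};

// Apply upgrade steps until the document has the current shape.
// Documents without a schemaVersion are assumed to be v1.
function upgradeToCurrent(doc) {
  let current = { ...doc, schemaVersion: doc.schemaVersion || 1 };
  while (current.schemaVersion < CURRENT_VERSION) {
    current = upgrades[current.schemaVersion](current);
  }
  return current;
}

// Approach 2: the client copes with old versions when reading,
// without touching the stored data.
function readUser(store, key) {
  return upgradeToCurrent(store.get(key));
}

// Approach 3: lazy migration. The stored document is upgraded only
// when it is updated anyway.
function updateUser(store, key, changes) {
  const updated = { ...upgradeToCurrent(store.get(key)), ...changes };
  store.set(key, updated);
  return updated;
}

// Usage: a Map stands in for the collection.
const store = new Map([['u1', { name: 'Ada Lovelace' }]]);
console.log(readUser(store, 'u1').firstName); // prints "Ada"
console.log(updateUser(store, 'u1', { lastName: 'King' }).schemaVersion); // prints 3
```

Keeping one small upgrade function per version means old documents can sit untouched for as long as nobody writes to them, at the price of carrying the upgrade chain in the code.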

Later in the development cycle of most applications, the schema becomes more and more fixed and undergoes fewer changes. In these later phases the classical arguments for schema validation apply again, and security and stability concerns often outweigh the arguments for flexibility.

At that point, ArangoDB can be turned into a strict schema-enforcing persistence engine, because its HTTP API can be extended by user code written in JavaScript that is executed in the database server with direct access to the data. One can gradually change the way client code uses the data store, moving it step by step to special, user-defined routes that enforce the by now stable database schema, in particular for write operations. As a consequence, a lot of client code can be simplified: once all write operations are covered, it can rely on a strict schema that is enforced by the API.
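
As a rough illustration of what such a schema-enforcing write route does, here is a sketch in plain JavaScript. It deliberately avoids any ArangoDB-specific API: the `userSchema` rules, the `createUserRoute` function and the `Map` standing in for a collection are hypothetical names chosen for this example. In a real deployment the same check would live in a user-defined route inside the database server.

```javascript
// Hypothetical, by now stable schema for a "user" document.
const userSchema = {
  firstName: { type: 'string', required: true },
  lastName: { type: 'string', required: true },
  age: { type: 'number', required: false },
};

// Returns a list of violations; an empty list means the document is valid.
function validate(doc, schema) {
  const errors = [];
  for (const [field, rule] of Object.entries(schema)) {
    const value = doc[field];
    if (value === undefined) {
      if (rule.required) errors.push(`missing field: ${field}`);
    } else if (typeof value !== rule.type) {
      errors.push(`field ${field} must be of type ${rule.type}`);
    }
  }
  // Strict mode: fields that the schema does not know are rejected too.
  for (const field of Object.keys(doc)) {
    if (!Object.prototype.hasOwnProperty.call(schema, field)) {
      errors.push(`unknown field: ${field}`);
    }
  }
  return errors;
}

// Stand-in for a user-defined write route: the only way to create a user.
// Invalid documents never reach the store.
function createUserRoute(store, key, body) {
  const errors = validate(body, userSchema);
  if (errors.length > 0) {
    return { status: 400, errors };
  }
  store.set(key, { ...body });
  return { status: 201, key };
}

// Usage: a Map stands in for the collection.
const users = new Map();
console.log(createUserRoute(users, 'u1', { firstName: 'Ada', lastName: 'Lovelace' }).status); // prints 201
console.log(createUserRoute(users, 'u2', { firstName: 'Alan', age: 'unknown' }).status); // prints 400
```

Because every write passes through `validate`, code that reads from the store can assume that `firstName` and `lastName` are always present strings, which is the simplification of client code described above.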

In the end, when one has customized the whole API for the app, one can even switch off the standard database API, which further increases security and cleanliness. With this final step one has arrived at a software architecture that implements data-centric microservices in an application-specific way directly in the database server. This helps to prevent bugs, improves performance (complex queries can be run close to the data), keeps the application design simple and makes the system easier to maintain. Even DevOps teams like this, because the microservices can be deployed and updated independently.

To summarize, the extensibility of ArangoDB by user-defined JavaScript code can help to strike a perfect compromise between schemaless flexibility in the early phases of development and a secure, reliable, well-designed and schema-enforcing API in the late phases.

Max Neunhöffer

Max is one of the C/C++ developers working on the ArangoDB core. In particular, he is responsible for the sharding extension and also turns the latest ideas from database science into C/C++ code. Furthermore, he enjoys giving public talks about the technical aspects of ArangoDB development.