
Using GraphQL with ArangoDB: A NoSQL Database Solution

GraphQL is a query language created by Facebook for modern web and mobile applications as an alternative to REST APIs. Following the original announcement alongside Relay, Facebook has published an official specification and a reference implementation in JavaScript. Recently, projects outside Facebook, such as Meteor, have also begun to embrace GraphQL.

Users have been asking us how they can try out GraphQL with ArangoDB. While working on the 2.8 release of our NoSQL database we experimented with GraphQL and published an ArangoDB-compatible wrapper for GraphQL.js. With the general availability of ArangoDB 2.8 you can now use GraphQL in ArangoDB using Foxx services (JavaScript in the database).

A GraphQL primer

GraphQL is a query language that bears some superficial similarities to JSON. Generally, GraphQL APIs consist of three parts:

The GraphQL schema is implemented on the server using a library like graphql-sync and defines the types supported by the API, the names of the fields that can be queried and the types of queries that can be made. Additionally, it defines how the fields are resolved to values using a backend (which can be anything from a simple function call to a remote web service or a database collection).

The client sends queries to the GraphQL API using the GraphQL query language. For web applications and JavaScript mobile apps you can use either GraphQL.js or graphql-sync to make it easier to generate these queries by escaping parameters.

The server exposes the GraphQL API (e.g. using an HTTP endpoint) and passes the schema and query to the GraphQL implementation, which validates and executes the query and then returns the output as JSON.
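As a rough illustration of the client side, the sketch below packages a query together with its parameters instead of splicing the values into the query text. The `{ query, variables }` JSON envelope, the helper name `buildGraphQLRequest` and the `String!` type of the `id` argument are assumptions made for this example (the envelope is a common convention for GraphQL over HTTP); none of them is prescribed by this article or by the demo service.

```javascript
// Hypothetical helper: package a query and its parameters as a JSON body.
// Passing values as variables means they never have to be escaped into
// the query text itself.
function buildGraphQLRequest(queryText, variables) {
  return JSON.stringify({ query: queryText, variables: variables || {} });
}

// The variable type (String!) is an assumption for illustration.
const userQuery = `
  query ($id: String!) {
    user(id: $id) {
      name
    }
  }
`;

// A value containing a quote character is safe because it travels as data.
const requestBody = buildGraphQLRequest(userQuery, { id: 'abc"123' });
console.log(requestBody);
```

The resulting string could be sent as the body of an HTTP POST request, assuming the server accepts this envelope.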


GraphQL vs REST

Whereas in REST APIs each endpoint represents a single resource or collection of resources, GraphQL is agnostic of the underlying protocols. When used via HTTP it only needs a single endpoint that handles all queries.

The API developer still needs to decide what information should be exposed to the client and what access controls should apply to the data, but instead of implementing these decisions at each API endpoint, GraphQL allows centralising them in the GraphQL schema. Instead of querying multiple endpoints, the client can pick and choose from the schema when defining the query and restrict the response to only the fields it actually needs.

For example, the following GraphQL query:

query {
 user(id: "1234") {
   name
   friends {
     name
   }
 }
}

could return a response like this:

{
 "data": {
   "user": {
     "name": "Bob",
     "friends": [
       {
         "name": "Alice"
       },
       {
         "name": "Carol"
       }
     ]
   }
 }
}

whereas in a traditional REST API accessing the names of the friends would likely require additional API calls and filtering the responses to certain fields would either require proprietary extensions or additional endpoints.
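To make the idea of a response that mirrors the shape of the query more concrete, here is a toy sketch that applies a nested field selection to in-memory data. It is not how GraphQL.js works internally: the `select` helper, the selection format and all data beyond the names shown above are invented for illustration.

```javascript
// Invented sample record; only the names come from the example above.
const bob = {
  id: '1234',
  name: 'Bob',
  email: 'bob@example.com',
  friends: [
    { id: '1235', name: 'Alice', email: 'alice@example.com' },
    { id: '1236', name: 'Carol', email: 'carol@example.com' }
  ]
};

// Copy only the selected fields, recursing into nested selections.
// `true` marks a plain field, an object marks a nested selection.
function select(value, selection) {
  if (Array.isArray(value)) {
    return value.map((item) => select(item, selection));
  }
  const result = {};
  for (const field of Object.keys(selection)) {
    const sub = selection[field];
    result[field] = sub === true ? value[field] : select(value[field], sub);
  }
  return result;
}

// Equivalent in spirit to: user(id: "1234") { name friends { name } }
const response = {
  data: { user: select(bob, { name: true, friends: { name: true } }) }
};
console.log(JSON.stringify(response, null, 2));
```

Fields that were not selected (here `id` and `email`) never reach the client, which is the filtering a REST API would need extra endpoints or extensions for.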

GraphQL Demo Service

If you are running ArangoDB 2.8 you can install the Foxx service demo-graphql from the Store. The service provides a single HTTP POST endpoint /graphql that accepts well-formed GraphQL queries against the Star Wars data set used by GraphQL.js.

It supports three queries:

  • hero(episode) returns the human or droid that was the hero of the given episode or the hero of the Star Wars saga if no episode is specified. The valid IDs of the episodes are "NewHope", "Empire", "Jedi" and "Awakens" corresponding to episodes 4, 5, 6 and 7.
  • human(id) returns the human with the given ID (a string value in the range of "1000" to "1007"). Humans have an id, name and optionally a homePlanet.
  • droid(id) does the same for droids (with IDs "2000", "2001" and "2002"). Droids don’t have a homePlanet but may have a primaryFunction.

Both droids and humans have friends (which again can be humans or droids) and a field appearsIn mapping them to episodes (which have an id, title and description).

For example, the following query:

{
 human(id: "1007") {
   name
   friends {
     name
   }
   appearsIn {
     title
   }
 }
}

returns the following JSON:

{
 "data": {
   "human": {
     "name": "Wilhuff Tarkin",
     "friends": [
       {
         "name": "Darth Vader"
       }
     ],
     "appearsIn": [
       {
         "title": "A New Hope"
       }
     ]
   }
 }
}

It’s also possible to do deeply nested lookups like “what episodes have the friends of friends of Luke Skywalker appeared in” (but note that mutual friendships will result in some duplication in the output):

{
 human(id: "1000") {
   friends {
     friends {
       appearsIn {
         title
       }
     }
   }
 }
}

Additionally it’s possible to make queries about the API itself using __schema and __type. For example, the following tells us the “droid” query returns something of a type called "Droid":

{
 __schema {
   queryType {
     fields {
       name
       type {
         name
       }
     }
   }
 }
}

And the next query tells us what fields droids have (so we know what fields we can request when querying droids):

{
 __type(name: "Droid") {
   fields {
     name
   }
 }
}

GraphQL: The Good

GraphQL shifts the burden of specifying which subset of information should be returned onto the client. Unlike in traditional REST-based solutions, this is built into the language from the start: clients only see the information they explicitly request and don’t have to know about anything they’re not already interested in.

At the same time, a single GraphQL schema can be written to represent the entire global state graph of an application domain without having to hard-code any assumptions about how that data will be presented to the user. By making the schema declarative, GraphQL avoids the duplication and potential for subtle bugs involved in building equally exhaustive HTTP APIs.

GraphQL also provides mechanisms for introspection, allowing developers to explore GraphQL APIs without external documentation.

GraphQL is also protocol-agnostic. While REST builds directly on the semantics of the underlying HTTP protocol, GraphQL brings its own semantics, making it possible to re-use GraphQL APIs for non-HTTP communication (such as WebSockets) with minimal effort.

GraphQL: The Bad

The main drawback of GraphQL as implemented in GraphQL.js is that each object has to be retrieved from the data source before it can be queried further. For example, in order to retrieve the friends of a person, the schema has to first retrieve the person and then retrieve the person’s friends using a second query.
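The effect can be sketched with plain functions and a counter. The in-memory `persons` map stands in for a database collection and every call to `fetchPerson` stands for one request to the data source; all names and data here are hypothetical and only meant to show how the number of lookups grows with each level of nesting.

```javascript
// Hypothetical in-memory stand-in for a database collection.
const persons = {
  a: { name: 'Alice', friendIds: ['b', 'c'] },
  b: { name: 'Bob', friendIds: ['a', 'c'] },
  c: { name: 'Carol', friendIds: ['a', 'b'] }
};

let lookups = 0;

// Every call stands for one request to the data source.
function fetchPerson(id) {
  lookups += 1;
  return persons[id];
}

// Resolver-style function: it only knows how to load one person's friends.
function resolveFriends(person) {
  return person.friendIds.map((id) => fetchPerson(id));
}

// "Friends of friends", resolved level by level as a nested query would be.
function friendsOfFriends(id) {
  const person = fetchPerson(id);          // 1 lookup
  const friends = resolveFriends(person);  // 1 lookup per friend
  return friends.map(resolveFriends);      // 1 lookup per friend of a friend
}

const nested = friendsOfFriends('a');
console.log(lookups); // 1 + 2 + (2 + 2) = 7
```

With two friends per person this already takes seven lookups for a result that a single handwritten query could return in one request.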

Currently all existing demonstrations of GraphQL use external databases through ORMs or ODMs, so complex GraphQL queries cause multiple consecutive network requests to the database. The added cost of network latency, transport overhead, serialization and deserialization makes GraphQL slow and inefficient compared to an equivalent API using hand-optimized database queries.

This can be mitigated by inspecting the GraphQL Abstract Syntax Tree to determine what fields will be accessed on the retrieved document. However, it doesn’t seem feasible to generate efficient database queries ad hoc, which means forgoing a lot of the optimizations otherwise possible with handwritten queries.
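A minimal sketch of such an inspection is shown below. The AST fragment is written by hand in the shape GraphQL.js uses for field nodes (`kind`, `name.value`, `selectionSet.selections`); in a real service it would come from the parser. The `requestedPaths` helper is hypothetical and ignores fragments, aliases and arguments.

```javascript
// Hand-written AST fragment for: human { name friends { name } }
const humanField = {
  kind: 'Field',
  name: { kind: 'Name', value: 'human' },
  selectionSet: {
    kind: 'SelectionSet',
    selections: [
      { kind: 'Field', name: { kind: 'Name', value: 'name' } },
      {
        kind: 'Field',
        name: { kind: 'Name', value: 'friends' },
        selectionSet: {
          kind: 'SelectionSet',
          selections: [
            { kind: 'Field', name: { kind: 'Name', value: 'name' } }
          ]
        }
      }
    ]
  }
};

// Collect the dotted paths of all fields requested below a node.
function requestedPaths(node, prefix) {
  const paths = [];
  const selections = node.selectionSet ? node.selectionSet.selections : [];
  for (const child of selections) {
    if (child.kind !== 'Field') continue; // fragments are skipped in this sketch
    const path = prefix ? `${prefix}.${child.name.value}` : child.name.value;
    paths.push(path);
    paths.push(...requestedPaths(child, path));
  }
  return paths;
}

const fieldPaths = requestedPaths(humanField, '');
console.log(fieldPaths);
```

Knowing the requested paths up front lets a resolver decide what to load, but turning them into one efficient database query is the part that remains hard in general.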

Conclusion

Although there doesn’t seem to be any feasible way to translate GraphQL requests into database-specific queries (such as AQL), the impact of having a single GraphQL request result in a potentially large number of database requests is much less significant when implementing the GraphQL backend directly inside the database.

While RESTful HTTP APIs are certainly here to stay and GraphQL, like any technology, has its own trade-offs, the advantages of having a standardized yet flexible interface for accessing and manipulating an application’s global state graph are undeniable.

GraphQL is a promising fit for schema-free databases and dynamically typed languages. Instead of having to spread validation and authorization logic across different HTTP endpoints and native database format restrictions, a GraphQL schema can describe these concerns in one place, thus guaranteeing that sensitive fields are not accidentally exposed and that data formats remain consistent across different queries.

We’re excited to see what the future will hold for GraphQL and encourage you to try out GraphQL in the database with ArangoDB 2.8 and Foxx today. Have a look at the demo-graphql service from the Store. If you have built or are planning to build applications using GraphQL and ArangoDB, let us know in the comments.


Alan Plum

Alan is an experienced web developer who feels equally at home in the backend and frontend. At ArangoDB he works on everything regarding JavaScript, with a special focus on Foxx.

10 Comments

  1. Christian Pekeler on February 17 2016, at 5:38 pm

    Nice! Glad to see this.

  2. kynao on February 19 2016, at 3:06 am

    Can’t the bad side be rebalanced into the good with the help of Foxx? If not, that’s a broken relationship with GraphQL before it even begins.

    • Alan Plum on February 19 2016, at 11:30 am

      Yes. By resolving the GraphQL schema inside Foxx you can avoid the network overhead. It’s still not as efficient as using optimized AQL queries but it puts ArangoDB at a unique advantage compared to other databases that can’t resolve GraphQL internally.

  3. LVarayut on March 1 2016, at 1:02 pm

    Would it be possible to generate a `schema.json` using GraphQL introspection with `graphql-sync`? I’m using Relay in the front-end and it requires the `schema.json` which is the compiled version of the `schema.js`. So, if I defined the `schema.js` in a Foxx service, I had to compile it to `schema.json` and pass it to the front-end.

    • Alan Plum on March 1 2016, at 2:01 pm

      The schema.json Relay wants is just the output of this:

      const util = require('graphql/utilities');
      // Run the introspection query against the schema and serialize the result as JSON
      const schemaDotJson = JSON.stringify(graphql(Schema, util.introspectionQuery), null, 2);

      graphql-sync doesn’t currently have the utilities sub-module but you should be able to just install graphql-js alongside it and use the utilities from there (and the graphql function from graphql-sync).

      graphql-sync is fully compatible with graphql-js, it just provides a copy of the outer most layer of the main API that removes the promise logic.

  4. s.molinari on March 1 2016, at 1:21 pm

    I have thought about this and I believe having a direct GraphQL API into the database means all the business and data access logic necessary to build the types and to control the available data to be sent through the API would also need to be on the database server. This just doesn’t seem feasible (regardless of how good the functional capabilities of any database might be).

    In other words, I feel an application layer is still needed between the GraphQL API and the database.

    Scott

  5. Bram Nieuwenhuize on March 1 2016, at 5:50 pm

    I’m not sure what this sentence is articulating:

    `For example, in order to retrieve the friends of a person, the schema has to first retrieve the person and then retrieve the person’s friends using a second query.` (in The Bad section)

    What does ‘retrieve’ mean in this sentence? As I understand it, it’s possible to perform whatever you want within the `resolve` function, including a complex traversal to get all required data (friends) at once. How is the GraphQL layer interfering with the database interface itself?

    • Alan Plum on March 1 2016, at 6:36 pm

      You can do whatever you want in the resolve function, but it’s a trade-off between flexibility and efficiency compared to plain old queries.

      Let’s say you want the “friends of friends” of a person:

      You would first retrieve the person from the database (e.g. with something like `persons.document(personId)`) in the resolve function of the person type.

      Then you need to find the person’s friends, so you perform a second query that returns a cursor of the documents for each friend.

      Then you want each friend’s friends. This is where it gets ugly. Let’s say the original person has six friends. This means you now have to perform the same query as in the previous step six times. If the person had twenty friends, you’d be performing it twenty times now.

      With a plain database query (e.g. a static query for the same data written in AQL) you’d only perform a single query and get the entire result set at once.

      In ArangoDB/Foxx this isn’t a huge problem because making a query only has the overhead of switching between (slow) JavaScript and (fast) native logic each time. With VelocyPack in ArangoDB 3.0 this will see even more performance improvements and less overhead.

      But most examples and tutorials out there don’t use ArangoDB and Foxx. Instead they’re using an application server (typically Node) in front of the database (e.g. MongoDB). When GraphQL is executed outside the database this means that for each query I just described you also add the full network overhead of a roundtrip to the database: send the query over the wire, parse the query in the DB, serialize the result, send the serialized result back to the server and parse it to JavaScript.

      In theory it’s possible to parse the incoming GraphQL query in the resolve function to precisely determine what data needs to be loaded to fully resolve the entire query but that means you need to write code that translates any valid GraphQL query to whatever query language your database understands. But depending on what your schema looks like, this can be extremely difficult and brittle: you’re again stuck either closely tying your internal data representation with the API schema or having to spend a lot of time tweaking backend code whenever you want to adjust your API — exactly what GraphQL otherwise allows you to avoid having to worry about.

  6. ArangoDB database on January 10 2018, at 10:02 pm

    Sorry for the very late reply… how does Dgraph work for you? Any feedback would be great

