Graph Database Basics
Unlock the Power of Connected Data: An Introductory Guide to Understanding and Implementing Graph Database Basics for Enhanced Analytics and Real-Time Insights.
- A graph consists of nodes, edges, and properties that represent the relationships within the data. A graph database stores graphs and provides built-in functionality for querying them.
- Edges typically have a direction going from one object to another or multiple objects. Vertices and edges form a network of data points which is called a “graph”.
- In discrete mathematics, a graph is defined as a set of vertices and edges. In computing, it is considered an abstract data type which is really good at representing connections or relations – unlike the tabular data structures of relational database systems, which are ironically very limited in expressing relations.
- Graphs come in different kinds: they can be undirected, directed, or form a so-called Directed Acyclic Graph (DAG), a directed graph without cycles.
Stored edges always have a direction, going _from one vertex _to another. Seen from a certain vertex, incoming edges are called inbound and outgoing edges outbound. A query can ignore the stored direction and follow edges in any direction.
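The directions described above can be sketched in a few lines of Python. This is only an illustration, not how a database implements it: the vertex names and the `neighbors` helper are assumptions made for this example, and only the `_from`/`_to` keys come from the text.

```python
# Edges are stored with a fixed direction, using _from/_to keys.
# The vertex names are made up for this example.
EDGES = [
    {"_from": "alice", "_to": "bob"},
    {"_from": "bob", "_to": "carol"},
    {"_from": "dave", "_to": "bob"},
]


def neighbors(vertex, direction):
    """Return the vertices one edge away from `vertex`.

    direction is "outbound", "inbound", or "any" (stored direction ignored).
    """
    if direction not in ("outbound", "inbound", "any"):
        raise ValueError(f"unknown direction: {direction}")
    found = []
    for edge in EDGES:
        # Outbound: the edge starts at this vertex.
        if direction in ("outbound", "any") and edge["_from"] == vertex:
            found.append(edge["_to"])
        # Inbound: the edge ends at this vertex.
        if direction in ("inbound", "any") and edge["_to"] == vertex:
            found.append(edge["_from"])
    return found


print(neighbors("bob", "outbound"))  # ['carol']
print(neighbors("bob", "inbound"))   # ['alice', 'dave']
print(neighbors("bob", "any"))       # ['alice', 'carol', 'dave']
```

The data is stored once with a direction, and the caller decides at query time whether that direction matters.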
Typical Query Patterns in a Graph
Graph databases offer specialized algorithms to analyze the relationships of data.
The simplest algorithm is a so-called graph traversal. A traversal starts at a defined start vertex, follows edges from vertex to vertex, and stops at a defined depth.
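A depth-limited traversal could look roughly like the following Python sketch. The graph, the vertex names, and the `traverse` function are hypothetical. The sketch also visits each vertex at most once, which is a simplification, since a graph database may enumerate paths differently.

```python
from collections import deque

# Adjacency list for a small directed graph; vertex names are hypothetical.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def traverse(graph, start, max_depth):
    """Breadth-first traversal returning (vertex, depth) pairs up to max_depth."""
    visited = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        vertex, depth = queue.popleft()
        result.append((vertex, depth))
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return result


print(traverse(GRAPH, "A", 2))
# [('A', 0), ('B', 1), ('C', 1), ('D', 2)]
```

Vertex `E` is three edges away from `A`, so a traversal with a depth of 2 never reaches it.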
When filters on the properties of a vertex or an edge are applied during a graph traversal, this is called pattern matching.
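As a rough sketch of filtering during a traversal, the Python below only follows an edge when both the edge and the vertex it leads to pass a filter. All names, properties, and the `filtered_paths` helper are assumptions made for illustration.

```python
# Vertices and edges carry properties; all names and values are illustrative.
VERTICES = {
    "alice": {"city": "Berlin"},
    "bob": {"city": "Cologne"},
    "carol": {"city": "Berlin"},
    "dave": {"city": "Berlin"},
}
EDGES = [
    {"_from": "alice", "_to": "bob", "type": "friend"},
    {"_from": "alice", "_to": "carol", "type": "colleague"},
    {"_from": "bob", "_to": "dave", "type": "friend"},
    {"_from": "carol", "_to": "dave", "type": "friend"},
]


def filtered_paths(start, max_depth, edge_ok, vertex_ok):
    """Return outbound paths from `start` with at most max_depth edges.

    A step is taken only if edge_ok(edge) and vertex_ok(target) both hold.
    """
    paths = []

    def walk(path):
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 == max_depth:
            return  # depth limit reached
        for edge in EDGES:
            if edge["_from"] != path[-1]:
                continue
            target = edge["_to"]
            if target in path:
                continue  # avoid walking in cycles
            if edge_ok(edge) and vertex_ok(VERTICES[target]):
                walk(path + [target])

    walk([start])
    return paths


result = filtered_paths(
    "alice",
    max_depth=2,
    edge_ok=lambda e: e["type"] == "friend",
    vertex_ok=lambda v: True,
)
print(result)  # [['alice', 'bob'], ['alice', 'bob', 'dave']]
```

Swapping the two lambdas for a vertex filter such as `lambda v: v["city"] == "Berlin"` yields the paths through `carol` instead.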
You can also find the shortest route between two given vertices (nodes). This query pattern is called shortest path.
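For an unweighted graph, a shortest path can be found with a breadth-first search. The sketch below assumes a small hypothetical graph and counts path length in edges. It is one common technique, not necessarily what any particular database uses internally.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["F"],
    "D": ["F"],
    "F": [],
}


def shortest_path(graph, start, goal):
    """Return one path with the fewest edges from start to goal, or None."""
    previous = {start: None}  # maps each seen vertex to its predecessor
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in graph[vertex]:
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


print(shortest_path(GRAPH, "A", "F"))  # ['A', 'C', 'F']
print(shortest_path(GRAPH, "F", "A"))  # None
```

The second call returns `None` because the edges are directed and `F` has no outbound edges.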
An easy way to imagine a graph is thinking about a social network. In a social network you have friends, and those friends may have other friends besides you (Gasp!). You may even be friends with those ‘other’ friends.
This relationship between you, your friends, and their friends is a part of what forms your social network. These connections can easily be translated into a graph and in fact, it could be very useful to structure a social network as a graph. You and your friends could be represented as individual vertices (nodes), and the things that tie you together or describe your relationship would be edges, the lines that connect the nodes.
So the simplest edge would be the line that connects you to a friend. However, what if this connection went one step further and described more things about your relationship? You could include details that are common among you, such as the fact that you both love avocados (who doesn’t!?), and then when you wanted to find friends to join you for the Avocado Festival you could easily query that information. This would allow for things such as suggesting new friends, finding events based on the matching interests of you and your friends, or even recording important dates such as the date you became friends or other shared life events.
The details that make up the things you like, the things your friends like, and then the things that you share in common could be thought of as the properties of you and your friendships. This concept of modeling your data with descriptive labels is how data is modeled in a property graph.
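To make the social network example concrete, here is a small Python sketch of a property graph. The people, dates, property names, and the `friends_sharing` helper are all invented for illustration.

```python
# People are vertices with their own properties. Friendships are edges
# with properties that describe the relationship. All data is made up.
PEOPLE = {
    "you": {"name": "You", "likes": {"avocados", "hiking"}},
    "sam": {"name": "Sam", "likes": {"avocados", "chess"}},
    "kim": {"name": "Kim", "likes": {"chess"}},
}
FRIENDSHIPS = [
    {"_from": "you", "_to": "sam", "since": "2019-05-01", "shared": ["avocados"]},
    {"_from": "you", "_to": "kim", "since": "2021-11-12", "shared": []},
]


def friends_sharing(person, interest):
    """Names of friends whose friendship edge records the shared interest."""
    names = []
    for edge in FRIENDSHIPS:
        if interest not in edge["shared"]:
            continue
        # Friendship is mutual here, so the stored direction is ignored.
        if edge["_from"] == person:
            names.append(PEOPLE[edge["_to"]]["name"])
        elif edge["_to"] == person:
            names.append(PEOPLE[edge["_from"]]["name"])
    return names


print(friends_sharing("you", "avocados"))  # ['Sam']
```

Personal details stay on the vertices, while anything that describes the relationship itself, such as the date it started, lives on the edge.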
Property graphs use relevant semantic labels to model your data and its connections. This means that data can be structured in a way that is easily understood by a human. Since the data is modeled using relevant terms, it can also be queried in an easy-to-read way. ArangoDB allows for storing information on the vertices as well as on the connecting edges, which is why you can define the things you and your friends have in common on the edges while keeping personal properties on the individual vertices. We used a social network as the example here, but networks exist everywhere. If you would like a full dive into a real-world example using airport and flight data, be sure to take the next step with our Graph Course for Freshers, which takes you from zero knowledge to advanced queries.
Using Graphs in ArangoDB
Unlike many NoSQL databases, ArangoDB is a native multi-model database. You can store your data as key/value pairs, graphs or documents and access any or all of your data using a single declarative query language. You can combine different models in one query. And, due to its native multi-model approach, you can build high performance applications and scale horizontally with all three data models.
ArangoDB as a Graph Database
The graph capabilities of ArangoDB are similar to a property graph database but add more flexibility in terms of data modeling as vertices and edges are both full JSON documents.
For each document, a unique _id attribute is stored automatically. To build a relation (i.e., an edge) between two documents (i.e., vertices), both _id attributes are stored in a special edge document as its _from and _to attributes, forming a directed connection between two arbitrary vertices. Edges are then stored in a special edge collection.
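The shape of such documents can be sketched with plain Python dictionaries. Nothing here talks to a database. The collection name `persons`, the document contents, and the `make_edge` helper are assumptions for this example, while the `_id`, `_from`, and `_to` attribute names come from the text.

```python
import json

# Two vertex documents. An _id has the form "<collection>/<key>".
alice = {"_id": "persons/alice", "_key": "alice", "name": "Alice"}
bob = {"_id": "persons/bob", "_key": "bob", "name": "Bob"}


def make_edge(source, target, **properties):
    """Build an edge document that points from source to target."""
    edge = {"_from": source["_id"], "_to": target["_id"]}
    edge.update(properties)  # edges can carry arbitrary extra data
    return edge


knows = make_edge(alice, bob, since=2019)
print(json.dumps(knows))
# {"_from": "persons/alice", "_to": "persons/bob", "since": 2019}
```

Because the edge is itself a document, it can carry its own properties next to the two references.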
ArangoDB enables efficient and scalable graph query performance by using a special hash index on the _from and _to attributes (i.e., an edge index). This allows for constant lookup times. Using an edge index, ArangoDB can process graph queries very efficiently.
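The idea behind a hash index on both edge attributes can be illustrated with two Python dictionaries. This toy `EdgeIndex` class is an assumption made for explanation only and says nothing about ArangoDB's actual implementation.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy hash index over the _from and _to attributes of edges."""

    def __init__(self):
        self._by_from = defaultdict(list)
        self._by_to = defaultdict(list)

    def add(self, edge):
        # Each edge is registered under both of its endpoints.
        self._by_from[edge["_from"]].append(edge)
        self._by_to[edge["_to"]].append(edge)

    def outbound(self, vertex_id):
        """Edges leaving vertex_id, found by a single hash lookup."""
        return self._by_from.get(vertex_id, [])

    def inbound(self, vertex_id):
        """Edges arriving at vertex_id, found by a single hash lookup."""
        return self._by_to.get(vertex_id, [])


index = EdgeIndex()
index.add({"_from": "persons/alice", "_to": "persons/bob"})
index.add({"_from": "persons/carol", "_to": "persons/bob"})

print(len(index.inbound("persons/bob")))     # 2
print(len(index.outbound("persons/alice")))  # 1
```

Finding the edges of a vertex is one dictionary lookup, independent of the total number of edges, so each step of a traversal stays cheap as the graph grows.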
Vertices and edges are both full JSON documents and can hold arbitrary data. With this approach, combined with the edge index, ArangoDB is one of the few graph databases capable of horizontal scaling. Each edge and vertex can contain complex data in the form of nested properties, and all graph functions are deeply integrated into the ArangoDB Query Language (AQL).
Graph Database Features
ArangoDB supports document, graph, and key/value data models. Due to this natively integrated support, users can also take the result of a JOIN operation, geospatial query, text search or any other access pattern as a starting point for further graph analysis and vice versa – all in one query, if needed. This is an advantage of a native multi-model database like ArangoDB.
A graph can be visualized and manipulated directly within the ArangoDB WebUI. The WebUI provides many configurations for displaying edges and vertices. Here is a view of the IMDB dataset with the search depth set to 4, results limited to 300, the edge visualization type set to curved, and custom vertex and edge labels. This gives a quick view of genres, movies in those genres, and actors who played in those movies.
A nice feature of the Graph Viewer is the ability to select a node and set it as your start node. Here we chose James Cameron as the start node and can now see the movies he was involved in and then, depending on the depth set, further relationships from there. In this example, we see that he directed both Avatar and Titanic, which in this dataset are both classified as Action movies, and we can also see other Action movies.
We provide this functionality out of the box to make visualizing your data easy. If you’re interested in learning how to access the graph capabilities of ArangoDB, the ArangoDB Graph Course is a great place to start.
Scaling with Graphs
As your application grows, chances are the size of your graph will grow along with it. To make sure graph traversals stay as performant as possible, even when the graph is sharded across multiple servers in a cluster, ArangoDB provides a solution in the form of SmartGraphs.
The primary hit to performance comes from network latency. As shown below, when doing a traversal with data sharded across a cluster, multiple back and forth network hops are usually necessary and this can have a large impact on performance. For larger datasets this is a common situation and SmartGraphs reduces the needed network hops by intelligently sharding data.
In many datasets there are highly interconnected communities with few connections between them. Whatever logic you apply at the application layer to organize your graph into such communities, for instance by customer or region, can in turn be used to shard the graph across the cluster.
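The effect of community-aware sharding can be shown with a toy calculation in Python. This is not how SmartGraphs are implemented. The customers, regions, and both placement functions are assumptions, and the sketch simply counts how many edges would cross a shard boundary under each placement.

```python
# Each vertex belongs to a community, here a hypothetical region.
VERTICES = {
    "c1": "EU", "c2": "EU", "c3": "EU",
    "c4": "US", "c5": "US", "c6": "US",
}
# Many edges inside each community, only one between them.
EDGES = [
    ("c1", "c2"), ("c2", "c3"), ("c1", "c3"),
    ("c4", "c5"), ("c5", "c6"),
    ("c3", "c4"),
]
ORDER = list(VERTICES)


def by_community(vertex):
    """Place every vertex of a community on the same shard."""
    return VERTICES[vertex]


def round_robin(vertex, shards=2):
    """Community-unaware placement: alternate vertices between shards."""
    return ORDER.index(vertex) % shards


def cross_shard_edges(shard_of):
    """Count edges whose endpoints land on different shards."""
    return sum(1 for a, b in EDGES if shard_of(a) != shard_of(b))


print(cross_shard_edges(round_robin))   # 5
print(cross_shard_edges(by_community))  # 1
```

Every cross-shard edge is a potential network hop during a traversal, so keeping communities together leaves only the rare connections between communities to cross the network.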