ArangoML Series: Intro to NetworkX Adapter
This is the fifth post in a series introducing ArangoML features and tools. It introduces the NetworkX adapter, which makes it easy to analyze graphs stored in ArangoDB with NetworkX.
In this post we:
- Briefly introduce NetworkX
- Explore the IMDB user rating dataset
- Showcase the ArangoDB integration of NetworkX
- Explore the centrality measures of the data using NetworkX
- Store the experiment with arangopipe
This notebook is just a slice of the full-sized notebook available in the ArangoDB NetworkX adapter repository. It is summarized here to better fit the blog post format and provide a quick introduction to using the NetworkX adapter.
Posts in this series:
ArangoML Part 1: Where Graphs and Machine Learning Meet
ArangoML Part 2: Basic Arangopipe Workflow
ArangoML Part 3: Bootstrapping and Bias Variance
ArangoML Part 4: Detecting Covariate Shift in Datasets
ArangoML Series: Intro to NetworkX Adapter
ArangoML Series: Multi-Model Collaboration
Going forward, the ArangoML series will typically not be numbered. We still want to provide machine learning content for both new and experienced developers, just without the expectation that each post builds on the previous one.
NetworkX
NetworkX is a robust graph analysis package that is still actively maintained by a large community of developers. NetworkX defines itself as:
“… a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.”
If your data has complex relationships that you want to query in an ad-hoc manner, the graph data model is a good fit. Running ad-hoc analytic queries on such data can be more efficient with a graph representation. With a relational representation, you can only optimize performance for queries that are known beforehand (with indexes), so ad-hoc queries may require multiple joins and perform poorly. In contrast, a query on a graph database for such data starts at a node and traverses a few edges. Many real-world graphs exhibit the so-called small-world effect: most nodes can be reached from most other nodes with a small number of edge traversals, so an ad-hoc query starting at any node can reach the node of interest in a few hops.
ArangoDB has many built-in graph analysis functions (Graph Module, AQL) that make it easy to execute ad-hoc queries and develop data transformations. The NetworkX adapter now makes it easy to mine this data, obtain insights, and develop analytic applications using NetworkX.
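To make the small-world effect concrete, here is a minimal sketch using only NetworkX. It compares a regular ring lattice with a Watts-Strogatz graph in which about 10% of the edges are rewired into random shortcuts. The graph sizes, the rewiring probability, and the variable names are assumptions picked for illustration; they are not taken from the IMDB dataset or the original notebook.

```python
import networkx as nx

# Hypothetical sizes chosen only to keep the example fast.
NODES = 500       # number of nodes in each graph
NEIGHBORS = 6     # each node starts linked to its 6 nearest ring neighbors

# p=0 keeps the regular ring lattice: no shortcuts, only local edges.
lattice = nx.watts_strogatz_graph(NODES, NEIGHBORS, p=0, seed=42)

# p=0.1 rewires roughly 10% of the edges into random long-range shortcuts.
# The "connected_" variant retries until the result is a connected graph.
small_world = nx.connected_watts_strogatz_graph(NODES, NEIGHBORS, p=0.1, seed=42)

# Average number of edge traversals on the shortest path between two nodes.
lattice_hops = nx.average_shortest_path_length(lattice)
small_world_hops = nx.average_shortest_path_length(small_world)

print(f"ring lattice: {lattice_hops:.1f} hops on average")
print(f"small world:  {small_world_hops:.1f} hops on average")
```

Both graphs have the same number of nodes and edges, yet the handful of shortcuts sharply reduces the average number of hops. This is the property that lets a traversal starting at any node reach a node of interest quickly.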
NetworkX is well established in the data science community and is the de facto graph format for many third-party tools and libraries. Being able to use your ArangoDB graph data as a NetworkX graph unlocks convenient access to the tools that work with NetworkX graphs.
Offering an integration with NetworkX supercharges your graph analytics capabilities and provides the best of both worlds for data scientists. You can now benefit from the fast and flexible storage and query capabilities of ArangoDB while easily converting your ArangoDB graphs into NetworkX graphs when you need to, thanks to our NetworkX adapter.
In the following notebook, we will see how to generate a NetworkX graph from an ArangoDB graph. Once we have a NetworkX graph, we will look at the common task of finding centrality measures within the graph. We will wrap everything up by storing this experiment in arangopipe.
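As a rough preview of the centrality step, the sketch below computes two common centrality measures with NetworkX. It does not use the adapter: the small user-to-movie graph is a hand-built, hypothetical stand-in for a graph loaded from ArangoDB, and the node names and ratings are made up for illustration.

```python
import networkx as nx

# Hypothetical stand-in for a user-rating graph loaded from ArangoDB.
# Each tuple is (user, movie, rating); none of this is real IMDB data.
ratings = [
    ("user/1", "movie/A", 5),
    ("user/1", "movie/B", 3),
    ("user/2", "movie/A", 4),
    ("user/3", "movie/A", 2),
    ("user/3", "movie/C", 5),
    ("user/4", "movie/B", 4),
]

# Build an undirected graph with the rating stored as an edge attribute.
graph = nx.Graph()
for user, movie, rating in ratings:
    graph.add_edge(user, movie, rating=rating)

# Degree centrality: fraction of the other nodes each node is linked to.
degree = nx.degree_centrality(graph)

# Betweenness centrality: how often a node lies on shortest paths
# between other pairs of nodes.
betweenness = nx.betweenness_centrality(graph)

# Rank nodes by each measure, highest first.
by_degree = sorted(degree, key=degree.get, reverse=True)
by_betweenness = sorted(betweenness, key=betweenness.get, reverse=True)

print("highest degree centrality:     ", by_degree[0])
print("highest betweenness centrality:", by_betweenness[0])
```

In this toy graph, movie/A ranks first on both measures because three of the four users rated it and it connects otherwise separate parts of the graph. On a real dataset the two measures often disagree, which is why it is worth comparing several of them.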
Check it out on GitHub.