ArangoDB NetworkX Persistence Layer
FAQ
Why did NVIDIA and ArangoDB partner to build this integration?
This integration lets you save the graphs you create with NetworkX into ArangoDB so you can store, manage, and analyze them later without rebuilding them in every session. ArangoDB handles very large, complex graphs well and lets you query them in different ways, including alongside other types of data such as documents, full-text, and key-value pairs. Using cuGraph to accelerate analytic processing of the persisted data can also bring substantial performance improvements.
Who is the ideal user of the integration?
If you’re someone who works with graphs using NetworkX and needs a reliable and secure place to store and manage them, this integration is for you. It’s great for data scientists, data engineers, software engineers, and machine learning engineers and scientists who want to take their graph work to the next level by using ArangoDB’s graph database features.
Are there any costs involved?
Using this integration doesn’t have extra costs if you’re sticking with the open-source Community version of ArangoDB. However, if you need more advanced features or support, you might need to consider other ArangoDB versions:
- ArangoDB’s Enterprise Edition, which has a cost associated with it. Users of this integration receive special discounts on ArangoDB Enterprise. To inquire about this special pricing, please contact us here.
- ArangoGraph, ArangoDB’s Managed Service offering. There is a 14-day free trial that you can use in conjunction with this integration, after which you can pay month-to-month by credit card.
Also, consider any costs for the infrastructure you run it on, like cloud servers.
Are there any version restrictions for NetworkX, cuGraph, or ArangoDB?
The ArangoDB NetworkX Persistence Layer requires the following dependencies:
- Python >= 3.10
- NetworkX >= 3.3
- ArangoDB >= 3.10
- [Optional] NetworkX-cuGraph, which additionally requires:
  - NVIDIA GPU, Volta architecture or later, with compute capability 7.0+
  - CUDA 11.2, 11.4, 11.5, 11.8, or 12.0
  - Python 3.10 or 3.11
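As a rough illustration, the Python-side requirements above can be checked with the standard library alone. This is a minimal sketch, not part of the integration: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the distribution name `networkx` is assumed to be what pip reports. The ArangoDB server version, the GPU, and CUDA cannot be verified this way.

```python
import sys
from importlib import metadata

# Minimums copied from the requirements list; the distribution name
# used as the dictionary key is an assumption about the pip package name.
MIN_PYTHON = (3, 10)
MIN_VERSIONS = {"networkx": "3.3"}


def parse_version(text):
    """Turn '3.10.1' into (3, 10, 1), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")
    for dist, minimum in MIN_VERSIONS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{dist} {installed} is older than {minimum}")
    return problems
```

Calling `check_environment()` before running a workload gives a quick list of anything missing on the Python side.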
How do I get started with the integration? What do I need to download and install, and from where? What are the steps?
To get started, you’ll need to have ArangoDB installed (you can get it from the ArangoDB website). You’ll also need to have Python and NetworkX installed, which you can get through pip. The steps are pretty straightforward:
- Install ArangoDB on premise or get started with the ArangoGraph managed service.
- Set up your database.
- Install the NetworkX package.
- Install the NetworkX-ArangoDB package.
- Optional: Install the NetworkX-cuGraph package for GPU-accelerated analytics.
- Use the NetworkX-ArangoDB package to move your graphs from NetworkX into ArangoDB.
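The steps above can be sketched in Python roughly as follows. This is a hedged sketch, not a definitive recipe: it assumes the package is installed with `pip install networkx nx-arangodb`, that it is imported as `nx_arangodb`, that it reads its connection settings from the `DATABASE_HOST`, `DATABASE_USERNAME`, `DATABASE_PASSWORD`, and `DATABASE_NAME` environment variables, and that `nxadb.Graph` accepts `incoming_graph_data` and `name`. Check the package's own documentation before relying on any of these. The helper names and the default host and credentials are hypothetical.

```python
import os


def build_connection_env(host="http://localhost:8529", username="root",
                         password="", database="_system"):
    """Collect connection settings under the variable names that
    nx-arangodb is assumed to read (an assumption, see above)."""
    return {
        "DATABASE_HOST": host,
        "DATABASE_USERNAME": username,
        "DATABASE_PASSWORD": password,
        "DATABASE_NAME": database,
    }


def persist_example_graph(graph_name="KarateClub", env=None):
    """Build a small NetworkX graph and persist it to ArangoDB.

    The imports are local so this module loads even when the optional
    packages are missing. A reachable ArangoDB instance is required.
    """
    import networkx as nx
    import nx_arangodb as nxadb  # assumed import name

    os.environ.update(env or build_connection_env())

    G_nx = nx.karate_club_graph()  # sample graph shipped with NetworkX
    # Assumed constructor arguments: wraps the in-memory graph and
    # writes it to the database under `graph_name`.
    return nxadb.Graph(incoming_graph_data=G_nx, name=graph_name)
```

With a database running, `persist_example_graph(env=build_connection_env(password="..."))` would return a graph object backed by ArangoDB.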
What are the infrastructure and/or computing environment requirements or best practices to configure, run, and support this integration?
You’ll need a machine or server that can handle ArangoDB and whatever size of graph data you’re working with, or ArangoDB’s ArangoGraph managed service. If you’re working with extremely large datasets, you might want to consider using a server with plenty of RAM and CPU power, or even GPU support if you’re using cuGraph. Also, make sure your environment is set up with Python and the necessary libraries installed.
The ArangoDB Community Edition can be used on a personal machine to get started quickly with ArangoDB. For example, using the ArangoDB Docker image can be one of the easiest ways to get started.
Has the integration been tested and validated?
Yes, the integration has been tested in various environments to ensure it works well with NetworkX and ArangoDB. However, it’s always a good idea to test it in your own setup to make sure everything runs smoothly.
What if I want to use cuGraph in conjunction with this integration? How do I set that up?
To use cuGraph with this integration, you'll need a compatible NVIDIA GPU and the cuGraph library from NVIDIA's RAPIDS suite. Install cuGraph, ensure your system is configured with CUDA, and then transfer your graph data from ArangoDB to cuGraph for GPU-accelerated processing. You can store the results back in ArangoDB once you have run computations in cuGraph. This way you can efficiently handle large graph datasets with the power of GPU acceleration.
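One way to picture this flow is through NetworkX's backend dispatch, which lets an algorithm call be routed to cuGraph with the `backend` keyword. The sketch below is an illustration under assumptions: the function names are hypothetical, the GPU backend is assumed to be importable as `nx_cugraph`, and `G` is assumed to be a NetworkX-compatible graph. Whether a graph object from the persistence layer can be passed directly, and whether attribute writes are saved back to ArangoDB automatically, should be confirmed in the package documentation.

```python
import importlib.util


def pick_backend():
    """Return 'cugraph' if the nx-cugraph backend is importable, else None."""
    return "cugraph" if importlib.util.find_spec("nx_cugraph") else None


def pagerank_with_best_backend(G):
    """Run PageRank on the GPU when available, otherwise on the CPU."""
    import networkx as nx  # local import keeps this module loadable without NetworkX

    backend = pick_backend()
    if backend is None:
        return nx.pagerank(G)
    return nx.pagerank(G, backend=backend)


def store_scores(G, scores, attribute="pagerank"):
    """Attach each score to its node as an attribute.

    With a database-backed graph this is assumed to write the value
    through to ArangoDB; with a plain NetworkX graph it stays in memory.
    """
    for node, score in scores.items():
        G.nodes[node][attribute] = score
```

A typical call sequence would be `scores = pagerank_with_best_backend(G)` followed by `store_scores(G, scores)`.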
What are the performance improvements I should expect with GPU acceleration vs. CPU?
If you’re using cuGraph along with ArangoDB, you can expect significant speed-ups when processing large graphs compared to using just a CPU. The GPU can handle massive amounts of data much faster, which is particularly useful for complex graph algorithms. Speed-ups of up to 100x have been observed, though actual gains depend on the algorithm, the size of the graph, and the hardware.
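Because the gain varies by workload, it can help to measure it on your own data. The following is a small, generic timing sketch using only the standard library; the function names are hypothetical and nothing here is specific to cuGraph or ArangoDB.

```python
import time


def best_time(fn, repeats=3):
    """Run fn several times and return the fastest wall-clock time in seconds.

    Taking the best of several runs reduces the effect of one-off costs
    such as warm-up or data transfer on the first call.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds
```

For example, you could compare `best_time(lambda: nx.pagerank(G))` with `best_time(lambda: nx.pagerank(G, backend="cugraph"))` on the same graph and pass the two results to `speedup`.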
How are the multi-model capabilities of ArangoDB leveraged as part of this integration?
ArangoDB’s multi-model capabilities allow you to store your graph data alongside other types of data, like documents, full-text, or key/value pairs, all in the same database. This means you can run powerful queries that involve not just your graph, but also other related data, making it easier to analyze everything in one place. Learn more here about multi-model, one of ArangoDB’s most meaningful differentiators.
What else can I do with the persisted data once it is in ArangoDB?
Once your graph data is in ArangoDB, you can do a lot with it. You can run powerful AQL queries, combine it with other data types, visualize relationships, and even integrate it with other applications. The data is stored persistently, so you can keep building on it over time without starting from scratch.
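As an example of the AQL side, the sketch below builds a graph traversal query and runs it with the python-arango driver (`ArangoClient`, `db.aql.execute`). The helper names are hypothetical, and so are the graph name and vertex ID used in the usage note: the collection names that the persistence layer actually creates may differ, so look them up in your database first.

```python
def neighbors_query(graph_name, start_vertex_id, max_depth=2):
    """Build an AQL traversal that lists the keys of all vertices within
    max_depth hops of the start vertex, plus its bind variables."""
    depth = int(max_depth)  # inlined as a literal, so force it to an integer
    if depth < 1:
        raise ValueError("max_depth must be at least 1")
    query = (
        f"FOR v IN 1..{depth} ANY @start GRAPH @graph\n"
        "    RETURN DISTINCT v._key"
    )
    bind_vars = {"start": start_vertex_id, "graph": graph_name}
    return query, bind_vars


def run_query(query, bind_vars, host="http://localhost:8529",
              database="_system", username="root", password=""):
    """Execute the query with python-arango and return the results as a list.

    Requires `pip install python-arango` and a reachable ArangoDB instance.
    """
    from arango import ArangoClient  # local import: optional dependency

    db = ArangoClient(hosts=host).db(database, username=username, password=password)
    return list(db.aql.execute(query, bind_vars=bind_vars))
```

For instance, `run_query(*neighbors_query("KarateClub", "KarateClub_node/0"))` would return the keys of vertices up to two hops from that vertex, assuming a graph and vertex with those hypothetical names exist.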
Are there any data security considerations or benefits when using ArangoDB as a persistent graph data store?
ArangoDB offers several security features like encryption, access controls, and auditing, which help keep your data secure. If you’re working with sensitive data, these features are definitely something to take advantage of.
How can persisting graph data within ArangoDB enable me to run graph-based applications in a production environment?
Persisting your graph data in ArangoDB makes it easier to build and maintain graph-based applications because your data is stored in a reliable, scalable database. You can integrate it with other parts of your applications, run queries efficiently, and ensure that everything stays up and running smoothly.
How do I scale out this architecture for larger and larger graph datasets?
To scale out, you can take advantage of ArangoDB’s clustering capabilities, which allow you to dynamically distribute your data across multiple nodes. This helps handle larger datasets by balancing the load and ensuring your system remains responsive as you scale up. ArangoDB handles horizontal scaling in a way no other graph database can. In fact, ArangoDB’s approach to scaling (both vertically and horizontally) makes it a strong fit for NetworkX persistence when working with very large graph datasets.