How ArangoGraphML Leverages Intel’s PyG Optimizations

ArangoGraphML + Intel: Next-level Machine Learning Accelerated

ArangoDB and Intel have announced a partnership to enhance Graph Machine Learning (GraphML) using Intel's high-performance processors. This collaboration, part of the Intel Disruptor Program, aims to integrate ArangoDB's graph database solutions with Intel's Xeon CPUs. The goal is faster data analytics and pattern recognition in complex graph structures, and a step forward for both database technology and GraphML.

ArangoGraphML

ArangoGraphML, part of ArangoDB's suite, is an advanced graph machine learning platform designed for efficient data analysis and pattern recognition in complex graph structures, leveraging graph database technology to drive innovation in data intelligence and analytics.

Machine Learning Performance Challenge

The quest for speed in machine learning platforms is unending. By delving into Intel’s PyG optimizations, we aim to harness CPU performance enhancements specifically tailored for Graph Neural Network and PyG (PyTorch Geometric) workloads. Because ArangoGraphML uses PyG, any performance improvement there is relevant for us and our customers. This exploration covers both benchmarking Intel’s PyG optimizations and internal testing to measure their impact on our platform.

PyG benchmark

Our focus is on measuring the performance of GraphML algorithms within our platform using torch.compile. Comparing runs in eager mode and compiled mode lets us assess the efficiency gains that Intel’s PyG optimizations bring to training and inference, and shows the tangible benefits for our users.
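As a rough illustration of how such a comparison can be timed, the sketch below measures the median wall-clock time per epoch for any callable. This is not the actual benchmark code: `median_epoch_time`, `dummy_epoch`, and the untimed warm-up epoch are hypothetical names and choices made for this example, and the placeholder workload stands in for a real PyG training epoch run either eagerly or through `torch.compile`.

```python
import statistics
import time
from typing import Callable


def median_epoch_time(run_epoch: Callable[[], None], epochs: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock seconds per epoch for run_epoch."""
    # Warm-up epochs run but are not timed. With a compiled model the first
    # call usually includes compilation, which would skew the measurement.
    # (Assumption: whether the original benchmark excluded warm-up is not stated.)
    for _ in range(warmup):
        run_epoch()

    timings = []
    for _ in range(epochs):
        start = time.perf_counter()
        run_epoch()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def dummy_epoch() -> None:
    # Hypothetical placeholder workload standing in for one training epoch.
    sum(i * i for i in range(10_000))


eager_time = median_epoch_time(dummy_epoch)
# In a real run, the second callable would train a model wrapped with
# torch.compile; the same placeholder is reused here to keep the sketch
# free of external dependencies.
compiled_time = median_epoch_time(dummy_epoch)
print(f"eager: {eager_time:.6f}s  compiled: {compiled_time:.6f}s")
```

Using the median rather than the mean keeps a single slow epoch from dominating the result, which matches the "median time per epoch" metric reported in the results.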

Benchmark methodology

To ensure a robust evaluation, we conducted tests under controlled conditions:

  • System Specifications: We used an AWS EC2 t2.2xlarge instance with 8 vCPUs and 32 GiB of RAM.
  • Dataset: We used the ogbn-products dataset, a large-scale undirected and unweighted graph representing an Amazon product co-purchasing network. The task is to predict the category of a product in a multi-class classification setup, where the 47 top-level categories are used as target labels. Its scale and origin make it representative of real-world scenarios.
  • Batch Size, Hidden Channels, and Number of Layers: We varied these essential hyperparameters when evaluating the performance of GraphML algorithms.
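One way to keep such runs organized is to enumerate every hyperparameter setting in both execution modes. The sketch below does this for the three settings that appear in the results table. `BenchmarkConfig`, `SETTINGS`, and `MODES` are hypothetical names introduced for illustration, not part of ArangoGraphML or PyG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """One benchmark run (hypothetical structure for illustration)."""
    batch_size: int
    hidden_channels: int
    num_layers: int
    mode: str  # "eager" or "compile"


# (batch size, hidden channels, layers) as listed in the results table.
SETTINGS = [(1024, 256, 2), (512, 64, 2), (512, 128, 3)]
MODES = ["eager", "compile"]

# Each setting is run once per mode so the two timings can be compared.
configs = [
    BenchmarkConfig(batch_size, hidden_channels, num_layers, mode)
    for batch_size, hidden_channels, num_layers in SETTINGS
    for mode in MODES
]

for cfg in configs:
    print(cfg)
```

Pairing each setting with both modes means the only variable that changes between the two runs in a pair is whether the model is compiled.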

The outcomes

In our preliminary assessments, we observed a noteworthy increase in performance, achieving a speedup of up to 20%. The gains were evident when comparing the execution times of GraphML algorithms with and without Intel’s PyG optimizations. The results are presented graphically in the chart below and summarized in the accompanying table.

[Chart: median time per epoch, Eager vs. Compile mode]

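| Batch Size | Hidden Channels | Layers | Mode    | Median Time per Epoch (in seconds) | Speedup |
|------------|-----------------|--------|---------|------------------------------------|---------|
| 1024       | 256             | 2      | Eager   | 153.803                            |         |
| 1024       | 256             | 2      | Compile | 134.106                            | 1.15x   |
| 512        | 64              | 2      | Eager   | 89.039                             |         |
| 512        | 64              | 2      | Compile | 98.714                             | 1.11x   |
| 512        | 128             | 3      | Eager   |                                    |         |
| 512        | 128             | 3      | Compile |                                    | 1.12x   |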
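For readers who want to reproduce the speedup column, the sketch below assumes the usual definition: eager time divided by compiled time. The helper names `speedup` and `time_reduction_percent` are hypothetical. The inputs are the two median times from the first configuration in the table (batch size 1024, 256 hidden channels, 2 layers).

```python
def speedup(eager_seconds: float, compiled_seconds: float) -> float:
    """Ratio of eager time to compiled time; values above 1 mean compile is faster."""
    if eager_seconds <= 0 or compiled_seconds <= 0:
        raise ValueError("timings must be positive")
    return eager_seconds / compiled_seconds


def time_reduction_percent(eager_seconds: float, compiled_seconds: float) -> float:
    """Percentage of per-epoch time saved relative to eager mode."""
    if eager_seconds <= 0 or compiled_seconds <= 0:
        raise ValueError("timings must be positive")
    return 100.0 * (eager_seconds - compiled_seconds) / eager_seconds


# Median seconds per epoch for batch size 1024, 256 hidden channels, 2 layers.
eager, compiled = 153.803, 134.106

print(f"speedup: {speedup(eager, compiled):.2f}x")                          # speedup: 1.15x
print(f"time saved: {time_reduction_percent(eager, compiled):.1f}%")        # time saved: 12.8%
```

Note that a speedup ratio and a percentage reduction in time are different measures: a 1.15x speedup corresponds to roughly 12.8% less time per epoch.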

Conclusion

With a demonstrated performance boost, we are now leveraging Intel’s PyG optimizations across our platform. This commitment aligns with our dedication to providing users with cutting-edge technology and optimized algorithms for their Graph Neural Network workflows.

As the field of machine learning continues to evolve, ArangoGraphML remains at the forefront, leveraging Intel’s PyTorch Geometric optimizations to ensure our users experience the fastest and most efficient ML platform available.

Stay tuned for further updates on our journey toward excellence in Graph Machine Learning!


ArangoDB’s Exciting Updates: Introducing Our Developer Hub and GenAI Bots!

Estimated reading time: 3 minutes

At ArangoDB, our commitment to empowering developers and data enthusiasts with cutting-edge tools and resources is unwavering. In line with our commitment to “Graph Done Simple,” we are thrilled to unveil two groundbreaking additions to our arsenal that promise to revolutionize your experience with our multi-model graph database.

Developer Hub: Where Knowledge Meets Accessibility

We’ve always believed in the power of community-driven knowledge sharing, and we are proud to present our brand-new Developer Hub, accessible at developer.arangodb.com. This hub is a testament to our dedication to creating an ecosystem that empowers you with the knowledge and resources you need.

(more…)

Evolving ArangoDB’s Licensing Model for a Sustainable Future

Estimated reading time: 3 minutes

ArangoDB as a company is firmly grounded in Open Source. The first commit was made in October 2011, and today, we are very proud of having over 13,000 stargazers on GitHub. We believe that the ArangoDB community should be able to enjoy all of the benefits of using ArangoDB, and we have always offered a completely free community edition in addition to our paid enterprise offering.

With the evolving landscape of database technologies and the imperative need to ensure ArangoDB remains sustainable, innovative, and competitive, we’re introducing some changes to our licensing model. These alterations will help us continue our commitment to the community, fuel further development, and assist businesses in obtaining the best from our platform.
These alterations are based on changes in the broader database market.

(more…)

ArangoGraph Now Available on AWS Marketplace

Estimated reading time: 1 minute

Today we are excited to announce that ArangoGraph, the ArangoDB Managed Service, is available for purchase in the AWS Marketplace. With this announcement, ArangoGraph can now be purchased directly via both AWS and GCP.

The AWS Marketplace provides an extensive catalog of software solutions for users to easily explore, test, buy, and deploy on AWS. If you’re an AWS customer, here’s what this announcement means for you:

(more…)

Bridging Knowledge and Language: ArangoDB Empowers Large Language Models for Real-World Applications

Estimated reading time: 5 minutes

Understanding Large Language Models (LLMs) and Knowledge Graphs

Today, two very different technology concepts have become prominent in data analysis and predictive analytics: Knowledge Graphs and Large Language Models (LLMs). These domains each have their unique benefits, and influence the ways that we engage with and derive meaningful insights from constantly expanding and complex datasets. They are like the Odd Couple – better together than on their own!

(more…)

Three Ways to Scale your Graph

Estimated reading time: 10 minutes

As businesses grow and their data needs increase, they often face the challenge of scaling their database systems to keep up with the increasing demand.

What happens when your single server machine is no longer sufficient to store your graph that has grown too large? Or when your instance can no longer cope with the increasing amount of user requests coming in?

Read more

May 2023: What’s the Latest with ArangoDB?

Estimated reading time: 4 minutes

Welcome to the May ArangoDB newsletter. Thank you for reading! 📖 

Here are some of the things we’re excited to share with you this month:

Read more

Graph and Entity Resolution Against Cyber Fraud

Estimated reading time: 4 minutes

With the growing prevalence of the internet in our daily lives, the risks of malware, ransomware, and other cyber fraud are rising. The digital nature of these attacks makes it very easy for fraudsters to scale by creating thousands of accounts, so even if one is identified, they can continue their attacks.
In this blog post, we will discuss how graph and entity resolution (ER) can help us battle these risks across different industries such as healthcare, finance, and e-commerce (for example, the US healthcare system alone can save $300 billion a year with entity resolution). You will also receive hands-on experience with entity resolution on ArangoDB.

Read more

Combat Fraud with Graph

Estimated reading time: 5 minutes

Fraud is one of the most significant issues facing businesses today. While companies have always faced fraud, detecting fraudulent activity has become even more challenging due to increased online transactions. Globally, fraud results in more than $3.7 trillion in annual losses (Murphy, 2022). Fraud comes in numerous forms, including but not limited to money laundering, identity theft, account takeover, and payment fraud. Due to the variety of ways companies can face fraud, they must have a system to protect themselves and their customers.

Read more

February 2023: What’s the Latest with ArangoDB?

Estimated reading time: 4 minutes

Welcome to the ArangoDB newsletter for February 2023. Thank you for reading! 📖 

Here are the things we’re most excited about this month:

Read more