The world is a graph: How Fix reimagines cloud security using a graph in ArangoDB
'Guest Blog'
Estimated reading time: 5 minutes
In 2015, John Lambert, a Corporate Vice President and Security Fellow at Microsoft, wrote: “Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win.”
The original problem in cloud security is visibility into my assets. If security engineers don’t know what cloud services are running, they can’t protect an environment. Unfortunately, first-generation cloud security products were built with a list mindset, i.e. “rows and columns”. They generate a list of assets and their configurations – but show no context of the relationships between connected cloud services, such as a connection that would allow lateral movement between two disparate cloud assets.
Cloud security as a graph
A graph database like ArangoDB provides a powerful way to represent and analyze complex relationships in cloud security.
A graph is the easiest way to understand how one entity in my cloud interacts with another. By representing cloud assets as nodes in a graph and the relationships between them as edges, I can gain a better understanding of the nested connections in my cloud infrastructure.
By thinking about cloud resources in terms of ancestors and descendants, a cloud security engineer can solve problems in a way a table canʼt. The graph is an easier way to visualize the relationships between users and any of my cloud resources such as compute instances, functions, storage buckets and databases.
- Ancestors: The graph helps me understand the root of a security issue. What is the highest ancestor where an issue was introduced? Because I need to go all the way up and fix the problem at its origin.
- Descendants: The other way around is understanding descendants and blast radius. If I have an Internet-exposed compute instance, and an attacker may be able to get credentials off that instance, how many hops can that attacker go? How much of my infrastructure is exposed due to this initial compromise?
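The two questions above map onto two traversals of a directed graph. Here is a minimal Python sketch, assuming a toy adjacency list with made-up resource names. It is not Fix's or ArangoDB's actual data model, only an illustration of the idea.

```python
from collections import deque

# Hypothetical toy graph: an edge points from a resource to the resources it leads to.
EDGES = {
    "account": ["vpc"],
    "vpc": ["instance"],
    "instance": ["iam_role"],
    "iam_role": ["bucket", "database"],
    "bucket": [],
    "database": [],
}

def descendants(edges, start, max_hops):
    """Breadth-first search: resources reachable from start within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in edges.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]
    return hops  # maps resource -> number of hops from start

def ancestors(edges, target):
    """Follow edges in reverse to collect every resource that leads to target."""
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    found, stack = set(), [target]
    while stack:
        for parent in reverse.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

blast_radius = descendants(EDGES, "instance", max_hops=2)
roots = ancestors(EDGES, "instance")
```

In this toy graph, `blast_radius` holds everything an attacker on the instance could reach in two hops, and `roots` holds the resources to inspect when looking for the origin of a problem.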
In a cloud-native world, these graph traversal capabilities are fundamental for cloud security. Going forward, any operating model for cloud security should be built on a graph. With Fix, weʼre building such a modern cloud security tool, and weʼre building it with ArangoDB.
But first, a list!
Now that we’ve covered the benefits of using a graph for cloud security, let’s start with a list. Yes, a list – because sometimes, viewing my cloud assets in a graph might not be the most intuitive or useful thing.
For example, I may just want a list of my compute instance inventory across my AWS accounts. As a cloud security engineer, I want a baseline inventory of resources. I don’t really need a picture for that, I just want the list. And maybe I want to download it in a spreadsheet so I can slice and dice it, with metadata for each particular instance like create date, number of vCPUs and memory. A list is the best way to represent that information.
But if a list is enough, why collect data in a graph in the first place?
Because transformation from a graph to a table is trivial. The other way around, not so much. The graph lets you express things that would become intractable if you had the same data in flat tables, with many different tables, foreign key relationships, and joins all over the place. It just becomes too difficult to reason about.
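Going from graph to table can be as small as a filter and a projection. The following sketch uses hypothetical node records and property names, chosen only for illustration, and turns the nodes of one kind into CSV rows.

```python
import csv
import io

# Hypothetical node records; the property names are illustrative only.
NODES = [
    {"id": "i-1", "kind": "instance", "account": "prod", "vcpus": 8, "memory_gib": 32},
    {"id": "i-2", "kind": "instance", "account": "dev", "vcpus": 2, "memory_gib": 4},
    {"id": "b-1", "kind": "bucket", "account": "prod"},
]

def to_csv(nodes, kind, columns):
    """Keep nodes of one kind and project the chosen properties into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for node in nodes:
        if node["kind"] == kind:
            writer.writerow(node)  # properties not listed in columns are dropped
    return buffer.getvalue()

inventory = to_csv(NODES, "instance", ["id", "account", "vcpus", "memory_gib"])
```

The reverse direction has no equally short counterpart, because the relationships between rows would have to be rebuilt from foreign keys first.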
The hard part is collecting data from cloud APIs and putting it into a graph form. Thatʼs much harder, takes time and is easy to get wrong. There are enough opportunities to make mistakes along the way, and create a representation thatʼs not correct or has bugs. Thatʼs why we believe transparency in how a cloud security product collects data matters. Both ArangoDB and Fix are open source. Our code shows how we collect and store data from cloud APIs in ArangoDB.
Graph-based analysis of cloud resources
The analysis layer of a graph is powerful because it can provide insights that tables cannot. One recent trend in security is that software engineers also take on security engineering tasks. They look after the security of their infrastructure, beyond infrastructure-as-code templates.
While Fix offers out-of-the-box visualizations and pre-built checks of compliance rules, we’ve also built a search syntax on top of the ArangoDB Query Language (AQL). With ArangoDB and AQL, I can store and query rich, nested JSON-like documents together with their edges. It’s also easier to attach metadata to the vertices, such as configuration data for a cloud resource, and to query it. By building our syntax on top of AQL, we’ve made Fix human-friendly. Developers can easily run ad-hoc checks of the security posture of their infrastructure.
For example, activating flow logs in your VPCs is considered a security best practice by AWS. The search below finds all AWS VPCs where flow logs are deactivated.
is(aws_vpc) with(empty, --> is(aws_ec2_flow_log))
Breaking it down, the search:
- first, finds all resources of the kind “aws_vpc”, no matter in which account or region they may run.
- then, filters for the VPCs without a direct relationship (successor) to an “aws_ec2_flow_log” resource.
A simple one-line statement.
The same query expressed in SQL would require joining different tables with nested select statements, multiple where-clauses and case statements. It would be dozens of lines long and require an engineer to have knowledge of the table architecture and column names.
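To make the semantics of the search above concrete, here is a rough Python equivalent over a tiny in-memory graph. The resource ids, the subnet kind and the dictionary layout are assumptions made for this sketch. It does not show how Fix evaluates the search internally.

```python
# Hypothetical in-memory graph; ids and layout are illustrative only.
KINDS = {
    "vpc-a": "aws_vpc",
    "vpc-b": "aws_vpc",
    "fl-1": "aws_ec2_flow_log",
    "subnet-1": "aws_ec2_subnet",
    "subnet-2": "aws_ec2_subnet",
}
SUCCESSORS = {
    "vpc-a": ["fl-1", "subnet-1"],
    "vpc-b": ["subnet-2"],
}

def vpcs_without_flow_logs(kinds, successors):
    """Return VPCs that have no direct successor of kind aws_ec2_flow_log."""
    result = []
    for node, kind in kinds.items():
        if kind != "aws_vpc":
            continue  # step 1: keep only resources of kind aws_vpc
        direct = successors.get(node, [])
        # step 2: keep the VPC only if none of its direct successors is a flow log
        if not any(kinds[s] == "aws_ec2_flow_log" for s in direct):
            result.append(node)
    return result

unlogged = vpcs_without_flow_logs(KINDS, SUCCESSORS)
```

Here `unlogged` contains only the VPC that has no flow log attached, which is the set the one-line search returns.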
The power of a graph is that it lets you explore many-to-many relationships easily, in a way that a traditional row-based database just can’t. By making security data from cloud resources available in a graph, software engineers with security responsibilities can gain visibility into the environment and reduce risks.
A graph provides context, context is king
The partnership between Fix and the ArangoDB team has brought our customers new security insights only made possible by the multi-dimensional relations of cloud resources stored in a graph. With ArangoDB, using graphs is no longer a complex computer science and operational challenge. For Fix, ArangoDB provides a graph database as a building block that makes it easy to store and query the relationships in your data.
Fix uses ArangoDB to analyze billions of relationships – in every cloud. With ArangoDB, we’ve been able to build a system that can ingest data at scale. One of our retail users ingests data from tens of thousands of cloud accounts in minutes, and then runs any type of analytics in a fraction of a second. The context of the graph helps security engineers to precisely answer questions and identify, prioritize and remediate risks – the “trifecta” of cloud security.
The precision, speed, and explainability of finding risks to your business is simply not possible without using a graph. When defenders can think in graphs, attackers lose.
Reintroducing the ArangoDB-RDF Adapter
Introducing ArangoDB’s Data Loader : Revolutionizing Your Data Migration Experience
Estimated reading time: 7 minutes
At ArangoDB, our commitment to empowering companies, developers, and data enthusiasts with cutting edge tools and resources remains unwavering. Today, we’re thrilled to unveil our latest innovation, the Data Loader, a game-changing feature designed to simplify and streamline the migration of relational databases to ArangoGraph. Let’s dive into what makes Data Loader a must-have tool for your data migration needs.
How ArangoGraphML Leverages Intel’s PyG Optimizations
ArangoGraphML + Intel: Next-level Machine Learning Accelerated
ArangoDB and Intel have announced a groundbreaking partnership to enhance Graph Machine Learning (GraphML) using Intel's high-performance processors. This collaboration, part of the Intel Disruptor Program, will seek to integrate ArangoDB's graph database solutions with Intel's Xeon CPU. This synergy promises to revolutionize data analytics and pattern recognition in complex graph structures, marking a new era in database technology and GraphML advancements.
ArangoGraphML
ArangoGraphML, part of ArangoDB's suite, is an advanced graph machine learning platform designed for efficient data analysis and pattern recognition in complex graph structures, leveraging graph database technology to drive innovation in data intelligence and analytics.
Machine Learning Performance Challenge
The quest for speed in machine learning platforms is unending. By delving into Intel’s PyG optimizations, we aim to harness the power of CPU performance enhancements specifically tailored for Graph Neural Network and PyG workloads. As ArangoGraphML is leveraging PyG, any performance improvement is relevant for us and our customers. This exploration is not only about benchmarking Intel’s PyG optimizations but also about internal testing to measure their impact on our platform.
PyG benchmark
Our focus lies on gauging the performance of GraphML algorithms within our platform using torch.compile. This method allows us to assess the efficiency gains brought about by Intel’s PyG optimizations during the training and inference time, providing insights into the tangible benefits for our users.
Benchmark methodology
To ensure a robust evaluation, we conducted tests under controlled conditions:
- System Specifications: We used an AWS EC2 instance, specifically a t2.2xlarge with 8 vCPUs and 32 GiB RAM.
- Dataset: We used the ogbn-products dataset, a large-scale, undirected and unweighted graph representing an Amazon product co-purchasing network. The task is to predict the category of a product in a multi-class classification setup, where the 47 top-level categories are used as target labels. This makes the dataset relevant to real-world scenarios.
- Batch Size, Hidden Channels, and Number of Layers: We experimented with different values of these essential hyper-parameters when evaluating the performance of GraphML algorithms.
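One way to obtain a median time per epoch is to time each epoch separately and take the median of the measurements. The sketch below is an assumption about how such a harness could look, not the benchmark code that was actually used. The function names, the warm-up epoch and the stand-in workload are all hypothetical. In a real run, the callable would train one epoch, once with the eager model and once with the model wrapped by torch.compile.

```python
import statistics
import time

def median_epoch_time(run_epoch, epochs=5, warmup=1):
    """Time each call of run_epoch() and return the median over the measured epochs."""
    for _ in range(warmup):
        run_epoch()  # warm-up epochs are not measured (assumption: hides one-off start-up cost)
    timings = []
    for _ in range(epochs):
        start = time.perf_counter()
        run_epoch()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def dummy_epoch():
    # Stand-in workload; a real benchmark would run one training epoch here.
    sum(i * i for i in range(10_000))

median_seconds = median_epoch_time(dummy_epoch, epochs=3)
```

Using the median rather than the mean keeps a single slow epoch from dominating the result.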
The outcomes
In our preliminary assessments, we observed a noteworthy increase in performance, achieving a speedup of up to 20%. The gains were evident when comparing the execution times of GraphML algorithms with and without Intel’s PyG optimizations. The results are presented graphically in the chart below and summarized in the accompanying table.

Batch Size | Hidden Channels | Layers | Mode | Median Time per Epoch (in seconds) | Speed up |
---|---|---|---|---|---|
1024 | 256 | 2 | Eager | 153.803 | |
1024 | 256 | 2 | Compile | 134.106 | 1.15x |
512 | 64 | 2 | Eager | 89.039 | |
512 | 64 | 2 | Compile | 98.714 | 1.11x |
512 | 128 | 3 | Eager | | |
512 | 128 | 3 | Compile | | 1.12x |
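As a reading aid, the sketch below assumes the speed-up figure is the eager median divided by the compile median. That definition is not stated in the post, but it reproduces the 1.15x reported for the first pair of rows. The function name is hypothetical.

```python
def speedup(eager_seconds, compiled_seconds):
    """Ratio of eager to compiled time; values above 1 mean compile mode is faster."""
    return eager_seconds / compiled_seconds

# Median times per epoch from the first pair of rows in the table (1024 / 256 / 2).
ratio = speedup(153.803, 134.106)
label = f"{ratio:.2f}x"
```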
Conclusion
With a demonstrated performance boost, we are now leveraging Intel’s PyG optimizations across our platform. This commitment aligns with our dedication to providing users with cutting-edge technology and optimized algorithms for their Graph Neural Network workflows.
As the field of machine learning continues to evolve, ArangoGraphML remains at the forefront, leveraging Intel’s PyTorch Geometric optimizations to ensure our users experience the fastest and most efficient ML platform available.
Stay tuned for further updates on our journey toward excellence in Graph Machine Learning!
ArangoDB’s Exciting Updates: Introducing Our Developer Hub and GenAI Bots!
Estimated reading time: 3 minutes
At ArangoDB, our commitment to empowering developers and data enthusiasts with cutting-edge tools and resources is unwavering. In line with our commitment to “Graph Done Simple,” we are thrilled to unveil two groundbreaking additions to our arsenal that promise to revolutionize your experience with our multi-model graph database.
Developer Hub: Where Knowledge Meets Accessibility
We’ve always believed in the power of community-driven knowledge sharing, and we are proud to present our brand-new Developer Hub, accessible at developer.arangodb.com. This hub is a testament to our dedication to creating an ecosystem that empowers you with the knowledge and resources you need.
Evolving ArangoDB’s Licensing Model for a Sustainable Future
Estimated reading time: 3 minutes
ArangoDB as a company is firmly grounded in Open Source. The first commit was made in October 2011, and today, we are very proud of having over 13,000 stargazers on GitHub. We believe that the ArangoDB community should be able to enjoy all of the benefits of using ArangoDB, and we have always offered a completely free community edition in addition to our paid enterprise offering.
With the evolving landscape of database technologies and the imperative need to ensure ArangoDB remains sustainable, innovative, and competitive, we’re introducing some changes to our licensing model. These alterations will help us continue our commitment to the community, fuel further development, and assist businesses in obtaining the best from our platform.
These alterations are based on changes in the broader database market.
ArangoGraph Now Available on AWS Marketplace
Estimated reading time: 1 minute
Today we are excited to announce that ArangoGraph, the ArangoDB Managed Service, is available for purchase in the AWS Marketplace. With this announcement, ArangoGraph can now be purchased directly via both AWS and GCP.
The AWS Marketplace provides an extensive catalog of software solutions for users to easily explore, test, buy, and deploy on AWS. If you’re an AWS customer, here’s what this announcement means for you:
Bridging Knowledge and Language: ArangoDB Empowers Large Language Models for Real-World Applications
Estimated reading time: 5 minutes
Understanding Large Language Models (LLMs) and Knowledge Graphs
Today, two very different technology concepts have become prominent in data analysis and predictive analytics: Knowledge Graphs and Large Language Models (LLMs). These domains each have their unique benefits, and influence the ways that we engage with and derive meaningful insights from constantly expanding and complex datasets. They are like the Odd Couple – better together than on their own!
Three Ways to Scale your Graph
Estimated reading time: 10 minutes
As businesses grow and their data needs increase, they often face the challenge of scaling their database systems to keep up with the increasing demand.
What happens when your single server machine is no longer sufficient to store your graph that has grown too large? Or when your instance can no longer cope with the increasing amount of user requests coming in?
May 2023: What’s the Latest with ArangoDB?
Estimated reading time: 4 minutes
Welcome to the May ArangoDB newsletter. Thank you for reading! 📖
Here are some of the things we’re excited to share with you this month:
- Our upcoming webinar on ArangoDB 3.11
- Combatting fraud with graph
- How Finite State uses ArangoDB to address cyber threats
- The latest case study with Global Relay
- Our five new driver tutorials in ArangoDB University
- ArangoGraph, our cloud-based graph data and analytics platform
- Rewarding you with a $25 Amazon gift card