The world is a graph: How Fix reimagines cloud security using a graph in ArangoDB
'Guest Blog'
Estimated reading time: 5 minutes
In 2015, John Lambers, a Corporate Vice President and Security Fellow at Microsoft wrote “Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win.ˮ
The original problem in cloud security is visibility into my assets. If security engineers donʼt know what cloud services are running, they canʼt protect an environment. Unfortunately, first generation cloud security products were built with a list mindset, i.e. “rows and columnsˮ. They generate a list of assets and their configurations – but show no context of the relationships between connected cloud services, such as as a connection that would allow lateral movement between two disparate cloud assets.
Cloud security as a graph
A graph database like ArangoDB provides a powerful way to represent and analyze complex relationships in cloud security.
A graph is the easiest way to understand how one entity in my cloud interacts with another. By representing cloud assets as nodes in a graph and the relationships between them as vertices, I can now gain a better understanding of the nested connections in my cloud infrastructure.
By thinking about cloud resources in terms of ancestors and descendants, a cloud security engineer can solve problems in a way a table canʼt. The graph is an easier way to visualize the relationships between users and any of my cloud resources such as compute instances, functions, storage buckets and databases.
- Ancestors: The graph helps me understand the root of a security issue. What is the highest ancestor where an issue was introduced? Because I need to go all the way up and fix the problem at its origin.
- Descendants: The other way around is understanding descendants and blast radius. If I have an Internet-exposed compute instance, where an attacker is maybe able to get credentials off that instance, how many hops can that attacker go in? How much of my infrastructure is exposed due to this initial compromise?
In a cloud-native world, these graph traversal capabilities are fundamental for cloud security. Going forward, any operating model for cloud security should be built on a graph. With Fix, weʼre building such a modern cloud security tool, and weʼre building it with ArangoDB.
But first, a list!
Now that we covered the benefits of using a graph for cloud security, letʼs start with a list. Yes, a list – because sometimes, viewing my cloud assets in a graph might not be the most intuitive or useful thing.
For example, I may just want a list of my compute instance inventory across my AWS accounts. As a cloud security engineer, I want a baseline inventory of resources. I don’t really need a picture for that, I just want the list. And maybe I want to download it in a spreadsheet so I can slice and dice it, with metadata for each particular instance like create date, number of vCPUs and memory. A list is the best way to represent that information.
But if a list is enough, why collect data in a graph in the first place?
Because transformation from a graph to a table is trivial. The other way around, not so much. The graph lets you express things in a way that if you had the same data in a flat table, it would become intractable, with many different tables, foreign key relationships, and creating all kinds of joints all over the place. It just becomes too difficult to reason about.
The hard part is collecting data from cloud APIs and putting it into a graph form. Thatʼs much harder, takes time and is easy to get wrong. There are enough opportunities to make mistakes along the way, and create a representation thatʼs not correct or has bugs. Thatʼs why we believe transparency in how a cloud security product collects data matters. Both ArangoDB and Fix are open source. Our code shows how we collect and store data from cloud APIs in ArangoDB.
Graph-based analysis of cloud resources
The analysis layer of a graph is powerful because it can provide insights that tables cannot. One recent trend in security is that software engineers also take on security engineering tasks. They look after the security of their infrastructure, beyond infrastructure-as-code templates.
While Fix offers out-of-the-box visualizations and pre-built checks of compliance rules, weʼve also built a search syntax on top of the ArangoDB Query Language (AQL). With ArangoDB and AQL, I can store and query rich nested JSON-like document together with their vertices. Itʼs also easier to add and query metadata to the vertices – such as configuration data for a cloud resource. By building our syntax on top of AQL, weʼve made Fix human-friendly. Developers can easily run ad-hoc checks of the security posture of their infrastructure.
For example, activating flow logs in your VPCs is considered a security best practice by AWS. The search below finds all AWS VPCs where flow flogs are deactivated.
is(aws_vpc) with(empty, --> is(aws_ec2_flow_log))
Breaking it down, the search:
- first, finds all resources of the kind “aws_vpcˮ, no matter in which account or region they may run.
- then, filters for the VPCs without a direct relationship (successor) to an “aws_ec2_flow_logˮ resource.
A simple one line statement.
The same query expressed in SQL would require joining different tables with nested select statements, multiple where-clauses and case statements. It would be dozens of lines long and require an engineer to have knowledge of the table architecture and column names.
The power of a graph is that it lets you explore many-to-many relationships in a very easy way, in a way that a traditional row-based database just canʼt. By making security data from cloud resources available in a graph, software engineers with security responsibilities can gain visibility into the environment and reduce risks.
A graph provides context, context is king
The partnership between Fix and the ArangoDB team has brought our customers new security insights only made possible by the multi-dimensional relations of cloud resources stored in a graph. With ArangoDB, using graphs is no longer a complex computer science and operational challenge. For Fix, ArangoDB provides a graph database as a building block that makes it easy to store and query the relationships in your data.
Fix uses ArangoDB to analyze billions of relationships – in every cloud. With ArangoDB, weʼve been able to build a system that can ingest data at scale. One of our retail users ingests data from tens of thousands of cloud accounts in minutes, and then runs any type of analytics in a fraction of a second. The context of the graph helps security engineers to precisely answer questions and identify, prioritize and remediate risks – the “trifectaˮ of cloud security.
The precision, speed, and explainability of finding risks to your business is simply not possible without using a graph. When defenders can think in graphs, attackers lose.
Three Ways to Scale your Graph
Estimated reading time: 10 minutes
As businesses grow and their data needs increase, they often face the challenge of scaling their database systems to keep up with the increasing demand.
What happens when your single server machine is no longer sufficient to store your graph that has grown too large? Or when your instance can no longer cope with the increasing amount of user requests coming in?
Read moreGraph and Entity Resolution Against Cyber Fraud
Estimated reading time: 4 minutes
With the growing prevalence of the internet in our daily lives, the risks of malware, ransomware, and other cyber fraud are rising. The digital nature of these attacks makes it very easy for fraudsters to scale by creating thousands of accounts, so even if one is identified, they can continue their attacks.
In this blog post, we will discuss how graph and entity resolution (ER) can help us battle these risks across different industries such as healthcare, finance, and e-commerce (for example, the US healthcare system alone can save $300 billion a year with entity resolution). You will also receive hands-on experience with entity resolution on ArangoDB.
Combat Fraud with Graph
Estimated reading time: 5 minutes
Fraud is one of the most significant issues facing businesses today. While companies have always faced fraud, detecting fraudulent activity has become even more challenging due to increased online transactions. Globally, fraud results in more than $3.7 trillion in annual losses (Murphy, 2022). Fraud comes in numerous forms, including but not limited to money laundering, identity theft, account takeover, and payment fraud. Due to the variety of ways companies can face fraud, they must have a system to protect themselves and their customers.
Read moreWhy Should You Care About SOC 2?
And by the way, ArangoDB is SOC 2 compliant!
Estimated reading time: 3 minutes

While driving along California’s Highway 101 and its billboards, compliance and SOC 2 seem to be an omnipresent – yet challenging – topic. But is it really? And if so, why? In this blog post, we want to share why and how ArangoDB has become SOC 2 compliant.
Read moreArangoDB 3.7 – A Big Step Forward for Multi-Model
Estimated reading time: 7 minutes

When our founders realized that data models can be features, we at ArangoDB set ourselves the big goal of developing the most flexible database. With today’s GA release of ArangoDB 3.7, the project reached an important milestone on this journey.
(more…)Building AQL Query Strings: Tips and Best Practices | ArangoDB Blog
I recently wrote two recipes about generating AQL query strings. They are contained in the ArangoDB cookbook by now:
After that, Github user tracker1 suggested in Github issue 1457 to take the ES6 template string variant even further, using a generator function for string building, and also using promises and ES7 async/await.
We can’t use ES7 async/await in ArangoDB at the moment due to lacking support in V8, but the suggested template string generator function seemed to be an obvious improvement that deserved inclusion in ArangoDB.
Basically, the suggestion is to use regular JavaScript variables/expressions in the template string and have them substituted safely.
With regular AQL bind parameters, a query looks like this:
var bindVars = { name: "test" };
var query = `FOR doc IN collection
FILTER doc.name == @name
RETURN doc._key`;
db._query(query, bindVars);
This is immune to parameter injection, because the query string and the bind parameter value are passed in separately. But it’s not very ES6-y.
Efficient Lock-Free Data Structure Protection | ArangoDB Blog
Motivation
In multi-threaded applications running on multi-core systems, it occurs often that there are certain data structures, which are frequently read but relatively seldom changed. An example of this would be a database server that has a list of databases that changes rarely, but needs to be consulted for every single query hitting the database. In such situations one needs to guarantee fast read access as well as protection against inconsistencies, use after free and memory leaks.
Therefore we seek a lock-free protection mechanism that scales to lots of threads on modern machines and uses only C++11 standard library methods. The mechanism should be easy to use and easy to understand and prove correct. This article presents a solution to this, which is probably not new, but which we still did not find anywhere else.
The concrete challenge at hand
Assume a global data structure on the heap and a single atomic pointer P to it. If (fast) readers access this completely unprotected, then a (slow) writer can create a completely new data structure and then change the pointer to the new structure with an atomic operation. Since writing is not time critical, one can easily use a mutex to ensure that there is only a single writer at any given time. The only problem is to decide, when it is safe to destruct the old value, because the writer cannot easily know that no reader is still accessing the old values. The challenge is aggravated by the fact that without thread synchronization it is unclear, when a reader actually sees the new pointer value, in particular on a multi-core machine with a complex system of caches.
If you want to see our solution directly, scroll down to “Source code links“. We first present a classical good approach and then try to improve on it. (more…)
Securing your Foxx with API Keys
ArangoDB’s Foxx allows you to easily build an API to access your data sources. But now this API is either public or restricted to users having an account, but those still get unlimited access.
In many use cases you do not want to expose your data in this fashion, but you want to expose it with a more controllable access pattern and want to restrict the requests one user could issue in a certain time period. Popular examples for these API restrictions are Twitter or Facebook. This allows you to offer all of your data but only in limited chunks, and then possibly charge your customers to increase the chunk limit they can request.
All this is done via API keys, which are bound to a user and has become a common pattern to monetize the data you have collected. (more…)
Get the latest tutorials,
blog posts and news:
Thanks for subscribing! Please check your email for further instructions.