From Native Multi-Model Graph Database to GenAI Data Platform: Arango’s Next Chapter

Authors:

Shekhar Iyer, CEO, Arango and Ravi Marwaha, CPO & CTO, Arango

Multi-Model Database to GenAI Data Platform

 

A New Chapter for Arango

In this webinar, we explored Arango’s next chapter: evolving from a multi-model database into a GenAI data platform built for speed, scale, and flexibility.

For over a decade, Arango has been known as the most scalable, lowest-TCO, developer- and application-friendly multi-model graph database. But the world has changed, and we can all thank the launch of ChatGPT in 2022 for that.

 

“Just like the internet reshaped every industry 30 years ago, GenAI will redefine how businesses operate in the next decade.”

– Shekhar Iyer

 

GenAI is no longer a buzzword; it’s reshaping how enterprises, startups, and innovators design applications. With this shift come new requirements:

  • Managing structured and unstructured data together, 
  • Unifying search and vector queries, 
  • Scaling without compromise, and 
  • Delivering results at the speed business demands.

That’s the transformation we’re leading.

Proven success with large and medium enterprises globally

 

Why This Matters Now

Every technology wave comes with skepticism. In the 1990s, people debated whether the internet was hype or reality. Today, nobody doubts its impact. GenAI is at the same inflection point.

But here’s the challenge: 95% of GenAI projects fail (MIT study). The reasons?

  • Data silos delay progress.
  • Fragile integrations break under load.
  • Legacy infrastructure can’t scale to the needs of LLMs and agentic AI.

Customers, from the world’s largest enterprises like NVIDIA and HPE to fast-moving startups, tell us they need something different: a unified foundation.

ArangoDB: Data Lifecycle & Data Architecture

 

“If we only talk about bits and bytes, we’ve missed the point; our goal is accelerating business value.”

– Shekhar Iyer

 

At Arango, we believe the answer lies in three core principles:

  1. Simplicity: Natively multi-model, supporting vector, graph, search, document, and key-value.
  2. Scalability: Horizontally and vertically, with GPU acceleration built in.
  3. Deployment Flexibility: On-prem, embedded, or cloud with cost-effective, enterprise-ready models.

Key Differentiation & Value

 

Solving Complexity with One Unified Platform

For developers and architects, the reality of building GenAI apps today is daunting. You’re stitching together SQL, NoSQL, graph, and vector databases, all scaling differently and each with its own APIs and languages. That’s fragile, costly, and slow. With the Arango GenAI data platform, we aim to increase your developer productivity by 10x and your speed to market by 10x, and we plan to keep working with you to make that possible.

Arango takes a different approach. From day one, we built a native multi-model database. Today, that foundation means we can unify:

  • Vector search for embeddings
  • Graph for context and relationships
  • Document & key-value for flexibility
  • Search for discoverability
  • Time-travel queries for history and compliance
  • GPU acceleration for scale
  • All in a single engine, through one query language
  • Tools, frameworks, GenAI toolsets, MCPs, and agentic workflows

 

“If you try to stitch together SQL, NoSQL, vector, graph, and search, along with various tools, MCPs, frameworks, and data pipelines, and still maintain it all and build in observability and scalability, you’re managing 20 or so different pieces of technology that all scale differently. With Arango, it’s unified in one platform.”

– Ravi Marwaha

 

A Simplistic Agentic Data Infrastructure (Middle Layer)

 

Introducing the GenAI Suite

We’re also unveiling our GenAI Suite, designed to accelerate projects whether you’re building:

  • Chatbots and copilots for employees or customers
  • Front-office and back-office automation powered by GraphRAG and LLMs
  • Agentic workflows that bring intelligence into everyday processes

With prebuilt integrations, GraphRAG blueprints, and support for LLMs and domain-specific models, the GenAI Suite reduces the burden of wiring everything together so you can focus on building value.

 

“Complexity kills ROI. GenAI projects fail when the data isn’t ready, the tools aren’t integrated, the systems don’t scale, and you can’t maintain them or manage upgrades and changes.”

– Ravi Marwaha

 

Deployment on Your Terms

Flexibility doesn’t stop at the data model or at how many pieces of technology teams have to stitch together. Arango offers multiple deployment options: on-prem, embedded in your stack, or fully managed in the cloud, all accessible through a single unified interface.

That means whether you’re a startup building a net-new application on a clean slate or a global enterprise with legacy systems, you can plug Arango into your environment without the integration nightmare and enable scenarios and capabilities that are at least 10x more challenging with relational database approaches.

 

Why We’re Excited

 

“Simplify, scale, and speed to value: that’s the mission behind Arango’s evolution into a GenAI data platform.”

– Shekhar Iyer

 

Why Arango?

 

This is more than a product evolution. It’s about enabling enterprises and innovators to succeed where others fail: to move from experimentation to production faster, with less risk, and with greater impact.

For our existing customers who know us for our best-in-class graph capabilities, we’re expanding to meet your GenAI needs. For developers and architects exploring the space: this is your chance to build on a unified, enterprise-ready foundation.

 

Watch the Replay & Connect With Us

This blog is just the beginning.

👉 Watch the full webinar replay to hear the complete Fireside Chat with Shekhar and Ravi.

👉 Contact us to discuss how we can help you get started with the right data foundation before your next GenAI project.

More info...

An Introduction to Geo Indexes and their performance characteristics: Part I

Starting with the mass-market availability of smartphones and continuing with IoT devices and self-driving cars, ever more data is generated with geo information attached to it. Analyzing this data in real time requires clever indexing data structures. Geo data in ArangoDB consists of two or more dimensions representing (x, y) coordinates on the Earth's surface. Searching on a single number is essentially a solved problem, but searching multi-dimensional data efficiently is more difficult, because standard indexing techniques cannot be used.


Setting up Datacenter to Datacenter Replication in ArangoDB

Please note that this tutorial is valid for the ArangoDB 3.3 milestone 1 version of DC to DC replication!


This milestone release contains data-center to data-center replication as an enterprise feature. This is a preview of the upcoming 3.3 release and is not considered production-ready.

To prepare for a major disaster, you can set up a backup data center that takes over operations if the primary data center goes down. The failure of a single server is handled by ArangoDB's built-in resilience features; data-center to data-center replication is meant for the failure of a complete data center.

Data is transported between data centers using a message queue. The current implementation uses Apache Kafka as its message queue. Apache Kafka is a commonly used open-source message queue that is capable of handling multiple data centers. However, ArangoDB replication is not tied to Apache Kafka; we plan to support other message queue systems in the future.
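The message-queue design described above can be modeled in a few lines. In the hypothetical Python sketch below, an in-memory append-only log stands in for a Kafka topic: the primary data center applies each write locally and then publishes it, and the backup data center consumes the log in order while remembering its offset. This illustrates the data flow only; it is not ArangoDB's replication protocol or the Kafka client API, and `MessageLog`, `Primary`, and `Backup` are made-up names.

```python
class MessageLog:
    """Hypothetical in-memory stand-in for a message-queue topic (append-only)."""

    def __init__(self):
        self._entries = []

    def publish(self, op) -> None:
        self._entries.append(op)

    def read_from(self, offset: int) -> list:
        return self._entries[offset:]

class DataCenter:
    def __init__(self, name: str):
        self.name = name
        self.data = {}

    def apply(self, op) -> None:
        kind, key, value = op
        if kind == "put":
            self.data[key] = value
        elif kind == "delete":
            self.data.pop(key, None)

class Primary(DataCenter):
    def __init__(self, name: str, log: MessageLog):
        super().__init__(name)
        self.log = log

    def write(self, op) -> None:
        self.apply(op)        # commit in the primary data center first
        self.log.publish(op)  # then ship the operation through the queue

class Backup(DataCenter):
    def __init__(self, name: str, log: MessageLog):
        super().__init__(name)
        self.log = log
        self.offset = 0  # position of the next operation to consume

    def sync(self) -> int:
        """Apply all operations published since the last sync, in order."""
        ops = self.log.read_from(self.offset)
        for op in ops:
            self.apply(op)
        self.offset += len(ops)
        return len(ops)
```

Because the backup pulls from the log at its own pace, it can lag behind the primary; in this model, whatever it has not yet consumed when the primary fails is what a takeover would miss.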

The full tutorial contains a high-level description of how to set up data-center to data-center replication. Detailed instructions for specific operating systems will follow shortly.


ArangoDB: Consensus for Enhanced Data Stability

nihil novi nisi commune consensu
nothing new unless by the common consensus

– law of the Polish-Lithuanian Commonwealth, 1505

A warning beforehand: this is a rather long post, but hang in there; it might save you a lot of time one day.

Introduction

Consensus has its etymological roots in the Latin verb consentire, which, unsurprisingly, means to consent, to agree. The concept is about as old as the brief history of computer science itself. It designates a crucial necessity of distributed applications. More fundamentally, consensus aims to provide a fault-tolerant, distributed brain to higher-level applications such as deployed cluster file systems, currency exchange systems, or, specifically in our case, distributed databases.
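Many consensus protocols treat a value as agreed on once a majority of nodes has stored it, so a cluster of 2f + 1 nodes can survive f failures. The sketch below shows the commit rule of Raft, one widely used consensus protocol, as a small Python function; the teaser does not say which protocol ArangoDB uses, and `advance_commit` is a hypothetical name. `match_index` holds, for every node including the leader, the highest log index known to be stored there, and `log_terms[i - 1]` is the term of log entry `i`.

```python
def advance_commit(match_index: list, log_terms: list, current_term: int, commit: int) -> int:
    """Raft-style commit rule (sketch): return the new commit index.

    An entry is committed once it is stored on a majority of nodes and
    belongs to the leader's current term; committing it also commits every
    earlier entry in the log.
    """
    n = len(match_index)
    # After sorting in descending order, the value at position n // 2 is
    # stored on at least n // 2 + 1 nodes, i.e. on a majority.
    candidate = sorted(match_index, reverse=True)[n // 2]
    # Raft never commits entries from older terms by counting replicas alone.
    if candidate > commit and log_terms[candidate - 1] == current_term:
        return candidate
    return commit
```

The term check is the subtle part: with `match_index = [5, 2, 2, 1, 1]` and a leader log whose entry 2 is from an older term, entry 2 already sits on a majority, yet the leader does not commit it until an entry of its own term reaches a majority.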


Running ArangoDB 3.0.0 on DC/OS Cluster

As you have surely noticed, we released ArangoDB 3.0 a few days ago. It comes with great cluster improvements like synchronous replication, automatic failover, easy scaling up and down via the graphical user interface, and lots of other improvements. Furthermore, ArangoDB 3 is even better integrated with Apache Mesos and DC/OS.


DC/OS: Modernizing Distributed Database Management

The mission of ArangoDB is to simplify the complexity of data work. ArangoDB is a distributed, native multi-model NoSQL database that supports JSON documents, graphs, and key-value pairs in one database engine with one query language. Its cluster management is based on Apache Mesos, a battle-hardened technology. With the launch of DC/OS by a community of more than 50 companies, all ArangoDB users can easily scale.


Index-Free Adjacency or Hybrid Indexes for Graph Databases

Some graph database vendors propagandize index-free adjacency for the implementation of graph models. There has been some discussion on Wikipedia about what makes a database a graph database. These vendors tried to push index-free adjacency as the defining foundation of graph databases, but were stopped by the community.
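With index-free adjacency, every vertex holds direct pointers to its neighbors. A hybrid approach instead keeps edges as ordinary records and finds a vertex's edges through an index. ArangoDB stores edges as documents with `_from` and `_to` attributes, which its edge index covers. The Python sketch below models such an index with two dictionaries; it is not ArangoDB's implementation, and `EdgeIndex` and `traverse` are hypothetical names. Each hop costs one average-O(1) hash lookup instead of a pointer dereference, and vertices never need to store or update pointers to each other.

```python
from collections import defaultdict

class EdgeIndex:
    """Toy edge index: vertex id -> outgoing / incoming edge documents."""

    def __init__(self):
        self._out = defaultdict(list)  # keyed by edge["_from"]
        self._in = defaultdict(list)   # keyed by edge["_to"]

    def add_edge(self, edge: dict) -> None:
        self._out[edge["_from"]].append(edge)
        self._in[edge["_to"]].append(edge)

    def outbound(self, vertex_id: str) -> list:
        # .get() avoids creating empty entries for unknown vertices
        return [e["_to"] for e in self._out.get(vertex_id, [])]

    def inbound(self, vertex_id: str) -> list:
        return [e["_from"] for e in self._in.get(vertex_id, [])]

def traverse(index: EdgeIndex, start: str, depth: int) -> set:
    """Breadth-first traversal up to `depth` hops using only index lookups."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for v in frontier:
            for w in index.outbound(v):
                if w not in seen:
                    seen.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return seen - {start}
```

Because the edges live outside the vertices, the same edge records can be indexed in both directions, and vertices can be moved or sharded without rewriting any neighbor's pointers.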


Enhanced Deadlock Detection: Improving ArangoDB Performance

The upcoming ArangoDB version 2.8 (currently in devel) will provide a much better deadlock detection mechanism than its predecessors.

The new deadlock detection mechanism will kick in automatically when it detects operations that are mutually waiting for each other. If it finds such a deadlock, it will abort one of the operations so that the others can continue and overall progress can be made.
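A common way to detect operations that are mutually waiting for each other is a wait-for graph: an edge T1 → T2 means T1 waits for something held by T2, and a deadlock is a cycle. The post does not spell out ArangoDB's mechanism, so the Python sketch below only shows this textbook technique; `find_cycle`, `resolve_deadlocks`, and the victim policy (abort the highest id in the cycle) are hypothetical choices.

```python
from __future__ import annotations

def find_cycle(waits_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle of the wait-for graph as a list of ids, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS path, finished
    color: dict[str, int] = {}
    stack: list[str] = []

    def dfs(t: str) -> list[str] | None:
        color[t] = GRAY
        stack.append(t)
        for u in waits_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GRAY:  # back edge: u is on the current path
                return stack[stack.index(u):]
            if state == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color.get(t, WHITE) == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None

def resolve_deadlocks(waits_for: dict[str, set[str]]) -> list[str]:
    """Abort one operation per cycle until nothing waits in a circle."""
    graph = {t: set(deps) for t, deps in waits_for.items()}  # work on a copy
    aborted = []
    while (cycle := find_cycle(graph)) is not None:
        victim = max(cycle)  # hypothetical victim policy
        aborted.append(victim)
        graph.pop(victim, None)
        for deps in graph.values():
            deps.discard(victim)  # nobody waits for the aborted operation any more
    return aborted
```

Victim selection is a policy choice; a system could, for example, prefer the operation that has done the least work so that as little as possible has to be rolled back.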


Efficient Lock-Free Data Structure Protection

Motivation

In multi-threaded applications running on multi-core systems, it often happens that certain data structures are frequently read but rarely changed. An example would be a database server whose list of databases changes rarely but must be consulted for every single query hitting the database. In such situations, one needs to guarantee fast read access as well as protection against inconsistencies, use-after-free, and memory leaks.

Therefore, we seek a lock-free protection mechanism that scales to many threads on modern machines and uses only C++11 standard library methods. The mechanism should be easy to use, easy to understand, and easy to prove correct. This article presents a solution that is probably not new, but that we have not found described anywhere else.

The concrete challenge at hand

Assume a global data structure on the heap and a single atomic pointer P to it. If (fast) readers access it completely unprotected, then a (slow) writer can create a completely new data structure and then switch the pointer to the new structure with an atomic operation. Since writing is not time-critical, one can easily use a mutex to ensure that there is only a single writer at any given time. The only problem is deciding when it is safe to destroy the old value, because the writer cannot easily know that no reader is still accessing it. The challenge is aggravated by the fact that, without thread synchronization, it is unclear when a reader actually sees the new pointer value, in particular on a multi-core machine with a complex system of caches.
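The bookkeeping behind "when is it safe to destroy the old value" can be modeled without real atomics. The Python sketch below follows a classic reference-counting approach and is not the article's lock-free solution: each version of the structure counts the readers currently using it, the single writer swaps in a new version, and the old one is destroyed only once it has been retired and its reader count has dropped to zero. Python has neither atomic pointers nor manual memory management, so a `threading.Lock` guards the counters here, which is exactly the kind of serialization the article sets out to avoid. `Version`, `VersionedPointer`, and clearing `data` as a stand-in for freeing memory are all hypothetical.

```python
import threading

class Version:
    def __init__(self, data):
        self.data = data
        self.readers = 0        # readers currently using this version
        self.retired = False    # replaced by a newer version
        self.destroyed = False

class VersionedPointer:
    """Copy-on-write pointer with deferred destruction (model, not lock-free)."""

    def __init__(self, data):
        self._current = Version(data)
        self._lock = threading.Lock()         # guards counters and the swap
        self._writer_lock = threading.Lock()  # at most one writer at a time

    def acquire(self) -> Version:
        """Reader entry: pin the current version; call release() exactly once."""
        with self._lock:
            v = self._current
            v.readers += 1
            return v

    def release(self, v: Version) -> None:
        with self._lock:
            v.readers -= 1
            self._maybe_destroy(v)

    def replace(self, new_data) -> None:
        """Writer: build the new version, swap it in, retire the old one."""
        with self._writer_lock:
            new = Version(new_data)
            with self._lock:
                old, self._current = self._current, new
                old.retired = True
                self._maybe_destroy(old)

    def _maybe_destroy(self, v: Version) -> None:
        # Caller holds self._lock. Only retired, unreferenced versions go.
        if v.retired and v.readers == 0 and not v.destroyed:
            v.destroyed = True
            v.data = None  # stands in for freeing the old structure
```

A reader that still holds the old version keeps it alive across the writer's swap; the last `release` on a retired version performs the destruction.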

If you want to see our solution directly, scroll down to “Source code links” in the full post. We first present a classical good approach and then try to improve on it.


Running V8 Isolates in Multi-Threaded ArangoDB

ArangoDB allows running user-defined JavaScript code in the database. This can be used for more complex, stored-procedure-like database operations. Additionally, ArangoDB’s Foxx framework can be used to make any database functionality available via an HTTP REST API. It’s easy to build data-centric microservices with it, using the scripting functionality for tasks like access control, data validation, sanitation, etc.

We often get asked how the scripting functionality is implemented under the hood. Several people have also asked how ArangoDB’s JavaScript functionality relates to Node.js.

This post tries to explain that in detail.


