
Generate a Video Knowledge Graph: NVIDIA VSS Blueprint with GraphRAG on ArangoDB

NVIDIA Blog:
How to Integrate Computer Vision Pipelines with Generative AI and Reasoning

 

Arango’s Role in the NVIDIA Use Case

  • Stores the knowledge graph: ArangoDB is used to save the graph that’s built from video captions.
  • Enables fast graph reasoning: With GPU acceleration (CUDA/cuGraph), ArangoDB can quickly traverse relationships and surface patterns across video data.
  • Supports advanced Q&A: When a user asks a question, an AI agent can traverse the graph in ArangoDB to find answers—even across multiple cameras.
  • Designed for scale: This setup is especially useful in large, high-throughput environments where many AI models are running at once.

 

Video Analytics AI Agents: An Entirely New Class of Applications. Unlock knowledge and insights from camera streams and archived videos

The NVIDIA Blueprint for video search and summarization (VSS) provides a sample architecture for developing visually perceptive and interactive visual AI agents for video analytics. The VSS Blueprint from Metropolis combines generative AI, VLMs, LLMs, RAG, and media management services. These AI agents can be deployed throughout factories, warehouses, retail stores, airports, traffic intersections, and more - helping streamline operations. The VSS 2.4 release makes it easy to enhance vision AI applications with generative AI through a VLM, enabling powerful new features for smart infrastructure.

VSS 2.4 introduces a major upgrade for long‑form video analytics: GraphRAG on ArangoDB. This release brings video‑first Knowledge Graph generation, hybrid retrieval that combines vector search, full-text search, and graph traversals, and multi‑stream ingestion.

For readers new to VSS, see the broader architecture, APIs, and features like multi‑live stream, burst ingestion, audio transcription, and CV metadata in the earlier post: Advance Video Analytics AI Agents Using the NVIDIA AI Blueprint for Video Search and Summarization.

Figure 1: VSS Dataflow with ArangoDB, spanning Data Sourcing, Content Preparation, Content Processing, KG Generation, Data Storage, KG Retrieval, and Reporting

 

GraphRAG for Video Analytics

Vision language models (VLMs) made broad perception possible, but long videos can strain context windows and dilute relevance. VSS 2.4 addresses this by combining:

  • Semantic ranking of chunks and entities (vector search),
  • Structured expansion over a Knowledge Graph (relationship‑aware traversal),
  • Temporal stitching for coherent narratives.

The result is a grounded system that cites the who/what/where/when, maintains temporal continuity, and scales to multi‑stream deployments.
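As a toy illustration of why these pieces complement each other (all data below is made up, and none of this is VSS code): pure semantic ranking can drop a low-similarity chunk that connects two highly relevant moments, while the time chain and entity edges pull it back into the evidence set.

```python
# Toy illustration only - not VSS code. In VSS 2.4 the chunks, entities,
# and edges live in ArangoDB; here they are tiny in-memory dictionaries.

chunks = {
    "c1": {"t": (0, 12),  "text": "Rust on the lower beam near pier 2.",        "sim": 0.91},
    "c2": {"t": (12, 24), "text": "Crack propagating along the concrete deck.",  "sim": 0.42},
    "c3": {"t": (24, 36), "text": "Corrosion spreading to the expansion joint.", "sim": 0.88},
}
has_entity = {"c1": ["support beam"], "c2": ["concrete deck"], "c3": ["expansion joint"]}
next_chunk = {"c1": "c2", "c2": "c3"}

# 1. Semantic ranking alone keeps c1 and c3 but drops the connecting chunk c2 ...
ranked = sorted(chunks, key=lambda c: chunks[c]["sim"], reverse=True)[:2]

# 2+3. ... while graph expansion and temporal stitching pull it back in,
# so the packed evidence reads as one coherent narrative.
evidence = set(ranked)
for c in ranked:
    if next_chunk.get(c) is not None:
        evidence.add(next_chunk[c])

for c in sorted(evidence, key=lambda c: chunks[c]["t"][0]):   # order by start time
    print(chunks[c]["t"], chunks[c]["text"], has_entity[c])
```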

 

What’s new in VSS 2.4 for Graphs

  • Support for ArangoDB, a multi-model database tuned for video intelligence.
  • A video‑first graph schema modeling time, hierarchy, and entity relations.
  • Hybrid retrieval that merges semantic & lexical similarity with hop‑limited graph traversal.
  • Multi‑stream ingestion that preserves per‑stream metadata for flexible analytics.

 

What your Knowledge Graph can now capture

VSS 2.4 converts video outputs and metadata into a Knowledge Graph, expanding beyond plain text to improve retrieval precision and explainability.

  • Core data points and attributes
    • Documents: Logical containers for videos/sessions.
    • Chunks: Timestamped segments with captions/transcripts and embeddings.
    • Entities: Named people, equipment, locations, and concepts with types/descriptions extracted from Chunks.
    • Communities: Batch‑level or thematic summaries of sequential chunks for macro reasoning.
  • Temporal structure
    • Start/end times per chunk for precise windows.
    • CHUNK → NEXT_CHUNK → CHUNK edges to stitch adjacent moments into a narrative.
  • Relational context
    • CHUNK → HAS_ENTITY → ENTITY edges to ground mentions in each chunk.
    • ENTITY → LINKS_TO → ENTITY edges to connect entities with typed relations.
    • CHUNK → PART_OF → DOCUMENT edges to bind chunks to documents.
    • IN_SUMMARY/SUMMARY_OF edges to align summaries with their evidence.
  • Operational metadata
    • Stream IDs and Camera IDs for multi-stream analytics, scoping, and auditability.
    • Asset references (e.g., frame directories) to trace evidence.
  • Multimodal signals
    • Dense captions and (optional) audio transcripts for complementary cues.
    • Embeddings on chunks, entities, and summaries to enable semantic search and clustering.
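To make this concrete, the sketch below shows how a chunk, an entity, and a grounding edge might look as ArangoDB documents. The collection and field names are illustrative assumptions for this post, not necessarily the exact schema emitted by VSS 2.4.

```python
# Illustrative ArangoDB documents for the schema described above.
# Collection names ("chunks", "entities") and field names are assumptions
# for this sketch, not necessarily the exact VSS 2.4 schema.

chunk = {
    "_key": "chunk_0042",
    "document": "documents/bridge_inspection_01",  # parent video/session
    "stream_id": "stream_01",                      # operational metadata
    "camera_id": "cam_north_span",
    "start_time": 84.0,                            # seconds from video start
    "end_time": 96.0,
    "caption": "Heavy rust is visible on the lower support beam near pier 2.",
    "embedding": [0.012, -0.087, 0.154],           # truncated; real vectors are high-dimensional
}

entity = {
    "_key": "entity_lower_support_beam",
    "name": "lower support beam",
    "type": "STRUCTURE",
    "description": "Steel support beam near pier 2 showing surface corrosion.",
    "embedding": [0.033, 0.145, -0.072],
}

# CHUNK -> HAS_ENTITY -> ENTITY edge grounding the mention in its chunk.
has_entity_edge = {
    "_from": "chunks/chunk_0042",
    "_to": "entities/entity_lower_support_beam",
}
```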

For instance, given a video of a bridge captured for structural inspection (refer to the View Examples section here), these facets let the system answer questions like the ones below, citing timestamps, cameras, and entities for transparency.

  • What structural issues are visible across the video and which areas are most affected?
  • Are there any immediate safety risks based on the visible condition of the bridge’s metal and concrete components?
  • How does the level of rust and corrosion change throughout the video, and what sections require urgent maintenance?
  • Does the surrounding environment appear to be impacting the bridge’s structural integrity?
  • Is the bridge overall stable and usable, or does it show signs of potential failure without intervention?

Figure 2: Frames of the VSS Bridge Video and their corresponding Chunks stored in ArangoDB

 

Breakdown: How to Ingest Video Data

1. Segmentation and metadata (Figure 2): Split long videos into timestamped chunks; attach session/stream/camera IDs, offsets, and asset references.

2. Entity/relationship extraction (Figures 3 & 4): Identify entities (people, equipment, places, concepts) and typed relations; bind entities to the chunks that mention them. If the user specifies custom entity and relationship types, restrict extraction to those.

3. Temporal + hierarchy links (Figure 3): Connect the first chunk to its parent document; link adjacent chunks to form a time chain; keep per-chunk provenance.

4. Communities/summarization (Figure 5): Create higher-level community summaries of chunks; link supporting chunks to summaries and summaries back to documents.

5. Graph persistence (Figures 6 & 7): Store chunks, entities, documents, and communities with typed edges (has-entity, links-to, part-of, next-chunk, in-summary, summary-of, etc.); a minimal persistence sketch follows this list.

6. Embeddings + vector indexing: Embed chunks/entities/summaries; build cosine-based vector indices sized to corpus and embedding dimension; optionally enable hybrid (keyword + vector).

7. Hygiene: Normalize entities, reduce duplicates, resolve similar triplets.
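As a rough sketch of step 5, the snippet below creates the relevant collections and persists one chunk together with its document and time-chain edges using the python-arango driver. The database name, credentials, and collection names are placeholder assumptions; the VSS pipeline creates and manages its own collections.

```python
# Minimal persistence sketch (step 5 above) using python-arango.
# Database name, credentials, and collection names are placeholder
# assumptions, not the names used by the VSS pipeline itself.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("vss", username="root", password="openSesame")

# Vertex collections for documents/chunks/entities, edge collections for relations.
for name in ("documents", "chunks", "entities"):
    if not db.has_collection(name):
        db.create_collection(name)
for name in ("part_of", "next_chunk", "has_entity", "links_to"):
    if not db.has_collection(name):
        db.create_collection(name, edge=True)

db.collection("documents").insert({"_key": "bridge_inspection_01", "title": "Bridge inspection video"})
db.collection("chunks").insert({
    "_key": "chunk_0042",
    "start_time": 84.0,
    "end_time": 96.0,
    "caption": "Heavy rust is visible on the lower support beam near pier 2.",
    "embedding": [0.012, -0.087, 0.154],   # truncated for readability
})

# Bind the chunk to its document and chain it to the previous chunk.
db.collection("part_of").insert({"_from": "chunks/chunk_0042", "_to": "documents/bridge_inspection_01"})
db.collection("next_chunk").insert({"_from": "chunks/chunk_0041", "_to": "chunks/chunk_0042"})
```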

Figure 3: Generating a Knowledge Graph from Video Chunks (green) with Entities (yellow), Communities (magenta), and Documents (blue)

 

Figure 4: Mapping Chunks (green) to their source Document (blue) and Communities (magenta)

 

Figure 5: Mapping Chunks (green) to their Community Summaries (magenta)

 

Figure 6: Visualizing a sample VSS Knowledge Graph stored in ArangoDB

 

Figure 7: Sample Entities & Relationships extracted by the LLM through the VSS Pipeline, stored as ArangoDB documents

 

Breakdown: How to Retrieve Data 

1. Select profile: Chunk-centric (time-localized), entity-centric (who/what/where), or GNN-ready (structured graph payload).

2. Embed: Convert the question into the same vector space as chunks/entities.

3. Rank (vector): Select top‑K candidates by cosine similarity; optionally combine with keyword scoring for terms and names (a retrieval sketch follows this list).

4. Expand (graph): Add nearby evidence with limited-hop traversal:

  • From chunks to mentioned entities (has-entity).
  • Between entities via typed relations (links-to).
  • To summaries for macro context, sibling chunks, and provenance.

5. Stitch (temporal): Pull pre/post neighbors along the time chain for coherent narratives; apply time/camera filters as needed.

6. Pack context: Deduplicate and order evidence by score/time; include text snippets, entities/relations, timestamps, and stream/camera IDs.

7. Output formats: Text-centric context for summarization/Q&A, or a GNN-ready graph (nodes, relation types, edge indices, descriptions).

8. Tuning: Adjust top‑K, hop radius (typically 0–2), chunk size, and filters to balance recall, latency, and specificity.
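Under the same placeholder naming as the ingestion sketch, the snippet below illustrates steps 2-5: rank chunks by cosine similarity to the question embedding with AQL's COSINE_SIMILARITY, then expand each hit with a hop-limited traversal over entity edges and the time chain. This is a sketch of the retrieval pattern, not the VSS retrieval code itself; in a real deployment the vector indices built during ingestion would accelerate the ranking stage.

```python
# Hybrid retrieval sketch (steps 2-5 above), using the same placeholder
# database/collection names as the ingestion sketch; not VSS retrieval code.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("vss", username="root", password="openSesame")

# Embedding of the user question, produced by the same model as the chunk embeddings.
question_embedding = [0.021, -0.044, 0.108]   # toy dimensionality

# 2-3. Embed + rank: score chunks by cosine similarity and keep the top-K.
top_chunks = list(db.aql.execute(
    """
    FOR c IN chunks
      LET score = COSINE_SIMILARITY(c.embedding, @q)
      SORT score DESC
      LIMIT @k
      RETURN MERGE(c, { score: score })
    """,
    bind_vars={"q": question_embedding, "k": 5},
))

# 4-5. Expand + stitch: hop-limited traversal over entity edges and the time chain.
for chunk in top_chunks:
    neighbors = list(db.aql.execute(
        """
        FOR v, e IN 1..2 ANY @start has_entity, links_to, next_chunk
          RETURN { vertex: v, via: PARSE_IDENTIFIER(e._id).collection }
        """,
        bind_vars={"start": chunk["_id"]},
    ))
    # 6. Pack context: order by score/time, dedupe, attach timestamps and stream/camera IDs.
    print(chunk["_key"], chunk["score"], len(neighbors))
```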

 

How this fits the broader VSS updates

The original VSS post introduced GA features such as multi‑live stream, burst mode ingestion, a customizable CV pipeline, and audio transcription. These capabilities feed the GraphRAG pipeline so the agent can:

  • Fuse visual information and audio transcriptions to improve precision,
  • Use object/tracking metadata to clarify which entities are involved,
  • Maintain per‑stream separation while supporting cross‑stream queries.

Together, these enable the temporal reasoning, multi‑hop reasoning, anomaly awareness, and scalability discussed in the CA‑RAG section of the original post, but now reinforced by a robust Knowledge Graph.

Figure 8: The NVIDIA AI Blueprint for Video Search and Summarization architecture

 

Get started


Anthony Mahanna

Anthony Mahanna is a software engineer & technical lead for Arango’s GenAI Data Platform, where he applies Graph Analytics, GraphML, and GraphRAG to solve graph-driven AI problems. Anthony joined Arango full-time in July 2023 after previously interning while attending university. He holds a B.Sc (Hons) in Computer Science from the University of Ottawa, Canada.
