
Decoded Health: Transforming Healthcare with ML Models, Ontologies, and Graphs

Helping doctors consistently deliver quality care in less time


The Scenario: Build a unified knowledge graph of doctor/patient interactions

Decoded Health is a venture-backed startup that seeks to make healthcare more efficient and effective by automating patient conversations and augmenting physicians’ clinical workflows. Decoded Health is spun out of SRI International, the same research organization that spawned Apple Siri. By combining machine learning with human-in-the-loop processes, Decoded Health lets doctors serve more patients while at the same time deepening physician-patient relationships.

Decoded Health needed to scale patient engagement with automated conversations, but in a manner that went beyond rudimentary chatbot systems that often leave patients frustrated. To do this, Decoded Health needed to build a system that determines the patient’s intent and uses information on the patient’s medical history and health status. From there, Decoded Health had to perform patient-specific screenings and ask clinically-informed questions that can quickly differentiate among different conditions. Before physicians meet with patients, Decoded Health needed to provide physicians with relevant information on their patient’s chief complaint, history of present illness (HPI), and notable findings from their electronic health record (EHR).

To accomplish this goal, Decoded Health built a virtual medical resident called Quinn. Quinn embeds medical knowledge and an understanding of how patients describe their ailments, acquired through a training process similar to the one used to teach medical residents.

This approach required Decoded Health to provide Quinn with a vast amount of domain expertise, including medical vocabulary and processes that doctors understand, and how to read patients correctly as they communicate their conditions and preferences in colloquial terms. Putting all this together, Quinn can formulate a medical opinion on treatment steps, communicate with clinicians and patients in terms they understand, and help the clinician run the patient encounter.


The Requirements: Underpin a massive medical knowledge graph

To create Quinn, Decoded Health needed the following capabilities in a graph database:

Complex knowledge graph: Decoded Health needed to create a vast knowledge graph, also called an ontology. This ontology has over five million clinical concepts and consists of multiple parts:

1. Terms used by doctors and patients and how they map to each other; for example, the medical term “cesarean” and its colloquial term, “c-section.”

2. How those clinical and layperson terms might vary across geographies; for instance, the British medical term “caesarean” differs from its American equivalent.

3. A conversation graph to model doctor/patient conversation flows, encapsulating patient intent classification, context switching, and dynamic questions, along with how to strategically route the patient through the correct dialog. This graph covers a wide range of scenarios, from a patient trying to book an appointment to someone with an acute injury.

4. A clinical knowledge graph that encodes medical understanding.

5. A transaction graph representing patient touchpoints in a manner that is more flexible than a relational database.
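The term mappings in items 1 and 2 can be pictured as edges that lead from a surface term to one canonical concept. The following is a minimal Python sketch under stated assumptions, not Decoded Health's implementation: the dictionary contents, the concept identifier `concept/cesarean-section`, and the function name `resolve_term` are all hypothetical.

```python
from typing import Optional

# Hypothetical "maps to" edges: surface term -> canonical clinical concept.
# A real ontology would hold millions of these as graph edges.
TERM_TO_CONCEPT = {
    "cesarean": "concept/cesarean-section",   # American clinical term
    "caesarean": "concept/cesarean-section",  # British spelling
    "c-section": "concept/cesarean-section",  # colloquial term
}


def resolve_term(term: str) -> Optional[str]:
    """Return the canonical concept for a clinical or layperson term, if known."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


if __name__ == "__main__":
    print(resolve_term("C-Section"))
```

Because every variant resolves to the same concept, downstream logic can work with one identifier regardless of how the patient or clinician phrased it.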


Time-based: Decoded Health needed to capture patient encounters as they occur over time, which is highly relevant to medical analysis and recommendations. For example, with Covid-19, treatments have changed over time as scientists have learned more about the disease, new drugs and vaccines have become available, and the coronavirus has mutated. Another example: physicians need to be able to see a patient’s timeline of treatments – in medical terms, their longitudinal care plan – and understand the state of patient care at any point in time.

Explainability: Doctors need to explain to patients why they need a particular treatment since the stakes can be high for them and their families. Decoded Health needs to provide physicians with explainable references and present why a particular piece of knowledge was true at a given time. This information is crucial for infectious diseases, such as Covid-19, caused by fast-mutating viruses and bacteria.

Extensibility: Medical treatments change over time, as does the process by which clinicians discuss them with patients. For instance, as the coronavirus pandemic progressed, Decoded Health could easily extend Quinn with frequently-changing knowledge of Covid symptoms, treatments, and conversational guidelines.

Machine learning: Decoded Health needed to build Quinn with graph-driven clinical inference, which takes information about a patient and produces a ranked list of possible conditions for that patient, ordered by the probability that the patient has each condition. To do this, Decoded Health needed to map a neural network to their knowledge graph.
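The output shape of clinical inference, a list of conditions ranked by probability, can be illustrated with a small Python sketch. It does not reproduce Quinn's neural network: the condition names, the raw scores, and the use of a softmax to turn scores into probabilities are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def rank_conditions(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Convert raw per-condition scores to probabilities, sorted descending."""
    if not scores:
        return []
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(exps.values())
    ranked = [(name, value / total) for name, value in exps.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores, e.g. produced by a model over graph features.
    example = {"influenza": 2.0, "common cold": 1.0, "covid-19": 2.5}
    for name, prob in rank_conditions(example):
        print(f"{name}: {prob:.2f}")
```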

Scalability: Decoded Health partners with one of the largest health staffing providers in the United States, so their graph queries needed to run in a scalable manner.

Integration: Decoded Health needed to ensure that Quinn’s knowledge graph followed standards from medical coding systems – such as CPT and ICD codes – to ease the integration with external systems.

Security & Compliance: ArangoDB offers a tier with the option for a BAA (business associate agreement), providing Decoded Health peace of mind with a secure partner. A business associate agreement establishes a legally-binding relationship between HIPAA-covered entities and business associates to ensure complete protection of PHI (protected health information).


Why ArangoDB: Time-based knowledge graph that’s easily extensible

Speed: Using a traditional relational database to model temporal patterns, such as patient care over time or rapidly changing best practices during a pandemic, required a large number of joins, which heavily impacted query performance. In contrast, ArangoDB was able to handle queries with many joins effortlessly.

Graph-based Time Travel: To enable doctors to travel to any point in time to see a patient’s condition and care, Decoded Health needed to label each node and edge in their graph with two timestamps: one for the creation time of the node or edge, and one for when it became invalid or expired. ArangoDB made this easy, with documentation explaining how to implement time travel with a graph database. In particular, in ArangoDB, each node and edge is a JSON document that can easily accommodate new fields, such as multiple timestamps. Also, AQL (ArangoDB Query Language) has extensive functionality for queries with timestamps.
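The two-timestamp scheme can be sketched as a validity check on each document. This is an illustration under stated assumptions, not Decoded Health's schema: the field names `created` and `expired`, the collection name `care_edges`, and the integer timestamps are hypothetical. The AQL string is included only to show how the same predicate might look as a filter; it is not executed here.

```python
from typing import Dict, List, Optional

# Each document carries two timestamps (field names are hypothetical):
# "created" is when it became valid, "expired" is when it stopped being
# valid (None means it is still current).
EDGES: List[Dict] = [
    {"_from": "patients/p1", "_to": "treatments/t1", "created": 100, "expired": 200},
    {"_from": "patients/p1", "_to": "treatments/t2", "created": 200, "expired": None},
]

# The same predicate written as an AQL filter, shown for comparison only.
AQL_AS_OF = """
FOR e IN care_edges
  FILTER e.created <= @t AND (e.expired == null OR e.expired > @t)
  RETURN e
"""


def valid_at(doc: Dict, t: int) -> bool:
    """True if the document existed and had not yet expired at time t."""
    expired: Optional[int] = doc["expired"]
    return doc["created"] <= t and (expired is None or expired > t)


def edges_as_of(edges: List[Dict], t: int) -> List[Dict]:
    """Return the edges that were valid at time t."""
    return [edge for edge in edges if valid_at(edge, t)]
```

Calling `edges_as_of(EDGES, 150)` returns only the edge to `treatments/t1`, while a later timestamp returns only the edge to `treatments/t2`.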

Scalability: Decoded Health’s time-travel capability means that nodes and edges are never deleted; instead, they’re simply expired. This practice means that the size of Quinn’s knowledge graph is continually expanding; however, ArangoDB was able to accommodate this growth in graph size.
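Expiring rather than deleting can be sketched as an update that closes the current version of a record and appends a new one. This is a minimal Python illustration with an in-memory list standing in for a collection; the field names and function names are hypothetical.

```python
from typing import Dict, List, Optional


def expire_and_replace(history: List[Dict], key: str, new_value: str, now: int) -> None:
    """Expire the current version of `key` and append a new version.

    Nothing is removed from `history`, so earlier states stay queryable.
    """
    for doc in history:
        if doc["key"] == key and doc["expired"] is None:
            doc["expired"] = now  # mark the old version as no longer valid
    history.append({"key": key, "value": new_value, "created": now, "expired": None})


def current_value(history: List[Dict], key: str) -> Optional[str]:
    """Return the value of the unexpired version of `key`, if any."""
    for doc in history:
        if doc["key"] == key and doc["expired"] is None:
            return doc["value"]
    return None
```

Every update adds a document and removes none, which is why the graph grows continually under this practice.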

No-code: Decoded Health allowed subject matter experts to add their own dialog modules directly to Quinn using the ArangoGraph editor, freeing developers from having to interpret that workflow and write custom code to encapsulate it. These dialog modules include how to classify a patient’s intent and sentiment, mappings of consumer (patient) vocabulary to clinical terms, and how to route the conversation appropriately. This no-code approach allowed domain experts to be more productive and freed up developers to work on other, more technical projects.


Machine learning (ML) integration: Once Decoded Health built the knowledge graph above, they could map it to the neural network that forms the core of Quinn’s brain.

The Implementation: Modular services, monolithic graph

Decoded Health stores all its knowledge graphs as one massive graph in ArangoGraph, the managed cloud service from ArangoDB. Using one large graph lets Decoded Health avoid querying multiple databases, yielding significant performance improvements. Running on ArangoGraph frees time for Decoded Health’s technical team since there is one less database to manage. The team also appreciated that ArangoGraph handles backups and performs well.

Using ArangoGraph, Decoded Health built distributed microservices accessible via API gateways. These microservices encapsulate various pieces of logic, such as collecting a patient’s insurance information or updating their demographic information. They also pull subgraphs from ArangoDB into memory, enabling them to rapidly perform computations on those subgraphs and meet their performance SLAs (service level agreements).

This architecture has one cache layer, built with Redis, and one graph layer, on ArangoGraph. By having one cache layer tied to just one graph layer, both layers are easily kept in sync. With this cache layer, Decoded Health can reduce the load on the database incurred from running complex graph queries, further improving performance.
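A cache in front of a graph database is often implemented with the cache-aside pattern, sketched below. The source does not say which caching strategy Decoded Health uses, so this is an assumption; a plain dictionary stands in for Redis, and the class name `QueryCache` and the `run_query` callable are hypothetical.

```python
from typing import Callable, Dict, List


class QueryCache:
    """Cache-aside wrapper; a dict stands in for Redis in this sketch."""

    def __init__(self, run_query: Callable[[str], List[str]]) -> None:
        self._run_query = run_query
        self._store: Dict[str, List[str]] = {}
        self.misses = 0

    def get(self, key: str) -> List[str]:
        if key in self._store:
            return self._store[key]    # cache hit: the graph is not queried
        self.misses += 1
        result = self._run_query(key)  # cache miss: run the graph query once
        self._store[key] = result
        return result

    def invalidate(self, key: str) -> None:
        """Drop a cached entry after the underlying graph data changes."""
        self._store.pop(key, None)
```

With a single cache and a single graph, invalidation has only one place to look, which is the sense in which the two layers are easy to keep in sync.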

Decoded Health implemented narrowly scoped graph database writes, while reads can draw on all the knowledge graphs stored in ArangoDB. This narrow write/broad read approach lets them keep transactions light and short while adhering to ACID principles. They built a transaction library to work with ArangoDB that operates on in-memory graphs, similar to Java’s JPA or Hibernate. This library allowed them to mark graph nodes and edges as dirty, meaning they require an update, and then write those entities to disk in one batch, as appropriate.
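Dirty tracking of this kind resembles the unit-of-work pattern. The sketch below is an illustration in Python, not Decoded Health's library: the class name `GraphSession`, its methods, and the `write_batch` callable (which stands in for a database write) are hypothetical.

```python
from typing import Callable, Dict, List, Set


class GraphSession:
    """Tracks in-memory entities and flushes only the modified ones."""

    def __init__(self, write_batch: Callable[[List[Dict]], None]) -> None:
        self._write_batch = write_batch
        self._entities: Dict[str, Dict] = {}
        self._dirty: Set[str] = set()

    def load(self, entity: Dict) -> None:
        """Register an entity read from the database (not dirty yet)."""
        self._entities[entity["_key"]] = entity

    def update(self, key: str, **fields) -> None:
        self._entities[key].update(fields)
        self._dirty.add(key)  # remember that this entity changed

    def flush(self) -> int:
        """Write all dirty entities in one batch and clear the dirty set."""
        batch = [self._entities[key] for key in sorted(self._dirty)]
        if batch:
            self._write_batch(batch)
        self._dirty.clear()
        return len(batch)
```

Only entities that were changed reach the write call, which keeps each write narrow even when many entities were read into memory.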


The Results

By using ArangoDB as a foundational element of Quinn, Decoded Health can offer the following benefits:

Doctor efficiency: Decoded Health lets doctors serve four times more patients, from a typical 2,000 to 8,000, thanks to knowledge graphs stored in ArangoDB that surface relevant information and streamline patient conversations.

Speed: Decoded Health could execute queries on time-based patterns much faster than with a relational database, enabling clinicians to complete their patient encounters faster and better understand patient medical conditions and treatments over time.

Quality: Decoded Health can deliver care recommendations based on best practices from the medical community, thanks to the knowledge encoded in ArangoDB. Because doctors can complete their patient encounters in less time, they can be better rested and deliver higher-quality care.

Consistency: Decoded Health helps physicians consistently deliver high-quality healthcare, regardless of location – whether in their home, while traveling, or at a doctor’s office – since all doctors are getting the same recommendations, informed by the same patient conversation record and knowledge graph stored in ArangoDB.

Flexibility: Patient touchpoints are stored in a graph database that is much more flexible than a traditional relational database, with its rigid schema of rows and columns.

Rich analytics: A range of roles can query the knowledge base or medical record; these roles can include a clinician, CIO, analyst, data steward, data scientist, or data engineer. Queries can include all observations about a particular patient, health care provider, facility, diagnosis, or treatment. Queries can reference standard medical coding systems such as SNOMED, ICD, LOINC, and CPT. ArangoDB’s flexible query capabilities make this breadth of queries possible.

Comprehensive: Quinn’s knowledge graph spans over 90% of the most common urgent care conditions, thanks to ArangoDB’s scalability.

Developer efficiency: By combining graph, document, and search capabilities in one database, ArangoDB provides a more streamlined developer experience that enables developers to be more productive.

“We found ArangoGraph ideal because we did not need to maintain it. Security, monitoring, logging and backups are handled, and the performance is great.”

- Kevin Bayes, Decoded Health

For more details, watch Kevin Bayes and Anna Spyker’s presentation from ArangoDB Summit 2022.