Finite State: Automating Security of Embedded Systems Across Software Supply Chains
ArangoDB helps Finite State meet internal, customer, and regulatory use cases with a single database
Results:
- Streamlined, more reliable database architecture
- Reduced architecture costs
- Support for internal, customer, and regulatory use cases
- Enhanced data scale and security
- Simplified, faster queries across billions of data points
- New vulnerability discovery and validation capabilities
The Scenario: A complex, expensive database architecture for multiple use cases
Finite State enables product security teams to protect the devices people rely on daily — from intelligent doorbells to critical energy infrastructure and medical systems — through the company’s software threat, vulnerability, and risk management platform. By analyzing every piece of information in device firmware, such as third-party code, components, chip sets, and configuration settings, Finite State powers next-generation software supply chain risk reduction at scale.
“Now that software is built of components, often supplied by third parties, controlling and assessing everything that’s compiled in connected products is crucial to product lifecycle management,” explains Finite State Founder and CEO Matt Wyckhouse. “We free product teams from costly, slow, and cumbersome manual testing with our automated platform.”
Illuminating vulnerabilities in connected devices is a complex process that involves acquiring compiled software binaries, firmware, and vendor-supplied code, unpacking it, and passing it through the Finite State Next Generation Platform’s cloud-based analytics plug-ins. Results, including a software bill of materials (SBOM), overall risk score, known vulnerabilities, common weaknesses, hard-coded credentials, and compliance and mitigation guidance — all enriched with threat and exploit intelligence — are housed in databases and then presented to customers in a web user interface for easy searching and querying.
The first iteration of the Finite State platform included several databases: one containing vulnerability and threat intelligence, another for user interactions and artifacts generated from analysis, and a third with a knowledge base of specific software packages. The initial design was based on data from the analysis — a mishmash of types, sizes, structures, and contents from large datasets, such as symbols in binaries, small datasets like passwords, and varied items like vulnerability information and software components.
As the platform’s architecture evolved, the Finite State team adopted different database technologies to support various use cases. They used Amazon S3 for file storage, AWS Glue for structured data and relational searching, Elasticsearch for customers’ freeform searches, and Postgres to house findings and their associated relationships.
Although this setup worked well initially, it soon presented challenges. Because each database and storage mechanism had different access patterns, company engineers had to continually copy data from one source to another for different needs — which was time-intensive. With no consistent data model, it was painful to retrieve information from the databases, format it, and present it to customers. The complex architecture was costly to maintain and scale.
The Requirements: Support for disparate and repeatable use cases, flawless reliability, and easy maintenance and scaling
In 2022, the Finite State product team took a fresh look at how to address the different use cases while building a more reliable platform with better access patterns, fewer technologies, and a streamlined structure for easy scaling and maintenance. The first step was to analyze customer use cases.
On the analysis side, Finite State examines raw artifacts to determine security findings and write new queries for customers. These activities call for efficient read/write performance on large datasets and the ability to match query patterns. Customers, on the other hand, perform freeform searches on vulnerability findings. Their priorities are fast performance and the ability to match logic search patterns.
“We were looking for one unified database solution,” says Wyckhouse. “After spending a lot of time investigating different approaches, ArangoDB was the only solution that would cater to all our use cases. All the previous technologies — Postgres, Elasticsearch, S3, AWS Glue, and AWS Athena — could be replaced with ArangoDB as a centralized data source. We concluded that we could support all our stakeholders using the same database.” For Finite State, this approach held out the promise of a more streamlined and reliable architecture.
Before they landed on ArangoDB, Wyckhouse and his team extensively evaluated multiple technologies: Postgres, Elasticsearch, AWS OpenSearch, AWS Aurora Serverless v2, CockroachDB, MongoDB, CitusDB, and Redshift. The team had a wealth of database experience and a long list of must-haves to ensure that customers were satisfied with the platform’s performance:
- A schemaless JSON-based write paradigm
- The ability to simply express queries
- Exceptional search capabilities
- Options to deploy both in the cloud and on-premise
- Cloud-native horizontal scaling
- Solid performance with both OLTP and OLAP queries
- Elimination of client-side joins
Postgres fell short in read/write performance. Elasticsearch was not efficient enough for relational queries, because data would have to be fragmented across many different document stores. AWS OpenSearch, AWS Aurora Serverless v2, CockroachDB, MongoDB, CitusDB, and Redshift also fell short of the requirements.
Why ArangoDB: Many databases in one, support for both cloud and on-premise, as well as data security
When the team evaluated ArangoDB, they found that it met all the requirements they were looking for, with several bonuses. “ArangoDB functions as many databases in one,” says Wyckhouse. “It has graph capabilities and can be used as a document or a key/value store. Because ArangoDB is schemaless but can do SQL-type joins and searches, we satisfy our analysis team’s requirements, but it also has full-text search for our customers looking for vulnerabilities and insights across their product portfolios.”
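The "SQL-type joins" over schemaless documents that Wyckhouse mentions can be sketched as a single AQL query that runs inside the database. This is a minimal illustration, not Finite State's actual schema: the collection names `findings` and `components`, their fields, and the helper `findings_with_components` are all hypothetical. The sketch assumes a python-arango style database handle that exposes `db.aql.execute(query, bind_vars=...)`.

```python
# Hypothetical collections and fields, for illustration only.
JOIN_QUERY = """
FOR f IN findings
  FILTER f.severity == @severity
  FOR c IN components
    FILTER c._key == f.component_key
    RETURN { finding: f.title, component: c.name, version: c.version }
"""


def findings_with_components(db, severity):
    """Join findings to their components server-side instead of client-side.

    `db` is assumed to be a python-arango style handle whose `aql.execute`
    accepts the query text plus a dict of bind variables and returns an
    iterable of result documents.
    """
    cursor = db.aql.execute(JOIN_QUERY, bind_vars={"severity": severity})
    return list(cursor)
```

Because the nested `FOR` performs the join in the database, the caller receives already-combined rows, which is one way to meet the "elimination of client-side joins" requirement listed earlier.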
“We were looking for one unified database solution. After spending a lot of time investigating different approaches, ArangoDB was the only solution that would cater to all our customer and partner use cases.”
- Matt Wyckhouse, Finite State Founder and CEO
Wyckhouse adds that AQL, the query language for ArangoDB, has many advantages, especially its ability to mix SQL-type and freeform search statements. ArangoDB also supports both cloud and on-premise deployments, as well as data locality. ArangoDB allows Finite State to create separate databases and tenancies for each customer. Explains Wyckhouse, “As a security company, that’s crucial. After all, our customers are sharing their intellectual property with us.”
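A database-per-customer layout like the one described here could be provisioned with a small helper. The sketch below is an assumption-laden illustration, not Finite State's implementation: the `tenant_` prefix, the slug scheme, and both function names are hypothetical. The name check assumes ArangoDB's traditional database naming rules (starts with a letter; letters, digits, `_` and `-` only; at most 64 bytes), and `sys_db` is assumed to be a python-arango handle on the `_system` database providing `has_database` and `create_database`.

```python
import re

# Assumption: traditional ArangoDB database naming rules. Verify against the
# documentation for the server version in use.
_VALID_DB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def tenant_db_name(customer_id, prefix="tenant_"):
    """Map a customer identifier to a dedicated database name (hypothetical scheme)."""
    slug = re.sub(r"[^A-Za-z0-9_-]", "_", customer_id.strip().lower())
    if not slug:
        raise ValueError("customer_id must not be empty")
    name = f"{prefix}{slug}"
    if not _VALID_DB_NAME.fullmatch(name):
        raise ValueError(f"cannot derive a valid database name from {customer_id!r}")
    return name


def ensure_tenant_database(sys_db, customer_id):
    """Create the customer's database if it is missing and return its name.

    `sys_db` is assumed to expose `has_database(name)` and
    `create_database(name)`, as python-arango's `_system` handle does.
    """
    name = tenant_db_name(customer_id)
    if not sys_db.has_database(name):
        sys_db.create_database(name)
    return name
```

Note that this slug scheme can map two different identifiers to the same name (for example "Acme Corp" and "acme_corp"), so a real deployment would more likely key databases on a unique internal customer ID.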
The Implementation: ArangoDB + cruddl
For the new architecture, the team separated findings data for customers from analysis data for researchers, placing them in different ArangoDB databases within the same instance to support different types of querying. To streamline the presentation of data, they abstracted the data in ArangoDB away from the user interface and APIs, making those layers agnostic to the underlying ArangoDB queries. This approach freed the team from worrying about how data is written and read, and the pivot to ArangoDB expedited the delivery of findings to the user interface.
After making their selection, the team put ArangoDB through its paces, first testing a complex query involving multiple nested select statements and joins to see how it compared to the previous databases. “We could easily express things we wanted to in ArangoDB,” says Wyckhouse. “It was also much simpler writing the queries. Our analysis team is now learning to write queries in AQL rather than SQL.”
The next test was to compare ArangoDB’s read/write performance to previous databases. It took Postgres and S3 thousands of milliseconds to write a block of data from the firmware analysis plugins. ArangoDB wrote the same data in tens of milliseconds. “ArangoDB is more efficient than our previous write pipeline by almost two orders of magnitude,” Wyckhouse says.
The final step was to examine the data model. Between firmware, products, manufacturers, security findings, and relationships, Finite State had to find a way to simplify a complex data model. The team chose cruddl to sit atop ArangoDB and make additions, deletions, and data manipulation easy and understandable from an information architecture perspective.
The Results: Less complexity, more performance, new product capabilities
Simplified architecture, reduced cost: The unpacking software and analysis plug-ins remain the same in the Finite State Next Generation Platform, but ArangoDB is now the only database needed. “On the database side, we rely entirely on ArangoDB for both the customer-facing and the internal-facing analysis,” says Wyckhouse. This streamlined architecture means increased reliability, with fewer data layers to maintain and keep in sync and fewer software vendors to pay.
Easy scaling and flawless security: With ArangoDB, scaling has become a non-issue. The company can now support hundreds of database instances across several nodes, with built-in replication and the ability to secure customer intellectual property. This ability to scale securely means Finite State has one less barrier to achieving its revenue goals.
Extreme performance: ArangoDB is more efficient than the previous write pipeline by almost two orders of magnitude.
Fast, simple queries: Queries are now shorter and simpler in AQL, and searching vast amounts of data is quick and easy. ArangoSearch views can span multiple collections of data, using either premade or customized tokenizers. Engineers can optimize search queries with built-in query language directives. For Finite State, this increase in developer productivity meant they could deliver more product capabilities in less time.
“The query language that ArangoDB has for searching has a lot of advantages, especially the ability to mix SQL-type statements with freeform search. This turned out to be a big plus for us,” says Wyckhouse.
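Mixing freeform search with SQL-type statements might look like the following sketch, which pairs an ArangoSearch `SEARCH` clause with an ordinary `FILTER`. The view name `findings_view`, the field names, and the helper `build_finding_search` are hypothetical; the sketch assumes the view indexes `description` with the built-in `text_en` analyzer (what the text above calls tokenizers, ArangoDB's documentation calls analyzers).

```python
# Hypothetical view and fields, for illustration only.
SEARCH_QUERY = """
FOR doc IN findings_view
  SEARCH ANALYZER(doc.description IN TOKENS(@text, "text_en"), "text_en")
  FILTER doc.severity IN @severities
  SORT BM25(doc) DESC
  LIMIT @limit
  RETURN { title: doc.title, severity: doc.severity }
"""


def build_finding_search(text, severities, limit=10):
    """Return the AQL text and bind variables for a ranked freeform search.

    SEARCH handles the full-text part, FILTER the structured condition, and
    BM25 ranks the matches by relevance.
    """
    if not text.strip():
        raise ValueError("search text must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    bind_vars = {
        "text": text,
        "severities": list(severities),
        "limit": int(limit),
    }
    return SEARCH_QUERY, bind_vars
```

With a python-arango style handle, the returned pair would be passed along as `db.aql.execute(query, bind_vars=bind_vars)`. Keeping user input in bind variables rather than in the query text also avoids query injection.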
New product innovations: By intermixing graph, full-text search, and relational queries over schemaless data, Finite State can express data with multiple models simultaneously. Data can be document- and graph-based, for example. This flexibility enables the team to present data to customers in new ways based on factors such as the timing and sequence of threats.
Wyckhouse has been impressed with what the team can do with ArangoDB to maintain current functionality, but he is even more excited about the additional customer-facing value they can deliver. By leveraging the graph capabilities of ArangoDB for visualization and building data relationships, Wyckhouse believes they can uncover more vulnerabilities in critical parts of customers’ firmware.
“We found that we can rely on ArangoDB’s graph capabilities to make the edges between different data elements show sequencing and expose security vulnerabilities. In short, ArangoDB was just the right technology for us.”
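To make the idea of edges exposing vulnerabilities concrete, here is a small in-memory sketch of the kind of walk a graph traversal performs. It does not use ArangoDB at all; in AQL the equivalent would be a traversal of the form `FOR v, e, p IN 1..3 OUTBOUND ...`. The vertex identifiers, the edges, and the function name are invented for illustration and are not real findings.

```python
from collections import deque

# Invented sample edges as (from_vertex, to_vertex) pairs. Not real findings.
EDGES = [
    ("firmware/router-1.0", "component/busybox"),
    ("firmware/router-1.0", "component/openssl"),
    ("component/openssl", "vuln/EXAMPLE-0001"),
    ("component/busybox", "component/libc"),
    ("component/libc", "vuln/EXAMPLE-0002"),
]


def paths_to_vulnerabilities(start, edges, max_depth=3):
    """Walk outbound edges breadth-first and return each path ending at a vuln vertex."""
    outbound = {}
    for src, dst in edges:
        outbound.setdefault(src, []).append(dst)

    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.startswith("vuln/"):
            found.append(path)
            continue
        if len(path) - 1 >= max_depth:  # number of edges walked so far
            continue
        for nxt in outbound.get(node, []):
            if nxt not in path:  # skip vertices already on this path (no cycles)
                queue.append(path + [nxt])
    return found
```

Each returned path records the sequence of components that links a firmware image to a vulnerability, which is the kind of relationship a graph model can surface directly and a flat table cannot.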
Customer Architecture
Below is the architecture that Finite State built for its internal analysis team and customers. The team rebuilt the user interface with Next.js to improve responsiveness. ArangoDB is the only database required for building and visualizing relationships that identify vulnerabilities within customer firmware. Read/write operations to and from ArangoDB are handled by cruddl.
For more details, you can watch Finite State’s presentation from ArangoDB Summit 2022: