ArangoSearch:
Full-text search engine including similarity ranking capabilities

ArangoSearch is a C++-based full-text search engine with similarity ranking capabilities, natively integrated into ArangoDB.

ArangoSearch allows users to combine two information retrieval techniques: boolean and generalized ranking retrieval. Search results “approved” by the boolean model can be ranked by relevance to the respective query using the Vector Space Model in conjunction with BM25 or TFIDF weighting schemes.
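The combination of the two techniques can be sketched in a few lines of Python. This is only a toy illustration of the idea, not ArangoSearch's implementation: the corpus, the helper names (`boolean_filter`, `tfidf_rank`) and the plain `tf * log(N / df)` weighting are all assumptions made for the example.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
DOCS = {
    "d1": "alien invasion action movie",
    "d2": "documentary about the galaxy and alien life",
    "d3": "romantic comedy in paris",
    "d4": "alien alien action sequel",
}

def tokenize(text):
    """Very naive analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def boolean_filter(docs, must):
    """Boolean model: a document is approved only if it contains every `must` term."""
    return {
        key: text
        for key, text in docs.items()
        if all(term in set(tokenize(text)) for term in must)
    }

def tfidf_rank(docs, approved, query_terms):
    """Ranked retrieval: order the approved documents by a simple TF-IDF score."""
    n_docs = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))  # count each term once per document
    scores = {}
    for key, text in approved.items():
        term_freq = Counter(tokenize(text))
        scores[key] = sum(
            term_freq[t] * math.log(n_docs / doc_freq[t])
            for t in query_terms
            if doc_freq[t]
        )
    # Highest score first; ties are broken by key to keep the order stable.
    return sorted(scores, key=lambda k: (-scores[k], k))

approved = boolean_filter(DOCS, must=["alien"])
ranking = tfidf_rank(DOCS, approved, ["alien", "action"])
```

The Boolean step decides which documents are eligible at all, and the ranking step only orders the ones that passed.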

ArangoSearch is a first-class citizen in ArangoDB. With the debut release of ArangoSearch (ArangoDB 3.4), the following capabilities are supported:

Complex Searches with Boolean Operators
Relevance-Based Matching
Phrase and Prefix Matching
Relevance Tuning at Query Time
Full combinability of search queries with all supported data models & access patterns
Scalability

Over the upcoming releases, the dedicated ArangoSearch team will extend the supported feature set.

Learn the new search capabilities with the detailed ArangoSearch tutorial

The VIEW concept

ArangoSearch uses a special type of materialized view to provide full-text search across multiple collections. Within the definition of a view of type arangosearch, the user specifies entire collections or individual fields to be covered by an inverted index with one or more general text analyzers. The view concept is currently exclusive to ArangoSearch; more general views (SQL-like views, materialized views) may be introduced with later versions of ArangoDB.

The view concept is key to ArangoSearch. Developers can create an arbitrary number of individually configured views. A single ArangoSearch view may contain documents coming from different collections, making it possible to perform complex federated searches, even over the whole graph.
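To make the idea of one inverted index spanning several collections more concrete, here is a minimal Python sketch. It is not how ArangoSearch stores data: the class `ToyView`, its `links` mapping and the whitespace "analyzer" are hypothetical stand-ins chosen for the example.

```python
from collections import defaultdict

class ToyView:
    """A toy 'view': one inverted index fed by selected fields of several collections."""

    def __init__(self, links):
        # links maps a collection name to the fields that should be indexed,
        # e.g. {"movies": ["description"]} (hypothetical configuration format).
        self.links = links
        self.index = defaultdict(set)  # term -> {(collection, document key)}

    def index_document(self, collection, key, doc):
        """Add the linked fields of one document to the inverted index."""
        for field in self.links.get(collection, []):
            # Stand-in for a text analyzer: lowercase and split on whitespace.
            for term in str(doc.get(field, "")).lower().split():
                self.index[term].add((collection, key))

    def search(self, term):
        """Return every (collection, key) pair whose indexed fields contain the term."""
        return sorted(self.index.get(term.lower(), set()))

view = ToyView({"movies": ["description"], "persons": ["bio"]})
view.index_document("movies", "m1", {"title": "Alien", "description": "An alien hunts the crew"})
view.index_document("persons", "p1", {"name": "Ridley", "bio": "Directed the alien films"})
hits = view.search("alien")  # matches come from both collections
```

Only the fields named in the link configuration are indexed, so a term that appears solely in an unlinked field (such as `title` here) does not produce a match.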

Similarity & Ranking in Full-text Search Engine ArangoSearch

One of the important advantages of ArangoSearch is the ability to score and sort the query result set by document relevance, so that the most relevant documents are returned before less relevant ones. This also allows limiting the result set to the N documents that best match the filter conditions.

For similarity ranking, ArangoSearch uses a Vector Space Model which calculates the term weight for each term via scorer algorithms.

The current view implementation exposes the following scorers (case sensitive):

BM25 – a frequency-based scorer based on the BM25 algorithm
TFIDF – a frequency-based scorer based on the TFIDF algorithm

Both scorers analyze the term frequency (the number of times a term occurs in a document) and the inverse document frequency (a measure of a term's “weight” across documents).
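As a rough illustration of how term frequency and inverse document frequency combine into a term weight, the following Python sketch uses a widely cited textbook formulation of BM25. It is an assumption that this matches ArangoSearch's exact formula; the function name and the `k1` and `b` values (commonly used defaults) are chosen for the example.

```python
import math

def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Weight of one term in one document under a common BM25 formulation.

    tf          -- occurrences of the term in the document
    df          -- number of documents containing the term
    n_docs      -- total number of documents
    doc_len     -- length of this document in terms
    avg_doc_len -- average document length in terms
    """
    # Inverse document frequency: rare terms get a larger weight.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates with k1 and is normalized by document length via b.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm

rare = bm25_term_weight(tf=1, df=5, n_docs=100, doc_len=50, avg_doc_len=50)
common = bm25_term_weight(tf=1, df=50, n_docs=100, doc_len=50, avg_doc_len=50)
```

With everything else equal, `rare` comes out larger than `common`, and repeating a term increases its weight by less and less, which is the saturation behaviour that distinguishes BM25 from plain TF-IDF.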

Example query from the hands-on tutorial

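// Iterate over the documents indexed by the view v_imdb
FOR d IN v_imdb
  // Match descriptions containing any of the tokens produced by the text_en analyzer
  SEARCH ANALYZER(d.description IN TOKENS('amazing action world alien sci-fi science documental galaxy', 'text_en'), 'text_en')
  // Rank by BM25 relevance, most relevant first
  SORT BM25(d) DESC
  LIMIT 10
  RETURN d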

ArangoSearch for Cluster Usage

ArangoSearch is a distributed search engine capable of handling datasets sharded over multiple machines.

Following the general cluster architecture of ArangoDB, ArangoSearch queries are sent to the Coordinator, where they are planned, optimized and executed. The Coordinator then sends each part of the ArangoSearch query to the DBserver responsible for it, and that part is processed locally there. Because ArangoSearch relies on ArangoDB's general architecture for distributed query processing, ArangoSearch queries can also be executed efficiently in a cluster.
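The general scatter/gather pattern behind this can be sketched in Python. This is a deliberately simplified illustration and not ArangoDB's actual query execution: the function names, the in-memory "shards" and the final merge step are assumptions made for the example.

```python
import heapq
from itertools import islice

def local_top_k(shard, score, k):
    """Work done on one (toy) DBserver: score local documents, keep the best k."""
    return heapq.nlargest(k, ((score(doc), doc["_key"]) for doc in shard))

def coordinator_query(shards, score, k):
    """Work done on the (toy) Coordinator: fan out, then merge the partial results."""
    partials = [local_top_k(shard, score, k) for shard in shards]
    # Each partial result is already sorted from best to worst, so a k-way
    # merge is enough to obtain the global order.
    merged = heapq.merge(*partials, reverse=True)
    return [key for _, key in islice(merged, k)]

# Two shards of a sharded collection; "hits" is a made-up relevance score.
shards = [
    [{"_key": "a", "hits": 3}, {"_key": "b", "hits": 9}],
    [{"_key": "c", "hits": 5}, {"_key": "d", "hits": 1}],
]
best = coordinator_query(shards, score=lambda doc: doc["hits"], k=2)
```

Because every shard returns at most k candidates, the amount of data sent back to the Coordinator stays small regardless of how large the shards are.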

Limitations in the current ArangoSearch version

ArangoDB 3.4 includes the first production-ready release of ArangoSearch. Not all features on our roadmap have been implemented yet, but the Core Team is continuing to extend and optimize the capabilities of text search and ranking for ArangoDB.

Eventually read committed

To speed up indexing, the ArangoSearch view processes modification requests coming from an ArangoSearch link in batches. From time to time, an asynchronous job commits the accumulated data, creating new index segments. Data becomes visible right after the commit, so in terms of transaction isolation, an ArangoSearch view is at the eventually read committed level.
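The visibility rule can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the class `ToyViewIndex` is hypothetical, and `commit()` is called explicitly here, whereas in ArangoSearch the commit is performed by an asynchronous job.

```python
class ToyViewIndex:
    """Toy model of batched indexing: writes are buffered and only visible after a commit."""

    def __init__(self):
        self.pending = []   # modifications received from a link, not yet searchable
        self.segments = []  # committed index segments, searchable

    def insert(self, doc):
        """Buffer a modification; it is not yet visible to searches."""
        self.pending.append(doc)

    def commit(self):
        """Turn the accumulated batch into a new segment and make it visible."""
        if self.pending:
            self.segments.append(tuple(self.pending))
            self.pending = []

    def search(self, predicate):
        """Search committed segments only."""
        return [doc for segment in self.segments for doc in segment if predicate(doc)]

index = ToyViewIndex()
index.insert({"_key": "m1", "title": "Alien"})
before = index.search(lambda doc: doc["title"] == "Alien")  # nothing committed yet
index.commit()
after = index.search(lambda doc: doc["title"] == "Alien")   # now visible
```

A query issued between `insert` and `commit` does not see the new document, which is why a search may briefly lag behind the collection it indexes.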

To learn more about the limitations of the current version, visit the release notes in the docs.

