ArangoDB v3.8 reached End of Life (EOL) and is no longer supported.

Ranking View Query Results

You can query Views and return the most relevant results first, based on their ranking score.

ArangoSearch supports the two most popular ranking schemes:

  • Okapi BM25
  • TF-IDF

Under the hood, both models rely on two main components:

  • Term frequency (TF): in the simplest case defined as the number of times a term occurs in a document
  • Inverse document frequency (IDF): a measure of how relevant a term is, i.e. whether the word is common or rare across all documents

See Ranking in ArangoSearch in the ArangoSearch Tutorial to learn more about the ranking model.
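
For rough orientation, the textbook forms of the two models can be written as follows. This is a sketch of the standard definitions, not necessarily the exact variant that ArangoSearch computes internally. The symbols N (total number of documents), n_t (number of documents containing term t), |d| (length of document d) and avgdl (average document length) are notation introduced here for illustration.

```latex
\[
\mathrm{idf}(t) = \log\frac{N}{n_t},
\qquad
\text{TF-IDF}(q, d) = \sum_{t \in q} \mathrm{tf}(t, d)\,\mathrm{idf}(t)
\]

\[
\text{BM25}(q, d) = \sum_{t \in q} \mathrm{idf}(t)\,
  \frac{\mathrm{tf}(t, d)\,(k + 1)}
       {\mathrm{tf}(t, d) + k\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
\]
```

In this form, k controls how quickly repeated occurrences of a term saturate, and b controls how strongly the score is normalized by document length. Under the definition above, a term found in 10 of 1000 documents has idf = ln(100) ≈ 4.6, whereas a term found in 900 of them has idf = ln(1000/900) ≈ 0.105, so rare terms contribute far more to the score.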

Basic Ranking

To sort View results from most relevant to least relevant, use a SORT operation with a call to a Scoring function as expression and set the order to descending. Scoring functions expect the document emitted by a FOR … IN loop that iterates over a View as first argument.

FOR doc IN viewName
  SEARCH ...
  SORT BM25(doc) DESC
  RETURN doc

You can also return the ranking score as part of the result.

FOR doc IN viewName
  SEARCH ...
  RETURN MERGE(doc, { bm25: BM25(doc), tfidf: TFIDF(doc) })

Scoring functions cannot be used outside of SEARCH operations, as the scores can only be computed in the context of a View, especially because of the inverse document frequency (IDF).
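
Scoring functions also take optional arguments. As a minimal sketch, assuming the imdb View used in the examples below, TFIDF() can be asked to normalize its score by passing true as the second argument. The search terms and the attribute name tfidfNorm are only illustrative.

```aql
FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("alien galaxy", "text_en"), "text_en")
  // Compute both variants once and reuse them below
  LET raw = TFIDF(doc)
  LET normalized = TFIDF(doc, true) // second argument enables normalization
  SORT normalized DESC
  LIMIT 5
  RETURN { title: doc.title, tfidf: raw, tfidfNorm: normalized }
```

Returning both values side by side makes it easier to judge whether normalization changes the ranking for a given dataset.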

Dataset: IMDB movie dataset

View definition:

{
  "links": {
    "imdb_vertices": {
      "fields": {
        "description": {
          "analyzers": [
            "text_en"
          ]
        }
      }
    }
  }
}

AQL Queries:

Search for movies with certain keywords in their description and rank the results using the BM25() function:

FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score: BM25(doc)
  }

| title | description | score |
|---|---|---|
| AVPR: Aliens vs. Predator - Requiem | Prepare for more mayhem as warring aliens and predators return … spectacular action sequences … | 35.85710525512695 |
| Moon 44 | … battle a familiar foe and an alien enemy. … sci-fi thriller from action director Roland Emmerich … | 35.85523223876953 |
| Dark Star | A low-budget, sci-fi satire … battle their alien mascot … | 28.655567169189453 |
| Starship Troopers 2: Hero of the Federation | In the sequel to Paul Verhoeven’s loved/reviled sci-fi film … fighting alien bugs… | 28.635963439941406 |
| Push | The action packed sci-fi thriller involves a group of young American ex-pats… | 28.131816864013672 |
| Casshern | Live-action sci-fi movie based on a 1973 Japanese animé of the same name. | 28.070863723754883 |
| Puzzlehead | In a post apocalyptic world where technology is outlawed, … The resulting Sci-Fi love triangle is a Frankensteinian fable … | 25.57171630859375 |
| Cesta do pravěku | Most classical sci-fi from K. Zeman. … a wondrous prehistoric world | 25.57117462158203 |
| Interstella 5555: The 5tory of the 5ecret 5tar 5ystem | A sci-fi japanimation House-musical movie … themes of sci-fi celebrity … | 22.481136322021484 |
| Alien Planet | The dynamic meeting of solid science … Alien Planet creates a realistic depiction of creatures on another world, … | 21.493724822998047 |

Do the same but with the TFIDF() function:

FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
  SORT TFIDF(doc) DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score: TFIDF(doc)
  }

| title | description | score |
|---|---|---|
| AVPR: Aliens vs. Predator - Requiem | Prepare for more mayhem as warring aliens and predators return … spectacular action sequences … | 25.193025588989258 |
| Moon 44 | … battle a familiar foe and an alien enemy. … sci-fi thriller from action director Roland Emmerich … | 25.193025588989258 |
| Interstella 5555: The 5tory of the 5ecret 5tar 5ystem | A sci-fi japanimation House-musical movie … themes of sci-fi celebrity … | 20.324928283691406 |
| Dark Star | A low-budget, sci-fi satire … battle their alien mascot … | 19.935544967651367 |
| Starship Troopers 2: Hero of the Federation | In the sequel to Paul Verhoeven’s loved/reviled sci-fi film … fighting alien bugs… | 19.935544967651367 |
| Casshern | Live-action sci-fi movie based on a 1973 Japanese animé of the same name. | 19.629377365112305 |
| Push | The action packed sci-fi thriller involves a group of young American ex-pats… | 19.629377365112305 |
| Puzzlehead | In a post apocalyptic world where technology is outlawed, … The resulting Sci-Fi love triangle is a Frankensteinian fable … | 18.10955047607422 |
| Cesta do pravěku | Most classical sci-fi from K. Zeman. … a wondrous prehistoric world | 18.10955047607422 |
| The Day the Earth Stood Still | An alien and a robot land on earth after World War II … A classic science fiction film … | 15.719740867614746 |
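
Several rows above share the same TF-IDF score, so their relative order is not determined by the score alone. A minimal sketch of a deterministic ordering adds a secondary sort key. The choice of doc.title as tie-breaker is an assumption made for illustration.

```aql
FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
  LET score = TFIDF(doc)
  // Highest score first; equal scores are ordered alphabetically by title
  SORT score DESC, doc.title ASC
  LIMIT 10
  RETURN { title: doc.title, score }
```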

Query Time Relevance Tuning

You can fine-tune the scores computed by the Okapi BM25 and TF-IDF relevance models at query time via the BOOST() AQL function and also calculate a custom score. In addition, the BM25() function lets you adjust the coefficients at query time.

The BOOST() function is similar to the ANALYZER() function in that it accepts any valid SEARCH expression as first argument. You can set the boost factor for that sub-expression via the second parameter. Documents that match boosted parts of the search expression will get higher scores.
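
As a sketch of how several boost factors can be combined, the following query weights matches for galaxy five times and matches for alien twice as much as the remaining keywords. It assumes the imdb View from the previous section. The keyword split and the factors 5 and 2 are arbitrary choices for illustration.

```aql
FOR doc IN imdb
  SEARCH ANALYZER(
    doc.description IN TOKENS("action world sci-fi science", "text_en")
    // Each BOOST() wraps one sub-expression and scales its contribution
    OR BOOST(doc.description IN TOKENS("alien", "text_en"), 2)
    OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5),
  "text_en")
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN { title: doc.title, score: BM25(doc) }
```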

Dataset: IMDB movie dataset

View definition:

{
  "links": {
    "imdb_vertices": {
      "fields": {
        "description": {
          "analyzers": [
            "text_en"
          ]
        }
      }
    }
  }
}

AQL Queries:

Prefer galaxy over the other keywords:

FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental", "text_en")
      OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5), "text_en")
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score: BM25(doc)
  }

| title | description | score |
|---|---|---|
| Star Trek Collection | Star Trek a futuristic science fiction franchise. … galaxies to explore, and cool skin tight suits to beam up in … | 64.87849426269531 |
| Alien Tracker | In a galaxy far away, alien criminals organize a spectacular prison break. … Cole is the Alien Tracker … | 63.959991455078125 |
| Stitch! The Movie | … the galaxy’s most wanted extraterrestrial … Dr. Jumba brought one of his alien “experiments” to Hawaii. | 63.39030075073242 |
| The Hitchhiker’s Guide to the Galaxy | Mere seconds before the Earth is to be demolished by an alien construction crew … a new edition of “The Hitchhiker’s Guide to the Galaxy.” | 63.37282943725586 |
| Stargate: The Ark of Truth | … it may be in the Ori’s own home galaxy. … SG-1 travels to the Ori galaxy … in a distant galaxy fighting two powerful enemies. | 61.784141540527344 |
| The Ice Pirates | … the most precious commodity in the galaxy is water. … unreachable centre of the galaxy … The galaxy is ruled by an evil emperor … | 61.78216552734375 |
| Star Wars: Episode III: Revenge of the Sith | … leading a massive clone army into a galaxy-wide battle against the Separatists. … to rule the galaxy, the Republic crumbles … | 59.79429244995117 |
| Star Wars: Episode II - Attack of the Clones | … not only has the galaxy undergone significant change, but so have Obi-Wan Kenobi, Padmé Amidala, and Anakin Skywalker … | 55.723636627197266 |
| Macross Plus | … a new aircraft (Shinsei Industries’ YF-19 & General Galaxy’s YF-21) for Project Super Nova, to choose the newest successor to the VF-11 | 55.722259521484375 |
| Star Trek | The fate of the galaxy rests in the hands of bitter rivals. One, James Kirk, is a delinquent, thrill-seeking Iowa farm boy. The other, Spock, a Vulcan, … | 55.717037200927734 |

If you are an information retrieval expert and want to fine-tune the weighting schemes at query time, you can pass the free coefficients k and b to the BM25() function as its second and third arguments. For instance, setting b to 0 turns it into BM15:

FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental", "text_en")
      OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5), "text_en")
  LET score = BM25(doc, 1.2, 0)
  SORT score DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score
  }

| title | description | score |
|---|---|---|
| Stargate: The Ark of Truth | … it may be in the Ori’s own home galaxy. … SG-1 travels to the Ori galaxy … in a distant galaxy fighting two powerful enemies. | 42.88237380981445 |
| The Ice Pirates | … the most precious commodity in the galaxy is water. … unreachable centre of the galaxy … The galaxy is ruled by an evil emperor … | 42.88237380981445 |
| Star Wars: Episode III: Revenge of the Sith | … leading a massive clone army into a galaxy-wide battle against the Separatists. … to rule the galaxy, the Republic crumbles … | 39.27024841308594 |
| Alien Tracker | In a galaxy far away, alien criminals organize a spectacular prison break. … Cole is the Alien Tracker … | 38.43224334716797 |
| Star Trek Collection | Star Trek a futuristic science fiction franchise. … galaxies to explore, and cool skin tight suits to beam up in … | 38.42367935180664 |
| Stitch! The Movie | … the galaxy’s most wanted extraterrestrial … Dr. Jumba brought one of his alien “experiments” to Hawaii. | 37.563819885253906 |
| The Hitchhiker’s Guide to the Galaxy | Mere seconds before the Earth is to be demolished by an alien construction crew … a new edition of “The Hitchhiker’s Guide to the Galaxy.” | 37.563819885253906 |
| Critters 4 | … he gets a message that it would be illegal to extinguish the race from the galaxy. … | 32.99643325805664 |
| Alien Agent | A lawman from another galaxy must stop an invading force from building a gateway to planet Earth. | 32.99643325805664 |
| Star Trek | The fate of the galaxy rests in the hands of bitter rivals. One, James Kirk, is a delinquent, thrill-seeking Iowa farm boy. The other, Spock, a Vulcan, … | 32.99643325805664 |
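
For comparison, setting b to 1 applies full document-length normalization, a variant known in the literature as BM11. The sketch below returns the default BM25 score next to both extremes so that the effect of b can be inspected per document. The single search term and the attribute names bm15 and bm11 are only illustrative.

```aql
FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("galaxy", "text_en"), "text_en")
  LET bm25 = BM25(doc)          // defaults: k = 1.2, b = 0.75
  LET bm15 = BM25(doc, 1.2, 0)  // b = 0: no length normalization
  LET bm11 = BM25(doc, 1.2, 1)  // b = 1: full length normalization
  SORT bm25 DESC
  LIMIT 10
  RETURN { title: doc.title, bm25, bm15, bm11 }
```

With b at 0, long and short descriptions are treated alike. The closer b gets to 1, the more a long description is penalized for the same number of term occurrences.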

You can also calculate a custom score, taking into account additional fields of the document.

Match movies with the (normalized) phrase star war in the title and calculate a custom score based on BM25 and the movie runtime to favor longer movies:

FOR doc IN imdb
  SEARCH PHRASE(doc.title, "Star Wars", "text_en")
  LET score = BM25(doc) * LOG(doc.runtime + 1)
  SORT score DESC
  RETURN {
    title: doc.title,
    runtime: doc.runtime,
    bm25: BM25(doc),
    score
  }

| title | runtime | bm25 | score |
|---|---|---|---|
| Star Wars: Episode II - Attack of the Clones | 142 | 16.900253295898438 | 83.87333131958185 |
| Star Wars: Episode III: Revenge of the Sith | 140 | 16.900253295898438 | 83.63529564797363 |
| Star Wars: Episode VI - Return of the Jedi | 135 | 16.900253295898438 | 83.02511192427228 |
| Star Wars: Episode I - The Phantom Menace | 133 | 16.81275749206543 | 82.34619279156092 |
| Star Wars: Episode V: The Empire Strikes Back | 124 | 16.900253295898438 | 81.59972515247492 |
| Star Wars: Episode IV - A New Hope | 121 | 16.81275749206543 | 80.76884081187906 |
| The Star Wars Holiday Special | 97 | 16.569408416748047 | 75.97019873160025 |
| Star Wars: The Clone Wars | 90 | 16.569408416748047 | 74.74227347404823 |
| Star Wars: Revelations | 47 | 16.13064956665039 | 62.44498690901793 |
| Star Wars Collection | null | 16.13064956665039 | 0 |
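
The last row illustrates a pitfall: a null runtime is treated as 0 in the arithmetic, LOG(1) is 0, and the custom score collapses to 0 regardless of the BM25 value. One possible way around this, sketched here, is to substitute a fallback runtime for missing values. The fallback of 90 minutes is an arbitrary assumption, and the query assumes that the title attribute is indexed by the View with the text_en Analyzer.

```aql
FOR doc IN imdb
  SEARCH PHRASE(doc.title, "Star Wars", "text_en")
  // Use a fallback runtime when the attribute is missing or null
  LET runtime = doc.runtime != null ? doc.runtime : 90
  LET score = BM25(doc) * LOG(runtime + 1)
  SORT score DESC
  RETURN { title: doc.title, runtime: doc.runtime, bm25: BM25(doc), score }
```

Whether a fallback value or excluding such documents is more appropriate depends on the use case.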