Creating Multi-Game Highscore Lists: ArangoDB Techniques
I just came across a question about how to create highscore lists or leaderboards in ArangoDB, and how they would work when compared to Redis sorted sets.
This blog post tries to give an answer on the topic and also detailed instructions and queries for setting up highscore lists with ArangoDB. The additional section “Extensions” explains slightly more advanced highscore list use cases like multi-game highscore lists, joining data and maintaining a “last updated” date.
A highscore list in Redis
Highscore lists are normally used to quickly determine who’s currently at the top, so we obviously need some sorted data structure.
Redis has a specialized datatype named sorted set which can be used for exactly this purpose. A sorted set in Redis is a value consisting of multiple key/value pairs, and that is automatically sorted by values. The sorted set is stored under a key so it can be accessed as a whole.
Here’s how one would create a sorted set named highscores
and populate it with 5 key/value pairs in Redis (using redis-cli):
> ZADD highscores frank 50 jan 20 willi 35 thomas 75 ingo 60
Adding a new entry to a sorted set is done using ZADD
too. Inserting into a Redis sorted set has logarithmic complexity.
Updating a score in the sorted set is done using ZINCRBY
. This command works regardless of whether the to-be-updated key already exists in the sorted set. If it exists, its score will be increased by the specified value, and if it does not exist, it will be created with the specified value:
> ZINCRBY highscores 80 max
(integer) 1
In this case the return value 1
indicates that a new key was added to the set and that it didn’t update an existing one.
Querying the entries with the lowest scores from a Redis sorted set is trivial.
The ZRANGE
command will query the entries in the sorted set from lowest to highest score. As the entries are already stored in sorted order, this is very efficient.
The following command queries the bottom 3 keys from the sorted set:
ZRANGE highscores 0 2
1) "jan"
2) "willi"
3) "frank"
For querying in reverse order, there is ZREVRANGE
. Both commands can be accompanied by the WITHSCORES
flag to also return the associated values (i.e. the scores). Here are the top 3 key/value pairs in the sorted set:
> ZREVRANGE highscores 0 2 WITHSCORES
1) "max"
2) "80"
3) "thomas"
4) "70"
5) "ingo"
6) "60"
For removing an entry from a sorted set there is ZREM
:
> ZREM highscores jan
(integer) 1
There are many more specialized Redis commands for working with sorted sets. The Redis commands prefixed with a Z
are sorted set commands.
A highscore list in ArangoDB
Now let’s try to mimic that with ArangoDB.
In ArangoDB, there is no such thing as a sorted set and no direct equivalent. Instead, data in ArangoDB are stored in collections. Collections are a general-purpose storage mechanism and they are not limited to storing just scores.
We also need a mechanism for keeping highscores sorted. By default, no specific sort order is maintained for data in a collection. To have the collection entries sorted by highscore values, we have to explicitly create a (sorted) skiplist index on some attribute. We’ll use an attribute named score
for this.
The following shell commands create the collection and the index on score
:
db._create("highscores");
db.highscores.ensureIndex({ type: "skiplist", fields: [ "score" ] });
Once the collection is set up, we can switch to AQL for the following operations (though we could achieve the same with Shell commands).
To insert the same initial data as in the Redis case, we can run the following five AQL queries:
INSERT { _key: "frank", score: 50 } IN highscores
INSERT { _key: "jan", score: 20 } IN highscores
INSERT { _key: "willi", score: 35 } IN highscores
INSERT { _key: "thomas", score: 75 } IN highscores
INSERT { _key: "ingo", score: 60 } IN highscores
Note that I have been using the _key
attribute for saving the user id. Using the _key
attribute is normally beneficial because it is the collection’s primary key. It is always present and automatically unique, so exactly what we need for maintaining a highscore list. Note that there are some restrictions for what can be stored inside the _key
attribute, but as long as values are only ASCII letters or digits, there is nothing to worry about.
Inserting into the collection will also automatically populate the indexes. Inserting into a skiplist should have about logarithmic complexity on average (though this is not guaranteed – this is because the skiplist is a probabilistic data structure and internally it will be flipping coins. In theory there is a chance that it becomes badly balanced. But in practice it should be quite close to an average logarithmic complexity).
As we have some initial documents, we can now query the lowest and highest scores. This will also be efficient as the queries will use the sorted index on score
:
FOR h IN highscores
SORT h.score ASC
LIMIT 3
RETURN { user: h._key, score: h.score }
FOR h IN highscores
SORT h.score DESC
LIMIT 3
RETURN { user: h._key, score: h.score }
To store a highscore for a user without knowing in advance whether a value has already been stored before for this user, one can use UPSERT
. The UPSERT
will either insert a new highscore entry, or update an existing one if already present:
UPSERT { _key: "max" }
INSERT { _key: "max", score: 80 }
UPDATE { score: OLD.score + 80 } IN highscores
RETURN { user: NEW._key, score: NEW.score }
If there is already an entry with a key max
, its scores will be increased by 80. If such entry does not exist, it will be created. In both cases, the new score will be returned.
Note: the UPSERT
command has been added in ArangoDB version 2.6.
Finally, removing an entry from a highscore list is a straight-forward remove operation:
REMOVE { _key: "jan" } IN highscores
Extensions
We’ll now build on this simple example and create slightly more advanced highscore list use cases. The following topics will be covered:
- multi-game highscore lists
- joining data
- maintaining a “last updated” date
Multi-game highscore lists
We’ll start with generalizing the single-game highscore list into a multi-game highscore list.
In Redis, one would create multiple sorted sets for handling the highscore lists of multiple games. Multiple Redis sorted sets are stored under different keys, so they are isolated from each other.
Though Redis provides a few commands to aggregate data from multiple sorted sets (ZUNIONSTORE
and ZINTERSTORE
) into a new sorted set, other cross-set operations are not supported. This is not a problem if the client application does not have to perform cross-set queries or cleanup tasks.
In ArangoDB, multi-game highscore lists can be implemented in two variants. In order to decide which variant is better suited, we need to be clear about whether all highscores should be stored in the same collection or if we prefer using multiple collections (e.g. one per game).
Storing highscores for different games in separate collections has the advantage that they’re really isolated. It is easy to get rid of a specific highscore list by simply dropping its collection. It is also easy to get right query-wise.
All that needs to be changed to turn the above examples into a multi-game highscore list solution is to change the hard-coded collection name highscores
and make it a bind parameter, so the right collection name can be injected by the client application easily.
On the downside, the multi-collection solution will make cross-game operations difficult. Additionally, having one collection per game may get out of hand when there are many, many highscore lists to maintain. In case there are many but small highscore lists to maintain, it might be better to put them into a single collection and add a game
attribute to tell the individual lists apart in it.
Let’s focus on this and put all highscores of all games into a single collection.
The first adjustment that needs to be made is that we cannot use _key
for user ids anymore. This is because user ids may repeat now (a user may be contained in more than one list). So we will change the design and make the combination of game
and user
a new unique key:
db._drop("highscores");
db._create("highscores");
db.highscores.ensureIndex({ type: "hash", unique: true, fields: [ "user", "game" ] });
db.highscores.ensureIndex({ type: "skiplist", fields: [ "game", "score" ] });
We can use the unique hash index on user
and game
to ensure there is at most one entry for per user per game. It also allows use to find out quickly whether we already have an entry for that particular combination of game and user. Because we are not using _key
we could now also switch to numeric ids if we preferred that.
The other index on game
and score
is sorted. It can be used to quickly retrieve the leaderboard for a given game. As it is primarily sorted by game
, it can also be used to enumerate all entries for a given game.
The following Shell command populates the multi-game highscores collection with 55,000 highscores:
for (var game = 0; game < 10; ++game) {
for (var user = 0; user < (game + 1) * 1000; ++user) {
db.highscores.save({
game: game,
user: String(user),
score: (game + user) % 997 /* arbitrary score */
});
}
}
The game ids used above are between 0 and 9, though any other game ids would work, too. User ids are stringified numbers.
We can now find out the leaderboard for game 2 with the following adjusted AQL query. The query will use the (sorted) skiplist index:
FOR h IN highscores
FILTER h.game == 2
SORT h.score DESC
LIMIT 3
RETURN { user: h.user, score: h.score }
Removing all scores for a specific game is also efficient due to the the same index:
FOR h IN highscores
FILTER h.game == 5
REMOVE h IN highscores
On a side note: when storing all highscores in the same collection, we could also run cross-game queries if we wanted to. All that needs to be done for this is adjusting the FILTER
conditions in the queries.
Inserting or updating a user score can be achieved using an UPSERT
. Here’s a query to increase the score of user "1571"
in game 2
by a value of 5:
UPSERT { game: 2, user: "1571" }
INSERT { game: 2, user: "1571", score: 5 }
UPDATE { score: OLD.score + 5 } IN highscores
RETURN { user: NEW._key, score: NEW.score }
The same index on [ "user", "game" ]
is used in the following query that will delete the highscore of a given user in a specific game:
FOR h IN highscores
FILTER h.game == 6
FILTER h.user == '3894'
REMOVE h IN highscores
Joining data
Querying the leaderboard for a specific game was easy. However, so far we have only queried user ids and associated scores in games. In reality, we probably want to display some more user information in a leaderboard, for example their screen names.
In Redis, no extra information can be stored in sorted sets. So extra user information must be stored under separate keys. There is no concept of joins in Redis. The scores contained in the sorted set need to be queried by the client application, and extra user information have to be queried by the client application separately.
In ArangoDB, we could store the screen names in the highscores collection along with the highscores so we can easily query them with the leaderboard query. This is also how it would be done in MongoDB due to the absence of joins there.
While this would work, it will create lots of redundant data if the screen names are also used and stored elsewhere.
So let’s pick the option that stores highscores and screen names in separate places, and brings them together only when needed in a leaderboard query.
Let’s store screen names in a collection named users
. The following Shell commands will create the collection and set up 100K users with dummy screen names:
db._create("users");
for (var i = 0; i < 100000; ++i) {
db.users.insert({
_key: String(i),
name: "test user #" + i
});
}
We can now query the highscores plus the screen name in one go:
FOR h IN highscores
FILTER h.game == 2
SORT h.score DESC
LIMIT 3
FOR u IN users
FILTER h.user == u._key
RETURN { user: h.user, name: u.name, score: h.score }
Maintaining a “last updated” date
Finally, let’s try to keep track of when a highscore was last updated. There are a few use cases for this, for example displaying the date and time of when a highscore was achieved or for removing older highscores.
In Redis, the sorted set values are just the numeric scores, so we cannot store anything else (such as a date) inside the sorted sets. We would really need to store the update date for each highscore entry outside the sorted set, either under a separate key, or using a Redis hash. However, this is complex to manage and keep consistent so we won’t do it.
For implementing the automatic expiration, it would be good if we could use the built-in automatic key expiration of Redis. Each key can optionally be given a time-to-live or an expiration date, and it will automatically expire and vanish then without further ado. This may be exactly what we need to remove older highscore entries, but we cannot use it. The reason is that expiration only works for keys at the top level, but not for individual keys inside a sorted set. So we cannot really implement this sanely.
Let’s switch to ArangoDB now. Here we work with arbitrarily structured documents. That means we can store any other attributes along with a highscore. We can store the timestamp of when a highscore was last set or updated in an attribute named date
:
LET now = DATE_NOW()
UPSERT { game: 2, user: "1571" }
INSERT { game: 2, user: "1571", score: 10, date: now }
UPDATE { score: OLD.score + 10, date: now } IN highscores
RETURN { user: NEW._key, score: NEW.score }
The date
attribute can now be used for display purposes already.
We can also use the date
attribute for identifying the oldest entries in the highscore list in case we want the list to be periodically cleaned up.
Obviously we will be indexing date
for this, but we need to decide whether we want to use the same expiration periods for all games, or if we want to use game-specific expirations.
If the expiration date is the same for all games, then we can index just date
:
db.highscores.ensureIndex({ type: "skiplist", fields: [ "date" ] });
If we now want to remove entries older than roughly 2 days, regardless of the associated game, the removal query looks like this:
LET compare = DATE_NOW() - 2 * 86400 * 1000
FOR h IN highscores
FILTER h.date < compare
LIMIT 1000
REMOVE h IN highscores
If we instead want to find (and remove) the oldest entries for individual games, we need to create the index on game
and date
:
db.highscores.ensureIndex({ type: "skiplist", fields: [ "game", "date" ] });
This index allows to efficiently get rid of the oldest entries per game:
LET compare = DATE_NOW() - 2 * 86400 * 1000
FOR h IN highscores
FILTER h.game == 2
FILTER h.date < compare
LIMIT 1000
REMOVE h IN highscores
On a side note: the REMOVE
was limited to the oldest 1000 entries. This was done to make the query return fast. The removal query can be repeated while there are still entries to remove.
3 Comments
Leave a Comment
Get the latest tutorials, blog posts and news:
Thanks for this article.
One of the redis features I’ve used alot is http://redis.io/commands/zrank
Players tend to want to know overall rank (1st, 2nd… etc ) more often than score.
Does arrango have something similar? or would I be forced to count the records every time?
You wont need to count the records everytime … You can actually calculate the ranks yourself.. When you sort by descending the first document will always be the first rank and the other following documents you just have to increment the rank it…. And if you are using range queries then you will have a skip and limit using these you can easily calculate the ranks of all your players..
There is no direct equivalent to ZRANK or ZREVRANK in ArangoDB. This is because the documents in the highscores collection do not have any particular order. They will come out sorted by score only when explicitly querying them with a sort statement in a query. The sort instruction will sort them according to the given criteria or will use an existing index to iterate over the documents in sorted order if there is an index that covers the sort criteria. But even if an index is present, it won’t contain a documents position/rank inside the index or the collection, so there’s no place where to look up an up-to-date rank efficiently.