An introduction to index hints and named indices
In previous releases, index usage in queries was governed entirely by internal decisions in the AQL query optimizer. If a query could theoretically use one of two indices, the optimizer would make a decision based on estimated cost, and there was no way to externally influence that decision. In ArangoDB 3.5 Community and Enterprise, we have introduced index hints as a way to override the optimizer in cases where the optimizer may make a less than optimal decision.
In order to facilitate index hints, and to generally make index management a bit easier, we have added support for naming indices. Previously, indices were only identified numerically, or by their full definition. First, let’s explore how to work with index names.
Working with index names
Suppose we create a test collection.
localhost:8529@test> const test = db._create("test")
Every collection starts with a primary index. Since this is created without any user input, it is instantiated with the fixed name primary
.
localhost:8529@test> test.getIndexes()[0].name primary
Similarly, for edge collections, the edge index is instantiated with the fixed name edge
.
Now let’s see what happens with a secondary index.
localhost:8529@test> test.ensureIndex({type: 'hash', fields: ['value']}) { "deduplicate" : true, "fields" : [ "value" ], "id" : "test/246", "isNewlyCreated" : true, "name" : "idx_1635058787485220864", "selectivityEstimate" : 1, "sparse" : false, "type" : "hash", "unique" : false, "code" : 201 }
We find an auto-generated name idx_1635058787485220864
. This isn’t necessarily the easiest name to remember, and unfortunately, we currently do not support renaming indices. So let’s drop the index, and recreate it with a name of our choosing, by specifying the name
attribute.
localhost:8529@test> test.ensureIndex({type: 'hash', name: 'byValue', fields: ['value']}) { "deduplicate" : true, "fields" : [ "value" ], "id" : "test/332", "isNewlyCreated" : true, "name" : "byValue", "selectivityEstimate" : 1, "sparse" : false, "type" : "hash", "unique" : false, "code" : 201 }
That’s better.
The values used for index names must be unique with respect to the collection. So each collection may have an index with the name foo
, but an individual collection may not have two indices each named foo
.
Once we know a collection name, we may use it to look up the index definition.
localhost:8529@test> test.index("byValue") { "deduplicate" : true, "fields" : [ "value" ], "id" : "test/332", "name" : "byValue", "sparse" : false, "type" : "hash", "unique" : false, "code" : 200 }
localhost:8529@test> db._index("test/byValue") { "deduplicate" : true, "fields" : [ "value" ], "id" : "test/332", "name" : "byValue", "sparse" : false, "type" : "hash", "unique" : false, "code" : 200 }
We can also drop an index by name.
localhost:8529@test> test.dropIndex("byValue") true
localhost:8529@test> db._dropIndex("test/byValue") true
In this way, index names offer an alternative to working with numeric IDs, which we hope will be semantically useful and help make index management easier and more intuitive. Armed with this tool, let’s move on to index hints.
Providing index hints to the AQL query optimizer
Suppose that we have both a hash index and a skiplist index defined on the same field.
localhost:8529@test> test.ensureIndex({type: 'hash', name: 'byValueUnsorted', fields: ['value']}) { "deduplicate" : true, "fields" : [ "value" ], "id" : "test/3456", "isNewlyCreated" : true, "name" : "byValueUnsorted", "selectivityEstimate" : 1, "sparse" : false, "type" : "hash", "unique" : false, "code" : 201 } localhost:8529@test> test.ensureIndex({type: 'skiplist', name: 'byValueSorted', fields: ['value']}) { "deduplicate" : true, "fields" : [ "value" ], "id" : "test/3475", "isNewlyCreated" : true, "name" : "byValueSorted", "selectivityEstimate" : 1, "sparse" : false, "type" : "skiplist", "unique" : false, "code" : 201 }
If we want to run a simple equality lookup on the value field, the optimizer will generally prefer the hash index.
localhost:8529@test> db._explain('FOR doc IN test FILTER doc.value == "foo" RETURN doc') Query String: FOR doc IN test FILTER doc.value == "foo" RETURN doc Execution plan: Id NodeType Est. Comment 1 SingletonNode 1 * ROOT 6 IndexNode 1 - FOR doc IN test /* hash index scan */ 5 ReturnNode 1 - RETURN doc Indexes used: By Name Type Collection Unique Sparse Selectivity Fields Ranges 6 byValueUnsorted hash test false false 100.00 % [ `value` ] (doc.`value` == "foo") Optimization rules applied: Id RuleName 1 use-indexes 2 remove-filter-covered-by-index 3 remove-unnecessary-calculations-2
We can use the OPTIONS
syntax for the FOR
loop to provide an indexHint
to tell the optimizer we want to use the skiplist index instead
localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: "byValueSorted"} FILTER doc.value == "foo" RETURN doc') Query String: FOR doc IN test OPTIONS {indexHint: "byValueSorted"} FILTER doc.value == "foo" RETURN doc Execution plan: Id NodeType Est. Comment 1 SingletonNode 1 * ROOT 6 IndexNode 1 - FOR doc IN test /* skiplist index scan */ 5 ReturnNode 1 - RETURN doc Indexes used: By Name Type Collection Unique Sparse Selectivity Fields Ranges 6 byValueSorted skiplist test false false 100.00 % [ `value` ] (doc.`value` == "foo") Optimization rules applied: Id RuleName 1 use-indexes 2 remove-filter-covered-by-index 3 remove-unnecessary-calculations-2
If we provide a hint that cannot be used, such as a non-existent index, or simply a non-compatible one, the optimizer will fall back to default behavior.
localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: "primary"} FILTER doc.value == "foo" RETURN doc') Query String: FOR doc IN test OPTIONS {indexHint: "primary"} FILTER doc.value == "foo" RETURN doc Execution plan: Id NodeType Est. Comment 1 SingletonNode 1 * ROOT 6 IndexNode 1 - FOR doc IN test /* hash index scan */ 5 ReturnNode 1 - RETURN doc Indexes used: By Name Type Collection Unique Sparse Selectivity Fields Ranges 6 byValueUnsorted hash test false false 100.00 % [ `value` ] (doc.`value` == "foo") Optimization rules applied: Id RuleName 1 use-indexes 2 remove-filter-covered-by-index 3 remove-unnecessary-calculations-2
If we want to enforce that the index is used, or the query fails, for instance to guard against the case of a full collection scan in case an index creation fails, we can specify the forceIndexHint
flag.
localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: "foo", forceIndexHint: true} FILTER doc.value == "foo" RETURN doc') JavaScript exception in file '/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arangosh.js' at 99,7: ArangoError 1577: could not use index hint to serve query; {"indexHint":{"forced":true,"type":"simple","hint":["foo"]}} (while optimizing plan) ! throw error; ! ^ stacktrace: ArangoError: could not use index hint to serve query; {"indexHint":{"forced":true,"type":"simple","hint":["foo"]}} (while optimizing plan) at Object.exports.checkRequestResult (/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arangosh.js:97:21) at ArangoStatement.explain (/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arango-statement.js:123:12) at Object.explain (/home/dan/src/arangodb/arangodb/js/common/modules/@arangodb/aql/explainer.js:1791:21) at Proxy.ArangoDatabase._explain (/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arango-database.js:981:38) at :1:4
In case there are multiple indices which could be used to serve a query, we can specify the hint as an array of index names, such that the optimizer will attempt to use them in the order specified. If none can be used, it will fall back to default behavior, or fail if the hint is forced.
localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: ["foo", "byValueSorted"]} FILTER doc.value == "foo" RETURN doc') Query String: FOR doc IN test OPTIONS {indexHint: ["foo", "byValueSorted"]} FILTER doc.value == "foo" RETURN doc Execution plan: Id NodeType Est. Comment 1 SingletonNode 1 * ROOT 6 IndexNode 1 - FOR doc IN test /* skiplist index scan */ 5 ReturnNode 1 - RETURN doc Indexes used: By Name Type Collection Unique Sparse Selectivity Fields Ranges 6 byValueSorted skiplist test false false 100.00 % [ `value` ] (doc.`value` == "foo") Optimization rules applied: Id RuleName 1 use-indexes 2 remove-filter-covered-by-index 3 remove-unnecessary-calculations-2
Limitations
As mentioned previously, we currently do not support renaming indices. In order to change the name of an index, we must drop and recreate it. Furthermore, for existing installations which will be upgraded to 3.5, existing indices will be automatically named based on their numeric identifier.
Index hints currently may only be used with FOR
loops which iterate over collections. Traversals do not support index hints at this time.