Skip List Indices in AvocadoDB | ArangoDB Blog 2012
Note: We changed the name of the database in May 2012. AvocadoDB is now called ArangoDB.
Last week AvocadoDB got mentioned in “nosql weekly” and the project achieved a huge amount of public interest especially from Japan. Awesome! 🙂
In this context Mr. Fiber asked on twitter what the use of skip list indices in AvocadoDB is. Here’s a short video reply by chief architect martin Schoenert. Got an opinion on this? – we’d love to hear your thoughts on this in the comments.
skip list index from NoSQL matters on Vimeo or on Youtube
6 Comments
Leave a Comment
Get the latest tutorials, blog posts and news:
Would the extra effort in keeping a doubly-linked list give faster search results?
Maybe the extra ram required is too much a price for the marginal speed gain?
Scratch what I asked, just put some thought to it and realised you wont achieve anything by having a doubly-linked list.
I was thinking more on keeping the previous pointer as you traverse the list so as to narrow the search but even that wont help a there is no relationship into which node to start from when switching to the finer resolution list.
I wrote the following comment yesterday but only got around to post it here now. I am aware that you came to the realization that doubly-linked lists will not improve the speed yourself. Still, writing this comment was enough effort so that I do not want to throw it away 😉
Skiplists are not going to give faster search times when you use doubly linked lists.
I think the easiest way to see this is to realise that skiplists are basically multiple linked lists and they behave a lot like ordinary ordered linked lists. And just like ordinary linked lists do not gain in search time from the backward link, neither do skiplists.
If you are willing to use more memory to decrease the search time (the average search time) with
skiplists, you can simply use more pointers on the higher layers. So instead of adding another link
on the next layer with 50% probability (so you have about 1/2 as many links on each layer than on
the layer below) you can add another link on the next layer with e.g. 60% probability. There is a limit beyond which speed will actually become worse again (because you start to spend more time walking the layers downward than you save from not having to scan the nodes on each layer rightward). Actually on a perfectly balanced skiplist (where the nodes on every layer are equidistant), probability 50% is optimal. But on a skiplist that is not perfectly balanced a slightly higher probability is a little bit better (my handwaving argument is that you loose more on
skips that are longer than average than you win on the skips that are shorter than average).
The backward links can still be useful on a plain linked list. Namely if you may receiver a pointer to a node from some other part of your program and need to remove that node from the linked list. Than a backward link will allow you to remove that node in constant time from the linked list. Without the backward link you will need to search for the node, which will take O(N) time. On a skiplist the same holds, i.e. backward links will allow you to remove random access nodes in constant time. But since searching for the node only takes O(log(N)) time, the gain is smaller.
Hope this helps to answer the question. If not please feel free to ask further questions.
Why not use a ‘hits’ counter and ‘watermark’ levels to determine the skiplist nodes?
That way frequent occuring searches would be much faster.
FYI https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/
Fast Intersection of Sorted Lists Using SSE Instructions
@cbe805f7b5beef4c5fa24a04bb0788ec:disqus : thank you for the link to the very interesting article
“Fast Intersection of Sorted Lists Using SSE Instructions”.
Unfortunately the title is a slight misnomer, it should be
“Fast Intersection of Sorted Arrays Using SSE Instructions”.
That is, the algorithm described really only works on arrays
(sometimes also called vectors).
It does not work on linked lists (basically that is because the
SSE instructions work on arrays/vectors and not on linked
lists). So it does not really work on the skip-lists (which are
basically multiple linked lists with different strides).