background

martin on skip list indices and why we use them in ArangoDB

Note: We changed the name of the database in May 2012. AvocadoDB is now called ArangoDB.

Last week AvocadoDB got mentioned in “nosql weekly” and the project achieved a huge amount of public interest especially from Japan. Awesome! ūüôā

In this context Mr. Fiber asked on twitter what the use of skip list indices in AvocadoDB is. Here’s a short video reply by chief architect martin Schoenert. Got an opinion on this? – we’d love to hear your thoughts on this in the comments.

skip list index from NoSQL matters on Vimeo or on Youtube

Dr. Frank Celler

Frank Celler

Frank is both entrepreneur and backend developer, developing mostly memory databases for two decades. He is the CTO and co-founder of ArangoDB. Try to challenge Frank asking him questions on C, C++ and MRuby. Besides Frank organizes Cologne’s NoSQL group & is an active member of NoSQL community.

6 Comments

  1. NoSQLFreak on April 11, 2012 at 3:29 pm

    Would the extra effort in keeping a doubly-linked list give faster search results?
    Maybe the extra ram required is too much a price for the marginal speed gain?

  2. NoSQLFreak on April 11, 2012 at 11:05 pm

    Scratch what I asked, just put some thought to it and realised you wont achieve anything by having a doubly-linked list.
    I was thinking more on keeping the previous pointer as you traverse the list so as to narrow the search but even that wont help a there is no relationship into which node to start from when switching to the finer resolution list.

  3. martin Schönert on April 12, 2012 at 8:43 pm

    I wrote the following comment yesterday but only got around to post it here now.¬† I am aware that you came to the realization that doubly-linked lists will not improve the speed yourself.¬† Still, writing this comment was enough effort so that I do not want to throw it away ūüėČ

    Skiplists are not going to give faster search times when you use doubly linked lists.

    I think the easiest way to see this is to realise that skiplists are basically multiple linked lists and they behave a lot like ordinary ordered linked lists.  And just like ordinary linked lists do not gain in search time from the backward link, neither do skiplists.

    If you are willing to use more memory to decrease the search time (the average search time) with

    skiplists, you can simply use more pointers on the higher layers.  So instead of adding another link

    on the next layer with 50% probability (so you have about 1/2 as many links on each layer than on

    the layer below) you can add another link on the next layer with e.g. 60% probability.  There is a limit beyond which speed will actually become worse again (because you start to spend more time walking the layers downward than you save from not having to scan the nodes on each layer rightward).  Actually on a perfectly balanced skiplist (where the nodes on every layer are equidistant), probability 50% is optimal.  But on a skiplist that is not perfectly balanced a slightly higher probability is a little bit better (my handwaving argument is that you loose more on

    skips that are longer than average than you win on the skips that are shorter than average).

    The backward links can still be useful on a plain linked list.  Namely if you may receiver a pointer to a node from some other part of your program and need to remove that node from the linked list. Than a backward link will allow you to remove that node in constant time from the linked list. Without the backward link you will need to search for the node, which will take O(N) time.  On a skiplist the same holds, i.e. backward links will allow you to remove random access nodes in constant time.  But since searching for the node only takes O(log(N)) time, the gain is smaller.

    Hope this helps to answer the question. If not please feel free to ask further questions.

  4. Anders on April 29, 2012 at 10:45 am

    Why not use a ‘hits’ counter and ‘watermark’ levels to determine the skiplist nodes?
    That way frequent occuring searches would be much faster.

  5. Xdevelopers on June 9, 2012 at 5:24 am

    FYI https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/
    Fast Intersection of Sorted Lists Using SSE Instructions

  6. martin Schönert on June 12, 2012 at 1:09 pm

    @cbe805f7b5beef4c5fa24a04bb0788ec:disqus : thank you for the link to the very interesting article
    ¬†“Fast Intersection of Sorted Lists Using SSE Instructions”.

    Unfortunately the title is a slight misnomer, it should be
    “Fast Intersection of Sorted Arrays Using SSE Instructions”.
    That is, the algorithm described really only works on arrays
    (sometimes also called vectors).

    It does not work on linked lists (basically that is because the
    SSE instructions work on arrays/vectors and not on linked
    lists).  So it does not really work on the skip-lists (which are
    basically multiple linked lists with different strides).

Leave a Comment





Get the latest tutorials, blog posts and news: