home shape

Performance Analysis: PyArango Inspecting Transactions

Following the previous blog post on performance analysis with pyArango, where we had a look at graphing using statsd for simple queries, we will now dig deeper into inspecting transactions. At first, we split the initialization code and the test code.

Initialisation code

We load the collection with simple documents. We create an index on one of the two attributes:

#!/usr/bin/python
from pyArango.connection import *
from pyArango.collection import *

conn = Connection(username="root", password="")
        
db = conn["_system"]

if not db.hasCollection('test'):
    testCol = db.createCollection('Collection', name='test')
else:
    testCol = db.collections['test']
    testCol.truncate()

i = 0;
while i < 100000:
    i+=1
    testCol.createDocument({'foo': 'bar', 'count': i, 'counter': i}).save()

testCol.ensureHashIndex(['count'], sparse=False)

Application code

Transaction code

We’re creating a simple ArangoDB Transaction containing 3 queries:

    function(params) {
      var db = require('@arangodb').db;
      var startOne = Date.now();
      var q1 = db._query(`FOR doc IN test FILTER doc.count == @i - 1 RETURN doc`, {i:params.i})
      var startTwo = Date.now()
      var q2 = db._query(`FOR doc IN test FILTER doc.counter == @i - 1 RETURN doc`, {i:params.i})
      var startThree = Date.now()
      var q3 = db._query(`RETURN 1`)
      var end = Date.now();
      return {
        tq1: startTwo - startOne,
        tq2: startThree - startTwo,
        tq3: end - startThree,
        all: end - startOne
      }

 

We put in the counter as a parameter the same way we used it in the last article (Performance analysis with pyArango: Part I). We measure the current time between the 3 queries. As a return value, we calculate the deltas to the invoking program.

Joining the transaction code into the python script

Here we have the complete application code including a transaction that we want to monitor:

#!/usr/bin/python
import sys
import statsd
from pyArango.connection import *
from pyArango.collection import *

statsdc = statsd.StatsClient('127.0.0.1', '8125')
conn = Connection(username="root", password="", statsdClient = statsdc)
db = conn["_system"]
testCol = db.collections['test']

transaction = '''
    function(params) {
      var db = require('@arangodb').db;
      var startOne = Date.now();
      var q1 = db._query(`FOR doc IN test FILTER doc.count == @i - 1 RETURN doc`, {i:params.i})
      var startTwo = Date.now()
      var q2 = db._query(`FOR doc IN test FILTER doc.counter == @i - 1 RETURN doc`, {i:params.i})
      var startThree = Date.now()
      var q3 = db._query(`RETURN 1`)
      var end = Date.now();
      return {
        tq1: startTwo - startOne,
        tq2: startThree - startTwo,
        tq3: end - startThree,
        all: end - startOne
      }
    }
'''

i = 0;
while i < 100000:
    aql = '''
    FOR doc IN test FILTER doc.count == @i - 1 RETURN doc
    '''
    db.AQLQuery(aql, rawResults=True, batchSize=1, count=True, bindVars= {'i' : i})
    times = db.transaction(['test'], transaction, params={'i': i})['result']
    for which in times:
        statsdc.timing(which, times[which])

The python script picks the `result` part containing the return values of the transaction. It then iterates over the values, and sends it to statsd so we can, later on, have a look at it using kcollectd:

We choose 3 graphs, in one we put the Q1 and Q3 times. If you have a look at Q1, you will remember it from our last article – it’s covered by an index. Q3 is really simple, so we wouldn’t expect much to be going on here. The graph seconds that – Q1 is taking up to 2ms, Q3 up to 1.8ms.

But, with Q2 we added a query which is neither dead simple nor backed by an index. You can see that it takes up to 60 ms and dominates the total time of the transaction.

Again we got useful data by instrumenting parts of our application with gauges we feed into statsd.

Learn how to speed up your AQL queries. Get A Primer on Query Performance Optimization in ArangoDB

Whats next

In the next Part III article of these series, we’re going to implement a user scenario to measure possible capacity of a given system. Using that, we are going to see what to look at on clusters. Your questions, comments and feedback is more than welcome on our community slack or twitter @dothebart. Till the next time!

Willi Goesgens avatar 1452092470 92x92

Willi Goesgens

Willi is part of ArangoDB Core team. He primarily works on performance tuning, the release process of ArangoDB and helping the community with his broad experience.

Leave a Comment





Get the latest tutorials, blog posts and news: