Performance Analysis with pyArango: Part II – Transaction Inspection
Following the previous blog post on performance analysis with pyArango, where we had a look at graphing simple queries with statsd, we will now dig deeper and inspect transactions. First, we split the code into an initialisation part and the actual test code.
Initialisation code
We load the collection with simple documents that carry two numeric attributes, `count` and `counter`, and create an index on only one of them:
#!/usr/bin/python
from pyArango.connection import *
from pyArango.collection import *

conn = Connection(username="root", password="")
db = conn["_system"]

if not db.hasCollection('test'):
    testCol = db.createCollection('Collection', name='test')
else:
    testCol = db.collections['test']
    testCol.truncate()

# Insert 100,000 simple documents; `count` and `counter` carry the same value.
i = 0
while i < 100000:
    i += 1
    testCol.createDocument({'foo': 'bar', 'count': i, 'counter': i}).save()

# Only `count` gets an index; `counter` stays unindexed on purpose.
testCol.ensureHashIndex(['count'], sparse=False)
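Before moving on, a quick sanity check confirms the setup worked (a minimal sketch reusing the `db` connection from the script above):

# Sanity-check sketch: the collection should now hold 100,000 documents.
result = db.AQLQuery('RETURN LENGTH(test)', rawResults=True)
print(result[0])  # expected: 100000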
Application code
Transaction code
We’re creating a simple ArangoDB transaction containing three queries:
function(params) {
  var db = require('@arangodb').db;
  var startOne = Date.now();
  var q1 = db._query(`FOR doc IN test FILTER doc.count == @i - 1 RETURN doc`, {i: params.i});
  var startTwo = Date.now();
  var q2 = db._query(`FOR doc IN test FILTER doc.counter == @i - 1 RETURN doc`, {i: params.i});
  var startThree = Date.now();
  var q3 = db._query(`RETURN 1`);
  var end = Date.now();
  return {
    tq1: startTwo - startOne,
    tq2: startThree - startTwo,
    tq3: end - startThree,
    all: end - startOne
  };
}
We pass the counter in as a parameter, the same way we used it in the last article (Performance analysis with pyArango: Part I). We take the current time before and after each of the three queries, and return the deltas to the invoking program.
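Before wiring it into the instrumented loop below, you can run the transaction once from Python and inspect the raw deltas (a sketch; `transaction` is assumed to hold the JavaScript function above as a string, and `db` is the connection from the initialisation script):

# One-off test sketch: execute the transaction and look at its return value.
times = db.transaction(['test'], transaction, params={'i': 1})['result']
print(times)  # a dict with the keys tq1, tq2, tq3 and all, in milliseconds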
Embedding the transaction code in the Python script
Here is the complete application code, including the transaction we want to monitor:
#!/usr/bin/python
import sys
import statsd
from pyArango.connection import *
from pyArango.collection import *

# Create the statsd client and hand it to pyArango, as in Part I.
statsdc = statsd.StatsClient('127.0.0.1', 8125)
conn = Connection(username="root", password="", statsdClient=statsdc)
db = conn["_system"]
testCol = db.collections['test']
transaction = '''
function(params) {
  var db = require('@arangodb').db;
  var startOne = Date.now();
  var q1 = db._query(`FOR doc IN test FILTER doc.count == @i - 1 RETURN doc`, {i: params.i});
  var startTwo = Date.now();
  var q2 = db._query(`FOR doc IN test FILTER doc.counter == @i - 1 RETURN doc`, {i: params.i});
  var startThree = Date.now();
  var q3 = db._query(`RETURN 1`);
  var end = Date.now();
  return {
    tq1: startTwo - startOne,
    tq2: startThree - startTwo,
    tq3: end - startThree,
    all: end - startOne
  };
}
'''
i = 0
while i < 100000:
    i += 1
    aql = '''
    FOR doc IN test FILTER doc.count == @i - 1 RETURN doc
    '''
    db.AQLQuery(aql, rawResults=True, batchSize=1, count=True, bindVars={'i': i})
    # Run the transaction and send each returned delta to statsd as a timer.
    times = db.transaction(['test'], transaction, params={'i': i})['result']
    for which in times:
        statsdc.timing(which, times[which])
The Python script picks the `result` part of the reply, which contains the return values of the transaction. It then iterates over those values and sends each one to statsd as a timing, so we can later have a look at them using kcollectd.
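If you are not sure the metrics ever leave the application, a few lines of Python can stand in for statsd and dump whatever arrives on the wire (a debugging sketch, assuming the 127.0.0.1:8125 endpoint from above; run it instead of statsd, not alongside it):

#!/usr/bin/python
import socket

# Debugging sketch: bind the statsd UDP port and print incoming metric lines.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('127.0.0.1', 8125))
while True:
    data, _ = sock.recvfrom(4096)
    print(data.decode())  # statsd timer lines look like 'tq2:57|ms'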
We chose three graphs; in one we put the Q1 and Q3 times. If you have a look at Q1, you will remember it from our last article: it's covered by an index. Q3 is really simple, so we wouldn't expect much to be going on here. The graph confirms this: Q1 takes up to 2 ms, Q3 up to 1.8 ms.
But with Q2 we added a query that is neither dead simple nor backed by an index. You can see that it takes up to 60 ms and dominates the total time of the transaction.
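If Q2's cost mattered in a real application, the remedy is the same one Q1 already enjoys: back `counter` with an index as well (a sketch using the same pyArango call as in the initialisation code; whether the extra index is worth its write overhead depends on your workload):

# Hypothetical fix sketch: index `counter` too, turning Q2 from a full
# collection scan into an index lookup.
testCol.ensureHashIndex(['counter'], sparse=False)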
Again, we got useful data by instrumenting parts of our application with timers we feed into statsd.
What's next
In Part III of this series, we're going to implement a user scenario to measure the possible capacity of a given system, and use that to see what to look at on clusters. Your questions, comments and feedback are more than welcome on our community Slack or on Twitter @dothebart. Till next time!