Tutorial: ArangoDB with Python

For Python developers there are several drivers available, allowing you to operate on and administer ArangoDB servers and databases from within your applications.

Note: This tutorial was written for the ArangoDB 3 series and may not work with older versions.

This tutorial is based on the pyArango driver by Tariq Daouda. You need to install and start ArangoDB on your host. Then install pyArango from the Python Package Index.

$ pip install pyarango --user

Once pip finishes the installation process, you can begin developing ArangoDB applications in Python.

ArangoDB with Python Usage

To operate on ArangoDB servers and databases from within your application, you need to establish a connection to the server, then use it to open or create a database on that server.

PyArango manages server connections through the conveniently named Connection class.

>>> from pyArango.connection import Connection
>>> conn = Connection(username="root", password="root_passwd")

When this code executes, it initializes the server connection on the conn variable. By default, pyArango attempts to establish a connection to http://127.0.0.1:8529. That is, it wants to initialize a remote connection to your local host on port 8529. If you are hosting ArangoDB on a different server or use a different port, you need to set these options when you instantiate the Connection class.
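
As a minimal sketch of pointing the driver at a different server: pyArango's Connection class accepts the endpoint as an arangoURL keyword argument. The helper name arango_url and the host db.example.com below are hypothetical, chosen only for illustration.

```python
def arango_url(host="127.0.0.1", port=8529, scheme="http"):
    """Build the endpoint URL for an ArangoDB server."""
    return "%s://%s:%d" % (scheme, host, port)


# Hypothetical remote server; replace with your own host and port.
REMOTE_URL = arango_url("db.example.com", 8530)

# With a running server, the URL would be passed to the driver like this:
# from pyArango.connection import Connection
# conn = Connection(arangoURL=REMOTE_URL, username="root", password="root_passwd")
```

Called without arguments, the helper reproduces the local default, so the same code path can serve both development and remote setups.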

Creating and Opening Databases

With a connection to the ArangoDB Server, you can open or create a database on the server and begin to operate on it. The createDatabase() method on the server connection handles both operations, returning a Database instance.

>>> db = conn.createDatabase(name="school")

When the school database does not exist, pyArango creates it on the server connection. When it does exist, it attempts to open the database. You can also open an existing database by using its name as a key on the server connection. For instance,

>>> db = conn["school"]
>>> db
ArangoDB database: school
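
If your version of the driver raises an error when asked to create a database that already exists, rather than opening it, one defensive pattern is to check first. The sketch below assumes the connection offers a hasDatabase() method alongside the createDatabase() call and key lookup shown above. FakeConnection is a hypothetical in-memory stand-in, included only so the example runs without a server.

```python
def open_or_create(conn, name):
    """Return the database called `name`, creating it only if it is missing."""
    if conn.hasDatabase(name):
        return conn[name]
    return conn.createDatabase(name=name)


class FakeConnection(dict):
    """In-memory stand-in for a server connection, for illustration only."""

    def hasDatabase(self, name):
        return name in self

    def createDatabase(self, name):
        # Mimic a server that refuses to create a duplicate database.
        if name in self:
            raise RuntimeError("duplicate database: %s" % name)
        self[name] = "database:%s" % name
        return self[name]


fake = FakeConnection()
first = open_or_create(fake, "school")   # creates the database
second = open_or_create(fake, "school")  # opens the existing one
```

With a real connection you would call open_or_create(conn, "school") in place of the two separate calls shown earlier.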

Creating Collections

ArangoDB groups documents and edges into collections. This is similar to the concept of tables in relational databases, but with the key difference that collections are schema-less.

In pyArango, you can create a collection by calling the createCollection() method on a given database. For instance, in the above section you created a school database. You might want a collection on that database for students.

>>> studentsCollection = db.createCollection(name="Students")
>>> db["Students"]
ArangoDB Collection name: Students, id: 202, type: document, status loaded

Creating Documents

With the database and collection set up, you can begin adding data to them. Continuing the comparison to relational databases, where a collection is a table, a document is a row in that table. Unlike rows, however, documents are schema-less. You can include any arrangement of values you need for your application.

For instance, add a student to the collection:

>>> doc1 = studentsCollection.createDocument()
>>> doc1["name"] = "John Smith"
>>> doc1
ArangoDoc 'None': {'name': 'John Smith'}
>>> doc2 = studentsCollection.createDocument()
>>> doc2["firstname"] = "Emily"
>>> doc2["lastname"] = "Bronte"
>>> doc2
ArangoDoc 'None': {'firstname': 'Emily', 'lastname': 'Bronte'}

The document shows its _id as “None” because you haven’t yet saved it to ArangoDB. This means the variable exists in your Python code, but not in the database. ArangoDB constructs _id values by pairing the collection name with the _key value. It also handles the assignment for you; you just need to set the key and save the document.

>>> doc1._key = "johnsmith"
>>> doc1.save()
>>> doc1
ArangoDoc 'Students/johnsmith': {'name': 'John Smith'}

Rather than entering and saving the data for each student manually, you might want to add the students in a loop. For instance,

>>> students = [('Oscar', 'Wilde', 3.5), ('Thomas', 'Hobbes', 3.2),
... ('Mark', 'Twain', 3.0), ('Kate', 'Chopin', 3.8), ('Fyodor', 'Dostoevsky', 3.1),
... ('Jane', 'Austen', 3.4), ('Mary', 'Wollstonecraft', 3.7), ('Percy', 'Shelley', 3.5),
... ('William', 'Faulkner', 3.8), ('Charlotte', 'Bronte', 3.0)]
>>> for (first, last, gpa) in students:
...    doc = studentsCollection.createDocument()
...    doc['name'] = "%s %s" % (first, last)
...    doc['gpa'] = gpa 
...    doc['year'] = 2017
...    doc._key = ''.join([first, last]).lower() 
...    doc.save()
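
The loop above builds each _key by joining and lowercasing the names. ArangoDB restricts which characters a key may contain, so names with spaces or other punctuation may not produce a valid key. The helper below is a hypothetical sketch that is deliberately stricter than the server requires: it keeps only ASCII letters, digits, underscores and hyphens. Check the ArangoDB documentation for the exact rules.

```python
import re


def make_key(first, last):
    """Derive a lowercase document key from a first and last name."""
    raw = (first + last).lower()
    # Conservative assumption: keep only ASCII letters, digits, '_' and '-'.
    key = re.sub(r"[^a-z0-9_-]", "", raw)
    if not key:
        raise ValueError("cannot derive a key from %r %r" % (first, last))
    return key


plain = make_key("Kate", "Chopin")
cleaned = make_key("Mary Ann", "O'Neil")
```

Inside the loop you could then write doc._key = make_key(first, last).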

Reading Documents

Eventually, you’ll need to access documents in the database. The easiest way to do this is with the _key value.

For instance, the school database now has several students. Imagine it as part of a larger application with more data available on each student, in which you would like to check the GPA of a particular student:

>>> def report_gpa(document):
...    print("Student: %s" % document['name'])
...    print("GPA:     %s" % document['gpa'])
>>> kate = studentsCollection['katechopin']
>>> report_gpa(kate)
Student: Kate Chopin
GPA:     3.8

Updating Documents

When you read a document from ArangoDB into your application, you create a local copy of the document. You can then operate on the document, making whatever changes you like to it, then push the results to the database using the save() method.

For instance, each semester as the final grades come in from their classes, you need to update the students’ grade point averages on the database. Given that this happens frequently, you might want to create a specific function to handle the update:

>>> def update_gpa(key, new_gpa):
...    doc = studentsCollection[key]
...    doc['gpa'] = new_gpa
...    doc.save()

Listing Documents

Occasionally, you may want to operate on all documents in a given collection. Using the fetchAll() method, you can retrieve and iterate over a list of documents. For instance, say it’s the end of the semester and you want to know which students have a grade point average of 3.5 or higher:

>>> def top_scores(col, gpa):
...    print("Top Scoring Students:")
...    for student in col.fetchAll():
...       if student['gpa'] >= gpa:
...          print("- %s" % student['name'])
>>> top_scores(studentsCollection, 3.5)
Top Scoring Students:
- Mary Wollstonecraft
- Kate Chopin
- Percy Shelley
- William Faulkner
- Oscar Wilde
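
The johnsmith document saved earlier has no gpa field, so a loop over the whole collection can meet documents that lack one. The following is a rough sketch of a more defensive variant that returns the names instead of printing them, sorted best first. It assumes a missing field raises KeyError, as it does for a plain dict; the function name top_scorers and the sample data are illustrative only.

```python
def top_scorers(students, threshold):
    """Return names of students with gpa >= threshold, best first."""
    scored = []
    for student in students:
        try:
            scored.append((student["gpa"], student["name"]))
        except KeyError:
            continue  # skip documents that lack a gpa or name field
    # Highest GPA first; ties are ordered alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for gpa, name in scored if gpa >= threshold]


# Plain dicts stand in for documents so the sketch runs without a server.
sample = [
    {"name": "John Smith"},
    {"name": "Oscar Wilde", "gpa": 3.5},
    {"name": "Kate Chopin", "gpa": 3.8},
    {"name": "Mark Twain", "gpa": 3.0},
]
names = top_scorers(sample, 3.5)
```

Returning a list rather than printing makes the result easier to reuse, for example in a report or a test.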

Removing Documents

Eventually, you may want to remove documents from the database. This can be accomplished with the delete() method. For instance, say that the student Thomas Hobbes has decided to move to another city:

>>> tom = studentsCollection["thomashobbes"]
>>> tom.delete()
>>> studentsCollection["thomashobbes"]
KeyError: (
   'Unable to find document with _key: thomashobbes', {
      'code': 404,
      'errorNum': 1202,
      'errorMessage': 'document Students/thomashobbes not found',
      'error': True
})
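
When a missing student is an expected case rather than a bug, you may prefer a lookup that returns None instead of raising. This sketch assumes a missing key surfaces as KeyError, as in the output above; if your driver version raises a different exception class, catch that one instead. The helper name fetch_or_none is hypothetical.

```python
def fetch_or_none(collection, key):
    """Return the document stored under `key`, or None if it is missing."""
    try:
        return collection[key]
    except KeyError:
        return None


# A plain dict stands in for a collection so the sketch runs without a server.
students = {"katechopin": {"name": "Kate Chopin", "gpa": 3.8}}
found = fetch_or_none(students, "katechopin")
missing = fetch_or_none(students, "thomashobbes")
```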

AQL Usage

In addition to the Python methods shown above, ArangoDB also provides a query language, AQL (the ArangoDB Query Language), for retrieving and modifying documents in the database. In pyArango, you can issue these queries using the AQLQuery() method.

For instance, say you want to retrieve the keys for all documents in the Students collection:

>>> aql = "FOR x IN Students RETURN x._key"
>>> queryResult = db.AQLQuery(aql, rawResults=True, batchSize=100)
>>> for key in queryResult:
...    print(key)

In the above example, the AQLQuery() method takes the AQL query as an argument, with two additional options:

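  • rawResults When True, the driver returns the raw query results as plain Python values instead of wrapping each one in a document object. This query returns key strings rather than whole documents, so raw results are what you want.
  • batchSize Sets how many results are fetched per batch. When the query returns more results than the given value, the pyArango driver automatically asks for new batches.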

Bear in mind, the order of the documents isn’t guaranteed. If you need the results in a particular order, add a SORT clause to the AQL query.
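
As an illustration of ordering the results, the sketch below builds the same key query with a SORT clause placed before RETURN. The function name sorted_keys_query is hypothetical; the resulting string would be passed to AQLQuery() exactly as in the example above.

```python
def sorted_keys_query(descending=False):
    """Return an AQL string listing every student key in sorted order."""
    direction = "DESC" if descending else "ASC"
    return "FOR x IN Students SORT x._key %s RETURN x._key" % direction


aql = sorted_keys_query()

# With a live database:
# queryResult = db.AQLQuery(aql, rawResults=True, batchSize=100)
```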

Inserting Documents with AQL

Similar to document creation above, you can also insert documents into ArangoDB using AQL. This is done with an INSERT statement using the bindVars option for the AQLQuery() method.

>>> doc = {'_key': 'denisdiderot', 'name': 'Denis Diderot', 'gpa': 3.7}
>>> bind = {"doc": doc}
>>> aql = "INSERT @doc INTO Students LET newDoc = NEW RETURN newDoc"
>>> queryResult = db.AQLQuery(aql, bindVars=bind)

The RETURN newDoc clause makes the newly inserted document the return value of the query, so you can see the result by calling:

>>> queryResult[0]
ArangoDoc 'Students/denisdiderot': {'name': 'Denis Diderot', 'gpa': 3.7}
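
The same bindVars mechanism can carry a whole list, which lets a single query insert several documents. The sketch below only prepares the query and its bind variables. The function name build_bulk_insert, the sample students and the year are hypothetical, and the AQL assumes that looping over a bound array with FOR ... INSERT behaves as described in the AQL documentation.

```python
def build_bulk_insert(students, year):
    """Return (aql, bind_vars) that insert one document per student tuple."""
    docs = [
        {
            "_key": (first + last).lower(),
            "name": "%s %s" % (first, last),
            "gpa": gpa,
            "year": year,
        }
        for first, last, gpa in students
    ]
    aql = "FOR d IN @docs INSERT d INTO Students RETURN NEW._key"
    return aql, {"docs": docs}


aql, bind = build_bulk_insert(
    [("Emily", "Dickinson", 3.9), ("Walt", "Whitman", 3.3)], 2018
)

# With a live database:
# queryResult = db.AQLQuery(aql, bindVars=bind, rawResults=True)
```

Sending one query for the whole batch avoids a round trip to the server for each document.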

Updating Documents with AQL

In cases where a document already exists in your database and you would like to modify data in that document, you can use the UPDATE statement. For instance, say that you receive the students’ updated grade point average in a CSV file.

First, check the GPA of one of the students to see the old value:

>>> db["Students"]["katechopin"]
ArangoDoc 'Students/katechopin': {'name': 'Kate Chopin', 'gpa': 3.8, 'year': 2017}

Then loop through the file, updating the GPA of each student:

>>> import csv
>>> # Assumes each line of grades.csv holds a document key and a GPA, e.g. katechopin,4.0
>>> with open("grades.csv", "r", newline="") as f:
...    grades = dict(csv.reader(f))
>>> for key, gpa in grades.items():
...    doc = {"gpa": float(gpa)}
...    bind = {"doc": doc, "key": key}
...    aql = "UPDATE @key WITH @doc IN Students LET updated = NEW RETURN updated"
...    db.AQLQuery(aql, bindVars=bind)

Lastly, check the student’s GPA again.

>>> db["Students"]["katechopin"]
ArangoDoc 'Students/katechopin': {'name': 'Kate Chopin', 'gpa': 4.0, 'year': 2017}

Though it’s possible for a student to have the same GPA between semesters, in this case Kate’s GPA went up to 4.0.
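
Before sending values from a file to the database, it can help to parse and validate them separately. The sketch below assumes each line of the file holds a document key and a GPA separated by a comma, and that GPAs fall on a 0.0 to 4.0 scale. Both are assumptions about the data, and the function name parse_grades is hypothetical.

```python
import csv
import io


def parse_grades(text, max_gpa=4.0):
    """Parse 'key,gpa' lines into a {key: gpa} dict, validating each GPA."""
    grades = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        key, gpa = row[0].strip(), float(row[1])
        if not 0.0 <= gpa <= max_gpa:
            raise ValueError("GPA out of range for %s: %s" % (key, gpa))
        grades[key] = gpa
    return grades


sample = "katechopin,4.0\noscarwilde, 3.6\n\n"
grades = parse_grades(sample)
```

The resulting dictionary can be fed to the update loop in place of the raw file contents, so a malformed line fails before any document is modified.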

Removing Documents with AQL

Lastly, you can also remove documents from ArangoDB using REMOVE statements. For instance, imagine that this year’s students have graduated and you want to remove them from the database. You can use the year property to differentiate between classes of students, which allows you to use a FILTER clause to remove some and keep others.

>>> bind = {"@collection": "Students"}
>>> aql = """
... FOR x IN @@collection
...   FILTER x.year == 2017
...   REMOVE x IN @@collection
...     LET removed = OLD RETURN removed
... """
>>> queryResult = db.AQLQuery(aql, bindVars=bind)

The FILTER clause restricts the loop to documents that match the condition. The REMOVE x IN @@collection statement then deletes each matching document. The @@collection parameter (note the @@) is the bind variable for the name of the collection; in the bindVars dictionary, its key carries a single leading @.
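
To make the two kinds of bind parameter concrete, here is a small sketch that also binds the year as an ordinary value. A collection parameter is written @@collection in the query and keyed "@collection" in the dictionary, while a value parameter is written @year and keyed "year". The function name remove_by_year is hypothetical.

```python
def remove_by_year(collection, year):
    """Return (aql, bind_vars) that remove every document of a given year."""
    aql = """
    FOR x IN @@collection
      FILTER x.year == @year
      REMOVE x IN @@collection
      RETURN OLD
    """
    bind = {
        "@collection": collection,  # collection bind parameter (@@collection)
        "year": year,               # value bind parameter (@year)
    }
    return aql, bind


aql, bind = remove_by_year("Students", 2017)

# With a live database:
# queryResult = db.AQLQuery(aql, bindVars=bind)
```

Binding the year instead of writing it into the query text lets the same query be reused for each graduating class.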

The return value given by AQLQuery() preserves the old documents. So, if you query it directly, it’ll show you the data.

>>> queryResult[0]
ArangoDoc 'Students/williamfaulkner': {'name': 'William Faulkner', 'gpa': 3.8, 'year': 2017}

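If instead you attempt to retrieve one of the removed documents from the database, the lookup fails with an error.

>>> db["Students"]["katechopin"]
KeyError: (
   'Unable to find document with _key: katechopin', {
      'code': 404,
      'errorNum': 1202,
      'errorMessage': 'document Students/katechopin not found',
      'error': True
})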

Now you know how to work with ArangoDB.