Using Cytoscape with ArangoDB
In this tutorial, we visualize the data of a graph stored in ArangoDB to get a human-readable overview.
Such an overview often helps to get a general understanding of data that was not artificially created, or of a third-party dataset that we did not design ourselves.
This tutorial uses a third-party dataset designed by Marius Bäsler in his master's thesis.
His goal is to find the origins of parasitism with the help of GloBI's interaction database.
This data dump can be downloaded here.
The dataset describes several organisms that live in either a symbiotic or a parasitic relationship with one another.
To import the dataset, we can simply restore it into a running ArangoDB with:
arangorestore --input-directory /path/to/extracted/dump
After this command succeeds, you will end up with two collections:
nodes_otl_sub: a document collection containing species, genera and families.
edges_otl_sub: an edge collection, where each edge defines a relation between the nodes.
Now we have the dataset in ArangoDB and are ready to go.
The goal is to export the data in xgmml format, which is readable by Cytoscape, the tool we want to use to visualize the data.
Unfortunately, this format requires that all attributes of a vertex are of type string.
So we need to normalize our dataset first and convert all attributes of the vertices to strings.
Furthermore, each document needs to have the same set of attributes, which this step also takes care of.
NOTE: this step requires some computation and does not scale well for larger datasets. If you are in this situation and need some guidance, please contact us on Slack; we can help you out there.
In order to do this normalization we are going to execute the following AQL:
LET attrs = (
  FOR node IN nodes_otl_sub
    FOR x IN ATTRIBUTES(node, true)
      RETURN DISTINCT x
)
FOR node IN nodes_otl_sub
  LET newNode = ZIP(attrs, (
    FOR attr IN attrs
      RETURN TO_STRING(node[attr])
  ))
  UPDATE node WITH newNode IN nodes_otl_sub
In the first step, this AQL query collects a distinct list of the attributes available in the dataset.
In the second step, it iterates over all nodes.
For each one, it creates a new node in which every attribute is replaced with the TO_STRING variant of its value.
Note: if an attribute is not set on a node, the empty string is stored for it.
Finally, it updates the document in the collection with the new node.
After this query succeeds, all vertices have all attributes, and all of them are of type string.
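To make the two steps of the query more tangible, here is a minimal Python sketch of the same logic applied to in-memory dictionaries. It is an approximation for illustration only and does not talk to ArangoDB. The sample documents, the to_string helper and the rule that attributes starting with an underscore count as system attributes are assumptions made for this sketch.

```python
import json


def to_string(value):
    """Roughly mimic AQL's TO_STRING for common JSON values (an approximation)."""
    if value is None:
        return ""  # unset or null attributes become the empty string
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return value
    if isinstance(value, (list, dict)):
        return json.dumps(value, separators=(",", ":"))
    return str(value)  # numbers


def normalize(nodes):
    # Step 1: collect the distinct attribute names, skipping system attributes
    # (assumed here to be the ones starting with an underscore).
    attrs = sorted({key for node in nodes for key in node if not key.startswith("_")})
    # Step 2: give every node every attribute, converted to a string.
    normalized = []
    for node in nodes:
        new_node = {attr: to_string(node.get(attr)) for attr in attrs}
        # Like UPDATE, merge the new values into the existing document.
        normalized.append({**node, **new_node})
    return normalized


# Hypothetical sample documents, not taken from the real dataset.
nodes = [
    {"_key": "1", "name": "Felis", "rank": "genus", "count": 3},
    {"_key": "2", "name": "Felidae", "extinct": False},
]
result = normalize(nodes)
for node in result:
    print(node)
```

After normalization both sample documents carry the same four attributes (count, extinct, name, rank), and every value is a string.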
Now we are ready to go for the export.
Exporting the data
To visualize the data we need it in xgmml format.
In order to transform the dataset into this format, we are using the arangoexport tool:
$> arangoexport --help
Usage: arangoexport

Section 'global options' (Global configuration)
  --collection              restrict to collection name (can be specified multiple times) (default: )
  --configuration           the configuration file or 'none' (default: "")
  --fields                  comma separated list of fileds to export into a csv file (default: "")
  --graph-name              name of a graph to export (default: "")
  --output-directory        output directory (default: "/home/mchacki/devel/export")
  --overwrite               overwrite data in output directory (default: false)
  --progress                show progress (default: true)
  --type                    type of export. possible values: "csv", "json", "jsonl", "xgmml", "xml" (default: "json")
  --version                 reports the version and exits (default: false)
  --xgmml-label-attribute   specify document attribute that will be the xgmml label (default: "label")
  --xgmml-label-only        export only xgmml label (default: false)
This tool natively supports the xgmml format, so it is rather straightforward to use.
For this export, we need to name the collections we want to export, so in our case nodes_otl_sub and edges_otl_sub.
We also need to specify xgmml as the type.
For easier visualization we give the graph a name, here otl.
xgmml allows defining one attribute as the label.
We select the name attribute for this tutorial.
So in total, our call will look like this:
$> arangoexport --collection nodes_otl_sub --collection edges_otl_sub --type xgmml --graph-name otl --xgmml-label-attribute name
It produces the following output:
Connected to ArangoDB 'http+tcp://127.0.0.1:8529', version 3.2.0, database: '_system', username: 'root'
# Export graph with collections nodes_otl_sub, edges_otl_sub as 'otl'
# Exporting collection 'nodes_otl_sub'...
# Exporting collection 'edges_otl_sub'...
Processed 2 collection(s), wrote 128432121 byte(s), 176 HTTP request(s)
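If the export should be scripted, the same call can be assembled programmatically. The following Python sketch only builds the argument list from the options used in this tutorial. The helper name build_export_command is hypothetical, and the sketch assumes that arangoexport is on the PATH if you choose to run the result.

```python
import shlex


def build_export_command(collections, graph_name, label_attribute):
    """Assemble the arangoexport argument list for an xgmml export."""
    command = ["arangoexport"]
    for collection in collections:
        # --collection can be specified multiple times, once per collection.
        command += ["--collection", collection]
    command += [
        "--type", "xgmml",
        "--graph-name", graph_name,
        "--xgmml-label-attribute", label_attribute,
    ]
    return command


command = build_export_command(["nodes_otl_sub", "edges_otl_sub"], "otl", "name")
print(shlex.join(command))
```

To actually run the export from Python, the list could be passed to subprocess.run(command, check=True). Keeping the arguments as a list avoids shell quoting problems with unusual collection names.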
After this export succeeds, you will have a directory named export containing the exported xgmml file.
This file finally is the xgmml representation of our dataset.
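Before loading a large export into Cytoscape, it can be useful to check that the file is well-formed and contains the expected number of nodes and edges. The Python sketch below counts both using the standard library. To stay self-contained it parses a small hand-written XGMML sample, which is an assumption for illustration and not actual arangoexport output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in XGMML style; the real export will be much larger.
SAMPLE_XGMML = """<graph label="otl" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="n1" label="Felis"><att name="rank" type="string" value="genus"/></node>
  <node id="n2" label="Felidae"><att name="rank" type="string" value="family"/></node>
  <edge source="n1" target="n2" label="child_of"/>
</graph>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node' down to 'node'."""
    return tag.rsplit("}", 1)[-1]


def summarize(root):
    """Count node and edge elements below (and including) the given root."""
    counts = {"node": 0, "edge": 0}
    for element in root.iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts


root = ET.fromstring(SAMPLE_XGMML)
summary = summarize(root)
print(summary)
```

For a real export, the root could be obtained with ET.parse(path).getroot(), where path points to your exported file. For files as large as the one produced above, a streaming approach such as ET.iterparse is likely to use less memory.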
In order to visualize and analyze the dataset, please download Cytoscape. For details on this product, please refer to its website. For this tutorial we are just going to use it as a visualization tool.
Cytoscape: import xgmml file
Cytoscape: apply organic layout
Cytoscape: graph overview
Cytoscape: part of the graph zoomed in