Performance analysis with pyArango: Part III Measuring possible capacity with usage Scenarios
So you measured and tuned your system like described in the Part I and Part II of these blog post series. Now you want to get some figures how many end users your system will be able to serve. Therefore you define “scenarios” which will be typical for what your users do. Read more
ArangoDB | Milestone2: ArangoDB 3.3 New Data Replication
We’re pleased to announce the availability of the Milestone 2 of ArangoDB 3.3. There are a number of improvements, please consult the changelog for a complete overview of changes.
This milestone release contains our new and improved data replication engine. The replication engine is at the core of every distributed ArangoDB setup: whether it is a typical master/slave setup between multiple single servers or a full-fledged cluster. During the last month we:
- redesigned the replication protocol to be more reliable
- refactored and modernized the internal infrastructure to better support continuous asynchronous replication
- added a new global asynchronous replication API, to allow you to automatically and continuously mirror an entire ArangoDB single-instance (master) onto another one (or more)
- added support for automatic failover from a master server to one of his replica-slaves, if the master server becomes unreachable
Milestone 1 ArangoDB 3.3: Datacenter to Datacenter Replication
Every company needs a disaster recovery plan for all important systems. This is true from small units like single processes running in some container to the largest distributed architectures. For databases in particular this usually involves a mixture of fault-tolerance, redundancy, regular backups and emergency plans. The larger a data store, the more difficult is it to come up with a good strategy.
Therefore, it is desirable to be able to run a distributed database in one datacenter and replicate all transactions to another datacenter in some way. Often, transaction logs are shipped over the network to replicate everything in another, identical system in the other datacenter. Some distributed data stores have built-in support for multiple datacenter awareness and can replicate between datacenters in a fully automatic fashion.
This post gives an overview over the first evolutionary step of ArangoDB towards multi-datacenter support, which is asynchronous datacenter to datacenter replication.
Setting up Datacenter to Datacenter Replication in ArangoDB
Please note that this tutorial is valid for the ArangoDB 3.3 milestone 1 version of DC to DC replication!
Interested in trying out ArangoDB? Fire up your cluster in just a few clicks with ArangoDB ArangoGraph: the Cloud Service for ArangoDB. Start your free 14-day trial here
This milestone release contains data-center to data-center replication as an enterprise feature. This is a preview of the upcoming 3.3 release and is not considered production-ready.
In order to prepare for a major disaster, you can setup a backup data center that will take over operations if the primary data center goes down. For a server failure, the resilience features of ArangoDB can be used. Data center to data center is used to handle the failure of a complete data center.
Data is transported between data-centers using a message queue. The current implementation uses Apache Kafka as message queue. Apache Kafka is a commonly used open source message queue which is capable of handling multiple data-centers. However, the ArangoDB replication is not tied to Apache Kafka. We plan to support different message queues systems in the future.
The following contains a high-level description how to setup data-center to data-center replication. Detailed instructions for specific operating systems will follow shortly. Read more
Auto-Generate GraphQL for ArangoDB
Currently, querying ArangoDB with GraphQL requires building a GraphQL.js schema. This is tedious and the resulting JavaScript schema file can be long and bulky. Here we will demonstrate a short proof of concept that reduces the user related part to only defining the GraphQL IDL file and simple AQL queries.
The Apollo GraphQL project built a library that takes a GraphQL IDL and resolver functions to build a GraphQL.js schema. Resolve functions are called by GraphQL to get the actual data from the database. I modified the library in the way that before the resolvers are added, I read the IDL AST and create resolver functions. Read more
ArangoDB | PyArango Performance Analysis – Transaction Inspection
Following the previous blog post on performance analysis with pyArango, where we had a look at graphing using statsd for simple queries, we will now dig deeper into inspecting transactions. At first, we split the initialization code and the test code.
Initialisation code
We load the collection with simple documents. We create an index on one of the two attributes: Read more
Performance analysis using pyArango Part I
This is Part I of Performance analysis using pyArango blog series. Please refer here for: Part II (cluster) and Part III (measuring system capacity).
Usually, your application will persist of a set of queries on ArangoDB for one scenario (i.e. displaying your user’s account information etc.) When you want to make your application scale, you’d fire requests on it, and see how it behaves. Depending on internal processes execution times of these scenarios vary a bit.
We will take intervals of 10 seconds, and graph the values we will get there:
- average – all times measured during the interval, divided by the count.
- minimum – fastest requests
- maximum – slowest requests
- the time “most” aka 95% of your users may expect an answer within – this is called 95% percentile
ArangoDB | Geo Demonstration Using Foxx – Location-Aware Applications
Geo data is getting more and more important for today’s applications. The growing number of location-aware services, IoT applications and other solutions using latitude and longitude ask for precise and fast processing of geo data.
Let me show you in this quick demonstration how you can use geo functions and visualize your data using Foxx and leaflet.js. Read more
Sorting number strings numerically
Recently I gave a talk about ArangoDB in front of a community of mathematicians. I advertised that nearly arbitrary data can “easily” be stored in a JSON based document store. The moment I had uttered the word “easily”, one of them asked about long integers. And if a mathematician says “long integer” they do not mean 64bit but “properly long”. He actually wanted to store orders of finite groups. I said one should use a JSON UTF-8 string for this but I should have seen the next question coming because he then wanted that a sorted index would actually sort the documents by the numerical value stored in the string. But most databases – and ArangoDB is no exception here – will compare UTF-8 strings lexicographically (dictionary order). Read more
Webinar: Use ArangoDB Agency as fault-tolerant persistent data store
Join our Sr Distributed System Engineer, Kaveh Vahedipour, to learn more about ArangoDB Agency on September 19th, 2017 (6PM CEST/12PM ET/ 9AM PT) – View the Recording.
Distributed systems have become the standard topology on which modern appliances live. While the advantages of distributing workload for both performance as well as fault-tolerance are obvious, the runtime flexible configuration of such deployment becomes non-trivial.
ArangoDB clusters are no different in that regard. A potentially large database cluster’s configuration is manipulated at runtime by addition, alteration and removal of collections, indexes, and even servers. All servers need to trust in a fault-tolerant centralized configuration tree, which we call “the agency” in arango-speak. Read more