An Introduction to Geo Indexes and their performance characteristics: Part I

Starting with the mass-market availability of smartphones and continuing with IoT devices and self-driving cars, ever more data is generated with geo information attached to it. Analyzing this data in real time requires clever indexing data structures. Geo data in ArangoDB consists of two or more dimensions representing (x, y) coordinates on the earth's surface. Searching on a single number is essentially a solved problem, but searching effectively on multi-dimensional data is more difficult because standard indexing techniques cannot be used.
Read more


ArangoDB 3.3: DC2DC Replication, Encrypted Backup

Just in time for the holidays we have a nice present for you all: ArangoDB 3.3. This release focuses on replication, resilience and stability, and thus on general production readiness for use cases both small and very large. There are improvements for the Community Edition as well as for the Enterprise Edition. We sincerely hope to have found the right balance between them.

In the Community Edition there are:

  • Easier server-level replication
  • A resilient active/passive mode for single server instances with automatic failover
  • RocksDB throttling for increased guaranteed write performance
  • Faster collection and shard creation in the cluster
  • Lots of bug fixes (most of them have been backported to 3.2)

In the Enterprise Edition there are:

  • Datacenter to datacenter replication for clusters
  • Encrypted backup and restore

In short, this release is all about improving replication and resilience. For us, the two most exciting new features are datacenter to datacenter replication and the resilient active/passive mode for single servers.

Datacenter to datacenter replication

Every company needs a disaster recovery plan for all important systems. This is true from small units like single processes running in some container to the largest distributed architectures. For databases in particular this usually involves a mixture of fault-tolerance, redundancy, regular backups and emergency plans. The larger a data store, the more difficult it is to come up with a good strategy.

Therefore, it is desirable to be able to run a distributed database in one datacenter and replicate all transactions to another datacenter in some way. Often, transaction logs are shipped over the network to replicate everything in another, identical system in the other datacenter. Some distributed data stores have built-in support for multiple datacenter awareness and can replicate between datacenters in a fully automatic fashion.

ArangoDB 3.3 takes an evolutionary step forward by introducing multi-datacenter support in the form of asynchronous datacenter to datacenter replication. Our solution scales to arbitrary cluster sizes, provided the network link between the datacenters has enough bandwidth. It is fault-tolerant without a single point of failure and includes a lot of metrics for monitoring in a production scenario.

Read more on the Datacenter to Datacenter Replication and follow generic installation instructions.

This is a feature available only in the Enterprise Edition.

Server-level replication

We have had asynchronous replication functionality in ArangoDB since release 1.4, but setting it up was admittedly too hard. One major design flaw in the existing asynchronous replication was that it covered a single database only.

Replicating from a leader server that has multiple databases required manual fiddling on the follower for each individual database to replicate. When a new database was created on the leader, one needed to take action on the follower to ensure that data for that database actually got replicated. Replication on the follower was also not aware of when a database was dropped on the leader.

This is now fixed in 3.3. In order to set up replication on a 3.3 follower for all databases of a given 3.3 leader, there is now the so-called `globalApplier`. It has the same interface as the existing `applier`, but it will replicate from all databases on the leader and not just a single one.

As a consequence, server-global replication can now be set up permanently with a single JavaScript command or API call.

A resilient active/passive mode for single server instances with automatic failover

While it was always possible to set up two servers and connect them via asynchronous replication, the replication setup was not straightforward (see above), and it also did not handle automatic failover. If the leader died, one needed to have some machinery in place to stop replication on the follower and make it the leader. ArangoDB did not provide this machinery, and left it to client applications to solve the failover problem.

With 3.3, this has become much easier. There is now a mode to start two arangod instances as a pair of connected servers with automatic failover.

The two servers are connected via asynchronous replication. One of the servers is the elected leader, and the other one is made a follower automatically. At startup, the two servers race for leadership. The follower will automatically start replication from the leader for all databases, using the server-global replication (see above).

When the leader goes down, this is automatically detected by an agency instance, which is also started in this mode. This instance will make the previous follower stop its replication and make it the new leader.

The follower will automatically deny all read and write requests from client applications. Only the replication is allowed to access the follower's data until the follower becomes a new leader.

The arangodb starter supports starting two servers with asynchronous replication and failover out of the box, making the setup even easier.

The arangojs driver for JavaScript and the Go, PHP and Java drivers for ArangoDB are also being extended to support automatic failover in case the currently used server endpoint responds with HTTP 503. Read more details on the Java driver.
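To make the client-side failover idea more concrete, here is a minimal sketch of the pattern: try the endpoint currently in use and move on to the next one whenever it answers with HTTP 503. The class `FailoverClient`, its `execute` method and the endpoint strings are hypothetical names chosen for illustration; this is not the API of any ArangoDB driver, and the real drivers may handle failover differently.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical helper, not part of any ArangoDB driver. */
final class FailoverClient {
    static final int SERVICE_UNAVAILABLE = 503;

    private final List<String> endpoints;
    private int current = 0; // index of the endpoint currently in use

    FailoverClient(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        this.endpoints = List.copyOf(endpoints);
    }

    /**
     * Sends a request through `send`, which maps an endpoint to the HTTP
     * status code it answered with. On HTTP 503 the next endpoint is tried,
     * at most once around the whole list. Returns the endpoint that answered
     * with a status other than 503.
     */
    String execute(ToIntFunction<String> send) {
        for (int attempt = 0; attempt < endpoints.size(); attempt++) {
            String endpoint = endpoints.get(current);
            if (send.applyAsInt(endpoint) != SERVICE_UNAVAILABLE) {
                return endpoint; // stay on this endpoint for later requests
            }
            current = (current + 1) % endpoints.size();
        }
        throw new IllegalStateException("all endpoints responded with HTTP 503");
    }
}
```

In this sketch the client remembers the last endpoint that worked, so after a failover all later requests go straight to the new leader instead of probing the old one again.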

Encrypted backup

This feature allows you to create an encrypted backup using arangodump. We use AES-256 for the encryption. The encryption key can be read from a file or from a generator program. It works in single server and cluster mode. Together with encryption at rest, this allows you to keep all your sensitive data encrypted whenever it is not in memory.

Here is an example for encrypted backup:

arangodump --collection "secret" dump --encryption.keyfile ~/SECRET-KEY

As you can see, in order to create an encrypted backup, simply add the --encryption.keyfile option when invoking arangodump. Needless to say, restore is equally easy using arangorestore.

The key must be exactly 32 bytes long (this is a requirement of the AES block cipher we are using). For details see the documentation in the manual.
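As an illustration of the 32-byte requirement, the following sketch writes a random key of exactly that length to a file. It assumes that the keyfile simply holds the raw key bytes; the class name `KeyfileGenerator` and the default file name are made up for this example, so please check the manual for the exact keyfile format before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;

/** Hypothetical helper for creating a keyfile; not an ArangoDB tool. */
final class KeyfileGenerator {
    // AES-256 uses a 256-bit key, i.e. exactly 32 bytes.
    static final int KEY_LENGTH_BYTES = 32;

    /** Returns 32 cryptographically strong random bytes. */
    static byte[] newKey() {
        byte[] key = new byte[KEY_LENGTH_BYTES];
        new SecureRandom().nextBytes(key);
        return key;
    }

    /** Writes a fresh key to the given file (assumption: raw bytes, no encoding). */
    static void writeKeyfile(Path target) throws IOException {
        Files.write(target, newKey());
    }

    public static void main(String[] args) throws IOException {
        Path target = Path.of(args.length > 0 ? args[0] : "SECRET-KEY");
        writeKeyfile(target);
        System.out.println("wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

The resulting file could then be passed to arangodump via the --encryption.keyfile option shown above.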

Note that encrypted backups can be used together with the already existing RocksDB encryption-at-rest feature, but they can also be used for the MMFiles engine, which does not have encryption-at-rest.

This is a feature available only in the Enterprise Edition.

RocksDB throttling

While throttling may sound bad at first, the RocksDB throttling is there for a good reason. It throttles write operations to RocksDB in the RocksDB storage engine, in order to prevent total stalls. The throttling is adaptive, meaning that it automatically adapts to the write rate. Read more about RocksDB throttling.

Faster shard creation in cluster

Creating collections is what all ArangoDB users do. It's one of the first steps carried out. So it should be as quick as possible.

When using the cluster, users normally want resilience, so replicationFactor is set to at least 2. The number of shards is often set to fairly high values as well (collections with 100 shards are not unusual).

Internally this will first store the collection metadata in the agency, and then the assigned shard leaders will pick up the change and begin creating the shards. When the shards are set up on the leader, the replication is kicked off, so every data modification will become effective not only on the leader, but also on the followers. In 3.3, this process takes some shortcuts for the initial creation of shards.

Conclusion

The entire ArangoDB team is proud to release version 3.3 of ArangoDB just in time for the holidays! We hope you will enjoy the upgrade. We invite you to take ArangoDB 3.3 for a spin and to let us know what you think via our Community Slack channel or hacker@arangodb.com. We look forward to your feedback!

Download ArangoDB 3.3


Spring is coming! – ArangoDB meets Spring Data

This year we got a lot of requests from our customers to provide Spring Data support for ArangoDB. So we listened and teamed up with one of our bigger customers from the financial sector to develop a Spring Data implementation for ArangoDB. We have also made an extensive demo on how to use Spring Data ArangoDB with an example data set of Game of Thrones characters and locations. So, Spring is not only coming... it is already here!

What is the Spring Framework?

The Spring Framework is an open source Java application framework which provides an Inversion of Control (IoC) container to manage Plain Old Java Objects (POJOs) through Dependency Injection (DI). The Spring Framework includes a wide range of modules providing several services. One interesting module for us is Spring Data, which provides a Spring-like programming model for data access. There are already a lot of Spring Data subprojects which provide data access for specific database technologies, and now ArangoDB joins in.

New to multi-model and graphs? Check out our free ArangoDB Graph Course.

Why did it take some time?

To successfully implement our Spring Data module for ArangoDB we needed a solid base, both in our team and in our Java driver, which runs under the hood. In the last two years, we expanded our team with developers who are well versed in the Java world. With their help we implemented a completely new Java driver with a more intuitive and object-oriented API, as well as several new features like VelocyStream support, multi-document operations, automatic fallback and, with the current release, built-in load balancing. The load balancing and fallback features in particular were a high priority for our customers running ArangoDB in a cluster setup. With this preliminary work done, we started implementing our Spring Data module this year in close teamwork with two great developers from our customer. (Thanks for your great work, guys!)

Features

Spring Data ArangoDB provides a solution for all core concepts of Spring Data. With ArangoTemplate you are able to perform common database operations, from managing databases, collections and graphs to single or batch CRUD operations. This includes annotation-based object mapping of POJOs to VelocyPack documents (ArangoDB's internal storage format) and exception translation into the data access exceptions used in Spring. You are also able to write repository interfaces which will be implemented automatically. Within these repository interfaces, you can declare custom methods from which AQL queries will be derived. With this feature, you can perform a wide range of queries with filter conditions, joins, graph traversals and even geospatial queries. It also supports passing AQL bind parameters as parameters of your method. You can still write your AQL queries on your own and attach them to your custom methods with the @Query annotation.

Example

Configuration

Configure the connection to ArangoDB server and enable Spring Data ArangoDB repositories.


@Configuration
@EnableArangoRepositories(basePackages = { "com.arangodb.spring.demo" })
public class DemoConfiguration extends AbstractArangoConfiguration {
  @Override
  public ArangoDB.Builder arango() {
    return new ArangoDB.Builder().host("localhost", 8529).user("root").password(null);
  }

  @Override
  public String database() {
    return "spring-demo";
  }
}

Entities

Create and annotate your objects.


@Document
public class Character {

  @Id
  private String id;
  private Integer age;
  @Relations(edges = ChildOf.class, lazy = true)
  private Collection<Character> childs;

}

@Edge
public class ChildOf {

  @Id
  private String id;
  @From
  private Character child;
  @To
  private Character parent;
  
}

Repositories

Create repository interfaces and implement custom methods.


public interface CharacterRepository extends ArangoRepository<Character> {

  Iterable<Character> findByChildsAgeBetween(int lowerBound, int upperBound);

}

public interface ChildOfRepository extends ArangoRepository<ChildOf> {

}

Usage


@Autowired CharacterRepository characterRepo;
@Autowired ChildOfRepository childOfRepo;

List<Character> characters = ...
characterRepo.save(characters);

List<ChildOf> edges = ...
childOfRepo.save(edges);

Iterable<Character> childsBetween16a20 = characterRepo.findByChildsAgeBetween(16, 20);

Go through the Spring Data ArangoDB demo


ArangoDB Java Driver: Load Balancing for Performance

The newest release 4.3.2 of the official ArangoDB Java driver comes with load balancing for cluster setups and advanced fallback mechanics.

Load balancing strategies

Round robin

There are two different strategies for load balancing that the Java driver provides. The first and most common strategy is round robin. As the name suggests, round robin iterates through a list of known coordinators in the cluster, so each database operation uses a different coordinator than the one before. Read more
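The round robin idea can be sketched in a few lines. The class `RoundRobinSelector` and the coordinator strings below are hypothetical and only illustrate the technique; they are not taken from the Java driver, whose actual implementation and configuration may differ.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of round robin selection; not driver code. */
final class RoundRobinSelector {
    private final List<String> coordinators;
    // Shared counter so that concurrent callers still rotate through the list.
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSelector(List<String> coordinators) {
        if (coordinators.isEmpty()) {
            throw new IllegalArgumentException("at least one coordinator is required");
        }
        this.coordinators = List.copyOf(coordinators);
    }

    /** Returns the coordinator to use for the next database operation. */
    String next() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), coordinators.size());
        return coordinators.get(index);
    }
}
```

With three coordinators, four consecutive calls to `next()` return the first, second and third coordinator and then the first one again.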


ArangoDB Named Best Free Graph Database by G2 Crowd Users

ArangoDB named by G2 Crowd users as the most popular graph database used today.

ArangoDB has been identified as the highest rated graph database, based on its high levels of customer satisfaction and likelihood-to-recommend ratings from real G2 Crowd users.

ArangoDB received a near perfect 4.9 out of 5 star average for user satisfaction for its free platform across its 24 user reviews. ArangoDB users point to the database’s query language, availability and storage as the three most liked features of the product. Read more


AWS Neptune: A New Vertex in the Graph World — But Where’s the Edge?

At AWS Re:Invent just a few days ago, Andy Jassy, the CEO of AWS, unveiled their newest database product offering: AWS Neptune. It's a fully managed graph database which is capable of storing RDF and property graphs. It gives developers access to data via SPARQL or Java-based TinkerPop Gremlin. As versatile and as good as this may sound, one has to wonder if another graph database will solve a key problem in modern application development and give Amazon an edge over its competition. Read More


ArangoDB | RocksDB Integration: Performance Enhancement

I have varying levels of familiarity with Google's original leveldb and three of its derivatives. RocksDB is one of the three. In each of the four leveldb offerings, the code is optimized for a given environment. Google's leveldb is optimized for a cell phone, which has much more limited resources than a server. RocksDB is optimized for flash arrays on large servers (per various RocksDB wiki pages). Note that a flash array is a device of much higher throughput than a SATA or SSD drive or array. It is a device that sits on the processor's bus. RocksDB's performance benchmark page details a server with 24 logical CPU cores, 144 GB RAM, and two FusionIO flash PCI devices. Each FusionIO device cost about $10,000 at the time of the post. So RocksDB is naturally tuned for extremely fast and expensive systems. Here is an example ArangoDB import on a machine similar to the RocksDB performance tester: Read more


ArangoDB | Introduction to Fuerte: ArangoDB C++ Driver

In this post, we will introduce you to fuerte, our new ArangoDB C++ driver. fuerte allows you to communicate with ArangoDB instances via HTTP and VST. You will learn how to create collections, insert documents, retrieve documents, write AQL queries and use the asynchronous API of the driver.

Requirements (Running the sample)

Please download and inspect the sample described in this post. The sample consists of a C++ example source code file and a CMakeLists.txt. You need to install the fuerte driver, which can be found on GitHub, on your system before compiling the sample. Please follow the instructions provided in the driver's README.md. Read More


ArangoDB 3.3 Beta Release – New Features and Enhancements

It is all about improving replication. ArangoDB 3.3 comes with two new exciting features: datacenter to datacenter replication for clusters and a much improved active/passive mode for single servers. ArangoDB 3.3 focuses on replication and improvements in this area, and provides a much better user experience when setting up a resilient single server with automatic failover.

This beta release is feature complete and contains stability improvements over the recent milestones 1 and 2 of ArangoDB 3.3. However, it is not yet meant for production use. We will provide ArangoDB 3.3 GA after extensive internal and external testing of this beta release. Read More


ArangoDB | Infocamere Investigation: Graph Databases Case Study

InfoCamere is the IT company of the Italian Chambers of Commerce. By devising and developing up-to-date and innovative IT solutions and services, it connects the Chambers of Commerce and their databases through a network that is also accessible to the public via the Internet. Thanks to InfoCamere, businesses, public authorities, trade associations, professional bodies and ordinary citizens (both in Italy and abroad) can easily access updated and official information and economic data on all businesses registered and operating in Italy.

The Italian Chambers of Commerce are public bodies entrusted to serve and promote Italian businesses through over 300 branch offices located throughout the country. InfoCamere helps them pursue their goals in the interest of the business community. On behalf of the Chambers' system, InfoCamere plays a key role in implementing the Italian Digital Agenda with respect to the digital transformation of the national productive system, focusing especially on supporting the digitalization of SMEs.

Guest post by Luca Sinico (Software Developer, InfoCamere)

Read more

