Max Neunhöffer
Senior Developer and Architect
Bio:
Max Neunhöffer is a mathematician turned database developer. In his academic career he worked for many years on the development and implementation of new algorithms in computer algebra. During this time he juggled mathematical big data such as group orbits containing trillions of points.
In 2013, he shifted his focus to NoSQL databases and is now a core developer and architect of ArangoDB. He is responsible for the distributed aspects of ArangoDB.
He has spoken at international conferences including O’Reilly Software Architecture London, J On The Beach and Strata London.
Talk proposals:
Graph databases allow users to analyze highly interconnected datasets and find patterns within these relationships. Social networks, corporate hierarchies, fraud detection, network analytics, and whole knowledge graphs are great use cases for graph databases. However, these datasets of nodes and connecting edges change over time. Whether you are a developer, architect or data scientist, you may want to time travel to analyze the past or even predict tomorrow.
While your graph database may lack built-in support for managing the revision history of graph data, this talk will show you how to manage it in a performant manner for general classes of graphs. Best of all, this won’t require any groundbreaking new ideas. We’ll simply borrow a few tools and tricks from the existing persistent data structure literature and adapt them for good performance within the graph database software. This will help enable new ways to manipulate and exploit graph data and hopefully power new and exciting applications.
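To give a flavour of the kind of trick involved, here is a minimal, hypothetical Go sketch (not how ArangoDB or any particular graph database implements it): instead of overwriting an edge in place, every edge version carries an interval of revisions during which it is valid, so a read can be filtered “as of” any past revision. In practice one would also index these revision intervals to keep such reads fast.

```go
package main

import (
	"fmt"
	"math"
)

// VersionedEdge is a hypothetical record layout: instead of overwriting an
// edge in place, every change creates a new version carrying a validity
// interval of revisions.
type VersionedEdge struct {
	From, To   string
	CreatedRev uint64 // revision at which this version became visible
	DeletedRev uint64 // revision at which it was removed (MaxUint64 = still alive)
}

// edgesAsOf returns the edges visible at a given revision, i.e. a
// "time travel" read over the edge collection.
func edgesAsOf(edges []VersionedEdge, rev uint64) []VersionedEdge {
	var visible []VersionedEdge
	for _, e := range edges {
		if e.CreatedRev <= rev && rev < e.DeletedRev {
			visible = append(visible, e)
		}
	}
	return visible
}

func main() {
	edges := []VersionedEdge{
		{From: "alice", To: "bob", CreatedRev: 1, DeletedRev: 5},
		{From: "alice", To: "carol", CreatedRev: 3, DeletedRev: math.MaxUint64},
	}
	fmt.Println(edgesAsOf(edges, 2)) // only alice->bob existed at revision 2
	fmt.Println(edgesAsOf(edges, 6)) // only alice->carol exists at revision 6
}
```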
ArangoDB is a scalable, distributed multi-model database. However, for this talk, it is not necessary to know what that means. Rather, the only crucial fact is that it is distributed and written in C++.
Before you stop reading: This talk is about a golang success story.
Namely, we had to implement resilient data center to data center (DC2DC) replication for ArangoDB clusters from scratch within 6 weeks (plus some time for testing and debugging). To do this, we built upon
– ArangoDB’s HTTP-based API for asynchronous replication,
– the existing golang driver,
– the fault tolerant scalable message queue system Kafka,
– a lot of existing golang libraries and
– golang’s fantastic capabilities for parallelism, communication and data manipulation
and pulled this task off. This talk tells the story of the project, with its many challenges and successes, and ends with a surprising revelation about which of the above we did not actually need in the end.
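As a heavily simplified, purely illustrative Go sketch of the overall shape (all names and types below are made up for the example, not the actual project code): one goroutine tails the change stream of each shard and publishes its events into a queue that stands in for Kafka, while an applier on the other side drains the queue and applies the changes; a real applier would write through ArangoDB’s HTTP replication API.

```go
package main

import (
	"fmt"
	"sync"
)

// ChangeEvent is a hypothetical, simplified change-log entry as delivered by
// the source cluster's asynchronous replication API (one stream per shard).
type ChangeEvent struct {
	Shard    string
	Sequence uint64
	Document string
}

// Queue stands in for the message queue between the two data centers; in the
// real project this role was played by Kafka, here it is just a Go channel.
type Queue chan ChangeEvent

// tailShard follows the change stream of one shard and publishes every event
// to the queue.
func tailShard(events []ChangeEvent, q Queue, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, e := range events {
		q <- e
	}
}

// applyEvents drains the queue on the target side and "applies" each event;
// a real applier would write the documents through the ArangoDB HTTP API.
func applyEvents(q Queue, done chan<- struct{}) {
	for e := range q {
		fmt.Printf("apply shard=%s seq=%d doc=%s\n", e.Shard, e.Sequence, e.Document)
	}
	close(done)
}

func main() {
	q := make(Queue, 16)
	done := make(chan struct{})
	go applyEvents(q, done)

	// One goroutine per shard: tailing many streams concurrently is where
	// goroutines and channels pay off.
	streams := [][]ChangeEvent{
		{{Shard: "s1", Sequence: 1, Document: `{"_key":"a"}`}},
		{{Shard: "s2", Sequence: 1, Document: `{"_key":"b"}`}},
	}
	var wg sync.WaitGroup
	for _, evs := range streams {
		wg.Add(1)
		go tailShard(evs, q, &wg)
	}
	wg.Wait()
	close(q)
	<-done
}
```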
What we see in the modern data store world is a race between different approaches to achieve distributed and resilient storage of data. Most applications need a stateful layer which holds the data. There are at least three necessary ingredients, which are anything but trivial to combine, and of course even more challenging when aiming for acceptable performance.
Over the past years there has been significant progress in both the science and the practical implementation of such data stores. In this talk Max Neunhöffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and present four modern approaches taken by distributed open-source data stores.
Topics are:
- Challenges in developing a distributed, resilient data store
- Consensus, distributed transactions, distributed query optimization and execution
- The inner workings of ArangoDB, Cassandra, CockroachDB and RethinkDB
The talk will touch on complex and difficult computer science, but will at the same time be accessible to and enjoyable for a wide range of developers.
The complexity and volume of data keep rising. Modern graph databases are designed to handle the complexity, but not yet the volume. Once a graph reaches a certain size, many dedicated graph databases hit their limits in vertical or, more commonly, horizontal scalability. In this talk I’ll give a brief overview of current approaches and their limits with respect to scalability. Dealing with complex data in a complex system doesn’t make things easier… but it makes finding a solution more fun. Join me on my journey to handle billions of edges in a graph database.
There are a lot of variants of Linux out there and many versions of each.
For the maker of an application written in C++, providing binary packages to customers is a nightmare. Wouldn’t it be nice if one could make a “Linux package” with a single static binary that simply works everywhere? As it turns out, this is essentially possible, but it needs a bit of care.
This is a story about using Alpine Linux and the musl C library to build completely static binaries, about creating universal Debian packages which run on any version of Debian or Ubuntu, about creating universal RPM packages which run on any version of any RPM-based Linux distribution, and about doing all of this on any variant of Linux using Docker images of various Linux distributions.
For us at ArangoDB, this approach brings the release process down from 10 hours to approximately half an hour.
Want to learn more about multi-model and graphs? Have a look here:
Recent talks:
- Devoxx Belgium: The Computer Science behind a modern distributed data store
- Data Works Summit: Fishing Graphs in a Hadoop Data Lake
- MesosCon Asia: Handling Billions of Edges in a Graph Database
- DevOps & Infrastructure NRW
Recent podcasts:
- Multi-model databases and ArangoDB
- ArangoDB