When you build an application, you expect your database to work: connect to an endpoint, run queries, and get results. That’s the contract, and your expectation is completely reasonable. Your application should focus on business logic and features, not on distributed database coordination or handling cluster topology changes. That’s what infrastructure does. On Upsun’s Dedicated Generation 2 (DG2) architecture, we run MariaDB in a three-node Galera Cluster. Galera is a multi-master setup where any node can accept writes, which provides high availability but creates coordination challenges. Those challenges belong in the infrastructure layer. We provide you with a stable database endpoint, and behind it runs a resilient cluster. This is where ZooKeeper comes in.
The coordination challenge
Galera uses a quorum system where transactions must commit to at least two of three nodes before succeeding, which provides strong consistency across the cluster. The design follows the CAP theorem, meaning Galera chooses consistency and partition tolerance over constant availability. In practice, transactions can occasionally fail because another node wrote conflicting data, network latency spiked, or the quorum wasn’t reachable. Multi-master databases like Galera are designed for applications to retry on transaction conflicts, but most applications don’t implement this retry logic by default. Magento, Drupal, WordPress, and many custom applications connect to a database and expect consistent availability without having to handle these edge cases themselves. You could solve this in two ways: build retry logic into your application or handle coordination at the infrastructure layer. Given our position, we’ve chosen to solve this problem at the infrastructure level so it works for most of our customers by default.
Our approach with ZooKeeper
We handle the complexity at the infrastructure layer. When you provision a triple-redundant MariaDB cluster on Upsun DG2, we expose a single primary write node while the other two nodes serve as read replicas (though they remain capable of accepting writes for failover scenarios). Your application connects to one stable endpoint, and behind the scenes all nodes stay synchronized through Galera’s multi-master replication. This gives you read-after-write consistency and distributed system reliability through a simple interface. But which node is the primary, and how do we handle transitions when a node becomes unavailable? ZooKeeper answers these questions.
Apache ZooKeeper is a coordination service originally developed at Yahoo!. It’s a hierarchical key-value store that looks like a file system, where the root is / and you can create child nodes (called znodes) under any path. Written in Java, it’s been doing this job reliably since 2008.
You might know etcd, which serves a similar purpose, but we chose ZooKeeper for its battle-tested stability and specific features for handling node failures gracefully.
Three ZooKeeper features that make it work
ZooKeeper gives us three key capabilities that solve the coordination problem: sequences, watchers, and ephemeral nodes. Let’s look at each one and how we use it.
Sequences: Establishing node order
The first challenge is getting all nodes to agree on who’s primary, and ZooKeeper solves this with sequential znodes. When you create a sequential znode, ZooKeeper appends a monotonically increasing number that’s consistent across all clients. Even if three nodes create znodes simultaneously, ZooKeeper assigns them an order that all clients see the same way. In Python, the kazoo library exposes this as a flag on znode creation. Each node creates its sequential znode under /mariadb/primary/. The first node in the sequence becomes the primary, and all nodes agree on this order because ZooKeeper guarantees consistency. The primary node gets traffic while the others stand by as read replicas.
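As a rough illustration of sequential znodes, here is a minimal sketch in Python. It assumes `zk` is a started `kazoo.client.KazooClient`; the `node-` prefix and the helper names (`register`, `current_primary`, `sequence_number`) are hypothetical and not taken from Upsun’s agent. It relies on ZooKeeper appending a zero-padded 10-digit counter to the znode name.

```python
from typing import Optional

SEQ_DIGITS = 10  # ZooKeeper appends a zero-padded 10-digit counter


def sequence_number(znode_name: str) -> int:
    """Extract the counter ZooKeeper appended to a sequential znode name."""
    return int(znode_name[-SEQ_DIGITS:])


def register(zk, hostname: str, base: str = "/mariadb/primary") -> str:
    """Create a sequential znode for this node and return its name.

    `zk` is assumed to be a started kazoo.client.KazooClient.
    """
    path = zk.create(
        f"{base}/node-",      # hypothetical prefix; ZooKeeper adds the counter
        hostname.encode(),    # znode data must be bytes
        sequence=True,
        makepath=True,        # create the parent path if it is missing
    )
    return path.rsplit("/", 1)[-1]


def current_primary(zk, base: str = "/mariadb/primary") -> Optional[str]:
    """Return the znode with the lowest sequence number, or None if empty."""
    children = zk.get_children(base)  # order of this list is not guaranteed
    return min(children, key=sequence_number, default=None)
```

Every node that calls `current_primary` gets the same answer, because the ordering comes from the counter ZooKeeper assigned rather than from anything the nodes decide locally. In a real election the znode would also be ephemeral, as described in the ephemeral nodes section.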
Watchers: Staying in sync
What happens when the primary node dies? The other nodes need to know immediately so they can promote a new primary. ZooKeeper provides watchers, which are one-time notifications that fire when a znode changes. Each node sets a watch on /mariadb/primary/, and when nodes join or leave, those watchers fire.
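Here’s how it works: when a watcher fires, the node re-reads the children of /mariadb/primary/, works out which znode is now first in the sequence, and sets a new watch, because each watch only fires once.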
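The watch-and-re-arm loop could look roughly like the sketch below. It assumes `zk` is a started `kazoo.client.KazooClient` and passes a callback through the `watch` argument of `get_children`; the function name `watch_primary` and the `on_change` callback are hypothetical.

```python
def watch_primary(zk, on_change, base="/mariadb/primary"):
    """Report the current membership and re-arm a one-time watch.

    `on_change` receives the child znode names sorted by sequence number,
    so the first element is the current primary.
    """

    def _watcher(event):
        # The watch that delivered this event is now spent.
        # Reading the children again sets a fresh one.
        watch_primary(zk, on_change, base)

    # Reading and setting the watch happen in the same call, so no
    # membership change can slip in between the two.
    children = zk.get_children(base, watch=_watcher)
    on_change(sorted(children, key=lambda name: int(name[-10:])))
```

kazoo also ships a `ChildrenWatch` recipe that handles the re-arming for you; the manual version is shown here only to make the one-time nature of watches visible.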
Ephemeral nodes: Automatic cleanup
The third piece is ephemeral nodes, which are znodes tied to a client session that vanish when that session ends. This solves one of the hardest problems in distributed systems: detecting failures. Did a node die, or did it temporarily lose network connectivity? ZooKeeper handles this through session timeouts: if a client stays silent for longer than its session timeout, ZooKeeper expires the session and deletes the client’s ephemeral znodes. Each node’s agent registers its znode as ephemeral, which covers the main failure scenarios:
- MariaDB crashes: Health check fails, agent drops its session, and the node is removed
- Network partition: The node can’t reach ZooKeeper, session timeout expires, and the node is removed
- Entire VM dies: Session times out and the ephemeral node vanishes
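The first scenario in the list above can be sketched as follows. This is a minimal illustration, assuming `zk` is a started `kazoo.client.KazooClient`; `register_ephemeral`, `agent_step`, and the `mariadb_healthy` callable are hypothetical names, and the real health check is not described in this article.

```python
def register_ephemeral(zk, hostname, base="/mariadb/primary"):
    """Create a znode that exists only while this client's session is alive."""
    return zk.create(
        f"{base}/node-",      # hypothetical prefix
        hostname.encode(),
        ephemeral=True,       # deleted by ZooKeeper when the session ends
        sequence=True,        # keeps the election ordering
        makepath=True,
    )


def agent_step(zk, mariadb_healthy):
    """Run one health-check cycle; return False once the session is dropped.

    `mariadb_healthy` is a hypothetical callable returning True or False.
    """
    if mariadb_healthy():
        return True
    # Closing the session makes ZooKeeper delete this client's ephemeral
    # znodes right away, without waiting for the session timeout.
    zk.stop()
    return False
```

The other two scenarios need no code on the failing node at all: when the agent can no longer reach ZooKeeper, or the VM is gone, the server expires the session on its own once the timeout passes.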
Beyond databases: Worker management
We use the same ZooKeeper pattern for worker processes. Many applications run background workers to process queues, send emails, or generate reports. You want those workers to be highly available, but running the same worker on multiple nodes at once creates problems. Queue systems like RabbitMQ can coordinate multiple workers so each job gets processed once, but that’s extra complexity. What if you could run the worker on one node at a time with automatic failover? Same ZooKeeper pattern. Each node’s agent creates an ephemeral sequential znode in /workers/email-sender/, the first node in sequence starts the worker, and the others wait. When that node dies, its ephemeral node disappears, the next node in sequence sees the change, and it starts its worker.
You get high availability without building distributed coordination into your worker code. The worker runs somewhere, and if that node dies, it runs somewhere else. Your application doesn’t need to know which node.
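Putting the three features together for a worker, a standby agent might behave like the sketch below. It assumes `zk` is a started `kazoo.client.KazooClient` and that `my_znode` is the name of the ephemeral sequential znode this agent already created under /workers/email-sender/; `is_leader`, `supervise_worker`, and `start_worker` are hypothetical names.

```python
def is_leader(my_znode, children):
    """True if my_znode has the lowest sequence number among children."""
    if not children:
        return False
    return min(children, key=lambda name: int(name[-10:])) == my_znode


def supervise_worker(zk, my_znode, start_worker, base="/workers/email-sender"):
    """Start the worker when this node is first in sequence; otherwise wait."""

    def _watcher(event):
        # Membership changed and the one-time watch is spent: check again.
        supervise_worker(zk, my_znode, start_worker, base)

    children = zk.get_children(base, watch=_watcher)
    if is_leader(my_znode, children):
        # May run again on later membership changes, so start_worker
        # should be safe to call when the worker is already running.
        start_worker()
```

kazoo also provides an `Election` recipe built on the same idea, which may be a better starting point than hand-rolled code for anything beyond a demonstration.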
The takeaway
ZooKeeper provides a single source of truth for cluster coordination through three features that work together:
- Sequences establish consistent ordering across all nodes
- Watchers enable immediate coordination when cluster state changes
- Ephemeral nodes provide automatic cleanup when nodes become unavailable