# Keeping the peace: how ZooKeeper stops database nodes from fighting

> Learn how Upsun uses ZooKeeper's sequences, watchers, and ephemeral nodes to coordinate MariaDB clusters and workers across distributed systems without breaking your application.

export const PostMeta = ({data = {}}) => {
  const {author, date, image} = data;
  const authors = Array.isArray(author) ? author : author ? [author] : [];
  const resolveAuthor = slug => {
    const entry = AUTHOR_MAP[slug] || ({});
    const name = entry.name || slug;
    const github = entry.github || null;
    const linkedin = entry.linkedin || null;
    const url = github ? `https://github.com/${github}` : linkedin || null;
    const avatarUrl = github ? `https://github.com/${github}.png?size=64` : null;
    return {
      name,
      url,
      avatarUrl
    };
  };
  const formattedDate = date ? new Date(date).toLocaleDateString('en-US', {
    year: 'numeric',
    month: 'long',
    day: 'numeric'
  }) : null;
  if (!image && authors.length === 0 && !formattedDate) return null;
  const AUTHOR_MAP = {
    "aaron-collier": {
      "name": "Aaron Collier"
    },
    "aaron-dudenhofer": {
      "name": "Aaron Dudenhofer"
    },
    "aaron-porter": {
      "name": "Aaron Porter"
    },
    "adriaan-odendaal": {
      "name": "Adriaan Odendaal"
    },
    "ajmal": {
      "name": "Ajmal Siddiqui"
    },
    "akalipetis": {
      "name": "Antonis Kalipetis"
    },
    "alexander-varwijk": {
      "name": "Alexander Varwijk"
    },
    "alicia-bevilacqua": {
      "name": "Alicia Bevilacqua"
    },
    "amelie-deguerry": {
      "name": "Amelie Deguerry"
    },
    "anacidre": {
      "name": "Ana Cidre",
      "linkedin": "https://www.linkedin.com/in/ana-cidre"
    },
    "andoni": {
      "name": "Andoni Auzmendi"
    },
    "andrei-taranu": {
      "name": "Andrei (Alex) Taranu",
      "linkedin": "https://www.linkedin.com/in/andrei-alex-taranu/"
    },
    "andrew-baxter": {
      "name": "Andrew Baxter"
    },
    "andrew-melck": {
      "name": "Andrew Melck"
    },
    "antoine-crochet-damais": {
      "name": "Antoine Crochet Damais"
    },
    "augustin-delaporte": {
      "name": "Augustin Delaporte",
      "linkedin": "https://www.linkedin.com/in/augustindelaporte/"
    },
    "branislav-bujisic": {
      "name": "Branislav Bujisic"
    },
    "carl-smith": {
      "name": "Carl Smith"
    },
    "caroline-leroy": {
      "name": "Caroline Leroy"
    },
    "cati-mayer": {
      "name": "Cati Mayer"
    },
    "catplat": {
      "name": "C Trinkwon"
    },
    "ceelolulu": {
      "name": "Celeste van der Watt"
    },
    "chadwcarlson": {
      "name": "Chad Carlson",
      "github": "chadwcarlson",
      "linkedin": "https://www.linkedin.com/in/chadwcarlson"
    },
    "chris-ward": {
      "name": "Chris Ward"
    },
    "chris-yates": {
      "name": "Chris Yates"
    },
    "christian-sieber": {
      "name": "Christian Sieber"
    },
    "christopher-lockheardt": {
      "name": "Christopher Lockheardt"
    },
    "christopher-skene": {
      "name": "Christopher Skene"
    },
    "chuck-morgan": {
      "name": "Chuck Morgan"
    },
    "corey-dockendorf": {
      "name": "Corey Dockendorf"
    },
    "crell": {
      "name": "Crell"
    },
    "damz": {
      "name": "Damz"
    },
    "dan-morrison": {
      "name": "Dan Morrison"
    },
    "davidbonachera": {
      "name": "David Bonachera",
      "github": "davidbonachera",
      "linkedin": "https://www.linkedin.com/in/davidbonachera"
    },
    "dereliahmet1": {
      "name": "Ahmet Faruk Dereli"
    },
    "devicezero": {
      "name": "Jonas Kröger",
      "github": "devicezero",
      "linkedin": "https://www.linkedin.com/in/jonaskroeger/"
    },
    "doug-goldberg": {
      "name": "Doug Goldberg"
    },
    "duncan-naves": {
      "name": "Duncan Naves",
      "github": "duncannaves",
      "linkedin": "https://www.linkedin.com/in/duncan-naves-a94423aa"
    },
    "erika-bustamante": {
      "name": "Erika Bustamante"
    },
    "fabpot": {
      "name": "Fabien Potencier"
    },
    "flovntp": {
      "name": "Florent Huck",
      "github": "flovntp",
      "linkedin": "https://www.linkedin.com/in/florenthuck"
    },
    "fred-plais": {
      "name": "Fred Plais"
    },
    "gauthier-garnier": {
      "name": "Gauthier Garnier"
    },
    "gilzow": {
      "name": "Paul Gilzow"
    },
    "gmoigneu": {
      "name": "Guillaume Moigneu",
      "github": "gmoigneu",
      "linkedin": "https://www.linkedin.com/in/guillaumemoigneu/"
    },
    "gregqualls": {
      "name": "Greg Qualls"
    },
    "guguss": {
      "name": "Augustin Delaporte"
    },
    "haylee-millar": {
      "name": "Haylee Millar"
    },
    "ivana-kotur": {
      "name": "Ivana Kotur"
    },
    "jackrabbithanna": {
      "name": "Mark Hanna"
    },
    "jared-wright": {
      "name": "Jared Wright",
      "github": "jww-sh",
      "linkedin": "https://www.linkedin.com/in/jaredwaynewright"
    },
    "jessica-orozco": {
      "name": "Jessica Orozco"
    },
    "joey-stanford": {
      "name": "Joey Stanford"
    },
    "john-grubb": {
      "name": "John Grubb"
    },
    "jonas-kruger": {
      "name": "Jonas Kruger"
    },
    "kathryn-frazer": {
      "name": "Kathryn Frazer"
    },
    "kemiojo": {
      "name": "Kemi Elizabeth Ojogbede"
    },
    "kieronsambrook-smith": {
      "name": "Kieronsambrook Smith"
    },
    "laurent-arnoud": {
      "name": "Laurent Arnoud"
    },
    "letoya-boyne": {
      "name": "Letoya Boyne"
    },
    "lolautruche": {
      "name": "Jérôme Vieilledent"
    },
    "lyly-lepinay": {
      "name": "Lyly Lepinay"
    },
    "manauwar-alam": {
      "name": "Manauwar Alam"
    },
    "marc-antoine-porri": {
      "name": "Marc Antoine Porri"
    },
    "maria-antinkaapo": {
      "name": "Maria Antinkaapo"
    },
    "maria-de-anton": {
      "name": "Maria De Anton"
    },
    "mark-dorison": {
      "name": "Mark Dorison"
    },
    "markus-hausammann": {
      "name": "Markus Hausammann"
    },
    "mary-thomas": {
      "name": "Mary Thomas"
    },
    "mathias-bolt-lesniak": {
      "name": "Mathias Bolt Lesniak"
    },
    "mathieu-strauch": {
      "name": "Mathieu Strauch"
    },
    "matthias-van-woensel": {
      "name": "Matthias Van Woensel",
      "linkedin": "https://www.linkedin.com/in/matthias-van-woensel-267a069"
    },
    "michael-sharp": {
      "name": "Michael Sharp"
    },
    "mupsi": {
      "name": "Marine Gandy"
    },
    "natalie-harper": {
      "name": "Natalie Harper"
    },
    "ngommenginger": {
      "name": "Nicolas Gommenginger",
      "linkedin": "https://www.linkedin.com/in/nicolas-gommenginger"
    },
    "nicholas-bennison": {
      "name": "Nicholas Bennison"
    },
    "nicholas-vahalik": {
      "name": "Nicholas Vahalik"
    },
    "nick-hardiman": {
      "name": "Nick Hardiman"
    },
    "nickanderegg": {
      "name": "Nickanderegg"
    },
    "nicolas-grekas": {
      "name": "Nicolas Grekas",
      "github": "nicolas-grekas",
      "linkedin": "https://www.linkedin.com/in/nicolasgrekas/"
    },
    "niti-malwade": {
      "name": "Niti Malwade"
    },
    "opensocialteam": {
      "name": "Opensocialteam"
    },
    "ori-pekelman": {
      "name": "Ori Pekelman"
    },
    "otavio-santana": {
      "name": "Otavio Santana"
    },
    "palwandi": {
      "name": "Pawan Alwandi",
      "github": "pawpy",
      "linkedin": "https://www.linkedin.com/in/pawanalwandi"
    },
    "patrick-boest": {
      "name": "Patrick Boest"
    },
    "patrick-dawkins": {
      "name": "Patrick Dawkins",
      "github": "pjcdawkins",
      "linkedin": "https://www.linkedin.com/in/patrickdawkins"
    },
    "patrick-klima": {
      "name": "Patrick Klima"
    },
    "pjcdawkins": {
      "name": "Pjcdawkins"
    },
    "prineet-kaurbhurji": {
      "name": "Prineet Kaurbhurji"
    },
    "quentin-sinig": {
      "name": "Quentin Sinig"
    },
    "ralt": {
      "name": "Florian Margaine",
      "github": "ralt",
      "linkedin": "https://www.linkedin.com/in/florian-margaine-43971136"
    },
    "ramanathanramakrishnamurthy": {
      "name": "Ramanathanramakrishnamurthy"
    },
    "remi-lejeune": {
      "name": "Rémi Lejeune"
    },
    "ribel": {
      "name": "Taras Kruts"
    },
    "robert-douglass": {
      "name": "Robert Douglass"
    },
    "rudy-weber": {
      "name": "Rudy Weber"
    },
    "ryan-hicks": {
      "name": "Ryan Hicks"
    },
    "sabri-helal": {
      "name": "Sabri Helal"
    },
    "savannah-bergeron": {
      "name": "Savannah Bergeron"
    },
    "shannon-vettes": {
      "name": "Shannon Vettes"
    },
    "shawn-ogasawara": {
      "name": "Shawn Ogasawara",
      "linkedin": "https://www.linkedin.com/in/shawn-ogasawara-83a9a0/"
    },
    "shawna-spoor": {
      "name": "Shawna Spoor"
    },
    "shedrack-akintayo": {
      "name": "Shedrack Akintayo"
    },
    "simon-ruggier": {
      "name": "Simon Ruggier"
    },
    "sophie-van-der-kindere": {
      "name": "Sophie Van Der Kindere"
    },
    "stefanos-thampis": {
      "name": "Stefanos Thampis"
    },
    "stephen-weinberg": {
      "name": "Stephen Weinberg"
    },
    "sukhman-virk": {
      "name": "Sukhman Virk"
    },
    "sumaira-nazir": {
      "name": "Sumaira Nazir"
    },
    "sumer": {
      "name": "Sümer Cip"
    },
    "syed-raza": {
      "name": "Syed Raza"
    },
    "tamara-bacchia": {
      "name": "Tamara Bacchia"
    },
    "tara-arnold": {
      "name": "Tara Arnold"
    },
    "theosakamg": {
      "name": "Mickael Gaillard",
      "github": "theosakamg"
    },
    "thomasdiluccio": {
      "name": "Thomas di Luccio"
    },
    "tim-anderson": {
      "name": "Tim Anderson"
    },
    "tom-helmer-hansen": {
      "name": "Tom Helmer Hansen"
    },
    "tylermills": {
      "name": "Tyler Mills"
    },
    "upsun": {
      "name": "Upsun"
    },
    "veronika-tolkachova": {
      "name": "Veronika Tolkachova",
      "linkedin": "https://www.linkedin.com/in/veronika-tolkachova-169167a2"
    },
    "vince-parker": {
      "name": "Vince Parker"
    },
    "vinnie-russo": {
      "name": "Vincenzo Russo"
    },
    "vrobert78": {
      "name": "Vincent Robert",
      "github": "vrobert78",
      "linkedin": "https://www.linkedin.com/in/vincent-robert-498a883"
    },
    "yuriy-babenko": {
      "name": "Yuriy Babenko"
    },
    "yuriy-gerasimov": {
      "name": "Yuriy Gerasimov"
    }
  };
  return <div className="post-meta">
      {(authors.length > 0 || formattedDate) && <div className="post-meta-info">
          {authors.length > 0 && <div className="post-meta-authors">
              {authors.map(slug => {
    const {name, url, avatarUrl} = resolveAuthor(slug);
    const inner = <>
                    {avatarUrl && <img src={avatarUrl} alt={name} className="post-meta-avatar" />}
                    <span className="post-meta-author-name">{name}</span>
                  </>;
    return url ? <a key={slug} href={url} target="_blank" rel="noopener noreferrer" className="post-meta-author">
                    {inner}
                  </a> : <span key={slug} className="post-meta-author">{inner}</span>;
  })}
            </div>}
          {authors.length > 0 && formattedDate && <span className="post-meta-separator" aria-hidden="true">·</span>}
          {formattedDate && <span className="post-meta-date">{formattedDate}</span>}
        </div>}
      {image && <img src={image} alt="" className="post-meta-image" aria-hidden="true" />}
    </div>;
};

<PostMeta data={{ author: ["ralt"], date: "2025-12-09T00:00:00+00:00", image: "/images/posts/how-it-works/keeping-the-peace-how-zookeeper-stops-database-nodes-from-fighting/keeping-the-peace-how-zookeeper-stops-database-nodes-from-fighting.webp" }} />

When you build an application, you expect your database to work: connect to an endpoint, run queries, and get results. That's the contract, and your expectation is completely reasonable. Your application should focus on business logic and features, not on distributed database coordination or handling cluster topology changes. That's what infrastructure does.

On Upsun's Dedicated Generation 2 (DG2) architecture, we run MariaDB in a three-node [Galera Cluster](https://galeracluster.com/). Galera is a multi-master setup where any node can accept writes, which provides high availability but creates coordination challenges.

Those challenges belong in the infrastructure layer. We provide you with a stable database endpoint, and behind it runs a resilient cluster. This is where ZooKeeper comes in.

## The coordination challenge

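Galera uses a quorum system: in a three-node cluster, at least two nodes must be able to reach each other for the cluster to keep accepting writes, and every transaction is certified against the other nodes' changes before it commits. This provides strong consistency across the cluster.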

In [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem) terms, Galera chooses consistency and partition tolerance over constant availability. In practice, transactions can occasionally fail because another node wrote conflicting data, network latency spiked, or the quorum wasn't reachable.

Multi-master databases like Galera are designed for applications to retry on transaction conflicts, but most applications don't implement this retry logic by default. Magento, Drupal, WordPress, and many custom applications connect to a database and expect consistent availability without having to handle these edge cases themselves.

You could solve this in two ways: build retry logic into your application or handle coordination at the infrastructure layer. Because we run the infrastructure for many different applications, we've chosen to solve this problem at the infrastructure level so it works for most of our customers by default.
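
For comparison, here is a minimal sketch of what the application-side option could look like. `TransientConflict` and `run_with_retry` are hypothetical names introduced for illustration. A real application would catch its database driver's own error for a Galera conflict, which MariaDB typically reports as a deadlock error (code 1213).

```python
import random
import time


class TransientConflict(Exception):
    """Hypothetical stand-in for a driver's retryable conflict error."""


def run_with_retry(transaction, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run `transaction`, retrying on conflicts with jittered exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except TransientConflict:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the failure
            # Wait a little longer each time, with jitter to avoid lockstep retries
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

Every code path that writes to the database would need a wrapper like this, which is why most off-the-shelf applications don't have one.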

## Our approach with ZooKeeper

We handle the complexity at the infrastructure layer. When you provision a triple-redundant MariaDB cluster on Upsun DG2, we expose a single primary write node while the other two nodes serve as read replicas (though they remain capable of accepting writes for failover scenarios).

Your application connects to one stable endpoint, and behind the scenes all nodes stay synchronized through Galera's multi-master replication. This gives you read-after-write consistency and distributed system reliability through a simple interface.

But which node is the primary, and how do we handle transitions when a node becomes unavailable? ZooKeeper answers these questions.

[Apache ZooKeeper](https://zookeeper.apache.org/) is a coordination service originally developed at Yahoo!. It's a hierarchical key-value store that looks like a file system, where the root is `/` and you can create child nodes (called znodes) under any path. Written in Java, it's been doing this job reliably since 2008.

You might know [etcd](https://etcd.io/), which serves a similar purpose, but we chose ZooKeeper for its battle-tested stability and specific features for handling node failures gracefully.

## Three ZooKeeper features that make it work

ZooKeeper gives us three key capabilities that solve the coordination problem: sequences, watchers, and ephemeral nodes. Let's look at each one and how we use it.

### Sequences: Establishing node order

The first challenge is getting all nodes to agree on who's primary, and ZooKeeper solves this with sequential znodes.

When you create a sequential znode, ZooKeeper appends a monotonically increasing number that's consistent across all clients. Even if three nodes create znodes simultaneously, ZooKeeper assigns them an order that all clients see the same way.

Here's what it looks like in Python using the [kazoo](https://kazoo.readthedocs.io/) library:

```python theme={null}
from kazoo.client import KazooClient

zk = KazooClient(hosts='zookeeper:2181')
zk.start()

# Create a sequential znode
path = zk.create(
    '/mariadb/primary/node-',
    b'node-hostname',
    sequence=True,
    ephemeral=True,
    makepath=True  # Create /mariadb/primary if it doesn't exist yet
)

print(f"Created: {path}")
# Output: Created: /mariadb/primary/node-0000000001

# Get all nodes and sort them
children = sorted(zk.get_children('/mariadb/primary'))
primary = children[0]

print(f"Primary node: {primary}")
# Output: Primary node: node-0000000001
```

Each MariaDB node runs a local agent that creates a sequential znode in `/mariadb/primary/`. The first node in the sequence becomes the primary, and all nodes agree on this order because ZooKeeper guarantees consistency. The primary node gets traffic while the others stand by as read replicas.
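
Sorting the child names works because they share a prefix and the counter is zero-padded. If the prefixes could differ, comparing the numeric suffix is more robust. The helper below is a small sketch of that idea; `sequence_number` and `elect_primary` are illustrative names, not part of kazoo or of our agent.

```python
def sequence_number(znode_name):
    """Return the integer sequence suffix ZooKeeper appended to the name."""
    # Sequential znodes end in a zero-padded counter, e.g. "node-0000000007"
    return int(znode_name.rsplit('-', 1)[1])


def elect_primary(children):
    """Pick the child with the lowest sequence number, or None if empty."""
    if not children:
        return None
    return min(children, key=sequence_number)


children = ['node-0000000012', 'node-0000000003', 'node-0000000007']
print(elect_primary(children))
# Output: node-0000000003
```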

### Watchers: Staying in sync

What happens when the primary node dies? The other nodes need to know immediately so they can promote a new primary.

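ZooKeeper provides watchers, which are one-time notifications that fire when a znode or its list of children changes. Because each watch fires only once, a client has to re-register it after every event, which kazoo's watch helpers do for you. Each node sets a watch on `/mariadb/primary/`, and when nodes join or leave, those watchers fire.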

Here's how it works:

```python theme={null}
from kazoo.client import KazooClient

zk = KazooClient(hosts='zookeeper:2181')
zk.start()

# The watched path must exist before a children watch can be registered
zk.ensure_path('/mariadb/primary')

def configure_proxy(primary_node):
    """Update local proxy configuration"""
    # This would update iptables or HAProxy configuration
    # to redirect database traffic to the new primary
    pass

def primary_changed(children):
    """Called when the primary list changes"""
    if not children:
        print("No primary available!")
        return

    sorted_children = sorted(children)
    new_primary = sorted_children[0]

    print(f"Primary changed to: {new_primary}")

    # Reconfigure the local proxy to point to new primary
    configure_proxy(new_primary)

# Set up a watch on the primary path. kazoo calls the function once
# right away and again on every change, so the helpers it uses are
# defined above.
@zk.ChildrenWatch('/mariadb/primary')
def watch_children(children):
    primary_changed(children)
    return True  # Keep watching (returning False would stop the watch)
```

When the primary node dies, its znode disappears (we'll explain why in a moment), and all watching nodes get notified within seconds. They read the updated list, identify the new primary, and reconfigure their local proxies.

Your application keeps sending queries to the same connection string while we've redirected traffic to a new primary behind the scenes. This works seamlessly because Galera is multi-master, meaning every node can accept writes at any time. The failover happens without you noticing.
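
A children watch fires whenever any node joins or leaves, not only when the primary does, so an agent may want to reconfigure its proxy only when the first znode in the list actually changes. The sketch below shows one way to track that. `PrimaryTracker` is a hypothetical helper for illustration, not our agent's code.

```python
class PrimaryTracker:
    """Remember the current primary and report only real changes."""

    def __init__(self):
        self.current = None

    def update(self, children):
        """Record the latest child list; return True if the primary changed."""
        # With a shared prefix and zero-padded counters, min() picks
        # the same znode as sorted(children)[0]
        new_primary = min(children) if children else None
        changed = new_primary != self.current
        self.current = new_primary
        return changed


tracker = PrimaryTracker()
tracker.update(['node-0000000001', 'node-0000000002', 'node-0000000003'])
# A replica leaving does not change the primary
print(tracker.update(['node-0000000001', 'node-0000000003']))
# Output: False
```

Inside a watch callback, the agent would call its proxy reconfiguration only when `update` returns `True`.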

### Ephemeral nodes: Automatic cleanup

The third piece is ephemeral nodes, which are znodes tied to a client session that vanish when the client disconnects.

This addresses one of the hardest problems in distributed systems: detecting failures. Did a node die, or did it temporarily lose network connectivity? ZooKeeper handles this through session timeouts.

Here's what an ephemeral node looks like:

```python theme={null}
import subprocess
import time

from kazoo.client import KazooClient

def is_mariadb_healthy():
    """Check if local MariaDB is responding"""
    try:
        # Run: mysql -e "SELECT 1"
        result = subprocess.run(
            ['mysql', '-e', 'SELECT 1'],
            capture_output=True,
            timeout=5
        )
        return result.returncode == 0
    except (subprocess.SubprocessError, OSError):
        # The check timed out or the mysql client could not be run
        return False

zk = KazooClient(
    hosts='zookeeper:2181',
    timeout=10.0  # Session timeout in seconds
)
zk.start()

# Create an ephemeral sequential znode
path = zk.create(
    '/mariadb/primary/node-',
    b'node-hostname',
    sequence=True,
    ephemeral=True  # Disappears when session ends
)

# kazoo sends session heartbeats in the background; this loop only
# checks MariaDB and ends the session when the check fails
while True:
    if not is_mariadb_healthy():
        # MariaDB is down, close connection
        # This removes our ephemeral node
        zk.stop()
        break

    time.sleep(10)
```

Each node runs an agent that monitors the local MariaDB instance every 10 seconds. If MariaDB responds, the agent keeps the ZooKeeper session alive. If MariaDB stops responding, the agent drops the session, the ephemeral node disappears, and the other nodes see the change through their watchers and reconfigure.

This handles different failure scenarios:

* **MariaDB crashes**: Health check fails, agent drops its session, and the node is removed
* **Network partition**: The node can't reach ZooKeeper, session timeout expires, and the node is removed
* **Entire VM dies**: Session times out and the ephemeral node vanishes

We don't need to distinguish between failure types because any problem that prevents the node from maintaining its ZooKeeper session triggers automatic removal. The cluster heals itself.
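
To see why one mechanism covers all three scenarios, here is a toy in-memory model of ephemeral znodes. It is a simulation for illustration only, not how ZooKeeper implements sessions, and `SessionRegistry` with its methods is a hypothetical name.

```python
class SessionRegistry:
    """Toy in-memory model of ephemeral znodes tied to session heartbeats."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # znode name -> time of last heartbeat

    def heartbeat(self, znode, now):
        """Record that the session owning `znode` is still alive."""
        self.last_seen[znode] = now

    def close(self, znode):
        """Model an agent closing its session on purpose."""
        self.last_seen.pop(znode, None)

    def expire(self, now):
        """Drop every znode whose session has been silent for too long."""
        expired = [z for z, seen in self.last_seen.items()
                   if now - seen > self.timeout]
        for znode in expired:
            del self.last_seen[znode]
        return sorted(expired)

    def children(self):
        """Return the znodes that are still registered, in order."""
        return sorted(self.last_seen)
```

A crashed MariaDB maps to `close`, while a network partition and a dead VM both map to missing heartbeats that `expire` eventually catches. In every case the znode leaves `children()`, which is all the other nodes need to observe.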

## Beyond databases: Worker management

We use the same ZooKeeper pattern for worker processes. Many applications run background workers to process queues, send emails, or generate reports. You want those workers to be highly available, but running the same worker on multiple nodes at once creates problems.

Queue systems like RabbitMQ can coordinate multiple workers so each job gets processed once, but that's extra complexity. What if you could run the worker on one node at a time with automatic failover?

Same ZooKeeper pattern. Each node's agent creates an ephemeral sequential znode in `/workers/email-sender/`, the first node in sequence starts the worker, and the others wait. When that node dies, its ephemeral node disappears, the next node in sequence sees the change, and it starts its worker.

You get high availability without building distributed coordination into your worker code. The worker runs somewhere, and if that node dies, it runs somewhere else. Your application doesn't need to know which node.
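
As a rough sketch of that logic, the class below starts a worker only while its own znode is first in sequence. `WorkerSupervisor` and the `start` and `stop` callables are hypothetical names for illustration. In a real agent, `on_children` would be driven by a children watch on `/workers/email-sender/`.

```python
class WorkerSupervisor:
    """Start the worker only while this node's znode is first in sequence."""

    def __init__(self, my_znode, start, stop):
        self.my_znode = my_znode
        self.start = start  # Callable that launches the worker
        self.stop = stop    # Callable that shuts the worker down
        self.running = False

    def on_children(self, children):
        """Handle an updated child list, for example from a children watch."""
        should_run = bool(children) and min(children) == self.my_znode
        if should_run and not self.running:
            self.start()
        elif not should_run and self.running:
            self.stop()
        self.running = should_run
```

The supervisor also stops the worker when its own znode is no longer first, for example after its session was lost and another node took over.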

## The takeaway

ZooKeeper provides a single source of truth for cluster coordination through three features that work together:

* **Sequences** establish consistent ordering across all nodes
* **Watchers** enable immediate coordination when cluster state changes
* **Ephemeral nodes** provide automatic cleanup when nodes become unavailable

This lets us provide the stable database interface your application expects while running a highly available three-node cluster underneath. You get both reliability and simplicity.

Your application connects to a database endpoint and it behaves as expected. When cluster state changes, a node becomes unavailable, or we perform maintenance, the infrastructure layer handles coordination. This is the robustness principle in practice: we accept applications with standard expectations and provide dependable service.

Infrastructure complexity belongs in the infrastructure layer. You build features for your users, and we'll handle distributed systems coordination.
