Cloning a terabyte in a minute, and the limit hiding behind it

Cloning your whole production environment, database included, takes about a minute on Upsun. It doesn’t matter whether that database is 10 GB or a terabyte. You get a preview environment with all of your production data, ready to break in whatever way you need. That speed isn’t magic. It’s a storage trick called copy-on-write, and like every good trick, it comes with a bill attached. The funny part is where the bill shows up.

How a one-minute clone works

Every container on Upsun gets its own volume on a Ceph cluster. Ceph is distributed storage: think of it as network-attached disks you can map to any machine and unmap a second later. We covered the move to it in “Why we moved from LVM to Ceph.” When you clone an environment, Ceph doesn’t copy your data. It takes a copy-on-write snapshot. At that moment, nothing is written to disk except a bit of metadata. The clone and the original share the same underlying blocks. The copying happens later, lazily. The first time the clone writes to a given spot on the disk, Ceph copies that chunk from the parent into the clone, applies the write, and from then on the clone reads its own copy instead of the parent’s. (RBD, Ceph’s block layer, tracks this per object, 4 MiB by default.) Every untouched chunk keeps falling through to the parent. That first write to each chunk is a touch slower because of that extra copy, but you never notice it in practice. That’s the whole reason a terabyte clones as fast as a gigabyte. You pay only for what you change, and you pay for it at the moment you change it.
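
To make the lazy copy concrete, here is a toy model in Python. It is a sketch, not how Ceph is implemented: the `Volume` class and its methods are hypothetical, blocks live in a plain dict, and a block is always written whole, so the first write never has to copy the parent’s old bytes the way a real partial write would.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Volume:
    """Toy copy-on-write volume: its own blocks plus an optional parent to fall back on."""

    name: str
    parent: Optional[Volume] = None
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def clone(self, name: str) -> Volume:
        # Cloning copies no data: the new volume starts empty and points at its parent.
        return Volume(name=name, parent=self)

    def write(self, index: int, data: bytes) -> None:
        # Writes always land in this volume's own blocks; the parent is never modified.
        self.blocks[index] = data

    def read(self, index: int) -> bytes:
        # Untouched blocks fall through to the parent, then its parent, and so on.
        vol: Optional[Volume] = self
        while vol is not None:
            if index in vol.blocks:
                return vol.blocks[index]
            vol = vol.parent
        return bytes(1)  # never written anywhere in the chain: reads as zero


prod = Volume("production", blocks={i: b"row" for i in range(100_000)})
preview = prod.clone("preview")  # instant: no blocks copied
preview.write(42, b"edited")
print(len(preview.blocks))  # 1
print(preview.read(42), preview.read(43), prod.read(42))  # b'edited' b'row' b'row'
```

However big `prod` gets, the clone starts with zero blocks of its own and grows by one block per block it changes.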

The bill: walking the chain

Now clone the clone. Then clone that one. Each new environment points at its parent, which points at its parent, all the way back to the original. Reads are where this adds up. When you read a block that nobody in the chain has modified, Ceph asks the clone, then its parent, then its parent’s parent, until it finds the block or reaches the original. The deeper the chain, the more hops per read. Ceph draws a line here. The Linux kernel’s RBD driver caps the parent chain at 16 levels (RBD_MAX_PARENT_CHAIN_LEN), so a clone can sit at most 15 levels below the original. Try to map one deeper and the driver refuses. It’s a hard limit, and a reasonable one: a little flexibility traded for predictable read performance, which is the right call. Others draw the line even lower. OpenStack’s Cinder defaults rbd_max_clone_depth to 5 and breaks the chain on its own once a clone would cross it, rather than riding the kernel’s 16 all the way up. Its developers decided that 5 layers of read amplification is already enough to hurt. Same mechanism, same trade-off, two teams drawing the line in different places.
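
The cost of a deep chain can be sketched by counting hops. In the toy model below (hypothetical names throughout), `read` reports how many images it consulted before finding a block, and `clone` refuses to extend a line past 16 images, following the counting above of at most 15 levels below the original. The constant only mirrors the number in RBD_MAX_PARENT_CHAIN_LEN; it is not the kernel’s code or its exact check.

```python
from __future__ import annotations

from typing import Dict, Optional, Tuple

MAX_CHAIN_IMAGES = 16  # the original plus at most 15 levels of clones below it


class Image:
    def __init__(self, name: str, parent: Optional[Image] = None) -> None:
        self.name = name
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}

    def depth(self) -> int:
        """How many levels below the original this image sits (the original is 0)."""
        levels, node = 0, self.parent
        while node is not None:
            levels, node = levels + 1, node.parent
        return levels

    def read(self, index: int) -> Tuple[bytes, int]:
        """Return the block and the number of images consulted to find it."""
        hops, node = 0, self
        while node is not None:
            hops += 1
            if index in node.blocks:
                return node.blocks[index], hops
            node = node.parent
        return bytes(1), hops


def clone(parent: Image, name: str) -> Image:
    # The new image sits one level below its parent; refuse if the line gets too long.
    if parent.depth() + 2 > MAX_CHAIN_IMAGES:
        raise ValueError(f"chain too long: cannot clone {parent.name!r}")
    return Image(name, parent)


original = Image("production")
original.blocks[0] = b"untouched"
node = original
for level in range(1, 16):
    node = clone(node, f"clone-{level}")
print(node.depth(), node.read(0))  # 15 (b'untouched', 16)
try:
    clone(node, "clone-16")
except ValueError as err:
    print(err)  # chain too long: cannot clone 'clone-15'
```

A read of an untouched block from the deepest allowed clone visits all 16 images, while the same read on the original takes one.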

Depth, not breadth

One thing worth being clear about: this is about depth, not about how many clones you make. Give production 20 staging environments and you’re fine. Each one has exactly one parent. Give each of those 20 its own child and you’re still fine, because every child is only 2 levels deep. The limit bites only when you clone a clone of a clone of a clone, 16 times over. What matters is the length of a single ancestry line, not the size of the family.
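
The difference between a wide family and a tall one is easy to check by computing each environment’s depth from a parent map. This is a sketch with hypothetical environment names: `parent_of` maps each environment to the one it was cloned from, with `None` for production.

```python
from typing import Dict, Optional

ParentMap = Dict[str, Optional[str]]


def depth_of(env: str, parent_of: ParentMap) -> int:
    """Length of env's ancestry line: how many parents lie between it and production."""
    depth, parent = 0, parent_of[env]
    while parent is not None:
        depth, parent = depth + 1, parent_of[parent]
    return depth


def max_depth(parent_of: ParentMap) -> int:
    return max(depth_of(env, parent_of) for env in parent_of)


# Wide: 20 staging environments, each with one child of its own.
wide: ParentMap = {"production": None}
for i in range(20):
    wide[f"staging-{i}"] = "production"
    wide[f"staging-{i}-child"] = f"staging-{i}"

# Tall: one line of clones of clones of clones.
tall: ParentMap = {"production": None}
previous = "production"
for level in range(1, 16):
    tall[f"clone-{level}"] = previous
    previous = f"clone-{level}"

print(len(wide), max_depth(wide))  # 41 2
print(len(tall), max_depth(tall))  # 16 15
```

The wide family has more than twice as many environments and never gets past depth 2; only the tall one approaches the limit.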

The escape hatch, and its price

There is a way out, and Ceph calls it flattening. Flattening tells a volume to copy every block it was sharing with its parents and become standalone. No more chain, no more parent lookups, and you can start a fresh line of clones from it. You probably see the catch. Flattening copies all the data. It’s the exact slow, full copy that copy-on-write let you skip. Break the chain and that one clone is no longer instant.
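
Flattening fits into the same kind of toy model. In this sketch (all names hypothetical), `flatten` copies every block the volume still borrows from its ancestors and returns how many it copied, which is the full-copy cost that copy-on-write had deferred. `clone_with_depth_cap` is a simplified policy in the spirit of Cinder’s behavior: it borrows the default of 5 from `rbd_max_clone_depth`, but it is not Cinder’s code.

```python
from __future__ import annotations

from typing import Dict, Optional


class Volume:
    def __init__(self, name: str, size_blocks: int, parent: Optional[Volume] = None) -> None:
        self.name = name
        self.size_blocks = size_blocks
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}

    def depth(self) -> int:
        levels, node = 0, self.parent
        while node is not None:
            levels, node = levels + 1, node.parent
        return levels

    def _lookup(self, index: int) -> Optional[bytes]:
        # Walk the chain looking for the first image that holds this block.
        node: Optional[Volume] = self
        while node is not None:
            if index in node.blocks:
                return node.blocks[index]
            node = node.parent
        return None

    def flatten(self) -> int:
        """Copy every block still shared with ancestors, then drop the parent."""
        copied = 0
        for index in range(self.size_blocks):
            if index not in self.blocks:
                data = self._lookup(index)
                if data is not None:
                    self.blocks[index] = data
                    copied += 1
        self.parent = None  # standalone now: no more chain to walk
        return copied


def clone_with_depth_cap(parent: Volume, name: str, max_depth: int = 5) -> Volume:
    """Clone instantly, but flatten the new clone if it would sit deeper than max_depth."""
    child = Volume(name, parent.size_blocks, parent)
    if child.depth() > max_depth:
        child.flatten()
    return child


production = Volume("production", size_blocks=1000)
production.blocks.update({i: b"data" for i in range(1000)})

# Five instant clones in a line: nothing copied yet.
line = production
for n in range(1, 6):
    line = clone_with_depth_cap(line, f"clone-{n}")
print(line.depth(), len(line.blocks))  # 5 0

# The sixth crosses the cap and is flattened: every block copied.
line = clone_with_depth_cap(line, "clone-6")
print(line.depth(), len(line.blocks))  # 0 1000
```

A fresh line of clones can then start from the flattened volume, at the cost of that one full copy.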

Nothing is free

I can’t picture a real customer stacking 16 clones in a line. You’d have to make a development environment from a development environment from a development environment, and keep going past the point of sense. We test for it anyway, because “no customer would ever” is how you end up debugging it at 2 a.m. What I like about this limit is what it says about engineering. Copy-on-write is a genuinely good choice. It’s why the one-minute clone exists at all, and it has held up for years. But the same mechanism that makes clones instant is the one that makes deep chains slow, and that’s not a bug you can patch away. It’s the shape of the trade-off. Every technology you pick hands you its own limits along with its strengths. The job isn’t finding the option with no downsides. It’s picking the best one for the problem, and knowing exactly where it stops working.
