The problem: inertia and performance bottlenecks
Years ago, we used LVM for all container volumes, including /tmp volumes. It made sense at the time: LVM provided snapshots, thin provisioning, and various enterprise features we thought we needed. Eventually, we migrated persistent volumes to Ceph, but /tmp volumes stayed on LVM. Not because we needed LVM’s features for temporary storage, but because we’d always done it that way. Classic inertia.
The real problem revealed itself during work on failover and evacuation optimization. Since our infrastructure is read-only, upgrades mean spinning up new VMs, evacuating containers from old VMs to new ones, then shutting down the old infrastructure. When we looked at the timeline, the numbers were striking: out of 10 minutes of total evacuation time, 9 minutes were spent waiting for LVM volumes to be deleted. That’s 90% of our time wasted on cleanup. The irony wasn’t lost on us.
As it turns out, we weren’t alone in this struggle. Our friends at fly.io ran into the same LVM performance issues. They took a different approach to solving it, but they had different requirements.
The solution: sparse files
Looking at our /tmp volumes, we realized they’re genuinely temporary. No need for snapshots, no long-term persistence requirements, none of LVM’s advanced features actually mattered. We needed three things: fast creation, fast deletion, and thin provisioning so disk space only gets used when data is actually written.
Sparse files deliver all of that with remarkable simplicity:
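To make that concrete, here is a minimal sketch of the idea (paths and sizes are illustrative, not our actual layout): a sparse file is created with a single metadata operation, consumes real disk space only for blocks that are actually written, and is removed with a plain unlink.

```shell
# Create a sparse "volume": one metadata operation, no blocks allocated yet
# (path is illustrative)
img=$(mktemp /tmp/tmpvol-XXXXXX.img)
truncate -s 10G "$img"

# Apparent size is 10G, but almost nothing is allocated on disk
du -h --apparent-size "$img"
du -h "$img"

# Only written data consumes real space — thin provisioning for free
dd if=/dev/zero of="$img" bs=1M count=4 conv=notrunc status=none
du -h "$img"    # now roughly 4M allocated

# Deletion is a plain unlink — no device-mapper teardown to wait on
rm "$img"
```

Compare that last step with LVM, where removing a logical volume involves device-mapper teardown and, with thin pools, metadata updates; unlinking a file is what made the 9-minute cleanup effectively disappear.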