The problem: inertia and performance bottlenecks
Years ago, we used LVM for all container volumes, including /tmp volumes. It made sense at the time: LVM provided snapshots, thin provisioning, and various enterprise features we thought we needed. Eventually, we migrated persistent volumes to Ceph, but /tmp volumes stayed on LVM. Not because we needed LVM’s features for temporary storage, but because we’d always done it that way. Classic inertia.
The real problem revealed itself during work on failover and evacuation optimization. Since our infrastructure is read-only, upgrades mean spinning up new VMs, evacuating containers from old VMs to new ones, then shutting down the old infrastructure. When we looked at the timeline, the numbers were striking: out of 10 minutes of total evacuation time, 9 minutes were spent waiting for LVM volumes to be deleted. That’s 90% of our time wasted on cleanup. The irony wasn’t lost on us.
As it turns out, we weren’t alone in this struggle. Our friends at fly.io ran into the same LVM performance issues. They took a different approach to solving it, but they had different requirements.
The solution: sparse files
Looking at our /tmp volumes, we realized they’re genuinely temporary. No need for snapshots, no long-term persistence requirements, none of LVM’s advanced features actually mattered. We needed three things: fast creation, fast deletion, and thin provisioning so disk space only gets used when data is actually written.
Sparse files deliver all of that with remarkable simplicity:
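To make that concrete, here is a minimal sketch of the idea (paths and sizes are illustrative, not our actual layout): a sparse file is created with a single metadata operation, consumes real disk space only for blocks that are actually written, and is removed with a plain unlink.

```shell
# Create a sparse "volume": one metadata operation, no blocks allocated yet
# (path is illustrative)
img=$(mktemp /tmp/tmpvol-XXXXXX.img)
truncate -s 10G "$img"

# Apparent size is 10G, but almost nothing is allocated on disk
du -h --apparent-size "$img"
du -h "$img"

# Only written data consumes real space — thin provisioning for free
dd if=/dev/zero of="$img" bs=1M count=4 conv=notrunc status=none
du -h "$img"    # now roughly 4M allocated

# Deletion is a plain unlink — no device-mapper teardown to wait on
rm "$img"
```

Compare that last step with LVM, where removing a logical volume involves device-mapper teardown and, with thin pools, metadata updates; unlinking a file is what made the 9-minute cleanup effectively disappear.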