A bit over ten years ago, a small team inside a Drupal Commerce agency made a decision that seemed, at the time, wildly ambitious: deployments should include data. Not “data as an afterthought.” Not “here’s a script to restore a database dump.” Data as a first-class citizen of every environment, instantly cloned alongside code, services, and files in one atomic operation.

That bet took two years of R&D. At the time, Docker wasn’t standard. Kubernetes didn’t exist. The competition was Heroku, and Heroku was great at deploying code, but code alone wasn’t enough. If you ran a CMS like Drupal, your content lived in the database. If you ran a Magento store, you couldn’t load-test a storefront without production-scale catalog data. The same goes for JCR-based apps like Adobe Experience Manager or Magnolia, where the entire content structure, templates, and workflows live in the repository. The team needed automated deployments that carried the full stack: applications, services, files, and data.

So they built it. It wasn’t a moonshot for a future driven by AI, or some grand infrastructure vision. It was a practical response to real problems: CMS content is meaningless without its database, e-commerce performance testing is fiction without production data… The motivation was mundane. The engineering was not.

What cloning the data actually gives you

When you branch an environment on Upsun, the platform snapshots the metadata of your runtimes (frontend, backend, API server…), services (databases, message queues…), and files. It doesn’t copy bytes. That’s why cloning takes a couple of seconds whether your database is 500 MB or 500 GB. You get a fully independent preview environment (with its own URL, its own resources, its own permissions…) with zero interconnection to the source. Delete something in the clone, the original doesn’t notice. The exact same mechanism powers other key Upsun capabilities:
  • Branching, which creates a new preview environment from an existing one (like production).
  • Backups, which capture a point-in-time snapshot that can be confidently restored.
  • Synchronizing, which pulls fresh data from a parent environment.
Different use cases, same copy-on-write foundation. And speed matters much more than you’d think. A clone that takes 30 minutes is a clone you avoid creating. A clone that takes 8 seconds is a clone you create without thinking about it. That difference changes behavior, and changing behavior is where infrastructure decisions compound.
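To make that concrete, here is a minimal sketch of those three operations driven from Python through the Upsun CLI. The environment name is hypothetical, and while the command verbs mirror the CLI’s environment commands, check them against `upsun list` for your version; treat this as an illustration of the workflow, not a documented recipe.

```python
# Minimal sketch: the three copy-on-write operations via the Upsun CLI.
# Requires the CLI to be installed and authenticated; run from a project
# checkout. The environment name "preview-feature-x" is hypothetical.
import subprocess

# Branch: clone the current environment (e.g. production) into a new
# preview environment. Only metadata is snapshotted, so this takes
# seconds whether the database is 500 MB or 500 GB.
subprocess.run(["upsun", "environment:branch", "preview-feature-x"], check=True)

# Backup: capture a point-in-time snapshot that can be restored later.
subprocess.run(["upsun", "backup:create"], check=True)

# Sync: pull fresh data from the parent environment into this one.
subprocess.run(["upsun", "environment:synchronize", "data"], check=True)
```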

Agents need the same primitives

Today, AI agents have moved from “generate a code snippet” to “take this task, figure out what to do, and do it”. That shift changed what agents need from infrastructure. A code suggestion can run in a sandbox or on a local machine. An agent that modifies database schemas, runs migrations, or load-tests an API endpoint needs a real environment with real data, real config, and real running services. The requirements look familiar: an isolated environment that mirrors production, created fast enough to be disposable, destroyed when the task is done. That’s what Upsun’s cloning gives you. An agent gets a full copy of production (data, services, config) in seconds. It works against real state, not a synthetic approximation. When it’s done, the environment disappears. This isn’t a feature built for AI. It’s a feature built ten years ago that happens to be what AI agents need today.
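Here is what that disposable lifecycle might look like as a small Python harness, again shelling out to the Upsun CLI. The `ephemeral_clone` helper is a hypothetical name of my own, not an Upsun API, and the command names carry the same caveat as the sketch above.

```python
# Sketch: one throwaway clone per agent task, with guaranteed cleanup.
import subprocess
import uuid
from contextlib import contextmanager

def upsun(*args: str) -> None:
    """Run an Upsun CLI command, raising if it fails."""
    subprocess.run(["upsun", *args], check=True)

@contextmanager
def ephemeral_clone():
    """Branch the current environment into a clone, delete it on exit."""
    name = f"agent-{uuid.uuid4().hex[:8]}"
    upsun("environment:branch", name)  # metadata snapshot: seconds, any size
    try:
        yield name  # the agent works against this fully isolated clone
    finally:
        upsun("environment:delete", name, "--yes")  # production never noticed

# Cheap enough to create one per task without thinking about it.
with ephemeral_clone() as env:
    print(f"agent task running against {env}")
```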

Reproducing bugs: the user_7 problem

My own staff user account at Upsun (user_7@example.com) has years of migration history baked into it. Every schema change, every feature flag toggle, every edge case in billing logic accumulated over time in that account’s data. user_7 triggers bugs that nobody else can reproduce, because no fresh test account carries that history. You can’t fake this with seed data. You can’t write a fixture that recreates years of organic state transitions. The only way to reproduce user_7’s bugs is to test against user_7’s actual data. With Upsun data cloning, an agent grabs a full copy of the production environment (user_7 and all), branches it, and starts testing. It can mutate data, break things, try every combination of inputs, and the original environment stays untouched. When the agent identifies the root cause, it reports back. The clone gets deleted. No production risk, no manual environment setup, no waiting for a database dump to finish importing.
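As a sketch of what the agent’s reproduction step could look like, reusing the `ephemeral_clone` helper from the section above: the `url --pipe` invocation is my assumption about how to read the clone’s URL from the CLI, and the billing endpoint and user parameter are hypothetical stand-ins for whatever request user_7’s history actually breaks.

```python
# Sketch: probe a suspect endpoint against a clone of production.
import subprocess
import urllib.request

def clone_url(env: str) -> str:
    """First public URL of the clone (assumed `url --pipe` behavior)."""
    done = subprocess.run(
        ["upsun", "url", "--pipe", "--environment", env],
        check=True, capture_output=True, text=True,
    )
    return done.stdout.splitlines()[0].rstrip("/")

# Branch production (user_7's data included), probe, clean up.
with ephemeral_clone() as env:
    # Hypothetical endpoint: the request user_7's accumulated state breaks.
    with urllib.request.urlopen(clone_url(env) + "/api/billing?user=7") as resp:
        print("user_7 billing status:", resp.status)
```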

Performance optimization without touching production

You can’t measure real query performance against a test database with 50 rows. You can’t profile cache hit rates without production traffic patterns reflected in the data. Performance work needs production-like conditions, and faking those conditions is a losing game. Upsun data cloning solves this cleanly. Clone the production environment, upsize the preview environment to match production resources, then point a load-testing tool at it. Run Blackfire profiling against realistic traffic. Get measurable, actionable recommendations: this query needs an index, this loop should be batched, this cache TTL is too aggressive. An agent automates this entire loop. Clone, configure, run the load test, collect profiling data, analyze results, present a ranked list of optimizations with projected impact. No one touches production. The environment gets thrown away when the analysis is done.
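Here is a stdlib-only sketch of the measurement half of that loop: fire concurrent requests at the clone and report latency percentiles. A real run would first upsize the clone’s resources (the CLI’s resources:set command) and use a proper load tool plus Blackfire; the hostname below is a hypothetical clone URL.

```python
# Sketch: crude concurrent load test against a cloned environment.
import concurrent.futures
import statistics
import time
import urllib.request

def timed_get(url: str) -> float:
    """Time one full request/response cycle in seconds."""
    start = time.perf_counter()
    urllib.request.urlopen(url).read()
    return time.perf_counter() - start

def load_test(url: str, requests: int = 200, concurrency: int = 20) -> None:
    """Report median and p95 latency over `requests` concurrent fetches."""
    with concurrent.futures.ThreadPoolExecutor(concurrency) as pool:
        latencies = sorted(pool.map(lambda _: timed_get(url), range(requests)))
    p50 = statistics.median(latencies)
    p95 = latencies[int(len(latencies) * 0.95)]
    print(f"median={p50 * 1000:.0f} ms  p95={p95 * 1000:.0f} ms")

# Point it at the clone's URL, never at production.
load_test("https://preview-perf-test.example.upsun.site/")
```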

Isolation and data safety

Giving an agent access to production data means thinking about PII. If the agent runs on a third-party AI provider’s infrastructure, you probably don’t want customer emails and credit card tokens flowing through someone else’s model. But there’s a tension here: scrub too aggressively and you lose the exact data quirks that make cloning valuable for bug reproduction in the first place. It’s your call. You can configure sanitization rules that run automatically when data is cloned into a non-production environment, choosing which fields get anonymized and which stay intact. Some teams scrub everything sensitive. Others keep more data in tightly controlled environments where agents operate under strict access policies. The point is that the mechanism exists at the cloning step, so you make the trade-off once and it applies to every clone. Permissions matter too. Upsun supports fine-grained access control at the environment-type level. A production environment can have a completely different permission set than a staging or development environment. API tokens can be scoped so that an agent gets write access to its clone but can’t touch production. This isn’t a bolt-on; it’s how the permission model has always worked.
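On the sanitization side, here is a sketch of the kind of scrub a deploy hook might run on non-production environments. `PLATFORM_ENVIRONMENT_TYPE` is the platform-provided variable that distinguishes production from preview environments; the `users` table, `card_token` column, `DATABASE_URL` variable, and choice of psycopg are all hypothetical stand-ins for your own schema and client.

```python
# Sketch: anonymize PII when data lands in a non-production environment.
import hashlib
import os

def anonymize_email(email: str) -> str:
    """Replace a real address with a stable fake one (preserves uniqueness)."""
    digest = hashlib.sha256(email.encode()).hexdigest()[:12]
    return f"user-{digest}@example.com"

# Only scrub clones; production keeps its real data.
if os.environ.get("PLATFORM_ENVIRONMENT_TYPE") != "production":
    import psycopg  # hypothetical driver choice; any DB-API client works

    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, email FROM users")
            for user_id, email in cur.fetchall():
                cur.execute(
                    "UPDATE users SET email = %s, card_token = NULL"
                    " WHERE id = %s",
                    (anonymize_email(email), user_id),
                )
```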

The payoff

Metadata-level cloning, copy-on-write storage, full-stack environment branching, configurable data sanitization. These are the building blocks of agent-ready infrastructure, and they’ve been in production for a decade. Upsun doesn’t need to bolt on agent support because the foundation was already there. It’s a funny thing about infrastructure bets. The ones that age well aren’t usually the ones that predicted the future correctly. They’re the ones that solved a real, immediate problem with enough generality that the solution turned out to be useful for problems nobody anticipated. Cloning production data in seconds was built for CMS editors and e-commerce teams. It turns out AI agents need the exact same thing. The bet paid off. Not because anyone predicted this particular use case, but because solving one hard problem well tends to stay useful longer than you’d expect.