Add an AGENTS.md or CLAUDE.md file to your repository. Every agent developer recommends it. Over 60,000 public GitHub repositories now include one. The general advice? Generate a comprehensive context file that describes your codebase, tooling, conventions, and testing patterns.
However, a new study from ETH Zurich and LogicStar.ai suggests this approach doesn't work, and might actually be making your agent worse at its job. It confirms something a few of us at Upsun had a gut feeling about.
What the research actually found
The paper, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (Gloaguen et al., February 2026), ran coding agents across hundreds of real-world GitHub issues to measure what happens when you add context files. The researchers tested four coding agents (Claude Code with Sonnet 4.5, Codex with GPT-5.2 and GPT-5.1 Mini, and Qwen Code with Qwen3-30B) in three settings: no context file, an LLM-generated context file, and a developer-written context file. The results go against the conventional wisdom. LLM-generated context files, the kind you get when you run /init in your agent, reduced task success rates by about 3% on average. They also increased inference costs by over 20%. The trend held across models and across both benchmarks: SWE-Bench Lite (300 tasks from popular repositories) and the paper’s custom AgentBench (138 tasks from 12 repositories with developer-written context files).
Developer-written context files performed slightly better, improving success rates by about 4% on average. But they still increased costs by up to 19% and added more steps to every task.
Why more context makes things worse
The trace analysis in the paper shows what’s going on. When context files are present, agents explore more files, run more tests, and use more repository-specific tooling. The instructions are being followed: uv usage jumped from near zero to 1.6 uses per instance when mentioned in a context file, and calls to repository-specific tools went from 0.05 to 2.5 per instance on average.
So the agents aren’t ignoring the context files. They’re following them too diligently.
GPT-5.2 and GPT-5.1 Mini both used significantly more reasoning tokens when context files were present (22% and 14% more on SWE-Bench Lite, respectively). The agents treated the additional instructions as additional constraints to satisfy, making each task harder rather than easier.
The researchers also found that stronger models don’t produce better context files. Context files generated by GPT-5.2 improved performance on SWE-Bench Lite by 2% on average, but degraded performance on AgentBench by 3%. Swapping prompts between agents (using Claude Code’s prompt to generate files for Codex, or vice versa) produced no consistent winner either. The problem isn’t the quality of the generation. It’s the approach itself.
The paper also tested whether context files help agents find relevant files faster. They don’t. Despite 100% of Sonnet 4.5-generated context files and 99% of GPT-5.2-generated ones containing codebase overviews, agents took roughly the same number of steps to reach the files that needed changing. One notable behavior: GPT-5.1 Mini actually spent extra steps searching for and re-reading context files that were already loaded into its context window. The context file created busywork that wouldn’t have existed otherwise.
The redundancy problem
LLM-generated context files mostly repeat what’s already in the repository. The README, the test configuration, the existing documentation. The agent can find and read all of these during task execution. When the researchers removed all documentation files (.md files, the docs/ folder, example code) from repositories, LLM-generated context files suddenly became helpful, improving performance by 2.7% on average.
So context files generated by /init commands are doing little more than pre-caching information the agent would discover on its own. For well-documented repositories, that pre-caching adds cost and constraints without adding value.
Developer-written context files performed better for exactly the reason you’d guess: they contained information that wasn’t already in the repository. Tooling preferences, workflow requirements, conventions that existed in developers’ heads but not in any documentation.
The right approach: start empty, build incrementally
Instead of generating a comprehensive context file upfront, start with an empty CLAUDE.md (or AGENTS.md, depending on your tooling) and build it iteratively based on actual friction.
Step 1: Begin with nothing. Let the agent work without a context file. It can read your README, test configuration, and existing documentation on its own. For most tasks, that’s enough.
Step 2: Watch where the agent stumbles. Pay attention to recurring mistakes and wrong assumptions. Does it keep using npm when your project uses pnpm? Does it write tests with jest when you use vitest? Does it forget to run the linter before committing?
Step 3: Add one rule at a time. When you spot a pattern (not a one-off mistake, but a repeated behavior) add a specific instruction to your context file. Keep it minimal and concrete:
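A rule at this stage might look like the following. This is a sketch in our wording, not an example from the paper; the pnpm and lint commands are placeholders for your project's actual tooling:

```markdown
# CLAUDE.md

- Use pnpm, not npm, for all package commands.
- Run pnpm lint before committing; fix any warnings it reports.
- New tests go in tests/, using vitest, not jest.
```

Each line exists because the agent got that specific thing wrong more than once. Nothing here restates the README.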
What belongs in a context file
Based on the research, context files are most valuable when they contain information the agent can’t discover from the repository itself.

Include: things the agent can’t figure out on its own, or gets wrong repeatedly.

- Package manager preference (when multiple are plausible)
- Specific test runner commands
- Required linting or formatting steps before commits
- Project-specific tools or scripts
- Naming conventions for branches or commits
- Deployment-specific requirements

Skip: things the agent can read from existing files.

- Codebase overview and directory structure (your file system already provides this)
- Language and framework details (your package.json, pyproject.toml, or equivalent tells the agent everything it needs)
- General coding patterns (the agent infers these from your existing code)
The finding on codebase overviews is worth calling out. Despite being the most commonly recommended section in context files, overviews didn’t help agents locate relevant files any faster. Claude Code’s own initialization prompt gets this partially right: it warns against listing components that are “easily discoverable.” But the generated files still include overviews nearly every time.
A practical test
Next time you’re about to generate a context file with /init, try this instead. Create an empty CLAUDE.md with a single comment:
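Something like this (our wording, not the researchers’):

```markdown
<!-- Add a rule here only after the agent gets the same thing wrong twice. -->
```

Then work with the agent for a week. Every entry you end up adding will have earned its place, and anything the agent handles on its own stays out.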
What this means for teams
For teams adopting AI-assisted development, the takeaway is practical: don’t invest time crafting elaborate context files upfront. Signal-to-noise ratio matters more than comprehensiveness. A five-line context file that addresses your project’s specific quirks will outperform a 2,000-word generated overview that restates your README.

There’s a related point here: verification matters more than instruction. Instead of trying to preemptively tell the agent everything it needs to know, invest in the infrastructure that catches mistakes: test suites, linters, type checkers, CI pipelines. These give the agent immediate feedback when something goes wrong, which beats a static document telling it what to do.

Treat your CLAUDE.md like a .gitignore. It grows as you discover new edge cases, and it gets pruned when entries become irrelevant. Version-control it alongside your code so the whole team benefits from accumulated knowledge about what the agent gets wrong.
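That verification loop can be as simple as a CI job. A sketch, assuming a pnpm-based Node project (the job name, commands, and scripts are ours; substitute your own linter and test runner):

```yaml
# Every check in this job is feedback the agent gets automatically,
# with no context-file instruction required.
name: checks
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install
      - run: pnpm lint       # style and formatting
      - run: pnpm typecheck  # static type errors
      - run: pnpm test       # unit tests
```

A failing check tells the agent exactly what went wrong on this task, whereas a context file can only describe problems you anticipated in advance.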
Here’s the irony: the agents themselves, when asked to generate their own instructions, produce files that make them perform worse. Sometimes the best context you can give an AI is less.
Further reading: The full paper is available at arxiv.org/abs/2602.11988. The researchers have also released their AgentBench dataset and evaluation code for teams that want to test context file strategies on their own repositories.