If you’ve been using coding agents like Claude Code, Codex, or Qwen Code, you’ve probably set up an AGENTS.md or CLAUDE.md file in your repository. Every agent developer recommends it. Over 60,000 public GitHub repositories now include one. The general advice? Generate a comprehensive context file that describes your codebase, tooling, conventions, and testing patterns. However, a new study from ETH Zurich and LogicStar.ai suggests this approach doesn’t work, and might actually be making your agent worse at its job. It confirms something a few of us at Upsun had a gut feeling about.

What the research actually found

The paper, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (Gloaguen et al., February 2026), ran coding agents across hundreds of real-world GitHub issues to measure what happens when you add context files. The researchers tested four coding agents (Claude Code with Sonnet 4.5, Codex with GPT-5.2 and GPT-5.1 Mini, and Qwen Code with Qwen3-30B) in three settings: no context file, an LLM-generated context file, and a developer-written context file.

The results go against the conventional wisdom. LLM-generated context files, the kind you get when you run /init in your agent, reduced task success rates by about 3% on average. They also increased inference costs by over 20%. The trend held across models and across both benchmarks: SWE-Bench Lite (300 tasks from popular repositories) and the paper’s custom AgentBench (138 tasks from 12 repositories with developer-written context files).

Developer-written context files performed slightly better, improving success rates by about 4% on average. But they still increased costs by up to 19% and added more steps to every task.

Why more context makes things worse

The trace analysis in the paper shows what’s going on. When context files are present, agents explore more files, run more tests, and use more repository-specific tooling. The instructions are being followed: uv usage jumped from near-zero to 1.6 times per instance when mentioned in a context file, and repository-specific tools went from 0.05 to 2.5 average calls.

So the agents aren’t ignoring the context files. They’re following them too diligently. GPT-5.2 and GPT-5.1 Mini both used significantly more reasoning tokens when context files were present (22% and 14% more on SWE-Bench Lite, respectively). The agents treated the additional instructions as additional constraints to satisfy, making each task harder rather than easier.

The researchers also found that stronger models don’t produce better context files. Context files generated by GPT-5.2 improved performance on SWE-Bench Lite by 2% on average, but degraded performance on AgentBench by 3%. Swapping prompts between agents (using Claude Code’s prompt to generate files for Codex, or vice versa) produced no consistent winner either. The problem isn’t the quality of the generation. It’s the approach itself.

The paper also tested whether context files help agents find relevant files faster. They don’t. Despite 100% of Sonnet 4.5-generated context files and 99% of GPT-5.2-generated ones containing codebase overviews, agents took roughly the same number of steps to reach the files that needed changing. One notable behavior: GPT-5.1 Mini actually spent extra steps searching for and re-reading context files that were already loaded into its context window. The context file created busywork that wouldn’t have existed otherwise.

The redundancy problem

LLM-generated context files mostly repeat what’s already in the repository: the README, the test configuration, the existing documentation. The agent can find and read all of these during task execution. When the researchers removed all documentation files (.md files, the docs/ folder, example code) from repositories, LLM-generated context files suddenly became helpful, improving performance by 2.7% on average. So context files generated by /init commands are doing little more than pre-caching information the agent would discover on its own. For well-documented repositories, that pre-caching adds cost and constraints without adding value.

Developer-written context files performed better for exactly the reason you’d guess: they contained information that wasn’t already in the repository. Tooling preferences, workflow requirements, conventions that existed in developers’ heads but not in any documentation.

The right approach: start empty, build incrementally

Instead of generating a comprehensive context file upfront, start with an empty CLAUDE.md (or AGENTS.md, depending on your tooling) and build it iteratively based on actual friction.

Step 1: Begin with nothing. Let the agent work without a context file. It can read your README, test configuration, and existing documentation on its own. For most tasks, that’s enough.

Step 2: Watch where the agent stumbles. Pay attention to recurring mistakes and wrong assumptions. Does it keep using npm when your project uses pnpm? Does it write tests with jest when you use vitest? Does it forget to run the linter before committing?

Step 3: Add one rule at a time. When you spot a pattern (not a one-off mistake, but a repeated behavior) add a specific instruction to your context file. Keep it minimal and concrete:
```markdown
# CLAUDE.md

## Package management
- Use `pnpm`, not `npm` or `yarn`
- Run `pnpm install --frozen-lockfile` for CI

## Testing
- Use `vitest` for all tests
- Run `pnpm test` before submitting changes

## Code style
- Run `pnpm lint:fix` after making changes
```
Step 4: Trim what doesn’t help. If you notice an instruction isn’t changing agent behavior, or is causing unnecessary steps, remove it. The paper shows that even developer-written context files add cost. Every instruction should earn its place.
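One failure mode Step 4 catches is rules that reference tooling the project no longer uses. As a rough illustration of how you might audit for this (a hypothetical helper, not part of any agent's tooling; the bullet-only heuristic and function names are assumptions of this sketch), a few lines of Python can pull the backticked commands out of a CLAUDE.md so you can check each one still resolves on your machine:

```python
import re
import shutil

def extract_commands(claude_md_text):
    """Pull backticked commands out of bullet-point rules in a CLAUDE.md body.

    Heuristic sketch: only bulleted lines are scanned, and only backtick
    spans whose first token looks like a binary name are kept.
    """
    commands = []
    for line in claude_md_text.splitlines():
        if not line.lstrip().startswith("- "):
            continue
        for span in re.findall(r"`([^`]+)`", line):
            token = span.split()[0]
            if re.fullmatch(r"[A-Za-z0-9_.-]+", token):
                commands.append(span)
    return commands

def missing_binaries(commands):
    """Report commands whose base binary isn't on PATH (pruning candidates)."""
    return [cmd for cmd in commands if shutil.which(cmd.split()[0]) is None]
```

Run over the example file above, `extract_commands` yields entries like `pnpm test`; anything `missing_binaries` flags is a rule worth re-examining before it costs the agent extra steps.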

What belongs in a context file

Based on the research, context files are most valuable when they contain information the agent can’t discover from the repository itself.

Include: things the agent can’t figure out on its own, or gets wrong repeatedly.
- Package manager preference (when multiple are plausible)
- Specific test runner commands
- Required linting or formatting steps before commits
- Project-specific tools or scripts
- Naming conventions for branches or commits
- Deployment-specific requirements

Skip: things the agent can read from existing files.
- Codebase overview and directory structure (your file system already provides this)
- Language and framework details (your package.json, pyproject.toml, or equivalent tells the agent everything it needs)
- General coding patterns (the agent infers these from your existing code)

The finding on codebase overviews is worth calling out. Despite being the most commonly recommended section in context files, overviews didn’t help agents locate relevant files any faster. Claude Code’s own initialization prompt gets this partially right: it warns against listing components that are “easily discoverable.” But the generated files still include overviews nearly every time.

A practical test

Next time you’re about to generate a context file with /init, try this instead. Create an empty CLAUDE.md with a single comment:
```markdown
# CLAUDE.md
```

Work with your agent for a week. Note every time it makes a mistake you have to correct. At the end of the week, review those corrections. The ones that came up more than once? Those become your context file entries. This approach gives you a context file built from empirical evidence rather than hypothetical completeness. Every line exists because it fixed an actual problem, not because a best-practices guide said to include it.
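The week of notes is easy to tally mechanically. As a sketch (the one-note-per-line log format and the `recurring_corrections` helper are assumptions of this example, not anything the paper prescribes), a few lines of Python turn the log into candidate context file entries:

```python
from collections import Counter

def recurring_corrections(log_text, threshold=2):
    """Return correction notes that recur at least `threshold` times.

    Expects one short note per line, e.g. "used npm instead of pnpm".
    One-off mistakes are filtered out; only repeated patterns survive.
    """
    notes = [line.strip().lower() for line in log_text.splitlines() if line.strip()]
    counts = Counter(notes)
    return [note for note, count in counts.items() if count >= threshold]
```

Feeding it a log where “used npm instead of pnpm” appears twice and “wrote tests with jest” appears once returns only the npm note, which is exactly the kind of repeated behavior that earns a line in CLAUDE.md.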

What this means for teams

For teams adopting AI-assisted development, the takeaway is practical: don’t invest time crafting elaborate context files upfront. Signal-to-noise ratio matters more than comprehensiveness. A five-line context file that addresses your project’s specific quirks will outperform a 2,000-word generated overview that restates your README.

There’s a related point here. Verification matters more than instruction. Instead of trying to preemptively tell the agent everything it needs to know, invest in the infrastructure that catches mistakes: test suites, linters, type checkers, CI pipelines. These give the agent immediate feedback when something goes wrong, which beats a static document telling it what to do.

Treat your CLAUDE.md like a .gitignore. It grows as you discover new edge cases, and it gets pruned when entries become irrelevant. Version-control it alongside your code so the whole team benefits from accumulated knowledge about what the agent gets wrong.

Here’s the irony: the agents themselves, when asked to generate their own instructions, produce files that make them perform worse. Sometimes the best context you can give an AI is less.
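If the context file is version-controlled like a .gitignore, you can also lint it in CI. A hypothetical check (the line budget and the heading blocklist below are illustrative choices, not thresholds from the paper) might warn a pull request reviewer when the file balloons or regrows a codebase overview section:

```python
BANNED_HEADINGS = {"codebase overview", "directory structure", "project structure"}
MAX_LINES = 40  # arbitrary budget; tune for your team

def check_context_file(text, max_lines=MAX_LINES):
    """Return warnings for a CLAUDE.md body.

    Flags size creep and sections the research found unhelpful
    (codebase overviews). Thresholds are illustrative only.
    """
    warnings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        warnings.append(f"context file is {len(lines)} lines (budget: {max_lines})")
    for line in lines:
        heading = line.lstrip("#").strip().lower()
        if line.startswith("#") and heading in BANNED_HEADINGS:
            warnings.append(f"section '{heading}' restates what the agent can discover")
    return warnings
```

Wired into a pre-commit hook or CI step, a check like this enforces the “every instruction should earn its place” rule without anyone having to remember it.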
Further reading: The full paper is available at arxiv.org/abs/2602.11988. The researchers have also released their AgentBench dataset and evaluation code for teams that want to test context file strategies on their own repositories.
Last modified on April 14, 2026