AI code review tools are everywhere now. GitHub has Copilot reviews, there’s CodeRabbit, Qodo, Greptile, and something new every other week. If you’re on GitHub with a standard stack, you’re spoiled for choice. We’re not on GitHub. We run self-hosted GitLab, track issues in Linear, and have the kind of internal setup that makes off-the-shelf integrations difficult. So I built our own review agent. It’s been running on 25 projects, has processed around 1000 merge requests (each with ~2 reviews on average), and costs about 50 cents per review. Here’s what I learned.
The starting point
Six months earlier, a colleague had already built a working AI review system for our GitLab. It ran as a manual CI job: Claude Code with access to the git diff, commit log, and a prompt. Under 200 lines of code. It worked, teams used it, and it proved that AI reviews were worth doing. The limitation was scope. It could see what changed, but not the surrounding code, the rest of the codebase, or any external context. That was a deliberate choice: it was simple and fast. But I wanted to see what would happen if the agent could do what good human reviewers do: search the codebase, read related files, check the issue tracker, and understand the broader context of a change. Sean Goedecke’s blog post on code reviews was part of the inspiration.
Why custom?
Our engineering stack includes our self-hosted GitLab, Linear for issue tracking, and plenty of other internal systems and tools. If you’re buying an external review tool, you need it to work with all of those, and most don’t. Building our own means the agent can obtain Linear issue details, apply review standards specific to each project, and use GitLab’s CLI, web searches and other tools to go beyond the local checkout. Each integration makes the reviews noticeably better, and each one would be difficult or impossible with an off-the-shelf tool.
Vibe coding a large Python project
I have generally avoided Python. I have written scripts, but a significant async service is not something I would normally have attempted. Recent model advances changed this. Claude Opus 4.5 arrived in late November 2025, and with it a level of coding ability that made working in an unfamiliar language feel productive rather than frustrating. I started this project in January. More recently, Opus 4.6 improved things further. The Claude Agent SDK (released September 2025) provided the agentic tooling to make the workflow practical.
In the first two weeks, the project grew to about 25K lines of application code and 22K lines of tests. It handles webhook events from GitLab and Linear, classifies requests, triggers CI pipelines, runs Claude agents with sandboxed tool access, and posts results back to GitLab comments and Linear agent sessions. There are broadly four agent task types (review, conversation, planning, and coding), each with different prompts, tool permissions, and security constraints.
I really like the phrase “vibe coding”, but I don’t use it to mean there was no oversight or judgement. I made the architectural and UX decisions, read what the AI thought and wrote, and debugged when things broke. But the barrier to working in Python dropped so far that the language choice became almost entirely irrelevant (I only picked it because of the Claude Agent SDK).
Some of the more complex features illustrate this well. Incremental reviews involve fiddly commit and diff tracking: the agent has to remember what it said on a previous review, see only what changed since, resolve its own comments when issues are fixed, and handle force-push and rebase correctly. It’s the kind of thing I’d normally spend days (or longer) getting right, which I couldn’t afford on an internal, experimental tool. With AI assistance, I focused on the design (what state to track, when to resolve, how to handle rebases) and the AI handled the implementation details.
This part did take more than one attempt to get right, but each change took minutes instead of hours.
How the agent works
The review agent is triggered by a webhook. When you open or update a merge request, the agent picks it up automatically. By the time a human reviewer comes to look at the MR, a summary and inline comments are already there. Here’s what happens during a review:
- The system launches a review job (in a GitLab CI pipeline)
- It clones the repository, fetches the MR description and previous comments, and passes those to Claude in a prompt
- The agent then searches the codebase for related code and patterns using its built-in tools
- It looks up the linked issue using the Linear MCP for context on intent and requirements
- It produces structured JSON output: a summary and inline comments with exact file and line references
When someone mentions @agent in a comment, a fast classifier (Claude Haiku) categorizes the request: simple question, review request, coding task, and so on. Simple questions are answered immediately by Haiku; complex requests are routed to Opus via the CI pipeline. This keeps the cheap things cheap.
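The routing rule can be sketched as a small function. This is an illustration rather than the post’s code: `Category`, `Route`, `route`, the model labels, and the per-task tool allowlists are all hypothetical (the tool names follow Claude Code’s built-in `Read`, `Grep`, `Glob`, `Edit` and `Write` tools). The Haiku classifier is passed in as a plain callable, so the routing logic can be tested without any API calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Category(Enum):  # hypothetical classifier labels
    SIMPLE_QUESTION = "simple_question"
    CONVERSATION = "conversation"
    REVIEW_REQUEST = "review_request"
    PLANNING_TASK = "planning_task"
    CODING_TASK = "coding_task"


# Hypothetical tool allowlists per task type: read-only for most tasks,
# file editing only for coding tasks.
READ_ONLY = ("Read", "Grep", "Glob")
TASKS: dict[Category, tuple[str, tuple[str, ...]]] = {
    Category.CONVERSATION: ("conversation", READ_ONLY),
    Category.REVIEW_REQUEST: ("review", READ_ONLY),
    Category.PLANNING_TASK: ("planning", READ_ONLY),
    Category.CODING_TASK: ("coding", READ_ONLY + ("Edit", "Write")),
}


@dataclass(frozen=True)
class Route:
    model: str                      # model tier label, not an API model ID
    via_ci: bool                    # True if a CI pipeline job must be started
    task_type: str | None           # agent task type for CI jobs
    allowed_tools: tuple[str, ...]  # tools the agent may use in that job


def route(comment: str, classify: Callable[[str], Category]) -> Route:
    """Answer cheap questions directly; send everything else to a CI job."""
    category = classify(comment)  # in production, a call to the small model
    if category is Category.SIMPLE_QUESTION:
        # Answered immediately: no pipeline and no repository tools.
        return Route(model="haiku", via_ci=False, task_type=None, allowed_tools=())
    task_type, tools = TASKS[category]
    return Route(model="opus", via_ci=True, task_type=task_type, allowed_tools=tools)
```

Keeping tool permissions in one table next to the routing decision makes it easy to see which task types can modify files.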
Incremental reviews. On subsequent pushes, the agent compares against its previous review. It knows what changed, avoids repeating itself, and resolves its own discussion threads when the author fixes an issue. This works even after force-push and rebase by tracking the relevant commit SHAs.
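One way to do the SHA tracking is to check whether the last reviewed commit is still an ancestor of the new head. The sketch below is an assumption, not the author’s implementation: `plan_diff`, `DiffPlan` and `git_is_ancestor` are hypothetical names, persisting the last reviewed SHA is left out, and after a rewritten history it simply falls back to the full MR diff from the merge base. The one real command it relies on is `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second and 1 when it isn’t.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Callable


def git_is_ancestor(repo: str, older: str, newer: str) -> bool:
    """Return True if `older` is an ancestor of `newer` in the repo at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", older, newer],
        capture_output=True,
    )
    # Exit code 0 means "is an ancestor" and 1 means "is not". Any other code
    # (for example, an old SHA that was never fetched) is treated the same as
    # rewritten history.
    return result.returncode == 0


@dataclass(frozen=True)
class DiffPlan:
    base: str          # commit to diff from
    head: str          # commit to diff to
    incremental: bool  # True if only commits since the last review are included


def plan_diff(
    last_reviewed: str | None,
    head: str,
    merge_base: str,
    is_ancestor: Callable[[str, str], bool],
) -> DiffPlan:
    """Decide which range of commits the next review should look at."""
    if last_reviewed is None:
        # First review of this MR: use the whole MR diff.
        return DiffPlan(merge_base, head, incremental=False)
    if is_ancestor(last_reviewed, head):
        # Ordinary push on top of the reviewed commit: review only the new commits.
        return DiffPlan(last_reviewed, head, incremental=True)
    # Force-push or rebase: the reviewed commit is no longer in the branch
    # history, so fall back to the full MR diff. Earlier comments still go
    # into the prompt so the agent can avoid repeating itself.
    return DiffPlan(merge_base, head, incremental=False)
```

In a CI job this might be called as `plan_diff(last_sha, head_sha, base_sha, lambda a, b: git_is_ancestor(".", a, b))`, with the ancestry check injected so the decision logic can be tested without a repository.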
The numbers
| Metric | Value |
|---|---|
| Projects using the agent | 25 |
| Merge requests reviewed | 1000+ |
| Average reviews per MR | ~2 |
| Cost per review | ~$0.50 |
| Cost per MR | ~$1.00 |
| Model | Claude Opus 4.6 |
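The two cost rows follow from the others. As a back-of-the-envelope check using only the table’s rounded figures, the hypothetical helper below multiplies them out. The total it prints is an estimate derived from those approximations (a lower bound, since the table says 1000+ MRs), not a reported figure, and it ignores the cheap Haiku classifier calls.

```python
from __future__ import annotations


def estimate_cost(merge_requests: int, reviews_per_mr: float, cost_per_review: float) -> tuple[float, float]:
    """Return (cost per MR, total cost) from approximate per-review figures."""
    per_mr = reviews_per_mr * cost_per_review
    return per_mr, merge_requests * per_mr


per_mr, total = estimate_cost(merge_requests=1000, reviews_per_mr=2, cost_per_review=0.50)
print(f"~${per_mr:.2f} per MR, ~${total:,.0f} total")  # ~$1.00 per MR, ~$1,000 total
```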