# Efficient code analysis for LLMs

> Learn how Whatsun generates concise codebase summaries to improve the performance and accuracy of AI features.


Patrick Dawkins · December 18, 2025

AI agents excel at understanding large codebases in depth. But sometimes you don't need depth, and you don't want to wait for it.

We wanted a way to get a quick overview of a repository, so we made [Whatsun](https://github.com/upsun/whatsun). It's an open-source tool that reads the structure and dependencies of a codebase of any size and produces a very concise summary. It can be used as a CLI or as a Go library. Whatsun does not itself use AI, so it's fast, predictable, and secure. That speed is exactly why it works well with AI: it handles the quick structural analysis upfront, saving the AI from slower and more expensive processing.

Whatsun helps to power our AI-assisted configuration feature, which is available through the CLI’s [`upsun init`](/posts/discussions/building-ai-feature-necessary-evals) command for local code, and which you can also try [on the web](https://config.upsun.com) for a public GitHub repository. Of course, we also used AI to help build Whatsun itself.

## The problem

At Upsun we support applications written in many different languages, built with many different tools. These diverse applications can coexist in the same repository as part of a single Upsun project, and this complexity is why we are leveraging AI to help with configuration.

Our AI context for an Upsun project started with a system prompt, a [`tree`](https://en.wikipedia.org/wiki/Tree_\(command\)) and some documentation. As we [evaluated](/posts/discussions/building-ai-feature-necessary-evals) the AI's results, our approach to the context evolved, and we found we needed more precise information that could vary with the type of project.

Another approach would be to ask the LLM to fetch the context it needs, giving it tools to read anything it likes. This can work beautifully in coding agents such as Claude Code, but it did not suit our needs: it would take much longer, cost too much (requiring large reasoning models), and expose too much unnecessary information.

Our AI prompt now includes Whatsun's output, alongside conditional context retrieved automatically based on Whatsun's findings, such as framework-specific documentation.

## The digest

Whatsun produces a *digest* of a repository, in three parts:

1. **A file tree** that limits detail progressively, minimizing [tokens](https://blogs.nvidia.com/blog/ai-tokens-explained/). The tree respects `.gitignore`, which greatly improves performance as it avoids the need to traverse unnecessary directories (like `node_modules`).

2. **Reports** showing detected frameworks, build tools, and package managers. The reports are generated based on configurable rules, explained below.

3. **Selected file contents** from important files like `README.md`, `AGENTS.md`, or `docker-compose.yml`. This will also include files specific to certain findings: for example, `compose.yaml` will be included if Symfony is detected. The contents are limited to the first 2 KB.

The result is a succinct snapshot with the level of detail that we find most helpful.
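As a rough illustration of the 2 KB limit on file contents, here is a minimal Go sketch. It is not Whatsun's actual code: `readHead`, `maxContentBytes`, and `demoFS` are hypothetical names, and the in-memory filesystem exists only to make the example self-contained.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"strings"
	"testing/fstest"
)

// maxContentBytes mirrors the 2 KB limit described above (assumed to be 2048 bytes).
const maxContentBytes = 2 * 1024

// readHead returns at most limit bytes from the start of the named file.
func readHead(fsys fs.FS, name string, limit int64) (string, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()

	// LimitReader stops after limit bytes, so large files are never fully read.
	b, err := io.ReadAll(io.LimitReader(f, limit))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// demoFS is a hypothetical in-memory repository used only for this example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"README.md": {Data: []byte(strings.Repeat("x", 5000))},
		"small.txt": {Data: []byte("hi")},
	}
}

func main() {
	head, err := readHead(demoFS(), "README.md", maxContentBytes)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(head)) // prints 2048
}
```

Cutting at a byte boundary can split a multi-byte UTF-8 character, so a real implementation may want to trim back to the last complete character.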

For example, below is the `reports` section for our [demo project](https://github.com/upsun/demo). It shows that the project contains a Flask backend managed by `uv`, and a frontend using React and Express managed by `bun`.

```yaml
reports:
    .:
        - result: bun
          ruleset: package_managers
          groups: [js]
    backend:
        - result: flask
          ruleset: frameworks
          groups: [python]
        - result: uv
          ruleset: package_managers
          groups: [python]
    frontend:
        - result: express
          metadata: {version: 4.21.2}
          ruleset: frameworks
          groups: [js]
        - result: reactjs
          metadata: {version: 18.3.1}
          ruleset: frameworks
          groups: [js]
        - result: bun
          ruleset: package_managers
          groups: [js]
```

## Declarative rules

As we were exploring what this tool could do, we wanted it to be simple to configure, but we didn't want to restrict its potential. We chose [Common Expression Language (CEL)](https://cel.dev/), which lets you write rules as configuration. Each rule is evaluated against every directory in the codebase and may contribute a result.

Here is an example rule:

```yaml
django:
  when: fs.depExists("python", "django")
  then: django
```

The `when` clause is a CEL expression. In this case, it invokes a dependency manager function in each directory to check whether `django` is required as a Python dependency.

Other web frameworks can be harder to detect. They may be composed of various packages that may or may not indicate use of the framework, such as Symfony's flexible components and libraries. With static site generators, a 'framework' may even be used without any visible presence in the repository. Or it may leave a file as a clue:

```yaml
hugo:
  when: fs.fileExists("hugo.toml") || fs.fileExists("hugo.yaml") || fs.fileExists("hugo.json") || fs.fileExists(".hugo_build.lock")
  then: hugo
```

The rules-based system makes this very flexible without extra Go code. In theory, other sets of rules for other kinds of analysis could be added in future or provided by library callers.
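To make the per-directory evaluation concrete, here is a minimal Go sketch of the idea. It deliberately replaces CEL with plain Go closures, so it shows only the shape of the technique, not Whatsun's implementation. The names `rule`, `fileExists`, `evaluate`, `hugoRule`, and `demoFS` are all hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// rule is a hypothetical stand-in for a configured rule: a predicate ("when")
// and the result it contributes ("then"). Whatsun itself uses CEL expressions.
type rule struct {
	when func(fsys fs.FS, dir string) bool
	then string
}

// fileExists reports whether dir contains an entry called name.
func fileExists(fsys fs.FS, dir, name string) bool {
	_, err := fs.Stat(fsys, path.Join(dir, name))
	return err == nil
}

// evaluate runs every rule against every directory and collects the results.
func evaluate(fsys fs.FS, rules []rule) (map[string][]string, error) {
	results := map[string][]string{}
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			return nil
		}
		for _, r := range rules {
			if r.when(fsys, p) {
				results[p] = append(results[p], r.then)
			}
		}
		return nil
	})
	return results, err
}

// hugoRule mirrors the Hugo rule shown above.
func hugoRule() rule {
	return rule{
		when: func(fsys fs.FS, dir string) bool {
			return fileExists(fsys, dir, "hugo.toml") ||
				fileExists(fsys, dir, "hugo.yaml") ||
				fileExists(fsys, dir, "hugo.json") ||
				fileExists(fsys, dir, ".hugo_build.lock")
		},
		then: "hugo",
	}
}

// demoFS is a hypothetical in-memory repository used only for this example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"README.md":         {Data: []byte("# demo")},
		"docs/hugo.toml":    {Data: []byte("title = 'Docs'")},
		"docs/content/a.md": {Data: []byte("hello")},
	}
}

func main() {
	results, err := evaluate(demoFS(), []rule{hugoRule()})
	if err != nil {
		panic(err)
	}
	fmt.Println(results) // prints map[docs:[hugo]]
}
```

Keeping the predicate separate from the traversal is what allows rules to live in configuration: the walker never needs to know what a rule checks.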

## Multilingual dependency detection

Whatsun parses package manager manifests for nine languages to get an overview of dependencies: Go, JavaScript, Python, PHP, Ruby, Rust, Java, .NET, and Elixir. Some of these, such as JavaScript and Python, have numerous package managers. Fortunately, adding new integrations is a particularly good use case for AI, both for research and for generating code and tests.
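For a sense of what manifest parsing involves, the sketch below looks up a dependency in a `package.json` file using Go's standard library. It is an assumption-laden illustration rather than Whatsun's parser: `packageJSON`, `depVersion`, and `demoManifest` are hypothetical, and it returns the version constraint declared in the manifest, which may differ from the resolved version a lock file would give.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packageJSON holds the dependency fields of a package.json manifest.
type packageJSON struct {
	Dependencies    map[string]string `json:"dependencies"`
	DevDependencies map[string]string `json:"devDependencies"`
}

// depVersion returns the version constraint declared for name, and whether
// the dependency is declared at all.
func depVersion(manifest []byte, name string) (string, bool, error) {
	var pkg packageJSON
	if err := json.Unmarshal(manifest, &pkg); err != nil {
		return "", false, err
	}
	if v, ok := pkg.Dependencies[name]; ok {
		return v, true, nil
	}
	v, ok := pkg.DevDependencies[name]
	return v, ok, nil
}

// demoManifest is a hypothetical manifest used only for this example.
const demoManifest = `{
  "name": "frontend",
  "dependencies": {"react": "^18.3.1", "express": "^4.21.2"}
}`

func main() {
	v, ok, err := depVersion([]byte(demoManifest), "react")
	if err != nil {
		panic(err)
	}
	fmt.Println(v, ok) // prints ^18.3.1 true
}
```

Each ecosystem needs its own variant of this logic, which is why supporting many package managers takes real effort.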

## Security and privacy

Upsun requires a security and privacy review of new features, and of course AI-powered features are no exception. A Git repository stores code, and should not normally contain secrets or personal information, but it remains a possibility. Such data would not be of any use for understanding the code, and it should not be sent to an AI vendor. Whatsun avoids this using multiple layers of protection.

Firstly, it respects developer intent: `.aiignore` and `.aiexclude` files can be used to declare what should not be analyzed. Unfortunately, there is no common standard for this yet, but the former is supported by JetBrains's Junie and the latter by Google's Gemini. As mentioned above, Whatsun also reads and respects `.gitignore` files, for both privacy and efficiency.
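The sketch below shows the general idea of honoring an ignore file, in a heavily simplified form. It is not how Whatsun matches patterns, and it does not implement the full `.gitignore` syntax: negation (`!`), anchored patterns, and `**` are all left out. `parseIgnore` and `ignored` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parseIgnore returns the patterns in an ignore file, skipping blank lines
// and comments.
func parseIgnore(content string) []string {
	var patterns []string
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		patterns = append(patterns, line)
	}
	return patterns
}

// ignored reports whether any segment of the slash-separated path p matches
// any pattern. This is a simplification: a trailing "/" is dropped, and each
// pattern is treated as a glob for a single path segment.
func ignored(patterns []string, p string) bool {
	segments := strings.Split(p, "/")
	for _, pat := range patterns {
		pat = strings.TrimSuffix(pat, "/")
		for _, seg := range segments {
			if ok, _ := path.Match(pat, seg); ok {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := parseIgnore("# keep these away from AI tools\nsecrets/\n*.env\n")

	fmt.Println(ignored(patterns, "config/secrets/key.pem")) // prints true
	fmt.Println(ignored(patterns, "app/prod.env"))           // prints true
	fmt.Println(ignored(patterns, "src/main.go"))            // prints false
}
```

A production tool would more likely rely on an existing gitignore-compatible matcher than on a hand-rolled one like this.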

Secondly, if a file's content is included, Whatsun sanitizes it. It uses [Gitleaks](https://github.com/gitleaks/gitleaks) for secret detection, redacting API keys and credentials. It also redacts email addresses to remove personally identifiable information, detects and skips binary files, and strips comments to avoid leaking internal notes.
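Two of these steps, email redaction and binary detection, can be sketched with Go's standard library. This is only an illustration under stated assumptions: the regular expression is a simple approximation of an email address, the NUL-byte check is a common heuristic rather than Whatsun's confirmed method, and `redactEmails`, `looksBinary`, and `sanitize` are hypothetical names. Secret detection with Gitleaks is not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// emailPattern is a deliberately simple approximation of an email address.
var emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// redactEmails replaces anything that looks like an email address.
func redactEmails(s string) string {
	return emailPattern.ReplaceAllString(s, "[REDACTED]")
}

// looksBinary applies a common heuristic: text files rarely contain NUL bytes.
func looksBinary(data []byte) bool {
	return bytes.IndexByte(data, 0) >= 0
}

// sanitize returns the redacted content, or false if the file should be skipped.
func sanitize(data []byte) (string, bool) {
	if looksBinary(data) {
		return "", false
	}
	return redactEmails(string(data)), true
}

func main() {
	out, ok := sanitize([]byte("Contact jane.doe@example.com for access."))
	fmt.Println(out, ok) // prints Contact [REDACTED] for access. true

	_, ok = sanitize([]byte{0x89, 'P', 'N', 'G', 0x00})
	fmt.Println(ok) // prints false
}
```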

## Performance

Speed is a priority for Whatsun's user experience, scalability, and ease of development.

Whatsun caches the configured CEL rules during build, so that they do not need to be recompiled. It traverses directories in parallel, and then executes rules in parallel, ensuring directory contents are cached between steps to avoid unnecessary [`stat()`](https://en.wikipedia.org/wiki/Stat_\(system_call\)) calls. It operates on Go's `io/fs` virtual filesystem, meaning it can process a Git repository cloned in memory or on disk in the same way.
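The following Go sketch illustrates two of these ideas: working against `io/fs` so the same code handles in-memory and on-disk trees, and caching directory contents so that later checks never touch the filesystem. It is a simplified assumption of how such a design could look, not Whatsun's code. The walk here is sequential, only the checks run in parallel, and `snapshot`, `dirsWith`, and `demoFS` are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"sort"
	"sync"
	"testing/fstest"
)

// snapshot walks fsys once and records the entry names of every directory,
// so that later checks do not need to touch the filesystem again.
func snapshot(fsys fs.FS) (map[string]map[string]bool, error) {
	snap := map[string]map[string]bool{}
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			snap[p] = map[string]bool{}
		}
		if p != "." {
			// WalkDir visits a directory before its children, so the
			// parent's set already exists here.
			snap[path.Dir(p)][d.Name()] = true
		}
		return nil
	})
	return snap, err
}

// dirsWith checks every cached directory in parallel for an entry called name.
func dirsWith(snap map[string]map[string]bool, name string) []string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		matches []string
	)
	for dir, entries := range snap {
		wg.Add(1)
		go func(dir string, entries map[string]bool) {
			defer wg.Done()
			if entries[name] {
				mu.Lock()
				matches = append(matches, dir)
				mu.Unlock()
			}
		}(dir, entries)
	}
	wg.Wait()
	sort.Strings(matches) // goroutines finish in arbitrary order
	return matches
}

// demoFS is a hypothetical in-memory repository used only for this example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"package.json":           {Data: []byte("{}")},
		"frontend/package.json":  {Data: []byte("{}")},
		"backend/pyproject.toml": {Data: []byte("")},
	}
}

func main() {
	snap, err := snapshot(demoFS())
	if err != nil {
		panic(err)
	}
	fmt.Println(dirsWith(snap, "package.json")) // prints [. frontend]
}
```

Because `snapshot` accepts any `fs.FS`, passing `os.DirFS(".")` instead of the in-memory tree would analyze a directory on disk with no other changes.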

The resulting digest is designed to minimize tokens, which helps in a few ways: a faster response, less context confusion, and lower costs.

## Caveats

Whatsun is built for the 80% case: surface-level understanding, not deep analysis. It does not provide the full-file context that an AI would need to edit code. And while the rules cover many cases, they would need ongoing maintenance to become or remain comprehensive.

## Try it

Whatsun is open-source and [available on GitHub](https://github.com/upsun/whatsun).

If you are building an AI developer tool, you might find Whatsun useful for enhancing context.

Or you can install the `whatsun` CLI, run it on your project, and see what it produces:

```shell
go install github.com/upsun/whatsun/cmd/whatsun@latest
```

We'd be glad to hear what you think, and contributions are very welcome.
