
# Varnish 103: Cache Optimization with URL Normalization on Upsun

> Learn how to maximize Varnish cache efficiency through URL normalization, query string sorting, and tracking parameter removal to dramatically improve cache hit ratios and reduce backend load.

export const PostMeta = ({data = {}}) => {
  const {author, date, image} = data;
  const authors = Array.isArray(author) ? author : author ? [author] : [];
  const resolveAuthor = slug => {
    const entry = AUTHOR_MAP[slug] || ({});
    const name = entry.name || slug;
    const github = entry.github || null;
    const linkedin = entry.linkedin || null;
    const url = github ? `https://github.com/${github}` : linkedin || null;
    const avatarUrl = github ? `https://github.com/${github}.png?size=64` : null;
    return {
      name,
      url,
      avatarUrl
    };
  };
  const formattedDate = date ? new Date(date).toLocaleDateString('en-US', {
    year: 'numeric',
    month: 'long',
    day: 'numeric'
  }) : null;
  if (!image && authors.length === 0 && !formattedDate) return null;
  const AUTHOR_MAP = {
    "aaron-collier": {
      "name": "Aaron Collier"
    },
    "aaron-dudenhofer": {
      "name": "Aaron Dudenhofer"
    },
    "aaron-porter": {
      "name": "Aaron Porter"
    },
    "adriaan-odendaal": {
      "name": "Adriaan Odendaal"
    },
    "ajmal": {
      "name": "Ajmal Siddiqui"
    },
    "akalipetis": {
      "name": "Antonis Kalipetis"
    },
    "alexander-varwijk": {
      "name": "Alexander Varwijk"
    },
    "alicia-bevilacqua": {
      "name": "Alicia Bevilacqua"
    },
    "amelie-deguerry": {
      "name": "Amelie Deguerry"
    },
    "anacidre": {
      "name": "Ana Cidre",
      "linkedin": "https://www.linkedin.com/in/ana-cidre"
    },
    "andoni": {
      "name": "Andoni Auzmendi"
    },
    "andrei-taranu": {
      "name": "Andrei (Alex) Taranu",
      "linkedin": "https://www.linkedin.com/in/andrei-alex-taranu/"
    },
    "andrew-baxter": {
      "name": "Andrew Baxter"
    },
    "andrew-melck": {
      "name": "Andrew Melck"
    },
    "antoine-crochet-damais": {
      "name": "Antoine Crochet Damais"
    },
    "augustin-delaporte": {
      "name": "Augustin Delaporte",
      "linkedin": "https://www.linkedin.com/in/augustindelaporte/"
    },
    "branislav-bujisic": {
      "name": "Branislav Bujisic"
    },
    "carl-smith": {
      "name": "Carl Smith"
    },
    "caroline-leroy": {
      "name": "Caroline Leroy"
    },
    "cati-mayer": {
      "name": "Cati Mayer"
    },
    "catplat": {
      "name": "C Trinkwon"
    },
    "ceelolulu": {
      "name": "Celeste van der Watt"
    },
    "chadwcarlson": {
      "name": "Chad Carlson",
      "github": "chadwcarlson",
      "linkedin": "https://www.linkedin.com/in/chadwcarlson"
    },
    "chris-ward": {
      "name": "Chris Ward"
    },
    "chris-yates": {
      "name": "Chris Yates"
    },
    "christian-sieber": {
      "name": "Christian Sieber"
    },
    "christopher-lockheardt": {
      "name": "Christopher Lockheardt"
    },
    "christopher-skene": {
      "name": "Christopher Skene"
    },
    "chuck-morgan": {
      "name": "Chuck Morgan"
    },
    "corey-dockendorf": {
      "name": "Corey Dockendorf"
    },
    "crell": {
      "name": "Crell"
    },
    "damz": {
      "name": "Damz"
    },
    "dan-morrison": {
      "name": "Dan Morrison"
    },
    "davidbonachera": {
      "name": "David Bonachera",
      "github": "davidbonachera",
      "linkedin": "https://www.linkedin.com/in/davidbonachera"
    },
    "dereliahmet1": {
      "name": "Ahmet Faruk Dereli"
    },
    "devicezero": {
      "name": "Jonas Kröger",
      "github": "devicezero",
      "linkedin": "https://www.linkedin.com/in/jonaskroeger/"
    },
    "doug-goldberg": {
      "name": "Doug Goldberg"
    },
    "duncan-naves": {
      "name": "Duncan Naves",
      "github": "duncannaves",
      "linkedin": "https://www.linkedin.com/in/duncan-naves-a94423aa"
    },
    "erika-bustamante": {
      "name": "Erika Bustamante"
    },
    "fabpot": {
      "name": "Fabien Potencier"
    },
    "flovntp": {
      "name": "Florent Huck",
      "github": "flovntp",
      "linkedin": "https://www.linkedin.com/in/florenthuck"
    },
    "fred-plais": {
      "name": "Fred Plais"
    },
    "gauthier-garnier": {
      "name": "Gauthier Garnier"
    },
    "gilzow": {
      "name": "Paul Gilzow"
    },
    "gmoigneu": {
      "name": "Guillaume Moigneu",
      "github": "gmoigneu",
      "linkedin": "https://www.linkedin.com/in/guillaumemoigneu/"
    },
    "gregqualls": {
      "name": "Greg Qualls"
    },
    "guguss": {
      "name": "Augustin Delaporte"
    },
    "haylee-millar": {
      "name": "Haylee Millar"
    },
    "ivana-kotur": {
      "name": "Ivana Kotur"
    },
    "jackrabbithanna": {
      "name": "Mark Hanna"
    },
    "jared-wright": {
      "name": "Jared Wright",
      "github": "jww-sh",
      "linkedin": "https://www.linkedin.com/in/jaredwaynewright"
    },
    "jessica-orozco": {
      "name": "Jessica Orozco"
    },
    "joey-stanford": {
      "name": "Joey Stanford"
    },
    "john-grubb": {
      "name": "John Grubb"
    },
    "jonas-kruger": {
      "name": "Jonas Kruger"
    },
    "kathryn-frazer": {
      "name": "Kathryn Frazer"
    },
    "kemiojo": {
      "name": "Kemi Elizabeth Ojogbede"
    },
    "kieronsambrook-smith": {
      "name": "Kieronsambrook Smith"
    },
    "laurent-arnoud": {
      "name": "Laurent Arnoud"
    },
    "letoya-boyne": {
      "name": "Letoya Boyne"
    },
    "lolautruche": {
      "name": "Jérôme Vieilledent"
    },
    "lyly-lepinay": {
      "name": "Lyly Lepinay"
    },
    "manauwar-alam": {
      "name": "Manauwar Alam"
    },
    "marc-antoine-porri": {
      "name": "Marc Antoine Porri"
    },
    "maria-antinkaapo": {
      "name": "Maria Antinkaapo"
    },
    "maria-de-anton": {
      "name": "Maria De Anton"
    },
    "mark-dorison": {
      "name": "Mark Dorison"
    },
    "markus-hausammann": {
      "name": "Markus Hausammann"
    },
    "mary-thomas": {
      "name": "Mary Thomas"
    },
    "mathias-bolt-lesniak": {
      "name": "Mathias Bolt Lesniak"
    },
    "mathieu-strauch": {
      "name": "Mathieu Strauch"
    },
    "matthias-van-woensel": {
      "name": "Matthias Van Woensel",
      "linkedin": "https://www.linkedin.com/in/matthias-van-woensel-267a069"
    },
    "michael-sharp": {
      "name": "Michael Sharp"
    },
    "mupsi": {
      "name": "Marine Gandy"
    },
    "natalie-harper": {
      "name": "Natalie Harper"
    },
    "ngommenginger": {
      "name": "Nicolas Gommenginger",
      "linkedin": "https://www.linkedin.com/in/nicolas-gommenginger"
    },
    "nicholas-bennison": {
      "name": "Nicholas Bennison"
    },
    "nicholas-vahalik": {
      "name": "Nicholas Vahalik"
    },
    "nick-hardiman": {
      "name": "Nick Hardiman"
    },
    "nickanderegg": {
      "name": "Nickanderegg"
    },
    "nicolas-grekas": {
      "name": "Nicolas Grekas",
      "github": "nicolas-grekas",
      "linkedin": "https://www.linkedin.com/in/nicolasgrekas/"
    },
    "niti-malwade": {
      "name": "Niti Malwade"
    },
    "opensocialteam": {
      "name": "Opensocialteam"
    },
    "ori-pekelman": {
      "name": "Ori Pekelman"
    },
    "otavio-santana": {
      "name": "Otavio Santana"
    },
    "palwandi": {
      "name": "Pawan Alwandi",
      "github": "pawpy",
      "linkedin": "https://www.linkedin.com/in/pawanalwandi"
    },
    "patrick-boest": {
      "name": "Patrick Boest"
    },
    "patrick-dawkins": {
      "name": "Patrick Dawkins",
      "github": "pjcdawkins",
      "linkedin": "https://www.linkedin.com/in/patrickdawkins"
    },
    "patrick-klima": {
      "name": "Patrick Klima"
    },
    "pjcdawkins": {
      "name": "Pjcdawkins"
    },
    "prineet-kaurbhurji": {
      "name": "Prineet Kaurbhurji"
    },
    "quentin-sinig": {
      "name": "Quentin Sinig"
    },
    "ralt": {
      "name": "Florian Margaine",
      "github": "ralt",
      "linkedin": "https://www.linkedin.com/in/florian-margaine-43971136"
    },
    "ramanathanramakrishnamurthy": {
      "name": "Ramanathanramakrishnamurthy"
    },
    "remi-lejeune": {
      "name": "Rémi Lejeune"
    },
    "ribel": {
      "name": "Taras Kruts"
    },
    "robert-douglass": {
      "name": "Robert Douglass"
    },
    "rudy-weber": {
      "name": "Rudy Weber"
    },
    "ryan-hicks": {
      "name": "Ryan Hicks"
    },
    "sabri-helal": {
      "name": "Sabri Helal"
    },
    "savannah-bergeron": {
      "name": "Savannah Bergeron"
    },
    "shannon-vettes": {
      "name": "Shannon Vettes"
    },
    "shawn-ogasawara": {
      "name": "Shawn Ogasawara",
      "linkedin": "https://www.linkedin.com/in/shawn-ogasawara-83a9a0/"
    },
    "shawna-spoor": {
      "name": "Shawna Spoor"
    },
    "shedrack-akintayo": {
      "name": "Shedrack Akintayo"
    },
    "simon-ruggier": {
      "name": "Simon Ruggier"
    },
    "sophie-van-der-kindere": {
      "name": "Sophie Van Der Kindere"
    },
    "stefanos-thampis": {
      "name": "Stefanos Thampis"
    },
    "stephen-weinberg": {
      "name": "Stephen Weinberg"
    },
    "sukhman-virk": {
      "name": "Sukhman Virk"
    },
    "sumaira-nazir": {
      "name": "Sumaira Nazir"
    },
    "sumer": {
      "name": "Sümer Cip"
    },
    "syed-raza": {
      "name": "Syed Raza"
    },
    "tamara-bacchia": {
      "name": "Tamara Bacchia"
    },
    "tara-arnold": {
      "name": "Tara Arnold"
    },
    "theosakamg": {
      "name": "Mickael Gaillard",
      "github": "theosakamg"
    },
    "thomasdiluccio": {
      "name": "Thomas di Luccio"
    },
    "tim-anderson": {
      "name": "Tim Anderson"
    },
    "tom-helmer-hansen": {
      "name": "Tom Helmer Hansen"
    },
    "tylermills": {
      "name": "Tyler Mills"
    },
    "upsun": {
      "name": "Upsun"
    },
    "veronika-tolkachova": {
      "name": "Veronika Tolkachova",
      "linkedin": "https://www.linkedin.com/in/veronika-tolkachova-169167a2"
    },
    "vince-parker": {
      "name": "Vince Parker"
    },
    "vinnie-russo": {
      "name": "Vincenzo Russo"
    },
    "vrobert78": {
      "name": "Vincent Robert",
      "github": "vrobert78",
      "linkedin": "https://www.linkedin.com/in/vincent-robert-498a883"
    },
    "yuriy-babenko": {
      "name": "Yuriy Babenko"
    },
    "yuriy-gerasimov": {
      "name": "Yuriy Gerasimov"
    }
  };
  return <div className="post-meta">
      {(authors.length > 0 || formattedDate) && <div className="post-meta-info">
          {authors.length > 0 && <div className="post-meta-authors">
              {authors.map(slug => {
    const {name, url, avatarUrl} = resolveAuthor(slug);
    const inner = <>
                    {avatarUrl && <img src={avatarUrl} alt={name} className="post-meta-avatar" />}
                    <span className="post-meta-author-name">{name}</span>
                  </>;
    return url ? <a key={slug} href={url} target="_blank" rel="noopener noreferrer" className="post-meta-author">
                    {inner}
                  </a> : <span key={slug} className="post-meta-author">{inner}</span>;
  })}
            </div>}
          {authors.length > 0 && formattedDate && <span className="post-meta-separator" aria-hidden="true">·</span>}
          {formattedDate && <span className="post-meta-date">{formattedDate}</span>}
        </div>}
      {image && <img src={image} alt="" className="post-meta-image" aria-hidden="true" />}
    </div>;
};

<PostMeta data={{ author: ["jared-wright"], date: "2025-10-31T10:00:00+00:00", image: "/images/posts/hands-on/varnish-103-cache-optimization-with-url-normalization-on-upsun/varnish-103.webp" }} />

**Cache efficiency can make or break your application's performance.** While Varnish is incredibly powerful at caching, small URL variations can fragment your cache and dramatically reduce hit ratios. A single page with tracking parameters, query string variations, or trailing punctuation can create dozens of duplicate cache entries, each consuming memory and forcing unnecessary backend requests. The techniques covered in this article are essential for anyone looking to serve traffic at scale.

## The Cache Fragmentation Problem

Consider these URLs that all represent the exact same content:

```
https://example.com/products
https://example.com/products?
https://example.com/products?utm_source=twitter&utm_campaign=spring
https://example.com/products?utm_campaign=spring&utm_source=twitter
https://example.com/products/?
```

Without URL normalization, Varnish creates **five separate cache entries** for identical content. For a high-traffic site, this pattern repeated across thousands of pages results in:

* **Wasted memory**: Storing duplicate content
* **Lower cache hit ratio**: More requests miss cache and hit backend
* **Higher backend load**: Unnecessary duplicate requests to application servers
* **Slower response times**: More requests waiting for backend processing

## URL Normalization Strategies

On Upsun, [Varnish Cache](https://docs.upsun.com/add-services/varnish.html) combined with well-chosen URL normalization rules in VCL can dramatically improve cache efficiency. Let's explore proven techniques that consolidate cache entries and maximize hit ratios.

## Removing Empty Query Strings

One of the simplest yet most effective optimizations is removing trailing question marks from URLs.

```vcl {filename=".upsun/config.vcl"} theme={null}
sub vcl_recv {
...
    # Remove empty query string parameters
    # e.g.: www.example.com/index.html?
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }
...
}
```

### Why This Matters

Empty query strings often appear when:

* Users manually type URLs with trailing `?`
* Applications generate links with optional parameters that aren't always present
* URL builders concatenate parameters conditionally, leaving trailing `?` when none apply

Consider these two URLs:

* `https://example.com/products/widget`
* `https://example.com/products/widget?`

Without normalization, Varnish treats these as completely different URLs, creating separate cache entries for identical content.

### How It Works

1. **Pattern Detection**: The condition `if (req.url ~ "\?$")` uses regex matching to detect URLs ending with a question mark:
   * `~`: Regex match operator
   * `\?`: Escaped question mark (literal `?` character)
   * `$`: End-of-string anchor

2. **URL Rewriting**: The `regsub()` function performs string substitution:
   * **First parameter** (`req.url`): The string to modify
   * **Second parameter** (`"\?$"`): The regex pattern to find (trailing `?`)
   * **Third parameter** (`""`): The replacement string (empty, effectively removing the `?`)

3. **In-Place Modification**: `set req.url = ...` updates the request URL before cache lookup or backend request.
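The rule above only strips a bare trailing `?`. A sketch of a slightly broader variant, assuming your application also treats a dangling `&` as meaningless (so that `/page?id=1&` and `/page?id=1` return the same content), removes any trailing run of `?` and `&` characters in a single substitution:

```vcl
sub vcl_recv {
    # Strip any trailing run of "?" and "&" characters, for example:
    #   /page?       -> /page
    #   /page?id=1&  -> /page?id=1
    #   /page?&      -> /page
    # Assumes the application ignores a dangling separator.
    if (req.url ~ "[?&]+$") {
        set req.url = regsub(req.url, "[?&]+$", "");
    }
}
```

The `if` guard mirrors the original rule: `regsub()` only runs when there is something to strip.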

### The Impact on Caching

**Without normalization:**

```
Request 1: /page?     → Cache MISS → Backend request → Cache entry A
Request 2: /page      → Cache MISS → Backend request → Cache entry B
Request 3: /page?     → Cache HIT  → Served from entry A
Request 4: /page      → Cache HIT  → Served from entry B
```

Cache hit ratio: 50% (2 hits, 2 misses)

**With normalization:**

```
Request 1: /page? → normalized to /page → Cache MISS → Backend request → Cache entry
Request 2: /page  → Cache HIT  → Served from cache
Request 3: /page? → normalized to /page → Cache HIT  → Served from cache
Request 4: /page  → Cache HIT  → Served from cache
```

Cache hit ratio: 75% (3 hits, 1 miss)
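To observe hits and misses like these on your own site, one option is to expose debugging headers from `vcl_deliver`. The header names `X-Cache` and `X-Normalized-URL` are arbitrary choices for this sketch. Because they reveal internal details, consider removing or restricting them outside of testing:

```vcl
sub vcl_deliver {
    # Hypothetical debug headers: report whether this response was served
    # from cache, and which (normalized) URL was used for the lookup.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    set resp.http.X-Normalized-URL = req.url;
}
```

Assuming the page is cacheable and not yet in cache, requesting `/page?` and then `/page` should show `MISS` followed by `HIT`, with the same `X-Normalized-URL` on both responses.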

## Query String Sorting for Cache Consistency

URLs with identical parameters in different orders should result in the same cached response, but by default they don't.

```vcl {filename=".upsun/config.vcl"} theme={null}
sub vcl_recv {
...
    # Sorts query string parameters alphabetically for cache normalization purposes, only when there are multiple parameters
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);
    }
...
}
```

### The Query String Problem

Consider these URLs that represent the same content:

* `https://example.com/search?category=books&sort=price&color=blue`
* `https://example.com/search?color=blue&category=books&sort=price`
* `https://example.com/search?sort=price&color=blue&category=books`

Without query string sorting, Varnish creates three separate cache entries for identical content, drastically reducing cache efficiency.

### How It Works

1. **Multi-Parameter Detection**: The regex `\?.+&.+` checks if the URL has multiple query parameters:
   * `\?`: Escaped question mark (start of query string)
   * `.+`: One or more characters (first parameter)
   * `&`: Ampersand separator (indicates multiple parameters)
   * `.+`: One or more characters (second/additional parameters)

2. **Conditional Sorting**: The `if` condition ensures we only call `std.querysort()` when needed:
   * Single-parameter URLs like `/page?id=123` skip sorting (no `&` present)
   * Multi-parameter URLs get sorted
   * This minor optimization avoids unnecessary function calls

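3. **Alphabetical Sorting**: The `std.querysort()` function (from Varnish's standard `std` VMOD) sorts the `name=value` pairs alphabetically:
   * `/page?z=1&a=2&m=3` becomes `/page?a=2&m=3&z=1`
   * Maintains parameter values unchanged
   * Creates consistent cache keys regardless of parameter order
   * Requires the `std` VMOD: in standard Varnish that means an `import std;` line at the top of the VCL file, so check whether your configuration already includes one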

### Why Skip Single Parameters?

The condition `\?.+&.+` specifically checks for the `&` character, which only appears when multiple parameters exist. This is a small VCL optimization:

```vcl {filename=".upsun/config.vcl"} theme={null}
    # Without the multi-parameter check
    set req.url = std.querysort(req.url);  # Called on every request with query string

    # With the multi-parameter check (recommended)
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);  # Only called when sorting is needed
    }
```

For URLs like `/page?id=123` (single parameter), there's nothing to sort, so we skip the function call entirely.

### Real-World Impact

Query parameter ordering is especially problematic with:

**E-commerce filters:**

```
/products?category=electronics&brand=sony&price_max=1000
/products?price_max=1000&brand=sony&category=electronics
/products?brand=sony&price_max=1000&category=electronics
```

All represent the same filtered product list, but create 3 cache entries without sorting.

**Search results:**

```
/search?q=varnish&sort=relevance&page=1
/search?page=1&q=varnish&sort=relevance
```

Identical search results, different cache entries.

**API endpoints:**

```
/api/users?limit=10&offset=20&sort=name
/api/users?sort=name&limit=10&offset=20
```

Same API response, fragmented cache.

### Cache Efficiency Example

Consider an e-commerce site with 3 common filters (category, price, brand):

* **Without sorting**: up to 3! = 6 possible orderings, so up to 6 cache entries per unique filter combination
* **With sorting**: 1 canonical ordering = 1 cache entry per unique filter combination
* **Result**: up to a 6x reduction in cache fragmentation

For 1,000 unique filter combinations, in the worst case where every ordering is requested:

* Without sorting: 6,000 cache entries
* With sorting: 1,000 cache entries
* Memory saved: about 83%

### Important Considerations

1. **Parameter Values Unchanged**: `std.querysort()` reorders whole `name=value` pairs but never changes the contents of a value:
   ```
   Before: /page?ids=3,1,2&sort=desc
   After:  /page?ids=3,1,2&sort=desc  # ids value order preserved
   ```

2. **Case Sensitivity**: Parameter names are case-sensitive, so these URLs produce two separate cache entries:
   ```
   /page?Category=books
   /page?category=books
   ```
   Consider adding case normalization if your application treats them as the same parameter.

3. **Array Parameters**: Some applications use repeated parameter names:
   ```
   /page?tag=red&tag=blue&tag=green
   ```
   `std.querysort()` keeps every occurrence, but because it sorts whole `name=value` pairs it can reorder them (here, to `tag=blue&tag=green&tag=red`). Check whether your application treats that order as meaningful.

4. **Fragment Identifiers**: URL fragments (`#section`) are client-side only and never sent to servers, so they don't affect caching.
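If you do need case normalization, a cautious option is to scope it to endpoints you know are case-insensitive. The sketch below assumes a hypothetical `/search` endpoint that ignores case in both parameter names and values. `std.tolower()` lowercases the entire URL, values included, so applying it globally would break case-sensitive tokens, IDs, and paths. Like `std.querysort()`, it requires the `std` VMOD to be imported:

```vcl
sub vcl_recv {
    # Hypothetical: /search is known to treat parameter names AND values
    # case-insensitively, so lowercasing the whole URL is safe for it only.
    # Run this before std.querysort() so sorting sees the final names.
    if (req.url ~ "^/search\?") {
        set req.url = std.tolower(req.url);
    }
}
```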

## Removing Tracking and Marketing Parameters

Marketing and analytics platforms add tracking parameters to URLs that don't affect content but severely fragment your cache. When users click links from social media, email campaigns, or ads, third-party platforms automatically append tracking parameters that create multiple cache entries for identical content.

**Key insight**: These tracking parameters are typically processed by JavaScript running in the user's browser, not by your server application. Since JavaScript can read parameters from `window.location.search`, your server doesn't need to see them—the client-side analytics code extracts and sends them directly to the analytics platform.

Here's a VCL snippet that covers a broad set of tracking parameters from major platforms:

```vcl {filename=".upsun/config.vcl"} theme={null}
sub vcl_recv {
...
    # Remove all marketing get parameters to minimize the cache objects
    if (req.url ~ "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
        # Remove each tracking parameter whose full name follows "?" or "&",
        # writing the separator back so adjacent tracking parameters still match
        set req.url = regsuball(req.url, "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[^&]*", "\1");
        # Tidy up the separators left behind: "?&" -> "?", "&&" -> "&", trailing "?" or "&"
        set req.url = regsub(req.url, "\?&+", "?");
        set req.url = regsuball(req.url, "&&+", "&");
        set req.url = regsub(req.url, "[?&]+$", "");
    }
...
}
```

### Platform Coverage

This extended list covers tracking parameters from dozens of platforms and services that automatically append parameters outside your control:

**Major advertising platforms:**

* Google Ads: `gclid`, `gclsrc`, `gbraid`, `wbraid`, `gad_source`
* Facebook/Meta: `fbclid`
* Twitter: `twclid`
* TikTok: `ttclid`
* Bing/Microsoft: `msclkid`
* Yandex: `yclid`

**Analytics platforms:**

* Google Analytics: `_ga`, `_gl`, `utm_*` (multiple variants)
* Matomo/Piwik: `matomo_*`, `mtm_*`, `piwik_*`, `pk_*` (each name listed individually)

**Email marketing:**

* Mailchimp: `mc_cid`, `mc_eid`, `mc_*` (pattern match)
* Bronto: `_bta_c`, `_bta_tid`, `_bta_*` (pattern match)
* SMS campaigns: `sms_click`, `sms_source`, `sms_uph`

**Affiliate and commerce:**

* eBay: `mkcid`, `mkevt`, `mkrid`, `mkwid`, `toolid`
* Rakuten: `rtid`
* Zanox (now Awin): `zanpid`
* Impact: `irclickid`

**Social media:**

* Instagram: `igshid`
* Branch.io (deep linking): `_branch_match_id`

**Specialized tracking:**

* Pinterest: `epik`
* Other tracking IDs: `trk_contact`, `trk_module`, `trk_msg`, `trk_sid`, `redirect_log_mongo_id`, `redirect_mongo_id`

### Pattern Matching for Flexibility

Notice the regex patterns at the end of the parameter list:

* `mc_[a-z]+`: Matches any Mailchimp parameter starting with `mc_`
* `utm_[a-z]+`: Matches any UTM parameter whose suffix contains only lowercase letters, including future additions; names with a second underscore, such as `utm_source_platform` or `utm_marketing_tactic`, are listed explicitly because `[a-z]+` does not match `_`
* `_bta_[a-z]+`: Matches any Bronto tracking parameter starting with `_bta_`

These patterns help the rule keep up with new parameters from existing platforms without editing the list.

### How It Works

1. **Two-Stage Approach**: The `if` condition is a quick check for whether any tracking parameter is present. Only then do the more expensive `regsuball()` substitution and the cleanup run, so requests without tracking parameters skip the extra work entirely.

2. **Anchored Matching**: The removal pattern `(\?|&)(...)=[^&]*` matches a parameter only when its full name sits right after `?` or `&`, so a short name such as `ie` cannot match inside an unrelated parameter such as `movie=`. `[^&]*` consumes the value up to the next `&`, whatever characters it contains, including an empty value. The replacement `\1` writes back the separator that preceded the parameter, so when several tracking parameters appear in a row, each one still starts with its own `&` and is matched as well.

3. **Cleanup**: The three final substitutions collapse `?&` into `?`, collapse `&&` into `&`, and drop any trailing `?` or `&`, leaving a well-formed URL.

### Real-World Example

Consider a product page with tracking from multiple campaigns:

```
Without parameter removal:
/products/widget?utm_source=facebook&utm_campaign=spring&fbclid=xyz123
/products/widget?utm_source=twitter&utm_campaign=spring&twclid=abc456
/products/widget?gclid=def789
/products/widget
→ 4 cache entries for identical content

With parameter removal:
All normalize to: /products/widget
→ 1 cache entry
```
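To check which incoming URLs collapse into one cache entry on your own traffic, a sketch is to log each URL before and after normalization with `std.log()` from the `std` VMOD (imported with `import std;`). The `url-before:` and `url-after:` prefixes are arbitrary choices; where you have access to `varnishlog`, these messages appear under the `VCL_Log` tag:

```vcl
sub vcl_recv {
    # Place this line before your normalization rules: the URL as the client sent it
    std.log("url-before: " + req.url);
...
    # Place this line after your normalization rules: the normalized URL
    std.log("url-after: " + req.url);
...
}
```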

### Real-World Impact

As an illustration, consider a content site running marketing campaigns across multiple platforms:

* **Daily requests**: 10 million
* **URLs with tracking parameters**: 70% (7 million)
* **Tracking parameter combinations per page**: 100+ variants
* **Cache entries without stripping**: up to 100 per unique page
* **Cache entries with stripping**: 1 per unique page
* **Cache hit ratio**: 30% → 85%
* **Backend requests**: from 70% of traffic (7 million per day) to 15% (1.5 million per day), about 79% fewer

### Best Practices

1. **Start Conservative**: Begin with a small list and add parameters as you observe them in your logs

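2. **Monitor for False Positives**: Ensure you're not accidentally removing functional parameters that affect content. Short, generic names in the list above, such as `ie`, `cx`, `si`, and `origin`, are the most likely to collide with parameters your application actually uses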

3. **Coordinate with Teams**: Reassure marketing that analytics will still work (client-side processing is unaffected), and verify with development that no application logic depends on these parameters server-side

4. **Customize**: Remove parameters for platforms you don't use to simplify the regex
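Following the start-conservative advice, a trimmed starting point might look like the sketch below. The parameter names chosen (`utm_*`, `gclid`, `fbclid`, `msclkid`) are assumptions about which platforms you use. The pattern `(\?|&)name=[^&]*` only removes a parameter whose full name sits right after a separator, and the cleanup lines tidy up any `?&`, `&&`, or trailing separator that removal leaves behind:

```vcl
sub vcl_recv {
    # Deliberately short list; extend it as your logs reveal more parameters.
    # utm_[a-z_]+ also covers names with a second underscore (utm_source_platform).
    if (req.url ~ "(\?|&)(utm_[a-z_]+|gclid|fbclid|msclkid)=") {
        # Drop "name=value", keeping the separator that preceded it
        set req.url = regsuball(req.url, "(\?|&)(utm_[a-z_]+|gclid|fbclid|msclkid)=[^&]*", "\1");
        # Tidy up leftover separators
        set req.url = regsub(req.url, "\?&+", "?");
        set req.url = regsuball(req.url, "&&+", "&");
        set req.url = regsub(req.url, "[?&]+$", "");
    }
}
```

For example, `/products?id=7&utm_source=x&fbclid=abc` becomes `/products?id=7`, and `/products?gclid=123` becomes `/products`.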

## Extended URL Normalization

You can expand normalization patterns to handle other URL variations:

```vcl {filename=".upsun/config.vcl"} theme={null}
sub vcl_recv {
...
    # Remove empty query strings
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

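    # Convert multiple slashes to a single slash, in the path only, so that
    # query values such as "?next=https://example.com" are left intact.
    # X-Norm-Path and X-Norm-Query are temporary header names chosen for this example.
    if (req.url ~ "^[^?]*//") {
        set req.http.X-Norm-Path = regsuball(regsub(req.url, "\?.*$", ""), "//+", "/");
        set req.http.X-Norm-Query = regsub(req.url, "^[^?]*", "");
        set req.url = req.http.X-Norm-Path + req.http.X-Norm-Query;
        unset req.http.X-Norm-Path;
        unset req.http.X-Norm-Query;
    }

    # Remove trailing slashes (except root) - This can break some applications and redirects
    # [^?]+ keeps the match inside the path, so "/login?next=/" is left alone
    if (req.url ~ "^/[^?]+/$") {
        set req.url = regsub(req.url, "/$", "");
    }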
...
}
```

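**Important**: Always test trailing slash removal thoroughly. Some applications and frameworks rely on the distinction between `/path` and `/path/` for routing or generating canonical URLs. If the backend answers `/path` with a redirect to `/path/`, rewriting `/path/` to `/path` in Varnish creates a redirect loop: the client follows the redirect to `/path/`, Varnish rewrites it to `/path` again, and the backend redirects once more.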
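A sketch of an alternative, assuming your backend serves the slashless form and never redirects `/path` to `/path/` itself: instead of rewriting the URL internally, Varnish answers `/path/` with a `301` redirect to `/path`, so browsers, crawlers, and the cache all converge on one URL. The status code `750` is an arbitrary internal value, a common VCL idiom for handing control to `vcl_synth`:

```vcl
sub vcl_recv {
    # Redirect /path/ to /path instead of silently rewriting it.
    # [^?]+ keeps query strings out of the match; the root "/" is never matched.
    if (req.url ~ "^/[^?]+/$") {
        return (synth(750, regsub(req.url, "/$", "")));
    }
}

sub vcl_synth {
    if (resp.status == 750) {
        # resp.reason carries the target path passed to synth() above
        set resp.http.Location = resp.reason;
        set resp.status = 301;
        set resp.reason = "Moved Permanently";
        return (deliver);
    }
}
```

If your backend instead prefers `/path/`, invert the rule rather than using this one, otherwise the two redirects would loop.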

## Combining Normalization Techniques

For maximum cache efficiency, combine multiple normalization strategies in the correct order:

```vcl {filename=".upsun/config.vcl"} theme={null}
sub vcl_recv {
...
    # Step 1: Remove empty query strings
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

    # Step 2: Remove tracking parameters (anchored on "?" or "&", separator kept via \1)
    if (req.url ~ "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
        set req.url = regsuball(req.url, "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[^&]*", "\1");
        set req.url = regsub(req.url, "\?&+", "?");
        set req.url = regsuball(req.url, "&&+", "&");
        set req.url = regsub(req.url, "[?&]+$", "");
    }

    # Step 3: Sort remaining parameters (only if multiple exist)
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);
    }

    # Step 4: Remove trailing slashes (path only, never inside the query string)
    if (req.url ~ "^/[^?]+/$") {
        set req.url = regsub(req.url, "/$", "");
    }
...
}
```

**Order matters!** Remove unwanted parameters *before* sorting to avoid sorting parameters you'll delete anyway.

## URL Normalization Best Practices

1. **Apply Early**: Place normalization logic at the beginning of `vcl_recv` before any cache lookups or routing decisions.

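2. **Be Consistent**: Varnish's built-in `vcl_hash` builds the cache key from `req.url` (plus the `Host` header), so rewriting `req.url` in `vcl_recv` normalizes the cache key as well. If you customize `vcl_hash`, make sure it hashes the normalized `req.url`, not a copy of the URL saved before normalization.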

3. **Test Thoroughly**: Ensure normalization doesn't break functionality. Some applications may rely on empty query strings or specific URL formats.

4. **Document Your Rules**: URL normalization rules should be clearly documented, especially when removing query parameters that might be meaningful to some users.

5. **Consider Application Behavior**: Some single-page applications (SPAs) or tracking systems may generate URLs with query parameters. Understand your application's URL patterns before normalizing.

6. **Monitor Impact**: Track cache hit ratio improvements and watch for any broken functionality after implementing normalization.
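If you write a custom `vcl_hash`, a minimal sketch that keeps the cache key tied to the normalized URL mirrors what Varnish's built-in `vcl_hash` already does; you only need it when you plan to add other inputs to the hash:

```vcl
sub vcl_hash {
    # Hash the URL as rewritten in vcl_recv, so every normalized variant
    # maps to the same cache object. Avoid hashing a copy of the URL that
    # was saved before normalization ran.
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```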

## Performance Benefits

URL normalization provides several advantages:

* **Higher Cache Hit Ratio**: Fewer unique URLs mean more cache hits
* **Reduced Memory Usage**: Fewer cached objects consume less memory
* **Lower Backend Load**: More cache hits mean fewer backend requests
* **Faster Response Times**: Cached responses are orders of magnitude faster than backend requests
* **Improved Origin Performance**: Less load on your application servers

### Real-World Example

For a high-traffic site serving 1 million requests per day:

* **Without normalization**: if 5% of requests (50,000) carry a trailing `?`, every page requested in both forms is cached twice and fetched from the backend separately for each form
* **With normalization**: those 50,000 requests share the cache entries of the bare URLs
* **Impact**: fewer backend requests, less memory spent on duplicate objects, and faster response times

This small VCL snippet can have a measurable impact on your application's performance and infrastructure costs.

## Performance Impact by the Numbers

Consider a hypothetical site receiving 10 million requests per day, 30% of which carry multiple query parameters, where each unique parameter set arrives in an average of 4 different orderings:

* **Requests affected**: 3 million
* **Cache entries per unique parameter set**: about 4 without sorting, 1 with sorting
* **Backend fetches per unique parameter set, per cache lifetime**: about 4 without sorting, 1 with sorting

In this scenario, sorting cuts the cached objects and cache misses for these URLs by roughly 75%. How much that lowers your overall backend load depends on how many unique parameter sets you serve and on your TTLs, so compare your cache hit ratio before and after enabling it.

## Conclusion

URL normalization is one of the highest-impact, lowest-effort optimizations you can implement in Varnish. By removing empty query strings, sorting parameters, and stripping tracking parameters, you can dramatically improve cache hit ratios and reduce backend load.

These simple VCL patterns consolidate duplicate cache entries, freeing memory and ensuring that more requests are served from cache. Start with these foundational normalization patterns, monitor your cache hit ratios, and adjust based on your application's specific URL patterns and traffic characteristics.

In our final article, Varnish 104: Advanced Traffic Filtering, we'll explore how to use classification headers to block malicious traffic based on abuse scores, geographic location, and network operators—adding an advanced security layer to your Varnish configuration.
