Varnish 103: Cache Optimization with URL Normalization on Upsun

Cache efficiency can make or break your application’s performance. While Varnish is incredibly powerful at caching, small URL variations can fragment your cache and dramatically reduce hit ratios. A single page with tracking parameters, query string variations, or trailing punctuation can create dozens of duplicate cache entries, each consuming memory and forcing unnecessary backend requests. The techniques covered in this article are essential for anyone looking to serve traffic at scale.

The Cache Fragmentation Problem

Consider these URLs that all represent the exact same content:

https://example.com/products
https://example.com/products?
https://example.com/products?utm_source=twitter&utm_campaign=spring
https://example.com/products?utm_campaign=spring&utm_source=twitter
https://example.com/products/?

Without URL normalization, Varnish creates five separate cache entries for identical content. For a high-traffic site, this pattern repeated across thousands of pages results in:

Wasted memory: Storing duplicate content
Lower cache hit ratio: More requests miss cache and hit backend
Higher backend load: Unnecessary duplicate requests to application servers
Slower response times: More requests waiting for backend processing
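The reason lies in how Varnish builds its cache key. The sketch below mirrors the logic of the built-in vcl_hash in simplified form (compare it with the builtin.vcl shipped with your Varnish version). The key is the raw req.url plus the Host header, so any byte-level difference in the URL produces a separate cached object:

```vcl
# Simplified form of the hash logic in Varnish's builtin.vcl:
# the cache key is the request URL plus the Host header
# (or the server IP when no Host header is present).
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

You do not need to add this subroutine yourself, because the built-in version already applies. The point is that vcl_recv runs before vcl_hash, so rewriting req.url in vcl_recv changes the key that gets hashed. That is why every technique in this article lives in vcl_recv.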

URL Normalization Strategies

On Upsun, Varnish Cache combined with intelligent URL normalization VCL can dramatically improve cache efficiency. Let’s explore proven techniques that consolidate cache entries and maximize hit ratios.

Removing Empty Query Strings

One of the simplest yet most effective optimizations is removing trailing question marks from URLs.

sub vcl_recv {
...
    # Remove empty query string parameters
    # e.g.: www.example.com/index.html?
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }
...
}

Why This Matters

Empty query strings often appear when:

Users manually type URLs with trailing ?
Applications generate links with optional parameters that aren’t always present
URL builders concatenate parameters conditionally, leaving trailing ? when none apply

Consider these two URLs:

https://example.com/products/widget
https://example.com/products/widget?

Without normalization, Varnish treats these as completely different URLs, creating separate cache entries for identical content.

How It Works

Pattern Detection: The condition if (req.url ~ "\?$") uses regex matching to detect URLs ending with a question mark:
- ~: Regex match operator
- \?: Escaped question mark (literal ? character)
- $: End-of-string anchor
URL Rewriting: The regsub() function performs string substitution:
- First parameter (req.url): The string to modify
- Second parameter ("\?$"): The regex pattern to find (trailing ?)
- Third parameter (""): The replacement string (empty, effectively removing the ?)
In-Place Modification: set req.url = ... updates the request URL before cache lookup or backend request.

The Impact on Caching

Without normalization:

Request 1: /page?     → Cache MISS → Backend request → Cache entry A
Request 2: /page      → Cache MISS → Backend request → Cache entry B
Request 3: /page?     → Cache HIT  → Served from entry A
Request 4: /page      → Cache HIT  → Served from entry B

Cache hit ratio: 50% (2 hits, 2 misses)

With normalization:

Request 1: /page? → normalized to /page → Cache MISS → Backend request → Cache entry
Request 2: /page  → Cache HIT  → Served from cache
Request 3: /page? → normalized to /page → Cache HIT  → Served from cache
Request 4: /page  → Cache HIT  → Served from cache

Cache hit ratio: 75% (3 hits, 1 miss)
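The same pattern holds more generally. Assume N requests for one page within a single cache lifetime (TTL), spread across k distinct URL variants that are each requested at least once. Every variant then misses exactly once:

```latex
\text{hit ratio}_{\text{fragmented}} = \frac{N - k}{N}, \qquad
\text{hit ratio}_{\text{normalized}} = \frac{N - 1}{N}
```

With N = 4 and k = 2 this gives 2/4 = 50% and 3/4 = 75%, which matches the request traces above. The more variants a page has relative to its traffic, the wider the gap.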

Query String Sorting for Cache Consistency

URLs with identical parameters in different orders should result in the same cached response, but by default they don’t.

import std;  # provides std.querysort()

sub vcl_recv {
...
    # Sorts query string parameters alphabetically for cache normalization purposes, only when there are multiple parameters
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);
    }
...
}

The Query String Problem

Consider these URLs that represent the same content:

https://example.com/search?category=books&sort=price&color=blue
https://example.com/search?color=blue&category=books&sort=price
https://example.com/search?sort=price&color=blue&category=books

Without query string sorting, Varnish creates three separate cache entries for identical content, drastically reducing cache efficiency.

How It Works

Multi-Parameter Detection: The regex \?.+&.+ checks if the URL has multiple query parameters:
- \?: Escaped question mark (start of query string)
- .+: One or more characters (first parameter)
- &: Ampersand separator (indicates multiple parameters)
- .+: One or more characters (second/additional parameters)
Conditional Sorting: The if condition ensures we only call std.querysort() when needed:
- Single-parameter URLs like /page?id=123 skip sorting (no & present)
- Multi-parameter URLs get sorted
- This minor optimization avoids unnecessary function calls
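Alphabetical Sorting: The std.querysort() function (from vmod_std, the Varnish standard library) compares whole key=value pairs byte by byte. In practice this orders parameters alphabetically by key, in ASCII order, so uppercase letters sort before lowercase: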
- /page?z=1&a=2&m=3 becomes /page?a=2&m=3&z=1
- Maintains parameter values unchanged
- Creates consistent cache keys regardless of parameter order
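To see the sort in action, one option is to log the URL on both sides of the call with std.log(), which writes VCL_Log records to the shared memory log. This is a debugging sketch and is not meant to stay enabled on busy production traffic:

```vcl
import std;

sub vcl_recv {
    # Debugging sketch: record the URL before and after sorting
    # as VCL_Log entries.
    if (req.url ~ "\?.+&.+") {
        std.log("querysort before: " + req.url);
        set req.url = std.querysort(req.url);
        std.log("querysort after: " + req.url);
    }
}
```

If your environment gives you access to varnishlog, running varnishlog -g request -i VCL_Log should show the before/after pairs for each request with multiple parameters.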

Why Skip Single Parameters?

The condition \?.+&.+ specifically checks for the & character, which only appears when multiple parameters exist. This is a small VCL optimization:

    # Without the multi-parameter check
    set req.url = std.querysort(req.url);  # Called on every request, with or without a query string

    # With the multi-parameter check (recommended)
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);  # Only called when sorting is needed
    }

For URLs like /page?id=123 (single parameter), there’s nothing to sort, so we skip the function call entirely.

Real-World Impact

Query parameter ordering is especially problematic with the following kinds of URLs.

E-commerce filters:

/products?category=electronics&brand=sony&price_max=1000
/products?price_max=1000&brand=sony&category=electronics
/products?brand=sony&price_max=1000&category=electronics

All represent the same filtered product list, but create 3 cache entries without sorting.

Search results:

/search?q=varnish&sort=relevance&page=1
/search?page=1&q=varnish&sort=relevance

Identical search results, different cache entries.

API endpoints:

/api/users?limit=10&offset=20&sort=name
/api/users?sort=name&limit=10&offset=20

Same API response, fragmented cache.

Cache Efficiency Example

Consider an e-commerce site with 3 common filters (category, price, brand):

Without sorting: 3! = 6 possible orderings, so up to 6 cache entries per unique filter combination
With sorting: 1 canonical ordering = 1 cache entry per unique filter combination
Result: up to a 6x reduction in cache fragmentation

For 1,000 unique filter combinations, if every ordering actually gets requested:

Without sorting: up to 6,000 cache entries
With sorting: 1,000 cache entries
Cache entries saved: up to 5,000 (about 83%)
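The arithmetic generalizes to any number of parameters. Assume all n parameters are present and every ordering is eventually requested. A parameter set can then appear in n! orders, and sorting collapses those keys into one:

```latex
\text{orderings} = n!, \qquad
\text{maximum reduction} = 1 - \frac{1}{n!}

n = 3:\; 1 - \tfrac{1}{6} \approx 83\%, \qquad
n = 4:\; 1 - \tfrac{1}{24} \approx 96\%
```

In practice clients rarely exercise every ordering, so real savings sit somewhere below this ceiling.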

Important Considerations

Parameter Values Unchanged: std.querysort() moves whole parameters around but never rearranges the contents of a value:

Before: /page?sort=desc&ids=3,1,2
After:  /page?ids=3,1,2&sort=desc  # ids value order preserved

Case Sensitivity: Parameter names are case-sensitive:
```
/page?Category=books&category=books  # Different parameters!
```
Consider adding case normalization if needed.
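Array Parameters: Some applications use repeated parameter names:
```
/page?tag=red&tag=blue&tag=green
```
std.querysort() keeps every occurrence. Because it compares whole key=value pairs, however, repeated keys end up ordered by value, and the URL above becomes /page?tag=blue&tag=green&tag=red. If your application treats the order of repeated values as meaningful, sorting changes what it receives.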
Fragment Identifiers: URL fragments (#section) are client-side only and never sent to servers, so they don’t affect caching.
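If you decide case normalization is worth it, one blunt approach is to lowercase the whole query string with std.tolower() from vmod_std while leaving the path alone. This sketch assumes your application treats query values case-insensitively:

```vcl
import std;

sub vcl_recv {
    # Sketch: lowercase the query string (names AND values) whenever it
    # contains an uppercase letter; the path before "?" is left untouched.
    if (req.url ~ "\?.*[A-Z]") {
        set req.url = regsub(req.url, "\?.*$", "")
            + std.tolower(regsub(req.url, "^[^?]*", ""));
    }
}
```

This also lowercases values such as search terms, tokens, or IDs, so test it carefully. If you use it, place it before the std.querysort() call so that sorting sees the final casing.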

Removing Tracking and Marketing Parameters

Marketing and analytics platforms add tracking parameters to URLs. These parameters don’t affect content, but they severely fragment your cache. When users click links from social media, email campaigns, or ads, third-party platforms automatically append tracking parameters, creating multiple cache entries for identical content.

Key insight: These tracking parameters are typically processed by JavaScript running in the user’s browser, not by your server application. Varnish only rewrites the URL it uses for the cache lookup and the backend request. The address in the browser is unchanged, so client-side analytics code can still read the parameters from window.location.search and send them directly to the analytics platform.

Here’s a VCL snippet that covers a broad list of tracking parameters from major platforms:

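sub vcl_recv {
...
    # Remove marketing/tracking query parameters to reduce duplicate cache objects.
    # Names are only removed when they start right after "?" or "&", so a legitimate
    # parameter such as "movie=" is not mangled by the short "ie" entry.
    if (req.url ~ "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
        # 1. Remove every tracking parameter that follows an "&"
        set req.url = regsuball(req.url, "&(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+(){}%.]+", "");
        # 2. Remove a tracking parameter in first position, keeping the "?"
        set req.url = regsub(req.url, "\?(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+(){}%.]+&?", "?");
        # 3. Strip any leftover trailing "?" or "&"
        set req.url = regsub(req.url, "[?|&]+$", "");
    }
...
}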

Platform Coverage

This extended list covers tracking parameters from dozens of platforms and services that automatically append parameters outside your control.

Major advertising platforms:

Google Ads: gclid, gclsrc, gbraid, wbraid, gad_source
Facebook/Meta: fbclid
Twitter: twclid
TikTok: ttclid
Bing/Microsoft: msclkid
Yandex: yclid

Analytics platforms:

Google Analytics: _ga, _gl, utm_* (multiple variants)
Matomo/Piwik: matomo_*, mtm_*, piwik_*, pk_*

Email marketing:

Mailchimp: mc_cid, mc_eid, mc_* (pattern match)
SMS campaigns: sms_click, sms_source, sms_uph

Affiliate and commerce:

eBay: mkcid, mkevt, mkrid, mkwid, toolid
Rakuten: rtid
Zanox (now Awin): zanpid
Impact: irclickid

Social media:

Instagram: igshid
Branch.io (deep linking): _branch_match_id

Specialized tracking:

Pinterest: epik
Bronto (email marketing): _bta_c, _bta_tid, _bta_* (pattern match)
Other tracking parameters: trk_contact, trk_module, trk_msg, trk_sid, redirect_log_mongo_id, redirect_mongo_id

Pattern Matching for Flexibility

Notice the regex patterns at the end of the parameter list:

mc_[a-z]+: Matches any Mailchimp parameter starting with mc_
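utm_[a-z]+: Matches UTM parameters whose suffix is a single lowercase word, including future additions of that form. Names with a further underscore, such as utm_source_platform, fall outside the pattern and are listed explicitly
_bta_[a-z]+: Matches any Bronto tracking parameter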

This approach future-proofs your VCL against new tracking parameters from existing platforms.

How It Works

Two-Stage Approach: The parameter list is used twice, once to detect and once to remove:
- First check (if condition): A single regex match that tests whether ANY tracking parameter is present
- Removal (substitution): The more expensive regsuball() work that strips every match
- The initial check short-circuits on requests that carry no tracking parameters, so those requests skip the substitution entirely
Enhanced Value Matching: The removal pattern [-_A-z0-9+(){}%.]+ matches letters, digits, and the punctuation common in tracking IDs, including percent-encoded characters. A value containing a character outside this class (a comma, for example) is removed only up to that character, so check your logs for leftovers
Cleanup: The final regsub uses [?|&]+$ to remove any trailing run of ? or & characters. Inside a character class, | is a literal, so a trailing | is stripped as well

Real-World Example

Consider a product page with tracking from multiple campaigns:

Without parameter removal:
/products/widget?utm_source=facebook&utm_campaign=spring&fbclid=xyz123
/products/widget?utm_source=twitter&utm_campaign=spring&twclid=abc456
/products/widget?gclid=def789
/products/widget
→ 4 cache entries for identical content

With parameter removal:
All normalize to: /products/widget
→ 1 cache entry

Real-World Impact

For an illustrative content site running marketing campaigns across multiple platforms:

Daily requests: 10 million
URLs with tracking parameters: 70% (7 million)
Tracking parameter combinations per page: 100+ variants
Cache entries without stripping: 100+ entries per unique page
Cache entries with stripping: 1 entry per unique page
Cache hit ratio improvement: 30% → 85%
Backend requests: down from 70% to 15% of traffic, a reduction of roughly 79% (55 percentage points)

Best Practices

Start Conservative: Begin with a small list and add parameters as you observe them in your logs
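Monitor for False Positives: Ensure you’re not accidentally removing functional parameters that affect content. Short, generic names in the list above, such as origin, si, ie, and cx, deserve particular scrutiny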
Coordinate with Teams: Reassure marketing that analytics will still work (client-side processing is unaffected), and verify with development that no application logic depends on these parameters server-side
Customize: Remove parameters for platforms you don’t use to simplify the regex
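Following the "Start Conservative" advice, a minimal starting point might cover only UTM parameters and a few common click IDs. This is a sketch, and the short list is an illustrative assumption. It uses [^&]* for values, so each value is removed up to the next & whatever characters it contains:

```vcl
sub vcl_recv {
    # Hypothetical starter list: UTM parameters plus the Google, Meta and
    # Microsoft click IDs. Extend it as new parameters show up in your logs.
    if (req.url ~ "[?&](utm_[a-z_]+|gclid|fbclid|msclkid)=") {
        # Drop every listed parameter that follows an "&"
        set req.url = regsuball(req.url, "&(utm_[a-z_]+|gclid|fbclid|msclkid)=[^&]*", "");
        # Drop a listed parameter in first position, keeping the "?"
        set req.url = regsub(req.url, "\?(utm_[a-z_]+|gclid|fbclid|msclkid)=[^&]*&?", "?");
        # Remove a leftover "?" or "&" at the end of the URL
        set req.url = regsub(req.url, "[?&]+$", "");
    }
}
```

Each name must start right after ? or & and be followed by =. A legitimate parameter whose name merely ends in one of these strings is therefore left alone.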

Extended URL Normalization

You can expand normalization patterns to handle other URL variations:

sub vcl_recv {
...
    # Remove empty query strings
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

    # Remove trailing slashes (except root) - This can break some applications and redirects
    if (req.url ~ "^/(.+)/$") {
        set req.url = regsub(req.url, "/$", "");
    }

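    # Collapse repeated slashes in the path only, so query values such as
    # ?next=https://example.com stay intact. X-Query is a temporary header.
    if (req.url ~ "^[^?]*//") {
        set req.http.X-Query = regsub(req.url, "^[^?]*", "");
        set req.url = regsuball(regsub(req.url, "\?.*$", ""), "//+", "/") + req.http.X-Query;
        unset req.http.X-Query;
    }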
...
}

Important: Always test trailing slash removal thoroughly. Some applications and frameworks rely on the distinction between /path and /path/ for routing or generating canonical URLs.

Combining Normalization Techniques

For maximum cache efficiency, combine multiple normalization strategies in the correct order:

import std;  # provides std.querysort()

sub vcl_recv {
...
    # Step 1: Remove empty query strings
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

    # Step 2: Remove tracking parameters (only names that start right after "?" or "&")
    if (req.url ~ "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
        set req.url = regsuball(req.url, "&(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+(){}%.]+", "");
        set req.url = regsub(req.url, "\?(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+(){}%.]+&?", "?");
        set req.url = regsub(req.url, "[?|&]+$", "");
    }

    # Step 3: Sort remaining parameters (only if multiple exist)
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);
    }

    # Step 4: Remove trailing slashes
    if (req.url ~ "^/(.+)/$") {
        set req.url = regsub(req.url, "/$", "");
    }
...
}

Order matters! Remove unwanted parameters before sorting to avoid sorting parameters you’ll delete anyway.

URL Normalization Best Practices

Apply Early: Place normalization logic at the beginning of vcl_recv before any cache lookups or routing decisions.
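Be Consistent: Make sure the cache key is built from the normalized URL. The built-in vcl_hash hashes req.url, so rewriting req.url in vcl_recv is enough. If you use a custom vcl_hash, have it hash the normalized req.url rather than a copy of the original.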
Test Thoroughly: Ensure normalization doesn’t break functionality. Some applications may rely on empty query strings or specific URL formats.
Document Your Rules: URL normalization rules should be clearly documented, especially when removing query parameters that might be meaningful to some users.
Consider Application Behavior: Some single-page applications (SPAs) or tracking systems may generate URLs with query parameters. Understand your application’s URL patterns before normalizing.
Monitor Impact: Track cache hit ratio improvements and watch for any broken functionality after implementing normalization.
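To monitor the impact, one option is to expose the cache status and the rewritten URL as response headers. The sketch below assumes hypothetical header names (X-Pre-Normalize-URL and X-Normalized-URL, plus the common X-Cache convention) and is intended for testing environments. Strip or restrict these headers in production:

```vcl
sub vcl_recv {
    # Keep a copy of the incoming URL before any normalization runs
    # (only on the first pass, so restarts do not overwrite it).
    if (req.restarts == 0) {
        set req.http.X-Pre-Normalize-URL = req.url;
    }

    # ... normalization rules go here ...
}

sub vcl_backend_fetch {
    # Do not forward the debugging header to the application.
    unset bereq.http.X-Pre-Normalize-URL;
}

sub vcl_deliver {
    # obj.hits is 0 unless the object was served from cache.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    # Show the normalized URL whenever a rule changed it.
    if (req.http.X-Pre-Normalize-URL && req.http.X-Pre-Normalize-URL != req.url) {
        set resp.http.X-Normalized-URL = req.url;
    }
}
```

A quick spot check is to request a URL with tracking parameters using curl -sI and inspect the X-Cache and X-Normalized-URL headers. Where you have shell access to Varnish, varnishstat -1 | grep -E "MAIN\.cache_(hit|miss) " gives the counters for the overall hit ratio, computed as cache_hit / (cache_hit + cache_miss).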

Performance Benefits

URL normalization provides several advantages:

Higher Cache Hit Ratio: Fewer unique URLs mean more cache hits
Reduced Memory Usage: Fewer cached objects consume less memory
Lower Backend Load: More cache hits mean fewer backend requests
Faster Response Times: Cached responses are orders of magnitude faster than backend requests
Improved Origin Performance: Less load on your application servers

Real-World Example

For a high-traffic site serving 1 million requests per day:

Without normalization: 5% of requests (50,000) carry a trailing ?, and every affected page gets a duplicate cache entry
With normalization: These 50,000 requests are served from the same cache entries as the URLs without the trailing ?
Impact: Fewer duplicate objects in memory and fewer backend requests, because the misses that used to fill the duplicate entries disappear

This small VCL snippet can have a measurable impact on your application’s performance and infrastructure costs.

Performance Impact by the Numbers

For an illustrative site receiving 10 million requests per day with 30% containing multiple query parameters:

Requests affected: 3 million
Average parameter orderings: 4 variations per unique parameter set
Cache entries without sorting: up to 4 per unique parameter set
Cache entries with sorting: 1 per unique parameter set (up to 75% fewer entries)
Requests served from cache: ~2.25 million of those 3 million per day (3 million × 75%), assuming a 75% cache hit rate on normalized URLs

For parameter-heavy applications, this optimization alone can noticeably reduce backend load.

Conclusion

URL normalization is one of the highest-impact, lowest-effort optimizations you can implement in Varnish. By removing empty query strings, sorting parameters, and stripping tracking parameters, you can dramatically improve cache hit ratios and reduce backend load. These simple VCL patterns consolidate duplicate cache entries, freeing memory and ensuring that more requests are served from cache. Start with these foundational normalization patterns, monitor your cache hit ratios, and adjust based on your application’s specific URL patterns and traffic characteristics. In our final article, Varnish 104: Advanced Traffic Filtering, we’ll explore how to use classification headers to block malicious traffic based on abuse scores, geographic location, and network operators—adding an advanced security layer to your Varnish configuration.
