Cache efficiency can make or break your application’s performance. While Varnish is incredibly powerful at caching, small URL variations can fragment your cache and dramatically reduce hit ratios. A single page with tracking parameters, query string variations, or trailing punctuation can create dozens of duplicate cache entries, each consuming memory and forcing unnecessary backend requests. The techniques covered in this article are essential for anyone looking to serve traffic at scale.
The Cache Fragmentation Problem
Consider these URLs that all represent the exact same content:
https://example.com/products
https://example.com/products?
https://example.com/products?utm_source=twitter&utm_campaign=spring
https://example.com/products?utm_campaign=spring&utm_source=twitter
https://example.com/products/?
Without URL normalization, Varnish creates five separate cache entries for identical content. For a high-traffic site, this pattern repeated across thousands of pages results in:
- Wasted memory: Storing duplicate content
- Lower cache hit ratio: More requests miss cache and hit backend
- Higher backend load: Unnecessary duplicate requests to application servers
- Slower response times: More requests waiting for backend processing
URL Normalization Strategies
On Upsun, Varnish Cache combined with intelligent URL normalization VCL can dramatically improve cache efficiency. Let’s explore proven techniques that consolidate cache entries and maximize hit ratios.
Removing Empty Query Strings
One of the simplest yet most effective optimizations is removing trailing question marks from URLs.
sub vcl_recv {
...
# Remove empty query string parameters
# e.g.: www.example.com/index.html?
if (req.url ~ "\?$") {
set req.url = regsub(req.url, "\?$", "");
}
...
}
Why This Matters
Empty query strings often appear when:
- Users manually type URLs with a trailing ?
- Applications generate links with optional parameters that aren’t always present
- URL builders concatenate parameters conditionally, leaving a trailing ? when none apply
Consider these two URLs:
https://example.com/products/widget
https://example.com/products/widget?
Without normalization, Varnish treats these as completely different URLs, creating separate cache entries for identical content.
How It Works
- Pattern Detection: The condition if (req.url ~ "\?$") uses regex matching to detect URLs ending with a question mark:
  - ~: Regex match operator
  - \?: Escaped question mark (literal ? character)
  - $: End-of-string anchor
- URL Rewriting: The regsub() function performs string substitution:
  - First parameter (req.url): The string to modify
  - Second parameter ("\?$"): The regex pattern to find (trailing ?)
  - Third parameter (""): The replacement string (empty, effectively removing the ?)
- In-Place Modification: set req.url = ... updates the request URL before cache lookup or backend request.
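The substitution described above can be tried outside Varnish. The following is a minimal Python sketch, not VCL: the function name `strip_empty_query` is hypothetical, and it assumes that Python's `re` module treats this simple pattern the same way Varnish's regex engine does.

```python
import re

def strip_empty_query(url: str) -> str:
    """Model of the regsub() call above: drop a single trailing '?'."""
    # count=1 mirrors regsub(), which replaces only the first match.
    return re.sub(r"\?$", "", url, count=1)

for url in ("/products/widget?", "/products/widget", "/search?q=varnish"):
    print(url, "->", strip_empty_query(url))
```

Only the first URL changes; a URL that carries a real query string is left alone because it does not end in a question mark.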
The Impact on Caching
Without normalization:
Request 1: /page? → Cache MISS → Backend request → Cache entry A
Request 2: /page → Cache MISS → Backend request → Cache entry B
Request 3: /page? → Cache HIT → Served from entry A
Request 4: /page → Cache HIT → Served from entry B
Cache hit ratio: 50% (2 hits, 2 misses)
With normalization:
Request 1: /page? → normalized to /page → Cache MISS → Backend request → Cache entry
Request 2: /page → Cache HIT → Served from cache
Request 3: /page? → normalized to /page → Cache HIT → Served from cache
Request 4: /page → Cache HIT → Served from cache
Cache hit ratio: 75% (3 hits, 1 miss)
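The two request traces above can be replayed with a small simulation. This is an illustrative Python sketch in which a `set` stands in for the cache; the helper names are hypothetical, and real Varnish behavior also depends on TTLs and the Host header, which are ignored here.

```python
import re

def hit_ratio(requests, normalize=lambda url: url):
    """Replay requests against a set acting as the cache; return hits / total."""
    cache, hits = set(), 0
    for url in requests:
        key = normalize(url)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: the backend is asked and the result stored
    return hits / len(requests)

def strip_empty_query(url):
    return re.sub(r"\?$", "", url)

requests = ["/page?", "/page", "/page?", "/page"]
print(hit_ratio(requests))                     # 0.5
print(hit_ratio(requests, strip_empty_query))  # 0.75
```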
Query String Sorting for Cache Consistency
URLs with identical parameters in different orders should result in the same cached response, but by default they don’t.
sub vcl_recv {
...
# Requires "import std;" at the top of your VCL file
# Sorts query string parameters alphabetically for cache normalization purposes, only when there are multiple parameters
if (req.url ~ "\?.+&.+") {
set req.url = std.querysort(req.url);
}
...
}
The Query String Problem
Consider these URLs that represent the same content:
https://example.com/search?category=books&sort=price&color=blue
https://example.com/search?color=blue&category=books&sort=price
https://example.com/search?sort=price&color=blue&category=books
Without query string sorting, Varnish creates three separate cache entries for identical content, drastically reducing cache efficiency.
How It Works
- Multi-Parameter Detection: The regex \?.+&.+ checks if the URL has multiple query parameters:
  - \?: Escaped question mark (start of query string)
  - .+: One or more characters (first parameter)
  - &: Ampersand separator (indicates multiple parameters)
  - .+: One or more characters (second/additional parameters)
- Conditional Sorting: The if condition ensures we only call std.querysort() when needed:
  - Single-parameter URLs like /page?id=123 skip sorting (no & present)
  - Multi-parameter URLs get sorted
  - This minor optimization avoids unnecessary function calls
- Alphabetical Sorting: The std.querysort() function (from the Varnish standard library) sorts parameters alphabetically by key:
  - /page?z=1&a=2&m=3 becomes /page?a=2&m=3&z=1
  - Maintains parameter values unchanged
  - Creates consistent cache keys regardless of parameter order
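To see why sorting yields one cache key, the behavior can be approximated in Python. This is a rough model rather than the vmod's implementation: the function name `querysort` is borrowed for readability, and it assumes that whole key=value segments are compared in byte order, which is worth verifying against your Varnish version.

```python
import re

def querysort(url: str) -> str:
    """Approximate model: sort the '&'-separated segments of the query string."""
    # Same guard as the VCL: only act when at least two parameters exist.
    if not re.search(r"\?.+&.+", url):
        return url
    path, _, query = url.partition("?")
    return path + "?" + "&".join(sorted(query.split("&")))

variants = [
    "/search?category=books&sort=price&color=blue",
    "/search?color=blue&category=books&sort=price",
    "/search?sort=price&color=blue&category=books",
]
print({querysort(u) for u in variants})
```

All three orderings collapse to the single key `/search?category=books&color=blue&sort=price`.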
Why Skip Single Parameters?
The condition \?.+&.+ specifically checks for the & character, which only appears when multiple parameters exist. This is a small VCL optimization:
# Without the multi-parameter check
set req.url = std.querysort(req.url); # Called on every request with query string
# With the multi-parameter check (recommended)
if (req.url ~ "\?.+&.+") {
set req.url = std.querysort(req.url); # Only called when sorting is needed
}
For URLs like /page?id=123 (single parameter), there’s nothing to sort, so we skip the function call entirely.
Real-World Impact
Query parameter ordering is especially problematic with:
E-commerce filters:
/products?category=electronics&brand=sony&price_max=1000
/products?price_max=1000&brand=sony&category=electronics
/products?brand=sony&price_max=1000&category=electronics
All represent the same filtered product list, but create 3 cache entries without sorting.
Search results:
/search?q=varnish&sort=relevance&page=1
/search?page=1&q=varnish&sort=relevance
Identical search results, different cache entries.
API endpoints:
/api/users?limit=10&offset=20&sort=name
/api/users?sort=name&limit=10&offset=20
Same API response, fragmented cache.
Cache Efficiency Example
Consider an e-commerce site with 3 common filters (category, price, brand):
- Without sorting: 3! = 6 possible orderings = 6 cache entries per unique filter combination
- With sorting: 1 canonical ordering = 1 cache entry per unique filter combination
- Result: 6x reduction in cache fragmentation
For 1,000 unique filter combinations:
- Without sorting: 6,000 cache entries
- With sorting: 1,000 cache entries
- Memory saved: 83%
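The arithmetic behind these figures is short enough to check directly. The sketch below assumes the worst case in which every possible ordering of every filter combination is actually requested; the function name is hypothetical.

```python
from math import factorial

def fragmentation(filters: int, unique_combinations: int):
    """Return (orderings, entries without sorting, entries with sorting)."""
    orderings = factorial(filters)  # 3 filters -> 3! = 6 orderings
    return orderings, unique_combinations * orderings, unique_combinations

orderings, unsorted_entries, sorted_entries = fragmentation(3, 1_000)
saved = 1 - sorted_entries / unsorted_entries
print(orderings, unsorted_entries, sorted_entries, f"{saved:.0%}")
# 6 6000 1000 83%
```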
Important Considerations
- Parameter Values Unchanged: std.querysort() reorders whole parameters; it never rewrites the contents of a value:
  Before: /page?ids=3,1,2&sort=desc
  After: /page?ids=3,1,2&sort=desc # ids value order preserved
- Case Sensitivity: Parameter names are case-sensitive:
  /page?Category=books&category=books # Different parameters!
  Consider adding case normalization if needed.
- Array Parameters: Some applications use repeated parameter names:
  /page?tag=red&tag=blue&tag=green
  std.querysort() may reorder these repeated parameters, so confirm that your application does not depend on the order in which they arrive.
- Fragment Identifiers: URL fragments (#section) are client-side only and never sent to servers, so they don’t affect caching.
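If you decide that case normalization is appropriate, one possible shape is to lowercase parameter names while leaving values untouched. The Python sketch below only illustrates the idea: `lowercase_keys` is a hypothetical helper, and the approach is safe only when your application already treats parameter names case-insensitively.

```python
def lowercase_keys(url: str) -> str:
    """Lowercase query parameter names; values are left exactly as received."""
    path, sep, query = url.partition("?")
    if not sep:
        return url
    parts = []
    for segment in query.split("&"):
        key, eq, value = segment.partition("=")
        parts.append(key.lower() + eq + value)
    return path + "?" + "&".join(parts)

print(lowercase_keys("/page?Category=Books&sort=desc"))
# /page?category=Books&sort=desc
```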
Removing Tracking and Marketing Parameters
Marketing and analytics platforms add tracking parameters to URLs that don’t affect content but severely fragment your cache. When users click links from social media, email campaigns, or ads, third-party platforms automatically append tracking parameters that create multiple cache entries for identical content.
Key insight: These tracking parameters are typically processed by JavaScript running in the user’s browser, not by your server application. Since JavaScript can read parameters from window.location.search, your server doesn’t need to see them—the client-side analytics code extracts and sends them directly to the analytics platform.
Here’s a comprehensive VCL snippet that handles virtually all known tracking parameters from major platforms:
sub vcl_recv {
...
# Remove all marketing get parameters to minimize the cache objects
if (req.url ~ "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
# The (?<=[?&]) lookbehind anchors each name to the start of a parameter, so short names such as ie or si cannot match the tail of a longer name (for example ie= inside movie=)
set req.url = regsuball(req.url, "(?<=[?&])(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+(){}%.]+&?", "");
set req.url = regsub(req.url, "[?|&]+$", "");
}
...
}
This extended list covers tracking parameters from dozens of platforms and services that automatically append parameters outside your control:
Major advertising platforms:
- Google Ads: gclid, gclsrc, gbraid, wbraid, gad_source
- Facebook/Meta: fbclid
- Twitter: twclid
- TikTok: ttclid
- Bing/Microsoft: msclkid
- Yahoo: yclid
Analytics platforms:
- Google Analytics: _ga, _gl, utm_* (multiple variants)
- Matomo/Piwik: matomo_*, mtm_*, piwik_*, pk_*
Email marketing:
- Mailchimp: mc_cid, mc_eid, mc_* (pattern match)
- SMS campaigns: sms_click, sms_source, sms_uph
Affiliate and commerce:
- eBay: mkcid, mkevt, mkrid, mkwid, toolid
- Rakuten: rtid, zanpid
- Impact: irclickid
Social media:
- Instagram: igshid
- Branch.io (deep linking): _branch_match_id
Specialized tracking:
- Pinterest: epik
- Bronto (email marketing): _bta_c, _bta_tid, _bta_* (pattern match)
- Custom tracking: trk_*, redirect_*
Pattern Matching for Flexibility
Notice the regex patterns at the end of the parameter list:
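- mc_[a-z]+: Matches any Mailchimp parameter starting with mc_
- utm_[a-z]+: Matches any UTM parameter, including future additions
- _bta_[a-z]+: Matches any parameter starting with _bta_
This approach helps future-proof your VCL against new tracking parameters from existing platforms. Because [a-z]+ matches lowercase letters only, names with a further underscore or a digit (such as utm_source_platform) still need to be listed explicitly.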
How It Works
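- Two-Stage Approach: The pattern is checked twice for efficiency:
  - First check (if condition): Quick regex to see if ANY tracking parameter exists
  - Second operation (regsuball): More expensive operation to remove all matches
  - The initial check short-circuits when no tracking parameters are present, so requests without them never pay for the regsuball() call
- Enhanced Value Matching: The removal pattern [-_A-z0-9+(){}%.]+ matches a wide range of characters found in tracking parameter values, accommodating encoded characters and complex tracking IDs. Note that the A-z range also covers the punctuation characters that sit between Z and a in ASCII ([, \, ], ^, _ and `), and that the + quantifier requires a non-empty value, so a parameter such as utm_source= with nothing after the equals sign is left in place
- Cleanup: The final regsub uses [?|&]+$ to remove any trailing combination of ? or & characters (inside a character class the | is literal, so a trailing | is removed as well)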
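The detect, remove, and clean-up stages can be modeled in a few lines of Python. This is a scaled-down sketch, not the article's VCL: the three-name list is a hypothetical stand-in for the full list, and the lookbehind that ties each name to a preceding ? or & is a choice made for this sketch so that short names cannot match inside longer ones.

```python
import re

# Hypothetical short list; a longer alternation works the same way.
NAMES = r"(?:fbclid|gclid|utm_[a-z]+)"
DETECT = re.compile(r"[?&]" + NAMES + "=")
REMOVE = re.compile(r"(?<=[?&])" + NAMES + r"=[-_A-Za-z0-9+(){}%.]+&?")

def strip_tracking(url: str) -> str:
    if not DETECT.search(url):         # stage 1: cheap presence check
        return url
    url = REMOVE.sub("", url)          # stage 2: remove every tracking parameter
    return re.sub(r"[?&]+$", "", url)  # stage 3: drop a dangling ? or &

print(strip_tracking("/products/widget?utm_source=facebook&utm_campaign=spring&fbclid=xyz123"))
print(strip_tracking("/p?id=7&utm_source=x"))
```

The first call prints `/products/widget` and the second prints `/p?id=7`, so functional parameters survive while tracking parameters are dropped.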
Real-World Example
Consider a product page with tracking from multiple campaigns:
Without parameter removal:
/products/widget?utm_source=facebook&utm_campaign=spring&fbclid=xyz123
/products/widget?utm_source=twitter&utm_campaign=spring&twclid=abc456
/products/widget?gclid=def789
/products/widget
→ 4 cache entries for identical content
With parameter removal:
All normalize to: /products/widget
→ 1 cache entry
Real-World Impact
For a content site running marketing campaigns across multiple platforms:
- Daily requests: 10 million
- URLs with tracking parameters: 70% (7 million)
- Tracking parameter combinations per page: 100+ variants
- Cache entries without stripping: 100 entries per unique page
- Cache entries with stripping: 1 entry per unique page
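- Cache hit ratio improvement: 30% → 85%
- Backend load reduction: 55 percentage points (backend-bound requests fall from 70% to 15% of traffic, a relative drop of about 79%)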
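The backend figures follow from the two hit ratios. The sketch below works through them for the 10 million daily requests in this scenario, assuming every cache miss becomes exactly one backend request; the function name is hypothetical.

```python
def backend_requests(total: int, hit_percent: int) -> int:
    """Requests that miss the cache and reach the backend."""
    return total * (100 - hit_percent) // 100

total = 10_000_000
before = backend_requests(total, 30)  # 7,000,000 misses at a 30% hit ratio
after = backend_requests(total, 85)   # 1,500,000 misses at an 85% hit ratio

points = (before - after) * 100 // total  # drop as a share of all traffic
relative = (before - after) / before      # drop relative to the old backend load
print(before, after, points, f"{relative:.0%}")
# 7000000 1500000 55 79%
```

The 55 in the list above is therefore a drop of 55 percentage points of total traffic; measured against the original backend load, the reduction is closer to 79%.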
Best Practices
- Start Conservative: Begin with a small list and add parameters as you observe them in your logs
- Monitor for False Positives: Ensure you’re not accidentally removing functional parameters that affect content. Generic names in the list above, such as origin, si, ie, cx, and siteurl, are the most likely to collide with parameters your own application uses
- Coordinate with Teams: Reassure marketing that analytics will still work (client-side processing is unaffected), and verify with development that no application logic depends on these parameters server-side
- Customize: Remove parameters for platforms you don’t use to simplify the regex
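One way to start conservatively is to count which parameter names actually appear in your traffic before deciding what to strip. The Python sketch below assumes you have already extracted request URLs from your logs into a list; the sample URLs and the function name are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def count_param_names(urls):
    """Count how often each query parameter name appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts

sample = [
    "/products?utm_source=twitter&utm_campaign=spring",
    "/products?utm_source=facebook&fbclid=xyz123",
    "/search?q=varnish&page=1",
]
for name, seen in count_param_names(sample).most_common():
    print(f"{seen:>3}  {name}")
```

Names that appear often but that your application never reads are candidates for the strip list; names such as `q` and `page` clearly are not.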
Extended URL Normalization
You can expand normalization patterns to handle other URL variations:
sub vcl_recv {
...
# Remove empty query strings
if (req.url ~ "\?$") {
set req.url = regsub(req.url, "\?$", "");
}
# Remove trailing slashes (except root) - This can break some applications and redirects
if (req.url ~ "^/(.+)/$") {
set req.url = regsub(req.url, "/$", "");
}
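# Convert multiple slashes to single slash
# Skipped when a query string is present, so values such as ?next=https://example.com are left intact
if (req.url !~ "\?" && req.url ~ "//+") {
set req.url = regsuball(req.url, "//+", "/");
}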
...
}
Important: Always test trailing slash removal thoroughly. Some applications and frameworks rely on the distinction between /path and /path/ for routing or generating canonical URLs.
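The edge cases of these two rules are easy to probe in Python before touching production VCL. The sketch below models the raw substitutions only; the function names are hypothetical, and the last call shows why a bare slash-collapsing substitution needs care when a query string contains a URL.

```python
import re

def strip_trailing_slash(url: str) -> str:
    """Model of the trailing-slash rule; the root path '/' is never touched."""
    if re.search(r"^/(.+)/$", url):
        return re.sub(r"/$", "", url)
    return url

def collapse_slashes(url: str) -> str:
    """Model of an unguarded regsuball(req.url, "//+", "/")."""
    return re.sub(r"//+", "/", url)

print(strip_trailing_slash("/blog/"))         # /blog
print(strip_trailing_slash("/"))              # /
print(strip_trailing_slash("/blog/?page=2"))  # unchanged: does not end in "/"
print(collapse_slashes("/a//b///c"))          # /a/b/c
print(collapse_slashes("/go?next=https://example.com"))  # damages the value
```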
Combining Normalization Techniques
For maximum cache efficiency, combine multiple normalization strategies in the correct order:
sub vcl_recv {
...
# Step 1: Remove empty query strings
if (req.url ~ "\?$") {
set req.url = regsub(req.url, "\?$", "");
}
# Step 2: Remove tracking parameters
if (req.url ~ "(\?|&)(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
# The (?<=[?&]) lookbehind anchors each name to the start of a parameter, so short names such as ie or si cannot match the tail of a longer name (for example ie= inside movie=)
set req.url = regsuball(req.url, "(?<=[?&])(_branch_match_id|srsltid|_bta_c|_bta_tid|_ga|_gl|_ke|_kx|campid|cof|customid|cx|dclid|dm_i|ef_id|epik|fbclid|gad_source|gbraid|gclid|gclsrc|gdffi|gdfms|gdftrk|hsa_acc|hsa_ad|hsa_cam|hsa_grp|hsa_kw|hsa_mt|hsa_net|hsa_src|hsa_tgt|hsa_ver|ie|igshid|irclickid|matomo_campaign|matomo_cid|matomo_content|matomo_group|matomo_keyword|matomo_medium|matomo_placement|matomo_source|mc_cid|mc_eid|mkcid|mkevt|mkrid|mkwid|msclkid|mtm_campaign|mtm_cid|mtm_content|mtm_group|mtm_keyword|mtm_medium|mtm_placement|mtm_source|nb_klid|ndclid|origin|pcrid|piwik_campaign|piwik_keyword|piwik_kwd|pk_campaign|pk_keyword|pk_kwd|redirect_log_mongo_id|redirect_mongo_id|rtid|sb_referer_host|ScCid|si|siteurl|s_kwcid|sms_click|sms_source|sms_uph|toolid|trk_contact|trk_module|trk_msg|trk_sid|ttclid|twclid|utm_campaign|utm_content|utm_creative_format|utm_id|utm_marketing_tactic|utm_medium|utm_source|utm_source_platform|utm_term|wbraid|yclid|zanpid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+(){}%.]+&?", "");
set req.url = regsub(req.url, "[?|&]+$", "");
}
# Step 3: Sort remaining parameters (only if multiple exist)
if (req.url ~ "\?.+&.+") {
set req.url = std.querysort(req.url);
}
# Step 4: Remove trailing slashes
if (req.url ~ "^/(.+)/$") {
set req.url = regsub(req.url, "/$", "");
}
...
}
Order matters! Remove unwanted parameters before sorting to avoid sorting parameters you’ll delete anyway.
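The four steps can be chained in a Python model to check that the example URLs from the start of this article really converge. This is a sketch under assumptions: the tracking list is a three-name stand-in for the full list, the sort approximates std.querysort() by ordering whole key=value segments, and `normalize` is a hypothetical name.

```python
import re

TRACKING = r"(?:fbclid|gclid|utm_[a-z]+)"  # hypothetical short list

def normalize(url: str) -> str:
    url = re.sub(r"\?$", "", url)                         # step 1: empty query
    if re.search(r"[?&]" + TRACKING + "=", url):          # step 2: tracking
        url = re.sub(r"(?<=[?&])" + TRACKING + r"=[-_A-Za-z0-9+(){}%.]+&?", "", url)
        url = re.sub(r"[?&]+$", "", url)
    if re.search(r"\?.+&.+", url):                        # step 3: sort
        path, _, query = url.partition("?")
        url = path + "?" + "&".join(sorted(query.split("&")))
    if re.search(r"^/(.+)/$", url):                       # step 4: trailing slash
        url = re.sub(r"/$", "", url)
    return url

variants = [
    "/products",
    "/products?",
    "/products?utm_source=twitter&utm_campaign=spring",
    "/products?utm_campaign=spring&utm_source=twitter",
    "/products/?",
]
print({normalize(u) for u in variants})  # {'/products'}
```

Running the tracking step before the sort also means that a URL such as `/search?sort=price&gclid=abc&color=blue` is sorted only on the parameters that remain.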
URL Normalization Best Practices
- Apply Early: Place normalization logic at the beginning of vcl_recv before any cache lookups or routing decisions.
- Be Consistent: The built-in vcl_hash builds the cache key from req.url, so rewriting req.url in vcl_recv normalizes the cache key and the backend request together. If you define your own vcl_hash, make sure it hashes the normalized URL.
- Test Thoroughly: Ensure normalization doesn’t break functionality. Some applications may rely on empty query strings or specific URL formats.
- Document Your Rules: URL normalization rules should be clearly documented, especially when removing query parameters that might be meaningful to some users.
- Consider Application Behavior: Some single-page applications (SPAs) or tracking systems may generate URLs with query parameters. Understand your application’s URL patterns before normalizing.
- Monitor Impact: Track cache hit ratio improvements and watch for any broken functionality after implementing normalization.
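For monitoring, the hit ratio can be derived from the MAIN.cache_hit and MAIN.cache_miss counters that varnishstat reports. The Python sketch below shows only the calculation; the counter values are hypothetical, and passed requests, which Varnish counts separately, are left out.

```python
def cache_hit_ratio(cache_hit: int, cache_miss: int) -> float:
    """Hits divided by all cache lookups; 0.0 when nothing has been looked up."""
    lookups = cache_hit + cache_miss
    return cache_hit / lookups if lookups else 0.0

# Hypothetical counter readings taken before and after a VCL change.
print(f"{cache_hit_ratio(300, 700):.0%}")  # 30%
print(f"{cache_hit_ratio(850, 150):.0%}")  # 85%
```

Comparing the ratio over equal time windows before and after deploying a rule gives a more reliable picture than a single reading.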
URL normalization provides several advantages:
- Higher Cache Hit Ratio: Fewer unique URLs mean more cache hits
- Reduced Memory Usage: Fewer cached objects consume less memory
- Lower Backend Load: More cache hits mean fewer backend requests
- Faster Response Times: Cached responses are orders of magnitude faster than backend requests
- Improved Origin Performance: Less load on your application servers
Real-World Example
For a high-traffic site serving 1 million requests per day:
- Without normalization: 5% of URLs have a trailing ? = 50,000 requests per day that can create or look up duplicate cache entries
- With normalization: These 50,000 requests now hit existing cache entries
- Impact: Up to 50,000 fewer backend requests, reduced memory usage, faster response times
This short VCL snippet can have a measurable impact on your application’s performance and infrastructure costs.
For a site receiving 10 million requests per day with 30% containing multiple query parameters:
- Requests affected: 3 million
- Average parameter orderings: 4 variations per unique parameter set
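- Cache entries without sorting: about 4 per unique parameter set
- Cache entries with sorting: 1 per unique parameter set
- Requests served from cache: ~2.25 million per day (assuming a 75% cache hit rate on normalized URLs)
In a scenario like this, sorting alone could reduce backend load by roughly 20-30% for parameter-heavy applications. Treat that range as an estimate, since the actual gain depends on how often parameter order really varies in your traffic.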
Conclusion
URL normalization is one of the highest-impact, lowest-effort optimizations you can implement in Varnish. By removing empty query strings, sorting parameters, and stripping tracking parameters, you can dramatically improve cache hit ratios and reduce backend load.
These simple VCL patterns consolidate duplicate cache entries, freeing memory and ensuring that more requests are served from cache. Start with these foundational normalization patterns, monitor your cache hit ratios, and adjust based on your application’s specific URL patterns and traffic characteristics.
In our final article, Varnish 104: Advanced Traffic Filtering, we’ll explore how to use classification headers to block malicious traffic based on abuse scores, geographic location, and network operators, adding an advanced security layer to your Varnish configuration.
Last modified on April 14, 2026