Cache efficiency can make or break your application’s performance. While Varnish is incredibly powerful at caching, small URL variations can fragment your cache and dramatically reduce hit ratios. A single page with tracking parameters, query string variations, or trailing punctuation can create dozens of duplicate cache entries, each consuming memory and forcing unnecessary backend requests. The techniques covered in this article are essential for anyone looking to serve traffic at scale.
The Cache Fragmentation Problem
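Several URLs can represent exactly the same content: the same page may be requested with or without a trailing ?, with its query parameters in a different order, or with tracking parameters appended. Unless these URLs are normalized, each variation gets its own cache entry, which leads to:
- Wasted memory: Storing duplicate content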
- Lower cache hit ratio: More requests miss cache and hit backend
- Higher backend load: Unnecessary duplicate requests to application servers
- Slower response times: More requests waiting for backend processing
URL Normalization Strategies
On Upsun, Varnish Cache combined with URL normalization rules written in VCL can dramatically improve cache efficiency. Let’s explore proven techniques that consolidate cache entries and maximize hit ratios.
Removing Empty Query Strings
One of the simplest yet most effective optimizations is removing trailing question marks from URLs.
Why This Matters
Empty query strings often appear when:
- Users manually type URLs with a trailing ?
- Applications generate links with optional parameters that aren’t always present
- URL builders concatenate parameters conditionally, leaving a trailing ? when none apply
Without normalization, these two URLs are cached as separate objects even though they return the same content:
- https://example.com/products/widget
- https://example.com/products/widget?
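A minimal sketch of this rule, assuming it is placed inside vcl_recv so that it runs before the cache lookup (backend selection and any other request logic are omitted here):

```vcl
sub vcl_recv {
    # Strip a lone trailing "?" so that /products/widget and
    # /products/widget? resolve to the same cache entry.
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }
}
```

Only the URL used inside Varnish changes; the address shown in the visitor’s browser is not affected.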
How It Works
- Pattern Detection: The condition if (req.url ~ "\?$") uses regex matching to detect URLs ending with a question mark:
  - ~: Regex match operator
  - \?: Escaped question mark (literal ? character)
  - $: End-of-string anchor
- URL Rewriting: The regsub() function performs string substitution:
  - First parameter (req.url): The string to modify
  - Second parameter ("\?$"): The regex pattern to find (trailing ?)
  - Third parameter (""): The replacement string (empty, effectively removing the ?)
- In-Place Modification: set req.url = ... updates the request URL before cache lookup or backend request.
The Impact on Caching
Without normalization, /products/widget and /products/widget? are stored as two separate cache entries. With normalization, both requests resolve to the single entry for /products/widget.
Query String Sorting for Cache Consistency
URLs with identical parameters in different orders should result in the same cached response, but by default they don’t.
The Query String Problem
Consider these URLs that represent the same content:
- https://example.com/search?category=books&sort=price&color=blue
- https://example.com/search?color=blue&category=books&sort=price
- https://example.com/search?sort=price&color=blue&category=books
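A minimal sketch of a sorting rule for URLs like these, assuming the std VMOD is available and that the rule lives in vcl_recv (other request logic is omitted):

```vcl
import std;

sub vcl_recv {
    # Only call std.querysort() when the query string holds at least
    # two parameters, i.e. when there is an "&" after the "?".
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);
    }
}
```

With this in place, the three search URLs above are rewritten to the same string before the cache lookup, so they share a single cache entry.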
How It Works
- Multi-Parameter Detection: The regex \?.+&.+ checks if the URL has multiple query parameters:
  - \?: Escaped question mark (start of query string)
  - .+: One or more characters (first parameter)
  - &: Ampersand separator (indicates multiple parameters)
  - .+: One or more characters (second/additional parameters)
- Conditional Sorting: The if condition ensures we only call std.querysort() when needed:
  - Single-parameter URLs like /page?id=123 skip sorting (no & present)
  - Multi-parameter URLs get sorted
  - This minor optimization avoids unnecessary function calls
- Alphabetical Sorting: The std.querysort() function (from the Varnish standard library) sorts the query parameters alphabetically:
  - /page?z=1&a=2&m=3 becomes /page?a=2&m=3&z=1
  - Maintains parameter values unchanged
  - Creates consistent cache keys regardless of parameter order
Why Skip Single Parameters?
The condition \?.+&.+ specifically checks for the & character, which only appears when multiple parameters exist. This is a small VCL optimization: for /page?id=123 (single parameter), there’s nothing to sort, so we skip the function call entirely.
Real-World Impact
Query parameter ordering is especially problematic with e-commerce filters, where shoppers can apply the same set of filters in any order.
Cache Efficiency Example
Consider an e-commerce site with 3 common filters (category, price, brand):
- Without sorting: 3! = 6 possible orderings = 6 cache entries per unique filter combination
- With sorting: 1 canonical ordering = 1 cache entry per unique filter combination
- Result: 6x reduction in cache fragmentation
For 1,000 unique filter combinations:
- Without sorting: 6,000 cache entries
- With sorting: 1,000 cache entries
- Memory saved: about 83%
Important Considerations
- Parameter Values Unchanged: std.querysort() only reorders parameters; it never modifies their values.
- Case Sensitivity: Parameter names are case-sensitive, so ?Color=blue and ?color=blue still produce different cache keys. Consider adding case normalization if needed.
- Array Parameters: Some applications use repeated parameter names, such as color=red&color=blue. std.querysort() handles these, but behavior depends on your application’s expectations.
- Fragment Identifiers: URL fragments (#section) are client-side only and never sent to servers, so they don’t affect caching.
Removing Tracking and Marketing Parameters
Marketing and analytics platforms add tracking parameters to URLs that don’t affect content but severely fragment your cache. When users click links from social media, email campaigns, or ads, third-party platforms automatically append tracking parameters that create multiple cache entries for identical content.
Key insight: These tracking parameters are typically processed by JavaScript running in the user’s browser, not by your server application. Since JavaScript can read parameters from window.location.search, your server doesn’t need to see them. The client-side analytics code extracts them and sends them directly to the analytics platform.
A single VCL rule in vcl_recv can strip the tracking parameters added by most major platforms before the cache lookup. The rest of this section lists the parameters such a rule covers and explains how it works.
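The sketch below shows one way such a rule might look. It is deliberately abbreviated: only a handful of the parameters listed under Platform Coverage are included, and the value character class and cleanup pattern follow the description under How It Works. Treat it as an illustration under those assumptions rather than a complete or authoritative list.

```vcl
sub vcl_recv {
    # Stage 1: cheap check for any listed tracking parameter.
    # The parameter list is an abbreviated, illustrative subset.
    if (req.url ~ "(\?|&)(fbclid|gclid|gclsrc|gbraid|wbraid|msclkid|twclid|ttclid|igshid|_ga|_gl|mc_[a-z]+|utm_[a-z]+)=") {
        # Stage 2: remove every matching name=value pair, together
        # with the "&" that follows it, if there is one.
        set req.url = regsuball(req.url, "(fbclid|gclid|gclsrc|gbraid|wbraid|msclkid|twclid|ttclid|igshid|_ga|_gl|mc_[a-z]+|utm_[a-z]+)=[-_A-z0-9+(){}%.]+&?", "");
        # Cleanup: drop any "?" or "&" left dangling at the end.
        set req.url = regsub(req.url, "[?|&]+$", "");
    }
}
```

With this rule, a request for /product?id=5&utm_source=news&fbclid=abc is looked up as /product?id=5. Because the removal pattern is not anchored to ? or &, a functional parameter whose name merely ends in one of the listed names (a hypothetical my_gclid, for instance) could be partially rewritten, so check the list against your application’s own parameters before using it.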
Platform Coverage
This extended list covers tracking parameters from dozens of platforms and services that automatically append parameters outside your control:
Major advertising platforms:
- Google Ads: gclid, gclsrc, gbraid, wbraid, gad_source
- Facebook/Meta: fbclid
- Twitter: twclid
- TikTok: ttclid
- Bing/Microsoft: msclkid
- Yahoo: yclid
Analytics platforms:
- Google Analytics: _ga, _gl, utm_* (multiple variants)
- Matomo/Piwik: matomo_*, mtm_*, piwik_*, pk_*
Email and SMS marketing:
- Mailchimp: mc_cid, mc_eid, mc_* (pattern match)
- SMS campaigns: sms_click, sms_source, sms_uph
Affiliate and e-commerce networks:
- eBay: mkcid, mkevt, mkrid, mkwid, toolid
- Rakuten: rtid, zanpid
- Impact: irclickid
Social and mobile:
- Instagram: igshid
- Branch.io (deep linking): _branch_match_id
Other:
- Pinterest: epik
- Bing Shopping: _bta_c, _bta_tid, _bta_* (pattern match)
- Custom tracking: trk_*, redirect_*
Pattern Matching for Flexibility
Notice the regex patterns at the end of the parameter list:
- mc_[a-z]+: Matches any Mailchimp parameter starting with mc_
- utm_[a-z]+: Matches any UTM parameter, including future additions
- _bta_[a-z]+: Matches any Bing tracking parameter
How It Works
- Two-Stage Approach: The pattern is checked twice for efficiency:
  - First check (if condition): Quick regex to see if ANY tracking parameter exists
  - Second operation (regsuball): More expensive operation to remove all matches
  - The initial check short-circuits when no tracking parameters are present, avoiding expensive regsuball() calls on 40-50% of traffic
- Enhanced Value Matching: The removal pattern [-_A-z0-9+(){}%.]+ matches a wide range of characters found in tracking parameter values, accommodating encoded characters and complex tracking IDs
- Cleanup: The final regsub uses [?|&]+$ to remove any trailing combination of ? or & characters
Real-World Example
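Consider a product page with tracking from multiple campaigns: each campaign link produces a different URL, but once the tracking parameters are stripped, every variant resolves to the same cache entry.
Real-World Impact
For a content site running marketing campaigns across multiple platforms:
- Daily requests: 10 million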
- URLs with tracking parameters: 70% (7 million)
- Tracking parameter combinations per page: 100+ variants
- Cache entries without stripping: 100 entries per unique page
- Cache entries with stripping: 1 entry per unique page
- Cache hit ratio improvement: 30% → 85%
- Backend load reduction: 55%
Best Practices
- Start Conservative: Begin with a small list and add parameters as you observe them in your logs
- Monitor for False Positives: Ensure you’re not accidentally removing functional parameters that affect content
- Coordinate with Teams: Reassure marketing that analytics will still work (client-side processing is unaffected), and verify with development that no application logic depends on these parameters server-side
- Customize: Remove parameters for platforms you don’t use to simplify the regex
Extended URL Normalization
You can expand normalization patterns to handle other URL variations, such as trailing slashes. Be careful, though: some applications distinguish between /path and /path/ for routing or generating canonical URLs.
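One such extension is trailing-slash normalization. The sketch below is an assumption about what a rule of this kind could look like, not a rule taken from this article: it removes a single trailing slash from the path, leaves the root URL / alone, and skips any URL that carries a query string.

```vcl
sub vcl_recv {
    # Match paths such as /path/ (no "?" anywhere in the URL),
    # but not the bare root "/".
    if (req.url ~ "^/[^?]+/$") {
        # Remove exactly one trailing slash: /path/ becomes /path.
        set req.url = regsub(req.url, "/$", "");
    }
}
```

Only enable a rule like this after confirming that your application serves the same response for both forms of the URL.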
Combining Normalization Techniques
For maximum cache efficiency, combine multiple normalization strategies in the correct order: rules that remove parameters should run before rules that tidy up or sort what remains.
URL Normalization Best Practices
- Apply Early: Place normalization logic at the beginning of vcl_recv before any cache lookups or routing decisions.
- Be Consistent: Normalize URLs the same way on both request and cache key generation.
- Test Thoroughly: Ensure normalization doesn’t break functionality. Some applications may rely on empty query strings or specific URL formats.
- Document Your Rules: URL normalization rules should be clearly documented, especially when removing query parameters that might be meaningful to some users.
- Consider Application Behavior: Some single-page applications (SPAs) or tracking systems may generate URLs with query parameters. Understand your application’s URL patterns before normalizing.
- Monitor Impact: Track cache hit ratio improvements and watch for any broken functionality after implementing normalization.
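Putting the techniques from this article together, one possible arrangement in vcl_recv is sketched below. The ordering (strip tracking parameters, remove an empty query string, then sort) is an assumption chosen so that each step works on the output of the previous one, and the tracking list is again an abbreviated, illustrative subset. Backend selection and other request logic are omitted.

```vcl
import std;

sub vcl_recv {
    # 1. Strip tracking parameters (abbreviated list).
    if (req.url ~ "(\?|&)(fbclid|gclid|msclkid|utm_[a-z]+)=") {
        set req.url = regsuball(req.url, "(fbclid|gclid|msclkid|utm_[a-z]+)=[-_A-z0-9+(){}%.]+&?", "");
        set req.url = regsub(req.url, "[?|&]+$", "");
    }

    # 2. Remove an empty query string (a lone trailing "?").
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

    # 3. Sort whatever parameters remain, if there are at least two.
    if (req.url ~ "\?.+&.+") {
        set req.url = std.querysort(req.url);
    }
}
```

Running the removal step first means the sort only has to handle parameters that actually influence the response.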
Performance Benefits
URL normalization provides several advantages:
- Higher Cache Hit Ratio: Fewer unique URLs mean more cache hits
- Reduced Memory Usage: Fewer cached objects consume less memory
- Lower Backend Load: More cache hits mean fewer backend requests
- Faster Response Times: Cached responses are orders of magnitude faster than backend requests
- Improved Origin Performance: Less load on your application servers
Real-World Example
For a high-traffic site serving 1 million requests per day:
- Without normalization: 5% of requests have a trailing ?, which can add up to 50,000 duplicate cache entries
- With normalization: These 50,000 requests now hit existing cache entries
- Impact: 50,000 fewer backend requests, reduced memory usage, faster response times
Performance Impact by the Numbers
For a site receiving 10 million requests per day with 30% containing multiple query parameters:
- Requests affected: 3 million
- Average parameter orderings: 4 variations per unique parameter set
- Cache entries without sorting: 12 million
- Cache entries with sorting: 3 million
- Backend requests saved: ~2.25 million per day (assuming 75% cache hit rate on normalized URLs)