Rate limiting is essential for protecting your application from abuse and ensuring fair resource allocation. While Varnish is best known for caching, its vsthrottle VMOD provides powerful rate limiting capabilities that can protect your backend infrastructure from being overwhelmed by excessive requests, whether from malicious actors or legitimate traffic spikes.
Why Rate Limiting in Varnish?
Rate limiting at the Varnish layer offers several key advantages:
- Early Termination: Excessive requests are blocked at the edge, never reaching your application servers
- Resource Protection: Prevents any single client from monopolizing backend resources
- DDoS Mitigation: Provides a first line of defense against distributed denial-of-service attacks
- Fair Resource Allocation: Ensures all users get reasonable access to your application
- Reduced Infrastructure Costs: Less load on backend servers means more efficient resource usage
Understanding vsthrottle
The vsthrottle VMOD (Varnish Module) provides the core functionality for rate limiting. It works by tracking request counts per identifier (typically IP address) within sliding time windows.
The primary function is vsthrottle.is_denied(), which takes four parameters:
- key: Identifier for tracking (usually IP address)
- limit: Maximum number of requests allowed
- period: Time window for counting requests
- block_duration: How long to block after exceeding the limit
General Request Rate Limiting
The most common rate limiting pattern restricts how many requests a client can make within a time window.
How It Works
- Path Exclusion: The first condition, if (req.url !~ "^/(media|static|banner|admin)/"), excludes certain paths from rate limiting:
  - !~ is a negative regex match (the URL does NOT match the pattern)
  - Static assets (/media, /static) are excluded because they’re cacheable and typically high-volume
  - Admin paths are excluded (these might have their own authentication/rate limiting)
- vsthrottle.is_denied() Function: This is the core rate limiting logic with four parameters:
  - Parameter 1 (req.http.X-Client-IP): The identifier key for rate limiting. Uses the client’s IP address to track individual users.
  - Parameter 2 (30): Request threshold, the maximum number of requests allowed
  - Parameter 3 (15s): Time window for counting requests (15 seconds)
  - Parameter 4 (15s): Penalty period, how long to block after exceeding the limit
- Rate Limiting Logic: “30 requests per 15 seconds” means:
  - Client can make up to 30 requests in any 15-second rolling window
  - Once exceeded, the client is blocked completely for 15 seconds
  - After the penalty period, the counter resets
- HTTP 429 Response: Returns a standard “Too Many Requests” status code with a clear message indicating the wait time.
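The steps above can be sketched in VCL roughly as follows. This is a reconstruction from the description rather than the original listing: the path regex, the X-Client-IP key, and the 30 / 15s / 15s values come from the text, while the exact wording of the 429 message is an assumption. The fragment assumes the VCL version line and backend are defined elsewhere, as they are when the platform generates them.

```vcl
# Import once at the top of the VCL file.
import vsthrottle;

sub vcl_recv {
    # Skip rate limiting for static assets and admin paths.
    if (req.url !~ "^/(media|static|banner|admin)/") {
        # Key: client IP. Limit: 30 requests per 15s window.
        # Once exceeded, the client is blocked for 15s.
        if (vsthrottle.is_denied(req.http.X-Client-IP, 30, 15s, 15s)) {
            # Message text is an assumption; adjust to taste.
            return (synth(429, "Too Many Requests. Please wait 15 seconds."));
        }
    }
}
```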
Why Exclude Static Assets?
Static assets like images, CSS, and JavaScript files are:
- Usually cached by Varnish (subsequent requests don’t hit backend)
- High-volume but low-cost to serve
- Essential for page rendering
Understanding the Rolling Window
The time window is “rolling,” not fixed. This means the limit applies to any 15-second span of time, rather than to fixed, clock-aligned intervals.
POST/PUT Rate Limiting
Write operations (POST, PUT) are more expensive than reads and require stricter limits.
How Write Rate Limiting Works
- Method Check: (req.method == "POST" || req.method == "PUT") targets only write operations, so GET requests are not counted against this limit.
- Admin Exclusion: (req.url !~ "^/admin") excludes admin paths, which may need different rate limiting rules or have their own authentication.
- Separate Counter: The key "rw" + req.http.X-Client-IP creates a separate rate limit counter:
  - The "rw" prefix ensures write operations have their own bucket
  - A client could make 30 GET requests AND 5 POST requests in the same period
  - Prevents read traffic from consuming write quota
- Stricter Limits: 5, 10s, 15s means:
  - Only 5 write requests per 10-second window
  - Block for 15 seconds after exceeding
  - Much more restrictive than read limits (5 vs 30 requests)
- Different Penalty Window: The 10-second measurement window is shorter than the 15-second penalty, meaning clients stay blocked even after their offense window expires.
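A minimal sketch of the write limit described above, assembled from the conditions and values quoted in the list. The combination of the two conditions with && and the 429 message text are assumptions; the original listing may have been structured differently.

```vcl
# Import once at the top of the VCL file.
import vsthrottle;

sub vcl_recv {
    # Only POST and PUT requests outside /admin are counted here.
    if ((req.method == "POST" || req.method == "PUT") && req.url !~ "^/admin") {
        # The "rw" prefix gives writes their own bucket, separate from
        # the general per-IP counter. 5 requests per 10s, then a 15s block.
        if (vsthrottle.is_denied("rw" + req.http.X-Client-IP, 5, 10s, 15s)) {
            return (synth(429, "Too Many Requests. Please wait 15 seconds."));
        }
    }
}
```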
Why Separate Write Limits?
Write operations typically:
- Consume more server resources (database writes, processing)
- Have security implications (CSRF, injection attacks)
- Are used in brute force attacks (login forms, API abuse)
- Should be rate-limited more aggressively than reads
Real-World Example
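Consider a user browsing an e-commerce site: page views (GET requests) count against the general limit of 30 requests per 15 seconds, while form submissions (POST requests) count against the separate write limit of 5 requests per 10 seconds.
Advanced Rate Limiting Patterns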
Per-Path Rate Limiting
Different endpoints may need different limits.
Key Insights
- API Endpoints: Higher limits (100 req/60s) but longer penalty periods for legitimate API usage
- Login Protection: Very strict limits (5 req/60s) with long penalties (5 minutes) to prevent brute force
- Authentication-Based: Different limits for authenticated vs anonymous users
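One way these per-path rules might be written is sketched below. The 5 req/60s login limit with a 5-minute block and the 100 req/60s API limit come from the list above. Everything else is an assumption made for illustration: the /login and /api/ path prefixes, the 120s API block, the tighter anonymous limit, and the use of the Authorization header to tell authenticated clients from anonymous ones.

```vcl
# Import once at the top of the VCL file.
import vsthrottle;

sub vcl_recv {
    if (req.url ~ "^/login") {
        # Login protection: 5 requests per 60s, then a 5-minute block.
        if (vsthrottle.is_denied("login:" + req.http.X-Client-IP, 5, 60s, 300s)) {
            return (synth(429, "Too Many Requests"));
        }
    } elsif (req.url ~ "^/api/") {
        if (req.http.Authorization) {
            # Authenticated API clients: 100 requests per 60s.
            # The 120s block is an assumed value.
            if (vsthrottle.is_denied("api:" + req.http.X-Client-IP, 100, 60s, 120s)) {
                return (synth(429, "Too Many Requests"));
            }
        } else {
            # Anonymous API clients: assumed tighter limit of 20 per 60s.
            if (vsthrottle.is_denied("api-anon:" + req.http.X-Client-IP, 20, 60s, 120s)) {
                return (synth(429, "Too Many Requests"));
            }
        }
    }
}
```

Each branch uses its own key prefix, so hitting the login limit does not consume API quota and vice versa.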
Combining Rate Limiting with Security Headers
Layer rate limiting with abuse scores and geographic filtering for enhanced protection. The last article in the series covers these headers in more detail.
Dynamic Rate Limiting by Risk Level
You can create tiered rate limiting based on risk scores. Important: The vsthrottle.is_denied() function requires literal values for time parameters. To implement tiered limits, use separate conditional blocks.
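The separate conditional blocks might look like the sketch below. The header name Client-Abuse-Score, the 0-100 score range, the thresholds, and the limits for each tier are all assumptions for illustration; substitute the header and values that apply to your setup. Each tier calls vsthrottle.is_denied() with its own literal values and its own key prefix.

```vcl
# Import once at the top of the VCL file.
import std;
import vsthrottle;

sub vcl_recv {
    # std.integer() falls back to 0 when the header is missing or not numeric.
    if (std.integer(req.http.Client-Abuse-Score, 0) >= 75) {
        # High risk (assumed threshold): 5 requests per 15s, 60s block.
        if (vsthrottle.is_denied("risk-high:" + req.http.X-Client-IP, 5, 15s, 60s)) {
            return (synth(429, "Too Many Requests"));
        }
    } elsif (std.integer(req.http.Client-Abuse-Score, 0) >= 25) {
        # Medium risk (assumed threshold): 15 requests per 15s, 30s block.
        if (vsthrottle.is_denied("risk-med:" + req.http.X-Client-IP, 15, 15s, 30s)) {
            return (synth(429, "Too Many Requests"));
        }
    } else {
        # Low risk: the general limit of 30 requests per 15s, 15s block.
        if (vsthrottle.is_denied("risk-low:" + req.http.X-Client-IP, 30, 15s, 15s)) {
            return (synth(429, "Too Many Requests"));
        }
    }
}
```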
Tuning Rate Limits
Adjust parameters based on your application’s needs and traffic patterns.
Finding the Right Limits
- Baseline Analysis: Monitor your application logs to understand typical request patterns:
  - Average requests per second per user
  - Peak traffic during normal usage
  - Page load request counts (including assets)
- Conservative Start: Begin with generous limits and tighten based on observed abuse.
- Test Legitimate Use Cases: Ensure normal user workflows don’t trigger limits:
  - Single page application (SPA) initial load
  - Form submissions with multiple validations
  - Mobile app burst traffic on app open
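As a hedged illustration of a generous starting point, the sketch below uses a limit of 100 requests per 15 seconds. That number is an assumption, not a recommendation from the source; derive your own from the baseline analysis and tighten it once you have observed real traffic.

```vcl
# Import once at the top of the VCL file.
import vsthrottle;

sub vcl_recv {
    # Starting point (assumed values): 100 requests per 15s, 15s block.
    # Lower the limit step by step, for example toward 30, while checking
    # that SPA loads and mobile app bursts still pass.
    if (vsthrottle.is_denied(req.http.X-Client-IP, 100, 15s, 15s)) {
        return (synth(429, "Too Many Requests"));
    }
}
```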
Monitoring and Alerting
Track 429 responses in your logs to identify:
- False positives (legitimate users being blocked)
- Attack patterns (sudden spikes in rate-limited requests)
- Effectiveness (reduction in backend load during attacks)
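One possible way to make rate-limited requests easier to find is to write a log record before returning the 429, sketched here with std.log() from the standard VMOD. The "ratelimit:" message format is an assumption, and whether you can read the Varnish shared memory log depends on your hosting setup.

```vcl
# Import once at the top of the VCL file.
import std;
import vsthrottle;

sub vcl_recv {
    if (vsthrottle.is_denied(req.http.X-Client-IP, 30, 15s, 15s)) {
        # Emits a VCL_Log record naming the client and URL that hit the limit.
        std.log("ratelimit:general client=" + req.http.X-Client-IP + " url=" + req.url);
        return (synth(429, "Too Many Requests"));
    }
}
```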
Rate Limiting Best Practices
- Use X-Client-IP: The X-Client-IP header typically contains the true client IP, even behind proxies or load balancers on Upsun.
- Namespace Your Keys: Use prefixes like "rw", "api:", or "login:" to create separate rate limit buckets for different types of traffic.
- Monitor and Adjust: Start with conservative limits and adjust them based on legitimate traffic patterns.
- Clear Error Messages: Tell users how long to wait and consider including support contact information.
- Document Your Limits: Provide API documentation that clearly states rate limits so developers can code accordingly.
- Consider Business Logic: Don’t rate-limit so aggressively that it impacts revenue:
  - E-commerce checkout flows should be permissive
  - Payment endpoints need special consideration
  - Customer support areas may need higher limits
- Graceful Degradation: Consider returning cached content instead of blocking entirely.
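The last two practices can be combined in a sketch like the one below. Instead of rejecting a rate-limited client in vcl_recv, it marks the request and only returns a 429 when serving it would require a backend fetch, so cached objects are still delivered. The X-Rate-Limited marker header, the Retry-After value, and the message text are hypothetical choices made for illustration.

```vcl
# Import once at the top of the VCL file.
import vsthrottle;

sub vcl_recv {
    # Never trust a marker header sent by the client.
    unset req.http.X-Rate-Limited;
    if (vsthrottle.is_denied(req.http.X-Client-IP, 30, 15s, 15s)) {
        # Mark the request rather than blocking it outright.
        set req.http.X-Rate-Limited = "1";
    }
}

sub vcl_miss {
    # Not in cache: a rate-limited client may not trigger a backend fetch.
    if (req.http.X-Rate-Limited) {
        return (synth(429, "Too Many Requests"));
    }
}

sub vcl_pass {
    # Uncacheable requests always go to the backend, so block these too.
    if (req.http.X-Rate-Limited) {
        return (synth(429, "Too Many Requests"));
    }
}

sub vcl_synth {
    if (resp.status == 429) {
        # Tell the client how long to wait, in seconds.
        set resp.http.Retry-After = "15";
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        synthetic("Too many requests. Please wait 15 seconds and try again.");
        return (deliver);
    }
}
```

Cache hits go through vcl_hit, which this sketch leaves untouched, so a rate-limited client still receives content that Varnish already holds.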