A few years ago, I worked on a Rails app deployed to Heroku. We had public APIs handling device activation pings from Apple and Google: the kind of traffic where every phone in the wild could hit you at any moment, with no authentication, only a key per device. It worked fine until it didn’t. Some endpoints started getting hammered, and we needed a way to stop the abuse before it took everyone down. The natural answer would have been to drop a reverse proxy in front of Rails and rate limit there. The previous article in this series shows exactly how to do that with Varnish and vsthrottle. But on Heroku, you don’t get that layer. You get a router, a dyno, and your app. That’s it.
We did the next best thing: we rate limited inside the application.
Rack middleware: rate limiting before Rails wakes up
The tool we picked was rack-attack, a Rack middleware that runs ahead of your controllers, your views, and most of the Rails request lifecycle. Rack middleware sits at the bottom of the stack: a request comes in, hits the middleware chain, and rack-attack decides whether to let it through or to return a 429 immediately. The controller never runs. Your business logic stays untouched.
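Wiring it in is one line. Recent versions of the gem insert the middleware automatically through a Railtie; on older versions you add it yourself (MyApp below is a placeholder for your application module):

```ruby
# config/application.rb
module MyApp # placeholder module name
  class Application < Rails::Application
    # rack-attack 6.0+ registers itself via a Railtie; on older versions,
    # add the middleware explicitly so it runs ahead of the app.
    config.middleware.use Rack::Attack
  end
end
```

Running bin/rails middleware lists the stack if you want to confirm where it sits.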
What rack-attack actually loads
Rack-attack runs early, but it doesn’t run in a vacuum. By the time the middleware kicks in, your Rails app is already booted. The database connection pool is alive. Models are loaded. Redis is connected. When rate limiting blocks a request, you’re not saving an entire process boot. You’re saving the controller, the queries, the rendering, and whatever else the request would have triggered. On a Ruby app where each worker process serves multiple requests through fibers, that still adds up. CPU stops getting burned on requests you didn’t want to serve. Memory pressure stays predictable. It’s not as cheap as filtering at the edge, but it’s a lot cheaper than letting the request reach the database.
Configuration is plain Ruby
Rack-attack is configured with plain Ruby blocks. You give each rule a name, a key extractor, a limit, and a window. Anything you can read from the request can become the key. By IP? Use req.ip. By user? Read the auth header. By a combination? Concatenate and hash. The block has access to the full request object, and the rules can be as expressive as you need.
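Here’s roughly what that looks like in an initializer. The rule names, limits, and windows below are illustrative, not the ones we shipped:

```ruby
# config/initializers/rack_attack.rb
require "digest"

# Throttle by client IP: the block's return value becomes the counter key.
Rack::Attack.throttle("req/ip", limit: 300, period: 5.minutes) do |req|
  req.ip
end

# Throttle by user: key on the auth header, hashed so raw tokens never
# land in the cache. Returning nil skips the rule for that request.
Rack::Attack.throttle("req/token", limit: 100, period: 1.minute) do |req|
  token = req.env["HTTP_AUTHORIZATION"]
  Digest::SHA256.hexdigest(token) if token
end
```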
The trade-off is that any change requires a deploy. There’s no live config. For a small set of stable rules, that’s fine.
Where you store the counters
Rack-attack needs somewhere to track how many requests each key has made. The two main options are an in-process memory cache or Redis. Both are valid. They optimize for different things.
In-memory is the simplest path. No extra service to run, no network hop, counters live next to the code that reads them. It works well when you have a single process, or when each process serves a stable, well-defined slice of traffic. The catch is that each process keeps its own counter. Two dynos means two counters, and your effective limit doubles. Restart a process and the counters reset, which gives a fresh window to anyone patient enough to wait it out.
Redis flips that trade-off. The counters live in one place, shared across every process, and they survive restarts. The price is the dependency: you need Redis reachable, and you pay a small network hop on every request the limiter inspects. We used Redis on the project I described, and never had a performance issue with it.
Pick based on what you’re actually defending against. Soft caps for fairness on a single-instance app? In-memory is enough. Hard caps that need to hold across deploys, scale-outs, and restarts? Reach for Redis.
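Switching stores is one line either way. A sketch of the two options; the Redis one assumes the redis gem is in your Gemfile and a REDIS_URL env var points at your instance:

```ruby
# config/initializers/rack_attack.rb
# Pick one of the two stores below.

# Option 1: per-process counters. Cheap, but each dyno counts separately
# and a restart resets every window.
Rack::Attack.cache.store = ActiveSupport::Cache::MemoryStore.new

# Option 2: shared counters that survive restarts, at the cost of a
# network hop per inspected request.
Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
  url: ENV.fetch("REDIS_URL")
)
```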
What we actually capped, and why
The point of capping isn’t always to block attackers. Often it’s about quality of service. Your application has finite CPU. Every concurrent request takes a slice of it, and once the slices run out, every request slows down for everyone. Without rate limiting, one greedy client can monopolize that CPU and degrade the experience for the rest of your traffic. With rate limiting, that client gets pushed back to a fair share, and the other users keep getting fast responses.
For us, the CPU pressure came from abusive clients hidden among the legitimate Apple and Google device activation requests. We capped the noisiest endpoints at around 10 requests per second per IP: higher than anything a real device would ever need, low enough that no single client could starve the rest of the stack. Downtime from those endpoints stopped. The other endpoints kept working through traffic spikes. Latency at the application level held steady. None of it was free, but it was the cheapest thing we could ship without owning the edge.
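The rule itself fits in a few lines. A sketch of that cap, with a hypothetical path standing in for the real activation endpoints:

```ruby
# config/initializers/rack_attack.rb
# "/device_activations" is a stand-in for the real endpoint paths.
Rack::Attack.throttle("activations/ip", limit: 10, period: 1.second) do |req|
  req.ip if req.path.start_with?("/device_activations")
end
```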
What about authenticated traffic?
This is the case where edge rate limiting starts to struggle. Varnish doesn’t know who your users are. It can throttle by IP and by URL, but if you want to throttle by authenticated user, you need session data, token lookups, or sometimes a database query. That’s the application’s job. GitHub does this. Hit their API unauthenticated and you get 60 requests per hour. Authenticate and you get 5,000. That kind of tiered limit is hard to do at the edge without leaking auth state outside your application. If you’re building anything where the rate limit depends on who the user is, you want this layer.
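Here’s a sketch of that kind of tiering with rack-attack, using GitHub’s numbers. user_id_for_token is a hypothetical helper standing in for whatever token or session lookup your app actually does:

```ruby
# config/initializers/rack_attack.rb

# Hypothetical helper: a real version would resolve the token to a user
# via your session store or database.
def user_id_for_token(header)
  header&.delete_prefix("Bearer ")
end

# Authenticated callers get a generous per-user budget.
Rack::Attack.throttle("api/user", limit: 5000, period: 1.hour) do |req|
  user_id_for_token(req.env["HTTP_AUTHORIZATION"])
end

# Everyone else falls back to a tight per-IP budget.
Rack::Attack.throttle("api/anon", limit: 60, period: 1.hour) do |req|
  req.ip unless req.env["HTTP_AUTHORIZATION"]
end
```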
Rails 8 ships this out of the box
Rack-attack is still maintained, and it still works. If you’re on Rails 8, you don’t need it. Rails now ships its own rate limiting primitive, with the same kind of pluggable storage. Memory cache, Redis cache, whatever your Rails.cache is set to.
The API is smaller, and it covers most of what rack-attack does. A sketch, with a hypothetical controller name and the by: and with: options spelled out with their defaults:
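```ruby
class DeviceActivationsController < ApplicationController
  # Hypothetical controller. Mirrors the earlier cap: 10 requests per
  # second per client IP, counters stored in whatever Rails.cache uses.
  # by: and with: are the documented defaults, shown here for clarity.
  rate_limit to: 10, within: 1.second,
             by: -> { request.remote_ip },
             with: -> { head :too_many_requests }
end
```

Swap the by: lambda for a token lookup and you get the per-user tiering from the previous section, no extra gem required.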