If your Next.js application struggles under load, crashes with 200+ concurrent users, or shows uneven CPU usage across PM2 instances, you’re likely experiencing event loop blocking. This guide explains what event loops are, why they matter, and how to implement monitoring to diagnose and fix performance bottlenecks.
Table of Contents
- Understanding Event Loops
- Why Event Loops Matter for Next.js
- Implementing Event Loop Monitoring
- Interpreting Monitoring Data
- Common Blocking Patterns and Fixes
- Advanced Monitoring Setup
- Best Practices
- Conclusion
Understanding Event Loops
What is an Event Loop?
Node.js (and by extension, Next.js) runs on a single thread. Yet it can handle thousands of concurrent connections efficiently. The event loop makes this possible.
Think of a restaurant kitchen with one chef. Instead of preparing one complete order before starting the next, the chef:
- Puts burger #1 on the grill
- While it cooks, starts burger #2
- While both cook, prepares fries
- Checks which items are done
- Serves completed orders
- Repeats
This is how Node.js works. The event loop continuously cycles through tasks, checking what’s ready to execute next.
The Event Loop Cycle
Key Point: Each callback runs to completion before the loop moves on. If one callback takes too long, every other pending task waits.
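A minimal sketch of this run-to-completion behavior, using nothing but a timer and a busy-wait (the ~150 ms figure is arbitrary):

```javascript
// A 0 ms timer cannot fire until the synchronous busy-wait below
// yields control back to the event loop.
const start = Date.now();

setTimeout(() => {
  // Fires roughly 150 ms late, not at 0 ms
  console.log(`0 ms timer fired after ${Date.now() - start} ms`);
}, 0);

while (Date.now() - start < 150) {} // blocks the loop for ~150 ms
```

Even though the timer was due immediately, its callback is queued behind the synchronous loop.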
Non-Blocking vs Blocking Code
Non-Blocking (Good):
// Request 1
app.get('/api/users', async (req, res) => {
const users = await db.query('SELECT * FROM users');
res.json(users);
});
// Timeline:
// 0ms: Request 1 starts → db.query (async) → Event loop free
// 1ms: Request 2 arrives → handled immediately ✓
// 2ms: Request 3 arrives → handled immediately ✓
// 10ms: Request 1 db done → response sent ✓
Blocking (Bad):
// Request 1
app.get('/api/users', (req, res) => {
let result = [];
for (let i = 0; i < 10000000; i++) {
result.push(heavyCalculation(i));
}
res.json(result);
});
// Timeline:
// 0ms: Request 1 starts → heavy loop → Event loop BLOCKED
// 100ms: Request 2 arrives → WAITING ⏳
// 200ms: Request 3 arrives → WAITING ⏳
// 500ms: Request 1 finally done → Request 2 can now start
The blocking code prevents the event loop from processing other requests, creating a bottleneck even when CPU capacity is available: users see degraded response times while overall resource utilization stays well below 100%.
Why Event Loops Matter for Next.js
The PM2 Cluster Scenario
When running Next.js with PM2 in cluster mode, you typically have multiple worker processes:
If one worker’s event loop is blocked by expensive synchronous operations, that worker can’t handle new requests. PM2 continues sending it requests via round-robin distribution, but they queue up, causing timeouts and poor performance.
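For reference, a hypothetical ecosystem.config.js for this setup. The app name and instance count are illustrative; the instance_var option is a real PM2 setting that exposes each worker's index as the INSTANCE_ID environment variable used by the monitoring code later in this guide:

```javascript
// ecosystem.config.js — illustrative PM2 cluster config for a Next.js app
module.exports = {
  apps: [
    {
      name: 'next-app',            // adjust to your application name
      script: 'npm',
      args: 'start',
      exec_mode: 'cluster',        // PM2 round-robins requests across workers
      instances: 4,                // e.g. one worker per CPU core
      instance_var: 'INSTANCE_ID', // worker index via process.env.INSTANCE_ID
    },
  ],
};
```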
Symptoms of Event Loop Blocking
- One PM2 instance at 100% CPU, others underutilized
- Requests timing out despite available server resources
- Uneven request distribution across workers
- Application crashes under moderate load (200+ concurrent users)
- Response times increase dramatically under load
Implementing Event Loop Monitoring
Step 1: Create the Monitor
Create the monitoring module using Node.js’s built-in perf_hooks API to track event loop delay with high precision.
Create lib/monitoring/advancedEventLoopMonitor.js:
const { PerformanceObserver, monitorEventLoopDelay } = require('perf_hooks');
class AdvancedEventLoopMonitor {
constructor(options = {}) {
this.resolution = options.resolution || 10;
this.warningThreshold = options.warningThreshold || 50;
this.criticalThreshold = options.criticalThreshold || 100;
this.logInterval = options.logInterval || 30000;
this.histogram = monitorEventLoopDelay({ resolution: this.resolution });
this.histogram.enable();
this.startTime = Date.now();
this.requestCount = 0;
this.slowRequests = [];
this.startLogging();
this.setupProcessMetrics();
console.log('🔬 Advanced Event Loop Monitor initialized');
}
startLogging() {
this.logIntervalId = setInterval(() => {
this.logDetailedStats();
}, this.logInterval);
}
setupProcessMetrics() {
try {
if (global.gc) {
const gcStats = { count: 0, totalDuration: 0 };
const obs = new PerformanceObserver((list) => {
const entries = list.getEntries();
entries.forEach((entry) => {
if (entry.entryType === 'gc') {
gcStats.count++;
gcStats.totalDuration += entry.duration;
if (entry.duration > 100) {
console.warn(`⚠️ Long GC pause: ${entry.duration.toFixed(2)}ms`);
}
}
});
});
obs.observe({ entryTypes: ['gc'] });
this.gcStats = gcStats;
}
} catch (e) {
console.log('GC monitoring not available (run with --expose-gc flag)');
}
}
logDetailedStats() {
const stats = this.getDetailedStats();
const status = stats.p99 > this.criticalThreshold ? '🔴 CRITICAL' :
stats.p95 > this.warningThreshold ? '🟡 WARNING' :
'🟢 HEALTHY';
console.log('\n' + '='.repeat(60));
console.log(`${status} Event Loop Health Report`);
console.log('='.repeat(60));
console.log(`Instance: ${process.env.INSTANCE_ID || process.pid}`);
console.log(`Uptime: ${this.formatUptime(Date.now() - this.startTime)}`);
console.log(`\nEvent Loop Delay (ms):`);
console.log(` Min: ${stats.min.toFixed(2)}ms`);
console.log(` Mean: ${stats.mean.toFixed(2)}ms`);
console.log(` Max: ${stats.max.toFixed(2)}ms`);
console.log(` P50: ${stats.p50.toFixed(2)}ms`);
console.log(` P95: ${stats.p95.toFixed(2)}ms`);
console.log(` P99: ${stats.p99.toFixed(2)}ms`);
console.log(` StdDev: ${stats.stddev.toFixed(2)}ms`);
console.log(`\nMemory:`);
const mem = process.memoryUsage();
console.log(` RSS: ${(mem.rss / 1024 / 1024).toFixed(2)} MB`);
console.log(` Heap Used: ${(mem.heapUsed / 1024 / 1024).toFixed(2)} MB`);
console.log(` Heap Total: ${(mem.heapTotal / 1024 / 1024).toFixed(2)} MB`);
console.log(` External: ${(mem.external / 1024 / 1024).toFixed(2)} MB`);
if (this.gcStats) {
console.log(`\nGarbage Collection:`);
console.log(` Count: ${this.gcStats.count}`);
console.log(` Total Time: ${this.gcStats.totalDuration.toFixed(2)}ms`);
console.log(` Avg Per GC: ${(this.gcStats.totalDuration / this.gcStats.count || 0).toFixed(2)}ms`);
}
console.log(`\nRequests Processed: ${this.requestCount}`);
if (this.slowRequests.length > 0) {
console.log(`\n⚠️ Slow Requests (last ${this.slowRequests.length}):`);
this.slowRequests.slice(-5).forEach(req => {
console.log(` ${req.method} ${req.url} - ${req.duration.toFixed(2)}ms - ${req.timestamp}`);
});
}
console.log('='.repeat(60) + '\n');
this.histogram.reset();
if (this.slowRequests.length > 100) {
this.slowRequests = this.slowRequests.slice(-50);
}
}
getDetailedStats() {
return {
min: this.histogram.min / 1e6,
max: this.histogram.max / 1e6,
mean: this.histogram.mean / 1e6,
stddev: this.histogram.stddev / 1e6,
p50: this.histogram.percentile(50) / 1e6,
p95: this.histogram.percentile(95) / 1e6,
p99: this.histogram.percentile(99) / 1e6,
p999: this.histogram.percentile(99.9) / 1e6
};
}
formatUptime(ms) {
const seconds = Math.floor(ms / 1000);
const minutes = Math.floor(seconds / 60);
const hours = Math.floor(minutes / 60);
const days = Math.floor(hours / 24);
if (days > 0) return `${days}d ${hours % 24}h`;
if (hours > 0) return `${hours}h ${minutes % 60}m`;
if (minutes > 0) return `${minutes}m ${seconds % 60}s`;
return `${seconds}s`;
}
trackRequest(method, url, duration) {
this.requestCount++;
if (duration > 1000) {
this.slowRequests.push({
method,
url,
duration,
timestamp: new Date().toISOString()
});
}
}
getStats() {
return this.getDetailedStats();
}
stop() {
if (this.logIntervalId) {
clearInterval(this.logIntervalId);
}
this.histogram.disable();
console.log('🔬 Event Loop Monitor stopped');
}
}
module.exports = AdvancedEventLoopMonitor;
Step 2: Initialize on Server Start
Next.js 13+ uses the instrumentation hook for server initialization. Create instrumentation.js in your project root:
export async function register() {
if (process.env.NEXT_RUNTIME === 'nodejs') {
const AdvancedEventLoopMonitor = require('./lib/monitoring/advancedEventLoopMonitor');
global.eventLoopMonitor = new AdvancedEventLoopMonitor({
resolution: 10,
warningThreshold: 50,
criticalThreshold: 100,
logInterval: 30000
});
console.log('✅ Event loop monitoring initialized');
}
}
Enable instrumentation in next.config.js (required on Next.js 13/14; from Next.js 15 the instrumentation hook is stable and no flag is needed):
/** @type {import('next').NextConfig} */
const nextConfig = {
experimental: {
instrumentationHook: true,
},
};
module.exports = nextConfig;
Step 3: Create Health Check Endpoint
Create app/api/monitoring/health/route.js:
export async function GET(request) {
try {
const monitor = global.eventLoopMonitor;
if (!monitor) {
return Response.json(
{ error: 'Monitor not initialized' },
{ status: 503 }
);
}
const stats = monitor.getStats();
const memory = process.memoryUsage();
const health = {
status: stats.p99 > 100 ? 'critical' :
stats.p95 > 50 ? 'warning' : 'healthy',
instance: process.env.INSTANCE_ID || process.pid,
uptime: process.uptime(),
eventLoop: {
min: parseFloat(stats.min.toFixed(2)),
mean: parseFloat(stats.mean.toFixed(2)),
max: parseFloat(stats.max.toFixed(2)),
p50: parseFloat(stats.p50.toFixed(2)),
p95: parseFloat(stats.p95.toFixed(2)),
p99: parseFloat(stats.p99.toFixed(2)),
},
memory: {
rss: Math.round(memory.rss / 1024 / 1024),
heapUsed: Math.round(memory.heapUsed / 1024 / 1024),
heapTotal: Math.round(memory.heapTotal / 1024 / 1024),
external: Math.round(memory.external / 1024 / 1024),
},
timestamp: new Date().toISOString()
};
return Response.json(health);
} catch (error) {
return Response.json(
{ error: error.message },
{ status: 500 }
);
}
}
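The endpoint's status thresholds, pulled out as a pure helper to make the cutoffs explicit (the function name is ours, not part of the endpoint):

```javascript
// Same cutoffs as the route above: p99 > 100 ms → critical,
// else p95 > 50 ms → warning, else healthy.
function classifyHealth(stats) {
  if (stats.p99 > 100) return 'critical';
  if (stats.p95 > 50) return 'warning';
  return 'healthy';
}

console.log(classifyHealth({ p95: 12, p99: 40 }));  // healthy
console.log(classifyHealth({ p95: 60, p99: 90 }));  // warning
console.log(classifyHealth({ p95: 80, p99: 300 })); // critical
```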
Step 4: Create Request Tracking Middleware
Create middleware.js in your project root:
import { NextResponse } from 'next/server';
export function middleware(request) {
const start = Date.now();
const response = NextResponse.next();
response.headers.set('X-Request-Start', start.toString());
return response;
}
export const config = {
matcher: '/api/:path*',
};
Then instrument your route handlers to report completion times, for example in app/api/[...route]/route.js (note that a catch-all route only matches paths without a more specific handler, so in practice you add this to each handler you want tracked):
export async function GET(request) {
const start = Date.now();
try {
// Your API logic here
return await yourApiHandler(request);
} finally {
// Runs on success and on error alike, so each request is tracked exactly once
const duration = Date.now() - start;
if (global.eventLoopMonitor) {
global.eventLoopMonitor.trackRequest('GET', request.url, duration);
}
}
}
Interpreting Monitoring Data
Understanding the Metrics
The monitor tracks several key metrics:
Event Loop Delay Percentiles:
- P50 (Median): Half of all event loop iterations complete faster than this value
- P95: 95% of iterations complete faster than this value
- P99: 99% of iterations complete faster than this value
Target Values:
- P50: < 10ms (excellent), 10-25ms (good), > 25ms (investigate)
- P95: < 50ms (excellent), 50-100ms (acceptable), > 100ms (warning)
- P99: < 100ms (excellent), 100-250ms (warning), > 250ms (critical)
Reading the Health Report
🟢 HEALTHY Event Loop Health Report
============================================================
Instance: worker-1 (PID: 12345)
Uptime: 2h 34m
Event Loop Delay (ms):
Min: 0.05ms ← Best case scenario
Mean: 8.23ms ← Average delay (good)
Max: 156.42ms ← Worst case (occasional spikes OK)
P50: 4.12ms ← 50% of loops finish this fast
P95: 32.45ms ← 95% of loops finish this fast
P99: 78.91ms ← 99% of loops finish this fast ✓
StdDev: 12.34ms ← Consistency (lower is better)
Health Status Interpretation:
🟢 HEALTHY: P99 < 100ms
- Application responding well
- Event loop processing efficiently
- No immediate action needed
🟡 WARNING: P95 > 50ms or P99 100-250ms
- Event loop experiencing delays
- Investigate recent code changes
- Review slow requests log
- Consider optimization
🔴 CRITICAL: P99 > 250ms
- Event loop heavily blocked
- User experience degraded
- Immediate action required
- Check for CPU-intensive operations
Real-World Example Analysis
Good Performance:
Event Loop Delay (ms):
P50: 3.21ms
P95: 18.45ms
P99: 42.33ms
This shows consistent, fast event loop processing. The application handles load well.
Warning Signs:
Event Loop Delay (ms):
P50: 12.45ms
P95: 156.78ms
P99: 342.11ms
High variance between P50 and P99 indicates sporadic blocking operations. Investigate slow requests.
Critical Issues:
Event Loop Delay (ms):
P50: 45.23ms
P95: 523.45ms
P99: 1234.56ms
Consistently high delays across all percentiles indicate systemic blocking issues. Check for synchronous database operations or heavy computation.
Common Blocking Patterns and Fixes
1. Large JSON Parsing
❌ Blocking:
export default function handler(req, res) {
const data = JSON.parse(largeJsonString); // Blocks event loop
res.json(data);
}
✅ Non-Blocking:
import { Worker } from 'worker_threads';
export default async function handler(req, res) {
const worker = new Worker('./workers/json-parser.js');
worker.postMessage(largeJsonString);
const data = await new Promise((resolve, reject) => {
worker.on('message', resolve);
worker.on('error', reject);
});
res.json(data);
}
2. Synchronous File Operations
❌ Blocking:
import fs from 'fs';
export default function handler(req, res) {
const data = fs.readFileSync('./large-file.json', 'utf8');
res.send(data);
}
✅ Non-Blocking:
import { readFile } from 'fs/promises';
export default async function handler(req, res) {
const data = await readFile('./large-file.json', 'utf8');
res.send(data);
}
3. Complex Array Operations
❌ Blocking:
export default function handler(req, res) {
const results = largeArray.map(item => {
return expensiveOperation(item);
});
res.json(results);
}
✅ Non-Blocking (only if the operation performs genuine asynchronous work such as I/O — wrapping CPU-bound code in async/await does not unblock the event loop):
export default async function handler(req, res) {
const results = await Promise.all(
largeArray.map(async item => {
return await expensiveOperationAsync(item);
})
);
res.json(results);
}
For CPU-bound work, process in batches and yield to the event loop between them:
export default async function handler(req, res) {
const BATCH_SIZE = 100;
const results = [];
for (let i = 0; i < largeArray.length; i += BATCH_SIZE) {
const batch = largeArray.slice(i, i + BATCH_SIZE);
const batchResults = await Promise.all(
batch.map(item => expensiveOperationAsync(item))
);
results.push(...batchResults);
// Allow event loop to process other requests
await new Promise(resolve => setImmediate(resolve));
}
res.json(results);
}
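To see why the setImmediate yield matters, this self-contained sketch (helper names are ours) shows a pending 0 ms timer firing while batched CPU work is still in progress, instead of waiting for all batches to finish:

```javascript
function busyWaitMs(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // simulates one batch of CPU-bound work
}

let timerFiredDuringWork = false;
setTimeout(() => { timerFiredDuringWork = true; }, 0);

async function batchedWork() {
  for (let i = 0; i < 5; i++) {
    busyWaitMs(20);
    // Yield: lets the event loop run pending timers and I/O callbacks
    await new Promise((resolve) => setImmediate(resolve));
  }
  return timerFiredDuringWork;
}

batchedWork().then((fired) => {
  console.log('timer fired during batched work:', fired); // true
});
```

Without the yield, the flag would only flip after all five batches completed.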
4. Database Queries Without Connection Pooling
❌ Slow (per-request connection setup doesn't block the event loop per se, but it adds latency and can exhaust the database under load):
// Creating a new connection for each request
export default async function handler(req, res) {
const client = await createConnection();
const result = await client.query('SELECT * FROM users');
await client.close();
res.json(result);
}
✅ Non-Blocking:
// Use connection pool
import { pool } from '@/lib/db';
export default async function handler(req, res) {
const result = await pool.query('SELECT * FROM users');
res.json(result);
}
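The pool module imported above isn't shown in the guide; a minimal sketch using node-postgres (pg) as an example driver might look like this (option values are illustrative — adapt to your database client):

```javascript
// lib/db.js — shared connection pool, created once per worker process
import { Pool } from 'pg';

export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10,                       // cap concurrent connections per worker
  idleTimeoutMillis: 30000,      // release idle connections after 30 s
  connectionTimeoutMillis: 5000, // fail fast instead of queueing forever
});
```

Remember that in PM2 cluster mode each worker holds its own pool, so the database sees workers × max connections at peak.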
Advanced Monitoring Setup
Real-Time Dashboard Script
Create scripts/monitor-dashboard.sh. Adjust the PM2 application name to match yours; the script also requires the jq utility.
#!/bin/bash
while true; do
clear
echo "=== Next.js Event Loop Monitoring ==="
echo "Updated: $(date)"
echo ""
# PM2 Status
echo "PM2 Instances:"
pm2 list | grep next-app
echo ""
# Health Check All Instances
echo "Health Status:"
# Repeated calls to the load-balanced port are round-robined across PM2 workers
for i in {1..10}; do
curl -s http://localhost:3000/api/monitoring/health 2>/dev/null | \
jq -r '"\(.instance): P99=\(.eventLoop.p99)ms \(.status)"' || \
echo "Instance not responding"
done
echo ""
echo "Press Ctrl+C to exit"
sleep 5
done
Make it executable:
chmod +x scripts/monitor-dashboard.sh
./scripts/monitor-dashboard.sh
Load Test Monitoring
Create scripts/load-test-monitor.sh:
#!/bin/bash
OUTPUT_DIR="./load-test-results/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR"
echo "timestamp,instance,p95,p99,heap_mb" > "$OUTPUT_DIR/metrics.csv"
echo "Starting load test monitoring..."
echo "Press Ctrl+C when test is complete"
while true; do
timestamp=$(date +%s)
# Collect metrics from health endpoint
curl -s http://localhost:3000/api/monitoring/health 2>/dev/null | \
jq -r --arg ts "$timestamp" \
'[$ts, .instance, .eventLoop.p95, .eventLoop.p99, .memory.heapUsed] | @csv' \
>> "$OUTPUT_DIR/metrics.csv"
sleep 2
done
Run alongside your load test:
# Terminal 1: Start monitoring
./scripts/load-test-monitor.sh
# Terminal 2: Run your load test (Locust shown here; use any tool you prefer)
locust -f loadtest.py --host=http://localhost:3000
Analyzing Results
After your load test, analyze the collected data:
# View summary statistics
cat load-test-results/*/metrics.csv | \
awk -F',' 'NR>1 {sum+=$4; count++; if($4>max) max=$4}
END {print "Avg P99:", sum/count, "ms\nMax P99:", max, "ms"}'
# Find instances with high P99
cat load-test-results/*/metrics.csv | \
awk -F',' 'NR>1 && $4>100 {print $2, $4}' | \
sort -k2 -rn | \
head -10
Best Practices
1. Set Appropriate Thresholds
Adjust thresholds based on your application:
const monitor = new AdvancedEventLoopMonitor({
warningThreshold: 30, // Stricter for high-performance apps
criticalThreshold: 75,
logInterval: 60000 // Less frequent for production
});
2. Monitor in Staging First
Test your monitoring setup in a staging environment before production deployment to:
- Verify thresholds are appropriate
- Ensure logging doesn’t impact performance
- Validate alerting mechanisms
3. Combine with APM Tools
Event loop monitoring complements Application Performance Monitoring tools like Blackfire.io:
- Use event loop monitoring to identify blocking operations
- Use APM for distributed tracing and end-to-end monitoring
- Correlate event loop delays with external service latencies
4. Conduct Regular Performance Reviews
Schedule monthly performance reviews:
- Analyze P99 trends over time
- Identify endpoints with degrading performance
- Review and optimize slow requests
- Update monitoring thresholds as needed
Conclusion
Event loop monitoring is essential for building performant Next.js applications at scale. By implementing the monitoring system described in this guide, you can:
- Identify bottlenecks before they impact users
- Optimize critical paths with data-driven insights
- Scale confidently knowing your application’s limits
- Diagnose issues quickly with detailed metrics
Remember: The event loop is the heartbeat of your Node.js application. Keep it healthy, and your application will scale smoothly.
Additional Resources
Ready to deploy your optimized Next.js application? Create a free Upsun account to get instant preview environments, Git-driven infrastructure, and built-in observability tools for production-ready deployments.
Last modified on April 14, 2026