If your Next.js application struggles under load, crashes with 200+ concurrent users, or shows uneven CPU usage across PM2 instances, you’re likely experiencing event loop blocking. This guide explains what event loops are, why they matter, and how to implement monitoring to diagnose and fix performance bottlenecks.
Table of Contents
- Understanding Event Loops
- Why Event Loops Matter for Next.js
- Implementing Event Loop Monitoring
- Interpreting Monitoring Data
- Common Blocking Patterns and Fixes
- Advanced Monitoring Setup
- Best Practices
- Conclusion
Understanding Event Loops
What is an Event Loop?
Node.js (and by extension, Next.js) runs on a single thread. Yet it can handle thousands of concurrent connections efficiently. The event loop makes this possible.
Think of a restaurant kitchen with one chef. Instead of preparing one complete order before starting the next, the chef:
- Puts burger #1 on the grill
- While it cooks, starts burger #2
- While both cook, prepares fries
- Checks which items are done
- Serves completed orders
- Repeats
This is how Node.js works. The event loop continuously cycles through tasks, checking what’s ready to execute next.
The Event Loop Cycle
Key Point: Each callback runs to completion before the loop moves on. If one callback takes too long, every other pending task waits.
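A minimal sketch of this run-to-completion behavior, using nothing but a timer and a busy-wait (the ~150 ms figure is arbitrary):

```javascript
// A 0 ms timer cannot fire until the synchronous busy-wait below
// yields control back to the event loop.
const start = Date.now();

setTimeout(() => {
  // Fires roughly 150 ms late, not at 0 ms
  console.log(`0 ms timer fired after ${Date.now() - start} ms`);
}, 0);

while (Date.now() - start < 150) {} // blocks the loop for ~150 ms
```

Even though the timer was due immediately, its callback is queued behind the synchronous loop.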
Non-Blocking vs Blocking Code
Non-Blocking (Good):
// Request 1
app.get('/api/users', async (req, res) => {
const users = await db.query('SELECT * FROM users');
res.json(users);
});
// Timeline:
// 0ms: Request 1 starts → db.query (async) → Event loop free
// 1ms: Request 2 arrives → handled immediately ✓
// 2ms: Request 3 arrives → handled immediately ✓
// 10ms: Request 1 db done → response sent ✓
Blocking (Bad):
// Request 1
app.get('/api/users', (req, res) => {
let result = [];
for (let i = 0; i < 10000000; i++) {
result.push(heavyCalculation(i));
}
res.json(result);
});
// Timeline:
// 0ms: Request 1 starts → heavy loop → Event loop BLOCKED
// 100ms: Request 2 arrives → WAITING ⏳
// 200ms: Request 3 arrives → WAITING ⏳
// 500ms: Request 1 finally done → Request 2 can now start
The blocking code prevents the event loop from processing other requests, creating a bottleneck even when CPU capacity is available: users see degraded response times while overall resource utilization stays well below 100%.
Why Event Loops Matter for Next.js
The PM2 Cluster Scenario
When running Next.js with PM2 in cluster mode, you typically have multiple worker processes:
If one worker’s event loop is blocked by expensive synchronous operations, that worker can’t handle new requests. PM2 continues sending it requests via round-robin distribution, but they queue up, causing timeouts and poor performance.
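For reference, a hypothetical ecosystem.config.js for this setup. The app name and instance count are illustrative; the instance_var option is a real PM2 setting that exposes each worker's index as the INSTANCE_ID environment variable used by the monitoring code later in this guide:

```javascript
// ecosystem.config.js — illustrative PM2 cluster config for a Next.js app
module.exports = {
  apps: [
    {
      name: 'next-app',            // adjust to your application name
      script: 'npm',
      args: 'start',
      exec_mode: 'cluster',        // PM2 round-robins requests across workers
      instances: 4,                // e.g. one worker per CPU core
      instance_var: 'INSTANCE_ID', // worker index via process.env.INSTANCE_ID
    },
  ],
};
```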
Symptoms of Event Loop Blocking
- One PM2 instance at 100% CPU, others underutilized
- Requests timing out despite available server resources
- Uneven request distribution across workers
- Application crashes under moderate load (200+ concurrent users)
- Response times increase dramatically under load
Implementing Event Loop Monitoring
Step 1: Create the Monitor
Create the monitoring module using Node.js’s built-in perf_hooks API to track event loop delay with high precision.
Create lib/monitoring/advancedEventLoopMonitor.js:
const { PerformanceObserver, monitorEventLoopDelay } = require('perf_hooks');
class AdvancedEventLoopMonitor {
constructor(options = {}) {
this.resolution = options.resolution || 10;
this.warningThreshold = options.warningThreshold || 50;
this.criticalThreshold = options.criticalThreshold || 100;
this.logInterval = options.logInterval || 30000;
this.histogram = monitorEventLoopDelay({ resolution: this.resolution });
this.histogram.enable();
this.startTime = Date.now();
this.requestCount = 0;
this.slowRequests = [];
this.startLogging();
this.setupProcessMetrics();
console.log('🔬 Advanced Event Loop Monitor initialized');
}
startLogging() {
this.logIntervalId = setInterval(() => {
this.logDetailedStats();
}, this.logInterval);
}
setupProcessMetrics() {
try {
if (global.gc) {
const gcStats = { count: 0, totalDuration: 0 };
const obs = new PerformanceObserver((list) => {
const entries = list.getEntries();
entries.forEach((entry) => {
if (entry.entryType === 'gc') {
gcStats.count++;
gcStats.totalDuration += entry.duration;
if (entry.duration > 100) {
console.warn(`⚠️ Long GC pause: ${entry.duration.toFixed(2)}ms`);
}
}
});
});
obs.observe({ entryTypes: ['gc'] });
this.gcStats = gcStats;
}
} catch (e) {
console.log('GC monitoring not available (run with --expose-gc flag)');
}
}
logDetailedStats() {
const stats = this.getDetailedStats();
const status = stats.p99 > this.criticalThreshold ? '🔴 CRITICAL' :
stats.p95 > this.warningThreshold ? '🟡 WARNING' :
'🟢 HEALTHY';
console.log('\n' + '='.repeat(60));
console.log(`${status} Event Loop Health Report`);
console.log('='.repeat(60));
console.log(`Instance: ${process.env.INSTANCE_ID || process.pid}`);
console.log(`Uptime: ${this.formatUptime(Date.now() - this.startTime)}`);
console.log(`\nEvent Loop Delay (ms):`);
console.log(` Min: ${stats.min.toFixed(2)}ms`);
console.log(` Mean: ${stats.mean.toFixed(2)}ms`);
console.log(` Max: ${stats.max.toFixed(2)}ms`);
console.log(` P50: ${stats.p50.toFixed(2)}ms`);
console.log(` P95: ${stats.p95.toFixed(2)}ms`);
console.log(` P99: ${stats.p99.toFixed(2)}ms`);
console.log(` StdDev: ${stats.stddev.toFixed(2)}ms`);
console.log(`\nMemory:`);
const mem = process.memoryUsage();
console.log(` RSS: ${(mem.rss / 1024 / 1024).toFixed(2)} MB`);
console.log(` Heap Used: ${(mem.heapUsed / 1024 / 1024).toFixed(2)} MB`);
console.log(` Heap Total: ${(mem.heapTotal / 1024 / 1024).toFixed(2)} MB`);
console.log(` External: ${(mem.external / 1024 / 1024).toFixed(2)} MB`);
if (this.gcStats) {
console.log(`\nGarbage Collection:`);
console.log(` Count: ${this.gcStats.count}`);
console.log(` Total Time: ${this.gcStats.totalDuration.toFixed(2)}ms`);
console.log(` Avg Per GC: ${(this.gcStats.totalDuration / this.gcStats.count || 0).toFixed(2)}ms`);
}
console.log(`\nRequests Processed: ${this.requestCount}`);
if (this.slowRequests.length > 0) {
console.log(`\n⚠️ Slow Requests (last ${this.slowRequests.length}):`);
this.slowRequests.slice(-5).forEach(req => {
console.log(` ${req.method} ${req.url} - ${req.duration.toFixed(2)}ms - ${req.timestamp}`);
});
}
console.log('='.repeat(60) + '\n');
this.histogram.reset();
if (this.slowRequests.length > 100) {
this.slowRequests = this.slowRequests.slice(-50);
}
}
getDetailedStats() {
return {
min: this.histogram.min / 1e6,
max: this.histogram.max / 1e6,
mean: this.histogram.mean / 1e6,
stddev: this.histogram.stddev / 1e6,
p50: this.histogram.percentile(50) / 1e6,
p95: this.histogram.percentile(95) / 1e6,
p99: this.histogram.percentile(99) / 1e6,
p999: this.histogram.percentile(99.9) / 1e6
};
}
formatUptime(ms) {
const seconds = Math.floor(ms / 1000);
const minutes = Math.floor(seconds / 60);
const hours = Math.floor(minutes / 60);
const days = Math.floor(hours / 24);
if (days > 0) return `${days}d ${hours % 24}h`;
if (hours > 0) return `${hours}h ${minutes % 60}m`;
if (minutes > 0) return `${minutes}m ${seconds % 60}s`;
return `${seconds}s`;
}
trackRequest(method, url, duration) {
this.requestCount++;
if (duration > 1000) {
this.slowRequests.push({
method,
url,
duration,
timestamp: new Date().toISOString()
});
}
}
getStats() {
return this.getDetailedStats();
}
stop() {
if (this.logIntervalId) {
clearInterval(this.logIntervalId);
}
this.histogram.disable();
console.log('🔬 Event Loop Monitor stopped');
}
}
module.exports = AdvancedEventLoopMonitor;
Step 2: Initialize on Server Start
Next.js 13+ uses the instrumentation hook for server initialization. Create instrumentation.js in your project root:
export async function register() {
if (process.env.NEXT_RUNTIME === 'nodejs') {
const AdvancedEventLoopMonitor = require('./lib/monitoring/advancedEventLoopMonitor');
global.eventLoopMonitor = new AdvancedEventLoopMonitor({
resolution: 10,
warningThreshold: 50,
criticalThreshold: 100,
logInterval: 30000
});
console.log('✅ Event loop monitoring initialized');
}
}
Enable instrumentation in next.config.js (required on Next.js 13/14; from Next.js 15 the instrumentation hook is stable and no flag is needed):
/** @type {import('next').NextConfig} */
const nextConfig = {
experimental: {
instrumentationHook: true,
},
};
module.exports = nextConfig;
Step 3: Create Health Check Endpoint
Create app/api/monitoring/health/route.js:
export async function GET(request) {
try {
const monitor = global.eventLoopMonitor;
if (!monitor) {
return Response.json(
{ error: 'Monitor not initialized' },
{ status: 503 }
);
}
const stats = monitor.getStats();
const memory = process.memoryUsage();
const health = {
status: stats.p99 > 100 ? 'critical' :
stats.p95 > 50 ? 'warning' : 'healthy',
instance: process.env.INSTANCE_ID || process.pid,
uptime: process.uptime(),
eventLoop: {
min: parseFloat(stats.min.toFixed(2)),
mean: parseFloat(stats.mean.toFixed(2)),
max: parseFloat(stats.max.toFixed(2)),
p50: parseFloat(stats.p50.toFixed(2)),
p95: parseFloat(stats.p95.toFixed(2)),
p99: parseFloat(stats.p99.toFixed(2)),
},
memory: {
rss: Math.round(memory.rss / 1024 / 1024),
heapUsed: Math.round(memory.heapUsed / 1024 / 1024),
heapTotal: Math.round(memory.heapTotal / 1024 / 1024),
external: Math.round(memory.external / 1024 / 1024),
},
timestamp: new Date().toISOString()
};
return Response.json(health);
} catch (error) {
return Response.json(
{ error: error.message },
{ status: 500 }
);
}
}
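The endpoint's status thresholds, pulled out as a pure helper to make the cutoffs explicit (the function name is ours, not part of the endpoint):

```javascript
// Same cutoffs as the route above: p99 > 100 ms → critical,
// else p95 > 50 ms → warning, else healthy.
function classifyHealth(stats) {
  if (stats.p99 > 100) return 'critical';
  if (stats.p95 > 50) return 'warning';
  return 'healthy';
}

console.log(classifyHealth({ p95: 12, p99: 40 }));  // healthy
console.log(classifyHealth({ p95: 60, p99: 90 }));  // warning
console.log(classifyHealth({ p95: 80, p99: 300 })); // critical
```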
Step 4: Create Request Tracking Middleware
Create middleware.js in your project root:
import { NextResponse } from 'next/server';
export function middleware(request) {
const start = Date.now();
const response = NextResponse.next();
response.headers.set('X-Request-Start', start.toString());
return response;
}
export const config = {
matcher: '/api/:path*',
};
Then instrument your route handlers to report completion times, for example in app/api/[...route]/route.js (note that a catch-all route only matches paths without a more specific handler, so in practice you add this to each handler you want tracked):
export async function GET(request) {
const start = Date.now();
try {
// Your API logic here
return await yourApiHandler(request);
} finally {
// Runs on success and on error alike, so each request is tracked exactly once
const duration = Date.now() - start;
if (global.eventLoopMonitor) {
global.eventLoopMonitor.trackRequest('GET', request.url, duration);
}
}
}
Interpreting Monitoring Data
Understanding the Metrics
The monitor tracks several key metrics:
Event Loop Delay Percentiles:
- P50 (Median): Half of all event loop iterations complete faster than this value
- P95: 95% of iterations complete faster than this value
- P99: 99% of iterations complete faster than this value
Target Values:
- P50: < 10ms (excellent), 10-25ms (good), > 25ms (investigate)
- P95: < 50ms (excellent), 50-100ms (acceptable), > 100ms (warning)
- P99: < 100ms (excellent), 100-250ms (warning), > 250ms (critical)
Reading the Health Report
🟢 HEALTHY Event Loop Health Report
============================================================
Instance: worker-1 (PID: 12345)
Uptime: 2h 34m
Event Loop Delay (ms):
Min: 0.05ms ← Best case scenario
Mean: 8.23ms ← Average delay (good)
Max: 156.42ms ← Worst case (occasional spikes OK)
P50: 4.12ms ← 50% of loops finish this fast
P95: 32.45ms ← 95% of loops finish this fast
P99: 78.91ms ← 99% of loops finish this fast ✓
StdDev: 12.34ms ← Consistency (lower is better)
Health Status Interpretation:
🟢 HEALTHY: P99 < 100ms
- Application responding well
- Event loop processing efficiently
- No immediate action needed
🟡 WARNING: P95 > 50ms or P99 100-250ms
- Event loop experiencing delays
- Investigate recent code changes
- Review slow requests log
- Consider optimization
🔴 CRITICAL: P99 > 250ms
- Event loop heavily blocked
- User experience degraded
- Immediate action required
- Check for CPU-intensive operations
Real-World Example Analysis
Good Performance:
Event Loop Delay (ms):
P50: 3.21ms
P95: 18.45ms
P99: 42.33ms
This shows consistent, fast event loop processing. The application handles load well.
Warning Signs:
Event Loop Delay (ms):
P50: 12.45ms
P95: 156.78ms
P99: 342.11ms
High variance between P50 and P99 indicates sporadic blocking operations. Investigate slow requests.
Critical Issues:
Event Loop Delay (ms):
P50: 45.23ms
P95: 523.45ms
P99: 1234.56ms
Consistently high delays across all percentiles indicate systemic blocking issues. Check for synchronous database operations or heavy computation.
Common Blocking Patterns and Fixes
1. Large JSON Parsing
❌ Blocking:
export default function handler(req, res) {
const data = JSON.parse(largeJsonString); // Blocks event loop
res.json(data);
}
✅ Non-Blocking:
import { Worker } from 'worker_threads';
export default async function handler(req, res) {
const worker = new Worker('./workers/json-parser.js');
worker.postMessage(largeJsonString);
const data = await new Promise((resolve, reject) => {
worker.on('message', resolve);
worker.on('error', reject);
});
res.json(data);
}
2. Synchronous File Operations
❌ Blocking:
import fs from 'fs';
export default function handler(req, res) {
const data = fs.readFileSync('./large-file.json', 'utf8');
res.send(data);
}
✅ Non-Blocking:
import { readFile } from 'fs/promises';
export default async function handler(req, res) {
const data = await readFile('./large-file.json', 'utf8');
res.send(data);
}
3. Complex Array Operations
❌ Blocking:
export default function handler(req, res) {
const results = largeArray.map(item => {
return expensiveOperation(item);
});
res.json(results);
}
✅ Non-Blocking (only if the operation performs genuine asynchronous work such as I/O — wrapping CPU-bound code in async/await does not unblock the event loop):
export default async function handler(req, res) {
const results = await Promise.all(
largeArray.map(async item => {
return await expensiveOperationAsync(item);
})
);
res.json(results);
}
For CPU-bound work, process in batches and yield to the event loop between them:
export default async function handler(req, res) {
const BATCH_SIZE = 100;
const results = [];
for (let i = 0; i < largeArray.length; i += BATCH_SIZE) {
const batch = largeArray.slice(i, i + BATCH_SIZE);
const batchResults = await Promise.all(
batch.map(item => expensiveOperationAsync(item))
);
results.push(...batchResults);
// Allow event loop to process other requests
await new Promise(resolve => setImmediate(resolve));
}
res.json(results);
}
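To see why the setImmediate yield matters, this self-contained sketch (helper names are ours) shows a pending 0 ms timer firing while batched CPU work is still in progress, instead of waiting for all batches to finish:

```javascript
function busyWaitMs(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // simulates one batch of CPU-bound work
}

let timerFiredDuringWork = false;
setTimeout(() => { timerFiredDuringWork = true; }, 0);

async function batchedWork() {
  for (let i = 0; i < 5; i++) {
    busyWaitMs(20);
    // Yield: lets the event loop run pending timers and I/O callbacks
    await new Promise((resolve) => setImmediate(resolve));
  }
  return timerFiredDuringWork;
}

batchedWork().then((fired) => {
  console.log('timer fired during batched work:', fired); // true
});
```

Without the yield, the flag would only flip after all five batches completed.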
4. Database Queries Without Connection Pooling
❌ Slow (per-request connection setup doesn't block the event loop per se, but it adds latency and can exhaust the database under load):
// Creating a new connection for each request
export default async function handler(req, res) {
const client = await createConnection();
const result = await client.query('SELECT * FROM users');
await client.close();
res.json(result);
}
✅ Non-Blocking:
// Use connection pool
import { pool } from '@/lib/db';
export default async function handler(req, res) {
const result = await pool.query('SELECT * FROM users');
res.json(result);
}
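The pool module imported above isn't shown in the guide; a minimal sketch using node-postgres (pg) as an example driver might look like this (option values are illustrative — adapt to your database client):

```javascript
// lib/db.js — shared connection pool, created once per worker process
import { Pool } from 'pg';

export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10,                       // cap concurrent connections per worker
  idleTimeoutMillis: 30000,      // release idle connections after 30 s
  connectionTimeoutMillis: 5000, // fail fast instead of queueing forever
});
```

Remember that in PM2 cluster mode each worker holds its own pool, so the database sees workers × max connections at peak.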
Advanced Monitoring Setup
Real-Time Dashboard Script
Create scripts/monitor-dashboard.sh. Adjust the PM2 application name to match yours; the script also requires the jq utility.
#!/bin/bash
while true; do
clear
echo "=== Next.js Event Loop Monitoring ==="
echo "Updated: $(date)"
echo ""
# PM2 Status
echo "PM2 Instances:"
pm2 list | grep next-app
echo ""
# Health Check All Instances
echo "Health Status:"
# Repeated calls to the load-balanced port are round-robined across PM2 workers
for i in {1..10}; do
curl -s http://localhost:3000/api/monitoring/health 2>/dev/null | \
jq -r '"\(.instance): P99=\(.eventLoop.p99)ms \(.status)"' || \
echo "Instance not responding"
done
echo ""
echo "Press Ctrl+C to exit"
sleep 5
done
Make it executable:
chmod +x scripts/monitor-dashboard.sh
./scripts/monitor-dashboard.sh
Load Test Monitoring
Create scripts/load-test-monitor.sh:
#!/bin/bash
OUTPUT_DIR="./load-test-results/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR"
echo "timestamp,instance,p95,p99,heap_mb" > "$OUTPUT_DIR/metrics.csv"
echo "Starting load test monitoring..."
echo "Press Ctrl+C when test is complete"
while true; do
timestamp=$(date +%s)
# Collect metrics from health endpoint
curl -s http://localhost:3000/api/monitoring/health 2>/dev/null | \
jq -r --arg ts "$timestamp" \
'[$ts, .instance, .eventLoop.p95, .eventLoop.p99, .memory.heapUsed] | @csv' \
>> "$OUTPUT_DIR/metrics.csv"
sleep 2
done
Run alongside your load test:
# Terminal 1: Start monitoring
./scripts/load-test-monitor.sh
# Terminal 2: Run your load test (Locust shown here; use any tool you prefer)
locust -f loadtest.py --host=http://localhost:3000
Analyzing Results
After your load test, analyze the collected data:
# View summary statistics
cat load-test-results/*/metrics.csv | \
awk -F',' 'NR>1 {sum+=$4; count++; if($4>max) max=$4}
END {print "Avg P99:", sum/count, "ms\nMax P99:", max, "ms"}'
# Find instances with high P99
cat load-test-results/*/metrics.csv | \
awk -F',' 'NR>1 && $4>100 {print $2, $4}' | \
sort -k2 -rn | \
head -10
Best Practices
1. Set Appropriate Thresholds
Adjust thresholds based on your application:
const monitor = new AdvancedEventLoopMonitor({
warningThreshold: 30, // Stricter for high-performance apps
criticalThreshold: 75,
logInterval: 60000 // Less frequent for production
});
2. Monitor in Staging First
Test your monitoring setup in a staging environment before production deployment to:
- Verify thresholds are appropriate
- Ensure logging doesn’t impact performance
- Validate alerting mechanisms
3. Combine with APM Tools
Event loop monitoring complements Application Performance Monitoring tools like Blackfire.io:
- Use event loop monitoring to identify blocking operations
- Use APM for distributed tracing and end-to-end monitoring
- Correlate event loop delays with external service latencies
4. Conduct Regular Performance Reviews
Schedule monthly performance reviews:
- Analyze P99 trends over time
- Identify endpoints with degrading performance
- Review and optimize slow requests
- Update monitoring thresholds as needed
Conclusion
Event loop monitoring is essential for building performant Next.js applications at scale. By implementing the monitoring system described in this guide, you can:
- Identify bottlenecks before they impact users
- Optimize critical paths with data-driven insights
- Scale confidently knowing your application’s limits
- Diagnose issues quickly with detailed metrics
Remember: The event loop is the heartbeat of your Node.js application. Keep it healthy, and your application will scale smoothly.
Additional Resources
Ready to deploy your optimized Next.js application? Create a free Upsun account to get instant preview environments, Git-driven infrastructure, and built-in observability tools for production-ready deployments.
Last modified on April 14, 2026