How to Stop Microservices Failures from Spreading with the Bulkhead Pattern
Sunday, June 22, 2025 at 10:12:50 AM GMT+8
Microservices are awesome for building apps that scale and evolve quickly. But as your system grows, a small problem in one service can snowball into a disaster, taking down your entire application. This is called a cascading failure, and it’s a big challenge in microservices. The Bulkhead Pattern is a smart way to prevent this by isolating parts of your system so one failure doesn’t sink everything else.
Think of a ship with watertight compartments—if one section floods, the others stay dry, keeping the ship afloat. The Bulkhead Pattern does the same for your microservices, and you can apply it in two ways: inside your code (application-level) and at the server level (infrastructure-level). In this post, we’ll break down the problem, explain the solution, and walk you through practical examples to make your system bulletproof.
The Problem: Why Microservices Failures Spread
Imagine you’re running an online store with three microservices:
- Order-Service: Takes customer orders.
- Payment-Service: Talks to a payment provider (like Stripe) to process payments.
- Inventory-Service: Checks if items are in stock.
Everything’s running smoothly until a big sale hits. Suddenly, the payment provider’s API slows down because it’s overloaded. This makes Payment-Service sluggish. Here’s what happens next without proper safeguards:
1. Order-Service keeps sending requests to the slow Payment-Service, piling up work and using up all its available threads or resources.
2. Order-Service becomes so busy waiting for Payment-Service that it can’t handle other tasks, like talking to Inventory-Service or responding to customers.
3. Customers see errors or long delays when trying to check out.
4. If all your services run on the same server (like in a Kubernetes cluster), Payment-Service might hog all the CPU or memory, crashing Order-Service and Inventory-Service too.
This is a cascading failure: one slow service drags down the whole system. It’s like a traffic jam where one stalled car blocks every lane. The Bulkhead Pattern fixes this by isolating services so a problem in one doesn’t spread to others.
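Before looking at the fix, here is what the vulnerable version typically looks like: a hypothetical, bulkhead-free Order-Service handler (Express and axios are assumed purely for illustration). Nothing caps how many checkouts can be waiting on Payment-Service at once, so a slow payment API lets pending requests pile up until Order-Service runs out of connections and memory:

const express = require('express');
const axios = require('axios');
const app = express();
app.use(express.json());

// No bulkhead: every checkout awaits Payment-Service directly.
// When Payment-Service hangs, these handlers accumulate without limit,
// exhausting connections and memory and starving other endpoints.
app.post('/checkout', async (req, res) => {
  const payment = await axios.post('http://payment-service/pay', req.body); // may hang indefinitely
  res.json({ status: 'ordered', payment: payment.data });
});

app.listen(3000);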
The Solution: The Bulkhead Pattern
The Bulkhead Pattern splits your system into isolated “compartments” to limit the damage from failures. It’s like putting walls between services or tasks so one issue stays contained. You can implement it in two places:
1. Application-Level Bulkhead: Inside your service’s code, you limit how many resources (like threads or tasks) are used for specific jobs, like calling an external API.
2. Infrastructure-Level Bulkhead: On your servers (using tools like Kubernetes), you set limits on CPU and memory so one service can’t overload the system.
Let’s dive into both approaches with clear explanations and code you can use.
1. Application-Level Bulkhead: Isolating Code Resources
At the application level, you protect your service by controlling how many tasks it runs at once. For example, if Order-Service calls Payment-Service, you don’t want a slow Payment-Service to tie up all of Order-Service’s resources. Instead, you create separate “pools” or “queues” for different tasks, like calling external APIs, and limit how many can run simultaneously.
Example: Limiting API Calls in Node.js
Let’s say Order-Service is a Node.js app that sends payment requests to Payment-Service. If Payment-Service slows down, you don’t want Order-Service to keep firing off requests until it crashes. You can use the async library (v3+, which accepts promise-returning workers directly) to create a queue that limits how many payment requests run at once.
const async = require('async');
const axios = require('axios');

// Create a queue that allows only 10 payment requests at a time.
// The worker is an async function, so async v3 moves on to the next
// task automatically when the returned promise settles (no callback needed).
const paymentQueue = async.queue(async (task) => {
  try {
    const response = await axios.post('http://payment-service/pay', task.data);
    console.log('Payment succeeded:', response.data);
  } catch (error) {
    console.error('Payment failed:', error.message);
  }
}, 10); // Max 10 requests at once

// Add a payment request to the queue
paymentQueue.push({ data: { amount: 100, currency: 'USD' } });

// Log when all tasks are done (optional)
paymentQueue.drain(() => {
  console.log('All payments processed');
});
What’s happening here?
- The paymentQueue only allows 10 payment requests to run at the same time.
- If Payment-Service is slow, extra requests wait in the queue instead of overwhelming Order-Service.
- This leaves Order-Service free to handle other tasks, like checking inventory or serving web pages.
- Because the worker is an async function, the queue automatically moves on to the next task as soon as each one finishes (its promise settles).
This is like having a ticket counter with 10 open windows—only 10 people can be served at once, and others wait in line, keeping things orderly.
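One refinement worth considering, sketched below with async’s built-in queue inspection: if the waiting line itself grows too long, it is often better to fail fast than to queue work you may never finish. The submitPayment helper and the 100-task threshold are illustrative, not part of the original example:

// Shed load when the bulkhead is saturated (threshold is illustrative)
function submitPayment(data) {
  if (paymentQueue.length() > 100) {
    // Too many payments already waiting; fail fast so callers can retry later
    return Promise.reject(new Error('Payment system busy, please try again'));
  }
  return paymentQueue.push({ data }); // push() returns a promise in async v3 when no callback is passed
}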
Adding Timeouts to Avoid Hanging
What if Payment-Service stops responding entirely? You don’t want tasks stuck in the queue forever. You can add a timeout to cancel requests that take too long, using JavaScript’s AbortController:
const async = require('async');
const axios = require('axios');

// Queue with 10 concurrent payment requests
const paymentQueue = async.queue(async (task) => {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 3000); // Cancel after 3 seconds
  try {
    const response = await axios.post('http://payment-service/pay', task.data, {
      signal: controller.signal,
    });
    console.log('Payment succeeded:', response.data);
  } catch (error) {
    if (axios.isCancel(error)) {
      // axios reports an aborted request as a cancellation
      console.log('Payment request timed out after 3 seconds');
    } else {
      console.error('Payment failed:', error.message);
    }
  } finally {
    clearTimeout(timeout); // Clean up the timeout
    // The queue slot frees automatically once this async worker settles
  }
}, 10);

// Add a payment task
paymentQueue.push({ data: { amount: 100, currency: 'USD' } });
Why this is better:
- If Payment-Service takes longer than 3 seconds, the request is canceled, freeing up the queue.
- This prevents Order-Service from getting stuck waiting for a broken service.
- You could also add a circuit breaker (using a library like opossum) to pause calls to Payment-Service if it keeps failing, reducing load even more; a minimal sketch follows below.
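Here is a minimal sketch of that idea using opossum; the thresholds are illustrative, not recommendations:

const CircuitBreaker = require('opossum');
const axios = require('axios');

// Wrap the payment call in a circuit breaker (all thresholds are illustrative)
const payBreaker = new CircuitBreaker(
  (data) => axios.post('http://payment-service/pay', data),
  {
    timeout: 3000,                // calls slower than 3 seconds count as failures
    errorThresholdPercentage: 50, // open the circuit once 50% of recent calls fail
    resetTimeout: 10000,          // after 10 seconds, let a trial request through
  }
);

// Optional fallback while the circuit is open
payBreaker.fallback(() => ({ status: 'unavailable', message: 'Try another payment method' }));

// Inside the queue worker, call the breaker instead of axios directly:
// const response = await payBreaker.fire(task.data);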
Why Application-Level Bulkheads Rock
- Keeps failures contained: A slow API (like Payment-Service) doesn’t block other tasks in Order-Service.
- Saves resources: By limiting concurrent tasks, you avoid overloading your service.
- Handles failures gracefully: If one part fails, your service can still respond to other requests or use fallback options (e.g., “Try another payment method”).
2. Infrastructure-Level Bulkhead: Isolating Server Resources
Application-level bulkheads protect a single service’s code, but what if Payment-Service starts using all the server’s CPU or memory? That could crash Order-Service and Inventory-Service if they’re on the same machine. Infrastructure-level bulkheads use tools like Kubernetes to set hard limits on resources and keep services separate.
Example: Setting Resource Limits in Kubernetes
In Kubernetes, each service runs as one or more pods (wrappers around its containers). You can tell Kubernetes how much CPU and memory each container may use with requests (the guaranteed minimum) and limits (the hard maximum). This stops one service from hogging everything.
Here’s a Kubernetes configuration for Payment-Service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: payment-service:latest
          resources:
            requests:
              cpu: "250m"      # Needs at least 0.25 CPU cores
              memory: "256Mi"  # Needs at least 256 MB memory
            limits:
              cpu: "500m"      # Can’t use more than 0.5 CPU cores
              memory: "512Mi"  # Can’t use more than 512 MB memory
What’s happening?
- Requests: Kubernetes guarantees Payment-Service gets 0.25 CPU cores and 256 MB of memory.
- Limits: If Payment-Service tries to use more than 0.5 CPU cores, Kubernetes throttles it; if it exceeds 512 MB of memory, the container is killed and restarted (OOMKilled).
- This protects Order-Service and Inventory-Service from losing resources, even if Payment-Service goes haywire.
Think of it like giving each service its own slice of a pizza—nobody can grab the whole pie.
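Assuming you save the manifest above as payment-service.yaml (the filename is just an example), you can apply it and check that the limits are in place with standard kubectl commands:

# Apply the deployment (filename is illustrative)
kubectl apply -f payment-service.yaml

# Confirm the requests and limits on the running pods
kubectl describe pod -l app=payment-service

# Watch actual CPU/memory usage against those limits (requires metrics-server)
kubectl top pod -l app=payment-service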
Spreading Services Across Servers
What if Payment-Service and Order-Service run on the same server, and that server crashes? To avoid this, use Kubernetes podAntiAffinity to spread service replicas across different servers (nodes):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - payment-service
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: payment-service
          image: payment-service:latest
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
Why this helps:
- PodAntiAffinity ensures the two replicas of Payment-Service run on different servers.
- If one server fails, the other replica keeps running, so Payment-Service stays available.
- This also prevents one service from overloading a single server’s resources.
It’s like seating rival teams at different tables to avoid fights.
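To verify the spread (a quick check, not part of the manifest), list the pods with their node assignments; each replica should show a different value in the NODE column:

kubectl get pods -l app=payment-service -o wide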
Scaling Smart with Autoscaling
During a traffic spike, Payment-Service might need more pods to handle the load. Kubernetes’ Horizontal Pod Autoscaler (HPA) automatically adds or removes pods based on CPU usage, keeping things efficient.
Here’s an HPA for Payment-Service:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
What’s happening?
- The HPA watches Payment-Service’s CPU usage.
- If usage goes above 70% of the requested CPU (0.25 cores), Kubernetes adds more pods, up to 5.
- If usage drops, it scales back down to 2 pods.
- This handles traffic spikes without letting Payment-Service consume too many resources.
It’s like hiring extra staff during a busy restaurant rush, but sending them home when it’s quiet.
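Once the HPA is applied (assuming the Deployment and HPA above), you can watch it react to load; the TARGETS column shows current versus target CPU utilization and REPLICAS shows the pod count moving between 2 and 5:

kubectl get hpa payment-service-hpa --watch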
Why Infrastructure-Level Bulkheads Are Awesome
- Stops resource hogs: One service can’t steal all the CPU or memory.
- Survives server crashes: Spreading pods across servers keeps your system running.
- Handles traffic spikes: Autoscaling adds capacity without breaking the bank.
Putting It All Together
For a rock-solid microservices system, use both application-level and infrastructure-level bulkheads:
1. In Your Code:
- Use queues to limit how many API calls or tasks run at once (e.g., 10 payment requests).
- Add timeouts to stop waiting for slow services (e.g., 3 seconds).
- Consider circuit breakers to pause calls to failing services.
2. On Your Servers:
- Set CPU and memory limits in Kubernetes (e.g., 0.5 CPU cores, 512 MB max).
- Spread service replicas across different servers with pod anti-affinity.
- Use autoscaling to handle traffic spikes safely.
Together, they:
- Keep failures small and contained.
- Protect your system’s resources.
- Let your app keep running even if one part breaks.
Wrapping Up
Cascading failures can turn a tiny glitch into a full-blown outage, but the Bulkhead Pattern is your shield. By isolating resources in your code (with queues and timeouts) and on your servers (with Kubernetes limits and autoscaling), you make your microservices tough enough to handle failures, traffic spikes, and flaky APIs.
Start by applying the Bulkhead Pattern to your most critical services, like payment or order processing. As you see it work, roll it out to others. With this pattern, your microservices will stay afloat no matter how rough the seas get.