What is stress testing for IBM MQ?

Stress testing increases load beyond planned peak until errors, latency blowouts, or resource exhaustion occur, revealing maximum capacity and failure modes for operations and architecture.

How is stress testing different from load testing?

Load testing validates planned or peak expected traffic. Stress testing deliberately exceeds that level to find the breaking point and how the system degrades.

Is stress testing safe in production?

Generally no. Run stress tests in isolated environments with rollback plans. Any production chaos experiment requires executive approval and blast-radius controls.

What failures might stress testing reveal?

Typical findings include MQRC_Q_FULL, channel failures, full logs, full disks, CPU saturation, listener connection limits, in-doubt transactions, and application thread starvation.

What should we do after stress testing?

Document maximum sustainable rate, failure symptoms, recovery steps, and update capacity plans and alerts. Fix critical issues before relying on headroom assumptions.

MainframeMaster

Stress Testing

Stress testing IBM MQ means pushing the system harder than load testing until something gives: a queue fills, the log disk saturates, channels go into RETRYING, consumers fall behind, or CPU is pegged. The goal is not to break production on purpose; it is to learn where the cliff edge lies so that capacity planning includes margin and operators recognize early warning signs before customers do. Load testing asks whether we can handle Black Friday traffic. Stress testing asks what happens on Black Friday plus thirty percent from a duplicate-feed bug. Beginners confuse the two and either never test beyond averages or run uncontrolled spikes in production. This tutorial explains stress methodology, ramp strategies, failure modes to observe, safety controls, recovery validation, and how results feed capacity planning and monitoring thresholds.

Stress Versus Load

Load test vs stress test
Aspect	Load test	Stress test
Target intensity	Planned peak	Beyond peak until failure
Success criterion	Meet SLA at peak	Document max and failure mode
Risk tolerance	Low	Controlled breakage in test
Output	Baseline metrics	Ceiling and degradation curve

Ramp Strategies

A step ramp increases the producer rate every five minutes until errors appear; plot latency and depth at each step to find the knee of the curve. A spike test doubles the rate instantly for ten minutes to simulate a misconfigured batch replay. A soak at one hundred twenty percent of peak for several hours reveals memory leaks and log archive fill. Run each strategy separately; combining them in one marathon run makes the results hard to interpret.
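
A step ramp is easy to get wrong by hand, so it can help to generate the schedule before the run. The following is a minimal sketch in Python; the function name `step_ramp` and its parameters are assumptions made for this example, not part of any IBM MQ tool, and the rates are placeholders. The load driver that would consume the schedule is assumed to exist separately.

```python
def step_ramp(start_rate, step, hold_seconds, max_rate):
    """Return a list of (start_offset_seconds, target_msgs_per_second) steps.

    Hypothetical helper: each step holds its rate for hold_seconds before
    the next step begins. The last step never exceeds max_rate.
    """
    if step <= 0 or hold_seconds <= 0:
        raise ValueError("step and hold_seconds must be positive")
    schedule = []
    rate = start_rate
    offset = 0
    while rate <= max_rate:
        schedule.append((offset, rate))
        rate += step
        offset += hold_seconds
    return schedule


# Example: +500 msg/s every 5 minutes, from 5,000 up to 7,500 msg/s.
for offset, rate in step_ramp(5000, 500, 300, 7500):
    print(f"t+{offset // 60:>2} min -> {rate} msg/s")
```

Keeping the schedule as plain data makes it straightforward to record alongside the latency and depth readings taken at each step.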

Failure Modes to Watch

MQRC_Q_FULL (reason code 2053) when CURDEPTH reaches MAXDEPTH: puts fail, and producers that retry effectively stall.
Filesystem full on queue or log paths: the queue manager may stop.
Channel session limit or partner rejection: transmission queue (XMITQ) depth climbs rapidly.
Listener backlog: clients cannot connect even though the queue manager itself is healthy.
Poison-message or retry storms: CPU stays flat and queue depth barely moves.
In-doubt transactions after a forced stop: these need transaction manager (TM) recovery.
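
For the queue-full case in the list above, a rough estimate of how quickly a queue fills can help when choosing step sizes and hold times. This is a back-of-the-envelope sketch that assumes constant put and get rates; `seconds_until_full` is a hypothetical helper, and in a real run the depth values would come from CURDEPTH and MAXDEPTH on the queue under test.

```python
import math


def seconds_until_full(cur_depth, max_depth, put_rate, get_rate):
    """Estimate seconds until CURDEPTH reaches MAXDEPTH.

    Assumes constant rates in messages per second (a simplification).
    Returns math.inf when consumers keep up and the queue does not grow.
    """
    net_growth = put_rate - get_rate
    if net_growth <= 0:
        return math.inf
    remaining = max(max_depth - cur_depth, 0)
    return remaining / net_growth


# Illustrative numbers only: an empty queue with MAXDEPTH 100,000,
# producers at 4,000 msg/s and consumers at 3,000 msg/s.
print(seconds_until_full(0, 100_000, 4_000, 3_000))  # 100.0
```

If the estimate is much shorter than the hold time of a step, the queue will fill before latency at that step can be measured, which suggests smaller steps or a deeper test queue.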

Safety and Ethics

Isolate the test network from production partners by using test channels or stub receivers. Snapshot disks or use disposable VMs for easy rebuild. Schedule test windows with infrastructure teams. Never point stress drivers at production queue names without governance. Document the rollback procedure: endmqm, clear test queues, restore from image.
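
One way to enforce the rule about production queue names is a guard that the stress driver calls before it connects. The sketch below is an assumption about how such a guard might look, not an IBM MQ feature: `check_stress_target`, the allow-list contents, and the `STRESS.` queue prefix are all hypothetical names chosen for illustration.

```python
def check_stress_target(qmgr, queue, allowed_qmgrs, allowed_queue_prefixes):
    """Raise ValueError unless the target is explicitly allow-listed.

    Hypothetical guard: both the queue manager name and the queue name
    prefix must be approved before a stress driver is allowed to run.
    """
    if qmgr not in allowed_qmgrs:
        raise ValueError(f"queue manager {qmgr!r} is not an approved stress target")
    if not any(queue.startswith(prefix) for prefix in allowed_queue_prefixes):
        raise ValueError(f"queue {queue!r} does not match an approved test prefix")
    return True


# Assumed allow-list for a performance environment.
ALLOWED_QMGRS = {"QM_PERF"}
ALLOWED_PREFIXES = ("STRESS.",)

check_stress_target("QM_PERF", "STRESS.PAYMENT.IN", ALLOWED_QMGRS, ALLOWED_PREFIXES)
```

An allow-list fails closed: a new or misspelled target is rejected by default, which is the safer behavior for a tool designed to cause breakage.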

Observing Degradation

Healthy systems degrade gracefully: latency rises before total failure, and errors surface as explicit reason codes. Unhealthy systems exhibit hangs, in-doubt states, or partial writes; capture the queue manager error logs (AMQERR01.LOG and its siblings) and kernel metrics during stress. Note whether depth recovers after producers stop, because the drain rate under stress defines recovery time.
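
The drain rate can be estimated from a few depth samples taken after the producers stop. The following is a minimal sketch under the assumption that depth falls roughly linearly; `drain_estimate` is a hypothetical helper, and the samples are illustrative values rather than measurements.

```python
def drain_estimate(samples):
    """Estimate drain rate and remaining drain time from depth samples.

    samples: list of (seconds_since_producers_stopped, depth), in time order.
    Returns (drain_rate_msgs_per_second, seconds_to_empty) based on the first
    and last sample. seconds_to_empty is None when depth is not falling.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    rate = (d0 - d1) / (t1 - t0)
    if rate <= 0:
        return rate, None
    return rate, d1 / rate


# Illustrative samples: depth falls from 90,000 to 30,000 in 60 seconds.
rate, remaining = drain_estimate([(0, 90_000), (30, 60_000), (60, 30_000)])
print(rate, remaining)  # 1000.0 30.0
```

A result of `None` for the remaining time means depth did not fall between the first and last sample, which is the non-recovering case worth investigating.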

Recovery Validation

Stop the stress drivers.
Verify that the queue manager is RUNNING and the listeners are up.
Drain or purge test queues per policy.
Resolve in-doubt transactions.
Re-run a short load test at fifty percent to confirm normal behavior.
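
The first four steps above can be expressed as a checklist that reports what is still outstanding before the confirmation load test is run. This is only a sketch: the `state` dictionary and its keys are hypothetical, and how the values are collected (status commands, monitoring exports, and so on) is left outside the example.

```python
# Each entry pairs a recovery step with a predicate over the collected state.
# All dictionary keys are assumptions made for this illustration.
RECOVERY_CHECKS = [
    ("stress drivers stopped", lambda s: s["active_drivers"] == 0),
    ("queue manager running", lambda s: s["qmgr_status"] == "RUNNING"),
    ("listeners up", lambda s: s["listeners_down"] == 0),
    ("test queues drained", lambda s: all(d == 0 for d in s["queue_depths"].values())),
    ("no in-doubt transactions", lambda s: s["indoubt_count"] == 0),
]


def outstanding_recovery_steps(state):
    """Return the names of recovery steps that are not yet satisfied."""
    return [name for name, is_ok in RECOVERY_CHECKS if not is_ok(state)]


# Illustrative state: everything is healthy except one undrained queue.
state = {
    "active_drivers": 0,
    "qmgr_status": "RUNNING",
    "listeners_down": 0,
    "queue_depths": {"PAYMENT.IN": 1200},
    "indoubt_count": 0,
}
print(outstanding_recovery_steps(state))  # ['test queues drained']
```

Only when the list comes back empty does it make sense to run the short confirmation load test.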

Using Results

Record the maximum sustainable msg/s and byte/s reached before p99 latency doubled or errors exceeded one percent. Set monitoring alerts at seventy percent of that ceiling. Update the capacity plan headroom. File defects for non-linear failures, such as a small queue depth causing a disproportionate slowdown.
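
The ceiling and alert threshold described above can be derived mechanically from the per-step measurements of a ramp. The sketch below assumes each step was recorded as a dictionary with `rate`, `p99_ms`, and `error_rate` keys; those names, the helper functions, and the sample figures are illustrative assumptions, not output from a real run.

```python
def max_sustainable_rate(steps, baseline_p99_ms, max_error_rate=0.01):
    """Return the highest rate reached before the ramp breached its limits.

    steps: list of dicts with 'rate', 'p99_ms', 'error_rate', in ramp order.
    A step breaches when p99 has at least doubled relative to the baseline
    or the error rate exceeds max_error_rate. Returns None if the very
    first step already breaches.
    """
    ceiling = None
    for step in steps:
        if step["p99_ms"] >= 2 * baseline_p99_ms or step["error_rate"] > max_error_rate:
            break
        ceiling = step["rate"]
    return ceiling


def alert_threshold(ceiling, percent=70):
    """Return the alert level as a percentage of the measured ceiling."""
    return ceiling * percent / 100


# Illustrative ramp results with a baseline p99 of 200 ms.
steps = [
    {"rate": 6000, "p99_ms": 180, "error_rate": 0.0},
    {"rate": 6500, "p99_ms": 240, "error_rate": 0.001},
    {"rate": 7000, "p99_ms": 520, "error_rate": 0.004},
]
ceiling = max_sustainable_rate(steps, baseline_p99_ms=200)
print(ceiling, alert_threshold(ceiling))  # 6500 4550.0
```

Stopping at the first breaching step, rather than taking the best step overall, avoids crediting a rate that only looked healthy after the system had already started to degrade.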

Tutorial: Stress Run Log Template

STRESS RUN 2026-05-17 — QM_PERF
Ramp: +500 msg/s every 5 min, persistent 2KB
Failure at: 7,500 msg/s — MQRC_Q_FULL on PAYMENT.IN
CURDEPTH: 100,000 / MAXDEPTH 100,000
Channel TO.HUB: RUNNING, XMITQ depth 0
Recovery: stop producers, 45 min drain at 3,200 get/s
Max sustainable (p99 < 500ms): ~6,800 msg/s
Action: raise MAXDEPTH + disk OR add consumer instance

Explainer: How Many People Fit in the Elevator

Stress testing keeps adding people to the elevator until the alarm rings. You learn the real limit instead of trusting the sign: it says eight, and twelve seemed fine until someone brought luggage.

Explain Like I'm Five

Stress testing is putting more and more marbles in the jar until marbles start falling on the floor—then you know the jar is too small for that many at once.

Practice Exercises

Exercise 1

Design a step ramp from 2,000 to 10,000 msg/s in five steps.

Exercise 2

List recovery steps after stress test fills the log disk.

Exercise 3

Convert stress test ceiling to a monitoring alert threshold with margin.

Test Your Knowledge

1. Stress testing pushes load:

Beyond planned peak
Only to fifty percent
Only on DLQ
Only off hours production

2. A common stress outcome is:

MQRC_Q_FULL or disk full
Automatic free capacity
Higher MAXMSGL always
Fewer logs

3. After stress test you should:

Document limits and recovery
Delete all queues
Disable TLS
Skip capacity plan

4. Stress tests belong in:

Isolated test environment
Production checkout
Partner prod first
DLQ only
