Latency

Latency is the waiting time a message spends between creation and consumption—or between any two checkpoints you measure along the IBM MQ path. Trading systems chase sub-millisecond LAN latency; batch warehouses accept minutes. Asynchronous messaging deliberately decouples producers from consumer speed, which means latency becomes a design choice, not only a bug. Beginners measure MQPUT return time and assume the business saw the message—ignoring queue depth, channel batching, and COBOL processing seconds later. Operations see channel RUNNING and assume low latency—ignoring fifty thousand messages ahead in the queue. This tutorial decomposes latency into queue wait, queue manager processing, log I/O, channel transit, and application time; contrasts persistent and non-persistent behavior; explains the throughput tradeoff; covers syncpoint and XA; and gives practical measurement patterns for distributed and z/OS MQ without requiring exotic tooling.

Segments of End-to-End Latency

Latency segments on a typical path
| Segment | Typical range | Main drivers |
| --- | --- | --- |
| Producer MQPUT | Sub-ms to tens of ms | Persistence, syncpoint, network to QM |
| Queue wait | Zero to unbounded | Depth, consumer speed, priority |
| Channel transit | WAN ms to seconds | Distance, TLS, BATCHSZ, size |
| Consumer MQGET | Sub-ms to ms | Match options, browse vs get |
| Business logic | Varies | Application code, DB calls |

Document which segments your SLA covers. A middleware team might own everything through MQGET; application teams own processing after that. Blame games end when timestamps exist in a shared format (for example, ISO-8601 in the RFH2 usr folder) and every message carries a corporate correlation ID in the MQMD.
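As a minimal sketch of how shared-format timestamps turn into per-segment numbers, the Python below subtracts consecutive checkpoints. The checkpoint names (`produced`, `put_done`, `get_arrival`, `processed`) and the sample values are hypothetical; in practice they would come from whatever fields your applications log.

```python
from datetime import datetime, timedelta

def parse_iso(ts: str) -> datetime:
    # fromisoformat on Python < 3.11 rejects a trailing "Z", so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def segment_latencies_ms(checkpoints):
    """checkpoints: ordered list of (name, ISO-8601 timestamp) pairs."""
    result = {}
    for (prev_name, prev_ts), (name, ts) in zip(checkpoints, checkpoints[1:]):
        delta = parse_iso(ts) - parse_iso(prev_ts)
        result[f"{prev_name}->{name}"] = delta / timedelta(milliseconds=1)
    return result

# Hypothetical checkpoints for one message.
checkpoints = [
    ("produced", "2026-05-17T10:00:00.123Z"),
    ("put_done", "2026-05-17T10:00:00.125Z"),
    ("get_arrival", "2026-05-17T10:00:00.625Z"),
    ("processed", "2026-05-17T10:00:00.700Z"),
]
segments = segment_latencies_ms(checkpoints)
for name, ms in segments.items():
    print(f"{name}: {ms:.1f} ms")
```

Timestamps taken on different hosts are only comparable when clocks are synchronized, so cross-host segments carry whatever clock skew exists between the machines.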

Queue Wait and Little's Law Intuition

Average queue wait approximates average depth divided by dequeue rate when the system is stable (Little's Law). If ten thousand messages sit on a queue and consumers remove one thousand per second, new arrivals wait roughly ten seconds on average before their turn—plus service time. Latency spikes during catch-up after outages. Priority queues (where supported) and separate queues for express traffic reduce head-of-line blocking. Do not point low-latency flows at the same queue as bulk batch unless consumers can always keep depth near zero.
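The estimate above is simple enough to keep as a helper. This is a rough sketch of the Little's Law approximation using the figures from the paragraph; the function name is illustrative, and the result ignores service time and assumes a stable system.

```python
def estimated_queue_wait_s(avg_depth: float, dequeue_rate_per_s: float) -> float:
    """Little's Law approximation: wait = average depth / dequeue rate."""
    if dequeue_rate_per_s <= 0:
        raise ValueError("dequeue rate must be positive")
    return avg_depth / dequeue_rate_per_s

# Figures from the text: 10,000 messages queued, 1,000 removed per second.
wait = estimated_queue_wait_s(10_000, 1_000)
print(f"estimated queue wait: {wait:.1f} s")
```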

Persistence, Logging, and Syncpoint

A persistent MQPUT outside syncpoint returns only after the queue manager has forced the log record to durable storage; under syncpoint, the forced write happens at commit instead. Either way, that forced log write often dominates a LAN latency budget. Use MQPMO_NO_SYNCPOINT or asynchronous log settings only where documentation and risk acceptance allow; many payment flows cannot. MQGET under syncpoint keeps the message locked until commit, which adds latency for competing consumers when the unit of work runs long or backs out. XA two-phase commit coordinates MQ with databases and favors correctness over speed. Non-persistent messages reduce put latency dramatically; use them only when loss is acceptable.

Channels and Batching

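Sender channels move messages in batches of up to BATCHSZ, and the receiving side makes a batch available only when the batch is committed. A batch closes when BATCHSZ messages have been sent, or when the transmission queue is empty and BATCHINT has elapsed. With a non-zero BATCHINT, a batch of fifty small messages may stay open until the fiftieth arrives or the interval expires: throughput wins, and the earliest message in the batch waits longest. For low latency over WAN, a smaller BATCHSZ and a BATCHINT of zero reduce the wait; for bulk replication, large batches help. HBINT controls heartbeats on an idle channel rather than batch timing. TLS session reuse avoids repeated handshake latency on new connections; prefer connection pooling on clients.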
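To get a feel for the batching delay, here is a deliberately simplified model, not a description of the channel's internals. It assumes a batch closes when BATCHSZ messages have been sent or, once the transmission queue is empty, when BATCHINT has elapsed; it also assumes evenly spaced arrivals slow enough that the queue empties between messages, and it ignores network transit and commit time. The function name and the sample values are hypothetical.

```python
def first_message_wait_ms(batch_size: int, batch_int_ms: float,
                          arrival_gap_ms: float) -> float:
    """Approximate wait of the first message in a batch before the batch closes."""
    # Time until the batch is full, counted from the first message.
    time_to_fill_ms = (batch_size - 1) * arrival_gap_ms
    # The batch closes at whichever comes first: full batch or elapsed interval.
    return min(time_to_fill_ms, batch_int_ms)

# Hypothetical settings; messages arrive every 10 ms.
for size, interval in [(50, 0), (50, 200), (5, 200)]:
    wait = first_message_wait_ms(size, interval, arrival_gap_ms=10)
    print(f"BATCHSZ={size} BATCHINT={interval} ms -> first message waits ~{wait:.0f} ms")
```

Under these assumptions the interval, not the batch size, sets the delay on a lightly loaded channel, while a small batch size caps the delay when messages arrive quickly. Measure on a real channel before trusting either number.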

Client Connection Latency

Each new TCP and TLS handshake costs round trips. Client reconnect after failover adds seconds. A CCDT with multiple hosts lets clients try the next connection name quickly. Local bindings (the application connects directly to a queue manager on the same host instead of through a server-connection channel) remove the network for co-located workloads. Remote clients over VPN add tens of milliseconds per hop; measure from the client machine, not only from the queue manager host.

Browse Versus Destructive Get

Browsing inspects messages without removing them—useful for monitoring, not for low-latency consumption paths that need exactly-once dequeue. Destructive MQGET removes the message; browse-then-get patterns double I/O. Match options (MQMO_MATCH_MSG_ID) add search latency on deep queues—design keys and queue depth accordingly.

Pub/Sub Latency

Publish to multiple subscribers adds fan-out cost. Durable subscribers may write additional logs. Topic routing through hierarchies is usually fast; subscriber application lag is not. For market data, many shops use specialized middleware or non-persistent topics with dedicated consumers—MQ can work when depth stays shallow.

z/OS Considerations

Queue sharing groups add coupling facility (CF) access time, usually low microseconds to milliseconds when the CF is healthy. CF structure contention raises latency for all members. Large messages offloaded to shared message data sets (SMDS) behave differently from small messages held entirely in CF structures. Measure on-CP time with SMF and MQ accounting records where available.

Latency Versus Throughput Tradeoff

Optimizing one often pressures the other. Large channel batches and aggressive consumer batching raise throughput but stretch per-message latency. A low-latency design uses shallow queues, fast disks, minimal sync where safe, co-located apps, small messages, and enough consumer threads to keep depth near zero. A high-throughput batch design accepts seconds of latency overnight. State your goal per queue in architecture documents.

Tutorial: Timestamp Correlation

```text
Producer payload (JSON example):
{
  "orderId": "A123",
  "producedAt": "2026-05-17T10:00:00.123Z"
}

Consumer on MQGET:
  latency_ms = now() - parse(producedAt)
  log histogram: p50, p95, p99

Also log:
  putCompletionTime (app after MQPUT)
  getArrivalTime (before business logic)

Separate middleware latency from processing latency.
```
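The consumer-side calculation can be sketched in Python as follows. This is an illustration under assumptions: the `producedAt` values are synthetic, the receive time is fixed so the result is reproducible, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math
from datetime import datetime, timedelta, timezone

def latency_ms(produced_at: str, received_at: datetime) -> float:
    # Normalise the trailing "Z" for Python versions before 3.11.
    produced = datetime.fromisoformat(produced_at.replace("Z", "+00:00"))
    return (received_at - produced) / timedelta(milliseconds=1)

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic data: 100 messages produced 10 ms apart, all read at 10:00:01 UTC.
received = datetime(2026, 5, 17, 10, 0, 1, tzinfo=timezone.utc)
produced_stamps = [f"2026-05-17T10:00:00.{ms:03d}Z" for ms in range(0, 1000, 10)]
samples = [latency_ms(ts, received) for ts in produced_stamps]

for pct in (50, 95, 99):
    print(f"p{pct}: {percentile(samples, pct):.0f} ms")
```

In a real consumer, `received_at` would be `datetime.now(timezone.utc)` captured immediately after MQGET returns and before business logic runs.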

Tutorial: DISPLAY Depth During Spike

```shell
DISPLAY QLOCAL('ORDERS.IN') CURDEPTH IPPROCS OPPROCS
* IPPROCS = handles open for input (consumers); OPPROCS = handles open for output (producers)
* Rising CURDEPTH + flat or zero IPPROCS -> too few consumers attached
* High OPPROCS + low IPPROCS -> producers outnumber consumers
* After fix: CURDEPTH stable near zero -> queue-wait latency drops
```
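If the DISPLAY output is captured to text during a spike, a few lines of Python can turn it into a first-pass diagnosis. This is a sketch only: the sample text imitates the `NAME(value)` layout that runmqsc prints, the threshold of 1000 and the `diagnose` wording are arbitrary assumptions, and IPPROCS is read as the count of handles open for input (getters).

```python
import re

# runmqsc prints attributes as NAME(value) pairs.
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")

def parse_display(output: str) -> dict:
    return {name: value.strip() for name, value in ATTR.findall(output)}

def diagnose(attrs: dict, depth_alert: int = 1000) -> str:
    depth = int(attrs["CURDEPTH"])
    consumers = int(attrs["IPPROCS"])  # handles open for input
    if depth >= depth_alert and consumers == 0:
        return "no consumers attached"
    if depth >= depth_alert:
        return "consumers attached but not keeping up"
    return "depth within threshold"

# Illustrative sample, not captured from a real queue manager.
sample = """
AMQ8409I: Display Queue details.
   QUEUE(ORDERS.IN)                        TYPE(QLOCAL)
   CURDEPTH(50000)                         IPPROCS(0)
   OPPROCS(4)
"""
attrs = parse_display(sample)
print(diagnose(attrs))
```

Handle counts show who is connected, not how fast they work, so pair this check with depth sampled over time.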

Explainer: Waiting in Line

Latency is how long you stand in line at the café. Throughput is how many customers leave per minute. A barista making ten drinks at once (batching) clears the line faster overall, but nobody in the batch gets a drink until all ten are ready, so the first person to order waits longest.

Explain Like I'm Five

Latency is how long you wait for your turn on the slide. If ten kids are ahead of you, you wait longer—even if the slide is very fast once you climb on.

Practice Exercises

Exercise 1

If depth is 20,000 and consumers dequeue 2,000 per second, estimate average queue wait ignoring service time.

Exercise 2

Name two tuning changes that lower latency but might lower peak throughput.

Exercise 3

Design a timestamp scheme to split middleware latency from application latency.

Frequently Asked Questions

Test Your Knowledge

1. Latency measures:

  • Time for a message to travel the path
  • Messages per hour only
  • Queue name length
  • Cipher bit count

2. Deep queues usually:

  • Increase wait time for new messages
  • Reduce wait time
  • Eliminate TLS
  • Remove logs

3. Large channel BATCHSZ can:

  • Raise throughput but add batching delay
  • Remove all latency
  • Disable listeners
  • Delete messages

4. Syncpoint on put/get:

  • Adds coordination and I/O latency
  • Removes all delay
  • Only affects topics
  • Skips channels
Read time: 21 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation