What is the scatter-gather pattern?

Scatter-gather splits one incoming request into multiple sub-messages sent to parallel processing queues (scatter), then combines replies into one response (gather). IBM MQ carries scatter and gather messages on separate queues with correlation identifiers linking them.

How does IBM MQ correlate scatter and gather messages?

Use MsgId and CorrelId in the MQMD: the scatter step sets CorrelId on every sub-message to the original request's MsgId or to a generated token. Workers copy that CorrelId onto their replies so the aggregator can collect the right set.

What happens if one worker is slow?

The gather step waits until all expected replies arrive or a timeout fires. Design timeout policies: fail whole request, return partial results, or retry missing legs. Document business rules explicitly.

Is scatter-gather the same as pub/sub?

No. Pub/sub broadcasts to subscribers who may not reply. Scatter-gather expects a bounded set of replies merged into one outcome, usually request-reply over point-to-point queues.

Can scatter-gather run on mainframe MQ?

Yes. COBOL or CICS workers on z/OS can consume scatter queues while cloud workers handle other legs, all correlated through MQMD fields and agreed queue naming.

MainframeMaster

Scatter-Gather

The scatter-gather pattern addresses a problem that sequential calls handle poorly: a customer order needs a credit check, an inventory hold, and a tax calculation, each owned by a different team and runtime, and the web page should wait only as long as the slowest leg, not the sum of three sequential calls. Scatter-gather on IBM MQ splits the incoming request message into sub-tasks on separate queues, lets multiple consumers work in parallel, then gathers the replies into a single correlated response. MQ provides durable queues, transactional gets and puts, and the MsgId and CorrelId fields in the message descriptor (MQMD) so the aggregator knows which replies belong to which parent request. Beginners often confuse scatter-gather with simple load balancing on one queue; scatter instead creates different message types or routes to different services. This tutorial walks through scatter and gather roles, correlation design, expected reply counts, timeout and partial failure, idempotency when messages are redelivered to workers, integration with the aggregator pattern, monitoring scatter fan-out queues, and anti-patterns such as losing correlation when a worker forgets to copy CorrelId onto the reply.

Roles in the Pattern

Scatter-gather components on MQ
Role	Typical queue	Action
Scatter	ORDER.SCATTER.IN	GET request; PUT N sub-messages
Worker	CREDIT.WORK, STOCK.WORK	Process; PUT reply
Gather	ORDER.GATHER.IN	Collect replies; build response
Client reply	ORDER.RESPONSE	Single answer to caller

Correlation Design

When the scatter service receives a request, it reads or generates a correlation token. A common pattern is to use the request MsgId as the CorrelId on every sub-message PUT. Each worker copies CorrelId from the input MQMD to the reply MQMD so gather can MQGET with the MQMO_MATCH_CORREL_ID match option (or, from JMS, a selector on JMSCorrelationID). For multiple scatter rounds, nest tokens in a user property such as scatterGroupId. Document whether workers must preserve the sub-message MsgId or only CorrelId; mixing the two rules breaks gather. JMS clients use JMSCorrelationID, while MQI programs use the CorrelId field, a 24-byte binary value that MQ does not convert between code pages. A token built from character data must therefore be encoded identically on every platform, including EBCDIC z/OS.
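The fixed-width, unconverted nature of CorrelId can be illustrated with a small helper. This is a minimal sketch, not an MQ client call: `to_correl_id` and `correl_id_hex` are hypothetical names, and the choice of UTF-8 with null padding is an assumption that every participating platform would have to agree on.

```python
CORREL_ID_LEN = 24  # MQMD CorrelId is a fixed 24-byte binary field


def to_correl_id(token: str) -> bytes:
    """Encode a text token as a fixed-width 24-byte correlation id."""
    raw = token.encode("utf-8")  # assumption: all platforms agree on UTF-8
    if len(raw) > CORREL_ID_LEN:
        raise ValueError(
            f"token needs {len(raw)} bytes; CorrelId holds {CORREL_ID_LEN}"
        )
    return raw.ljust(CORREL_ID_LEN, b"\x00")  # pad with nulls to full width


def correl_id_hex(correl_id: bytes) -> str:
    """Render the id as hex so logs look the same on ASCII and EBCDIC hosts."""
    return correl_id.hex().upper()
```

Logging the hex form rather than the character form keeps a trace comparable across platforms, because the bytes are what MQ actually matches on.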

text

Request arrives on ORDER.SCATTER.IN (MsgId = R1)
Scatter PUT to CREDIT.WORK  CorrelId = R1  body = credit slice
Scatter PUT to STOCK.WORK   CorrelId = R1  body = stock slice
Scatter PUT to TAX.WORK     CorrelId = R1  body = tax slice
Scatter records expectedCount = 3 on state store or header
 
Worker on CREDIT.WORK: GET, process, PUT ORDER.GATHER.IN CorrelId = R1
Gather: when 3 messages with CorrelId R1 received → merge → PUT ORDER.RESPONSE
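The flow above can be sketched with in-memory queues standing in for MQ. Everything here is illustrative: the `Message` class, the `queues` dictionary, and the `scatter` and `work_once` functions are hypothetical names, and a real implementation would use an MQ client library rather than Python deques.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    correl_id: str = ""
    body: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)


# Queue name -> FIFO of messages (an in-memory stand-in for MQ queues).
queues = defaultdict(deque)

# Work queue -> leg name; the three legs from the flow above.
LEGS = {"CREDIT.WORK": "credit", "STOCK.WORK": "stock", "TAX.WORK": "tax"}


def scatter(request: Message) -> int:
    """PUT one sub-message per leg, each carrying the parent MsgId as CorrelId."""
    for n, (queue_name, leg) in enumerate(LEGS.items(), start=1):
        sub = Message(
            msg_id=f"{request.msg_id}-{n}",
            correl_id=request.msg_id,
            body={"leg": leg, "slice": request.body.get(leg)},
            properties={"expectedCount": len(LEGS)},
        )
        queues[queue_name].append(sub)
    return len(LEGS)


def work_once(queue_name: str) -> None:
    """GET one sub-message, 'process' it, and PUT a reply to the gather queue."""
    sub = queues[queue_name].popleft()
    reply = Message(
        msg_id=f"{sub.msg_id}-reply",
        correl_id=sub.correl_id,  # copied unchanged so gather can match it
        body={"leg": sub.body["leg"], "status": "ok"},
        properties=dict(sub.properties),
    )
    queues["ORDER.GATHER.IN"].append(reply)
```

The only line that matters for correlation is the one that copies `correl_id` into the reply; a worker that omits it produces replies gather can never match.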

Explainer: One Order, Three Helpers

Scatter-gather is asking three friends to each bring one ingredient while you stay at the table with the recipe card number in your hand. When all three return with the matching card number, you make the cake. MQ is the delivery service for the ingredient lists and returned bags.

Expected Reply Count

Static scatter always expects the same number of replies, three in this example. Dynamic scatter may query a database to decide how many legs to fire, so gather must read expectedCount from a message property or a side table rather than assume three forever. A mismatch makes gather wait until timeout when a leg was never sent, or complete early when the count is set too low.
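A gather step that trusts the advertised count rather than a hard-coded three might look like the following. This is a sketch under stated assumptions: the `Gatherer` class is hypothetical, state lives in memory only, and each reply is assumed to carry its leg name and the expectedCount property set by scatter.

```python
class Gatherer:
    """Collects replies per CorrelId until the advertised count is reached."""

    def __init__(self):
        # correl_id -> {"expected": int, "replies": {leg: body}}
        self._pending = {}

    def on_reply(self, correl_id, leg, body, expected_count):
        """Record one reply; return the merged replies once complete, else None."""
        state = self._pending.setdefault(
            correl_id, {"expected": expected_count, "replies": {}}
        )
        if state["expected"] != expected_count:
            raise ValueError(f"conflicting expectedCount for {correl_id}")
        # Keyed by leg, so a redelivered reply overwrites instead of
        # inflating the count and completing the request early.
        state["replies"][leg] = body
        if len(state["replies"]) < state["expected"]:
            return None
        del self._pending[correl_id]
        return state["replies"]
```

Keying replies by leg name is a design choice made here for the sketch; it also gives gather a cheap defence against duplicate replies.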

Timeouts and Partial Results

All-or-nothing: timeout → error response; compensating transactions cancel holds.
Best-effort: return partial JSON with failed leg marked; business accepts degraded mode.
Retry: re-scatter only missing legs with same CorrelId; idempotent workers required.
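The three policies above can be written down as one decision function that runs when the gather timer fires. This is a minimal sketch: the `Policy` enum, `resolve_timeout`, and the shape of the returned dictionaries are all assumptions for illustration, not part of any MQ API.

```python
from enum import Enum


class Policy(Enum):
    ALL_OR_NOTHING = "all_or_nothing"
    BEST_EFFORT = "best_effort"
    RETRY = "retry"


def resolve_timeout(expected_legs, replies, policy):
    """Decide the outcome for one CorrelId when its gather timeout fires.

    expected_legs: iterable of leg names that were scattered.
    replies: dict of leg name -> reply body received so far.
    """
    missing = sorted(set(expected_legs) - set(replies))
    if not missing:
        return {"status": "complete", "results": dict(replies)}
    if policy is Policy.ALL_OR_NOTHING:
        # Legs that did answer may hold resources and need compensating.
        return {"status": "error", "missing": missing, "compensate": sorted(replies)}
    if policy is Policy.BEST_EFFORT:
        results = dict(replies)
        results.update({leg: {"failed": True} for leg in missing})
        return {"status": "partial", "results": results}
    # Policy.RETRY: re-scatter only the missing legs, same CorrelId.
    return {"status": "retry", "rescatter": missing}
```

For example, with `credit` and `stock` answered and `tax` missing, the retry policy asks for `tax` alone to be re-sent, which is safe only if the tax worker is idempotent.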

Transactions

The scatter GET and its multiple PUTs can share one syncpoint if all sub-messages must appear together. If one PUT fails, the whole unit of work backs out, the request returns to the input queue, and it is retried. Alternatively, the scatter step commits and the workers run independently, with gather handling missing legs via timeout. Choose between atomicity and availability per business case.
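The all-together behaviour can be imitated in memory to make the trade-off concrete. This is only a sketch of the idea: `UnitOfWork` and `scatter_atomically` are hypothetical, an unknown queue name stands in for a failed PUT, and a real queue manager provides syncpoint, commit, and backout itself.

```python
from collections import deque


class UnitOfWork:
    """Stages PUTs and applies them together, loosely imitating a syncpoint."""

    def __init__(self, queues):
        self._queues = queues
        self._staged = []

    def put(self, queue_name, message):
        if queue_name not in self._queues:  # stand-in for a failed PUT
            raise KeyError(f"unknown queue {queue_name}")
        self._staged.append((queue_name, message))

    def commit(self):
        for queue_name, message in self._staged:
            self._queues[queue_name].append(message)
        self._staged.clear()

    def backout(self):
        self._staged.clear()


def scatter_atomically(queues, work_queues, input_queue="ORDER.SCATTER.IN"):
    """GET one request and PUT all sub-messages, or leave everything unchanged."""
    request = queues[input_queue].popleft()  # GET inside the unit of work
    uow = UnitOfWork(queues)
    try:
        for name in work_queues:
            uow.put(name, {"correl_id": request["msg_id"], "leg": name})
    except KeyError:
        uow.backout()
        queues[input_queue].appendleft(request)  # request is available again
        return False
    uow.commit()
    return True
```

When one target is unavailable, no worker sees a sub-message and the request is back on the input queue, which is the atomic variant described above.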

Scaling Workers

Each work queue is served by competing consumers, so the credit service can scale its pods independently of the tax service. Monitor depth per work queue to find the slow leg. Avoid one giant queue for all leg types, because that loses clear ownership and routing.
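Competing consumers can be demonstrated with threads sharing one in-memory queue. This sketch makes several assumptions: `run_competing_consumers` is a hypothetical helper, Python's `queue.Queue` stands in for a single work queue such as CREDIT.WORK, and consumers stop when the queue is empty, whereas real workers would keep waiting for more messages.

```python
import queue
import threading


def run_competing_consumers(work_queue, n_consumers, handler):
    """Drain work_queue with n_consumers threads; each message is handled once."""
    results = []
    lock = threading.Lock()

    def consume(name):
        while True:
            try:
                msg = work_queue.get_nowait()  # only one consumer gets each message
            except queue.Empty:
                return
            out = handler(msg)
            with lock:
                results.append((name, out))

    threads = [
        threading.Thread(target=consume, args=(f"consumer-{i}",))
        for i in range(n_consumers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Adding consumers changes throughput on that one leg only; the other work queues and their consumers are unaffected.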

Step-by-Step: First Scatter-Gather Flow

Define queues: SCATTER.IN, three WORK queues, GATHER.IN, RESPONSE.
Implement scatter service with correlation token and expectedCount property.
Implement three stub workers echoing replies with CorrelId preserved.
Implement gather with in-memory map keyed by CorrelId (production: persistent store).
Test happy path; test one worker down; test timeout.

Monitoring

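Alert when GATHER.IN depth keeps growing, which suggests the aggregator is slow or is holding replies for requests that will never complete. Alert on the age of the oldest message on each WORK queue, which points to a stuck consumer. Track scatter rate against gather completion latency at p99. Trace one CorrelId through the logs end to end.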
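The checks above reduce to a little arithmetic over sampled metrics. The sketch below assumes depth and oldest-message age have already been collected by some monitoring tool; `p99`, `find_alerts`, the 60-second default, and the input shapes are hypothetical choices for illustration.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of gather completion latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def find_alerts(depth_samples, oldest_age_s, max_age_s=60):
    """Return alert strings for growing depth and for stale messages.

    depth_samples: queue name -> depth readings, oldest first.
    oldest_age_s: queue name -> age in seconds of the oldest message.
    """
    found = []
    for name, samples in depth_samples.items():
        growing = len(samples) >= 2 and all(
            later > earlier for earlier, later in zip(samples, samples[1:])
        )
        if growing:
            found.append(f"{name}: depth growing")
    for name, age in oldest_age_s.items():
        if age > max_age_s:
            found.append(f"{name}: oldest message {age}s old")
    return found
```

Requiring every sample to exceed the previous one is a deliberately strict rule for the sketch; a production alert would more likely compare against a threshold or a moving average.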

Explain Like I'm Five: Scatter-Gather

Scatter-gather is sending three friends to find different puzzle pieces at the same time, then putting the pieces together when they all come back with the same puzzle number written on their bags.

Practice Exercises

Exercise 1

Draw queue diagram for one request and three workers plus gather.

Exercise 2

Define timeout policy for missing tax reply.

Exercise 3

List MQMD fields each worker must copy to reply.

Frequently Asked Questions

Test Your Knowledge

1. Scatter step sends:

Multiple sub-messages
One delete command
JCL only
DNS record

2. CorrelId links:

Request and replies
CPU and disk
Two queue managers only
Nothing

3. Gather waits for:

All replies or timeout
Forever always
Zero messages
Channel agent

4. Pub/sub differs because:

No required gather
Uses only FTP
No queues
Only z/OS

