Scatter-Gather

The scatter-gather pattern addresses a problem that a single sequential service handles poorly: a customer order needs a credit check, an inventory hold, and a tax calculation, each owned by a different team and runtime, and the web page should wait only as long as the slowest leg, not the sum of three sequential calls. Scatter-gather on IBM MQ splits the incoming request message into sub-tasks on separate queues, lets multiple consumers work in parallel, then gathers the replies into a single correlated response. MQ provides durable queues, transactional gets and puts, and the MsgId and CorrelId fields in the message descriptor (MQMD), so the aggregator knows which replies belong to which parent request. Beginners often confuse scatter-gather with simple load balancing on one queue; the difference is that scatter explicitly creates different message types or routes to different services.

This tutorial walks through the scatter and gather roles, correlation design, expected reply counts, timeouts and partial failure, idempotency when messages are redelivered to workers, integration with the aggregator pattern, monitoring of scatter fan-out queues, and anti-patterns such as losing correlation when a worker forgets to copy CorrelId onto the reply.

Roles in the Pattern

Scatter-gather components on MQ

| Role         | Typical queue           | Action                          |
|--------------|-------------------------|---------------------------------|
| Scatter      | ORDER.SCATTER.IN        | GET request; PUT N sub-messages |
| Worker       | CREDIT.WORK, STOCK.WORK | Process; PUT reply              |
| Gather       | ORDER.GATHER.IN         | Collect replies; build response |
| Client reply | ORDER.RESPONSE          | Single answer to caller         |

Correlation Design

When the scatter service receives a request, it reads or generates a correlation token. A common pattern is to use the request MsgId as the CorrelId on every sub-message PUT. Each worker copies CorrelId from the input message to the reply MQMD so that gather can MQGET matching on CorrelId (MQMO_MATCH_CORREL_ID). For multiple scatter rounds, nest tokens in a user property such as scatterGroupId. Document whether workers must preserve the MsgId of the sub-message or only the CorrelId; mixing the two rules breaks gather. JMS clients use JMSCorrelationID, while MQI programs use the MQMD CorrelId field, which is 24 bytes of binary data that MQ does not convert between code pages. Treat it as bytes rather than text, and check that every platform, including EBCDIC mainframes, produces and compares the same byte values.
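Because CorrelId is a fixed-length byte field, it can help to derive it from the application token in one place. The following is a minimal sketch, not an MQ API call: the helper names `to_correl_id` and `to_log_form` are hypothetical, and padding short tokens while hashing long ones is one possible convention among several.

```python
import hashlib

CORREL_ID_LEN = 24  # MQMD CorrelId is a fixed 24-byte binary field


def to_correl_id(token: str) -> bytes:
    """Map an application token to exactly 24 bytes (hypothetical helper).

    Tokens of up to 24 UTF-8 bytes are null-padded; longer tokens are
    replaced by a truncated SHA-256 digest so the result always fits.
    """
    raw = token.encode("utf-8")
    if len(raw) <= CORREL_ID_LEN:
        return raw.ljust(CORREL_ID_LEN, b"\x00")
    return hashlib.sha256(raw).digest()[:CORREL_ID_LEN]


def to_log_form(correl_id: bytes) -> str:
    """Render the bytes as hex so logs on every platform show the same text."""
    return correl_id.hex().upper()


correl = to_correl_id("R1")
print(len(correl), to_log_form(correl))
```

Logging the hex form rather than a decoded string avoids code-page surprises when the same CorrelId is traced on distributed and mainframe systems.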

```text
Request arrives on ORDER.SCATTER.IN (MsgId = R1)
Scatter PUT to CREDIT.WORK  CorrelId = R1  body = credit slice
Scatter PUT to STOCK.WORK   CorrelId = R1  body = stock slice
Scatter PUT to TAX.WORK     CorrelId = R1  body = tax slice
Scatter records expectedCount = 3 on state store or header

Worker on CREDIT.WORK: GET, process, PUT ORDER.GATHER.IN  CorrelId = R1
Gather: when 3 messages with CorrelId R1 received → merge → PUT ORDER.RESPONSE
```
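The flow above can be simulated in memory to make the correlation rule concrete. This is a sketch under stated assumptions: plain Python lists stand in for MQ queues, the `Message` class and the `scatter` and `work` functions are hypothetical, and the MsgId values are invented strings (a real queue manager would generate them).

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    correl_id: str
    body: dict
    properties: dict = field(default_factory=dict)


WORK_QUEUES = ("CREDIT.WORK", "STOCK.WORK", "TAX.WORK")


def scatter(request: Message, queues: dict) -> int:
    """PUT one sub-message per leg, each carrying the request MsgId as CorrelId."""
    for i, name in enumerate(WORK_QUEUES):
        sub = Message(
            msg_id=f"{request.msg_id}-S{i}",  # stand-in for a generated MsgId
            correl_id=request.msg_id,
            body={"leg": name, "order": request.body},
            properties={"expectedCount": len(WORK_QUEUES)},
        )
        queues[name].append(sub)
    return len(WORK_QUEUES)


def work(queue_name: str, queues: dict) -> None:
    """GET one sub-message, process it, and PUT a reply with CorrelId copied."""
    sub = queues[queue_name].pop(0)
    reply = Message(
        msg_id=f"{sub.msg_id}-R",
        correl_id=sub.correl_id,  # the step a worker must not forget
        body={"leg": queue_name, "status": "ok"},
        properties=dict(sub.properties),
    )
    queues["ORDER.GATHER.IN"].append(reply)


queues = {name: [] for name in WORK_QUEUES + ("ORDER.GATHER.IN",)}
expected = scatter(Message("R1", "", {"orderId": 42}), queues)
for name in WORK_QUEUES:
    work(name, queues)
```

After the loop, ORDER.GATHER.IN holds three replies that all carry CorrelId R1, which is exactly what gather needs in order to merge them.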

Explainer: One Order, Three Helpers

Scatter-gather is asking three friends to each bring one ingredient while you stay at the table with the recipe card number in your hand. When all three return with the matching card number, you make the cake. MQ is the delivery service for the ingredient lists and returned bags.

Expected Reply Count

Static scatter always expects a fixed number of replies, three in this example. Dynamic scatter may query a database to decide how many legs to fire, so gather must read expectedCount from a message property or side table rather than assume three forever. A mismatch makes gather wait until timeout when a leg was never sent, or complete early when the count is set too low.
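A gather that takes the count from each reply, rather than hard-coding it, might look like the sketch below. The `Gather` class and its `on_reply` method are hypothetical, and state is kept in memory only; a production aggregator would need a persistent store.

```python
class Gather:
    """Collects replies per CorrelId until the declared count is reached."""

    def __init__(self):
        # correl_id -> {"expected": int, "replies": {leg: body}}
        self.pending = {}

    def on_reply(self, correl_id, leg, body, expected_count):
        """Record one reply; return the merged replies once complete, else None."""
        group = self.pending.setdefault(
            correl_id, {"expected": expected_count, "replies": {}}
        )
        # Keyed by leg, so a redelivered reply overwrites instead of double counting.
        group["replies"][leg] = body
        if len(group["replies"]) >= group["expected"]:
            del self.pending[correl_id]
            return group["replies"]
        return None


gather = Gather()
first = gather.on_reply("R2", "credit", {"ok": True}, expected_count=2)
dup = gather.on_reply("R2", "credit", {"ok": True}, expected_count=2)
done = gather.on_reply("R2", "stock", {"ok": True}, expected_count=2)
```

Here `first` and `dup` are None because only one distinct leg has answered, and `done` holds both replies. Counting distinct legs rather than raw messages is a design choice that keeps the count honest under redelivery.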

Timeouts and Partial Results

  • All-or-nothing: timeout → error response; compensating transactions cancel holds.
  • Best-effort: return partial JSON with failed leg marked; business accepts degraded mode.
  • Retry: re-scatter only missing legs with same CorrelId; idempotent workers required.
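The three policies above can be expressed as one decision function that runs when the gather deadline passes. This is an illustrative sketch: the `Policy` enum, the `on_timeout` function, and the shape of the returned dictionaries are all assumptions, not part of MQ.

```python
from enum import Enum


class Policy(Enum):
    ALL_OR_NOTHING = "all-or-nothing"
    BEST_EFFORT = "best-effort"
    RETRY = "retry"


def on_timeout(policy, expected_legs, replies):
    """Decide the outcome when the gather deadline passes.

    expected_legs: names of every leg that was scattered.
    replies: dict of leg name -> reply body received so far.
    """
    missing = sorted(set(expected_legs) - set(replies))
    if not missing:
        return {"action": "respond", "body": dict(replies)}
    if policy is Policy.ALL_OR_NOTHING:
        # Legs that did reply may hold resources and need compensation.
        return {"action": "error", "compensate": sorted(replies)}
    if policy is Policy.BEST_EFFORT:
        body = dict(replies)
        body.update({leg: {"status": "failed"} for leg in missing})
        return {"action": "respond", "body": body}
    # Policy.RETRY: re-scatter only the missing legs under the same CorrelId.
    return {"action": "rescatter", "legs": missing}


legs = ["credit", "stock", "tax"]
got = {"credit": {"status": "ok"}, "stock": {"status": "ok"}}
decision = on_timeout(Policy.BEST_EFFORT, legs, got)
```

With the tax leg missing, the best-effort decision is a response whose body marks tax as failed, while the retry policy would ask for a re-scatter of only that leg.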

Transactions

The scatter GET and its multiple PUTs can share one unit of work under syncpoint if all sub-messages must appear together. If one PUT fails, the entire scatter backs out and the request is retried. The alternative is for scatter to commit first and let the workers run independently, with gather handling missing legs via timeout. Which to choose is a trade of atomicity against availability, decided per business case.
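The all-together behaviour can be illustrated with an in-memory stand-in for a syncpoint, where PUTs become visible only on commit. The `UnitOfWork` class and `scatter_atomically` function are hypothetical, and an unknown queue name is used to simulate a failed PUT; with a real queue manager, the GET of the request would sit in the same unit of work and a backout would return the request to its queue.

```python
class UnitOfWork:
    """In-memory stand-in for a syncpoint: PUTs become visible only on commit."""

    def __init__(self, queues):
        self.queues = queues
        self.staged = []

    def put(self, queue_name, message):
        if queue_name not in self.queues:
            raise KeyError(f"unknown queue {queue_name}")  # simulated PUT failure
        self.staged.append((queue_name, message))

    def commit(self):
        for queue_name, message in self.staged:
            self.queues[queue_name].append(message)
        self.staged.clear()

    def backout(self):
        self.staged.clear()


def scatter_atomically(queues, legs, correl_id):
    """Return True if every sub-message was committed, False if all were backed out."""
    uow = UnitOfWork(queues)
    try:
        for leg in legs:
            uow.put(leg, {"correl_id": correl_id, "leg": leg})
        uow.commit()
        return True
    except KeyError:
        uow.backout()
        return False


queues = {"CREDIT.WORK": [], "STOCK.WORK": []}
ok = scatter_atomically(queues, ["CREDIT.WORK", "STOCK.WORK"], "R1")
failed = scatter_atomically(queues, ["CREDIT.WORK", "TAX.WORK"], "R2")
```

The second call fails on the missing TAX.WORK queue, so no R2 message reaches CREDIT.WORK either; no worker ever sees a partial scatter.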

Scaling Workers

Each work queue is served by competing consumers, so the credit service can scale its pods independently of the tax service. Monitor depth per work queue to find the slow leg. Avoid one giant queue for all leg types, because that loses clear ownership and routing.
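Competing consumers can be pictured with threads draining one shared queue. This sketch uses Python's standard `queue.Queue` as a stand-in for a work queue; the function name `run_competing_consumers` and the message shape are assumptions made for illustration.

```python
import queue
import threading


def run_competing_consumers(work_queue, consumer_count):
    """Drain one work queue with several consumers; each message is handled once."""
    handled = []
    lock = threading.Lock()

    def consume(consumer_id):
        while True:
            try:
                msg = work_queue.get_nowait()
            except queue.Empty:
                return  # nothing left for this consumer
            with lock:
                handled.append((consumer_id, msg["correl_id"]))

    threads = [
        threading.Thread(target=consume, args=(i,)) for i in range(consumer_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


credit_work = queue.Queue()
for n in range(20):
    credit_work.put({"correl_id": f"R{n}"})
handled = run_competing_consumers(credit_work, consumer_count=4)
```

Which consumer handles which message varies from run to run, but every message is taken exactly once, which is the property that lets one leg scale without touching the others.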

Step-by-Step: First Scatter-Gather Flow

  1. Define queues: SCATTER.IN, three WORK queues, GATHER.IN, RESPONSE.
  2. Implement scatter service with correlation token and expectedCount property.
  3. Implement three stub workers echoing replies with CorrelId preserved.
  4. Implement gather with in-memory map keyed by CorrelId (production: persistent store).
  5. Test happy path; test one worker down; test timeout.

Monitoring

Alert when GATHER.IN depth keeps growing, which means the aggregator is slow or replies are missing. Alert on the age of the oldest message on each WORK queue, which points to a stuck consumer. Track scatter rate against p99 gather completion latency. For diagnosis, trace one CorrelId through the logs end to end.
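The checks described here reduce to a percentile calculation and a few threshold comparisons. The sketch below assumes depth and oldest-message age have already been collected by whatever monitoring tool is in use; the function names and the threshold values are illustrative assumptions, not recommendations.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of completed gather latencies."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def check_alerts(depths, oldest_age_s, max_depth=1000, max_age_s=30):
    """Return alert strings; the thresholds are illustrative assumptions."""
    alerts = []
    if depths.get("ORDER.GATHER.IN", 0) > max_depth:
        alerts.append("ORDER.GATHER.IN depth high: aggregator slow or replies missing")
    for name, age in sorted(oldest_age_s.items()):
        if name.endswith(".WORK") and age > max_age_s:
            alerts.append(f"{name} oldest message {age}s: consumer may be stuck")
    return alerts


latency = p99(list(range(1, 101)))
alerts = check_alerts(
    {"ORDER.GATHER.IN": 5000},
    {"CREDIT.WORK": 2, "TAX.WORK": 45},
)
```

In this sample the gather queue and the tax leg both raise alerts, while the credit leg stays quiet because its oldest message is only two seconds old.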

Explain Like I'm Five: Scatter-Gather

Scatter-gather is sending three friends to find different puzzle pieces at the same time, then putting the pieces together when they all come back with the same puzzle number written on their bags.

Practice Exercises

Exercise 1

Draw a queue diagram for one request, three workers, and the gather step.

Exercise 2

Define a timeout policy for a missing tax reply.

Exercise 3

List the MQMD fields each worker must copy to its reply.

Frequently Asked Questions

Test Your Knowledge

1. Scatter step sends:

  • Multiple sub-messages
  • One delete command
  • JCL only
  • DNS record

2. CorrelId links:

  • Request and replies
  • CPU and disk
  • Two queue managers only
  • Nothing

3. Gather waits for:

  • All replies or timeout
  • Forever always
  • Zero messages
  • Channel agent

4. Pub/sub differs because:

  • No required gather
  • Uses only FTP
  • No queues
  • Only z/OS

Author: MainframeMaster
Verified: Enterprise Integration Patterns