Sequence number errors are among the most stressful IBM MQ incidents because the queue manager is protecting message integrity on purpose. The channel protocol will not enter RUNNING when the sender and receiver disagree about which batch numbers were committed—a situation that often follows disaster recovery, a restored backup on one side only, or an operator running RESET CHANNEL on one queue manager without telling the partner. Beginners see RETRY loops and rising XMITQ depth and may raise retry timers, which never fixes stale sequence state. This tutorial focuses on errors and recovery: recognizing log messages, coordinated RESET CHANNEL procedures, draining versus risking duplication, differences from ordinary network retry, and post-incident validation so payroll and audit traffic resume safely.
Message channels transfer messages in batches for efficiency. Each batch has a logical sequence position on the channel pair. When TCP drops mid-batch, the partners reconcile on reconnect: what was acknowledged, and what must be resent from XMITQ. That only works if both sides share compatible counters. If QM_A is restored from Friday's backup while QM_B ran through Saturday, the counters have diverged, and the next bind presents numbers that QM_B does not expect. IBM MQ stops rather than silently duplicating or dropping financial payloads. Treat the error as data safety, not as a bug to bypass.
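The Friday-backup scenario can be sketched as a toy model. This is not how IBM MQ stores or negotiates sequence numbers internally; the `ChannelEnd` class and the `can_resume` and `transfer_batch` helpers are hypothetical names invented for illustration, and the counts are made up. It only shows why a one-sided restore leaves the two ends unable to agree.

```python
from dataclasses import dataclass


@dataclass
class ChannelEnd:
    """One side's view of a channel: the last sequence number it committed."""
    name: str
    last_committed: int = 0


def can_resume(sender: ChannelEnd, receiver: ChannelEnd) -> bool:
    """Toy rule: the channel may resume only if both ends agree on the count."""
    return sender.last_committed == receiver.last_committed


def transfer_batch(sender: ChannelEnd, receiver: ChannelEnd, messages: int) -> None:
    """Advance both counters together, as a successfully committed batch would."""
    if not can_resume(sender, receiver):
        raise RuntimeError(
            f"sequence mismatch: {sender.name}={sender.last_committed}, "
            f"{receiver.name}={receiver.last_committed}"
        )
    sender.last_committed += messages
    receiver.last_committed += messages


qm_a = ChannelEnd("QM_A")
qm_b = ChannelEnd("QM_B")

transfer_batch(qm_a, qm_b, 50)        # Friday: both ends reach 50
friday_backup = qm_a.last_committed   # snapshot taken on QM_A only
transfer_batch(qm_a, qm_b, 30)        # Saturday: both ends reach 80

qm_a.last_committed = friday_backup   # QM_A restored from Friday's backup
resumable_after_restore = can_resume(qm_a, qm_b)

# Coordinated reset: both ends agree on the same fresh starting point.
qm_a.last_committed = 0
qm_b.last_committed = 0
resumable_after_reset = can_resume(qm_a, qm_b)

print(resumable_after_restore, resumable_after_reset)
```

In this model, raising a retry timer changes nothing: the mismatch is in the stored state, so every retry hits the same refusal until both counters are aligned.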
| Scenario | Risk | First action |
|---|---|---|
| RESET on one QM only | Dup or gap | Stop channel both sides; align plan |
| Single-sided backup restore | State skew | Compare restore dates; coordinate |
| QM replacement new name | New instance zero state | New channel pair or dual RESET |
| Long outage both sides | In-flight ambiguity | Review XMITQ and logs |
DISPLAY CHSTATUS shows RETRY or INACTIVE, with LASTCHLERR referencing sequence or channel protocol terms; the exact wording varies by release. Search both queue managers' error logs around the same second, because the sender and receiver each log their own view. Note whether the channel ever reached RUNNING after the last change window or failed immediately on BINDING. Check CURDEPTH on the transmission queue and the age of the oldest message, since stale messages may need business approval before destructive recovery. If multiple channels between the same pair fail together, suspect a common restore event rather than individual typos.
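The last heuristic above, several channels to the same partner failing at once, is easy to apply by hand but can also be sketched in a few lines. The snapshot below is hypothetical data typed in by an operator, not the output of any MQ API, and the helper name `partners_with_common_failure` is invented for this example.

```python
from collections import defaultdict

# Hypothetical snapshot noted by hand from DISPLAY CHSTATUS on one queue manager:
# (channel name, partner queue manager, status).
observed = [
    ("PARIS.TO.LONDON", "LONDON", "RETRY"),
    ("PARIS.TO.LONDON.BULK", "LONDON", "RETRY"),
    ("PARIS.TO.MADRID", "MADRID", "RUNNING"),
    ("PARIS.TO.ROME", "ROME", "RETRY"),
]


def partners_with_common_failure(rows, failing=("RETRY",), threshold=2):
    """Group failing channels by partner and keep partners at or above threshold."""
    failed = defaultdict(list)
    for channel, partner, status in rows:
        if status in failing:
            failed[partner].append(channel)
    return {p: sorted(chs) for p, chs in failed.items() if len(chs) >= threshold}


suspects = partners_with_common_failure(observed)
print(suspects)
```

A partner that appears in `suspects` is a candidate for a shared cause such as a restore; a partner with a single failing channel is more likely an individual definition or network problem.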

    * Agree maintenance window with partner ops
    DISPLAY CHSTATUS('PARIS.TO.LONDON')
    STOP CHANNEL('PARIS.TO.LONDON')
    * Partner stops matching channel on their QM
    RESET CHANNEL('PARIS.TO.LONDON')
    * Partner runs RESET on same channel name
    START CHANNEL('PARIS.TO.LONDON')
    DISPLAY CHSTATUS('PARIS.TO.LONDON')

STOP quiesces new batches. RESET clears local sequence state; confirm in the documentation for your release exactly what it resets and on which side. Starting before the partner's RESET completes can reproduce the error immediately. Some runbooks drain XMITQ to a holding queue before RESET when duplication risk is unacceptable; others accept at-least-once delivery with idempotent consumers. Legal and audit requirements choose the path, not the MQ administrator alone.
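The ordering constraint (both sides stopped before any RESET, both sides reset before any START) can be checked mechanically against a draft runbook. The following is a minimal sketch; the `validate_runbook` function and the `local`/`partner` labels are hypothetical, and it checks only step order, not what the commands do on a real queue manager.

```python
def validate_runbook(steps, sides=("local", "partner")):
    """Return a list of ordering problems; an empty list means the order is consistent.

    Each step is an (actor, action) pair where action is STOP, RESET, or START.
    """
    required = set(sides)
    stopped, reset = set(), set()
    problems = []
    for index, (actor, action) in enumerate(steps, start=1):
        if action == "STOP":
            stopped.add(actor)
        elif action == "RESET":
            if stopped != required:
                problems.append(f"step {index}: RESET by {actor} before both sides stopped")
            reset.add(actor)
        elif action == "START":
            if reset != required:
                problems.append(f"step {index}: START by {actor} before both sides reset")
    return problems


coordinated = [
    ("local", "STOP"), ("partner", "STOP"),
    ("local", "RESET"), ("partner", "RESET"),
    ("local", "START"),
]
one_sided = [
    ("local", "STOP"), ("local", "RESET"), ("local", "START"),
    ("partner", "STOP"), ("partner", "RESET"),
]

print(validate_runbook(coordinated))
print(validate_runbook(one_sided))
```

The one-sided plan is flagged twice: once for resetting while the partner is still running, and once for starting before the partner has reset.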
If the sender believes batch 100 was not received but the receiver actually committed it, a blind RESET and resend may duplicate that batch. If the receiver never got batch 100 but the counters jump forward after RESET, loss is possible when messages were non-persistent or were already removed from XMITQ by administrative action. Persistent messages on XMITQ generally remain until successfully transferred, and RESET does not delete them automatically, but the protocol state after RESET may allow retransmission. Application idempotency keys and duplicate detection tables are enterprise defenses when MQ recovery cannot guarantee exactly-once delivery across a reset boundary.
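A duplicate detection table can be as simple as a primary-key constraint on the idempotency key. The sketch below uses an in-memory SQLite database from the Python standard library; the table name, the `process_once` helper, and the `PAY-...` keys are assumptions made for illustration, and a real consumer would take the key from a business field in the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")


def process_once(conn, key, handler):
    """Run handler only if key has not been seen; return True if it ran.

    The key insert and the handler share one transaction, so a handler
    failure rolls the key back and the message can be retried later.
    """
    try:
        with conn:
            conn.execute(
                "INSERT INTO processed (idempotency_key) VALUES (?)", (key,)
            )
            handler()
    except sqlite3.IntegrityError:
        return False  # key already recorded: treat as a duplicate delivery
    return True


applied = []
# PAY-1001 arrives twice, as it might after a resend across a reset boundary.
deliveries = ["PAY-1001", "PAY-1002", "PAY-1001"]
results = [
    process_once(conn, key, lambda key=key: applied.append(key))
    for key in deliveries
]
print(results, applied)
```

With a guard like this in the consumer, accepting at-least-once delivery after a RESET becomes a defensible choice, because a resent message is recognized and skipped rather than applied twice.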
Active/passive failover with shared disk may preserve sequence state if channels and logs fail over together—still test annually. Active/active with independent disks is vulnerable: never start channels between QMs restored from different timestamps without a written procedure. Multi-instance queue managers reduce listener outages but do not remove sequence discipline after split-brain. Document for each channel whether messages in flight during failover are reconciled by MQ automatically or require business replay from source systems.
Misdiagnosis wastes hours: a team renewing certificates while sequence state is wrong will not restore RUNNING. Read the full LASTCHLERR text, not only the status color on the console.
Two offices share a notebook counting which package batches they exchanged. If one office rips out pages and restarts at page 1 while the other still expects page 50, they stop shipping until both agree to start a fresh chapter together—that is coordinated RESET.
You and your friend were counting puzzle pieces you sent each other. One of you forgot the count and started at one again while the other still remembers the old number—so you stop until both agree to start counting the same way again.
Write a two-sided maintenance procedure for RESET CHANNEL including stop order and validation puts.
List three DR scenarios and whether you would drain XMITQ first.
Given LASTCHLERR mentioning sequence, explain why increasing SHORTRTY will not help.
1. Sequence errors mean:
2. RESET CHANNEL should be:
3. Common after DR:
4. XMITQ depth high with sequence error suggests: