Sequence Number Errors

Sequence number errors are among the most stressful IBM MQ incidents because the queue manager is deliberately protecting message integrity. A message channel will not reach RUNNING while the sender and receiver disagree about which batch sequence numbers were committed. That disagreement often follows disaster recovery, a backup restored on one side only, or an operator running RESET CHANNEL on one queue manager without telling the partner. Beginners see retry loops and rising XMITQ depth and may raise the retry timers, which never fixes stale sequence state. This tutorial focuses on errors and recovery: recognizing log messages, running a coordinated RESET CHANNEL procedure, choosing between draining the transmission queue and accepting duplication risk, telling sequence errors apart from ordinary network retry, and validating after the incident so that payroll and audit traffic resume safely.

Why the Protocol Refuses to Continue

Message channels transfer messages in batches for efficiency, and each batch has a logical sequence position on the channel pair. When TCP drops mid-batch, the partners reconcile on reconnect: they establish what was acknowledged and what must be resent from XMITQ. That only works if both sides hold compatible counters. If QM_A is restored from Friday's backup while QM_B ran through Saturday, the counters have diverged, and at the next bind QM_A presents numbers that QM_B does not expect. IBM MQ stops rather than silently duplicating or dropping financial payloads. Treat the error as a data-safety mechanism, not as a bug to bypass.
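The Friday/Saturday restore scenario can be sketched as a toy model in Python. This is only an illustration of the reasoning: the `ChannelEnd` class and `send_batch` function are hypothetical names, the model counts one number per batch as this section does, and a real channel tracks considerably more state.

```python
from dataclasses import dataclass


@dataclass
class ChannelEnd:
    """One end of a channel pair, holding the next sequence number it expects."""
    name: str
    next_seq: int = 1


def send_batch(sender: ChannelEnd, receiver: ChannelEnd) -> None:
    """Transfer one batch, or refuse if the two ends disagree on the sequence."""
    if sender.next_seq != receiver.next_seq:
        raise RuntimeError(
            f"sequence error: {sender.name} sent {sender.next_seq}, "
            f"{receiver.name} expected {receiver.next_seq}"
        )
    sender.next_seq += 1
    receiver.next_seq += 1


qm_a = ChannelEnd("QM_A")
qm_b = ChannelEnd("QM_B")

for _ in range(5):               # Friday: five batches flow normally
    send_batch(qm_a, qm_b)
friday_backup = qm_a.next_seq    # state captured in Friday's backup

for _ in range(3):               # Saturday: three more batches flow
    send_batch(qm_a, qm_b)

qm_a.next_seq = friday_backup    # restore QM_A only; QM_B keeps Saturday's state

try:
    send_batch(qm_a, qm_b)
    outcome = "sent"
except RuntimeError as err:
    outcome = str(err)

print(outcome)
```

The model refuses the transfer instead of guessing, which mirrors the design choice described above: stopping is safer than silently duplicating or dropping a batch.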

Situations that commonly cause sequence errors
| Scenario | Risk | First action |
| --- | --- | --- |
| RESET on one QM only | Duplication or gap | Stop channel on both sides; align plan |
| Single-sided backup restore | State skew | Compare restore dates; coordinate |
| QM replacement, new name | New instance with zero state | New channel pair or dual RESET |
| Long outage on both sides | In-flight ambiguity | Review XMITQ and logs |

Symptoms in CHSTATUS and Logs

DISPLAY CHSTATUS shows the channel in RETRYING, or the channel appears inactive, with LASTCHLERR referencing sequence or channel protocol terms; the exact wording depends on the release. On distributed platforms the error log entry is typically AMQ9526 (message sequence number error). Search both queue managers' error logs around the same second, because the sender and receiver each log their own view. Note whether the channel ever reached RUNNING after the last change window or failed immediately in BINDING. Check CURDEPTH on the transmission queue and the age of the oldest message, since stale messages may need business approval before destructive recovery. If multiple channels between the same pair fail together, suspect a common restore event rather than individual typos.
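When status output from both queue managers has been captured to text, comparing the two views can be scripted. The sketch below assumes the usual MQSC `NAME(VALUE)` display layout and compares the CURSEQNO field from each side. The sample text is illustrative, not captured from a real system, and the helper names are hypothetical.

```python
import re

# MQSC display output lists attributes as NAME(VALUE) pairs.
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_attrs(text: str) -> dict:
    """Collect NAME(VALUE) pairs from MQSC-style display output."""
    return {name: value.strip() for name, value in ATTR.findall(text)}


def compare_ends(sender_out: str, receiver_out: str) -> str:
    """Report whether the two ends show the same CURSEQNO."""
    sent = int(parse_attrs(sender_out)["CURSEQNO"])
    received = int(parse_attrs(receiver_out)["CURSEQNO"])
    if sent == received:
        return "match"
    return f"mismatch: sender CURSEQNO {sent}, receiver CURSEQNO {received}"


# Illustrative samples only; real output carries many more attributes.
sender_sample = """
   CHANNEL(PARIS.TO.LONDON)    CHLTYPE(SDR)
   CURSEQNO(6)                 STATUS(RETRYING)
"""
receiver_sample = """
   CHANNEL(PARIS.TO.LONDON)    CHLTYPE(RCVR)
   CURSEQNO(9)
"""

print(compare_ends(sender_sample, receiver_sample))
```

A script like this only supports the diagnosis; the error log text on both sides remains the authoritative evidence.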

Coordinated Recovery Runbook

```mqsc
* Agree maintenance window with partner ops
* Record the current status before changing anything
DISPLAY CHSTATUS('PARIS.TO.LONDON')
STOP CHANNEL('PARIS.TO.LONDON')
* Partner stops matching channel on their QM
RESET CHANNEL('PARIS.TO.LONDON')
* Partner runs RESET on same channel name
START CHANNEL('PARIS.TO.LONDON')
DISPLAY CHSTATUS('PARIS.TO.LONDON')
```

STOP CHANNEL quiesces the channel so that no new batches start. RESET CHANNEL resets the local sequence state (the SEQNUM parameter sets the new number, 1 by default); confirm the exact behavior for your release and channel type in the documentation. Starting the channel before the partner's RESET completes can reproduce the error immediately. Some runbooks drain XMITQ to a holding queue before RESET when duplication risk is unacceptable; others accept at-least-once delivery and rely on idempotent consumers. Legal and audit requirements choose the path, not the MQ administrator alone.
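The ordering rules in the runbook (RESET only after STOP on the same side, and no START until every side has RESET) can be checked mechanically before a change window. The following is a minimal sketch; the `check_runbook` function and the side labels are hypothetical, and it validates a written plan rather than talking to a queue manager.

```python
def check_runbook(steps):
    """Validate (side, action) pairs given in planned execution order.

    Returns a list of problems; an empty list means the order is consistent
    with the coordinated procedure described above.
    """
    problems = []
    stopped, reset = set(), set()
    sides = {side for side, _ in steps}
    for side, action in steps:
        if action == "STOP":
            stopped.add(side)
        elif action == "RESET":
            if side not in stopped:
                problems.append(f"{side}: RESET before STOP")
            reset.add(side)
        elif action == "START":
            missing = sides - reset
            if missing:
                problems.append(
                    f"{side}: START before RESET on {', '.join(sorted(missing))}"
                )
    return problems


coordinated = [
    ("local", "STOP"), ("partner", "STOP"),
    ("local", "RESET"), ("partner", "RESET"),
    ("local", "START"),
]
one_sided = [
    ("local", "STOP"), ("local", "RESET"), ("local", "START"),
    ("partner", "STOP"), ("partner", "RESET"),
]

print(check_runbook(coordinated))
print(check_runbook(one_sided))
```

Reviewing the plan this way catches the most common mistake, which is restarting the channel while the partner is still working through its half of the procedure.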

RESET CHANNEL Risk: Duplication Versus Loss

If the sender believes batch 100 was not received but the receiver actually committed it, blind RESET and resend may duplicate. If the receiver never got batch 100 but counters jump forward after RESET, loss is possible when messages were non-persistent or already removed from XMITQ by administrative action. Persistent messages on XMITQ generally remain until successfully transferred—RESET does not delete them automatically—but protocol state after RESET may allow retransmission. Application idempotency keys and duplicate detection tables are enterprise defenses when MQ recovery cannot guarantee exactly-once across a reset boundary.
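An idempotency key is the simplest application-side defense against a batch being resent after a reset. The sketch below assumes each message carries a unique business key (here a hypothetical `payment_id`) and uses an in-memory set as the duplicate detection table; a production system would normally persist that table in the same transaction as the business update.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a business identifier."""

    def __init__(self):
        self.seen = set()      # duplicate detection table (in memory for this sketch)
        self.applied = []      # payloads that actually changed state

    def handle(self, key, payload) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if key in self.seen:
            return False
        self.seen.add(key)
        self.applied.append(payload)
        return True


consumer = IdempotentConsumer()
batch_100 = [("payment_id-1", 250), ("payment_id-2", 75)]

first_pass = [consumer.handle(key, amount) for key, amount in batch_100]
# After a RESET the sender resends batch 100 because it never saw the acknowledgement.
second_pass = [consumer.handle(key, amount) for key, amount in batch_100]

print(first_pass, second_pass, sum(consumer.applied))
```

With this in place, accepting at-least-once delivery across the reset boundary becomes a business decision about replay cost rather than a risk of double payment.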

DR and Backup Scenarios

Active/passive failover with shared disk may preserve sequence state if channels and logs fail over together—still test annually. Active/active with independent disks is vulnerable: never start channels between QMs restored from different timestamps without a written procedure. Multi-instance queue managers reduce listener outages but do not remove sequence discipline after split-brain. Document for each channel whether messages in flight during failover are reconciled by MQ automatically or require business replay from source systems.

Distinguishing Sequence Errors From Other Failures

  • Connection refused—usually CONNAME or listener; no sequence yet.
  • TLS errors—handshake fails before sequence negotiation.
  • CHLAUTH block—security policy; fix rules not RESET.
  • Sequence error—often after DR or RESET; partners were recently RUNNING.

Misdiagnosis wastes hours: a team renewing certificates while sequence state is wrong will not restore RUNNING. Read the full LASTCHLERR text, not only the status color on the console.
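The four categories above can be turned into a first-pass triage helper that reads the error text before anyone starts changing certificates or timers. This is a rough sketch: the keyword rules are assumptions chosen for illustration, not an official mapping of IBM MQ message text, and the advice strings simply restate the list above.

```python
# (keyword, advice) pairs, checked in order; keywords are illustrative guesses.
RULES = [
    ("sequence", "sequence mismatch: plan a coordinated RESET CHANNEL"),
    ("chlauth", "security policy: fix CHLAUTH rules, do not RESET"),
    ("blocked", "security policy: fix CHLAUTH rules, do not RESET"),
    ("tls", "TLS: handshake failed before sequence negotiation"),
    ("ssl", "TLS: handshake failed before sequence negotiation"),
    ("certificate", "TLS: handshake failed before sequence negotiation"),
    ("refused", "network: check CONNAME and the listener"),
]


def triage(error_text: str) -> str:
    """Return the first matching category for a piece of error text."""
    lowered = error_text.lower()
    for keyword, advice in RULES:
        if keyword in lowered:
            return advice
    return "unclassified: read the full error text on both queue managers"


print(triage("Message sequence number error for channel 'PARIS.TO.LONDON'"))
print(triage("Connection refused by remote host"))
```

The sequence rule is checked first so that a sequence error is never mistaken for a network or TLS problem because of an incidental word elsewhere in the text.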

Prevention for New Environments

  1. Pair channel names and document both queue managers in the same CMDB entry.
  2. Include sequence recovery in DR playbooks with named approvers.
  3. Test RESET in a lab with numbered test messages and reconcile the count put by the sender against the count received by the consumer.
  4. Monitor XMITQ age and channel NOT RUNNING alerts.
  5. Version-control MQSC for channel definitions separately from QM data backups.
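The monitoring item in the list above can be sketched as a small alert rule. The `ChannelSample` fields and the 300-second threshold are assumptions for illustration; how the values are collected (monitoring agent, scheduled script, or console export) is left open.

```python
from dataclasses import dataclass


@dataclass
class ChannelSample:
    """One monitoring observation for a channel and its transmission queue."""
    channel: str
    status: str
    xmitq_depth: int
    oldest_msg_age_s: int


def alerts(sample: ChannelSample, max_age_s: int = 300) -> list:
    """Return alert strings; an idle channel with an empty XMITQ raises none."""
    found = []
    if sample.xmitq_depth > 0 and sample.status != "RUNNING":
        found.append(
            f"{sample.channel} is {sample.status} "
            f"with {sample.xmitq_depth} messages waiting"
        )
    if sample.xmitq_depth > 0 and sample.oldest_msg_age_s > max_age_s:
        found.append(
            f"{sample.channel} oldest message is {sample.oldest_msg_age_s}s old "
            f"(limit {max_age_s}s)"
        )
    return found


stuck = ChannelSample("PARIS.TO.LONDON", "RETRYING", 1200, 900)
idle = ChannelSample("PARIS.TO.LONDON", "INACTIVE", 0, 0)

for line in alerts(stuck):
    print(line)
```

Tying the not-running alert to queue depth avoids paging on triggered channels that are legitimately idle with nothing to send.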

Explainer: Page Numbers in a Shared Notebook

Two offices share a notebook counting which package batches they exchanged. If one office rips out pages and restarts at page 1 while the other still expects page 50, they stop shipping until both agree to start a fresh chapter together—that is coordinated RESET.

Explain Like I'm Five: Sequence Number Errors

You and your friend were counting puzzle pieces you sent each other. One of you forgot the count and started at one again while the other still remembers the old number—so you stop until both agree to start counting the same way again.

Practice Exercises

Exercise 1

Write a two-sided maintenance procedure for RESET CHANNEL including stop order and validation puts.

Exercise 2

List three DR scenarios and whether you would drain XMITQ first.

Exercise 3

Given LASTCHLERR mentioning sequence, explain why increasing SHORTRTY will not help.

Frequently Asked Questions

Test Your Knowledge

1. Sequence errors mean:

  • Sender and receiver batch state mismatch
  • Invalid COBOL
  • LDAP down
  • JCL class wrong

2. RESET CHANNEL should be:

  • Coordinated with partner
  • Secret on one side only
  • Run hourly
  • On DLQ

3. Common after DR:

  • Sequence mismatch
  • Higher HBINT only
  • New topic
  • Namelist change

4. XMITQ depth high with sequence error suggests:

  • Messages waiting for fixed channel
  • All delivered
  • No messages
  • Client only issue

Read time: 20 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation