Stuck Channels (Cluster)

Point-to-point channels that sit in RETRY are painful, but a stuck cluster channel can be worse: a single CLUSSDR instance between a partial member and a full repository may block catalog updates, auto-definition of further paths, and every cluster put that needs a remote instance. Operators typically see DISPLAY CHSTATUS showing BINDING for hours on QM_APP1.QM_REPO while CURDEPTH on the cluster transmission queues grows, or cluster puts fail with reason codes tied to an unavailable destination while CLUSCH shows channels that never leave STARTING. Cluster channels share the same TCP, TLS, CHLAUTH, and sequence number failure modes as SDR and RCVR channels, but troubleshooting must also account for repository role, auto-defined names, and whether multiple clusters share a listener port.

This tutorial covers diagnosing stuck CLUSSDR and CLUSRCVR channels, how that differs from the channel-stuck-in-retry page for general channels, STOP and RESET discipline, partner coordination, firewall and listener checks specific to a cluster mesh, the impact on cache and repository sync, and prevention through monitoring channel state to full repositories.

Cluster Channel States That Mean Stuck

BINDING means the channel is attempting to establish a session; when it persists, the usual causes are an unreachable listener, a wrong port, or a hung TLS handshake. RETRY means a failed attempt is cycling through the SHORTRTY and LONGRTY timers; read LASTCHLERR and LASTCHLERRM for the repeating error. STARTING may appear briefly at activation; if it is permanent, investigate the channel initiator or queue manager limits. RUNNING is healthy. INACTIVE after retries may mean the retry counts are exhausted, so do not assume the problem fixed itself. Where possible, compare STATUS on both ends by asking the partner team, through the operations bridge, for the view from their side.
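The state rules above can be expressed as a small triage helper. This is a minimal sketch rather than an IBM-provided tool: the function name `triage`, the `after_retries` flag, and the 120-second grace period are assumptions chosen for illustration, and the status value and its age are expected to come from whatever monitoring feed the site already has.

```python
from dataclasses import dataclass

# Hypothetical threshold: how long a transitional state may last
# before it is treated as stuck. Tune to site standards.
GRACE_SECONDS = 120

@dataclass
class Verdict:
    stuck: bool
    hint: str

def triage(status: str, seconds_in_state: int, after_retries: bool = False) -> Verdict:
    """Map a channel STATUS and its age to a stuck / not-stuck verdict."""
    status = status.upper()
    if status == "RUNNING":
        return Verdict(False, "healthy")
    if status == "RETRY":
        return Verdict(True, "read the repeating channel error")
    if status in ("BINDING", "STARTING"):
        if seconds_in_state <= GRACE_SECONDS:
            return Verdict(False, "transitional, re-check shortly")
        if status == "BINDING":
            return Verdict(True, "check listener, port, TLS handshake")
        return Verdict(True, "check channel initiator and queue manager limits")
    if status == "INACTIVE":
        if after_retries:
            return Verdict(True, "retries may be exhausted, do not assume recovery")
        return Verdict(False, "idle, no evidence of failure")
    return Verdict(True, f"unrecognised state {status}, investigate")
```

For example, `triage("BINDING", 3600)` reports a stuck channel with a hint to check the listener, while `triage("STARTING", 10)` is still inside the grace period.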

LASTCHLERR categories for cluster channels

| Category   | Examples                        | Fix direction                   |
|------------|---------------------------------|---------------------------------|
| Network    | Connection refused, timeout     | Firewall, DNS, listener PORT    |
| Security   | CHLAUTH blocked, MCAUSER        | CHLAUTH, CONNAUTH, certs        |
| TLS        | Cipher mismatch, expired cert   | SSLCIPH, GSKit renewal          |
| Sequence   | Sequence error after restore    | Coordinated RESET both sides    |
| Definition | Wrong CONNAME from repo         | Fix CLUSQMGR NETNAME, refresh   |
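The categories above lend themselves to a first-pass classifier for error text collected during an incident. The sketch below is illustrative only: the keyword lists are assumptions, not IBM MQ message text, and the function name `categorize` is hypothetical. Real error strings vary by platform and version, so treat the result as a starting point.

```python
# (category, keywords that suggest it, fix direction). Keywords are assumptions.
CATEGORIES = [
    ("Sequence", ("sequence",), "Coordinated RESET on both sides"),
    ("TLS", ("cipher", "cert", "ssl", "tls"), "SSLCIPH, certificate renewal"),
    ("Security", ("chlauth", "mcauser", "not authorized"), "CHLAUTH, CONNAUTH"),
    ("Definition", ("conname",), "Fix the published definition, then refresh"),
    ("Network", ("refused", "timeout", "timed out", "unreachable"),
     "Firewall, DNS, listener PORT"),
]

def categorize(error_text: str) -> tuple[str, str]:
    """Return (category, fix direction) for the first keyword group that matches."""
    text = error_text.lower()
    for name, keywords, fix in CATEGORIES:
        if any(keyword in text for keyword in keywords):
            return name, fix
    return "Unknown", "Collect status and error logs from both sides"

print(categorize("Connection refused by remote host"))
```

The order of `CATEGORIES` matters: the more specific groups (sequence, TLS) are tested before the broad network keywords so that a message mentioning both is not filed as a plain connectivity problem.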

Diagnostic MQSC

DISPLAY CHSTATUS('QM_APP1.QM_REPO') ALL
DISPLAY CLUSCH('QM_APP1.QM_REPO') CLUSTER('SALES') ALL
DISPLAY LSSTATUS(LISTENER.TCP) ALL
* Partner queue manager:
DISPLAY CHSTATUS('QM_REPO.QM_APP1') ALL
* All channels not RUNNING. MQSC accepts a single WHERE filter per command,
* so pick out the CHLTYPE(CLUSSDR) entries from the output:
DISPLAY CHSTATUS(*) WHERE(STATUS NE RUNNING)
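Captured status output can be post-processed to list only the cluster senders that are not RUNNING. The following is a rough sketch: the `SAMPLE` text is an illustrative assumption about the response layout (a message-ID line followed by `KEY(VALUE)` pairs), the channel names other than QM_APP1.QM_REPO are made up, and the function names are hypothetical. Check the parser against real output from your platform before relying on it.

```python
import re

# KEY(VALUE) pairs; the value may hold one nested (...), as in CONNAME(hostA(1414)).
ATTR = re.compile(r"(\w+)\(((?:[^()]|\([^()]*\))*)\)")

def parse_records(output):
    """Split display output into one attribute dict per response block."""
    records, current = [], None
    for line in output.splitlines():
        if line.startswith("AMQ"):  # assumed: each block starts with a message-ID line
            current = {}
            records.append(current)
        elif current is not None:
            for key, value in ATTR.findall(line):
                current[key] = value.strip()
    return [r for r in records if "CHANNEL" in r]

def stuck_cluster_senders(output):
    """Names of CLUSSDR channels whose STATUS is anything other than RUNNING."""
    return [r["CHANNEL"] for r in parse_records(output)
            if r.get("CHLTYPE") == "CLUSSDR" and r.get("STATUS") != "RUNNING"]

SAMPLE = """\
AMQ8417I: Display Channel Status details.
   CHANNEL(QM_APP1.QM_REPO)   CHLTYPE(CLUSSDR)
   CONNAME(hostA(1414))       STATUS(BINDING)
AMQ8417I: Display Channel Status details.
   CHANNEL(QM_APP1.QM_APP2)   CHLTYPE(CLUSSDR)
   CONNAME(hostC(1414))       STATUS(RUNNING)
AMQ8417I: Display Channel Status details.
   CHANNEL(QM_APP1.TO.QM_X)   CHLTYPE(SDR)
   CONNAME(hostD(1414))       STATUS(RETRY)
"""

print(stuck_cluster_senders(SAMPLE))
```

Filtering on channel type in the script rather than in the command keeps the MQSC side simple and lets the same captured output serve other reports.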

CLUSCH shows the repository-published definition, including CONNAME and the CLUSTER list; CHSTATUS shows the live instance. A stale definition explains endless BINDING: for example, the CLUSCH CONNAME still names hostA, but after a migration DNS now points to hostB. For auto-defined channels, note whether a manual ALTER is allowed or will be overwritten on the next repository publish.
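A quick way to spot this kind of drift is to compare the published CONNAME with the endpoints the partner team says are current. The sketch below assumes the common `host(port)` form, optionally comma-separated, with 1414 used when no port is given; the function names `parse_conname` and `conname_drift` are hypothetical, and no DNS lookup is performed.

```python
import re

def parse_conname(conname: str, default_port: int = 1414) -> list[tuple[str, int]]:
    """Split 'hostA(1414),hostB(1415)' into [(host, port), ...]."""
    endpoints = []
    for part in conname.split(","):
        part = part.strip()
        match = re.fullmatch(r"([^()\s]+)(?:\((\d+)\))?", part)
        if not match:
            raise ValueError(f"unrecognised CONNAME element: {part!r}")
        host = match.group(1).lower()            # host names compare case-insensitively
        port = int(match.group(2) or default_port)
        endpoints.append((host, port))
    return endpoints

def conname_drift(published: str, expected: str) -> list[tuple[str, int]]:
    """Return published endpoints that are absent from the expected set."""
    wanted = set(parse_conname(expected))
    return [ep for ep in parse_conname(published) if ep not in wanted]

# Published definition still names hostA; the partner now listens on hostB.
print(conname_drift("hostA(1414)", "hostB(1414)"))
```

An empty result means every published endpoint is still expected; anything returned is a candidate explanation for a channel that never leaves BINDING.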

Recovery Procedure

  1. Record LASTCHLERR from both sides before any STOP.
  2. Fix the root cause: listener, certificate, CHLAUTH, CONNAME, or firewall rule.
  3. STOP CHANNEL on both sides if a stale active instance persists.
  4. RESET CHANNEL only if a sequence error is documented and the partner agrees.
  5. START CHANNEL, or trigger it through cluster activity, per site standards.
  6. Verify RUNNING and that MSGS increments on a test cluster put.
  7. REFRESH CLUSTER on the partial member if its cache was stale during the outage.
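The procedure above can be captured as a runbook generator that refuses to emit RESET CHANNEL without partner agreement. This is a sketch under stated assumptions: the function name and its flags are hypothetical, the root-cause fix in step 2 stays manual, and the generated MQSC should be reviewed against site standards before anyone runs it.

```python
def recovery_commands(channel, cluster, *, instance_lingers=False,
                      sequence_error=False, partner_agrees=False,
                      cache_stale=False):
    """Build an ordered list of MQSC commands for one side of the channel."""
    # Step 1: capture evidence before changing anything.
    cmds = [f"DISPLAY CHSTATUS('{channel}') ALL"]
    # Step 2 (fix the root cause) is manual and intentionally not generated.
    if instance_lingers:
        # Step 3: stop only when a stale instance is still present.
        cmds.append(f"STOP CHANNEL('{channel}') MODE(QUIESCE)")
    if sequence_error:
        # Step 4: never reset sequence numbers unilaterally.
        if not partner_agrees:
            raise ValueError("RESET CHANNEL requires partner agreement")
        cmds.append(f"RESET CHANNEL('{channel}')")
    # Steps 5 and 6: restart, then confirm state and message flow.
    cmds.append(f"START CHANNEL('{channel}')")
    cmds.append(f"DISPLAY CHSTATUS('{channel}') STATUS MSGS")
    if cache_stale:
        # Step 7: refresh the partial member's cache only when needed.
        cmds.append(f"REFRESH CLUSTER('{cluster}') REPOS(NO)")
    return cmds

for line in recovery_commands("QM_APP1.QM_REPO", "SALES", instance_lingers=True):
    print(line)
```

Raising an error rather than silently skipping the RESET makes the coordination requirement visible to whoever runs the script.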

Impact on Cluster Operations

While a CLUSSDR to a full repository is stuck, partial members may run on an old cache (see the cluster cache issues tutorial). Auto-defined channels to new members may not appear. Workload balancing may skip instances on unreachable queue managers until the channels are RUNNING. Pub/sub cluster topics may stop propagating subscription updates. In severity-1 incidents, prioritize repository paths before application-only paths that have alternate full repository routes.
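The prioritization rule can be made mechanical when an incident involves several stuck channels. In this sketch the record layout (`channel`, `remote_qmgr`) and the set of full repository names are assumptions for illustration; QM_REPO2 and QM_APP2 are made-up names.

```python
def prioritize(stuck_channels, full_repositories):
    """Order stuck channels so paths to full repositories come first."""
    return sorted(
        stuck_channels,
        # False sorts before True, so repository paths lead; name breaks ties.
        key=lambda ch: (ch["remote_qmgr"] not in full_repositories, ch["channel"]),
    )

stuck = [
    {"channel": "QM_APP1.QM_APP2", "remote_qmgr": "QM_APP2"},
    {"channel": "QM_APP1.QM_REPO", "remote_qmgr": "QM_REPO"},
]
for ch in prioritize(stuck, {"QM_REPO", "QM_REPO2"}):
    print(ch["channel"])
```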

Explainer: Blocked Highway Between City Halls

Cluster channels are highways between city halls that share the phone book. One closed highway means some towns never get updated phone books and cannot deliver mail to addresses they no longer know.

Explain Like I'm Five: Stuck Cluster Channels

The tunnel between two playgrounds is blocked, so kids on one side never get the updated list of who moved to which playground—and they keep sending balls to the wrong place.

Practice Exercises

Exercise 1

A CLUSSDR is in RETRY with RC 2540. List five checks in priority order.

Exercise 2

When is RESET CHANNEL justified versus forbidden?

Exercise 3

Draw an impact diagram for one stuck channel from a partial member to a full repository.

Test Your Knowledge


1. Stuck cluster channel often blocks:

  • Repository updates and remote puts
  • Only local GET
  • JES initiation
  • COBOL compile

2. CLUSSDR stuck in BINDING. Check:

  • Listener and CONNAME on partner
  • MAXDEPTH only
  • Topic retain
  • DISTL only

3. RESET CHANNEL risk:

  • Sequence number mismatch if partner not coordinated
  • Always safe anytime
  • Deletes all queues
  • Disables TLS forever

4. DISPLAY to compare definition vs runtime:

  • CLUSCH and CHSTATUS
  • DISPLAY JOB only
  • LISTCAT
  • IDCAMS
Read time: 16 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation