Repository Inconsistencies

IBM MQ clusters depend on a shared catalog: which queue managers belong, which cluster queue names exist on which members, and which cluster channels connect them. That catalog lives primarily on full repository queue managers and is copied to partial repositories and local caches. A repository inconsistency is any durable disagreement between members about those facts. For example, QM_PARIS still believes ORDERS.IN exists only on QM_LONDON while QM_LONDON and the full repository show instances on both sites, or a cluster sender channel appears on one member but not another after a rushed weekend change. Applications see the symptoms as cluster puts that fail with an unknown object name (MQRC_UNKNOWN_OBJECT_NAME, reason code 2085), messages arriving on an unexpected queue manager, auto-defined channels that never materialize, or workload algorithms that see only one instance when two are defined.

This troubleshooting tutorial covers how inconsistencies form, how to compare repository views with DISPLAY commands, when REFRESH CLUSTER helps and when it masks a deeper fault, split-brain scenarios involving two full repositories, repair playbooks for operations teams, and prevention habits, so that beginners do not treat the cluster repository as magic that self-heals without monitoring.

How the Cluster Catalog Stays in Sync

When you DEFINE QLOCAL with CLUSTER(name), the defining queue manager publishes the definition to the full repository members. Repository manager processes on full and partial repositories exchange updates over cluster channels: each member defines a CLUSRCVR, and the matching CLUSSDR channels are often auto-defined from it. Partial members retain a cache subset sufficient for routing decisions; they are not authoritative. Two full repositories should hold matching authoritative data, and IBM recommends running two on independent infrastructure. If one full repository is down, the survivor continues publishing, but prolonged single-repository operation increases risk during the next partition or failed sync event.

Common inconsistency patterns

Pattern | Symptom | First check
Missing CLUSQMGR on partial member | Remote puts fail or route wrong | DISPLAY CLUSQMGR on partial vs full repo
Queue on one member only in repo | Workload sees single instance | DISPLAY QCLUSTER(name) on all members
Stale channel after DELETE on one node | Channel stuck in BINDING against a dead CONNAME | DISPLAY CLUSQMGR(*) CONNAME STATUS on both full repos
Split full repositories | Different object lists per full repo | Compare both REPOS hosts immediately
Member left cluster but cache remains | Ghost routes to old QM | Check CLUSNL and REPOSNL namelist membership

Diagnostic MQSC Sequence

* On the putting queue manager:
DISPLAY QMGR REPOS REPOSNL
DISPLAY CLUSQMGR(*) CLUSTER('SALES') QMTYPE SUSPEND
DISPLAY QCLUSTER('ORDERS.IN') CLUSTER('SALES') CLUSQMGR
DISPLAY CLUSQMGR(*) CLUSTER('SALES') CONNAME DEFTYPE STATUS
* On each full repository host, compare output:
DISPLAY CLUSQMGR('QM_PARIS') CLUSTER('SALES')
DISPLAY QCLUSTER('ORDERS.IN') CLUSTER('SALES') ALL
* Channel path to the repository:
DISPLAY CHSTATUS('QM_LON.QM_REPO') WHERE(CHLTYPE EQ CLUSSDR)

Capture output from at least one full repository and from the symptomatic partial member in the same minute during an incident. Diff the CLUSQMGR list first, because missing members explain many routing failures. Then diff the DISPLAY QCLUSTER output for the affected queue name: attribute differences such as CLWLPRTY matter for workload, but missing rows mean catalog drift. Differences in the channel details reported by DISPLAY CLUSQMGR (CHANNEL, CONNAME, DEFTYPE, STATUS) point to channel auto-definition failures or manual deletes that did not propagate.
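The member-list diff described above can be scripted once both captures are saved as text. The following is a minimal sketch, assuming the captures contain the usual KEYWORD(value) tokens of MQSC DISPLAY output and that each record begins with a CLUSQMGR(...) token; the function names are illustrative and not part of any IBM tooling.

```python
import re
from typing import Dict, List, Set

# MQSC DISPLAY output is a series of KEYWORD(value) tokens.
TOKEN = re.compile(r"\b([A-Z]+)\(([^)]*)\)")
# Assumption: runmqsc echoes each command as "  1 : DISPLAY ..."; skip those lines.
ECHO = re.compile(r"^\s*\d+\s*:")


def cluster_members(display_output: str, cluster: str) -> Set[str]:
    """Queue manager names listed for `cluster` in DISPLAY CLUSQMGR output."""
    records: List[Dict[str, str]] = []
    for line in display_output.splitlines():
        if ECHO.match(line):
            continue
        for keyword, value in TOKEN.findall(line):
            if keyword == "CLUSQMGR":
                records.append({})  # assumed: every record starts with CLUSQMGR(...)
            if records:
                records[-1][keyword] = value.strip()
    return {r["CLUSQMGR"] for r in records if r.get("CLUSTER") == cluster}


def diff_members(full_repo_output: str, partial_output: str,
                 cluster: str) -> Dict[str, Set[str]]:
    """Compare the member names seen by a full repository and a partial member."""
    full = cluster_members(full_repo_output, cluster)
    partial = cluster_members(partial_output, cluster)
    return {
        "missing_on_partial": full - partial,
        "unknown_to_full_repo": partial - full,
    }
```

Because a partial repository holds only the subset it has needed, a name under missing_on_partial is a lead to investigate rather than proof of drift; a name under unknown_to_full_repo is the stronger signal of a stale cache.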

REFRESH CLUSTER and Repair Commands

REFRESH CLUSTER on a queue manager discards the cluster information it holds locally and rebuilds it from the full repositories. The REPOS option sets the scope: REPOS(NO), the default, keeps what the queue manager knows about the full repositories, while REPOS(YES) refreshes that as well and cannot be run on a queue manager that is itself a full repository. REFRESH SECURITY is a separate command with a different purpose; read the command help before typing in production. Use refresh after correcting CLUSTER attributes, re-joining a member, or restoring a full repository from backup, not as the first action when you have not compared catalogs. Never assume z/OS syntax matches distributed platforms without checking. Schedule refresh during low traffic when possible because routing may fluctuate briefly while caches update.

* Example: confirm the REPOS option on your release before running.
REFRESH CLUSTER('SALES') REPOS(NO)
* After a planned rejoin of a partial member that was suspended:
RESUME QMGR CLUSTER('SALES')
* If site standards require it, start the cluster-sender channel to a
* full repository (substitute your own channel name):
START CHANNEL('QM_LON.QM_REPO')
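Teams that generate MQSC from scripts sometimes add a guard so a refresh cannot be issued in a form the platform rejects. The sketch below assumes the REPOS(YES/NO) option of REFRESH CLUSTER as documented for distributed platforms, including the rule that REPOS(YES) is not valid on a full repository; the function name and its is_full_repo flag are hypothetical, and the caller is expected to know the queue manager's role (for example from DISPLAY QMGR REPOS).

```python
import re

# MQ object names: up to 48 characters from this set.
VALID_NAME = re.compile(r"^[A-Za-z0-9._/%]{1,48}$")


def refresh_command(cluster: str, repos_yes: bool = False,
                    is_full_repo: bool = False) -> str:
    """Build a REFRESH CLUSTER command string, refusing one invalid combination."""
    if not VALID_NAME.match(cluster):
        raise ValueError(f"not a valid cluster name: {cluster!r}")
    if repos_yes and is_full_repo:
        # Assumed rule: REPOS(YES) cannot be run on a full repository.
        raise ValueError("REPOS(YES) is not allowed on a full repository")
    return f"REFRESH CLUSTER('{cluster}') REPOS({'YES' if repos_yes else 'NO'})"
```

The function only builds text; it deliberately does not run anything, so the generated command can still go through change control before it reaches runmqsc.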

Root Causes Operations See Often

  • Network partition between full repositories—each may accept writes; reconcile with IBM support playbooks for severe cases.
  • One full repository down for days—partial caches age; verify survivor health before restart of failed host.
  • Manual MQSC on one node only—DELETE CHANNEL or ALTER without cluster discipline.
  • Wrong CLUSTER name typo—object publishes to unintended cluster catalog.
  • Queue manager renamed or cloned from backup with old repository state—never clone QMgr data without cluster procedure.
  • Firewall change blocking cluster channels to repository—catalog stops updating while apps still run locally.
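For the firewall case in the list above, a quick TCP check against the CONNAME of the repository channel can separate a network block from a catalog problem. This is a rough sketch, assuming CONNAME values in the host(port) form, optionally comma-separated, with 1414 used when no port is given; the helper names are illustrative.

```python
import socket
from typing import List, Tuple

DEFAULT_PORT = 1414  # MQ's default listener port when CONNAME omits one


def parse_conname(conname: str) -> List[Tuple[str, int]]:
    """Split a CONNAME such as 'host1(1414),host2' into (host, port) pairs."""
    targets = []
    for part in conname.split(","):
        part = part.strip()
        if not part:
            continue
        if part.endswith(")") and "(" in part:
            host, _, port = part[:-1].partition("(")
            targets.append((host.strip(), int(port)))
        else:
            targets.append((part, DEFAULT_PORT))
    return targets


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection shows only that the listener port is open from this host. It does not prove the channel can start, so DISPLAY CHSTATUS remains the authoritative check.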

Split Full Repository Response

If DISPLAY on REPOS host A and REPOS host B show different CLUSQMGR counts or conflicting attributes for the same object, treat as severity-1. Stop uncontrolled DEFINE and ALTER on cluster objects until a designated lead picks the authoritative source—often the survivor that stayed connected to the majority of application members. Document every object difference. IBM has advanced recovery guidance for repository queue contents on SYSTEM.CLUSTER.REPOSITORY.QUEUE; involve experienced administrators before manual message manipulation on repository queues. Prevention beats cure: monitor both full repositories and alert on channel loss between them.
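Documenting every object difference is easier when the two views are reduced to a flat list. The sketch below assumes each full repository's DISPLAY output has already been parsed into a dictionary keyed by object name, with attribute names mapped to values; that input shape and the function name are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Difference = Tuple[str, str, Optional[str], Optional[str]]


def diff_repos(repo_a: Dict[str, Record],
               repo_b: Dict[str, Record]) -> List[Difference]:
    """List (object, attribute, value on A, value on B) for every disagreement."""
    rows: List[Difference] = []
    for name in sorted(repo_a.keys() | repo_b.keys()):
        a, b = repo_a.get(name), repo_b.get(name)
        if a is None or b is None:
            # The object exists on only one full repository.
            rows.append((name, "(object)",
                         "present" if a is not None else None,
                         "present" if b is not None else None))
            continue
        for attr in sorted(a.keys() | b.keys()):
            if a.get(attr) != b.get(attr):
                rows.append((name, attr, a.get(attr), b.get(attr)))
    return rows
```

The resulting rows can be pasted into the incident record as they are, giving the designated lead one list to work through when choosing the authoritative source.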

Explainer: Two Address Books Out of Date

The cluster repository is a shared address book every office copies. Inconsistency means London's book lists a warehouse Paris tore down last week—mail still goes to the wrong dock until someone reprints every copy the same way.

Prevention Checklist

  1. Two full repositories on separate failure domains.
  2. Automated DISPLAY CLUSQMGR count comparison between full repos hourly.
  3. Change control requires cluster impact section for MQSC.
  4. Lab test REFRESH CLUSTER before first production use.
  5. Document golden DISPLAY output after known-good cluster build.
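The hourly count comparison in item 2 could be automated along these lines. This is a sketch under several assumptions: runmqsc is on the PATH and reads commands from standard input, the script can reach each full repository (locally or through whatever remote mechanism your site uses), and each record in the output starts a line with CLUSQMGR(...). The function names and the message format are illustrative.

```python
import re
import subprocess
from typing import Tuple

# Assumed: each record's first token is CLUSQMGR(name) at the start of a line,
# which also keeps the echoed command line from being counted.
ENTRY = re.compile(r"^\s*CLUSQMGR\(([^)]+)\)", re.MULTILINE)


def run_mqsc(qmgr: str, command: str) -> str:
    """Send one MQSC command to a local runmqsc session and return its output."""
    result = subprocess.run(["runmqsc", qmgr], input=command + "\n",
                            capture_output=True, text=True, check=False)
    return result.stdout


def member_count(display_output: str) -> int:
    """Number of distinct queue managers in DISPLAY CLUSQMGR output."""
    return len(set(ENTRY.findall(display_output)))


def check_counts(output_a: str, output_b: str) -> Tuple[bool, str]:
    """Compare member counts from two full repositories; return (ok, message)."""
    a, b = member_count(output_a), member_count(output_b)
    return a == b, f"full repo A sees {a} members, full repo B sees {b}"
```

Equal counts do not prove the two lists contain the same names, so a mismatch should trigger an alert while a match is only a cheap first signal.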

Explain Like I'm Five: Repository Inconsistencies

Everyone in class is supposed to have the same list of who is in the club, but one kid has an old list with someone who already moved away—so invitations go to the wrong house until the teacher prints new lists for everyone.

Practice Exercises

Exercise 1

Write a five-command DISPLAY script to compare two full repositories for cluster SALES.

Exercise 2

List three actions that cause inconsistency and three that only fix symptoms.

Exercise 3

ORDERS.IN is defined on QM2 according to MQSC, but DISPLAY QCLUSTER on QM1 shows only one instance. What do you check next?

Test Your Knowledge

1. Repository inconsistency means:

  • Members disagree on cluster definitions
  • Messages are always encrypted
  • Queues have MAXDEPTH 0
  • TLS is disabled

2. Full repository role is set with:

  • ALTER QMGR REPOS
  • DEFINE DLQ only
  • START LISTENER only
  • DELETE QLOCAL

3. REFRESH CLUSTER should be used:

  • After fixing root cause with care
  • Every minute always
  • Instead of cluster channels
  • To delete all queues

4. Compare catalogs with:

  • DISPLAY CLUSQMGR on multiple members
  • FTP only
  • JES CLASS
  • COBOL COPY

Read time: 17 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation