Cluster Cache Issues

A partial repository queue manager does not hold the entire cluster catalog in working memory forever. It maintains a cache of cluster queues, queue managers, channels, and related attributes sufficient to route puts and open cluster queues without contacting a full repository on every operation. Cluster cache issues appear when that local view is stale or incomplete, or when evicted entries cause the workload algorithm to ignore valid instances.

Operators see DISPLAY QCLUSTER listing one host for PAYMENT.IN while the full repository shows three, or traffic skews because CLWL attributes on a missing instance are invisible to the putting member. Cache problems often follow repository channel outages, aggressive CLWLMRUC tuning in very large clusters, rapid object churn during deployments, or a restart order in which applications start before cluster sync completes.

This tutorial distinguishes cache lag from true repository inconsistency and covers partial repository behavior, diagnostic DISPLAY comparisons, CLWLMRUC and cache sizing cautions, recovery with REFRESH CLUSTER and channel restart, and monitoring to catch cache drift before applications do.

Partial Member Cache Lifecycle

On join, the queue manager exchanges cluster definitions with full repository hosts over cluster channels. Updates arrive when objects are defined, altered, or deleted anywhere in the cluster, and the partial member merges them into its cache. Applications issuing MQPUT to a cluster queue consult this cached knowledge of instances and channel paths. If updates stop arriving, because the CLUSSDR channel to the full repository is INACTIVE or the network has failed, the cache freezes at its last known state. A local DEFINE of a new cluster queue on the partial member is still queued for publication outward, but inbound updates from other members halt, producing asymmetric knowledge.
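To see which full repositories this partial member knows about and whether the channels feeding its cache are healthy, a check along these lines can help. It is a minimal sketch, meant to be run in runmqsc on the partial member; the cluster name FIN is borrowed from the diagnostic example in this tutorial and should be replaced with your own.

```mqsc
* List cluster members known locally; QMTYPE(REPOS) marks full repositories
DISPLAY CLUSQMGR(*) CLUSTER('FIN') QMTYPE STATUS
* Show the state of every active cluster-sender channel instance on this member
DISPLAY CHSTATUS(*) WHERE(CHLTYPE EQ CLUSSDR)
```

An inactive channel may return no status entry at all, so a missing row for the repository channel is itself a signal worth following up.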

Cache issue versus inconsistency
Signal | Likely cache issue | Likely inconsistency
--- | --- | ---
Full repos agree; one partial differs | Yes | No
Both full repos differ | No | Yes
After repo channel outage | Yes | Maybe later
New queue missing everywhere | No | No; DEFINE problem
Wrong CLWL on cached instance only | Yes until refresh | If repos disagree on attrs

Diagnostic Steps

* Symptomatic partial member:
DISPLAY QMGR REPOSNL CLUSTER
DISPLAY QCLUSTER('PAYMENT.IN') CLUSTER('FIN') ALL
DISPLAY CLUSQMGR(*) CLUSTER('FIN')
DISPLAY CHSTATUS(*) WHERE(CHLTYPE EQ CLUSSDR)
* Full repository:
DISPLAY QCLUSTER('PAYMENT.IN') CLUSTER('FIN') ALL
* Compare instance count and CLWLPRTY values

If the full repository lists QM_LON, QM_PAR, and QM_NYC for PAYMENT.IN but the partial member lists only QM_LON, the partial member's cache is incomplete. Check whether channel QM_PARTIAL.QM_REPO is RUNNING and whether its last-message date and time (LSTMSGDA and LSTMSGTI) are recent. A MSGS count that keeps rising on the repository channels during steady state suggests healthy sync; zero traffic for hours while objects change frequently elsewhere indicates a stuck cache feed.
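The channel check described above can be narrowed to a single command. This is a sketch that assumes the repository channel is named QM_PARTIAL.QM_REPO, as in the example; substitute the cluster-sender channel name used in your cluster.

```mqsc
* Run on the partial member: state, message count, and last-message date and time
DISPLAY CHSTATUS('QM_PARTIAL.QM_REPO') STATUS MSGS LSTMSGDA LSTMSGTI
```

Running it twice a few minutes apart shows whether MSGS is advancing, which is easier to interpret than a single absolute value.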

CLWLMRUC and Large Clusters

Very large clusters with hundreds of members stress cache and workload limits. CLWLMRUC caps the number of most recently used outbound cluster channels that the workload algorithm considers; other related controls vary by release. When the cap is too low for your estate, distant members can drop out of workload decisions even though they remain valid in the full repository. Puts then favor the visible members only, mimicking workload misconfiguration. Raising limits consumes memory; lowering them without analysis causes silent skew. Schedule documented reviews of cluster size growth, and test workload distribution in a lab clone after changing cache-related queue manager attributes.
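Before changing anything, it helps to record the current setting. The commands below are a sketch of how to inspect and alter it; the value 500 is purely hypothetical and is not a recommendation for any particular cluster size.

```mqsc
* Show the current workload-related queue manager settings
DISPLAY QMGR CLWLMRUC CLWLUSEQ
* Example only: 500 is a hypothetical value chosen for illustration
ALTER QMGR CLWLMRUC(500)
```

Try the ALTER in a lab clone first and compare workload distribution before and after, as described above.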

Recovery Playbook

  1. Restore cluster channels between the partial member and the full repositories: START CHANNEL, fix the firewall, fix CHLAUTH.
  2. Verify full repositories are healthy and consistent with each other first.
  3. Issue REFRESH CLUSTER on the partial member per IBM guidance for your platform.
  4. Recompare DISPLAY QCLUSTER output until instance lists match the full repository.
  5. Replay application test puts; confirm CURDEPTH changes on expected members.
  6. Add a monitoring alert for when the cluster queue instance count drops below baseline.
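The command-level steps of the playbook might look like the following on the partial member. This is a sketch under stated assumptions: the channel name QM_PARTIAL.QM_REPO, the cluster FIN, and the queue PAYMENT.IN are the example names used earlier in this tutorial. REFRESH CLUSTER can be disruptive in large clusters, so check IBM's guidance for your platform before running it in production.

```mqsc
* Step 1: restart the cluster-sender channel to the full repository
START CHANNEL('QM_PARTIAL.QM_REPO')
DISPLAY CHSTATUS('QM_PARTIAL.QM_REPO') STATUS
* Step 3: discard locally cached cluster information and rebuild it
REFRESH CLUSTER('FIN') REPOS(NO)
* Step 4: recheck the instance list against the full repository
DISPLAY QCLUSTER('PAYMENT.IN') CLUSTER('FIN') CLUSQMGR CLWLPRTY
```

Steps 2, 5, and 6 happen outside this session: on the full repositories, in the application test harness, and in your monitoring tooling.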

Application-Visible Symptoms

MQRC_UNKNOWN_OBJECT_NAME, or routing to a single site while peers sit idle, often traces to the cache rather than to application bugs. Bind options can mask cache problems temporarily by pinning to one instance. After a cache refresh, applications that open queues with DEFBIND(NOTFIXED) may suddenly spread traffic, so warn application owners before refreshing in production. Pub/sub over clusters shows similar symptoms when the topic metadata cache lags; comparing DISPLAY TOPIC output with TYPE(CLUSTER) between the partial member and a full repository mirrors the cluster queue technique.
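To anticipate how traffic may shift after a refresh, it can help to look at the default binding on each cached instance and at the cluster topics the member currently knows. The commands below are a sketch using the example names PAYMENT.IN and FIN from this tutorial; run them on both the partial member and a full repository and compare the output.

```mqsc
* Show the hosting queue manager and default binding for each known instance
DISPLAY QCLUSTER('PAYMENT.IN') CLUSTER('FIN') CLUSQMGR DEFBIND
* List cluster topics known to this queue manager
DISPLAY TOPIC(*) TYPE(CLUSTER)
```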

Explainer: Outdated Map in Your Pocket

The full repository prints the official city map. Your partial member carries a photocopy in a pocket. Cache issues mean you have not picked up this week's edition, so you drive to a store that closed Tuesday while the official map already shows the new mall.

Explain Like I'm Five: Cluster Cache

Your friend has the newest list of which kids are in the club, but you still have yesterday's paper. You might invite someone who already left because your list is old.

Practice Exercises

Exercise 1

Describe how you prove a problem is cache lag rather than full repository split.

Exercise 2

After repository channel failure, list recovery steps in order.

Exercise 3

Why might CLWL tuning fail when cache omits an instance?

Frequently Asked Questions

Test Your Knowledge

1. Partial repository cache is:

  • Local copy from full repo
  • All message bodies
  • JES spool
  • COBOL copybook

2. Cache lag symptom:

  • Fewer CLUSQ instances than full repo
  • Higher MAXDEPTH only
  • TLS always off
  • Topic retain only

3. Before REFRESH CLUSTER fix:

  • Verify channels to repository
  • Delete all queues
  • Disable OAM
  • Remove CLUSNL

4. Workload uses cache for:

  • Choosing cluster queue instance
  • Compiling COBOL
  • Batch job class
  • FTP port
Read time: 16 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation