Channel Retrying

Message channels between queue managers are long-lived relationships. When TCP fails, TLS rejects a certificate, or the partner listener is down, IBM MQ does not always give up immediately: the channel instance enters RETRY and the queue manager tries again on a schedule. To a beginner watching a dashboard, RETRY looks healthier than INACTIVE because something is still trying, but retrying can also mask a permanent misconfiguration for hours while transmission queues fill. This troubleshooting-oriented tutorial complements the attribute reference on channel retries. It covers what retrying looks like in CHSTATUS, how the short and long phases play out in real time, how retry interacts with transmission queue (XMITQ) depth and batching, what to log during incidents, and when to stop tuning SHORTRTY and fix CONNAME, the firewall, or CHLAUTH instead.

RETRY in the Channel Lifecycle

A sender channel typically moves from INACTIVE to STARTING when work appears on its transmission queue (for a triggered channel) or when an operator issues START CHANNEL. It may pass through BINDING while the TCP connection and TLS handshake complete and the two ends negotiate. RUNNING means batches flow. A failure at any of these steps can drop the instance to RETRY: the previous socket is gone, a timer is armed, and the next attempt waits SHORTTMR or LONGTMR seconds. Problems on the receiving side often show up as RETRY on the sender (SDR) channel at the other queue manager, because the sender initiates the connection. During incidents, look at both queue managers, not only the one showing red on the console.

Retry phase attributes (sender channel)
Attribute | Role | Tuning note
SHORTRTY | Number of short retry attempts | Higher = more quick tries
SHORTTMR | Seconds between short attempts | Very low values load the partner listener
LONGRTY | Number of long retry attempts | Used after the short attempts are exhausted
LONGTMR | Seconds between long attempts | Often minutes in production
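To see how the four attributes combine, here is a minimal Python sketch that lists when each retry attempt would fire, measured from the first failure. It assumes a simplified model in which every attempt waits its full interval and the time a connect attempt itself takes (a TCP or TLS timeout can add many seconds) is ignored. `RetryPolicy`, `schedule`, and `window_seconds` are hypothetical names, not part of any MQ API.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class RetryPolicy:
    """Retry attributes as they might be set on a sender channel (hypothetical type)."""
    shortrty: int  # number of short attempts
    shorttmr: int  # seconds between short attempts
    longrty: int   # number of long attempts, used once the short ones are spent
    longtmr: int   # seconds between long attempts


def schedule(policy: RetryPolicy) -> Iterator[Tuple[str, int]]:
    """Yield (phase, seconds since first failure) for every retry attempt.

    Simplified model: connect time is ignored and each wait is exact.
    """
    elapsed = 0
    for _ in range(policy.shortrty):
        elapsed += policy.shorttmr
        yield "short", elapsed
    for _ in range(policy.longrty):
        elapsed += policy.longtmr
        yield "long", elapsed


def window_seconds(policy: RetryPolicy) -> Tuple[int, int]:
    """Return (end of short phase, end of long phase) in seconds."""
    short_end = policy.shortrty * policy.shorttmr
    return short_end, short_end + policy.longrty * policy.longtmr


if __name__ == "__main__":
    demo = RetryPolicy(shortrty=3, shorttmr=60, longrty=2, longtmr=1200)
    print(list(schedule(demo)))
    # [('short', 60), ('short', 120), ('short', 180), ('long', 1380), ('long', 2580)]
    print(window_seconds(demo))  # (180, 2580)
```

The jump from 180 to 1380 seconds is where the channel leaves the short phase. On a dashboard, that is the point where attempts suddenly become rare.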

What Triggers a Retry Attempt

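  • Connection refused: no listener on the CONNAME host and port.
  • Network timeout: a firewall silently drops new connection attempts (SYN packets) or idle sessions.
  • TLS handshake failure: no matching CipherSpec, or a certificate the partner does not trust.
  • CHLAUTH block or channel negotiation failure: a channel name mismatch or a security policy rejects the session.
  • Sequence number mismatch: the partner is out of sync, often after a DR restore.
  • Partner channel or queue manager stopped mid-session: the sender sees the session drop and may retry, depending on how the stop was issued.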

Each cause leaves different fingerprints in the queue manager error logs (AMQERR01.LOG) and in CHSTATUS LASTCHLERR. Retry logic does not distinguish a one-second blip from a wrong port. The channel shows the same RETRY state until a connection succeeds or the retry counts are exhausted, at which point it goes INACTIVE or stops, depending on your platform. Learn to read LASTCHLERR before changing LONGTMR.

Observing Retry in Operations

```shell
* MQSC commands, entered in runmqsc on the sending queue manager:
DISPLAY CHSTATUS('PARIS.TO.LONDON') ALL
DISPLAY QSTATUS('SYSTEM.XMITQ.PARIS') CURDEPTH
* On the partner queue manager:
DISPLAY LSSTATUS('TCP.LISTENER') ALL
DISPLAY CHSTATUS('PARIS.TO.LONDON') ALL
```

CURDEPTH on the transmission queue shows business impact: messages accumulate while RETRY continues. If depth stays at zero while RETRY keeps looping, the fault has not yet delayed any messages. A partner listener that was down overnight while nothing was sent is a typical example. If depth climbs during business hours, the partner delivery SLA is at risk, so escalate to network and partner operations, not only the MQ team. Capture timestamps for one full retry cycle to see whether the SHORTTMR or the LONGTMR interval applies.
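The last two checks can be scripted. This sketch assumes you have already collected attempt timestamps (for example, from error log entries or by polling DISPLAY CHSTATUS) and a series of CURDEPTH samples for the transmission queue. `likely_phase` and `depth_climbing` are hypothetical helpers. Comparing the median gap with the configured timers is one reasonable heuristic, not MQ behavior.

```python
from datetime import datetime
from statistics import median
from typing import List


def likely_phase(attempts: List[datetime], shorttmr: int, longtmr: int) -> str:
    """Guess whether attempts are spaced by SHORTTMR ('short') or LONGTMR ('long')."""
    if len(attempts) < 2:
        raise ValueError("need at least two attempt timestamps")
    ordered = sorted(attempts)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    # Heuristic: choose the configured interval closest to the typical gap.
    if abs(typical - shorttmr) <= abs(typical - longtmr):
        return "short"
    return "long"


def depth_climbing(curdepth_samples: List[int]) -> bool:
    """True if CURDEPTH samples (oldest first) never fall and end higher than they began."""
    if len(curdepth_samples) < 2:
        return False
    pairs = zip(curdepth_samples, curdepth_samples[1:])
    never_falls = all(later >= earlier for earlier, later in pairs)
    return never_falls and curdepth_samples[-1] > curdepth_samples[0]
```

Requiring depth never to fall is deliberately strict. Tolerating small dips is a tuning choice for your own monitoring.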

Short Versus Long Retry in Practice

Imagine SHORTRTY(10) SHORTTMR(60): ten attempts one minute apart give roughly ten minutes of quick recovery after a brief router reboot. Then LONGRTY(999) LONGTMR(600) retries every ten minutes for days. That is appropriate for a disaster recovery site that will be offline for hours, but misleading if CONNAME points at a host that has been permanently decommissioned. Document expected recovery time objectives per channel class. Payment rails may need aggressive short retry, while batch file transfer may use long intervals to avoid hammering a fragile partner. Changing timers without telling the partner can trigger their security alerts, because each attempt repeats the TLS handshake.
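A small calculation makes the trade-off concrete: how many connection attempts (each one a fresh TLS handshake on a TLS channel) reach the partner during an outage of a given length. This Python sketch uses a hypothetical helper, `attempts_within`. It assumes a simplified model in which each wait is exact and connect time is ignored.

```python
def attempts_within(seconds: int, shortrty: int, shorttmr: int,
                    longrty: int, longtmr: int) -> int:
    """Count retry attempts that fire within `seconds` of the first failure.

    Simplified model (an assumption): each attempt waits its full interval,
    and the time a connect or TLS handshake takes is ignored.
    """
    count = 0
    elapsed = 0
    phases = ((shortrty, shorttmr), (longrty, longtmr))
    for attempts, interval in phases:
        for _ in range(attempts):
            elapsed += interval
            if elapsed > seconds:
                return count
            count += 1
    return count


if __name__ == "__main__":
    # SHORTRTY(10) SHORTTMR(60) LONGRTY(999) LONGTMR(600), as in the text.
    four_hours = 4 * 3600
    print(attempts_within(four_hours, 10, 60, 999, 600))  # 33
    print(f"{999 * 600 / 86400:.1f} days of long retry")  # 6.9 days of long retry
```

With the example values, a four-hour maintenance window sees 10 short attempts plus 23 long ones. The long phase alone spans 999 attempts of 600 seconds, a little under seven days, which is how a wrong CONNAME can sit in RETRY for most of a week.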

Retry Versus Message Safety

Retry reconnects the channel; it does not delete messages on the XMITQ. Persistent messages remain until they are transferred or moved by administrative action. Depending on the scenario, non-persistent messages may be lost if the queue manager restarts before the channel delivers them, so know your DEFPSIST policy. On reconnect, sequence numbers determine whether an in-flight batch is replayed. Coordinate with the sequence number tutorials when RETRY follows an unclean disconnect. Do not assume RETRY means duplicate delivery: when both sides are healthy, the protocol resolves commit points.

When Retry Helps and When It Hurts

Helpful: planned firewall failover, partner listener bounce, transient DNS glitch. Harmful: a permanently wrong CONNAME, an expired certificate with no renewal, a CHLAUTH BLOCK on a new partner. In these cases RETRY only generates log noise and connection load. Also harmful: both sides restored from backups taken at different sequence points, where retrying without a RESET CHANNEL may never reach RUNNING. Operational discipline: after N hours in RETRY with an unchanged LASTCHLERR, open a problem record instead of raising LONGRTY again.
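The operational rule above can be written down so that monitoring applies it consistently. This minimal sketch takes observations as (timestamp, error text) pairs captured while the channel was in RETRY. The error text is whatever your monitoring records, such as the LASTCHLERR value or the AMQ message from the error log. `should_open_problem_record` is a hypothetical name, and `threshold` plays the role of N hours.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def should_open_problem_record(
    observations: List[Tuple[datetime, str]],
    threshold: timedelta,
) -> bool:
    """True when the most recent error has been seen unchanged for at least `threshold`.

    `observations` are (time seen, error text) pairs recorded while the channel
    was in RETRY; they may arrive in any order.
    """
    if not observations:
        return False
    ordered = sorted(observations)
    latest_time, latest_error = ordered[-1]
    run_start = latest_time
    # Walk backwards to where the current unbroken run of the same error began.
    for seen, error in reversed(ordered):
        if error != latest_error:
            break
        run_start = seen
    return latest_time - run_start >= threshold
```

A different error resets the clock. A new error usually means something changed, so the earlier diagnosis may no longer apply.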

Incident Checklist During RETRY

  1. DISPLAY CHSTATUS(*) WHERE(STATUS EQ RETRY) to list channels in RETRY.
  2. Note LASTCHLERR and time of last attempt.
  3. Verify CONNAME resolves and port reachable (telnet or test tool).
  4. Is the partner listener running, and is the receiving channel defined and not STOPPED?
  5. TLS certificate expiry and CipherSpec agreement on both sides.
  6. CHLAUTH rule changes and blocking events since the last change window.
  7. XMITQ depth trend and age of oldest message.
  8. Recent DR or RESET on either queue manager.
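Step 3 can be automated from the sending host with a plain TCP probe, much like the telnet test. This sketch is an illustration under stated assumptions, not an MQ tool. The hypothetical `parse_conname` handles only the simple comma-separated `host(port)` form and defaults a missing port to 1414. The hypothetical `check_endpoint` proves only that something is listening, not that the channel will bind; TLS and CHLAUTH are checked later in the list.

```python
import socket
from typing import List, Tuple

DEFAULT_PORT = 1414  # port MQ assumes when a TCP CONNAME omits one


def parse_conname(conname: str) -> List[Tuple[str, int]]:
    """Split a simple CONNAME such as 'hostA(1414),hostB(1415)' into (host, port) pairs."""
    endpoints = []
    for entry in conname.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith(")") and "(" in entry:
            host, _, port = entry[:-1].rpartition("(")
            endpoints.append((host.strip(), int(port)))
        else:
            endpoints.append((entry, DEFAULT_PORT))
    return endpoints


def check_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'ok', 'unresolved', 'refused', 'timeout' or 'error' for one endpoint."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolved"
    try:
        # A bare TCP connect, like telnet: proves a listener, nothing more.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, looping `for host, port in parse_conname("london.example.com(1414)")` and printing `check_endpoint(host, port)` gives one classification per address. "refused" and "timeout" correspond to the first two causes in the trigger list.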

Explainer: Bus That Keeps Coming Back

Channel retrying is like a bus that missed you but returns on a schedule to try again. If you are waiting at the wrong stop, more buses do not help—you need the right address on the timetable (CONNAME).

Explain Like I'm Five: Channel Retrying

When MQ cannot talk to its friend, it waits a little while and tries calling again, again, and again until the friend answers or grown-ups fix the phone.

Practice Exercises

Exercise 1

Given RETRY with LASTCHLERR connection refused, list five checks in order on sender and receiver QMs.

Exercise 2

Propose SHORTRTY/SHORTTMR/LONGTMR for a channel that should recover within 5 minutes of a blip but avoid hammering a partner during a 4-hour maintenance window.

Exercise 3

Explain to a manager why raising retries does not replace fixing an expired certificate.

Test Your Knowledge

1. Channel in RETRY means:

  • Scheduled reconnect attempt
  • All messages lost
  • QMGR stopped
  • Only clients affected

2. Short retry count attribute is:

  • SHORTRTY
  • MAXDEPTH
  • DEFPSIST
  • BOQNAME

3. Growing XMITQ during RETRY suggests:

  • Messages queuing for partner
  • Successful delivery
  • Empty cluster
  • DLQ not needed

4. First check on chronic RETRY is usually:

  • CONNAME, listener, TLS, and CHLAUTH
  • Increase MAXMSGL only
  • Delete all queues
  • Disable persistence

Author: MainframeMaster
Verified: IBM MQ 9.3 documentation