Message channels between queue managers are long-lived relationships. When TCP fails, TLS rejects a certificate, or the partner listener is down, IBM MQ does not always give up immediately: the channel instance enters RETRY (displayed as STATUS(RETRYING) in DISPLAY CHSTATUS output) and the queue manager tries again on a schedule. For beginners watching a dashboard, RETRY looks healthier than INACTIVE because something is still trying, but retrying can also mask a permanent misconfiguration for hours while transmission queues fill. This troubleshooting-oriented tutorial complements the attribute reference on channel retries: what retrying looks like in CHSTATUS, how short and long phases feel in real time, how retry interacts with XMITQ depth and batching, what to log during incidents, and when to stop tuning SHORTRTY and fix CONNAME, firewall, or CHLAUTH instead.
A sender channel typically moves from INACTIVE to STARTING when work appears on its transmission queue or an operator issues START CHANNEL. It may pass through BINDING while TCP and TLS complete. RUNNING means batches flow. Any fatal error in those steps can drop the instance to RETRY: the previous socket is gone, timers are armed, and the next attempt waits for SHORTTMR or LONGTMR seconds. RECEIVER-side problems often appear as RETRY on the remote SDR because the sender drives the connect. Operators must look at both queue managers during incidents, not only the one showing red on the console.
| Attribute | Role | Tuning note |
|---|---|---|
| SHORTRTY | Count of short attempts | Higher = more quick tries |
| SHORTTMR | Seconds between short tries | Very low values load the partner listener |
| LONGRTY | Count of long attempts | After short phase exhausts |
| LONGTMR | Seconds between long tries | Often minutes in production |
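
The way the four attributes combine into a two-phase schedule can be sketched in a few lines of Python. This is a simplified model, not IBM MQ's implementation: it assumes every attempt fails instantly, so elapsed time is only the sum of the timer waits. The function name `retry_schedule` and the values used are illustrative.

```python
from typing import Iterator, Tuple


def retry_schedule(shortrty: int, shorttmr: int,
                   longrty: int, longtmr: int) -> Iterator[Tuple[str, int]]:
    """Yield (phase, elapsed_seconds) for each retry attempt.

    Simplified model (assumption): each attempt fails instantly, so the
    elapsed time is just the sum of the timer waits that preceded it.
    """
    elapsed = 0
    # Short phase: SHORTRTY attempts spaced SHORTTMR seconds apart.
    for _ in range(shortrty):
        elapsed += shorttmr
        yield ("short", elapsed)
    # Long phase starts only after the short attempts are used up.
    for _ in range(longrty):
        elapsed += longtmr
        yield ("long", elapsed)


# Illustrative values, not a tuning recommendation.
for phase, elapsed in retry_schedule(shortrty=3, shorttmr=60,
                                     longrty=2, longtmr=600):
    print(phase, elapsed)
```

With these sample values the short attempts land at 60, 120, and 180 seconds, and the long attempts at 780 and 1380 seconds.
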
Each failure cause, whether TCP, TLS, a stopped listener, or CHLAUTH, leaves different fingerprints in the AMQERR error logs and in CHSTATUS LASTCHLERR. Retry logic does not distinguish a one-second blip from a wrong port: the same RETRY state appears until the channel connects or until the retry counts are exhausted and the channel goes INACTIVE or stops, depending on your platform behavior. Learn to read LASTCHLERR before changing LONGTMR.

    DISPLAY CHSTATUS('PARIS.TO.LONDON') ALL
    DISPLAY QSTATUS('SYSTEM.XMITQ.PARIS') CURDEPTH
    * On partner QM:
    DISPLAY LSSTATUS('TCP.LISTENER') ALL
    DISPLAY CHSTATUS('PARIS.TO.LONDON') ALL

CURDEPTH on the transmission queue shows business impact: messages accumulate while RETRY continues. If depth is zero and RETRY still loops, the fault may be startup-only (for example, a listener left down overnight). If depth climbs during business hours, the SLA for partner delivery is at risk, so escalate to network and partner operations, not only MQ. Record the timestamps of one full retry cycle to see whether SHORTTMR or LONGTMR applies.
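
Once two consecutive attempt times have been noted, matching the gap against the configured timers is simple arithmetic. The sketch below assumes the timestamps were copied by hand into ISO 8601 form. The function name `classify_retry_phase`, the sample times, and the 15-second tolerance (to absorb connect time) are all hypothetical choices, not MQ behavior.

```python
from datetime import datetime


def classify_retry_phase(first_attempt: str, next_attempt: str,
                         shorttmr: int, longtmr: int,
                         tolerance: int = 15) -> str:
    """Guess which timer spaced two consecutive retry attempts.

    Timestamps are ISO 8601 strings noted by the operator. The tolerance
    in seconds is an assumed allowance for connect time, not an MQ value.
    """
    gap = (datetime.fromisoformat(next_attempt)
           - datetime.fromisoformat(first_attempt)).total_seconds()
    if abs(gap - shorttmr) <= tolerance:
        return "short"
    if abs(gap - longtmr) <= tolerance:
        return "long"
    return "unknown"


# Hypothetical observation: attempts about ten minutes apart.
print(classify_retry_phase("2024-05-01T09:00:00", "2024-05-01T09:10:02",
                           shorttmr=60, longtmr=600))
```

A result of `unknown` is itself useful: it suggests the channel definition in effect is not the one you think it is, or that the attempts noted were not consecutive.
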
Imagine SHORTRTY(10) SHORTTMR(60): ten attempts one minute apart cover roughly ten minutes of quick recovery after a brief router reboot. Then LONGRTY(999) LONGTMR(600) might try every ten minutes for days—appropriate for a disaster recovery site that will be offline for hours but misleading if CONNAME points to a decommissioned host forever. Document expected recovery time objectives per channel class: payment rails may need aggressive short retry; batch file transfer may use long intervals to avoid hammering a fragile partner. Changing timers without communication can trigger partner security alerts from repeated TLS handshakes.
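
The arithmetic behind those figures is worth checking whenever timers change. This minimal sketch uses the values from the example above and ignores the time each connection attempt takes, which is an assumption. The helper names are illustrative.

```python
def phase_window_seconds(attempts: int, timer_seconds: int) -> int:
    """Time covered by one retry phase, ignoring connect time (assumption)."""
    return attempts * timer_seconds


def attempts_per_hour(timer_seconds: int) -> float:
    """Connection attempts the partner sees per hour at a given interval."""
    return 3600 / timer_seconds


short_window = phase_window_seconds(10, 60)    # SHORTRTY(10) SHORTTMR(60)
long_window = phase_window_seconds(999, 600)   # LONGRTY(999) LONGTMR(600)

print(f"short phase: {short_window / 60:.0f} minutes")
print(f"long phase:  {long_window / 86400:.1f} days")
print(f"attempts/hour: short {attempts_per_hour(60):.0f}, "
      f"long {attempts_per_hour(600):.0f}")
```

The short phase covers 10 minutes and the long phase about 6.9 days. The attempts-per-hour figure (60 in the short phase, 6 in the long phase) is the number to share with a partner before changing timers, since each attempt may include a TLS handshake on their side.
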
Retry reconnects the channel; it does not delete messages on XMITQ. Persistent messages remain until successfully transferred or moved by administrative action. Non-persistent messages may be lost if the queue manager restarts before the channel can deliver them, so know your DEFPSIST policy. Sequence numbers on reconnect determine whether in-flight batches replay; coordinate with the sequence number tutorials when RETRY follows an unclean disconnect. Do not assume RETRY means duplicate delivery: the channel protocol handles commit points when both sides are healthy.
Retry helps with transient faults: a planned firewall failover, a partner listener bounce, a transient DNS glitch. It harms when the fault is permanent: a wrong CONNAME, an expired certificate with no renewal, or a CHLAUTH BLOCK on a new partner, where RETRY only generates log noise and connection load. It also harms when both sides were restored from backup at different sequence points, because retry without RESET may never reach RUNNING. Operational discipline: after N hours in RETRY with unchanged LASTCHLERR, open a problem record instead of raising LONGRTY again.
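
The "N hours with an unchanged error" rule can be made mechanical. The sketch below assumes operators or a monitoring script record periodic observations, oldest first. The `StatusSample` record shape, the function name, and the four-hour threshold are hypothetical, not part of any MQ tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class StatusSample:
    """One observation of a channel (hypothetical record shape)."""
    taken_at: datetime
    status: str       # e.g. "RETRY" or "RUNNING"
    last_error: str   # error text noted at that time


def should_open_problem_record(samples: List[StatusSample],
                               max_retry_age: timedelta) -> bool:
    """True if the channel has sat in RETRY with one unchanged error
    for at least max_retry_age. Samples must be ordered oldest first."""
    if not samples:
        return False
    latest = samples[-1]
    if latest.status != "RETRY":
        return False
    # Walk backwards to find where the current unbroken streak began.
    streak_start = latest.taken_at
    for sample in reversed(samples):
        if sample.status != "RETRY" or sample.last_error != latest.last_error:
            break
        streak_start = sample.taken_at
    return latest.taken_at - streak_start >= max_retry_age


# Hypothetical hourly samples with the same error throughout.
t0 = datetime(2024, 5, 1, 9, 0)
samples = [StatusSample(t0 + timedelta(hours=h), "RETRY", "connection refused")
           for h in range(5)]
print(should_open_problem_record(samples, timedelta(hours=4)))
```

A change in the error text resets the streak on purpose: a new error means the situation is evolving and may still need diagnosis rather than escalation.
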
Channel retrying is like a bus that missed you but returns on a schedule to try again. If you are waiting at the wrong stop, more buses do not help—you need the right address on the timetable (CONNAME).
When MQ cannot talk to its friend, it waits a little while and tries calling again, again, and again until the friend answers or grown-ups fix the phone.
Given RETRY with LASTCHLERR connection refused, list five checks in order on sender and receiver QMs.
Propose SHORTRTY/SHORTTMR/LONGTMR for a channel that should recover within 5 minutes of a blip but avoid hammering a partner during a 4-hour maintenance window.
Explain to a manager why raising retries does not replace fixing an expired certificate.
1. Channel in RETRY means:
2. Short retry count attribute is:
3. Growing XMITQ during RETRY suggests:
4. First check on chronic RETRY is usually: