What is a channel retry loop in IBM MQ?

A retry loop is when a message channel repeatedly enters RETRY status: the connect or bind fails, the queue manager waits SHORTTMR or LONGTMR seconds, tries again, fails again, and the cycle repeats. Meanwhile the transmission queue keeps accepting messages until it reaches MAXDEPTH, creating a backlog incident.

Is a retry loop always a network problem?

No. Wrong CONNAME, listener down, TLS certificate expired, CHLAUTH block, sequence number mismatch, and partner queue manager stopped all produce retry loops. Network issues are common but configuration and security causes are equally frequent.

Should I reset the channel to break a retry loop?

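RESET CHANNEL resets the channel's message sequence number, so it helps only when the loop comes from a sequence number mismatch, and only after you have confirmed the state on both ends. It does not repair a wrong CONNAME, an expired certificate, or a CHLAUTH block; resetting without fixing the root cause only restarts the loop. Use RESET together with AMQERR analysis and partner checks, not as the only action.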

How do retry loops affect XMITQ depth?

While the channel cannot run, messages accumulate on the transmission queue. Producers putting to remote queues via QREMOTE still add to XMITQ. A retry loop combined with a high put rate fills XMITQ to MAXDEPTH, after which puts fail with MQRC_Q_FULL (2053) and the integration path is blocked.

What is LASTCHLERR in a retry loop?

LASTCHLERR on DISPLAY CHSTATUS stores the last channel error code and reason from the failed attempt. It is the fastest clue whether the loop is TLS, auth, or TCP—compare both ends at the same timestamp.

MainframeMaster

Channel Retry Loops

A channel retry loop is what operations sees when DISPLAY CHSTATUS shows RETRY hour after hour, the transmission queue depth climbs, and AMQERR fills with the same channel error every few minutes. IBM MQ is doing what it was configured to do, scheduling another connect after SHORTTMR or LONGTMR seconds, but the underlying fault never cleared. Beginners increase retry counts or restart the queue manager; experienced teams read LASTCHLERR once, fix TLS or CONNAME, and the loop ends on the next successful bind. This tutorial explains how retry loops form, how short and long retry phases interact, why XMITQ backlog is the business impact, how to distinguish flapping network from wrong certificate, when RESET CHANNEL helps, and how to prevent loops through monitoring and change control.

Anatomy of a Retry Loop

A sender channel (CHLTYPE SDR) reads messages from its XMITQ and opens a session to the partner receiver. If TCP fails, TLS handshake fails, or MQ channel negotiation fails, the instance moves to RETRY. The queue manager increments retry counters and waits. When the timer expires, MQ tries again. If nothing changed on the network or configuration, the same failure occurs—another RETRY. This is a loop: same channel name, same error family, predictable interval. Loops differ from a single retry after a brief blip; loops last beyond your incident threshold and correlate with monotonic XMITQ depth increase.

Retry timer attributes (sender channel)
Attribute	Meaning	Effect on loop
SHORTRTY	Short retry count	How many quick attempts before long phase
SHORTTMR	Short retry interval	Seconds between early retries—fast loop if low
LONGRTY	Long retry count	Additional attempts after short phase exhausted
LONGTMR	Long retry interval	Slower loop—still endless if fault remains
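
The four attributes in the table combine into a rough upper bound on how long a sender keeps retrying before its counts run out. The sketch below assumes every attempt fails and ignores the time each failed connect itself takes; `retry_window_seconds` is a hypothetical helper name and the values passed to it are illustrative, not recommended settings.

```shell
#!/bin/sh
# Hypothetical helper: approximate seconds spent in RETRY before the
# retry counts are exhausted, assuming every attempt fails:
#   SHORTRTY * SHORTTMR + LONGRTY * LONGTMR
retry_window_seconds() {
    shortrty=$1; shorttmr=$2; longrty=$3; longtmr=$4
    echo $(( shortrty * shorttmr + longrty * longtmr ))
}

# Illustrative values only: 10 short retries 60 s apart,
# then 5 long retries 1200 s apart.
retry_window_seconds 10 60 5 1200
```

With these inputs the short phase contributes 600 seconds and the long phase 6000 seconds. A very large LONGRTY makes the window effectively unbounded, which is why a loop can run for days.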

Symptoms Operations Notices

DISPLAY CHSTATUS shows STATUS(RETRY) with rising retry count.
XMITQ CURDEPTH near MAXDEPTH; remote applications may see indirect backlog.
AMQ9208 or SSL-related AMQ messages repeat at SHORTTMR or LONGTMR cadence.
Partner receiver shows no RUNNING instance or listener never sees connect.
Monitoring alerts on channel not RUNNING and queue depth percentage.
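
The first symptom above can be checked from a script by pulling the STATUS value out of runmqsc output. This is a minimal sketch: `channel_status` is a hypothetical helper, the queue manager and channel names are placeholders, and the sample text is an abbreviated illustration of the output shape rather than a captured log.

```shell
#!/bin/sh
# Hypothetical helper: read runmqsc output on stdin, print the STATUS value.
# Requiring line start or whitespace before STATUS( avoids matching the
# "CHSTATUS(" that runmqsc echoes back from the command itself.
channel_status() {
    grep -oE '(^|[[:space:]])STATUS\([A-Z]+\)' \
        | sed -E 's/.*STATUS\(([A-Z]+)\)/\1/' \
        | head -n 1
}

# Intended use on the sending host (names are placeholders):
#   echo "DISPLAY CHSTATUS('PARIS.TO.LONDON') STATUS" | runmqsc QM_PARIS | channel_status

# Offline demonstration with abbreviated sample text
sample='     1 : DISPLAY CHSTATUS(PARIS.TO.LONDON) STATUS
AMQ8417I: Display Channel Status details.
   CHANNEL(PARIS.TO.LONDON)                CHLTYPE(SDR)
   STATUS(RETRY)                           XMITQ(QM_LONDON.XMIT)'
printf '%s\n' "$sample" | channel_status
```

A monitor could call this on a schedule and raise an alert when the value stays at RETRY across several consecutive checks.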

Common Root Causes

Network and listener

Firewall rule removed, wrong port in CONNAME, listener STOPPED, or DNS pointing to decommissioned host. TCP timeout produces retry loops that look like network outages. Verify telnet or nc to partner port from the sending host during the incident—not from your laptop unless that matches the channel path.
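
The port check described above can be scripted so it always uses the host and port from the channel's own CONNAME. The helpers below are hypothetical and assume a single `host(port)` value rather than a comma-separated list; the example address is a placeholder, and `nc -z -w 5` assumes a netcat build that supports those options.

```shell
#!/bin/sh
# Hypothetical helpers: split a CONNAME of the form host(port).
conname_host() {
    printf '%s\n' "${1%%\(*}"
}

conname_port() {
    case $1 in
        *"("*")"*)
            p=${1#*\(}
            printf '%s\n' "${p%%\)*}"
            ;;
        *)
            # With no port in CONNAME, MQ uses 1414 for TCP
            printf '1414\n'
            ;;
    esac
}

# Run this from the sending host so the probe follows the channel's real path.
check_listener() {
    nc -z -w 5 "$(conname_host "$1")" "$(conname_port "$1")"
}

# Example (placeholder address):
#   check_listener 'london.example.com(1414)' && echo reachable
```

A successful probe only proves that something accepts TCP connections on that port; it does not prove the listener belongs to the expected queue manager.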

TLS and certificates

Expired personal certificate, missing intermediate CA, cipher mismatch on SSLCIPH, or SSLCAUTH REQUIRED without client cert. LASTCHLERR and AMQ9638-class messages point here. Fixing retry timers does not renew a certificate.

Channel authentication

CHLAUTH rules block partner IP, QMNAME, or SSLPEER DN. AMQERR often names the rule. The loop continues until the rule is corrected or the partner presents the expected identity.
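
One way to see which rule an inbound channel would hit is DISPLAY CHLAUTH with MATCH(RUNCHECK), run on the receiving queue manager. The sketch below only builds the command text; `chlauth_runcheck_cmd` is a hypothetical helper, the address and queue manager names are placeholders, and the exact parameters your MQ version requires for RUNCHECK should be confirmed against the MQSC reference.

```shell
#!/bin/sh
# Hypothetical helper: build a DISPLAY CHLAUTH ... MATCH(RUNCHECK) command
# for a message channel. Arguments: channel name, partner IP, partner qmgr.
chlauth_runcheck_cmd() {
    printf "DISPLAY CHLAUTH('%s') MATCH(RUNCHECK) ADDRESS('%s') QMNAME('%s')\n" \
        "$1" "$2" "$3"
}

# Intended use on the receiving queue manager (all names are placeholders):
#   chlauth_runcheck_cmd PARIS.TO.LONDON 192.0.2.10 QM_PARIS | runmqsc QM_LONDON
chlauth_runcheck_cmd PARIS.TO.LONDON 192.0.2.10 QM_PARIS
```

Comparing the rule returned here with the one named in AMQERR shows whether the block is intentional or left over from an earlier change.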

Sequence numbers and partner state

After restore or DR, sequence number mismatch prevents RUNNING. RESET CHANNEL on both sides may be required per runbook after confirmed consistent backup state—not as a first action.

Breaking the Loop: Triage Steps

DISPLAY CHSTATUS(channel) ALL — note STATUS, LASTCHLERR, CONNAME, SSL attributes.
Read AMQERR at last retry timestamp on both queue managers if accessible.
Verify listener and network path from sending host.
Compare channel definitions: name, CHLTYPE pair, TLS settings.
Fix root cause; test one manual START CHANNEL or wait for next retry cycle.
If the fault was a sequence number mismatch, RESET CHANNEL after the fix; in every case confirm RUNNING and XMITQ draining.
Review XMITQ MAXDEPTH and alerting if outage duration was longer than design.

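shell

# MQSC commands are piped to runmqsc on the sending queue manager (QM_PARIS)
echo "DISPLAY CHSTATUS('PARIS.TO.LONDON') ALL" | runmqsc QM_PARIS
echo "DISPLAY QLOCAL('QM_LONDON.XMIT') CURDEPTH MAXDEPTH" | runmqsc QM_PARIS
tail -30 /var/mqm/qmgrs/QM_PARIS/errors/AMQERR01.LOG
# After fix (RESET CHANNEL only if the sequence numbers are out of step):
echo "RESET CHANNEL('PARIS.TO.LONDON')" | runmqsc QM_PARIS
echo "START CHANNEL('PARIS.TO.LONDON')" | runmqsc QM_PARIS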

Retry Loops vs Increasing Retries

Raising SHORTRTY or LONGRTY tolerates longer partner outages during planned maintenance; it does not fix wrong configuration. Use higher retry counts when a business-approved outage window exceeds the current total wait, roughly (SHORTRTY × SHORTTMR) + (LONGRTY × LONGTMR) seconds. Document the maximum acceptable XMITQ depth for that wait. If loops run indefinitely in production without a planned outage, treat them as misconfiguration, not capacity tuning.
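
To document an acceptable XMITQ depth for a given wait, it helps to estimate how quickly the queue fills while nothing drains. The sketch below assumes a constant put rate and a channel that moves no messages; `seconds_until_full` is a hypothetical helper and the numbers are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: seconds until the transmission queue reaches MAXDEPTH,
# assuming a constant put rate (messages per second) and no draining.
seconds_until_full() {
    curdepth=$1; maxdepth=$2; put_rate=$3
    echo $(( (maxdepth - curdepth) / put_rate ))
}

# Illustrative values: 1000 messages queued, MAXDEPTH 5000, 200 msg/sec
seconds_until_full 1000 5000 200
```

If the result is shorter than the outage window you plan to ride out, raising retry counts alone will not help: the queue fills before the channel reconnects, so MAXDEPTH or the producers' put rate has to be part of the plan.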

Explainer: Hamster Wheel

A retry loop is a hamster wheel: the channel keeps running in place—RETRY—without delivering mail. Mail piles up on the cart (XMITQ) beside the wheel. Stop fixing the wheel speed; open the door (fix TLS, network, or auth) so the hamster can exit to RUNNING.

Prevention

Alert on channel not RUNNING longer than N minutes.
Certificate expiry monitoring thirty days ahead.
CHLAUTH change review with partner IP and DN validation.
XMITQ depth percent thresholds tied to channel status dashboards.
Post-incident review: was LASTCHLERR read before first RESET?
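
The depth threshold item above can be sketched as a small parser over runmqsc output. `mqsc_attr` and `depth_percent` are hypothetical helpers, the queue name is a placeholder, and the sample text is an abbreviated illustration of DISPLAY QLOCAL output rather than a captured log.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of one attribute
# (for example CURDEPTH) found in runmqsc output on stdin.
mqsc_attr() {
    grep -oE "(^|[[:space:]])$1\([0-9]+\)" \
        | sed -E 's/.*\(([0-9]+)\)/\1/' \
        | head -n 1
}

# Hypothetical helper: CURDEPTH as a whole-number percentage of MAXDEPTH.
depth_percent() {
    out=$(cat)
    cur=$(printf '%s\n' "$out" | mqsc_attr CURDEPTH)
    max=$(printf '%s\n' "$out" | mqsc_attr MAXDEPTH)
    # Fail quietly if either value is missing or MAXDEPTH is zero
    [ -n "$cur" ] && [ -n "$max" ] && [ "$max" -gt 0 ] || return 1
    echo $(( cur * 100 / max ))
}

# Intended use (names are placeholders):
#   echo "DISPLAY QLOCAL('QM_LONDON.XMIT') CURDEPTH MAXDEPTH" | runmqsc QM_PARIS | depth_percent

# Offline demonstration with abbreviated sample text
sample='   QUEUE(QM_LONDON.XMIT)                   TYPE(QLOCAL)
   CURDEPTH(4200)                          MAXDEPTH(5000)'
printf '%s\n' "$sample" | depth_percent
```

Pairing this percentage with the channel status on one dashboard distinguishes a busy but healthy channel from a queue that is filling because the channel is stuck in RETRY.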

Explain Like I'm Five: Channel Retry Loops

Your toy car tries to drive to a friend's house but the bridge is out. Every few minutes it tries again and cannot cross. Toys pile up in the car trunk because they cannot be delivered. Fix the bridge—not the timer on how often the car tries.

Practice Exercises

Exercise 1

Lab: break CONNAME, observe RETRY and XMITQ depth over three SHORTTMR cycles. Record LASTCHLERR.

Exercise 2

Write a runbook that gives different actions for a retry loop reporting AMQ9638 versus one reporting AMQ9208.

Exercise 3

Calculate XMITQ messages accumulated during 4 hours of retry at 200 msg/sec put rate.

Frequently Asked Questions

Test Your Knowledge

1. Retry loop means channel keeps:

Entering RETRY after failed connect
Running forever
Deleting XMITQ
Disabling TLS

2. Fix root cause before:

RESET CHANNEL only
DELETE QMGR
Removing all queues
Disabling logs

3. XMITQ grows during retry loop because:

Messages cannot be transmitted
Consumers too fast
MAXDEPTH is zero
DLQ disabled

4. LASTCHLERR helps identify:

Last failure reason on channel
Queue CURDEPTH only
LDAP port
JCL class
