What is a rolling upgrade in IBM MQ?

A rolling upgrade updates queue manager instances one at a time while others continue serving traffic—common with multi-instance queue managers on shared storage, Kubernetes StatefulSets with the MQ operator, or queue sharing group members on z/OS when procedures allow mixed maintenance.

Do rolling upgrades guarantee zero downtime?

They minimize downtime when applications reconnect automatically and HA is configured correctly. Misconfigured clients, single-instance queue managers, or long-running transactions can still cause visible outages. Plan and test failover behavior.

How do multi-instance queue managers support rolling upgrade?

The active and standby instances share storage. You upgrade the standby first, fail over, upgrade the former active, and fail back if required, following the IBM-documented sequence. Shared disk and network must stay healthy throughout.

Can I rolling-upgrade a single standalone queue manager?

A single instance requires downtime for a binary upgrade unless you run parallel queue managers and migrate traffic; that is migration, not a classic rolling upgrade. Rolling implies multiple instances or orchestrated pod replacement.

What about Kubernetes MQ?

The MQ Operator and StatefulSet rolling update replace pods with newer image versions. PersistentVolumeClaims retain message data; readiness probes must pass before traffic returns. Follow operator version compatibility matrix.

MainframeMaster

Rolling Upgrades

Rolling upgrades are how large estates install IBM MQ 9.4 without a single global Sunday blackout: upgrade the standby multi-instance node first, fail over, upgrade the former active, roll Kubernetes pods one by one, or take queue sharing group members through maintenance in a sequence that keeps shared queues available. The phrase sounds like magic zero downtime, but reality depends on application reconnect, transaction length, channel retry, and whether you actually built HA or merely installed two sets of binaries. A standalone queue manager on one VM cannot roll in the HA sense: you run endmqm and accept the outage. This tutorial explains rolling upgrade prerequisites, the multi-instance step sequence at a conceptual level, Kubernetes operator rolling updates, client and channel behavior during failover, a QSG overview, testing failover before upgrade weekend, and common failures when the standby was never validated.

Prerequisites for Rolling Upgrade

Proven HA: multi-instance, RDQM, QSG, or K8s with tested failover.
Applications use automatic client reconnect or connection naming that follows the active instance.
Channels use a reconnect-friendly CONNAME (e.g. a VIP, a DNS name, or a comma-separated list covering both instances) rather than the hard-coded IP of a single node.
Short transactions: a long unit of work (UOW) blocks a clean handoff.
Non-prod rolling drill completed with same automation as prod.
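The CONNAME prerequisite above can be audited before the upgrade window. The following is a minimal sketch, assuming you have already exported CONNAME values from your channel definitions as plain strings; the function names are hypothetical and not part of any IBM tooling. It flags a CONNAME that lists exactly one literal IP address. A literal IP may still be a floating VIP, which this check cannot tell, so treat a flag as a prompt for review, not as proof of a problem.

```python
import ipaddress
import re

# One CONNAME entry looks like "host(port)" or just "host".
_ENTRY = re.compile(r"^\s*([^()\s]+)\s*(?:\((\d+)\))?\s*$")


def conname_hosts(conname):
    """Split a comma-separated CONNAME into (host, port) pairs.

    The port is None when the entry does not specify one.
    """
    pairs = []
    for entry in conname.split(","):
        match = _ENTRY.match(entry)
        if not match:
            raise ValueError(f"unrecognised CONNAME entry: {entry!r}")
        port = int(match.group(2)) if match.group(2) else None
        pairs.append((match.group(1), port))
    return pairs


def is_literal_ip(host):
    """True when host is an IPv4/IPv6 literal rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def needs_review(conname):
    """Flag a CONNAME that names a single literal IP address.

    Assumption: a DNS name or a multi-entry list can follow the active
    instance; a lone IP might be pinned to one node (or might be a VIP).
    """
    hosts = [host for host, _ in conname_hosts(conname)]
    return len(hosts) == 1 and is_literal_ip(hosts[0])
```

For example, `needs_review("10.1.2.3(1414)")` is flagged, while `needs_review("mqa.example.com(1414),mqb.example.com(1414)")` is not (both host names are placeholders).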

Multi-Instance Rolling Sequence (Conceptual)

Confirm active and standby status with dspmq -m QM1 -x or platform equivalent.
Upgrade MQ binaries on standby host (or node) per migration guide.
Upgrade standby queue manager instance to new version.
Controlled failover so applications move to upgraded standby.
Upgrade former active host binaries and instance.
Fail back if required by runbook; validate symmetric versions.

Exact commands differ by release and platform, so never run a failover in production without checking the steps against the IBM documentation for your version. Shared storage must mount cleanly on both nodes; split-brain prevention is built into the product, but storage faults still hurt.

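# Illustrative checks: verify against the IBM documentation for your release
# Show instance status (active/standby) for the multi-instance queue manager
dspmq -m QM1 -x
# DISPLAY QMSTATUS is an MQSC command, so pipe it into runmqsc
echo "DISPLAY QMSTATUS ALL" | runmqsc QM1
# After the failover test in the lab, record:
# - Application reconnect count
# - Channel RETRY events in the AMQERR01.LOG error logs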
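If the status check is automated in a runbook, the dspmq -x output can be parsed before a switchover is allowed. This is a sketch only: it assumes the usual `KEY(value)` layout, with a QMNAME/STATUS line followed by indented INSTANCE/MODE lines, and the host names in the example are placeholders. Confirm the layout against real output on your release before relying on it; the function names are hypothetical.

```python
import re

# dspmq prints fields as KEY(value); values may contain spaces.
_FIELD = re.compile(r"(\w+)\(([^)]*)\)")


def parse_dspmq_x(output):
    """Parse `dspmq -x` text into {qmname: {"status": ..., "instances": {...}}}.

    Assumed layout (verify on your platform):
        QMNAME(QM1)    STATUS(Running)
            INSTANCE(mqhost1) MODE(Active)
            INSTANCE(mqhost2) MODE(Standby)
    """
    queue_managers = {}
    current = None
    for line in output.splitlines():
        fields = dict(_FIELD.findall(line))
        if "QMNAME" in fields:
            current = queue_managers.setdefault(
                fields["QMNAME"],
                {"status": fields.get("STATUS"), "instances": {}},
            )
        elif "INSTANCE" in fields and current is not None:
            current["instances"][fields["INSTANCE"]] = fields.get("MODE")
    return queue_managers


def ready_for_switchover(queue_manager):
    """True only when exactly one Active and one Standby instance are listed."""
    modes = sorted(queue_manager["instances"].values())
    return modes == ["Active", "Standby"]
```

A runbook step might feed the captured command output into `parse_dspmq_x` and refuse to continue unless `ready_for_switchover` returns True for the target queue manager.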

Kubernetes and MQ Operator

A StatefulSet RollingUpdate replaces pods one at a time, honoring each pod's termination grace period. The MQ container must quiesce first, through endmqm or the operator's preStop hook. The PersistentVolumeClaim reattaches to the new pod, and recovery runs on strmqm. The readiness probe must wait until the queue manager accepts connections; premature Service traffic causes storms of reason code 2059 (MQRC_Q_MGR_NOT_AVAILABLE). Upgrade the operator chart and the image tag in coordinated steps per the compatibility table. Native HA replicas behave similarly to multi-instance at a logical level.
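An upgrade runbook can apply the same "wait until it accepts connections" idea from outside the cluster. The sketch below is an assumption-laden illustration, not the operator's readiness probe: it only checks that a TCP listener accepts a connection, which is weaker than a real MQ readiness check (the MQ container image provides its own, such as chkmqready). The host, port, and function names are hypothetical.

```python
import socket
import time


def listener_accepts(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This proves only that something is listening, not that the queue
    manager has finished recovery.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `sleep` is injectable so the loop can be tested
    without real waiting.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

A runbook might call `wait_until_ready(lambda: listener_accepts("qm1.example.internal", 1414))` after each pod is replaced and stop the roll if it returns None. The host name is a placeholder and the port must match your listener.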

Rolling upgrade patterns by platform
Platform	Mechanism	Watch item
Linux multi-instance	Failover between nodes	Shared storage health
Kubernetes MQ	Pod rolling update	Probe and PVC bind
z/OS QSG	Member maintenance order	CF structures
Single VM	Not rolling—planned outage	Drain messages first

Channels During Failover

Message channels to remote queue managers may drop when the local queue manager fails over. Partner sender channels enter RETRY; ensure the retry counts and intervals (SHORTRTY, SHORTTMR, LONGRTY, LONGTMR) allow recovery. Cluster channels may rebalance. SVRCONN clients with reconnect options retry to the surviving listener IP. Channels hard-coded to a node IP that is down during the upgrade fail until DNS or the VIP moves, so fix the architecture before the rolling weekend.
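The retry attributes determine how long after the failover a partner sender channel actually reconnects. The arithmetic can be sketched as below under a simplified model: retries happen every SHORTTMR seconds for SHORTRTY attempts, then every LONGTMR seconds for LONGRTY attempts, and the time each attempt takes is ignored. The default values shown are the commonly documented ones for sender channels; check your own channel definitions. The function name is hypothetical.

```python
import math


def first_retry_after(outage_seconds, shortrty=10, shorttmr=60,
                      longrty=999_999_999, longtmr=1200):
    """Seconds after the failure at which the first retry at or after
    the end of the outage happens, or None if retries run out first.

    Simplified model: attempt durations and timer drift are ignored.
    """
    short_phase_end = shortrty * shorttmr
    if shortrty > 0 and outage_seconds <= short_phase_end:
        # Still inside the short-retry phase: round up to the next tick.
        attempt = max(1, math.ceil(outage_seconds / shorttmr))
        return attempt * shorttmr
    # Short retries are exhausted; the channel is on the long timer.
    remaining = outage_seconds - short_phase_end
    attempt = max(1, math.ceil(remaining / longtmr))
    if attempt > longrty:
        return None
    return short_phase_end + attempt * longtmr
```

With the defaults above, a 45-second failover is picked up at the 60-second retry, but a 15-minute outage outlasts the 10-minute short phase, so the next attempt in this model is not until 1800 seconds after the failure. That gap is a reason to keep the failover inside the short-retry window or to restart key channels manually.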

Transactions and Indoubt Work

In-doubt XA transactions survive failover but may need resolution after upgrade if coordinators disagree. Drain or complete transactions before upgrade when possible. Batch jobs holding syncpoint open across hours block clean handoff—schedule batch away from upgrade window.

Testing Before Production Roll

Failover active to standby in lab without upgrade—measure client recovery time.
Upgrade lab multi-instance using production runbook verbatim.
Run channel and application load test during failover.
Document maximum observed outage seconds for executives.
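The "maximum observed outage seconds" figure can be derived from a simple probe log taken during the lab failover. This is a minimal sketch, assuming a test client records a timestamp and a success flag for each probe (for example, one put/get attempt per second); the sample format and function name are hypothetical.

```python
def longest_outage(samples):
    """Return the longest outage, in seconds, seen in a probe log.

    samples: iterable of (timestamp_seconds, ok) tuples in time order.
    An outage runs from the first failed probe to the next successful
    one; a failure run that reaches the end of the log is measured up
    to the last sample.
    """
    longest = 0.0
    down_since = None
    last_ts = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts  # outage starts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)  # outage ends
            down_since = None
        last_ts = ts
    if down_since is not None:
        longest = max(longest, last_ts - down_since)
    return longest
```

The result is only as precise as the probe interval, so a one-second probe cannot distinguish a 2.1-second outage from a 2.9-second one. State the interval next to the number you report.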

When Rolling Is Not Worth It

Small queue managers with tolerant maintenance windows may cost less with simple endmqm upgrade than building HA solely for rolling. Greenfield cloud may accept brief outage if no messages yet. Business decides; engineering documents honest outage seconds.

Explain Like I'm Five: Rolling Upgrades

Rolling upgrades are fixing one airplane engine while the other engine keeps flying—you swap carefully so the plane never falls, instead of turning off both engines at once.

Practice Exercises

Exercise 1

Execute lab multi-instance failover and record client reconnect behavior.

Exercise 2

Check whether production CONNAME uses IP or DNS/VIP.

Exercise 3

Write rolling upgrade runbook outline with rollback triggers.

Frequently Asked Questions

Test Your Knowledge

1. Rolling upgrade needs:

Multiple instances or pods
Only one QM ever
No clients
Deleted logs

2. Multi-instance upgrade order:

Follow IBM documented sequence
Random
Delete active first
Skip standby

3. Client reconnect matters because:

Failover must reach surviving instance
Channels ignore TCP
Logs stop
Topics vanish

4. Single standalone QM upgrade:

Usually has outage window
Never needs endmqm
No backup
No testing
