Active/passive high availability is the pattern enterprises choose when one logical messaging hub must survive server loss without two writers running against the same data at once. The active node accepts connections, runs channels, and writes logs. The passive (standby) node stays ready but does not serve application traffic until failover promotes it.

IBM MQ implements this on distributed platforms primarily through multi-instance queue managers with shared networked storage, and through Native HA and RDQM, which replicate data instead of sharing disks. On z/OS, queue sharing groups and sysplex design provide different flavors of resilience. Beginners often confuse active/passive with clustering: clustering spreads work across many queue managers, while active/passive protects a single queue manager name.

This tutorial defines the model, maps it to IBM MQ products, explains RTO and RPO, covers fencing and split-brain risk, and compares mainframe and distributed deployments so architects can label diagrams correctly in design reviews.
| Role | What it does | On partner failure |
|---|---|---|
| Active | MQCONN, channels, puts, gets, logging | Standby may promote |
| Passive (standby) | Monitors active, holds ready state | Becomes active candidate |
| Witness / coordinator | Votes in some HA stacks | Failover policy dependent |
Passive does not mean powered off. In a multi-instance setup the standby instance is started with strmqm -x, keeps the shared storage mounted under coordinated access, and participates in health checks. Cold standby, meaning spare hardware with a manual strmqm, is cheaper but has a worse RTO.
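The monitoring role of a warm standby can be sketched as a small state machine. This is a generic illustration, not how any IBM MQ product detects failure: the class name `StandbyMonitor`, the heartbeat input, and the threshold of three consecutive misses are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical standby-side monitor; not an IBM MQ API."""

    max_missed: int = 3   # assumed number of consecutive misses before promotion
    missed: int = 0
    role: str = "standby"

    def observe(self, heartbeat_seen: bool) -> str:
        """Record one health-check interval and return the resulting role."""
        if self.role == "active":
            return self.role          # already promoted; nothing to decide
        if heartbeat_seen:
            self.missed = 0           # a healthy partner resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.role = "active"  # promote after enough consecutive misses
        return self.role


monitor = StandbyMonitor(max_missed=3)
checks = (True, False, False, True, False, False, False)
history = [monitor.observe(seen) for seen in checks]
print(history)
```

A single recovered heartbeat resets the counter, so only the final run of three misses triggers promotion. A lower threshold shortens detection time but raises the chance of promoting during a brief network glitch.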
Queue manager logs and queue files assume a single writer whose ordering of log records drives recovery. Two active instances writing the same log without coordination corrupt persistent messages and destroy auditability. Active/passive enforces single-writer semantics. Active/active requires either partitioned data (separate queue managers) or replication technology that merges streams safely, which means different products and different operations.
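Single-writer semantics can be illustrated with an exclusive, non-blocking file lock. This is a local analogy only, assuming a POSIX system where Python's `fcntl.flock` is available: the lock file name and the helper `try_become_writer` are hypothetical, and a real shared-storage deployment depends on the locking behavior of the networked file system, not on this code.

```python
import fcntl
import os
import tempfile


def try_become_writer(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file standing in for the queue manager's data on shared storage.
lock_path = os.path.join(tempfile.mkdtemp(), "master.lock")

active = try_become_writer(lock_path)     # first instance wins the lock
standby = try_become_writer(lock_path)    # second instance is refused
print(active is not None, standby is None)

active.close()                            # simulate loss of the active instance
promoted = try_become_writer(lock_path)   # the waiting instance can now take over
print(promoted is not None)
promoted.close()
```

The point of the sketch is that promotion is only possible once the previous holder has released the lock, which is what keeps two instances from writing the same log.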
| Option | Storage model | Typical site |
|---|---|---|
| Multi-instance | Shared disk | Traditional enterprise Linux/Windows |
| RDQM | Replicated volumes | Cloud and Linux HA without SAN |
| Native HA | Log replication between three instances | Modern container deployments |
| Manual cold standby | Copy or restore | DR only, poor RTO |
Recovery time objective (RTO) is how long applications may be down during failover. It includes detection, promotion, log replay, and client reconnect. Recovery point objective (RPO) is how much data loss is acceptable: persistent messages with synced logs target zero, while non-persistent traffic may be lost by design. Active/passive with good storage usually protects the RPO for persistent messages; RTO depends on automation versus manual runbooks.
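Because RTO is the sum of the failover phases named above, a quick budget calculation shows where the time goes. The durations below are invented purely for illustration and are not measurements of any product; `estimate_rto` is a hypothetical helper.

```python
def estimate_rto(detection_s, promotion_s, log_replay_s, client_reconnect_s):
    """Sum the failover phases (in seconds) into a total outage estimate."""
    return detection_s + promotion_s + log_replay_s + client_reconnect_s


# Illustrative numbers only: automated failover versus a manual runbook.
automated = estimate_rto(detection_s=10, promotion_s=5, log_replay_s=20, client_reconnect_s=15)
manual = estimate_rto(detection_s=300, promotion_s=600, log_replay_s=20, client_reconnect_s=15)

rto_target_s = 60  # assumed business objective
print(automated, automated <= rto_target_s)
print(manual, manual <= rto_target_s)
```

In this made-up budget, log replay and reconnect are identical in both cases; the manual runbook misses the target entirely because of human detection and promotion time.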
Applications should enable reconnect options and connect to the queue manager by name, not to the hard-coded hostname of the old active server. A CCDT or DNS alias helps swing traffic to the new active instance. Idempotent consumers tolerate redelivery after failover, when in-doubt transactions may resolve ambiguously. Long-running XA transactions may hold up recovery after failover until their coordinators resolve them; design shorter units of work where possible.
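An idempotent consumer can be sketched as a handler that remembers which message identifiers it has already applied. This is a minimal in-memory illustration with hypothetical names (`IdempotentConsumer`, `handle`); a production consumer would need to keep the processed-ID record in durable storage, ideally updated in the same unit of work as the business change.

```python
class IdempotentConsumer:
    """Hypothetical consumer that ignores messages it has already applied."""

    def __init__(self):
        self.processed_ids = set()   # in-memory only; assumed durable in real systems
        self.ledger = []

    def handle(self, message_id, amount):
        """Apply the message once; return False for a duplicate delivery."""
        if message_id in self.processed_ids:
            return False
        self.ledger.append(amount)
        self.processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
# "m2" arrives twice, as it might when a failover interrupts an acknowledgement.
deliveries = [("m1", 100), ("m2", 250), ("m2", 250), ("m3", 75)]
results = [consumer.handle(message_id, amount) for message_id, amount in deliveries]
print(results, sum(consumer.ledger))
```

The duplicate delivery is acknowledged but not applied again, so the ledger total reflects each payment exactly once.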
Fencing stops the failed active from writing after the standby promotes, using STONITH, SCSI reservations, or cluster manager integration. Without fencing, a network partition can leave two active instances writing at once, which corrupts the logs. Operations drills should include partitioned-network scenarios, not only clean power loss.
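The idea behind fencing can be shown with an epoch (fencing token) check, a generic technique swapped in here for illustration. It is not STONITH or SCSI reservations, and it is not a description of how any IBM MQ HA option is implemented. The class `FencedLog` and its methods are hypothetical.

```python
class FencedLog:
    """Hypothetical shared log that accepts writes only from the newest epoch."""

    def __init__(self):
        self.current_epoch = 0
        self.records = []

    def promote(self):
        """Grant a new epoch to the instance being promoted."""
        self.current_epoch += 1
        return self.current_epoch

    def append(self, epoch, record):
        """Reject writers whose epoch was superseded by a later promotion."""
        if epoch != self.current_epoch:
            raise PermissionError(f"stale epoch {epoch}; current is {self.current_epoch}")
        self.records.append(record)


log = FencedLog()
old_epoch = log.promote()            # original active instance
log.append(old_epoch, "put A")

new_epoch = log.promote()            # standby promoted during a partition
log.append(new_epoch, "put B")

try:
    log.append(old_epoch, "put C")   # old active is still alive and tries to write
    rejected = False
except PermissionError:
    rejected = True

print(log.records, rejected)
```

The old active is still running and still believes it is in charge, but its write is refused, which is the outcome a partitioned-network drill should confirm.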
Active/passive is one driver at the wheel while a trained relief driver waits in the passenger seat. If the driver becomes ill, the relief takes the wheel—only one steers at a time so the car does not crash.
One toy cash register is open. Another is closed but ready. If the first breaks, the second opens—only one register takes money so the toy books stay correct.
Compare active/passive multi-instance to active/active cluster for a hub-and-spoke bank integration.
Define RTO and RPO for a payment queue with persistent messages and consumers that are sensitive to MQRC 2035 (MQRC_NOT_AUTHORIZED) errors.
List three split-brain prevention mechanisms and which HA products use them.
1. Active/passive means:
2. Multi-instance is:
3. RTO measures:
4. Split brain is dangerous because: