Production messaging often cannot wait for a human to run strmqm on a cold spare after a server dies. Multi-instance queue managers address that on distributed IBM MQ by running one logical queue manager on two computers with shared storage. When the active node fails, standby takeover brings the same queue manager name back online with the same queues and logs, so applications reconnect to the familiar name. This pattern is active/passive high availability, not horizontal scale-out. This page explains architecture, storage requirements, strmqm -x and standby instances, failover behavior, fencing concepts, client reconnection, and how multi-instance compares to Native HA and RDQM so beginners can read vendor diagrams confidently.
At steady state one instance is active: it accepts MQCONN, runs channels, and writes logs. The standby instance holds no application connections; it monitors the active instance, chiefly by watching the file locks the active holds on shared storage. If the active fails and its locks are released, the standby acquires them, runs recovery on the shared logs, and starts the queue manager as the new active. The former active node must not resume writing: IBM MQ relies on file system locking to prevent split brain, where two instances write to and corrupt the same log.
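| Approach | Storage model | Typical use |
|---|---|---|
| Multi-instance (MIQM) | Shared networked file system | Two-node active/standby where reliable shared storage already exists |
| Native HA | Queue manager data replicated between instances | Container deployments with three instances and no shared storage |
| RDQM (replicated data queue manager) | Local disks replicated between nodes | Linux servers; HA group or DR pair without shared storage |
| Cluster + multiple QMs | Per-manager disks | Workload distribution; messages on a failed manager wait until it returns |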
Persistent messages and object definitions for a queue manager live in its data directory and logs. Failover is fast when the standby already sees those files, because no bulk copy is needed. The tradeoff is storage infrastructure: the shared file system (for example NFSv4) must be reliable, mounted on both nodes at the same paths with working lock semantics, and sized for peak queue and log growth during outages, when messages accumulate faster than they are consumed. Storage outages become messaging outages; monitor the array like you monitor the queue manager.
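One of the storage checks above can be sketched as a small shell helper that reports which file system type backs a path. This is a minimal sketch, not an IBM tool: the function names `mount_fstype` and `is_nfs4` are hypothetical, it assumes a Linux mounts table in `/proc/mounts` format, and it only recognises NFSv4 (other file systems that IBM lists as supported would not match). IBM's own `amqmfsck` utility is the authoritative test of locking behaviour.

```shell
#!/bin/sh
# Print the file system type of the mount that holds a path, reading a
# mounts table in /proc/mounts format (device mountpoint fstype options ...).
# Usage: mount_fstype /mqshared/qmgrs [mounts-file]
mount_fstype() (
    path=$1
    table=${2:-/proc/mounts}
    # Keep the longest mount point that is a prefix of the path.
    awk -v p="$path" '
        {
            mp = $2
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                if (length(mp) > best) { best = length(mp); fs = $3 }
            }
        }
        END { if (fs == "") exit 1; print fs }
    ' "$table"
)

# Succeed only when the path sits on an NFS version 4 mount.
# Usage: is_nfs4 /mqshared/log [mounts-file]
is_nfs4() (
    [ "$(mount_fstype "$1" "${2:-/proc/mounts}")" = "nfs4" ]
)
```

Running `is_nfs4 /mqshared/qmgrs && is_nfs4 /mqshared/log` on both nodes before go-live gives a quick sanity check that neither path has silently fallen back to a local disk.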
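You create the queue manager once on shared storage, then add its configuration to the second host: dspmqinf -o command on the first node prints an addmqinf command to run on the second. There is no separate instance name; each instance is identified by the host it runs on (for example node1 and node2). strmqm -x starts an instance that permits a standby; the first instance started becomes active and the second becomes standby. dspmq -x shows the mode of each instance, and DISPLAY QMSTATUS confirms the queue manager is running on the node where you issue it. Multi-instance failover does not itself require a cluster manager; if your platform also uses Pacemaker, PowerHA, or Windows Server Failover Cluster, follow IBM's guide for your OS version.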
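To make the status check scriptable, the following sketch extracts the host running in a given mode from `dspmq -x` style output. The function names are hypothetical, and the parsing assumes lines of the form `INSTANCE(host) MODE(Active)`, which may differ between releases; treat it as an illustration rather than a supported interface.

```shell
#!/bin/sh
# Print the instance (host) names whose MODE matches the first argument,
# reading `dspmq -x` style output on standard input.
# Usage: dspmq -x -m PAY.EU1.PROD | instance_with_mode Active
instance_with_mode() {
    sed -n "s/.*INSTANCE(\([^)]*\)) MODE($1).*/\1/p"
}

# Succeed when the output on standard input lists at least one standby.
# Usage: dspmq -x -m PAY.EU1.PROD | has_standby
has_standby() {
    [ -n "$(instance_with_mode Standby)" ]
}
```

A monitoring job could alert when `has_standby` fails, since a queue manager running without a standby has no failover target.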
Failover duration depends on log size, storage latency, and the number of in-doubt transactions. Long-running units of work extend recovery because more of the log must be replayed. Operations teams should rehearse failover quarterly, because DNS, firewall, and client reconnect timeout problems only surface during real tests.
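During a rehearsal it helps to record how long takeover actually takes. The sketch below polls an arbitrary command once per second and prints the elapsed seconds when it first succeeds. The names `time_until` and `qmgr_running` are hypothetical, and the `STATUS(Running)` match assumes the usual `dspmq` output format.

```shell
#!/bin/sh
# Poll a command once per second until it succeeds or the timeout expires.
# Prints the elapsed whole seconds on success; exits with status 1 on timeout.
# Usage: time_until TIMEOUT_SECONDS command [args...]
time_until() (
    timeout=$1
    shift
    start=$(date +%s)
    while :; do
        if "$@" >/dev/null 2>&1; then
            echo $(( $(date +%s) - start ))
            exit 0
        fi
        now=$(date +%s)
        if [ $(( now - start )) -ge "$timeout" ]; then
            exit 1
        fi
        sleep 1
    done
)

# Succeed when dspmq reports the named queue manager as fully running here.
# "Running as standby" does not match because of the closing parenthesis.
qmgr_running() {
    dspmq -m "$1" | grep -q 'STATUS(Running)'
}
```

On the standby node, `time_until 300 qmgr_running PAY.EU1.PROD` started just before stopping the active instance gives a rough takeover time to compare between rehearsals. A controlled switchover with `endmqm -s` on the active node is one way to trigger it.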
Connect clients by queue manager name through a connection definition that lists both hosts, for example a CONNAME of node1(1414),node2(1414), or through a floating IP or hostname that follows failover; never hardcode the hostname of the current active node. Sender channels from partners should likewise use a CONNAME list or an address that resolves to a virtual IP or load balancer where appropriate. Transactions in flight at failover are backed out, and applications see reason codes such as MQRC_CONNECTION_BROKEN (2009) that require reconnect and retry; consumers should be idempotent because a put whose outcome was uncertain may be retried and delivered twice.
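As an illustration of the connection name list, this sketch builds the comma-separated CONNAME value from a port and a list of hosts and wraps it in an MQSC client-connection channel definition. The function names and the channel name `PAY.APP.SVRCONN` are hypothetical; the port 1414 and host names are placeholders for your own listener configuration.

```shell
#!/bin/sh
# Build a connection name list such as node1(1414),node2(1414).
# Usage: conname_list PORT host1 [host2 ...]
conname_list() (
    port=$1
    shift
    list=
    for host in "$@"; do
        list="${list:+$list,}$host($port)"
    done
    printf '%s\n' "$list"
)

# Emit an MQSC definition for a client-connection channel that uses the list.
# Usage: clntconn_mqsc CHANNEL QMGR PORT host1 [host2 ...]
clntconn_mqsc() (
    channel=$1
    qmgr=$2
    shift 2
    printf "DEFINE CHANNEL('%s') CHLTYPE(CLNTCONN) TRPTYPE(TCP) +\n" "$channel"
    printf "       CONNAME('%s') QMNAME('%s') REPLACE\n" "$(conname_list "$@")" "$qmgr"
)
```

For example, `clntconn_mqsc PAY.APP.SVRCONN PAY.EU1.PROD 1414 node1 node2` prints a definition that can be reviewed and then piped into runmqsc. The client tries each address in order, so it reaches whichever node is currently active.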
    # Shared mounts, identical on both nodes: /mqshared/qmgrs and /mqshared/log

    # On node1: create the queue manager once, with data and logs on shared storage
    crtmqm -ld /mqshared/log -md /mqshared/qmgrs PAY.EU1.PROD

    # On node1: print the addmqinf command that describes this queue manager
    dspmqinf -o command PAY.EU1.PROD

    # On node2: run the addmqinf command printed above (values vary by installation)

    # Start an instance on each node; -x permits a standby.
    # The first instance started becomes active, the second becomes standby.
    strmqm -x PAY.EU1.PROD    # on node1
    strmqm -x PAY.EU1.PROD    # on node2

    # Check which node is active and which is standby
    dspmq -x -m PAY.EU1.PROD

    # After failover, verify on the survivor
    echo "DISPLAY QMSTATUS" | runmqsc PAY.EU1.PROD
Two firefighters share one fire truck parked in a garage they both can reach (shared storage). Only one drives at a time (active). If the driver feels sick, the passenger takes the wheel (failover) using the same truck and equipment—no new truck bought during the emergency.
List three questions to ask your storage team before MIQM go-live.
Your Java client hardcodes active node hostname. What breaks on failover? How do you fix it?
When would you choose RDQM over multi-instance for two datacenters?
1. In multi-instance, how many instances are active at once?
2. Shared storage is required because:
3. Applications after failover typically:
4. RDQM differs from classic multi-instance by: