Multi-Instance Queue Managers

Production messaging often cannot wait for a human to run strmqm on a cold spare after a server dies. Multi-instance queue managers solve this on distributed IBM MQ by running one logical queue manager on two computers with shared storage. When the active instance fails, the standby takes over and brings the same queue manager back online with the same queues and logs, so applications reconnect to the familiar name. This pattern is active/passive high availability, not horizontal scale-out. This page covers the architecture, storage requirements, instance setup with strmqm -x, failover behavior, fencing concepts, client reconnection, and how multi-instance compares to Native HA and RDQM, so beginners can read vendor diagrams confidently.

Active and Standby Roles

At steady state one instance is active: it accepts MQCONN calls, runs channels, and writes logs. The standby instance holds no application connections; it repeatedly tries to obtain the locks that the active instance holds on the shared files. When the active instance fails (its process ends, its host dies, or it loses contact with the storage), those locks are released. The standby then acquires them, runs recovery on the shared logs, and starts the queue manager as the new active. The former active must not resume writing: IBM MQ relies on these exclusive locks on the shared file system to prevent split brain, where two instances corrupt the same logs.
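Because the active role belongs to whichever instance holds the exclusive locks, the takeover can be modelled with a toy shell script. This is an analogy only, not how IBM MQ is implemented: a temporary file stands in for the shared queue manager files, flock(1) from util-linux stands in for the file system locking, and the function names run_instance and demo_failover are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of multi-instance locking, NOT IBM MQ code: two "instances" compete for an
# exclusive lock on a temporary file that stands in for the queue manager's shared files.
# Requires flock(1) from util-linux.

run_instance() {
  local name="$1" hold_secs="$2" lockfile="$3"
  (
    flock -x 9                 # block here (acting as standby) until the lock is free
    echo "$name: active"       # only the lock holder acts as the active instance
    sleep "$hold_secs"         # simulate running until a failure
    echo "$name: stopped"      # closing fd 9 when the subshell exits releases the lock
  ) 9>"$lockfile"
}

demo_failover() {
  local lockfile
  lockfile="$(mktemp)"
  run_instance node1 1 "$lockfile" &   # starts first, takes the lock, becomes active
  sleep 0.2
  run_instance node2 0 "$lockfile" &   # waits as standby until node1 releases the lock
  wait
  rm -f "$lockfile"
}

demo_failover
```

node2 reports itself active only after node1 exits and its lock is released; a real standby likewise cannot start while the failed instance still holds the locks.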

HA approaches in IBM MQ (simplified)
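Approach | Storage model | Typical use
Multi-instance (MIQM) | Shared networked file system | Active/standby pair on distributed platforms
Native HA | Replicated QM data | IBM MQ in containers (Kubernetes, OpenShift)
RDQM | Replicated data QM | Three-node HA groups on Linux
Cluster + multiple QMs | Per-manager disks | Spreading workload across queue managers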

Explainer: Why Shared Disk

Persistent messages and object definitions for a queue manager live in its data directory and logs. Failover is fast when the standby already sees those files: no bulk copy is needed. The tradeoff is storage infrastructure: the shared file system (NFS version 4, or a cluster file system on SAN storage) must be reliable, mounted on both nodes with the locking behavior IBM requires for multi-instance, and sized for peak log growth. Storage outages become messaging outages; monitor the array like you monitor the queue manager.
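Before go-live it helps to confirm that the shared directories really sit on the expected file system type. IBM ships the amqmfsck tool for the real test of locking and concurrent-write behavior; the sketch below only performs an earlier sanity check, assuming Linux, the /proc/mounts format, and an NFS deployment where the expected type is nfs4. The function names fstype_for_path and is_nfs4 are hypothetical.

```shell
#!/usr/bin/env bash
# Pre-flight sketch (hypothetical helpers): report the file system type behind a directory,
# reading the Linux /proc/mounts format "device mountpoint fstype options dump pass".
# It checks the type only; it does not test locking (IBM's amqmfsck does that).

# fstype_for_path PATH [MOUNTS_FILE]
# Prints the fstype of the longest mount point containing PATH; later entries win ties.
fstype_for_path() {
  local path="$1" mounts="${2:-/proc/mounts}"
  awk -v p="$path" '
    BEGIN { best_len = -1 }
    {
      mp = $2
      # mp contains p if it is "/", equals p, or is a prefix of p ending at a "/"
      if (mp == "/" || p == mp || index(p, mp "/") == 1) {
        if (length(mp) >= best_len) { best_len = length(mp); best = $3 }
      }
    }
    END { if (best == "") exit 1; print best }
  ' "$mounts"
}

# is_nfs4 PATH [MOUNTS_FILE]: succeed only if PATH lives on an nfs4 mount
is_nfs4() {
  [ "$(fstype_for_path "$@")" = "nfs4" ]
}

# Example on each node:
#   is_nfs4 /mqshared/qmgrs && is_nfs4 /mqshared/log && echo "shared paths are on nfs4"
```

A passing type check is necessary but not sufficient; amqmfsck, including its tests run from both nodes at once, remains the authoritative verification.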

Installation and Instance Names

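You create the queue manager once, on one node, with its data and log directories on shared storage (crtmqm with -md and -ld). On the other node you register the same queue manager with addmqinf; running dspmqinf -o command on the first node prints the exact addmqinf command to use. There are no separate instance names: each instance is identified by the host it runs on. Running strmqm -x on a node starts an instance there; the first one started becomes active and the next becomes standby. dspmq -x shows which node runs the active instance and which runs the standby. Multi-instance failover does not require an external cluster manager; if you also integrate with one such as Pacemaker, PowerHA, or Windows Server Failover Cluster, follow IBM's guide for your OS version.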
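Because dspmq -x lists each instance by host name together with its mode, a check of which node is active is easy to script. The sketch below assumes output lines shaped like INSTANCE(node1) MODE(Active); compare with the output of your own release before relying on it. The function names instance_in_mode and active_instance are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract instance host names from `dspmq -x` output read on stdin.
# Assumes lines shaped like "    INSTANCE(node1) MODE(Active)"; verify on your MQ release.

# instance_in_mode MODE: print the host of every instance reported in MODE
instance_in_mode() {
  local mode="$1"
  sed -n "s/.*INSTANCE(\([^)]*\)) *MODE(${mode}).*/\1/p"
}

# active_instance: print the host running the active instance, if any
active_instance() {
  instance_in_mode Active
}

# Typical use on either node:
#   dspmq -x -m PAY.EU1.PROD | active_instance
```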

Failover Timeline

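  1. Active host loses power, loses contact with shared storage, or the active queue manager process fails.
  2. The active instance's locks on the shared files are released (after a host failure on NFS version 4, once that host's lock lease expires).
  3. Standby acquires exclusive access to the log and data files; holding these locks is what fences out the failed node.
  4. Log replay completes; the queue manager starts as active on the former standby node.
  5. Listeners accept connections; channels restart per configuration.
  6. Clients reconnect: automatically if reconnection is enabled (MQCNO_RECONNECT in the application, or DefRecon in the CHANNELS stanza of mqclient.ini), otherwise when the application retries.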

Duration depends on log size, storage latency, and the number of in-doubt transactions. Long-running units of work extend recovery. Operations teams rehearse failover, for example quarterly, because DNS, firewall, and client reconnect timeout problems often surface only during real tests.
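To measure recovery time during a rehearsal, one approach is to poll the listener on each node in order, the way a client walks a CONNAME list, and record how long it takes until one accepts a TCP connection. This is a sketch under stated assumptions: bash with /dev/tcp support, coreutils timeout, and hypothetical host names, port, and function names (tcp_open, wait_for_any). An open listener only shows that the network side is back; a real MQ client connection confirms the queue manager is usable.

```shell
#!/usr/bin/env bash
# Rehearsal sketch: time how long until some node in a list accepts TCP connections
# again, trying endpoints in order like a CONNAME list. Names and limits are hypothetical.

# tcp_open HOST PORT: succeed if a TCP connect completes within 2 seconds
tcp_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# wait_for_any MAX_TRIES INTERVAL HOST:PORT...
# Prints "HOST:PORT after N s" for the first endpoint that accepts; fails after MAX_TRIES rounds.
wait_for_any() {
  local max_tries="$1" interval="$2"; shift 2
  local start=$SECONDS try ep
  for ((try = 1; try <= max_tries; try++)); do
    for ep in "$@"; do
      if tcp_open "${ep%:*}" "${ep##*:}"; then
        echo "$ep after $((SECONDS - start)) s"
        return 0
      fi
    done
    sleep "$interval"
  done
  return 1
}

# Example, started right after failing the active node:
#   wait_for_any 120 1 node1:1414 node2:1414
```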

Applications and Channels

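Applications name the queue manager in MQCONN; what has to survive failover is the network address they use to reach it. Client connections should either list both hosts in CONNAME, for example CONNAME('node1(1414),node2(1414)'), which the client tries in order, or use a floating IP or hostname that follows the active instance. Sender channels at partners need the same treatment: a CONNAME list naming both nodes, or a virtual IP or load balancer where appropriate. Uncommitted units of work in flight at failover are backed out, and applications see reason codes such as MQRC_CONNECTION_BROKEN (2009) that require reconnect and retry; idempotent consumers handle duplicate delivery when the outcome of a put was uncertain.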
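A client connection list can be generated rather than typed by hand. This sketch builds the comma-separated CONNAME value from host:port pairs and wraps it in an MQSC DEFINE CHANNEL command with CHLTYPE(CLNTCONN). The channel name PAY.APP.SVRCONN, the hosts, the port, and the function names conname_list and clntconn_mqsc are hypothetical; the client channel must match a SVRCONN channel of the same name on the queue manager.

```shell
#!/usr/bin/env bash
# Sketch: build a CONNAME list naming both multi-instance hosts and emit an MQSC
# client-connection definition that uses it. Channel and host names are hypothetical.

# conname_list HOST:PORT... prints "host1(port1),host2(port2)"
conname_list() {
  local out="" ep
  for ep in "$@"; do
    out+="${out:+,}${ep%:*}(${ep##*:})"   # add a comma separator after the first entry
  done
  printf '%s\n' "$out"
}

# clntconn_mqsc CHANNEL QMGR HOST:PORT... prints one DEFINE CHANNEL command
clntconn_mqsc() {
  local chl="$1" qm="$2"; shift 2
  printf "DEFINE CHANNEL('%s') CHLTYPE(CLNTCONN) TRPTYPE(TCP) CONNAME('%s') QMNAME('%s') REPLACE\n" \
    "$chl" "$(conname_list "$@")" "$qm"
}

# Example: review the command, then pipe it into runmqsc to define the channel
#   clntconn_mqsc PAY.APP.SVRCONN PAY.EU1.PROD node1:1414 node2:1414 | runmqsc PAY.EU1.PROD
clntconn_mqsc PAY.APP.SVRCONN PAY.EU1.PROD node1:1414 node2:1414
```

The connection list only tells the client where to look; automatic reconnection still has to be enabled separately, for example with DefRecon=YES in the CHANNELS stanza of mqclient.ini or MQCNO_RECONNECT in the application.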

When Not to Use Multi-Instance

  • You need active/active writers to the same queues on two live nodes—consider cluster workload or application design instead.
  • Storage cannot meet IBM prerequisites for multi-instance locking.
  • Cloud availability zones lack supported shared storage—evaluate Native HA or RDQM.
  • Maintenance windows can tolerate manual strmqm on a single node—single instance may suffice.

Tutorial: Conceptual Setup Commands

```shell
# Shared mounts (example): /mqshared/qmgrs and /mqshared/log, mounted on both node1 and node2

# node1: create the queue manager once, with data and logs on shared storage
crtmqm -ld /mqshared/log -md /mqshared/qmgrs PAY.EU1.PROD

# node1: print the addmqinf command that registers this queue manager on another host
dspmqinf -o command PAY.EU1.PROD

# node2: paste and run the addmqinf command printed above (do not run crtmqm again)

# Start an instance on each node; the first to start becomes active, the second standby
strmqm -x PAY.EU1.PROD    # on node1: becomes active
strmqm -x PAY.EU1.PROD    # on node2: becomes standby

# Either node: show which instance is active and which is standby
dspmq -x -m PAY.EU1.PROD

# After failover, verify on the survivor
echo "DISPLAY QMSTATUS" | runmqsc PAY.EU1.PROD
```
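Rehearsals are easier to repeat with a planned switchover than by pulling a power cable. The sketch below lists the steps for a controlled switchover, all run on the currently active node: endmqm -s hands over to the standby, and strmqm -x restarts the old active as the new standby. The run and switchover_steps helpers are hypothetical, DRY_RUN=1 (the default) only prints the commands, and combining -w (wait for the instance to end) with -s is an assumption to confirm in the endmqm reference for your release.

```shell
#!/usr/bin/env bash
# Planned-switchover rehearsal sketch (hypothetical helpers, not an IBM tool).
# Run on the currently active node. DRY_RUN=1 (default) only prints the commands.
QM="${QM:-PAY.EU1.PROD}"

# run CMD...: print the command in dry-run mode, otherwise execute it
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

switchover_steps() {
  run dspmq -x -m "$QM"      # confirm this node's instance is Active before starting
  run endmqm -w -s "$QM"     # end the active instance and switch over to the standby
  run strmqm -x "$QM"        # restart this node's instance as the new standby
  run dspmq -x -m "$QM"      # confirm the roles have swapped
}

switchover_steps
```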

Explain Like I'm Five: Multi-Instance

Two firefighters share one fire truck parked in a garage they both can reach (shared storage). Only one drives at a time (active). If the driver feels sick, the passenger takes the wheel (failover) using the same truck and equipment—no new truck bought during the emergency.

Practice Exercises

Exercise 1: Storage

List three questions to ask your storage team before MIQM go-live.

Exercise 2: Clients

Your Java client hardcodes the active node's hostname. What breaks on failover? How do you fix it?

Exercise 3: Compare

When would you choose RDQM over multi-instance for two datacenters?

Frequently Asked Questions

Test Your Knowledge

1. In multi-instance, how many instances are active at once?

  • Two active
  • One active, one standby
  • Zero
  • Unlimited

2. Shared storage is required because:

  • Both instances must see the same queue data and logs
  • Channels require NFS for TLS
  • Only for pub/sub
  • CCDT files demand it

3. Applications after failover typically:

  • Reconnect to the same queue manager name
  • Must change every queue name
  • Lose all persistent messages
  • Switch to Kafka

4. RDQM differs from classic multi-instance by:

  • Using replicated data instead of shared disk
  • Having no logs
  • Forbidding channels
  • Running only on z/OS
Published
Read time: 15 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation