Replicated Data Queue Managers (RDQM) is IBM MQ's high availability option for Linux teams that need queue manager HA without shared storage such as a Fibre Channel SAN. Classic multi-instance queue managers require networked storage that both nodes mount, with locking semantics MQ trusts; many cloud and virtual environments resist that model. RDQM replicates queue manager data between Linux hosts, uses quorum to decide which node runs the primary queue manager, and fails over when a node disappears. The result is still active/passive from an application perspective (one primary serving MQCONN at a time), but the plumbing is replication rather than shared LUNs. This tutorial explains RDQM architecture, quorum and three-node layouts, installation concepts, failover behavior, differences from multi-instance and Native HA, and the planning questions storage and network architects should ask before adoption.
Shared-disk multi-instance is not viable when storage cannot support POSIX locks across nodes, when cloud AZs forbid shared volumes, or when storage teams cannot deliver HA mounts in time. RDQM moves replication into the MQ HA product, so each node maintains a synchronized copy and promotion relies on RDQM coordination instead of locks on a shared file system.
| Piece | Role | Note |
|---|---|---|
| RDQM node | Hosts MQ instance and replica | Typically three for quorum |
| Replication layer | Keeps data copies aligned | Product-managed sync |
| Quorum | Votes on primary promotion | Prevents dual primary |
| Primary QM | Active queue manager | Same name after failover |
| Clients | Reconnect after failover | CCDT and reconnect options |
Two-node clusters cannot settle a network split by voting: each side holds one vote out of two and cannot tell a dead peer from a broken link. Three nodes (or two data nodes plus a lightweight quorum witness where supported) give majority rule: promote only if more than half of the nodes agree. Study the quorum loss scenarios as well: losing two of three nodes may stop automatic promotion to protect data, and manual procedures then apply.
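The majority rule can be made concrete with a few lines of Python. This is a minimal sketch of the voting arithmetic only, not RDQM's actual implementation; the function names `has_quorum` and `partition_outcome` are hypothetical and exist purely for illustration.

```python
from typing import List, Sequence


def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """True when strictly more than half of the configured nodes are reachable."""
    if total_nodes <= 0 or not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("need 0 <= reachable_nodes <= total_nodes and total_nodes > 0")
    # Integer comparison avoids float rounding: reachable > total / 2
    return reachable_nodes * 2 > total_nodes


def partition_outcome(partition_sizes: Sequence[int], total_nodes: int) -> List[bool]:
    """For each side of a network partition, report whether it may promote a primary."""
    return [has_quorum(size, total_nodes) for size in partition_sizes]


if __name__ == "__main__":
    # Two nodes split 1/1: neither side has a majority, so neither may promote.
    print(partition_outcome([1, 1], total_nodes=2))
    # Three nodes split 2/1: only the two-node side may promote.
    print(partition_outcome([2, 1], total_nodes=3))
```

At most one side of any split can hold more than half of the votes, which is why the rule prevents two primaries from running at once.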
| Factor | RDQM | Multi-instance |
|---|---|---|
| Storage | Replicated per node | Shared mount |
| Platform focus | Linux | Linux, Windows, Unix |
| Split-brain control | Quorum | FS locking / cluster |
| Ops familiarity | Newer skill set | Long-standing pattern |
| Latency | Replication lag consideration | Local SAN latency |
When the primary node fails and the remaining nodes still hold quorum, they elect a new primary and start the queue manager against the replicated data copy. Log recovery runs as on any restart. Clients see the connection break, then succeed on reconnect to the same queue manager name, provided DNS or the CCDT points to a floating address or an updated endpoint list. Measure RTO in tests: replication catch-up after a long partition can extend recovery.
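The client side of that sequence can be sketched as a retry loop over an endpoint list. This is an illustrative model, not the IBM MQ client API: the `reconnect` function, the `connect` callable, and the endpoint strings are all hypothetical, and a real application would normally rely on the MQ client's own reconnect options and CCDT instead.

```python
import time
from typing import Callable, Sequence, Tuple


def reconnect(
    endpoints: Sequence[str],
    connect: Callable[[str], None],
    max_rounds: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Try each endpoint in turn, backing off between rounds.

    Returns the endpoint that accepted the connection and the number of
    attempts made. `connect` is a stand-in for the real connection call and
    is expected to raise ConnectionError when a node is down or not primary.
    """
    attempts = 0
    for round_no in range(max_rounds):
        for endpoint in endpoints:
            attempts += 1
            try:
                connect(endpoint)
            except ConnectionError:
                continue  # this node is unavailable; try the next one
            return endpoint, attempts
        if round_no < max_rounds - 1:
            # Exponential backoff gives the new primary time to finish log recovery.
            sleep(base_delay * 2 ** round_no)
    raise ConnectionError(f"no endpoint accepted a connection after {attempts} attempts")
```

Timing the call from the moment the connection breaks until `reconnect` returns gives a rough client-observed RTO figure for failover tests.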
Replication traffic consumes network bandwidth between nodes. High persistent message rates need low-latency links between replicas. Each node's disk must hold the full queue manager data, unlike shared disk, where only one copy exists. Plan growth identically on all replicas.
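A back-of-the-envelope calculation helps frame the sizing conversation. The sketch below is a rough planning aid under stated assumptions, not an IBM sizing formula: the function names, the headroom factor, the protocol overhead factor, and the premise that the primary sends every persistent write to each secondary are all assumptions to be replaced with measured values.

```python
from typing import Tuple


def replica_disk_total(qm_data_gib: float, replicas: int = 3, headroom: float = 0.3) -> Tuple[float, float]:
    """Return (GiB needed per node, GiB needed across all replicas).

    Every replica holds a full copy, so total capacity scales with node count.
    `headroom` is an assumed growth margin, e.g. 0.3 for 30 percent.
    """
    per_node = qm_data_gib * (1 + headroom)
    return per_node, per_node * replicas


def replication_bandwidth_mbps(
    msgs_per_sec: float,
    avg_msg_bytes: float,
    secondaries: int = 2,
    overhead: float = 1.2,
) -> float:
    """Estimate outbound replication traffic from the primary in megabits per second.

    Assumes each persistent message is shipped once to every secondary and that
    logging and protocol framing add the given `overhead` factor (an assumption).
    """
    bytes_per_sec = msgs_per_sec * avg_msg_bytes * overhead * secondaries
    return bytes_per_sec * 8 / 1_000_000


if __name__ == "__main__":
    per_node, total = replica_disk_total(100, replicas=3, headroom=0.5)
    print(per_node, total)  # 150.0 450.0
    print(replication_bandwidth_mbps(1000, 1000, secondaries=2, overhead=1.0))  # 16.0
```

The numbers only bound the discussion; measured write rates and link latency from a test environment should replace the assumed factors before any capacity is ordered.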
RDQM is three clerks each keeping an identical ledger copy updated on every sale. If the head clerk leaves, the others compare books and agree who leads the shop. Majority rule prevents two heads from selling the same item in conflicting ways.
Three kids each have the same list of chores. They copy any new chore to everyone’s list. If one kid is away, the others pick a leader from who is left—but only if enough kids are still there to agree.
Explain why two-node RDQM without a witness is risky during a network partition.
Map RDQM failover steps to the generic failover tutorial timeline.
List questions for the storage team when comparing RDQM to SAN multi-instance.
1. RDQM avoids:
2. Quorum helps prevent:
3. RDQM is primarily for:
4. Three-node RDQM often provides: