RDQM

Replicated Data Queue Managers (RDQM) is IBM’s answer for Linux teams that need queue manager high availability without shared network storage. Classic multi-instance queue managers require networked storage that both nodes mount, with file-locking semantics MQ trusts; many cloud and virtual environments resist that model. RDQM replicates queue manager data between Linux hosts, uses quorum to decide which node runs the primary queue manager, and fails over when a node disappears. The result is still active/passive from an application perspective, with one primary serving MQCONN at a time, but the plumbing is replication rather than a shared file system. This tutorial explains RDQM architecture, quorum and three-node layouts, installation concepts, failover behavior, differences from multi-instance and Native HA, and the planning questions storage and network architects should ask before adoption.

Problem RDQM Solves

Shared-disk multi-instance is not an option when storage cannot provide reliable file locking across nodes, when cloud availability zones forbid shared volumes, or when storage teams cannot deliver HA mounts in time. RDQM moves replication into the MQ HA product so each node maintains a synchronized copy, and promotion is coordinated by the RDQM cluster rather than by file locks on a shared mount.

Architecture Overview

RDQM components (conceptual)
Piece | Role | Note
RDQM node | Hosts MQ instance and replica | Typically three for quorum
Replication layer | Keeps data copies aligned | Product-managed sync
Quorum | Votes on primary promotion | Prevents dual primary
Primary QM | Active queue manager | Same name after failover
Clients | Reconnect after failover | CCDT and reconnect options

Three-Node Quorum

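Two-node clusters cannot vote safely when the network splits: each side thinks the other died, and neither holds a majority. Three nodes give majority rule: a node may run the queue manager only while it belongs to a group holding more than half the votes. RDQM HA groups are defined with three nodes; check the IBM documentation for your release before assuming that a layout of two data nodes plus a lightweight quorum witness is supported. Beginners should study quorum-loss scenarios: if two of three nodes are lost, the survivor has no majority, so automatic promotion stops to protect data and manual procedures apply.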
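The majority rule can be illustrated with a few lines of Python. This is a minimal sketch of the arithmetic only, not how RDQM implements quorum internally; the function name has_quorum and its parameters are hypothetical.

```python
def has_quorum(reachable_nodes: int, total_nodes: int = 3) -> bool:
    """Return True when a partition holds a strict majority of the cluster.

    reachable_nodes counts this node plus every peer it can still contact.
    """
    if not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("reachable_nodes must be between 0 and total_nodes")
    # Strict majority: exactly half is not enough.
    return 2 * reachable_nodes > total_nodes


# Three-node group, one node lost: the two survivors keep quorum.
print(has_quorum(2, 3))
# Three-node group, two nodes lost: the survivor must not promote itself.
print(has_quorum(1, 3))
# Two-node group after a network split: neither side has a majority.
print(has_quorum(1, 2))
```

The last call shows why a two-node layout is fragile: after a split each side sees one of two votes, which is exactly half and therefore not a majority.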

RDQM Versus Multi-Instance

Comparison
Factor | RDQM | Multi-instance
Storage | Replicated per node | Shared mount
Platform focus | Linux | Linux, Windows, Unix
Split-brain control | Quorum | FS locking / cluster
Ops familiarity | Newer skill set | Long-standing pattern
Latency | Replication lag consideration | Local SAN latency

Failover with RDQM

When the primary node fails, the remaining nodes still hold quorum, select a new primary, and start the queue manager against the replicated data copy. Log recovery runs as on any restart. Clients see the connection break, then reconnect successfully to the same queue manager name, provided DNS or the CCDT points to a floating address or to an endpoint list that includes the new primary. Measure RTO in tests: replication catch-up after a long partition can extend recovery.
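The client side of a failover drill can be sketched as a retry loop over an endpoint list that also records how long reconnection took. This is an illustrative sketch, not MQ client code: the connect callable stands in for whatever MQ client call your application uses, and the endpoint strings are hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple


def reconnect(
    endpoints: Sequence[str],
    connect: Callable[[str], None],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, float]:
    """Walk the endpoint list until one node accepts the connection.

    Returns the endpoint that answered and the elapsed seconds, which
    gives a rough client-side RTO figure for the drill.
    """
    start = time.monotonic()
    for attempt in range(attempts):
        for endpoint in endpoints:
            try:
                connect(endpoint)  # hypothetical stand-in for the real client call
            except ConnectionError:
                continue  # this node is down or not primary; try the next one
            return endpoint, time.monotonic() - start
        if attempt < attempts - 1:
            sleep(delay_s)  # give the group time to promote a new primary
    raise ConnectionError("no RDQM node accepted the connection")


if __name__ == "__main__":
    def fake_connect(endpoint: str) -> None:
        # Simulate a drill in which only node2 is currently primary.
        if not endpoint.startswith("node2"):
            raise ConnectionError(endpoint)

    nodes = ["node1.example.com(1414)", "node2.example.com(1414)", "node3.example.com(1414)"]
    print(reconnect(nodes, fake_connect, sleep=lambda s: None))
```

With a floating address the list shrinks to a single entry and the loop simply retries it until the new primary is up.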

Capacity and Network

Replication traffic consumes network bandwidth between nodes. High persistent message rates need low-latency links between replicas, because HA replication is synchronous and adds to the cost of every persistent write. Each node's disk must hold the full queue manager data, unlike shared disk where a single copy exists. Plan storage growth identically on all replicas.
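A first-pass sizing estimate can be worked out with simple arithmetic. The sketch below assumes the primary sends one copy of each persistent write to each of two secondaries and applies a placeholder overhead factor of 1.2 for logging and protocol cost; both figures are assumptions for illustration, not IBM-published values, so replace them with measurements from your own tests.

```python
def replication_estimate(
    msgs_per_sec: float,
    avg_msg_bytes: float,
    qm_data_gib: float,
    secondaries: int = 2,
    overhead: float = 1.2,  # assumed placeholder, not an IBM figure
) -> dict:
    """Estimate primary outbound replication bandwidth and total disk footprint."""
    # Bytes per second the primary must ship to all secondaries combined.
    outbound_bytes = msgs_per_sec * avg_msg_bytes * overhead * secondaries
    return {
        "primary_outbound_mbit_s": outbound_bytes * 8 / 1_000_000,
        # Every node holds a full copy of the queue manager data.
        "total_disk_gib": qm_data_gib * (secondaries + 1),
    }


# Example: 2,000 persistent messages per second at 4 KiB each, 50 GiB of data.
estimate = replication_estimate(2000, 4096, 50)
print(round(estimate["primary_outbound_mbit_s"]), "Mbit/s outbound from the primary")
print(estimate["total_disk_gib"], "GiB across all three nodes")
```

Under these assumptions the example works out to roughly 157 Mbit/s of outbound replication traffic and 150 GiB of disk across the group, three times what a single shared copy would need.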

When Not to Use RDQM

  • z/OS primary deployment—use QSG and mainframe DR patterns.
  • Requirement for active/active hub on one name—consider clusters instead.
  • Two-node only site without witness—quorum risk.
  • Unsupported MQ version or Linux distribution per IBM compatibility list.

Tutorial: RDQM Decision Worksheet

  1. Confirm Linux platform and IBM support statement for your release.
  2. Count nodes available for quorum (recommend three).
  3. Estimate replication bandwidth at peak persistent rate.
  4. Compare cost and ops skill versus existing SAN multi-instance.
  5. Schedule failover drill with application reconnect verification.
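The worksheet can be captured as a small checklist so that open items are visible at a glance. This is a hypothetical sketch: the RdqmPlan fields and the wording of each item are invented for illustration, and step 4 (cost and skills) is left out because it is a judgment call rather than a yes/no check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RdqmPlan:
    platform: str
    release_supported: bool          # step 1: IBM support statement confirmed
    node_count: int                  # step 2: nodes available for quorum
    peak_replication_mbit_s: float   # step 3: estimated peak replication traffic
    link_capacity_mbit_s: float      # step 3: capacity of the inter-node link
    failover_drill_scheduled: bool   # step 5: drill with reconnect verification


def open_items(plan: RdqmPlan) -> List[str]:
    """Return the worksheet steps that still block adoption."""
    items = []
    if plan.platform.lower() != "linux":
        items.append("RDQM targets Linux; choose another HA pattern")
    if not plan.release_supported:
        items.append("confirm the IBM support statement for this release")
    if plan.node_count < 3:
        items.append("fewer than three nodes available for quorum")
    if plan.peak_replication_mbit_s > plan.link_capacity_mbit_s:
        items.append("peak replication traffic exceeds inter-node link capacity")
    if not plan.failover_drill_scheduled:
        items.append("schedule a failover drill with reconnect verification")
    return items


plan = RdqmPlan("Linux", True, 2, 160.0, 1000.0, True)
for item in open_items(plan):
    print("OPEN:", item)
```

An empty list means every mechanical check passed; the cost and skills comparison from step 4 still needs a human decision.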

Explainer: Three Notebooks Kept in Sync

RDQM is three clerks each keeping an identical ledger copy updated on every sale. If the head clerk leaves, the others compare books and agree who leads the shop. Majority rule prevents two clerks from both acting as head and recording conflicting versions of the same sale.

Explain Like I'm Five

Three kids each have the same list of chores. They copy any new chore to everyone’s list. If one kid is away, the others pick a leader from who is left—but only if enough kids are still there to agree.

Practice Exercises

Exercise 1

Explain why a two-node RDQM setup without a witness is risky during a network partition.

Exercise 2

Map RDQM failover steps to the generic failover tutorial timeline.

Exercise 3

List questions for the storage team when comparing RDQM to SAN multi-instance.

Frequently Asked Questions

Test Your Knowledge

1. RDQM avoids:

  • Shared-disk requirement of classic MIQM
  • All logging
  • Channels
  • TLS

2. Quorum helps prevent:

  • Split brain
  • Large messages
  • Topic wildcards
  • COBOL compile

3. RDQM is primarily for:

  • Linux
  • Only z/OS
  • Only IBM i
  • Only MQTT

4. Three-node RDQM often provides:

  • Quorum and witness capacity
  • Three active writers
  • No failover
  • No persistence
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation