Checkpoints

IBM MQ checkpoints on z/OS are the queue manager’s way of saying “page sets and logs agree up to this point; if we restart, begin replay here.” Without checkpoints, every warm start after an outage would slog through megabytes of log records whose changes are already on disk. With well-tuned checkpoints, operations teams shave minutes off recovery windows that executives measure in money. Checkpoints are not the same as application syncpoints: a CICS SYNCPOINT commits business data and messages, while a checkpoint is infrastructure bookkeeping for the messaging engine itself. Beginners hear both words in the same meeting and talk past each other. This tutorial covers what checkpoints are for, how they relate to the log RBA, how they interact with the BSDS and page sets, frequency tradeoffs, restart versus media recovery, and monitoring signals, so you can read performance reports and CSQ messages with confidence.

Checkpoint in the Recovery Timeline

  1. Applications put persistent messages; log records capture changes.
  2. Page set pages are updated with the message data (first in buffer pools, later on disk).
  3. The checkpoint process records the log RBA up to which page sets are consistent for recovery.
  4. BSDS records checkpoint RBA and related metadata.
  5. On restart, recovery reads checkpoint RBA and replays only newer log records.
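The five steps above can be modeled in a few lines of Python. This is a toy sketch, not MQ internals: the class and method names are hypothetical, one dictionary stands in for the buffer pool, another for the page set, and a single integer for the checkpoint RBA kept in the BSDS. One deliberate simplification is that `checkpoint()` flushes every changed page at once, which real buffer pool management does not necessarily do.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogRecord:
    rba: int    # byte offset of the record in the log
    page: str   # hypothetical page identifier
    value: str  # new contents written to that page


@dataclass
class ToyQueueManager:
    """Toy write-ahead log plus checkpoint model; not MQ internals."""
    log: list[LogRecord] = field(default_factory=list)         # survives crash
    buffer_pool: dict[str, str] = field(default_factory=dict)  # lost on crash
    page_set: dict[str, str] = field(default_factory=dict)     # survives crash
    bsds_checkpoint_rba: int = 0                               # survives crash
    next_rba: int = 0

    def put(self, page: str, value: str, size: int = 100) -> None:
        # Step 1: write the log record first (write-ahead rule).
        self.log.append(LogRecord(self.next_rba, page, value))
        self.next_rba += size
        # Step 2: update the page in memory; the page set copy lags behind.
        self.buffer_pool[page] = value

    def checkpoint(self) -> None:
        # Step 3: make the page set consistent with every record logged so far.
        self.page_set.update(self.buffer_pool)
        # Step 4: remember that log position in the "BSDS".
        self.bsds_checkpoint_rba = self.next_rba

    def crash_and_restart(self) -> int:
        # Memory is gone: rebuild from the page set, then replay newer records.
        self.buffer_pool = dict(self.page_set)
        replayed = 0
        for rec in self.log:
            # Step 5: skip records already reflected in the page set.
            if rec.rba >= self.bsds_checkpoint_rba:
                self.buffer_pool[rec.page] = rec.value
                replayed += 1
        self.page_set.update(self.buffer_pool)
        return replayed


qm = ToyQueueManager()
qm.put("P1", "a")
qm.put("P2", "b")
qm.checkpoint()
qm.put("P1", "c")
print(qm.crash_and_restart(), qm.page_set)  # prints: 1 {'P1': 'c', 'P2': 'b'}
```

Only one record is replayed because the first two were already written to the page set at the checkpoint; without any checkpoint, restart would have to replay the whole log.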

Checkpoint versus nearby concepts

  Concept     Layer              Purpose
  Syncpoint   Application / TM   Commit or roll back a business UOW
  Log force   MQ logging         Durability before page update
  Checkpoint  MQ recovery        Restart optimization
  Archive     MQ log management  Move old log to archive data set

Log RBA and Checkpoint Position

A relative byte address (RBA) marks a position in the queue manager’s log stream. The checkpoint RBA recorded in the BSDS means “page sets reflect all log changes up to here.” After a failure, recovery reads log records from the latest valid checkpoint RBA onward and applies only changes not yet reflected on the page sets; units of work that were still in flight when the queue manager stopped can require reading further back. Corrupt or missing checkpoint information forces longer replay or support intervention, which is another reason BSDS backups matter.
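The RBA arithmetic can be sketched as follows. RBAs are treated as plain byte offsets, printed in hex as z/OS tools usually display them. The function names are hypothetical, and the rule that in-flight units of work pull the restart point earlier is a simplification; the queue manager’s real calculation involves more inputs than shown here.

```python
from __future__ import annotations


def restart_rba(checkpoint_rba: int, inflight_start_rbas: list[int]) -> int:
    """Simplified restart point: the checkpoint RBA, unless a unit of work
    that was still in flight began earlier, in which case reading must
    start at that unit of work's first log record."""
    return min([checkpoint_rba, *inflight_start_rbas])


def bytes_to_read(log_end_rba: int, start_rba: int) -> int:
    """Log volume between the restart point and the end of the log."""
    if start_rba > log_end_rba:
        raise ValueError("start RBA lies beyond the end of the log")
    return log_end_rba - start_rba


checkpoint = 0x5000_0000
log_end = 0x7000_0000
start = restart_rba(checkpoint, [0x4800_0000])
mib = bytes_to_read(log_end, start) // 2**20
print(f"restart at {start:#x}, {mib} MiB to read")  # restart at 0x48000000, 640 MiB to read
```

Without the in-flight unit of work, the same log would need only 512 MiB of reading (0x70000000 minus 0x50000000).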

Frequency and Tuning Tradeoffs

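More frequent checkpoints reduce restart time but add log I/O and CPU during normal processing, much like checkpoint tuning in a database. Infrequent checkpoints lengthen outages after crashes. On z/OS, systems programmers control checkpoint frequency mainly through the LOGLOAD system parameter (the number of log records written between checkpoints, set in CSQ6SYSP and changeable with SET SYSTEM), following IBM guidance for the workload: a queue manager with high persistent message rates needs a different policy from a low-volume administrative one. Measure restart time in test drills rather than relying on theory alone.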

  • High checkpoint rate—faster restart, higher running cost.
  • Low checkpoint rate—slower restart, lower running overhead.
  • Peak batch windows—watch combined log and checkpoint load.
  • DR tests—validate checkpoint RBA after controlled shutdown.
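A rough model makes the tradeoff concrete. The interval is counted in log records, which mirrors how the z/OS LOGLOAD parameter measures the distance between checkpoints, but the write and replay rates below are made-up inputs that you would replace with measurements from your own restart drills. Real replay speed also depends on page set I/O and in-flight work, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    log_records_per_sec: float     # assumed steady log write rate
    replay_records_per_sec: float  # assumed restart replay rate (measure it)


def checkpoints_per_hour(w: Workload, interval_records: int) -> float:
    """How often a checkpoint is triggered if one is taken every N records."""
    return w.log_records_per_sec * 3600 / interval_records


def worst_case_replay_secs(w: Workload, interval_records: int) -> float:
    """A crash just before the next checkpoint leaves a full interval to replay."""
    return interval_records / w.replay_records_per_sec


def largest_interval_for_target(w: Workload, target_replay_secs: float) -> int:
    """Biggest interval whose worst-case replay still fits the target."""
    return int(target_replay_secs * w.replay_records_per_sec)


w = Workload(log_records_per_sec=5_000, replay_records_per_sec=50_000)
for interval in (500_000, 5_000_000):
    print(interval, checkpoints_per_hour(w, interval), worst_case_replay_secs(w, interval))
```

With these assumed rates, an interval of 500,000 records means 36 checkpoints an hour and at most about 10 seconds of replay; an interval ten times larger cuts checkpoints to 3.6 an hour but stretches worst-case replay to about 100 seconds.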

Restart Recovery Walkthrough

After an unplanned shutdown, the queue manager starts, reads the BSDS for the active log inventory and the checkpoint RBA, validates page sets, and replays log records forward. Applications may see brief unavailability. Indoubt units of work are resolved according to the transaction coordinator’s decision. Channels restart. Operators compare recovery duration with the SLA; if replay took thirty minutes, checkpoint tuning deserves a project of its own.

```text
/* Conceptual warm restart
1. Start queue manager
2. Open BSDS: find latest checkpoint RBA
3. Apply log records from checkpoint RBA forward to page sets
4. Open queues for applications
Document actual CSQ messages and times in runbooks */
```
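The conceptual steps can be expanded into a small redo/undo sketch. It assumes a simplified log with only `update` and `commit` records (hypothetical types, not MQ’s log record formats): forward recovery reapplies updates from the checkpoint RBA onward, then backward recovery undoes every update belonging to a unit of work with no commit record. Real MQ keeps indoubt units of work pending until the coordinator decides; this toy treats every uncommitted unit of work as in flight and backs it out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    rba: int
    uow: str                         # unit of work identifier
    kind: str                        # "update" or "commit" (simplified)
    page: str = ""
    value: str = ""
    old_value: Optional[str] = None  # None means the page did not exist


def warm_restart(page_set: dict[str, str], log: list[Rec],
                 checkpoint_rba: int) -> dict[str, str]:
    pages = dict(page_set)
    committed = {r.uow for r in log if r.kind == "commit"}

    # Forward recovery: redo every update logged at or after the checkpoint.
    for r in log:
        if r.kind == "update" and r.rba >= checkpoint_rba:
            pages[r.page] = r.value

    # Backward recovery: undo uncommitted updates, newest first. This pass may
    # read records older than the checkpoint, because a checkpoint can flush
    # pages holding changes from units of work that never committed.
    for r in reversed(log):
        if r.kind == "update" and r.uow not in committed:
            if r.old_value is None:
                pages.pop(r.page, None)
            else:
                pages[r.page] = r.old_value
    return pages


log = [
    Rec(0, "A", "update", "Q1", "m1"),
    Rec(100, "A", "commit"),
    Rec(200, "B", "update", "Q2", "tmp"),   # flushed by the checkpoint at 300
    Rec(300, "C", "update", "Q1", "m2", "m1"),
    Rec(400, "C", "commit"),
    Rec(500, "B", "update", "Q3", "x"),     # B never commits
]
print(warm_restart({"Q1": "m1", "Q2": "tmp"}, log, checkpoint_rba=300))  # {'Q1': 'm2'}
```

Unit of work B wrote one page before the checkpoint and one after; both changes are undone, which is why backward recovery can need log records older than the checkpoint.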

Media Recovery and Checkpoints

Restoring page sets from yesterday’s backup requires replaying every archive and active log record written since that backup, not merely the records after the last checkpoint before the crash. Checkpoints speed routine restarts; backups and archive logs recover from media loss and data center disasters. DR exercises must practice both paths.
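Choosing which log data sets a recovery needs is a range-coverage problem. In this sketch the data set names and RBA ranges are hypothetical, ranges are inclusive, and the function raises an error if any RBA between the starting point and the end of the log is not held by some data set. Calling it with the backup RBA shows media recovery reaching back into archives, while calling it with the checkpoint RBA shows a routine restart touching only the active log.

```python
from dataclasses import dataclass


@dataclass
class LogDataSet:
    name: str       # hypothetical data set name
    kind: str       # "archive" or "active"
    start_rba: int  # first RBA held (inclusive)
    end_rba: int    # last RBA held (inclusive)


def data_sets_for_replay(data_sets, from_rba, to_rba):
    """Return names of the data sets covering from_rba..to_rba, in RBA order."""
    needed = sorted(
        (d for d in data_sets if d.end_rba >= from_rba and d.start_rba <= to_rba),
        key=lambda d: d.start_rba,
    )
    next_needed = from_rba
    for d in needed:
        if d.start_rba > next_needed:  # gap: nobody holds next_needed
            raise LookupError(f"no log data set holds RBA {next_needed:#x}")
        next_needed = max(next_needed, d.end_rba + 1)
    if next_needed <= to_rba:          # ran out of data sets before the end
        raise LookupError(f"no log data set holds RBA {next_needed:#x}")
    return [d.name for d in needed]


logs = [
    LogDataSet("ARCHIVE1", "archive", 0x0000, 0x0FFF),
    LogDataSet("ARCHIVE2", "archive", 0x1000, 0x1FFF),
    LogDataSet("ARCHIVE3", "archive", 0x2000, 0x2FFF),
    LogDataSet("ACTIVE1", "active", 0x3000, 0x3FFF),
]
backup_rba, checkpoint_rba, log_end_rba = 0x1800, 0x3200, 0x3500
print("media recovery:", data_sets_for_replay(logs, backup_rba, log_end_rba))
print("restart:", data_sets_for_replay(logs, checkpoint_rba, log_end_rba))
```

A missing archive in the middle of the range surfaces as a `LookupError`, which is the kind of gap a DR exercise should catch before a real disaster does.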

Monitoring

DISPLAY LOG output and related queue manager status displays show log and recovery state. A sudden growth in log bytes written since the last checkpoint hints at infrequent checkpoints or heavy persistent load. Measure recovery duration after planned restarts (for example, in quarterly drills) and alert when it trends upward.
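Alerting on these signals can start very simply. The sketch below assumes you already collect log bytes written since the last checkpoint and planned-restart durations from whatever source your site uses (command output or SMF statistics, for example); the parsing is not shown, and the function names, the growth factor of 2 and the SLA value are hypothetical choices.

```python
from __future__ import annotations

import statistics


def sudden_growth(history: list[int], factor: float = 2.0) -> bool:
    """True if the latest 'log bytes since last checkpoint' sample is more
    than `factor` times the median of the earlier samples."""
    if len(history) < 2:
        return False
    baseline = statistics.median(history[:-1])
    return baseline > 0 and history[-1] > factor * baseline


def restarts_over_sla(durations_secs: list[float], sla_secs: float) -> list[float]:
    """Planned-restart durations that exceeded the recovery SLA."""
    return [d for d in durations_secs if d > sla_secs]


print(sudden_growth([100, 120, 110, 400]))  # True: latest sample is far above the median
print(restarts_over_sla([120.0, 95.0, 400.0], sla_secs=300.0))  # [400.0]
```

Using the median of earlier samples as the baseline keeps one unusually busy interval from masking the next spike.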

Explainer: Saving the Video Game

Checkpoints are save points in a long game. If the console crashes, you reload from the last save instead of replaying the entire game from level one.

Explain Like I'm Five

MQ takes a picture of its toy boxes and diary page number so if it trips, it knows exactly where to continue reading the diary.

Practice Exercises

Exercise 1

Explain checkpoint versus CICS SYNCPOINT to a new developer in one paragraph each.

Exercise 2

Estimate restart impact if checkpoint RBA is eight hours behind during a 500 GB log day.

Exercise 3

Add checkpoint and BSDS steps to an existing queue manager restart runbook outline.

Frequently Asked Questions

Test Your Knowledge

1. Checkpoints speed:

  • Restart recovery
  • TLS handshake
  • Topic routing
  • JCL compile

2. Checkpoint RBA is stored in:

  • BSDS
  • CCDT only
  • DNS
  • JES spool

3. More frequent checkpoints:

  • More log activity while running
  • No logs ever
  • Delete page sets
  • Disable archive

4. Checkpoints differ from syncpoints because:

  • QM recovery vs application UOW
  • They are identical
  • Only for channels
  • Only Linux

Author: MainframeMaster
Verified: IBM MQ 9.3 documentation