CICS Recovery and Restart

When a CICS region fails or is shut down, recovery and restart are the processes that bring it back to a consistent, running state. CICS supports several restart types: warm restart (using existing system data and logs), cold restart (rebuilding from scratch), and emergency restart (minimal recovery when others are not possible). The recovery manager uses system logs to determine what was in progress at failure and to perform backout or forward recovery. This page explains what recovery and restart mean, how each restart type works, when to use it, and how logging supports recovery.

Explain Like I'm Five: What Is Recovery and Restart?

Imagine the power goes out while you are building a tower of blocks. When the power comes back, you have two choices: you can try to put the tower back the way it was using a photo you took (like warm restart), or you can clear the table and start over (like cold restart). CICS does something similar. When the system stops, it has already written down what it was doing in a log. When you restart, it can read that log and either undo things that did not finish (backout) or redo things that were done (forward recovery) so that data is consistent. Warm restart uses that saved information; cold restart ignores it and starts fresh; emergency restart is like getting the room working again as fast as possible with minimal checking.

What Are Recovery and Restart?

Recovery is the set of actions that restore data and resource state to a consistent point after a failure. In CICS, recovery involves the recovery manager, the system log, forward recovery logs, and (for in-flight transactions) dynamic transaction backout. Restart is the process of bringing the CICS region back up so it can accept work again. Restart uses recovery: during restart, CICS applies log data to restore state, back out incomplete units of work, and ensure that resources (files, queues, etc.) are consistent. The choice of restart type (warm, cold, or emergency) determines how much of the saved state is used and how much is rebuilt from configuration.

Restart Types: Warm, Cold, Emergency

CICS offers three main restart types. Each uses a different mix of saved state and reinitialization.

CICS restart types
Type       What it does                  When to use                      Relative speed
Warm       Uses existing logs and state  Preferred when data is valid     Fastest
Cold       Rebuilds from scratch         Warm fails or data is corrupted  Slower
Emergency  Minimal recovery              Normal restart not possible      Fast but limited

Warm Restart

Warm restart is the preferred method when the system data and logs are valid. CICS reads the existing system data (e.g. from the global and local catalogs) and the system log to restore the region to its state before the failure. It recovers transaction and program state, restores resource allocations, and validates integrity. The benefits are faster recovery, preserved system state, and minimal data loss. The requirements are valid system data sets, intact logs, and available resources. If warm restart succeeds, the region comes back with its previous configuration and recovered state. IBM generally recommends warm restart as the standard procedure after a CICS region failure; cold restart is needed only in rare cases such as global catalog or system log corruption.

Cold Restart

Cold restart initializes the region from scratch. It does not rely on the previous run's state; instead it rebuilds system structures and reloads resource definitions from the CSD (the CICS system definition data set) or other configuration. Cold restart is used when warm restart fails (e.g. because of corrupted or missing data) or when you intentionally want a clean state. It takes longer than warm restart because everything is reloaded. After a cold restart, work that was in flight before the failure is not recovered from the previous run's log in the same way as in a warm restart; you rely on forward recovery or application-level recovery for durable resources (e.g. databases, queues) if needed.

Emergency Restart

Emergency restart is a minimal recovery path. It aims to get CICS up and running with basic functionality in the shortest time. It is used when normal warm or cold restart cannot be performed (e.g. critical data sets unavailable or repeated failures). During emergency restart, CICS may restore processes to a pre-failure state where possible and may rerun activities that were active at failure; the exact behavior depends on the release and options. Emergency restart trades off full state recovery for speed and availability. Use it as a last resort when other options are not viable.

Recovery Manager and System Log

The CICS recovery manager coordinates recovery during restart. It uses the system log (and related logs) to determine which units of work were in progress at the time of failure and what updates they had made. For failed or in-doubt transactions, it drives backout: undoing changes so that resources are left as if the transaction had not run. For resources that support forward recovery (e.g. some file types), forward recovery logs record the changes so that they can be reapplied to a restored backup copy, typically by a separate recovery utility rather than by the restart itself. The system log is therefore critical; if it is lost or corrupted, warm restart may not be possible and cold or emergency restart may be the only options. Protecting and backing up the system log (and related data sets) is part of recovery planning.
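To make the log scan concrete, here is a minimal sketch in Python of how in-progress units of work can be identified from a log. It is a toy model, not CICS's log format or algorithm: the LogRecord type, the record kinds ("update", "commit", "backout"), and the identifiers U1, U2, FILEA are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One entry in a toy log (hypothetical layout, not the CICS format)."""
    uow: str            # unit-of-work identifier
    kind: str           # "update", "commit", or "backout"
    resource: str = ""  # resource touched by an update


def in_flight_uows(log):
    """Return ids of units of work that updated something but never ended.

    A unit of work counts as ended once a "commit" or "backout" record
    for it appears. Ids are returned in the order they first appeared.
    """
    open_uows = {}  # dict used as an insertion-ordered set
    for rec in log:
        if rec.kind == "update":
            open_uows.setdefault(rec.uow, None)
        elif rec.kind in ("commit", "backout"):
            open_uows.pop(rec.uow, None)
    return list(open_uows)


sample_log = [
    LogRecord("U1", "update", "FILEA"),
    LogRecord("U2", "update", "FILEB"),
    LogRecord("U1", "commit"),
    LogRecord("U3", "update", "FILEA"),
]

print(in_flight_uows(sample_log))  # ['U2', 'U3']
```

In this model U1 committed before the failure, so only U2 and U3 would be candidates for backout at restart.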

Relationship to Dynamic Transaction Backout

Dynamic transaction backout is the mechanism that undoes a single failed transaction's updates (e.g. file updates, queue changes) so the system stays consistent. Backout can happen during normal running (when a task abends) or during restart (when the recovery manager processes the log and backs out in-flight or in-doubt work). So recovery and restart use the same backout logic; at restart, the recovery manager identifies transactions that did not complete and triggers backout for them. Applications that use syncpoint and resource recovery (e.g. file control, DB2) participate in this so that after restart, durable data reflects only completed units of work.
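The idea of undoing one transaction's updates can be sketched as restoring saved "before" values in reverse order. The Python below is an illustrative model only: the tuple layout (unit of work, key, before-image), the account keys, and the back_out function are assumptions made for this example and do not describe how CICS stores or applies its log records.

```python
def back_out(data, log, uow):
    """Undo one unit of work by restoring before-images, newest first.

    `log` is a list of (uow, key, before) tuples in the order the updates
    were made. A before-image of None means the key did not exist before
    the update, so backing out removes it.
    """
    for rec_uow, key, before in reversed(log):
        if rec_uow != uow:
            continue  # leave other units of work untouched
        if before is None:
            data.pop(key, None)
        else:
            data[key] = before
    return data


# Toy log: U1 updated acct-1 twice and created acct-3; U2 updated acct-2.
log = [
    ("U1", "acct-1", 100),
    ("U2", "acct-2", 50),
    ("U1", "acct-1", 80),
    ("U1", "acct-3", None),
]
data = {"acct-1": 60, "acct-2": 70, "acct-3": 5}

back_out(data, log, "U1")
print(data)  # {'acct-1': 100, 'acct-2': 70}
```

Processing the records newest first matters when the same key was updated more than once: the oldest before-image is applied last, so the key ends at the value it had before the unit of work started.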

Step-by-Step: Performing a Warm Restart

  1. Ensure the CICS region has been stopped (abnormally or normally) and that the system log and required data sets are available and intact.
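  2. Start CICS with the option or procedure that requests warm restart. In CICS Transaction Server the restart type is normally controlled by the START system initialization parameter; with START=AUTO, CICS decides how to restart based on how the previous run ended. How you supply the parameter depends on how CICS is started (e.g. MVS START command, CICS startup JCL, or CICSPlex SM).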
  3. CICS reads system data and the system log, restores state, and performs backout or forward recovery as needed. Monitor the startup messages and any recovery-related output.
  4. Verify that the region comes up (e.g. CICS sign-on, transaction availability). Check logs and any recovery reports for errors. If warm restart fails, consider cold or emergency restart per your procedures.

Step-by-Step: When to Choose Cold Restart

  1. Attempt warm restart first per your site's standards. If it fails repeatedly or reports catalog/log corruption, cold restart may be required.
  2. Confirm that the cause of the failure (e.g. corrupted system log or catalog) is understood and that cold restart will not make it worse. Ensure CSD or configuration is available so CICS can reload resources.
  3. Start CICS with the cold restart option. CICS will initialize from scratch and not use the previous run's state. After cold restart, validate critical resources and applications.
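The decision steps above can be summarized as a small helper. This Python sketch only encodes the guidance given on this page (warm first, cold when warm is not possible or a clean state is wanted, emergency as a last resort); the function name, its flags, and the Restart enum are hypothetical and do not correspond to CICS parameters or to how CICS itself selects a start type.

```python
from enum import Enum


class Restart(Enum):
    WARM = "warm"
    COLD = "cold"
    EMERGENCY = "emergency"


def choose_restart(log_intact, catalog_intact, config_available,
                   want_clean_state=False):
    """Pick a restart type following this page's guidance (illustrative)."""
    # A deliberate clean start needs the configuration to reload from.
    if want_clean_state and config_available:
        return Restart.COLD
    # Warm restart needs both the system log and the catalog to be valid.
    if log_intact and catalog_intact:
        return Restart.WARM
    # Otherwise fall back to cold if resources can be reloaded.
    if config_available:
        return Restart.COLD
    # Last resort when neither warm nor cold is viable.
    return Restart.EMERGENCY


print(choose_restart(True, True, True))     # Restart.WARM
print(choose_restart(False, True, True))    # Restart.COLD
print(choose_restart(False, False, False))  # Restart.EMERGENCY
```

A site runbook would add its own checks (for example, how many warm attempts to make before escalating); the point here is only the ordering of the options.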

Best Practices

  • Prefer warm restart after a failure; use cold restart only when warm is not possible or when you need a clean state by design.
  • Protect and back up system log and recovery-related data sets so that warm restart remains an option.
  • Document restart procedures (warm, cold, emergency) and when each is appropriate; include recovery testing in your change and DR plans.
  • Monitor restart duration and recovery messages to spot recurring issues (e.g. log full, resource unavailable) that could prevent warm restart.

Test Your Knowledge

1. Which CICS restart type uses existing system data and logs?

  • Cold
  • Emergency
  • Warm
  • None

2. When would you use a cold restart?

  • Always
  • When warm restart fails or system data is corrupted
  • Only for planned shutdowns
  • Never

3. What role does the system log play in CICS recovery?

  • None
  • It records transaction and resource updates used during restart
  • It is only for auditing
  • It replaces backout