CICS Recovery

CICS recovery is the set of mechanisms that restore the system and its resources to a consistent state after a failure. When a transaction abends, dynamic transaction backout undoes that task's uncommitted updates. When a region fails and restarts, the recovery manager uses the system log to back out in-flight work. Where a resource supports it, a separate forward recovery log lets logged changes be reapplied after the resource is restored from a backup. Syncpoint (commit) and rollback define the unit of work that recovery respects. This page explains how backout, forward recovery, and the system log work and how they fit together.

Explain Like I'm Five: What Is Recovery?

When something goes wrong—a program crashes or the power fails—the computer needs to fix things so that data is not half-updated. Recovery is that fixing. For one broken task, CICS "rewinds" everything that task did (backout). For the whole system after a crash, CICS reads its diary (the log), sees what was not finished, and rewinds those things too. For some files, it can also "replay" the diary to bring the file back to a good state (forward recovery). So recovery is either undoing bad or incomplete work, or redoing work from the log so that when the system is back up, everything is consistent.

What Is CICS Recovery?

CICS recovery ensures that after a task failure or region failure, no recoverable resource is left in an inconsistent state. A unit of work is the set of changes between syncpoints. If a task ends without committing (e.g. it abends), CICS backs out that unit of work: all file updates, queue changes, and other recoverable updates made by that task since its last syncpoint are reversed. If the region fails, at restart the recovery manager uses the system log to identify units of work that were in progress and backs them out. For resources that support forward recovery, a forward recovery log can be used to reapply changes after the resource is restored from a backup. Recovery therefore depends on logging (journal records, system log) and on well-defined unit-of-work boundaries (syncpoint).

Recovery Mechanisms

Main recovery mechanisms
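| Mechanism        | What it does                          | When it is used                           |
|------------------|---------------------------------------|-------------------------------------------|
| Backout          | Undo a failed task's updates          | Task abends or backs out                  |
| Forward recovery | Reapply logged changes to a resource  | Resource restored from backup             |
| Restart recovery | Use the system log at region restart  | Warm or emergency restart (not a cold start) |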

Dynamic Transaction Backout

When a task abends or explicitly requests backout (SYNCPOINT ROLLBACK), CICS automatically backs out that task's uncommitted updates. It uses journal and log information to reverse each change: updates to recoverable files are undone, updates to recoverable temporary storage or other queues are rolled back, and so on. Other tasks are not affected; only the failed task's unit of work is backed out. A program can intercept an abend with a HANDLE ABEND routine, but if the task still ends abnormally the backout proceeds in the same way. Backout is transparent to the application for standard CICS resources; the application only has to be designed so that a unit of work is logically complete at syncpoint and so that non-recoverable side effects (e.g. sending a message) are acceptable if the task later backs out.
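
The idea of backout can be illustrated with a small toy model. The sketch below is not CICS code and uses no CICS API; the names `ToyFile`, `ToyTask`, and `run` are hypothetical. It assumes the simplest possible scheme: every update logs a before-image, and a failure replays those images newest first. It also shows why a non-recoverable side effect (a sent message) survives the backout.

```python
class ToyFile:
    """Stand-in for a recoverable resource (hypothetical, not a CICS API)."""

    def __init__(self, records):
        self.records = dict(records)


class ToyTask:
    """Logs a before-image for every update so the task can be backed out."""

    def __init__(self, name):
        self.name = name
        self.undo_log = []       # (file, key, before_image), in update order
        self.sent_messages = []  # non-recoverable side effects

    def update(self, file, key, value):
        # Log the before-image first, then change the record.
        # None marks a record that did not exist before the update.
        self.undo_log.append((file, key, file.records.get(key)))
        file.records[key] = value

    def send(self, text):
        # A message that has already left the system cannot be rewound.
        self.sent_messages.append(text)

    def backout(self):
        # Reverse the updates, newest first.
        while self.undo_log:
            file, key, before = self.undo_log.pop()
            if before is None:
                file.records.pop(key, None)
            else:
                file.records[key] = before


def run(task, work):
    """Run the task's work; back it out automatically if it fails."""
    try:
        work(task)
        return "committed"
    except Exception:
        task.backout()
        return "backed out"


accounts = ToyFile({"A": 100, "B": 50})


def transfer(task):
    task.update(accounts, "A", 70)
    task.send("transfer started")
    task.update(accounts, "C", 30)
    raise RuntimeError("simulated abend")


task = ToyTask("T1")
outcome = run(task, transfer)
print(outcome, accounts.records, task.sent_messages)
# backed out {'A': 100, 'B': 50} ['transfer started']
```

The file returns to its original contents, but the message stays sent, which is the design concern raised above.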

Forward Recovery

Forward recovery applies to a resource (e.g. a file) that has been restored from a backup. The backup is from an earlier point in time; the forward recovery log contains the changes made after that point. By reapplying those changes to the restored resource in the order they were logged, you bring it to a consistent state (e.g. as of the time of the failure). Not all CICS resource types support forward recovery; it is typically used for VSAM files that are defined with a forward recovery log. The exact procedure (e.g. restoring the data set with IDCAMS REPRO and then applying the log with a forward recovery utility such as CICS VSAM Recovery) depends on the resource and the product.
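
As a rough illustration of the replay step, the following toy sketch reapplies logged after-images to a restored copy. The record layout (`"PUT"`/`"DELETE"` tuples) and the function name `forward_recover` are assumptions made for this example and do not reflect any real CICS log format or utility.

```python
import copy


def forward_recover(backup, forward_log):
    """Reapply logged after-images, in log order, to a restored copy."""
    restored = copy.deepcopy(backup)
    for op, key, after_image in forward_log:
        if op == "PUT":
            restored[key] = after_image
        elif op == "DELETE":
            restored.pop(key, None)
        else:
            raise ValueError(f"unknown log record type: {op}")
    return restored


# State of the file when the backup was taken.
backup = {"A": 100, "B": 50}

# Changes committed after the backup, in the order they were logged.
forward_log = [
    ("PUT", "A", 70),
    ("PUT", "C", 30),
    ("DELETE", "B", None),
    ("PUT", "C", 45),
]

recovered = forward_recover(backup, forward_log)
print(recovered)  # {'A': 70, 'C': 45}
```

Order matters: replaying the same records in a different sequence could leave `C` at 30 instead of 45, which is why the log has to be applied from the backup point forward.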

System Log and Restart

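The system log records transaction and resource activity so that at restart the recovery manager can determine what was in progress and what had been committed. During an emergency restart (the restart that follows an abnormal termination), CICS reads the log and backs out any unit of work that had not completed. A warm start follows a normal shutdown, so there is normally no in-flight work to back out; a cold start does not perform this recovery at all. Thus, after an emergency restart, only committed work is reflected in recoverable resources. The system log is critical: if it is lost or corrupted, a warm or emergency restart may be impossible and the region may have to be cold started, which leaves in-flight updates unrecovered. Protecting the system log is part of recovery planning.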
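
The restart logic can be sketched as a scan over a toy log. This is a simplified model under stated assumptions: the log holds only hypothetical `UPDATE` records (with a before-image) and `COMMIT` records, and no two units of work touch the same key at the same time (in a real system, locking enforces this). None of the names below are CICS APIs.

```python
def emergency_restart(data, log):
    """Back out every unit of work that has no COMMIT record in the log."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    in_flight = set()

    # Walk the log backwards so the newest update is undone first.
    for rec in reversed(log):
        if rec[0] != "UPDATE":
            continue
        _, uow, key, before_image = rec
        if uow in committed:
            continue
        in_flight.add(uow)
        if before_image is None:      # None: the record did not exist before
            data.pop(key, None)
        else:
            data[key] = before_image

    return sorted(in_flight)


# Log as it stood when the region failed.
log = [
    ("UPDATE", "U1", "A", 100),   # U1 changed A from 100 to 70
    ("UPDATE", "U2", "B", 50),    # U2 changed B from 50 to 80
    ("COMMIT", "U1"),
    ("UPDATE", "U2", "C", None),  # U2 created C, then the region failed
]

# Contents of the recoverable resource at the moment of failure.
data = {"A": 70, "B": 80, "C": 30}

backed_out = emergency_restart(data, log)
print(backed_out, data)  # ['U2'] {'A': 70, 'B': 50}
```

The committed change from `U1` is kept, while both updates from the unfinished `U2` are reversed. Without the log there would be no way to tell the two apart, which is why losing it rules out this kind of restart.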

Syncpoint and Unit of Work

A unit of work is the set of changes between two syncpoints (or between start and the first syncpoint). EXEC CICS SYNCPOINT commits the unit of work; EXEC CICS SYNCPOINT ROLLBACK (or equivalent) backs it out. Recovery respects these boundaries: backout reverses one task's unit of work; at restart, in-flight units of work are backed out. Applications should issue SYNCPOINT only when a logical set of changes is complete (e.g. after all related file and queue updates). Too many syncpoints can hurt performance; too few can increase the amount of work lost or backed out on failure.
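
To make the effect of syncpoint placement concrete, here is a toy model of unit-of-work boundaries. The class `UnitOfWork` and its methods are hypothetical names for this sketch; they mimic the roles of SYNCPOINT and SYNCPOINT ROLLBACK but are not the CICS commands themselves.

```python
class UnitOfWork:
    """Tracks changes made since the last syncpoint (toy model)."""

    def __init__(self, store):
        self.store = store
        self.before_images = {}  # key -> value at the start of this unit of work

    def write(self, key, value):
        # Keep only the first before-image per key within a unit of work.
        # None marks a key that did not exist when the unit of work began.
        if key not in self.before_images:
            self.before_images[key] = self.store.get(key)
        self.store[key] = value

    def syncpoint(self):
        # Commit: earlier changes become permanent and a new unit of work begins.
        self.before_images.clear()

    def rollback(self):
        # Undo only the changes made since the last syncpoint.
        for key, before in self.before_images.items():
            if before is None:
                self.store.pop(key, None)
            else:
                self.store[key] = before
        self.before_images.clear()


store = {"order": "new", "stock": 10}
uow = UnitOfWork(store)

uow.write("order", "picked")
uow.write("stock", 9)
uow.syncpoint()                  # first logical step is complete

uow.write("order", "shipped")
uow.write("invoice", "INV-1")
uow.rollback()                   # second step is cancelled

print(store)  # {'order': 'picked', 'stock': 9}
```

Only the work after the syncpoint is lost. Had the syncpoint been taken between the `order` and `stock` writes, a failure at that moment would have left the order picked with the stock count unchanged, which is the kind of partial update that careful syncpoint placement avoids.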

Step-by-Step: What Happens When a Task Abends

  1. The task encounters an error or an abend. Control may pass to a HANDLE ABEND routine, or the task may terminate abnormally.
  2. CICS marks the task for backout. No further updates are allowed for that task's unit of work.
  3. The recovery manager uses journal/log data to reverse each update (file, queue, etc.) that the task made since the last syncpoint.
  4. When backout is complete, the task's resources are released and the task ends. Other tasks are unaffected.

Best Practices

  • Design so that each unit of work is logically complete at syncpoint; avoid partial updates across syncpoint.
  • Protect the system log and forward recovery logs; ensure they are backed up and available for restart.
  • Use SYNCPOINT ROLLBACK only when the application decides to cancel the unit of work; do not rely on it for normal flow.
  • Test recovery: simulate abends and region failures to verify backout and restart behavior.

Test Your Knowledge

1. When a CICS task abends, what happens to its resource updates?

  • They are kept
  • They are automatically backed out
  • They are logged only
  • They are ignored

2. Forward recovery is used to:

  • Back out a transaction
  • Reapply logged changes to a resource after restore
  • Start CICS
  • Define resources

3. The CICS system log is used during:

  • Compile only
  • Backout and restart recovery
  • Display only
  • Security