Log Utilization

Log utilization tells operators how close an IBM MQ queue manager is to a logging crisis. Unlike queue depth, which measures application backlog, log utilization measures pressure on the circular recovery log and on the archive infrastructure that keeps that log reusable. When utilization stays high, persistent puts slow down or stop, channels may stall on committed messages, and disaster recovery windows shrink because archives are not moving to safe storage fast enough. Beginners often watch queue depth dashboards while the real fire is a log directory whose disk is ninety-nine percent full during a Friday night batch.

This tutorial explains what log utilization means on distributed and z/OS MQ; which attributes and commands expose it; how to build metrics and alerts in Prometheus and Grafana; how primary log wrap, archive lag, and media recovery relate; which tuning responses do not sacrifice DR policy; and which troubleshooting playbooks apply when utilization spikes without an obvious traffic increase.

Circular Log vs Archive Space

The primary log is a circular set of files (LOGFIL files under LOGPATH on distributed platforms). As transactions commit, MQ writes log records. When a log file fills, it may become eligible for archiving; the archiver copies it to the archive location and marks the primary space reusable. Log utilization in the operational sense therefore has two layers: how much of the active circular log is consumed before wrap or archive free-up, and how much free space remains on the filesystem holding archives. Either layer can hurt you on its own: a full circular log behind a slow archiver is a different problem from a circular log with plenty of room whose archive disk is full, so archiving cannot complete.

Utilization signals to monitor
Signal                    | Typical source                    | Risk if ignored
Primary log percent used  | DISPLAY QMSTATUS, exporter metric | Wrap stall; put inhibition
Archive filesystem free % | OS disk monitor on LOGARCH path   | Archive failure; QM stop
Archive lag time          | Oldest unarchived log vs now      | Extended recovery window; full log
Log write latency         | Host iowait, storage metrics      | Slow puts; apparent high CPU
Persistent put rate       | Statistics, queue metrics         | Predictable utilization spikes
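
The archive lag signal in the table can be approximated from file timestamps. The following is a minimal Python sketch, assuming a hypothetical directory in which log extents wait until they have been archived and assuming extent file names match S*.LOG. The directory layout and the helper name oldest_file_age_seconds are illustrative assumptions, not part of MQ.

```python
import time
from pathlib import Path


def oldest_file_age_seconds(directory, pattern="S*.LOG", now=None):
    """Return the age in seconds of the oldest matching file, or None if none match.

    Assumption: files matching `pattern` in `directory` are extents that have
    not been archived yet. Adjust both to match how your site stages archives.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(directory).glob(pattern) if p.is_file()]
    if not mtimes:
        return None  # nothing waiting, so there is no lag to report
    # Clamp at zero in case of clock skew between writer and monitor.
    return max(0.0, now - min(mtimes))
```

A monitor could export this value as a gauge and alert when it exceeds the recovery window your DR policy allows.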

Reading Log Status with MQSC

```shell
DISPLAY QMSTATUS ALL
* Review log-related fields for your MQ version, for example:
* LOG path, log extent in use, archive path status

DISPLAY QMGR
* Confirm LOGPATH, LOGARCHMETH, LOGARCHPATH, LOGFIL, LOGFILE size attributes

df -h /var/mqm/log /var/mqm/archive
# OS-level check on distributed Linux; paths must match your site
```

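Exact attribute names vary by MQ version and platform; always compare DISPLAY output to your version's documentation before automating parsers. On recent distributed releases, DISPLAY QMSTATUS reports the LOGINUSE and LOGUTIL percentages, while log sizing is set in the Log stanza of qm.ini (LogPrimaryFiles, LogSecondaryFiles, LogFilePages). On z/OS, log data sets and archive volumes use different commands, so pair them with RMF and storage management reports for utilization. The beginner habit to build: every time CURDEPTH spikes on persistent queues, glance at log utilization on the same timeline.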
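
Because MQSC prints attributes as NAME(value) pairs, a parser can stay small. The sketch below is a hedged Python example: the SAMPLE text is hand-written for illustration rather than captured from a real queue manager, and it assumes your version reports LOGINUSE and LOGUTIL in DISPLAY QMSTATUS. The function name parse_mqsc_attributes is hypothetical.

```python
import re

# Matches NAME(value) pairs as printed by MQSC DISPLAY commands.
ATTR_RE = re.compile(r"([A-Z][A-Z0-9]*)\(([^)]*)\)")


def parse_mqsc_attributes(text):
    """Return a dict of attribute name to string value found in MQSC output."""
    return {name: value.strip() for name, value in ATTR_RE.findall(text)}


# Illustrative, hand-written sample; real output differs by version and platform.
SAMPLE = """
   QMNAME(QM1)                             STATUS(RUNNING)
   LOGINUSE(42)                            LOGUTIL(57)
   LOGPATH(/var/mqm/log/QM1/active/)
"""

attrs = parse_mqsc_attributes(SAMPLE)
# Use .get() in real code: the attribute may be absent on your version.
log_util_pct = int(attrs.get("LOGUTIL", "0"))
```

This simple pattern does not handle values that themselves contain parentheses or that wrap across lines, so check it against real output from your own queue manager before relying on it.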

Metrics and Dashboards

  1. Gauge: primary log utilization percent per queue manager.
  2. Gauge: archive filesystem free bytes or percent.
  3. Counter derivative: persistent put rate, shown in the Grafana row directly below the log gauge so the two can be correlated.
  4. Alert: warning when utilization stays above eighty percent for ten minutes; critical at ninety-five percent.
  5. Alert: archive disk less than fifteen percent free regardless of log percent.
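
The alert logic in items 4 and 5 can be prototyped outside Prometheus before it is written as alerting rules. The following Python sketch assumes samples arrive as (unix timestamp, percent) pairs sorted oldest first, and it treats the ninety-five percent critical level as firing immediately, which is an interpretation rather than something the list states. Function names are hypothetical.

```python
def log_alert_level(samples, warn_pct=80.0, crit_pct=95.0, warn_hold_s=600):
    """Classify primary log utilization as 'ok', 'warning', or 'critical'.

    samples: list of (unix_ts, percent) tuples, oldest first.
    """
    if not samples:
        return "ok"
    latest_ts, latest_pct = samples[-1]
    if latest_pct >= crit_pct:
        return "critical"  # assumption: critical needs no hold time
    # Walk backwards to find when the current run above warn_pct started.
    run_start = None
    for ts, pct in reversed(samples):
        if pct > warn_pct:
            run_start = ts
        else:
            break
    if run_start is not None and latest_ts - run_start >= warn_hold_s:
        return "warning"
    return "ok"


def archive_disk_alert(free_pct, min_free_pct=15.0):
    """True when the archive filesystem is below the free-space floor."""
    return free_pct < min_free_pct
```

Keeping the archive check independent of the log percentage mirrors item 5: a nearly full archive disk is worth an alert even while the log gauge looks healthy.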

Exporters may expose ibmmq_log_utilization or a similarly named metric; validate it against DISPLAY output during a controlled test. Combine it with node_exporter filesystem metrics on the mount points that hold LOGPATH and archives. Dynatrace and Instana host agents surface disk saturation, which can explain utilization growth when logging I/O waits on slow storage.
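
When validating exporter values during a controlled test, it can help to read filesystem free space directly on the host. This is a small Python sketch using the standard library's shutil.disk_usage; the paths passed in are whatever your site uses for the log and archive mounts, and the helper names are hypothetical.

```python
import shutil


def filesystem_free_percent(path):
    """Return the percentage of the filesystem at `path` that is free to the caller."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report zero size
    return 100.0 * usage.free / usage.total


def free_percent_report(paths):
    """Map each path to its free percentage, rounded for display."""
    return {p: round(filesystem_free_percent(p), 1) for p in paths}
```

For example, free_percent_report(["/var/mqm/log", "/var/mqm/archive"]) gives numbers to compare with the node_exporter filesystem metrics for the same mount points; small differences are expected because tools account for reserved blocks differently.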

Why Utilization Spikes

  • Batch load — many persistent puts in a short interval.
  • Archive slow — tape, NFS, or overloaded backup target cannot keep pace.
  • Archive disk full — no extents copied off primary; logging stops.
  • Long-running UOW — uncommitted transactions hold log space.
  • Undersized LOGFIL — too few or small primary files for peak rate.
  • Replication or media images — extra I/O lengthens archive cycle.
  • Operator error — archive path misconfigured after migration.

Response Playbook

  1. Confirm persistent put rate versus normal baseline.
  2. Check that the archive process is running and that the archive path is writable and has free space.
  3. Identify long transactions: ask application teams and use DISPLAY QSTATUS where supported.
  4. Temporary: throttle non-critical persistent publishers per change policy.
  5. Medium term: increase LOGFIL count or file size, move to a faster archive target, and put logs and archives on separate disks.
  6. Never delete archive files without retention policy and IBM procedure approval.
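
The playbook above can be encoded as a simple triage helper so that on-call staff check the same things in the same order. This is a hedged Python sketch: the Observation fields, the thresholds (twice the baseline rate, fifteen percent free, fifteen minutes for a unit of work), and the wording of the findings are all illustrative assumptions to be replaced with site values.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    persistent_put_rate: float   # current persistent puts per second
    baseline_put_rate: float     # normal rate for this time of day
    archiver_running: bool
    archive_path_writable: bool
    archive_free_pct: float
    oldest_uow_age_s: float      # age of the oldest uncommitted unit of work


def triage(obs, rate_factor=2.0, min_archive_free_pct=15.0, max_uow_age_s=900.0):
    """Return playbook findings in the order the playbook lists them."""
    findings = []
    if obs.baseline_put_rate > 0 and obs.persistent_put_rate >= rate_factor * obs.baseline_put_rate:
        findings.append("Put rate well above baseline: throttle non-critical persistent publishers per change policy")
    if not obs.archiver_running or not obs.archive_path_writable:
        findings.append("Archiving is not working: restore the archive process and path")
    if obs.archive_free_pct < min_archive_free_pct:
        findings.append("Archive filesystem low on space: free space only under the retention policy")
    if obs.oldest_uow_age_s > max_uow_age_s:
        findings.append("Long-running unit of work: contact the owning application team")
    return findings
```

An empty result means none of the modeled causes apply, which is itself a prompt to look at storage latency and the medium-term capacity items.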

Utilization vs Tuning

Log tuning (LOGFIL size, disk tier, sync policy) sets capacity. Log utilization monitoring tells you when that capacity is exhausted in production. Capacity planning should use the peak batch persistent put rate multiplied by the bytes logged per commit, not average midday traffic. DR requirements may mandate minimum archive retention on disk, so utilization alerts must account for retention consuming space even when the current put rate is low.
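
The capacity arithmetic can be worked through with a few lines of Python. This sketch assumes a distributed queue manager whose log files are sized in 4 KB pages, and it uses made-up planning numbers (16 primary files of 16384 pages, 2000 puts per second, 4096 logged bytes per put); the function names are hypothetical and real logged bytes per put include overhead you should measure.

```python
PAGE_BYTES = 4096  # distributed MQ log files are sized in 4 KB pages


def primary_log_bytes(primary_files, file_pages):
    """Total primary log capacity in bytes."""
    return primary_files * file_pages * PAGE_BYTES


def seconds_until_full(capacity_bytes, used_fraction, puts_per_s,
                       logged_bytes_per_put, drain_bytes_per_s):
    """Time until the log fills at the given rates, or None if it is not filling."""
    net_fill = puts_per_s * logged_bytes_per_put - drain_bytes_per_s
    if net_fill <= 0:
        return None  # space is being freed at least as fast as it is consumed
    return capacity_bytes * (1.0 - used_fraction) / net_fill


capacity = primary_log_bytes(16, 16384)  # 1 GiB of primary log
eta = seconds_until_full(capacity, 0.5, 2000, 4096, 4_096_000)
```

With these illustrative inputs the log is half full, fills at a net 4,096,000 bytes per second, and reaches capacity in roughly 131 seconds, which shows why planning against average midday traffic understates the risk.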

z/OS Considerations

Log and archive data sets sit on different volumes, so monitor volume free space and catalog constraints for each. Coupling facility structures and queue sharing groups add shared recovery context, so a utilization incident may affect multiple queue managers in a group. Coordinate with storage teams for SMS-managed volumes and automatic expansion where policy allows.

Explainer: Bathtub Drain

The circular log is a bathtub filling from the faucet (persistent puts). The archive is the drain. Utilization is how full the tub is when the drain clogs or the faucet opens fully during batch hour.

Explain Like I'm Five: Log Utilization

The computer keeps a notebook of every important message so it can remember after a restart. Log utilization is how full that notebook is. If the notebook fills and nobody copies old pages to a filing cabinet (archive), the computer stops writing new important notes.

Practice Exercises

Exercise 1

Design two Grafana alerts: log percent high and archive disk low. Write threshold values and durations.

Exercise 2

During a simulated batch, list commands and OS checks you run every five minutes while utilization rises.

Exercise 3

Explain to an application team why reducing persistent puts helps log utilization but reducing non-persistent puts does not.

Test Your Knowledge

1. High log utilization primarily threatens:

  • Persistent messaging and recovery
  • Only topic names
  • Browser TLS ciphers
  • JCL job class

2. Archive disk full can cause:

  • Logging stall
  • Higher MAXDEPTH
  • Automatic channel start
  • More subscriptions

3. Log utilization should be monitored with:

  • Archive path disk and circular log percent
  • Only CURDEPTH
  • CICS mapset
  • FTP quota

4. Reducing persistent put rate during log crisis:

  • Buys time for archive to catch up
  • Deletes all messages
  • Disables CHLAUTH
  • Removes TLS

Read time: 21 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation