CPU Usage

CPU usage for IBM MQ is not a single number on one process. It is the sum of queue manager agents, channel initiators, listeners, and TLS work, often on hosts that run dozens of other applications. Beginners see amqzxma0 at ninety percent CPU and restart the queue manager, causing a bigger outage than the original slowness. Good monitoring charts host CPU, MQ-related process CPU, message throughput, and queue depth on aligned timelines, so operators can ask whether CPU rose because traffic doubled (a good problem) or because slow disk stalled logging and left threads spinning (a bad problem). This tutorial explains what consumes CPU in MQ, which processes to watch on distributed platforms, how to collect metrics with Prometheus node_exporter and full-stack tools, how to set baselines and alert thresholds, how CPU correlates with channel count and TLS, how container limits behave in Kubernetes, where z/OS SMF fits for mainframe MQ, and which tuning actions apply when CPU is genuinely the bottleneck.

What Consumes CPU in a Queue Manager

  • Message put and get — serialization, validation, security checks per message.
  • Logging — log write and checkpoint work; slow disk increases CPU waiting and retry patterns.
  • Channels — protocol processing, batching, compression, TLS handshake and encrypt.
  • Pub/sub — routing publications to many subscriptions scales with fan-out.
  • Administration — DISPLAY and PCF storms from monitoring tools if too aggressive.
  • Persistence — larger messages and syncpoint patterns increase work per message.
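
Distributed MQ processes (typical Linux names)

Process pattern | Role | CPU note
amqzxma0 | Execution controller | Starts and manages the agent processes
amqzlaa0 | Queue manager agents (LPA) | Often largest share under heavy load
amqzmur0 / amqzmuc0 | Restartable and critical process managers | Internal queue manager tasks; watch alongside the other amq* processes
amqpcsea | Command server | Spikes during admin bursts
amqrmppa / runmqchi | Channel pooling process and channel initiator | Rises with many active channels
runmqlsr / listener | Accepts inbound TCP | Handshake storms on connect floods
Application linked to MQ client | Not MQ server but drives load | Correlate app CPU with QM CPU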
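
To compare per-process CPU against the roles listed above, it can help to total CPU by process name. The following is a minimal sketch, assuming the input is text from `ps -eo comm,pcpu --no-headers` on Linux. The name pattern (anything starting with `amq` or `runmq`) and the sample figures are illustrative assumptions, not measurements.

```python
import re
from collections import defaultdict

# Assumption: names starting with "amq" or "runmq" are treated as MQ processes.
MQ_NAME = re.compile(r"^(amq|runmq)")


def mq_cpu_by_process(ps_output: str) -> dict[str, float]:
    """Sum %CPU per MQ process name from `ps -eo comm,pcpu --no-headers` text."""
    totals: dict[str, float] = defaultdict(float)
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, pcpu = parts
        if MQ_NAME.match(name):
            totals[name] += float(pcpu)
    return dict(totals)


# Made-up sample input for illustration only.
SAMPLE = """\
amqzxma0   1.5
amqzlaa0  42.0
amqzlaa0  18.5
amqrmppa  12.0
runmqlsr   0.5
java      30.0
"""

# Print MQ processes, busiest first.
for name, cpu in sorted(mq_cpu_by_process(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {cpu:5.1f}%")
```

Several instances of the same process name are summed into one line, which matches how a process-group panel in a dashboard would present them.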

Collecting Host CPU Metrics

Install node_exporter on MQ VMs or use the Kubernetes cAdvisor/kubelet metrics path. Prometheus records node_cpu_seconds_total by mode: user, system, idle, iowait. High iowait with elevated MQ CPU often means storage latency, not insufficient cores. Grafana row one: host CPU percent; row two: sum of rates for MQ message puts and gets; row three: top queue depths. Instana and Dynatrace add automatic baselines on process groups containing amq* without manual process lists.

```text
# Illustrative PromQL (host-level)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Process filter (requires process_exporter or similar)
rate(namedprocess_namegroup_cpu_seconds_total{groupname=~"amq.*"}[5m])
```
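
The host-level query can be reproduced by hand from two samples of the per-mode counters, which also makes the iowait share visible. This is a minimal sketch, assuming each sample is a dictionary of cumulative CPU seconds keyed by mode. The function name and the sample values are hypothetical.

```python
def busy_and_iowait_percent(prev: dict[str, float], curr: dict[str, float]) -> tuple[float, float]:
    """Return (busy %, iowait %) between two samples of per-mode CPU seconds.

    Busy is everything that is not idle, so iowait counts as busy here,
    the same way "100 - idle" does in the host-level query.
    """
    deltas = {mode: curr[mode] - prev[mode] for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at increasing times")
    busy = 100.0 * (1.0 - deltas["idle"] / total)
    iowait = 100.0 * deltas.get("iowait", 0.0) / total
    return busy, iowait


# Made-up counter values (cumulative seconds) for illustration only.
PREV = {"user": 1000.0, "system": 400.0, "idle": 8000.0, "iowait": 600.0}
CURR = {"user": 1120.0, "system": 460.0, "idle": 8060.0, "iowait": 660.0}

busy, iowait = busy_and_iowait_percent(PREV, CURR)
print(f"busy={busy:.1f}% iowait={iowait:.1f}%")
```

A large iowait share inside the busy figure is the signal to look at storage latency before adding cores.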

Baselines and Alerting

Establish a weekly baseline of peak-hour CPU for each queue manager host. Alert when the five-minute average CPU exceeds baseline plus thirty percent for fifteen minutes during business hours. Keep separate batch-window baselines for jobs such as Sunday night file loads. Raise a critical alert when CPU is above ninety percent and applications report degraded p95 put latency. Suppress alerts during known load tests. Always include disk iowait and queue depth in the same notification template so on-call staff do not add cores when disk is the root cause.
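
The warning rule can be sketched as a small function. This is an illustration under stated assumptions, not a production alert: "baseline plus thirty percent" is read as baseline multiplied by 1.30, fifteen minutes is modeled as three consecutive five-minute averages, and business hours are assumed to be weekdays from 08:00 to 18:00. All names and defaults are hypothetical.

```python
from datetime import datetime


def should_alert(samples, baseline_pct, margin=0.30, windows=3,
                 start_hour=8, end_hour=18):
    """Decide whether the sustained-CPU warning should fire.

    samples: list of (datetime, five_minute_avg_cpu_pct), oldest first.
    Fires only if the last `windows` samples are all above the threshold
    and all fall inside the assumed business hours.
    """
    threshold = baseline_pct * (1.0 + margin)  # assumption: relative margin
    recent = samples[-windows:]
    if len(recent) < windows:
        return False  # not enough history to cover the full duration
    for ts, cpu in recent:
        in_hours = ts.weekday() < 5 and start_hour <= ts.hour < end_hour
        if not in_hours or cpu <= threshold:
            return False
    return True


# Made-up samples for a Monday morning; baseline 40% gives a 52% threshold.
monday = [
    (datetime(2024, 3, 4, 10, 0), 55.0),
    (datetime(2024, 3, 4, 10, 5), 58.0),
    (datetime(2024, 3, 4, 10, 10), 60.0),
]
print(should_alert(monday, baseline_pct=40.0))
```

In Prometheus the same idea is usually expressed with an alerting rule and a `for:` duration; the function above only makes the logic explicit.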

CPU vs Throughput Efficiency

Messages per second per CPU core is a useful internal KPI. If throughput drops while CPU stays high, investigate poison messages causing repeated rollback, excessive short-lived connections, or browse loops. If throughput rises linearly with CPU, scaling out or scaling up may be appropriate. Channel batching (BATCHSZ, BATCHINT, and BATCHLIM where the version supports them) reduces per-message overhead; see the channel batching tutorial for tuning trade-offs.
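
The KPI is simple arithmetic, sketched below. It assumes the message rate comes from queue manager statistics and that "cores busy" is the rate of MQ process CPU seconds per wall-clock second. The function names and the 20 percent tolerance are hypothetical choices for illustration.

```python
def msgs_per_core(msg_rate_per_s: float, cores_busy: float) -> float:
    """Messages per second handled per busy CPU core."""
    if cores_busy <= 0:
        raise ValueError("cores_busy must be positive")
    return msg_rate_per_s / cores_busy


def compare_efficiency(baseline: float, current: float, tolerance: float = 0.20) -> str:
    """Classify the current KPI against a baseline KPI."""
    ratio = current / baseline
    if ratio < 1.0 - tolerance:
        return "degraded"   # same CPU is moving fewer messages
    if ratio > 1.0 + tolerance:
        return "improved"
    return "steady"


# Made-up figures: throughput halves while four cores stay busy.
baseline_kpi = msgs_per_core(12000.0, 4.0)
current_kpi = msgs_per_core(6000.0, 4.0)
print(baseline_kpi, current_kpi, compare_efficiency(baseline_kpi, current_kpi))
```

A "degraded" result with flat CPU is the pattern that points toward rollback loops, connection churn, or browse loops rather than a need for more cores.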

TLS and Channel Count Impact

TLS adds CPU cost for encryption and decryption on every channel and client connection, and each new session pays for a handshake as well. Thousands of short TLS sessions per minute cost more than fewer long-lived sessions with session reuse where supported. Monitoring should track concurrent channel instances and handshake rate during incidents. Certificate rotation may spike CPU briefly during mass reconnects, so annotate dashboards when it happens.
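
Handshake rate can be approximated by counting connection starts per minute. The sketch below assumes a list of connection start timestamps is already available from some source, such as channel event messages or listener logs. How those timestamps are obtained is outside this example, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def handshakes_per_minute(start_times):
    """Count connection starts per wall-clock minute, in time order."""
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in start_times)
    return dict(sorted(buckets.items()))


# Made-up timestamps for illustration only.
starts = [
    datetime(2024, 3, 4, 10, 0, 5),
    datetime(2024, 3, 4, 10, 0, 40),
    datetime(2024, 3, 4, 10, 1, 2),
]
for minute, count in handshakes_per_minute(starts).items():
    print(minute.strftime("%H:%M"), count)
```

Plotting this series next to MQ process CPU makes a reconnect storm easy to tell apart from steady long-lived traffic.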

Containers and CPU Limits

MQ containers in Kubernetes need CPU requests and limits aligned with expected peak. CPU throttling at the cgroup limit manifests as latency without host CPU hitting one hundred percent—check container_cpu_cfs_throttled_seconds_total. Under-provisioned limits cause false MQ slowness diagnoses. MQ Operator documentation provides sizing guidance per queue manager profile.
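
One way to quantify throttling is the fraction of CFS scheduling periods in which the container was throttled. The sketch below assumes two readings of the cAdvisor counters container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total taken some time apart. The function name and sample values are hypothetical.

```python
def throttled_fraction(periods_before: float, periods_after: float,
                       throttled_before: float, throttled_after: float) -> float:
    """Fraction of CFS periods that were throttled between two counter readings."""
    periods = periods_after - periods_before
    throttled = throttled_after - throttled_before
    if periods <= 0:
        return 0.0  # no scheduling periods elapsed, or a counter reset
    return throttled / periods


# Made-up counter readings for illustration only.
fraction = throttled_fraction(1000, 4000, 50, 950)
print(f"throttled in {fraction:.0%} of periods")
```

A high fraction while host CPU looks moderate suggests the container limit, not the queue manager, is the constraint.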

z/OS MQ CPU Considerations

On z/OS, MQ runs as part of the middleware footprint in the LPAR. Use SMF and RMF to view MQ address space and queue sharing group costs. Coupling facility contention can increase CPU and wait without obvious distributed-style process names. Capacity planners compare MQ CPU growth to MSU licensing. Beginners should pair operations courses on distributed MQ with z/OS-specific monitoring modules from IBM.

Triage Playbook: High MQ CPU

  1. Confirm host versus container CPU; check iowait and disk latency.
  2. Compare current message rates to baseline—is load genuinely higher?
  3. Count active channels and client connections; TLS handshake storm?
  4. Review recent changes: new app, monitoring interval, logging attributes.
  5. Check for admin scripts DISPLAY ALL storms or runaway PCF collectors.
  6. Sample applications for syncpoint scope and message size growth.
  7. Scale or tune only after root cause category is identified.
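
The measurable steps of the playbook can be sketched as an ordered set of checks. This is an illustration only: every field name and threshold below is a hypothetical assumption, and steps that need human review (recent changes, application sampling) are represented by the fall-through result.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics gathered at the time of the incident."""
    throttled_fraction: float      # container CFS throttling, 0.0 to 1.0
    iowait_pct: float              # host iowait percent
    msg_rate: float                # current messages per second
    baseline_msg_rate: float
    handshakes_per_min: float
    baseline_handshakes_per_min: float
    admin_cmds_per_min: float
    baseline_admin_cmds_per_min: float


def triage(s: Snapshot) -> str:
    """Return the first root-cause category whose check matches."""
    if s.throttled_fraction > 0.25:                             # step 1
        return "container CPU limit"
    if s.iowait_pct > 20.0:                                     # step 1
        return "storage latency"
    if s.msg_rate > 1.5 * s.baseline_msg_rate:                  # step 2
        return "genuine load growth"
    if s.handshakes_per_min > 3 * s.baseline_handshakes_per_min:  # step 3
        return "TLS handshake storm"
    if s.admin_cmds_per_min > 3 * s.baseline_admin_cmds_per_min:  # step 5
        return "admin or monitoring storm"
    return "review recent changes and application behavior"     # steps 4 and 6


# Made-up snapshot: normal load, but high iowait.
snap = Snapshot(0.02, 35.0, 5000.0, 4800.0, 40.0, 35.0, 10.0, 12.0)
print(triage(snap))
```

The order matters: infrastructure limits are ruled out before load and connection patterns, so tuning starts only once a category is identified.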

Explainer: Kitchen Stoves

CPU is how hard the kitchen stoves run. More orders (messages) need more heat—but if stoves run hot while almost no plates leave the pass, something is wrong with the workflow, not just stove power.

Explain Like I'm Five: CPU Usage

CPU is how hard the computer brain works. When the marble sorter works super hard but marbles still pile up, maybe the problem is a stuck door (disk or channel), not a tired brain.

Practice Exercises

Exercise 1

Design a Grafana row with three panels: host CPU, put rate, top queue depth.

Exercise 2

Explain why high iowait with high MQ CPU points to storage investigation first.

Exercise 3

List three changes that could raise MQ CPU without raising business message volume.

Frequently Asked Questions

Test Your Knowledge

1. MQ CPU should be interpreted with:

  • Throughput and depth metrics
  • Only JCL lines
  • CICS mapset size
  • FTP speed

2. amqzxma0 is associated with:

  • Queue manager process (execution controller)
  • JES initiator
  • DB2 buffer pool
  • CICS AOR

3. High CPU with low message rate may suggest:

  • Inefficiency or non-MQ issue
  • Perfect health
  • MAXDEPTH too high only
  • Need more TLS certs

4. node_exporter provides:

  • Host CPU metrics for Prometheus
  • MQSC scripting
  • COBOL compile
  • Channel DEFINE
Published
Read time: 21 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation