Prometheus monitoring for IBM MQ brings queue depth, channel health, and queue manager status into the same observability stack many cloud-native teams already use for Kubernetes and microservices. Instead of operators manually running DISPLAY QLOCAL during every incident, Prometheus scrapes metrics every fifteen or thirty seconds into a time-series database, where Grafana charts the trend lines and Alertmanager pages on-call engineers. Beginners often install an exporter without a labeling strategy, and six months later the Prometheus disk is full because every metric carries a high-cardinality queue-name label. This tutorial explains the scrape model, the role of the IBM MQ metrics exporter, useful metric families, PromQL alert examples at a conceptual level, cardinality discipline, pairing with Grafana, and how Prometheus complements, rather than replaces, MQ Console, Explorer, and runmqsc.
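Prometheus uses a pull model: the server initiates each scrape by connecting to the exporter. Firewalls must therefore allow connections from Prometheus hosts to exporter ports; for basic setups, nothing needs to be opened in the reverse direction.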
| Metric concept | Operational use | Caution |
|---|---|---|
| Queue depth | Backlog alerts | Per-queue labels multiply cardinality |
| Channel status | Page when a channel is not RUNNING | Map numeric status codes to names clearly |
| QM status | Instance down detection | One target per QM |
| Put/get counts | Rate derivatives in PromQL | Counter resets on restart |
| Listener up | Client connectivity risk | False positives during maintenance |
Deploy the IBM-supported MQ metrics exporter per your platform documentation—container sidecar, standalone VM, or operator-managed pod. Configure queue manager connection: hostname, port, channel, credentials. In prometheus.yml add a job scraping the exporter port. Use TLS on scrape targets in production. Store secrets in vaults, not plain text in Git.
    # prometheus.yml fragment (illustrative)
    scrape_configs:
      - job_name: 'ibmmq'
        scrape_interval: 30s
        static_configs:
          - targets: ['mq-exporter.ops.svc:9157']
        # tls_config: ...  # production TLS
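For production, the same job can be pointed at an HTTPS endpoint. The fragment below is a minimal sketch, assuming the exporter is already serving TLS on the same port. The certificate path and the `server_name` value are hypothetical placeholders, not values taken from any IBM documentation.

```yaml
# prometheus.yml fragment: HTTPS scrape of the exporter (sketch, not a tested config)
scrape_configs:
  - job_name: 'ibmmq'
    scheme: https              # scrape over TLS instead of plain HTTP
    scrape_interval: 30s
    static_configs:
      - targets: ['mq-exporter.ops.svc:9157']
    tls_config:
      # Hypothetical path: CA bundle that signed the exporter's certificate.
      ca_file: /etc/prometheus/certs/mq-exporter-ca.pem
      # Assumed to match the name in the exporter's certificate.
      server_name: mq-exporter.ops.svc
```

Keeping only file paths in the configuration lets the certificate material itself live in a secret store rather than in Git.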
Counters such as messages put only ever increase until a restart resets them to zero. Use rate() over five minutes to get puts per second: rate(mq_queue_puts_total[5m]). rate() compensates for counter resets, so a restart does not produce a negative rate. Compare put rate to get rate to spot imbalance. Histograms for latency require exporter support, so verify that your version exposes them.
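The put and get comparison can be captured once in recording rules so that dashboards and alerts reuse the same series. This is a sketch only: `mq_queue_puts_total` comes from the text above, while `mq_queue_gets_total` and the recorded series names are assumptions that must be matched to whatever your exporter actually exposes.

```yaml
# mq-rates.rules.yml (illustrative; load it via rule_files in prometheus.yml)
groups:
  - name: ibmmq-rates
    interval: 30s
    rules:
      # Per-second put rate, averaged over five minutes.
      - record: queue:mq_queue_puts:rate5m
        expr: rate(mq_queue_puts_total[5m])
      # Per-second get rate. The metric name is assumed by analogy with puts.
      - record: queue:mq_queue_gets:rate5m
        expr: rate(mq_queue_gets_total[5m])
      # Positive values mean producers are outpacing consumers.
      - record: queue:mq_queue_put_get_delta:rate5m
        expr: queue:mq_queue_puts:rate5m - queue:mq_queue_gets:rate5m
```

Rules within a group are evaluated in order, so the third rule can build on the first two. The subtraction only matches series whose label sets are identical on both sides.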
Avoid labeling every queue name on cluster-wide aggregates—thousands of queues create millions of series. Prefer: aggregate metrics per queue manager; label only tier-1 queue names; use recording rules to pre-aggregate; drop high-cardinality labels in relabel configs where safe. Review series count after onboarding new applications.
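One way to keep per-queue series only for a tier-1 list is a three-step `metric_relabel_configs` chain. In this sketch the `queue` label, the `mq_queue_` metric prefix, and the queue names `PAY.IN` and `PAY.OUT` are all hypothetical, so check the real label and metric names on your exporter's /metrics page before adapting it.

```yaml
# prometheus.yml fragment: drop per-queue series for non-tier-1 queues (sketch)
scrape_configs:
  - job_name: 'ibmmq'
    static_configs:
      - targets: ['mq-exporter.ops.svc:9157']
    metric_relabel_configs:
      # Step 1: mark series whose queue label is on the tier-1 list.
      - source_labels: [queue]
        regex: 'PAY\.IN|PAY\.OUT'
        target_label: __tmp_tier1
        replacement: 'yes'
      # Step 2: drop mq_queue_* series that carry no marker.
      # The joined value is "<metric name>;<marker>", so an empty marker ends in ";".
      - source_labels: [__name__, __tmp_tier1]
        regex: 'mq_queue_.*;'
        action: drop
      # Step 3: remove the temporary marker from the series that remain.
      - regex: '__tmp_tier1'
        action: labeldrop
```

Series dropped here never reach storage, so they cannot feed recording rules later. If a per-queue-manager total over all queues is still needed, it has to come from a metric the exporter already aggregates.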
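For queue managers on z/OS, run the exporter on a Linux gateway host with a client connection to the z/OS queue manager. The network path and the SVRCONN channel must be reliable. Latency between the exporter and the queue manager adds scrape skew, which is acceptable for operational metrics but not for microsecond-level latency measurement.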
Silence alerts in Alertmanager during planned maintenance that stops and restarts queue managers (endmqm, strmqm). Document which metrics flap when channels restart, and avoid alert fatigue with sensible for: durations and severity levels.
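As a sketch of how for: durations and severity labels look in practice, the rules file below defines one warning and one critical alert. The gauge name `mq_queue_depth`, the `queue` label, the queue name `PAY.IN`, and the threshold are illustrative assumptions. The `up` series is generated by Prometheus itself for every scrape target.

```yaml
# mq-alerts.rules.yml (illustrative; metric and label names are assumptions)
groups:
  - name: ibmmq-alerts
    rules:
      - alert: MQQueueDepthHigh
        # Hypothetical gauge with a queue label; tune the threshold per queue.
        expr: mq_queue_depth{queue="PAY.IN"} > 5000
        for: 10m               # ignore short spikes, such as a channel restart
        labels:
          severity: warning
        annotations:
          summary: 'Queue {{ $labels.queue }} depth above 5000 for 10 minutes'
      - alert: MQExporterDown
        # up is 0 when the last scrape of the target failed.
        expr: up{job="ibmmq"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'MQ exporter {{ $labels.instance }} is not answering scrapes'
```

A longer for: value trades detection speed for fewer false pages, so routine restarts that finish inside the window never fire.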
Prometheus is a thermometer that checks itself every thirty seconds and writes the temperature in a notebook—Grafana draws the graph; Alertmanager calls you when it gets too hot.
Prometheus is a robot that keeps asking the marble jar how full it is and writes the answer in a book so grown-ups can see a picture of fullness over time.
Write three alert rules that cover only a tier-1 list of five payment queue names.
Explain why labeling all 10,000 queue names hurts Prometheus.
Draw the scrape path from queue manager to Grafana in five boxes.
1. Prometheus collects metrics by:
2. MQ metrics in Prometheus are used for:
3. High cardinality labels can:
4. Exporter connects to MQ via: