Grafana turns IBM MQ metrics into dashboards your operations team can read at a glance during incidents and capacity reviews. After Prometheus scrapes an MQ metrics exporter every thirty seconds, Grafana queries that time-series database with PromQL and draws queue depth trends, channel state timelines, and queue manager availability panels. Beginners often install Grafana, import a community dashboard, and wonder why half the panels say No data. The usual causes are exporter labels that do not match what the dashboard expects, or a Prometheus job name that differs from the template. This tutorial covers Grafana's role in the MQ observability stack, connecting Prometheus as a data source, building panels for CURDEPTH and CHSTATUS, dashboard variables for multiple queue managers, Grafana alert rules versus Alertmanager, folder and permission hygiene, performance pitfalls with high-cardinality queue names, and how Grafana complements MQ Console without replacing configuration tasks.
Grafana is the picture on the wall; Prometheus is the notebook of measurements; the queue manager is the factory floor. None of them fix a misconfigured CONNAME—you still use channel tools—but Grafana shows whether depth has been climbing for six hours before the application team pages you.
| Tool | Strength | Limitation |
|---|---|---|
| Grafana | Historical trends, shared dashboards, alert visuals | No DEFINE CHANNEL; read-only on metrics |
| MQ Console | Object admin, live status, REST APIs | Weaker long-term trending than time-series DB |
| MQ Explorer | Desktop deep dive, message browse | Not a team NOC wall display |
| Instana / Dynatrace | Full-stack APM plus MQ sensors | Licensing and agent deployment model |
In Grafana, open Connections then Data sources, choose Prometheus, and set the URL to your Prometheus server (for example http://prometheus.monitoring.svc:9090). Set Scrape interval to match or exceed your prometheus.yml job scrape_interval so panel refresh feels aligned with incoming points. Enable TLS and authentication in production—basic auth, OAuth proxy, or mutual TLS depending on your platform. Click Save and test; a green success message means Grafana can reach Prometheus. If Save and test fails, check network policies between Grafana pods and Prometheus, and verify the Prometheus service DNS name from inside the Grafana namespace.
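When Save and test fails, it can help to query Prometheus directly from a pod in the Grafana namespace. The sketch below uses only the Python standard library against the Prometheus instant-query HTTP endpoint (/api/v1/query). The base URL is the example from the text, and the job label ibmmq in the sample response is a hypothetical name, not something the exporter guarantees.

```python
import json
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of the Prometheus instant-query endpoint for a query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> list:
    """Extract (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned status {payload.get('status')!r}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


# A response shaped like the instant-query format; the job label is hypothetical.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "ibmmq"},
             "value": [1700000000.0, "1"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_instant_query_url("http://prometheus.monitoring.svc:9090", "up"))
    print(parse_instant_vector(SAMPLE_BODY))
```

To run it against a live server, fetch the URL with urllib.request.urlopen(url, timeout=5) and pass the decoded body to parse_instant_vector. A DNS or connection error at that step points at network policy or service naming rather than at Grafana itself.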
Plot a gauge or time series panel for queue depth where the exporter exposes something like ibmmq_queue_depth or the documented equivalent for your exporter version. Compare current depth to MAXDEPTH using a secondary query or a calculated field: depth divided by max depth, times one hundred, gives utilization percent. Alert when utilization stays above eighty percent for ten minutes on payment queues. Explain to new operators that a flat line at zero may mean no traffic or a broken metric label; it is not necessarily a healthy empty queue if the application should be busy.
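The utilization arithmetic and the "above eighty percent for ten minutes" condition can be sketched in a few lines. This is an illustration of the logic only, not how Grafana or Prometheus evaluate alerts internally; the function names and the sample data are made up for the example.

```python
def utilization_percent(depth: float, max_depth: float) -> float:
    """Queue utilization: current depth as a percentage of MAXDEPTH."""
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    return depth * 100.0 / max_depth


def sustained_above(samples: list, threshold: float, min_duration_s: float) -> bool:
    """True if the most recent run of samples stays above threshold long enough.

    samples is a time-ordered list of (unix_timestamp, value) pairs.
    """
    run_start = None
    for timestamp, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = timestamp  # a new breach run begins here
        else:
            run_start = None  # any sample at or below threshold resets the run
    return run_start is not None and samples[-1][0] - run_start >= min_duration_s


if __name__ == "__main__":
    # Hypothetical payment queue: MAXDEPTH 5000, depth 4250, sampled every 30 s.
    history = [(t * 30, utilization_percent(4250, 5000)) for t in range(21)]
    print(utilization_percent(4250, 5000), sustained_above(history, 80.0, 600))
```

Requiring the breach to persist avoids paging on a short burst that the consuming application drains on its own.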
Map numeric channel status codes from the exporter to RUNNING, RETRY, STOPPED, and other states your runbook uses. A stat panel turning red when critical SDR names are not RUNNING is a common NOC pattern. Document which channels are tier-1 so the dashboard does not alert on test channels left in RETRY intentionally.
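A small sketch of the mapping and the tier-1 filter described above. The numeric codes and channel names here are hypothetical placeholders; take the real code-to-state mapping from your exporter's documentation, since it differs between exporters and releases.

```python
# Hypothetical mapping: replace with the codes your exporter documents.
STATUS_NAMES = {0: "STOPPED", 1: "RETRY", 2: "RUNNING"}

# Hypothetical tier-1 sender channels taken from the runbook.
TIER1_CHANNELS = {"QM1.TO.QM2", "QM1.TO.PAYHUB"}


def tier1_not_running(statuses: dict) -> dict:
    """Return tier-1 channels whose state is anything other than RUNNING.

    statuses maps channel name to the numeric code from the exporter.
    A tier-1 channel with no series at all is reported as NO DATA.
    """
    problems = {}
    for channel in sorted(TIER1_CHANNELS):
        if channel not in statuses:
            problems[channel] = "NO DATA"
            continue
        state = STATUS_NAMES.get(statuses[channel], "UNKNOWN")
        if state != "RUNNING":
            problems[channel] = state
    return problems


def panel_color(statuses: dict) -> str:
    """Red if any tier-1 channel is not RUNNING, otherwise green."""
    return "red" if tier1_not_running(statuses) else "green"


if __name__ == "__main__":
    observed = {"QM1.TO.QM2": 2, "QM1.TO.PAYHUB": 1, "TEST.CHL": 1}
    print(tier1_not_running(observed), panel_color(observed))
```

The test channel in RETRY is ignored because it is not in the tier-1 set, which is the behavior the runbook list is meant to guarantee.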
A single stat panel per queue manager, showing one for up and zero for down, gives fast incident detection when strmqm fails or a host dies. Pair it with the up metric that Prometheus records for the exporter's scrape target so you can distinguish a queue manager that is down from monitoring that is broken.
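The distinction between "queue manager down" and "monitoring broken" comes down to reading two signals together. The sketch below assumes the exporter keeps running and reports a zero availability value when the queue manager stops; some exporters exit instead, in which case only the scrape signal changes. The state names are illustrative.

```python
from typing import Optional


def classify_availability(exporter_up: Optional[float],
                          qmgr_up: Optional[float]) -> str:
    """Combine the scrape-target signal with the queue manager signal.

    exporter_up: value of Prometheus's up series for the exporter target,
                 or None if the series is missing.
    qmgr_up:     availability value the exporter reports for the queue
                 manager, or None if the series is missing.
    """
    if exporter_up is None or exporter_up == 0:
        # Prometheus cannot scrape the exporter, so nothing it reported is current.
        return "MONITORING_BROKEN"
    if qmgr_up is None:
        # The scrape works but the expected metric or label is absent.
        return "METRIC_MISSING"
    return "HEALTHY" if qmgr_up == 1 else "QMGR_DOWN"


if __name__ == "__main__":
    for pair in [(1, 1), (1, 0), (0, None), (1, None)]:
        print(pair, classify_availability(*pair))
```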
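Use PromQL rate() on counter metrics: comparing a put rate such as rate(ibmmq_queue_puts_total[5m]) with the matching get rate shows backlog risk before depth spikes. Counters reset on queue manager restart. rate() compensates for resets inside its window, but panels may still show a brief dip or gap around the restart, so record restarts in your runbook and keep irate() for short-range graphs of fast-moving counters.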
    # Example PromQL ideas (metric names vary by exporter version)
    # Queue depth for one queue (replace labels to match your exporter)
    ibmmq_queue_depth{qmgr="QM1", queue="PAY.IN"}
    # Put rate per second (5m window)
    rate(ibmmq_queue_puts_total{qmgr="QM1"}[5m])
    # Channels not in RUNNING (pseudo; map status code per exporter docs)
    count(ibmmq_channel_status{qmgr="QM1"} != 2)
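To make the counter-reset behavior concrete, here is a simplified Python model of how a rate over a window treats a restart. It is a teaching sketch: real rate() also extrapolates to the window boundaries, which this version skips, and the sample values are invented.

```python
def counter_increase(values: list) -> float:
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            total += current - previous
        else:
            # After a restart the counter began again from zero,
            # so the whole current value counts as new increase.
            total += current
    return total


def per_second_rate(samples: list) -> float:
    """Average per-second rate over time-ordered (timestamp, value) samples."""
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("need at least two samples spanning a positive interval")
    return counter_increase([value for _, value in samples]) / elapsed


if __name__ == "__main__":
    # Invented counters with a queue manager restart between t=30 and t=60.
    puts = [(0, 100), (30, 160), (60, 20), (90, 80)]
    gets = [(0, 90), (30, 120), (60, 10), (90, 40)]
    backlog_growth = per_second_rate(puts) - per_second_rate(gets)
    print(round(backlog_growth, 3))  # positive means puts are outpacing gets
```

A positive difference that persists across several windows is the early signal that depth is about to climb.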
Variables let one dashboard serve many queue managers. Create a variable qmgr with query label_values(ibmmq_queue_depth, qmgr) so the dropdown lists discovered queue managers. A second variable queue can depend on qmgr: label_values(ibmmq_queue_depth{qmgr="$qmgr"}, queue). Use Multi-value and Include all option carefully on large estates—selecting all ten thousand queues in one graph will time out Grafana. Prefer a variable restricted to a CSV of tier-1 queue names maintained by the applications team, or a recording rule that only exports critical queues.
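One way to picture the tier-1 CSV approach is to see what label matcher it produces. The helper below is a hypothetical illustration that turns a CSV of queue names into a PromQL regex matcher; Grafana does similar formatting itself for multi-value variables, so this shows the resulting query shape rather than something you must run.

```python
import re


def tier1_queue_matcher(csv_names: str, label: str = "queue") -> str:
    """Build a PromQL regex matcher from a CSV of queue names."""
    names = [name.strip() for name in csv_names.split(",") if name.strip()]
    if not names:
        raise ValueError("at least one queue name is required")
    # re.escape protects regex metacharacters such as the dots in MQ names.
    # Backslashes are then doubled because PromQL double-quoted strings
    # treat a backslash as the start of an escape sequence.
    escaped = [re.escape(name).replace("\\", "\\\\") for name in names]
    alternation = "|".join(escaped)
    return f'{label}=~"{alternation}"'


if __name__ == "__main__":
    matcher = tier1_queue_matcher("PAY.IN, PAY.OUT")
    print("ibmmq_queue_depth{" + matcher + "}")
```

Escaping matters here because an unescaped dot would also match unrelated names such as PAYXIN, which quietly widens the panel beyond the tier-1 list.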
Read-only viewers for application teams prevent accidental panel deletion during a bridge call. Editors need training that changing PromQL without understanding labels causes false green panels.
Many sites use Prometheus Alertmanager for MQ rules and Grafana only for visualization. Others migrate to Grafana unified alerting with contact points to Slack and PagerDuty. Pick one primary path to avoid duplicate pages for the same condition. If both fire, on-call fatigue follows. Document severity levels: warning at seventy percent depth, critical at ninety percent with sustained growth, and maintenance silences during planned channel restarts.
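Writing the severity policy down as a tiny function can make it easier to review before encoding it in alert rules. This sketch reads "ninety percent plus sustained growth" as requiring both conditions at once, which is an interpretation; adjust the thresholds and that reading to your own policy.

```python
def alert_severity(utilization_pct: float,
                   growth_per_min: float,
                   in_maintenance: bool = False) -> str:
    """Classify a queue depth condition into a documented severity.

    utilization_pct: depth as a percentage of MAXDEPTH.
    growth_per_min:  sustained change in depth per minute (positive = growing).
    in_maintenance:  True while a planned silence is active.
    """
    if in_maintenance:
        return "silenced"
    if utilization_pct >= 90 and growth_per_min > 0:
        return "critical"
    if utilization_pct >= 70:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(alert_severity(92, 15))        # high and still growing
    print(alert_severity(92, 0))         # high but stable
    print(alert_severity(92, 15, True))  # planned channel restart
```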
Grafana.com hosts IBM MQ related dashboard JSON. Import by ID or upload JSON, then open every panel and fix metric names and label keys to match your exporter. Community templates assume default label schemes from a specific exporter release—your Helm chart may differ. Save As into your MQ folder after validation in a non-production Prometheus tenant.
Each unique label combination in Prometheus becomes a series Grafana queries. Dashboards that plot all queues on one graph with legend queue name explode browser memory when cardinality is huge. Mitigations: topk() PromQL for ten highest depths only; recording rules pre-aggregating per queue manager; separate dashboards per application domain; shorter time range defaults on heavy panels. Set query timeout and max data points in Grafana settings for large environments.
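The effect of the topk() mitigation can be illustrated outside Prometheus: out of thousands of series, only the k deepest reach the panel. The queue names and depths below are synthetic.

```python
import heapq


def top_k_by_depth(depths: dict, k: int = 10) -> list:
    """Return the k (queue, depth) pairs with the highest depth, deepest first."""
    return heapq.nlargest(k, depths.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Synthetic estate of ten thousand queues, most of them nearly empty.
    estate = {f"APP.Q{i:05d}": i % 7 for i in range(10_000)}
    estate["PAY.IN"] = 4200
    estate["PAY.OUT"] = 3100
    for queue, depth in top_k_by_depth(estate, k=3):
        print(queue, depth)
```

The browser then renders a handful of series instead of ten thousand, which is why this mitigation is usually the first one to try on heavy panels.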
When exporters run on Linux gateways reading z/OS queue managers, label qmgr with the logical name operators recognize. Add a panel for exporter-to-QM round-trip latency if exposed. Split z/OS and distributed queue managers into different rows so capacity planners see platform context. Link dashboard annotations to change tickets when channels are bounced during maintenance.
Grafana is the wall of graphs in a spacecraft control room—it does not fly the ship, but everyone looks at it to see if fuel (queue depth) is draining faster than expected.
Grafana is a coloring book that draws pictures from numbers someone else counted. Prometheus counts how many marbles are in the jar; Grafana draws the jar getting fuller over the day.
Sketch three panels for one tier-1 payment queue: depth, utilization percent, put rate.
List five label keys you would allow on MQ metrics versus five you would forbid and why.
Write a runbook step: Grafana shows depth flat at zero but Console shows CURDEPTH 5000—what do you check first?
1. Grafana primary role for MQ is:
2. Typical MQ metrics data source in Grafana is:
3. Dashboard variables help operators:
4. Grafana alert rules often forward to: