Grafana

Grafana turns IBM MQ metrics into dashboards your operations team can read at a glance during incidents and capacity reviews. Once Prometheus is scraping an MQ metrics exporter (every thirty seconds in a typical setup), Grafana queries that time-series database with PromQL and draws queue depth trends, channel state timelines, and queue manager availability panels. Beginners often install Grafana, import a community dashboard, and then wonder why half the panels say No data. Usually the exporter's labels do not match what the dashboard expects, or the Prometheus job name differs from the one in the template. This tutorial covers Grafana's role in the MQ observability stack, connecting Prometheus as a data source, building panels for CURDEPTH and CHSTATUS, dashboard variables for multiple queue managers, Grafana alert rules versus Alertmanager, folder and permission hygiene, performance pitfalls with high-cardinality queue names, and how Grafana complements MQ Console without replacing its configuration tasks.

Where Grafana Sits in the MQ Stack

  1. Queue managers process puts and gets, channels move messages, listeners accept clients.
  2. The MQ metrics exporter reads PCF or statistics and exposes HTTP /metrics.
  3. Prometheus scrapes the exporter on scrape_interval and stores counters and gauges.
  4. Grafana queries Prometheus and renders charts; optional Grafana Alerting notifies on-call.
  5. Operators use MQ Console or runmqsc for DEFINE, ALTER, and DISPLAY during remediation.

Grafana is the picture on the wall; Prometheus is the notebook of measurements; the queue manager is the factory floor. None of them fix a misconfigured CONNAME—you still use channel tools—but Grafana shows whether depth has been climbing for six hours before the application team pages you.

Grafana vs other MQ monitoring UIs
| Tool | Strength | Limitation |
|---|---|---|
| Grafana | Historical trends, shared dashboards, alert visuals | No DEFINE CHANNEL; read-only on metrics |
| MQ Console | Object admin, live status, REST APIs | Weaker long-term trending than a time-series DB |
| MQ Explorer | Desktop deep dive, message browse | Not a team NOC wall display |
| Instana / Dynatrace | Full-stack APM plus MQ sensors | Licensing and agent deployment model |

Adding Prometheus as a Data Source

In Grafana, open Connections, then Data sources, choose Prometheus, and set the URL to your Prometheus server (for example http://prometheus.monitoring.svc:9090). Set the data source's Scrape interval to match the scrape_interval of the exporter job in prometheus.yml. Grafana uses this value as the lower bound for $__interval and when it calculates $__rate_interval. A value lower than the real scrape interval can leave rate() windows with too few samples, which shows up as gaps in rate panels. Enable TLS and authentication in production: basic auth, an OAuth proxy, or mutual TLS, depending on your platform. Click Save & test; a green success message means Grafana can reach Prometheus. If the test fails, check network policies between the Grafana pods and Prometheus, and verify that the Prometheus service DNS name resolves from inside the Grafana namespace.
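When Save & test fails, it helps to query Prometheus directly from the same network location as Grafana. The Python sketch below calls the Prometheus HTTP API's /api/v1/query endpoint with the built-in up metric and reports which scrape targets are healthy. It uses only the standard library and assumes there is no authentication in front of Prometheus. Add headers or TLS settings to match your production setup.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for Prometheus's /api/v1/query endpoint."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def summarize_up(payload: dict) -> dict[str, bool]:
    """Map each 'job/instance' to True (up == 1) or False (up == 0)."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    result = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        key = f"{labels.get('job', '?')}/{labels.get('instance', '?')}"
        # Instant-vector samples are [unix_timestamp, "value-as-string"].
        result[key] = series["value"][1] == "1"
    return result


def check(base_url: str, timeout: float = 5.0) -> dict[str, bool]:
    """Query up once and summarize; raises on network or HTTP errors."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=timeout) as resp:
        return summarize_up(json.load(resp))


if __name__ == "__main__":
    # Run from a pod in the Grafana namespace so DNS and network policy match Grafana's view.
    print(check("http://prometheus.monitoring.svc:9090"))
```

A network error here points to DNS or network policy. A successful response in which the exporter's target shows False points to the exporter or its scrape job instead.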

Essential MQ Panels for Beginners

Queue depth over time

Plot queue depth as a gauge or time series panel, using ibmmq_queue_depth or the equivalent metric documented for your exporter version. Compare current depth to MAXDEPTH using a second query or a calculated field: depth divided by max depth, times one hundred, gives utilization percent. Alert when utilization stays above eighty percent for ten minutes on payment queues. Explain to new operators that a flat line at zero may mean no traffic or a broken metric label. It is not necessarily a healthy empty queue, especially if the application should be busy.
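The utilization formula and the "above eighty percent for ten minutes" rule can be written out as a small Python sketch. Here samples are (timestamp, depth) pairs, such as you might export from a panel. The hold logic resembles, in spirit, the pending period of a Prometheus or Grafana alert rule, but it is a simplified model and not how either product evaluates rules internally.

```python
def utilization_percent(depth: int, max_depth: int) -> float:
    """Queue utilization as a percentage of MAXDEPTH."""
    if max_depth <= 0:
        raise ValueError("MAXDEPTH must be positive")
    return depth * 100 / max_depth


def sustained_breach(samples: list[tuple[int, int]], max_depth: int,
                     threshold: float = 80.0, hold_seconds: int = 600) -> bool:
    """True if utilization stayed above threshold for at least hold_seconds.

    samples is a time-ordered list of (unix_seconds, depth) pairs.
    """
    breach_start = None
    for ts, depth in samples:
        if utilization_percent(depth, max_depth) > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= hold_seconds:
                return True
        else:
            breach_start = None  # any dip below the threshold restarts the clock
    return False
```

With MAXDEPTH 5000, a depth of 4500 (90 percent) sampled every 30 seconds for ten minutes returns True. A single sample below 4000 partway through resets the clock.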

Channel status

Map the numeric channel status codes from the exporter to RUNNING, RETRYING, STOPPED, and the other states your runbook uses. A stat panel that turns red when a critical sender (SDR) channel is not RUNNING is a common NOC pattern. Document which channels are tier-1 so the dashboard does not alert on test channels intentionally left in RETRYING state.
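Here is a Python sketch of that mapping and the tier-1 filter. The code table follows the MQCHS_* channel status constants in the IBM MQ PCF headers. Some exporters publish a simplified status instead of the raw value, so confirm which one yours emits before reusing the table. The channel names are hypothetical.

```python
# Raw channel status values (MQCHS_* constants from the IBM MQ PCF headers).
# Some exporters publish a simplified status instead; check your exporter docs.
MQCHS_NAMES = {
    0: "INACTIVE", 1: "BINDING", 2: "STARTING", 3: "RUNNING",
    4: "STOPPING", 5: "RETRYING", 6: "STOPPED", 7: "REQUESTING",
    8: "PAUSED", 9: "DISCONNECTED", 13: "INITIALIZING", 14: "SWITCHING",
}


def status_name(code: int) -> str:
    return MQCHS_NAMES.get(code, f"UNKNOWN({code})")


def tier1_not_running(statuses: dict[str, int], tier1: set[str]) -> list[str]:
    """Return tier-1 channels that are not RUNNING, sorted by channel name.

    Channels outside tier1 (for example test channels left retrying on
    purpose) are ignored. A tier-1 channel with no status is reported too,
    because silence from the exporter is not proof of health.
    """
    problems = []
    for channel in sorted(tier1):
        code = statuses.get(channel)
        if code is None:
            problems.append(f"{channel}: NO DATA")
        elif status_name(code) != "RUNNING":
            problems.append(f"{channel}: {status_name(code)}")
    return problems
```

Reporting NO DATA separately is a design choice. An exporter may omit channels that have no current status, and treating that as green is exactly the kind of false positive the runbook should avoid.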

Queue manager up

A stat panel per queue manager that shows 1 when it is up and 0 when it is down gives fast incident detection when strmqm fails or a host dies. Pair it with Prometheus's built-in up metric for the exporter's scrape job, so you can tell a queue manager outage apart from broken monitoring.
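The pairing logic is easy to get backwards during an incident, so here it is as a small Python decision function. scrape_up stands for Prometheus's up value for the exporter job. qmgr_up stands for whatever queue manager status metric your exporter provides, and that name is a hypothetical placeholder. None means the series is missing.

```python
from typing import Optional


def classify(scrape_up: Optional[int], qmgr_up: Optional[int]) -> str:
    """Combine the exporter scrape status with the queue manager status."""
    if scrape_up != 1:
        # The exporter was not scraped, so any queue manager value is stale.
        return "MONITORING BROKEN"
    if qmgr_up is None:
        # Exporter answered but did not report the queue manager at all.
        return "NO QM METRIC"
    return "OK" if qmgr_up == 1 else "QM DOWN"
```

Some exporters stop responding when the queue manager they connect to is down. In that case a MONITORING BROKEN result can also hide a queue manager outage, so check the host before you close the page.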

Put and get rates

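Use PromQL rate() on counter metrics. Comparing rate(ibmmq_queue_puts_total[5m]) with the matching get rate shows backlog risk before depth spikes, because puts arriving faster than gets means depth must eventually grow. Counters reset when the queue manager restarts. Both rate() and irate() treat any decrease in a counter as a reset, so a restart usually shows as a brief gap or distortion rather than a negative rate. Note planned restarts in your runbook so operators do not misread the blip.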

```promql
# Example PromQL ideas (metric names vary by exporter version)

# Queue depth for one queue (replace labels to match your exporter)
ibmmq_queue_depth{qmgr="QM1", queue="PAY.IN"}

# Put rate per second (5m window)
rate(ibmmq_queue_puts_total{qmgr="QM1"}[5m])

# Channels not in RUNNING (pseudo: map status code per exporter docs)
count(ibmmq_channel_status{qmgr="QM1"} != 2)
```
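To see why a queue manager restart does not produce a negative put rate, here is a simplified Python model of what rate() computes over a window of counter samples. It is a sketch of the reset handling only. Prometheus's real rate() also extrapolates to the window boundaries, so its numbers differ slightly from this one.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over a window, tolerating resets.

    samples: time-ordered (unix_seconds, counter_value) pairs inside the window.
    Simplification: no extrapolation to the window edges, unlike rate().
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # Counter went down: assume a restart that counted up again from zero.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For the samples (0, 100), (30, 130), (60, 160), (90, 10), (120, 40), the counter drops from 160 to 10 at the restart. The sketch counts those 10 post-restart puts instead of a delta of -150, giving 100 puts over 120 seconds, about 0.83 per second.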

Dashboard Variables

Variables let one dashboard serve many queue managers. Create a variable qmgr with the query label_values(ibmmq_queue_depth, qmgr) so the dropdown lists every queue manager Prometheus has discovered. A second variable, queue, can depend on qmgr: label_values(ibmmq_queue_depth{qmgr="$qmgr"}, queue). Use the Multi-value and Include All options carefully on large estates: selecting all ten thousand queues in one graph can time out the query or freeze the browser. Prefer a Custom variable holding a comma-separated list of tier-1 queue names maintained by the applications team, or a recording rule that precomputes series only for critical queues.
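The behavior of chained variables can be modeled in a few lines of Python: label_values returns the distinct values of one label across matching series, and the dependent variable simply adds an equality matcher. This is an approximation of what Grafana's label_values variable query returns, not its implementation. The series data and the selection limit of 50 are hypothetical.

```python
def label_values(series: list[dict[str, str]], label: str, **matchers: str) -> list[str]:
    """Sorted distinct values of `label` across series matching all equality matchers."""
    values = {
        s[label]
        for s in series
        if label in s and all(s.get(k) == v for k, v in matchers.items())
    }
    return sorted(values)


def tier1_from_csv(csv_text: str) -> list[str]:
    """Parse a comma-separated list of queue names, ignoring blanks and spaces."""
    return [name.strip() for name in csv_text.split(",") if name.strip()]


def guard_selection(selected: list[str], limit: int = 50) -> list[str]:
    """Refuse oversized selections instead of sending a huge query (limit is hypothetical)."""
    if len(selected) > limit:
        raise ValueError(f"{len(selected)} queues selected; limit is {limit}")
    return selected
```

The dependent query label_values(ibmmq_queue_depth{qmgr="$qmgr"}, queue) corresponds to label_values(series, "queue", qmgr=selected_qmgr) in this model.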

Folder Structure and Access Control

  • Folder MQ-Production — on-call dashboards, restricted edit to platform team.
  • Folder MQ-Development — sandbox panels, wider edit for learning PromQL.
  • Service accounts for automation; human SSO for interactive use.
  • Version dashboards as JSON in Git when your team practices dashboard-as-code.

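Read-only viewer access for application teams prevents accidental panel deletion during a bridge call. Train editors to understand the exporter's label scheme before they change PromQL. A query whose labels no longer match returns no series, and a panel built on count or threshold logic can then show a false green.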
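For teams practicing dashboard-as-code, a service account can export dashboards through the Grafana HTTP API (GET /api/dashboards/uid/<uid> with a Bearer token) so they can be committed to Git. The Python sketch below builds that request and normalizes the JSON for stable diffs. The Grafana URL, the uid mq-prod-depth, and the choice to drop the instance-specific id and version fields are assumptions to adapt to your own workflow.

```python
import json
import urllib.request

# Assumption: fields that change per Grafana instance or per save, dropped for clean diffs.
VOLATILE_KEYS = ("id", "version")


def export_request(grafana_url: str, uid: str, token: str) -> urllib.request.Request:
    """Build a GET request for one dashboard, authenticated with a service account token."""
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def to_git_json(api_response: dict) -> str:
    """Extract the dashboard model and serialize it deterministically."""
    dashboard = dict(api_response["dashboard"])
    for key in VOLATILE_KEYS:
        dashboard.pop(key, None)
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    req = export_request("https://grafana.example.com", "mq-prod-depth", "REPLACE_ME")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(to_git_json(json.load(resp)), end="")
```

Sorting keys makes reviews show only real panel changes, which helps when a PromQL edit is the thing under review.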

Grafana Alerting vs Alertmanager

Many sites use Prometheus Alertmanager for MQ rules and Grafana only for visualization. Others migrate to Grafana unified alerting, with contact points for Slack and PagerDuty. Pick one primary path to avoid duplicate pages for the same condition; if both fire, on-call fatigue follows. Document severities: warning when depth reaches seventy percent of MAXDEPTH, critical at ninety percent combined with sustained growth, and maintenance silences during planned channel restarts.
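Before encoding the severity policy in either alerting path, it can help to pin it down unambiguously. This Python sketch applies the thresholds above to a window of utilization samples. "Sustained growth" is approximated here as the last sample being higher than the first, and a maintenance silence is modeled as a simple flag. Both are simplifications to refine for your site.

```python
from typing import Optional


def severity(util_history: list[float], silenced: bool = False) -> Optional[str]:
    """Classify queue utilization (percent of MAXDEPTH) into an alert severity.

    util_history: time-ordered utilization samples for the evaluation window.
    Warning at 70 percent; critical at 90 percent only if depth also grew
    across the window (approximated as last sample > first sample).
    """
    if silenced or not util_history:
        return None
    current = util_history[-1]
    growing = current > util_history[0]
    if current >= 90 and growing:
        return "critical"
    if current >= 70:
        return "warning"
    return None
```

A queue at 91 percent that is draining (for example 95, 93, 91) stays at warning. That matches the intent of paging hard only when the backlog is still building.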

Importing and Customizing Community Dashboards

Grafana.com hosts IBM MQ related dashboard JSON. Import by ID or upload JSON, then open every panel and fix metric names and label keys to match your exporter. Community templates assume default label schemes from a specific exporter release—your Helm chart may differ. Save As into your MQ folder after validation in a non-production Prometheus tenant.
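Checking every panel by hand is tedious on large community dashboards. This Python sketch walks an exported dashboard JSON file, including panels nested inside collapsed rows, and lists the metric names in each panel's queries that your exporter does not provide. It assumes your metric names share the ibmmq_ prefix and only inspects panel targets, not template variables. The file name and the known-metric set are hypothetical.

```python
import json
import re

# Assumption: every exporter metric name starts with "ibmmq_".
METRIC_RE = re.compile(r"\bibmmq_[a-zA-Z0-9_:]*")


def iter_panels(panels):
    """Yield panels, descending into collapsed rows, which nest their own panels."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def unknown_metrics(dashboard: dict, known: set[str]) -> dict[str, set[str]]:
    """Map panel title -> metric names its queries use that the exporter lacks."""
    missing: dict[str, set[str]] = {}
    for panel in iter_panels(dashboard.get("panels", [])):
        for target in panel.get("targets", []):
            absent = set(METRIC_RE.findall(target.get("expr", ""))) - known
            if absent:
                missing.setdefault(panel.get("title", "(untitled)"), set()).update(absent)
    return missing


if __name__ == "__main__":
    with open("community-dashboard.json") as fh:  # hypothetical file name
        dash = json.load(fh)
    print(unknown_metrics(dash, {"ibmmq_queue_depth", "ibmmq_channel_status"}))
```

Build the known set from your exporter's actual /metrics output rather than its documentation, since Helm charts and versions can rename metrics.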

Performance and Cardinality

Each unique label combination in Prometheus is a separate series that Grafana must query and draw. A dashboard that plots every queue on one graph, with the queue name in the legend, can exhaust browser memory when there are thousands of queues. Mitigations: use topk() in PromQL to show only the ten deepest queues; add recording rules that pre-aggregate per queue manager; split dashboards by application domain; and default heavy panels to shorter time ranges. In large environments, set a query timeout on the Prometheus data source and lower Max data points in the panel query options.
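The effect of these mitigations is easy to estimate with a small Python sketch. Counting distinct label sets tells you how many lines a panel would try to draw, and the topk helper mirrors what a query such as topk(10, ibmmq_queue_depth) keeps. The queue names and depths in the usage example are made-up data.

```python
import heapq


def series_count(series: list[dict[str, str]]) -> int:
    """Number of distinct label sets, i.e. the lines a panel would draw."""
    return len({tuple(sorted(s.items())) for s in series})


def topk(k: int, depths: dict[str, float]) -> list[tuple[str, float]]:
    """Keep only the k largest values, like PromQL topk(k, ...)."""
    return heapq.nlargest(k, depths.items(), key=lambda item: item[1])


if __name__ == "__main__":
    depths = {f"APP.Q{i:04d}": float(i % 97) for i in range(10000)}
    print(len(depths), "series before topk;", len(topk(10, depths)), "after")
```

Ten thousand queues become ten lines, which is the difference between a panel that renders instantly and one that stalls the browser.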

z/OS and Hybrid Dashboards

When exporters run on Linux gateways reading z/OS queue managers, label qmgr with the logical name operators recognize. Add a panel for exporter-to-QM round-trip latency if exposed. Split z/OS and distributed queue managers into different rows so capacity planners see platform context. Link dashboard annotations to change tickets when channels are bounced during maintenance.

Explainer: Wall of Graphs in the Control Room

Grafana is the wall of graphs in a spacecraft control room—it does not fly the ship, but everyone looks at it to see if fuel (queue depth) is draining faster than expected.

Explain Like I'm Five: Grafana

Grafana is a coloring book that draws pictures from numbers someone else counted. Prometheus counts how many marbles are in the jar; Grafana draws the jar getting fuller over the day.

Practice Exercises

Exercise 1

Sketch three panels for one tier-1 payment queue: depth, utilization percent, put rate.

Exercise 2

List five label keys you would allow on MQ metrics versus five you would forbid and why.

Exercise 3

Write a runbook step: Grafana shows depth flat at zero but Console shows CURDEPTH 5000—what do you check first?

Frequently Asked Questions

Test Your Knowledge

1. Grafana's primary role for MQ is:

  • Visualization
  • Defining channels
  • Writing messages to QLOCAL
  • Running strmqm

2. Typical MQ metrics data source in Grafana is:

  • Prometheus
  • JES spool only
  • COBOL copybook
  • CICS BMS map

3. Dashboard variables help operators:

  • Filter by queue manager or queue
  • Delete DLQ messages
  • Change MAXDEPTH
  • Issue COMMIT

4. Grafana alert rules often forward to:

  • PagerDuty, email, Slack
  • Only FTP
  • z/OS JCL catalog
  • CICS transient data

Published
Read time: 22 min
Author: MainframeMaster
Verified: IBM MQ 9.3 documentation, Grafana Labs docs