Some Assembly Required: A Prometheus Stack for IBM Power
The Allen Key Is In The Box. The Box Is This Newsletter.
Over the last few weeks I've been writing about the parts: node_exporter on AIX, node_exporter on Linux on Power, Prometheus itself ported to AIX. Each one is a piece of a thing, and the thing is a working monitoring stack for an IBM Power environment. Imagine the flatpack arrived in three deliveries, on three different days, each one missing a different sheet of instructions.
This week, let's assemble it all. Like an IKEA wardrobe, except the instructions are in English, the screws are all accounted for, and nobody ends up sleeping on the floor.
If you already have nmon/njmon running on your AIX boxes — and you probably do — that's a perfectly good wardrobe you don't have to throw out. There's a way to bolt it onto the new one. That's at the end. Stick around.
Step 1: Decide Where The Wardrobe Goes
Prometheus needs to live somewhere. Pick the wall before you start drilling, because moving it later is annoying. The somewhere has three constraints, in order of how much they actually matter:
Network reach. Whatever host runs Prometheus has to be able to open TCP connections to every server you want to scrape. If your AIX LPARs are on one VLAN and your x86 monitoring infrastructure is on another with a firewall between them, you'll be filing tickets. Putting Prometheus inside the Power environment — on a Linux on Power LPAR or on AIX itself — usually saves you that fight.
Disk. Prometheus writes a lot. Plan for ~1-2 bytes per sample, a ~15s scrape interval, and ~500 samples per scrape per node_exporter target. That works out to a few GB per host per year of retained data; the default retention is only 15 days, but you'll want longer than that, and you'll want headroom. SSD or fast SAN. Don't put the TSDB on NFS. It will hurt you.
CPU and RAM. Modest. A single Prometheus instance scraping a hundred hosts every 15 seconds will run happily in 2 VPs and 4 GB. Scale up from there only when you have evidence you need to.
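The disk estimate above is just arithmetic. Here's a sketch of it as a script — every number in it is an assumption taken from this section, not a measurement from your fleet:

```shell
# Back-of-the-envelope TSDB sizing:
# samples/sec = hosts * samples_per_scrape / scrape_interval
# bytes       = samples/sec * bytes_per_sample * retention_seconds
hosts=100
samples_per_scrape=500
scrape_interval=15    # seconds
bytes_per_sample=2    # pessimistic end of Prometheus' ~1-2 B/sample
retention_days=15     # the default

samples_per_sec=$(( hosts * samples_per_scrape / scrape_interval ))
bytes=$(( samples_per_sec * bytes_per_sample * retention_days * 86400 ))
echo "$(( bytes / 1024 / 1024 )) MiB for ${hosts} hosts over ${retention_days} days"
```

Swap in your own host count and retention before believing the output, and then double it for headroom.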
What does that mean concretely?
Linux on Power LPAR — my default recommendation. Same architecture as your AIX boxes, so it sits right next to them on the network. Standard prometheus package from your distro or from the Prometheus site. Boring. Works.
AIX LPAR — use the port from last week's newsletter. Useful when you want everything inside AIX and not depending on a Linux jump box. Less boring. Beta quality warning.
x86 Linux VM — fine if it has clean network reach into your Power environment. Most shops already have one running for the rest of their fleet, and adding Power targets to an existing Prometheus is the path of least resistance.
Pick one. Don't overthink it. You can move the wardrobe later — the config is a YAML file and the data is a directory.
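To make "you can move it later" concrete: the whole move is one config file and one directory. A sketch with hypothetical /tmp stand-in paths — on a real system you'd stop Prometheus first and use rsync -a instead of cp:

```shell
# Prometheus state is one YAML file plus one data directory. Simulate a
# move with throwaway paths (real paths depend on your install).
src=/tmp/prom-old
dst=/tmp/prom-new

mkdir -p "$src/data"
echo 'scrape_configs: []' > "$src/prometheus.yml"

# Real life: systemctl stop prometheus; rsync -a "$src"/ newhost:"$dst"/
rm -rf "$dst"
cp -r "$src" "$dst"

ls "$dst"
```

Start Prometheus on the new host pointing at the copied directory and the TSDB picks up where it left off.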
Step 2: Attach A node_exporter To Every Panel
For each AIX and Linux on Power host you want to monitor, drop node_exporter in, run it, point Prometheus at port 9100. That's the whole shape. These are the dowels — small, identical, and you need one in every hole.
On a single host, by hand, it looks like this on AIX:
mkdir -p /opt/node_exporter
cd /opt/node_exporter
gzip -dc /tmp/node_exporter-aix-ppc64.tar.gz | tar xf -
./node_exporter &
And then in /etc/inittab:
nodeexp:2:respawn:/opt/node_exporter/node_exporter >/dev/console 2>&1
On Linux on Power, it's the same idea with systemd instead of inittab. Doing this fifty times by hand is the part where you give up and call your spouse for help. So:
Step 3: Ansible, The Cordless Drill
Here's a minimal playbook that handles both AIX and Linux on Power. The conditional logic is unavoidable — the platforms genuinely differ in how services start — but it's all in one place.
---
- name: Deploy node_exporter on IBM Power
  hosts: all
  gather_facts: true
  become: true
  vars:
    node_exporter_version: "1.11.1"
    node_exporter_user: "nodeexp"
    node_exporter_dir: "/opt/node_exporter"
  tasks:
    - name: Create node_exporter user (AIX)
      ibm.power_aix.user:
        state: present
        name: "{{ node_exporter_user }}"
        change_passwd_on_login: false
        attributes:
          home: "{{ node_exporter_dir }}"
          shell: /bin/false
      when: ansible_system == "AIX"

    - name: Create node_exporter user (Linux on Power)
      ansible.builtin.user:
        name: "{{ node_exporter_user }}"
        system: true
        shell: /bin/false
        home: "{{ node_exporter_dir }}"
        create_home: false
      when: ansible_system == "Linux"

    - name: Create install directory
      ansible.builtin.file:
        path: "{{ node_exporter_dir }}"
        state: directory
        owner: "{{ node_exporter_user }}"
        mode: "0755"

    - name: Copy node_exporter binary (AIX)
      ansible.builtin.copy:
        src: "files/node_exporter-aix-ppc64"
        dest: "{{ node_exporter_dir }}/node_exporter"
        owner: "{{ node_exporter_user }}"
        mode: "0755"
      when: ansible_system == "AIX"

    - name: Copy node_exporter binary (Linux on Power)
      ansible.builtin.copy:
        src: "files/node_exporter-linux-ppc64le"
        dest: "{{ node_exporter_dir }}/node_exporter"
        owner: "{{ node_exporter_user }}"
        mode: "0755"
      when: ansible_system == "Linux"

    - name: Install systemd unit (Linux)
      ansible.builtin.copy:
        dest: /etc/systemd/system/node_exporter.service
        content: |
          [Unit]
          Description=Prometheus node_exporter
          After=network.target

          [Service]
          User={{ node_exporter_user }}
          ExecStart={{ node_exporter_dir }}/node_exporter
          Restart=on-failure

          [Install]
          WantedBy=multi-user.target
      when: ansible_system == "Linux"
      notify: restart node_exporter linux

    - name: Add node_exporter to inittab (AIX)
      ibm.power_aix.inittab:
        state: present
        name: nodeexp
        runlevel: "2"
        action: respawn
        command: "{{ node_exporter_dir }}/node_exporter >/dev/null 2>&1"
      when: ansible_system == "AIX"
      notify: reload inittab

  handlers:
    - name: restart node_exporter linux
      ansible.builtin.systemd:
        name: node_exporter
        state: restarted
        enabled: true
        daemon_reload: true

    - name: reload inittab
      ansible.builtin.command: init q
That's the whole deployment story for ten or ten thousand hosts.
Then point Prometheus at them:
scrape_configs:
  - job_name: 'power-aix'
    static_configs:
      - targets:
          - aix01.example.com:9100
          - aix02.example.com:9100
  - job_name: 'power-linux'
    static_configs:
      - targets:
          - lop01.example.com:9100
          - lop02.example.com:9100
Should I write a playbook that adds all your AIX and Linux on Power servers to this config for you? Tell me.
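In the meantime, a target list is easy enough to generate from a plain host file. A minimal sketch — the host names and the /tmp path are stand-ins for your real inventory:

```shell
# Turn a newline-separated host list into static_configs target lines.
# /tmp/power-hosts.txt is a hypothetical export of your inventory.
printf '%s\n' aix01.example.com aix02.example.com > /tmp/power-hosts.txt

targets=""
while read -r host; do
  targets="${targets}          - ${host}:9100
"
done < /tmp/power-hosts.txt

printf '%s' "$targets"
```

Paste the output under the right job's targets: key, or template the whole file with Jinja2 if this lives in Ansible anyway.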
Reload Prometheus, query up, see a row of ones, profit.
Step 4: The Wardrobe You Already Own
Here's the thing. If you've been running AIX for any length of time, you probably already have njmon. Probably with crontabs and dashboards and people who have opinions about it. That's the perfectly good wardrobe from the intro. Don't throw it out — bolt it on.
node_exporter gives you the standard ~150 metrics that Prometheus people expect. njmon gives you on the order of 1500 — every disk, every adapter, every CPU, every workload class. It's much more detailed than node_exporter, especially for the Power-specific stuff that the Linux-world node_exporter doesn't know exists.
You don't have to choose. You can run both, and you can feed njmon data into the same Prometheus.
There are two routes. Pick one.
Route A: Telegraf as a translator (the well-trodden path)
This is the route Nigel Griffiths himself documents on the IBM site, and it's the one most shops end up on. The shape:
njmon (or better, nimon — same tool, InfluxDB Line Protocol output) runs on each AIX host and pushes its metrics out over the network.
One telegraf instance — could be the same Linux-on-Power or AIX box hosting Prometheus — receives those pushes from all your AIX hosts on a single port via the inputs.influxdb_listener plugin. You don't need a Telegraf per AIX node; one funnel handles the whole fleet.
Telegraf re-exposes the metrics as a Prometheus-format /metrics endpoint via the outputs.prometheus_client plugin.
Prometheus scrapes Telegraf like any other target.
The funnel: AIX → push → Telegraf → pull ← Prometheus.
On the AIX side:
nimon -s 30 -c 2880 -i telegraf-host.example.com -p 8080
That tells nimon to send a sample every 30 seconds, for 2880 iterations (24 hours), to your Telegraf host on port 8080. Run it from /etc/inittab for production.
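The inittab entry for that could look like the sketch below — the nimon path is an assumption, so adjust it to wherever you installed the binary. Note that respawn is doing real work here: -c 2880 makes nimon exit after 24 hours, and init restarts it for the next day's run.

```
nimon:2:respawn:/opt/nimon/nimon -s 30 -c 2880 -i telegraf-host.example.com -p 8080 >/dev/null 2>&1
```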
On the Telegraf side, a minimal telegraf.conf:
[[inputs.influxdb_listener]]
service_address = ":8080"
[[outputs.prometheus_client]]
listen = ":9273"And in Prometheus:
- job_name: 'aix-njmon'
scrape_interval: 60s
static_configs:
- targets: ['telegraf-host.example.com:9273']
A note on topology: all your nimon clients push to the same Telegraf listener, and Prometheus scrapes that one endpoint. Put it on the same Linux on Power LPAR as Prometheus and you're done. If you'd rather run it on AIX itself — say, to keep the whole stack inside AIX — the AIX BFF is here: https://dl.power-devops.com/powerdevops.telegraf.1.29.4.1.bff.gz.
A note on scrape timing: njmon is pushing every 30 seconds; Prometheus should scrape Telegraf at 1.5–2× that interval (so 60s is fine) to avoid catching half-written batches.
Route B: njmon_exporter (the direct path)
There's also njmon_exporter — a Go exporter purpose-built for this. It listens on port 8086 for njmon JSON pushes and re-exposes them on 9772 for Prometheus. No Telegraf in the middle.
Worth knowing before you reach for it: the README still says WIP — currently in development, and the repo has 5 stars. It works, and it's a smaller moving-parts count than the Telegraf approach, but it has fewer eyes on it. If your shop already runs Telegraf for other reasons, Route A is the boring choice. If you want one less daemon and you don't mind reading Go when something breaks, Route B is right there.
Either way, the AIX side stays the same — njmon -k -s 60 -i <collector> -p <port> from inittab or cron — and you end up with the full ~1500-metric firehose available in the same Prometheus, queryable next to your node_exporter data.
Common Europe Congress 2026 is coming!
The agenda is published! Do you want to know where AIX is going? Then you MUST visit the Common Europe Congress in Lyon, France. There will be sessions about new AIX features and open source community development. We will talk about AIX and IBM Power automation and Zero Downtime for AIX. Join me in Lyon!
Step Back And Look At It
If you've followed along over the last few weeks:
A Prometheus server, somewhere it can reach your Power LPARs.
node_exporter on every AIX and Linux on Power host, scraped on :9100.
(Optional but recommended) njmon data flowing into the same Prometheus via Telegraf or njmon_exporter, giving you the deep AIX-internal metrics that node_exporter doesn't expose.
One Grafana, one PromQL, one alerting setup, one place to look.
Point a Grafana at this Prometheus and pick from the existing node_exporter dashboards. They mostly work as-is — the metric names are the same on AIX as everywhere else, which was the entire point of using node_exporter in the first place. The njmon metrics will be named differently (whatever njmon/Telegraf chose for them), so you'll either build dashboards for those or import community ones.
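If you'd rather start from a blank panel, here are a couple of starter queries, assuming only the standard node_exporter metric names — which, per the above, are the same on AIX as everywhere else:

```promql
# CPU busy % per host, 5-minute average
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

# Filesystem fill %
100 * (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes)
```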
That's the stack. It's the same Prometheus everyone else runs. It just happens to be looking at IBM Power. No leftover screws.
If you build this and it works, tell me. If you build this and something breaks in a way I didn't warn you about, tell me — that's how the next newsletter writes itself.
Have fun monitoring IBM Power with Prometheus!
Andrey
Hi, I am Andrey Klyachkin, IBM Champion and IBM AIX Community Advocate. This means I don’t work for IBM. Over the last twenty years, I have worked with many different IBM Power customers all over the world, both on premises and in the cloud. I specialize in automating IBM Power infrastructures, making them even more robust and agile. I co-authored several IBM Redbooks and IBM Power certifications. I am an active Red Hat Certified Engineer and Instructor.
Follow me on LinkedIn, Twitter and YouTube.
You can meet me at events like IBM TechXchange, the Common Europe Congress, and GSE Germany’s IBM Power Working Group sessions.