This project was carried out during my final-year internship with the MessMass team at Michelin. The core issue was clear: other IT teams had very little visibility into the actual state of our middleware and often opened tickets without knowing whether the problem came from their applications… or from us.
I designed and implemented a status page in Grafana to monitor critical middleware in real time, visualize incident history, and give consuming teams an immediate answer to a simple question: “Is this incident coming from MessMass infrastructure or from my own perimeter?”
As part of the OneSystem IT Platforms program, the MessMass team needed to improve middleware observability (EDA, MFT, EAI, IFE/M2I, ETL) and reduce “unnecessary” tickets. The status page had to:
When I joined, Grafana had just been introduced at Michelin. I had to learn the tool mostly on my own and gradually build a complete, maintainable status page that would act as a reference implementation for future Grafana work within the company.
The status page is built on a unified observability architecture:
Across iterations, the dashboard evolved from a simple binary up/down view into a structured, domain-based page combining current status, incident history and service indicators in a single interface designed for real-world users.
Development went through several stages: a first PoC based on the Blackbox Exporter, then the introduction of richer statuses, domain-based organization, and enrichment with other teams’ data. The final version, validated by the HIP squad, became a reference status page for future Grafana usage at Michelin.
Although my internship ended right as the solution was entering production, the intended impact is clear: fewer unnecessary tickets, better understanding of middleware health by consuming teams, and easier diagnostics thanks to a single, consolidated view of the most important information.
To show how the status page evolved across iterations, here are some representative screenshots of the different dashboard versions.
PoC Version
First proof of concept, limited to availability checks via the
Blackbox Exporter. Statuses are binary (up/down) and the interface is very clean.
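To make the binary up/down idea concrete, here is a minimal sketch in Python of how a probe result could be turned into a status. It relies on the `probe_success` gauge that the Blackbox Exporter exposes (1 for success, 0 for failure). The function name `probe_status`, the sample payload, and the choice to treat a missing metric as "down" are assumptions made for illustration, not the actual PoC code, which lived in Grafana panels.

```python
def probe_status(metrics_text: str) -> str:
    """Return 'up' or 'down' from Prometheus text-format probe output."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # Each sample line is "<metric name> <value>".
        name, _, value = line.rpartition(" ")
        if name == "probe_success":
            return "up" if float(value) == 1.0 else "down"
    # Assumption: if the metric is absent, the target is treated as down.
    return "down"


# Hypothetical probe output, shortened for the example.
SAMPLE = """# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.042
"""

print(probe_status(SAMPLE))
```

A view like this can only say whether an endpoint answers, which is why later versions introduced richer statuses.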
First version
Introduction of dynamic cards for each environment (Prod, Dev, Indus).
Visual structure starts to align with how teams actually work.
Test version
Experimentation with thresholds and color codes. Sections are
reorganized and new indicators appear (number of nodes in error,
etc.).
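As an illustration of the threshold and color-code experiments, the sketch below maps the number of nodes in error to a status and a color. The threshold value (`down_ratio`), the status labels, and the colors are hypothetical; the values actually tested on the dashboard are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    label: str
    color: str


def classify(nodes_in_error: int, total_nodes: int, down_ratio: float = 0.5) -> Status:
    """Map a count of failing nodes to a status card (hypothetical thresholds)."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    ratio = nodes_in_error / total_nodes
    if ratio >= down_ratio:
        # Half of the nodes or more are failing: the service is considered down.
        return Status("down", "red")
    if ratio > 0:
        # Some nodes are failing, but the service still runs.
        return Status("degraded", "orange")
    return Status("operational", "green")


print(classify(1, 4))
```

Moving from two states to three is what lets a card show a partially failing cluster without raising a full outage.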
Final version
Version validated by the squad, presenting statuses, incident history,
SLI/SLO indicators, and an information banner for each functional
domain.