Steve's thoughts and experiments

Engineering Lead. Mentor. AI Observability Researcher.

Illuminating the Black Box. Observability & AI Operations. Reliability. Strategy. Making AI Observable.


Latest 3 Posts ↓

View all posts →

Lab 6: The Golden Signals of LLM Operations

In Lab 5, we turned the lights on. We instrumented our agent with OpenTelemetry and visualised the execution traces in .NET Aspire. We can see what happened.

But in a production system, "seeing what happened" isn't enough. You need to know if the system is healthy. In traditional software engineering, we rely on Google's SRE Golden Signals: Latency, Traffic, Errors, and Saturation.

Do these apply to Stochastic Parrots? Yes, but they require translation. In this lab, we will define the operational dimensions of an LLM Agent and implement custom metrics to track them.
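To give a flavour of what "custom metrics" means here, below is a minimal sketch, not the lab's actual code, using .NET's System.Diagnostics.Metrics API (which the OpenTelemetry SDK can export). The meter name, instrument names, tag keys and values are illustrative assumptions.

    // Sketch: custom counters/histograms for LLM traffic, latency and errors.
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Diagnostics.Metrics;
    using System.Threading.Tasks;

    var meter = new Meter("MyAgent.Llm"); // illustrative meter name

    // Traffic / saturation proxy: how many tokens flow through the model.
    var tokenCounter = meter.CreateCounter<long>("llm.tokens.consumed", unit: "{token}");

    // Latency: duration per model call, recorded as a histogram so percentiles survive.
    var requestDuration = meter.CreateHistogram<double>("llm.request.duration", unit: "ms");

    // Errors: refusals, timeouts, malformed tool calls, etc.
    var errorCounter = meter.CreateCounter<long>("llm.request.errors");

    // Record one simulated request so the sketch runs end to end.
    var sw = Stopwatch.StartNew();
    await Task.Delay(120); // stand-in for the real model call
    sw.Stop();

    var model = new KeyValuePair<string, object?>("gen_ai.request.model", "local-model");
    tokenCounter.Add(256, model);
    requestDuration.Record(sw.Elapsed.TotalMilliseconds, model);

    Console.WriteLine($"Recorded {sw.ElapsedMilliseconds} ms for one simulated call.");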

Read More

Metrics That Matter: Monitoring AI Model Performance

You've built an AI agent. It's deployed. It's answering questions and processing requests. But how do you know if it's working well? Traditional application monitoring gives you some signals, but AI systems introduce unique challenges that require us to rethink what we measure.

In this post, we'll define the operational metrics that truly matter for LLMs and agentic workflows, grounded in the industry-standard SRE Golden Signals framework.

Read More

Lab 5: Instrumenting Your First LLM with OpenTelemetry

In previous labs, we built the "brains" of the machine. We explored embeddings, set up a vector database, and even constructed a functional RAG pipeline. But there is a problem lurking in LLM development: The Black Box.

You send a prompt, you wait (sometimes an agonising amount of time), and then you get a response. But what is happening inside the LLM?

  • How many tokens did that specific step use?
  • Why did the latency spike?
  • Did the model actually see the system prompt correctly?

In this lab, we'll lay the foundation of AI Observability by instrumenting your first LLM with OpenTelemetry (OTel), turning that "invisible" processing time into structured, analysable data.
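As a rough sketch of the idea (not the lab's actual code): in .NET, an OpenTelemetry span is created through System.Diagnostics.ActivitySource, so an LLM call can be wrapped in an Activity that carries latency, prompt size and token usage. The source name, attribute keys and the fake CallModelAsync helper below are illustrative assumptions.

    // Sketch: wrapping an LLM call in an Activity (an OpenTelemetry span in .NET).
    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    internal static class Program
    {
        private static readonly ActivitySource Source = new("MyAgent.Llm"); // illustrative name

        private static async Task Main()
        {
            Console.WriteLine(await CompleteAsync("Why is the sky blue?"));
        }

        private static async Task<string> CompleteAsync(string prompt)
        {
            // StartActivity returns null unless a listener (e.g. the OTel SDK
            // wired up by Aspire's service defaults) is subscribed to this source.
            using var activity = Source.StartActivity("llm.completion");
            activity?.SetTag("gen_ai.request.model", "local-model");
            activity?.SetTag("llm.prompt.length", prompt.Length);

            var (text, outputTokens) = await CallModelAsync(prompt);

            activity?.SetTag("gen_ai.usage.output_tokens", outputTokens);
            return text;
        }

        // Stand-in for a real chat-completion client, so the sketch runs on its own.
        private static async Task<(string Text, int OutputTokens)> CallModelAsync(string prompt)
        {
            await Task.Delay(100);
            return ($"Echo: {prompt}", prompt.Length / 4);
        }
    }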

Read More

8 more posts can be found in the archive.