OpenAI Agents SDK

This guide explains how we built simple monitoring for AI agents using the OpenAI Agents SDK. We send every important action to AIceberg for safety checks before continuing.

What is the OpenAI Agents SDK?

Understanding what an agent does and why we need to monitor it

The OpenAI Agents SDK helps you build AI agents that can think through problems and use tools. Unlike a simple chatbot that just responds, an agent:

  • Reads your question

  • Thinks about what information it needs

  • Uses tools to get that information

  • Thinks again about the answer

  • Gives you a final response

Because the agent does many things automatically, we need to watch what it does at every step to make sure nothing bad happens.
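
For context, here is a minimal sketch of the kind of agent this guide monitors, built with the SDK's Agent, Runner, and function_tool. The agent name and instructions are illustrative; the add tool mirrors the worked example later in this guide.

```python
from agents import Agent, Runner, function_tool

@function_tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

agent = Agent(
    name="Math Agent",  # name is illustrative
    instructions="Answer math questions. Use the add tool for arithmetic.",
    tools=[add],
)

# The agent reads the question, decides to call add(10, 5), then writes the answer.
result = Runner.run_sync(agent, "What is 10 plus 5?")
print(result.final_output)
```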

How Monitoring Works with Hooks

Hooks are checkpoints where the agent tells us what it is about to do

The SDK gives us special functions called hooks. Think of hooks like alarm bells that ring at important moments. When the bell rings, we can check if everything is safe before letting the agent continue.

The SDK has six hooks:

  • on_agent_start — fires when the agent initializes; we send nothing to AIceberg, we just log the agent name

  • on_llm_start — fires before asking the AI model; we check the user question plus what we are about to send to the AI

  • on_llm_end — fires after the AI responds; we check what the AI decided to do

  • on_tool_start — fires before using a tool; we check whether the tool call is safe

  • on_tool_end — fires after the tool finishes; we check whether the tool result is safe

  • on_agent_end — fires before showing the user the answer; we check whether the final answer is safe
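
Putting the six hooks together, the monitor is a subclass of the SDK's RunHooks with one method per checkpoint. The sketch below is our skeleton; the class name and the saved attributes are our own choices, and each method is filled in section by section below.

```python
from agents import RunHooks

class AicebergMonitor(RunHooks):
    """Sends every checkpoint to AIceberg before the agent continues."""

    def __init__(self):
        self.agent_name = None
        self.user_event_id = None   # links the user question to the final answer
        self.llm_event_id = None    # links an LLM input to its output
        self.tool_event_id = None   # links a tool call to its result

    async def on_agent_start(self, context, agent): ...
    async def on_llm_start(self, context, agent, system_prompt, input_items): ...
    async def on_llm_end(self, context, agent, response): ...
    async def on_tool_start(self, context, agent, tool): ...
    async def on_tool_end(self, context, agent, tool, result): ...
    async def on_agent_end(self, context, agent, output): ...
```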

What We Monitor at Each Hook

Details about what information we send to AIceberg at each checkpoint

1. on_agent_start — Agent Initializes

What we get from the SDK:

  • context — current execution context

  • agent — the agent object with name, tools, and instructions

What we do:
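
A minimal sketch of this hook, continuing the AicebergMonitor skeleton above (the saved attribute name is our own):

```python
async def on_agent_start(self, context, agent):
    # Nothing goes to AIceberg yet; just remember the agent name for later hooks.
    self.agent_name = agent.name
```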

Notes:

  • We do NOT send anything to AIceberg here. We do not have the user question yet. We just remember the agent name for later. The user question comes in the next hook.

2. on_llm_start — Before Asking the AI Model (THIS IS WHERE WE GET THE USER QUESTION)

What we get from the SDK:

  • context — current state

  • agent — the agent object with tools

  • system_prompt — instructions for the AI

  • input_items — conversation history (THIS HAS THE USER QUESTION)

What we do:
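
A sketch of this hook, continuing the skeleton above. The _send_event helper is our own stand-in for whatever posts the event to AIceberg and returns its event id; the exact shape of the items in input_items is an assumption (user messages as dicts with a role and content).

```python
async def on_llm_start(self, context, agent, system_prompt, input_items):
    # Pull the latest user message out of the conversation history.
    user_question = next(
        (item.get("content") for item in reversed(list(input_items))
         if isinstance(item, dict) and item.get("role") == "user"),
        None,
    )

    # First time only: report the user -> agent interaction.
    if user_question and self.user_event_id is None:
        self.user_event_id = self._send_event(
            event_type="user_agt", direction="input",
            payload={"user_question": user_question},
        )

    # Every time: report the agent -> LLM input, built from the agent object.
    tools = [{"name": t.name, "description": t.description} for t in agent.tools]
    self.llm_event_id = self._send_event(
        event_type="agt_llm", direction="input",
        payload={
            "agent": agent.name,
            "system_prompt": system_prompt,
            "tools": tools,
            "input_items": list(input_items),
        },
    )
```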

Why we build it this way:

  • We extract user question from input_items because that is where the SDK puts it

  • We get agent metadata from the agent object because it has all the configuration

  • We extract tool schemas from agent.tools to show AIceberg what tools are available

  • We include full input_items for complete conversation history

We send TWO events here: user question (first time only, extracted from input_items) and LLM input (every time, built from agent object)

Why Two Separate Events?

  • user_agt event — Focuses on user to agent interaction (checks if user is asking something harmful)

  • agt_llm event — Focuses on agent to LLM interaction (checks what we send to the AI model)

This separation allows different policies and clearer blocking points.

3. on_llm_end — After AI Model Responds

What we get from the SDK:

  • context — has usage stats

  • agent — agent object

  • response — what the AI decided

What we do:
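
A sketch of this hook, continuing the skeleton above (_send_event and its linked_event_id parameter are our own names):

```python
async def on_llm_end(self, context, agent, response):
    # Report what the model decided, linked to the input event saved in on_llm_start.
    self._send_event(
        event_type="agt_llm", direction="output",
        payload={"llm_output": str(response.output)},
        linked_event_id=self.llm_event_id,
    )
```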

Notes:

  • We use response.output to get what the AI decided

  • We link it to the input event using the event_id we saved earlier

  • We do not send usage stats because AIceberg does not need them

4. on_tool_start — Before Using a Tool

What we get from the SDK:

  • context — ToolContext object with everything we need

  • agent — agent object

  • tool — the tool object

What we do:
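
A sketch of this hook, continuing the skeleton above; the tool fields come straight from the ToolContext:

```python
async def on_tool_start(self, context, agent, tool):
    # The ToolContext already carries the structured call data, so no extraction is needed.
    self.tool_event_id = self._send_event(
        event_type="agt_tool", direction="input",
        payload={
            "tool_name": context.tool_name,
            "tool_call_id": context.tool_call_id,
            "tool_arguments": context.tool_arguments,
        },
    )
```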

Why we use context here:

  • ToolContext already has tool_name, tool_call_id, and tool_arguments

  • No need to extract from the tool object

  • The framework already structured it perfectly for us

We do not send usage stats here because they are not needed for safety checks.

5. on_tool_end — After Tool Finishes

What we get from the SDK:

  • context — ToolContext object

  • agent — agent object

  • tool — the tool object

  • result — what the tool returned

What we do:
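
A sketch of this hook, continuing the skeleton above:

```python
async def on_tool_end(self, context, agent, tool, result):
    # Report what the tool returned, linked back to the tool-call input event.
    self._send_event(
        event_type="agt_tool", direction="output",
        payload={
            "tool_name": context.tool_name,
            "tool_call_id": context.tool_call_id,
            "result": str(result),
        },
        linked_event_id=self.tool_event_id,
    )
```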

Notes:

  • We still use context for tool_name and tool_call_id

  • We add the result parameter to show what the tool returned

  • We link it back to the tool input using the saved event_id

6. on_agent_end — Final Answer

What we get from the SDK:

  • context — final state

  • agent — agent object

  • output — the final answer

What we do:
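
A sketch of this hook, continuing the skeleton above:

```python
async def on_agent_end(self, context, agent, output):
    # Report the final answer, linked back to the original user question event.
    self._send_event(
        event_type="user_agt", direction="output",
        payload={"final_answer": str(output)},
        linked_event_id=self.user_event_id,
    )
```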

Notes:

  • We use the output parameter directly

  • We link back to the original user question using the saved event_id

  • This completes the full circle from question to answer

When We Use Context vs Agent Object

Understanding why we get data from different places at different times

We Use Agent Object When:

  • We need agent metadata like name and instructions

  • We need the list of available tools

  • We need tool schemas with parameters

Example:
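
For example, something like the snippet below. Whether each tool exposes a params_json_schema attribute is an assumption about the SDK's FunctionTool, so we read it defensively.

```python
agent_metadata = {
    "name": agent.name,
    "instructions": agent.instructions,
    "tools": [
        {
            "name": t.name,
            "description": t.description,
            "parameters": getattr(t, "params_json_schema", None),  # assumption
        }
        for t in agent.tools
    ],
}
```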

We Use Context When:

  • We need structured data the framework already prepared

  • For tools: tool_name, tool_call_id, tool_arguments are ready

  • No extra work needed to extract the data

Example:
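
For example, using the fields the ToolContext already provides:

```python
tool_call = {
    "tool_name": context.tool_name,
    "tool_call_id": context.tool_call_id,
    "tool_arguments": context.tool_arguments,
}
```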

Why We Filter Out Usage Stats:

The context object has usage information like token counts and costs. We do not send this to AIceberg because:

  • AIceberg checks for safety, not cost tracking

  • Usage stats cannot be harmful

  • Keeping payloads smaller makes everything faster

How AIceberg Responds

What happens after we send an event to AIceberg

After we send each event, AIceberg sends back a response like:
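
The exact response format depends on your AIceberg setup; a minimal illustration with the two fields we rely on (the event id we save for linking and the event_result) might look like:

```python
response = {
    "event_id": "evt_001",     # saved so we can link the matching output event
    "event_result": "passed",  # or "blocked"
}
```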

We check the event_result:

  • If "passed" — everything is safe, continue

  • If "blocked" — stop immediately, raise error

When something is blocked, we stop right there. The agent does not continue. The user gets an error message instead of a dangerous response.
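
In code, this is a simple guard inside each hook; a sketch (the exception type and message are our own choices):

```python
if response["event_result"] == "blocked":
    # Raise before the agent is allowed to continue; the user sees an error instead.
    raise RuntimeError("Blocked by AIceberg safety policy")
```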

Example: Complete Run

Following one question through all 8 checkpoints (the two LLM hooks fire twice because the agent calls the model once to choose the tool and once to write the final answer)

User asks: "What is 10 plus 5?"

Step 1 — on_agent_start

What We Send: Nothing (just log)
Result: Continue

Step 2 — on_llm_start

What We Send: User question + agent metadata
Result: Passed

Step 3 — on_llm_end

What We Send: AI wants to call the "add" tool
Result: Passed

Step 4 — on_tool_start

What We Send: add(10, 5)
Result: Passed

Step 5 — on_tool_end

What We Send: Tool result (15)
Result: Passed

Step 6 — on_llm_start (again)

What We Send: Agent metadata + updated conversation
Result: Passed

Step 7 — on_llm_end

What We Send: AI final answer text
Result: Passed

Step 8 — on_agent_end

What We Send: "10 plus 5 is 15"
Result: Passed

All checks passed, so the user gets their answer safely.

How to Use the Monitor

Simple code to add monitoring to your agent

Basic usage:
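
A sketch of wiring the monitor into a run, assuming the monitor class is exported from aiceberg_monitor under the name used in this guide and that you pass it through the Runner's hooks parameter:

```python
from agents import Agent, Runner
from aiceberg_monitor import AicebergMonitor  # class name assumed

agent = Agent(name="Math Agent", instructions="Answer math questions.")

# Every hook fires automatically during the run; the agent itself is unchanged.
result = Runner.run_sync(agent, "What is 10 plus 5?", hooks=AicebergMonitor())
print(result.final_output)
```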

The monitoring happens automatically. You do not need to change your agent code at all.

What About the Logging Version?

We have two versions of the monitor

There are two files:

  • aiceberg_monitor.py — Simple monitoring only

  • aiceberg_monitor_with_logging.py — Same monitoring + saves to JSON file

The logging version does the exact same monitoring. It just also saves everything to a file so you can debug and review what happened later. The logging is for your own debugging, not for AIceberg.

Using the logging version:
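
Same idea, just importing from the logging variant instead (class name assumed; the agent is defined as in the previous example):

```python
from agents import Runner
from aiceberg_monitor_with_logging import AicebergMonitor  # class name assumed

# Same run as before; events and responses are also written to a JSON file.
result = Runner.run_sync(agent, "What is 10 plus 5?", hooks=AicebergMonitor())
```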

This creates a file with all events and responses for you to look at later.

Important Settings

Configuration that makes everything work

Environment Variables

The monitor needs these environment variables set:
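
For illustration only; the actual variable names depend on your monitor and AIceberg account, so treat the AIceberg names below as hypothetical placeholders:

```python
import os

# Hypothetical variable names; check your monitor's source for the real ones.
aiceberg_url = os.environ.get("AICEBERG_API_URL")
aiceberg_key = os.environ.get("AICEBERG_API_KEY")
openai_key = os.environ.get("OPENAI_API_KEY")  # needed by the Agents SDK itself
```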

Event ID Linking

We save event IDs when we send input events. When we send the matching output event, we include that event ID to link them together. This helps AIceberg understand which input and output go together.

What We Do Not Send

Information we skip to keep payloads clean

We do not send:

  • Token usage and billing data

  • Model version and technical details

  • Internal execution IDs

  • Framework implementation details

  • Timing and performance data

We only send information that could be harmful or violate policies. Everything else is left out to keep monitoring focused and fast.

Summary

The main points about how monitoring works

  • We use hooks to check safety at 8 points for every user question

  • We send structured data to AIceberg using the simplest approach

  • We use agent object when we need metadata and tool schemas

  • We use context object when the framework already structured the data for us

  • We filter out usage stats and technical details

  • If AIceberg blocks something, we stop immediately

  • The monitoring is transparent — no changes to agent code needed

The whole system is designed to be simple and easy to understand. We do not do any complex processing. We just take data from the right places and send it to AIceberg in a clean format.

Example: Actual Payloads and Responses

Looking at the actual data sent to AIceberg from a real test run ("What is 10 plus 5?")

Event 1: User Question (Input)

Type: user_agt | Direction: input

What we sent:

AIceberg responded:

Event 2: Agent to LLM (Input)

Type: agt_llm | Direction: input

What we sent (structured agent data):

AIceberg responded:

Event 3: LLM Response (Output)

Type: agt_llm | Direction: output

What we sent:

AIceberg responded:

(Notice we linked this output to the input using event_id)

Event 4: Tool Call (Input)

Type: agt_tool | Direction: input

What we sent:

AIceberg responded:

Event 5: Tool Result (Output)

Type: agt_tool | Direction: output

What we sent:

AIceberg responded:

Event 6: Agent to LLM Again (Input)

Type: agt_llm | Direction: input

What we sent (now with tool call history):

AIceberg responded:

Event 7: LLM Final Response (Output)

Type: agt_llm | Direction: output

What we sent:

AIceberg responded:

Event 8: Final Answer to User (Output)

Type: user_agt | Direction: output

What we sent:

AIceberg responded:
