Amazon Strands Agents
Overview
StrandsAicebergHandler monitors your Strands agent conversations for safety by listening to hook events and sending them to Aiceberg.
Drop it into your agent with one line, hooks=[StrandsAicebergHandler()], to get real-time safety monitoring for user queries, LLM calls, and tool execution.
Why hooks and not callbacks?
Strands gives you two options for watching what your agent is doing:
Callback Handlers are like listeners that respond immediately to everything happening during your agent's execution. They fire constantly as things happen—when the model is thinking, when a tool runs, when output streams to the user. They're lightweight and let you see partial results in real-time.
Hooks are more structured. Instead of listening to everything, they fire at specific lifecycle moments which revolve around agent interactions—like right before calling the LLM, or right after a tool finishes. They give you organized events at key checkpoints, and more importantly, they can interrupt the agent if something's wrong.
Quick comparison
| | Callback Handlers | Hooks |
| --- | --- | --- |
| What they do | Listen to everything happening in real-time as it happens | Listen to major events before and after they happen |
| Best for | Streaming output to your UI; logging and debugging; watching things as they happen | Safety checks and guardrails; blocking bad content; enforcing rules |
| Can they stop the agent? | No - just watch and log | Yes - can stop execution immediately |
| Structure | Lots of small, unstructured events | Clean, organized lifecycle events |
Why we picked hooks for Aiceberg
We need to actually stop bad content from reaching users, not just log it after the fact. Hooks let us check content at key moments—like right before the agent sends something to the LLM, or right after the LLM responds—and we can say "nope, stop right there" if something's unsafe. Also, hook events map nicely to Aiceberg's event model: user↔agent, agent↔LLM, agent↔tool. It's a natural fit.
How it works in practice
When your agent is running, hooks fire at important moments. We catch those moments, send the content to Aiceberg for a safety check, and if Aiceberg says "blocked", we throw a SafetyException and the agent stops immediately. The user never sees the unsafe content—they just get a safe fallback message instead.
Callback handlers are great for watching. Hooks are great for controlling. We need control, so we use hooks.
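Here's a minimal sketch of that control flow, using the handler and exception from this integration; the model and tools are omitted for brevity and the fallback string is illustrative:

from strands import Agent
from src.strands_aiceberg.aiceberg_monitor import StrandsAicebergHandler, SafetyException

FALLBACK = "Sorry, I can't help with that request."  # illustrative fallback text

agent = Agent(hooks=[StrandsAicebergHandler()])

def ask_safely(question: str) -> str:
    """Run the agent and swap in a fallback if any safety gate blocks."""
    try:
        return str(agent(question))
    except SafetyException:
        # A hook raised at one of the safety gates; the unsafe content never reaches the user
        return FALLBACK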
What we built
A simple hook provider that forwards Strands events to Aiceberg for safety monitoring. Aiceberg can block unsafe events mid-flow, which stops subsequent steps such as LLM or tool calls from happening.
How it works
StrandsAicebergHandler implements Strands' HookProvider interface and listens for six specific events.
We monitor these events:
MessageAddedEvent — When a message is added to the conversation
AfterInvocationEvent — After the agent completes its invocation
BeforeModelCallEvent — Before calling the LLM
AfterModelCallEvent — After the LLM responds
BeforeToolCallEvent — Before executing a tool
AfterToolCallEvent — After a tool completes
When you register the handler with your agent, Strands automatically calls our callbacks at each critical moment.
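As a rough sketch of what that registration looks like (assuming Strands' HookProvider/HookRegistry interface with an add_callback method; import paths may differ by Strands version, and the stub callbacks stand in for the real ones shown later in this doc):

from strands.hooks import (
    HookProvider, HookRegistry, MessageAddedEvent, AfterInvocationEvent,
    BeforeModelCallEvent, AfterModelCallEvent, BeforeToolCallEvent, AfterToolCallEvent,
)

class AicebergHooksSketch(HookProvider):
    """Toy stand-in for StrandsAicebergHandler showing only the registration step."""

    def register_hooks(self, registry: HookRegistry) -> None:
        # Map each lifecycle event to the callback that forwards it to Aiceberg
        registry.add_callback(MessageAddedEvent, self.on_user_query)         # Gate 1
        registry.add_callback(BeforeModelCallEvent, self.on_llm_input)       # Gate 2
        registry.add_callback(AfterModelCallEvent, self.on_llm_output)       # Gate 3
        registry.add_callback(BeforeToolCallEvent, self.on_tool_input)       # audit only
        registry.add_callback(AfterToolCallEvent, self.on_tool_output)       # audit only
        registry.add_callback(AfterInvocationEvent, self.on_final_response)  # Gate 4

    # Stubs keep this sketch importable; the real callbacks are shown later in this doc
    def on_user_query(self, event): ...
    def on_llm_input(self, event): ...
    def on_llm_output(self, event): ...
    def on_tool_input(self, event): ...
    def on_tool_output(self, event): ...
    def on_final_response(self, event): ...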
Strands provides eight hook events in total (listed as available events in the Strands docs). We currently use six of them for safety monitoring. The remaining two are not in use:
AgentInitializedEvent — Triggered when the agent is first constructed (not useful for content safety)
BeforeInvocationEvent — Triggered at the start of a request (we use MessageAddedEvent instead)
User-to-Agent (user_agt)
This is where user inputs enter the system and final responses leave. We monitor two events:
MessageAddedEvent (Gate 1): When a message is added to the conversation, we send the raw user utterance to Aiceberg, capture the returned event_id, and block immediately if moderation fails.
AfterInvocationEvent (Gate 4): After the agent completes its invocation, we send the final assistant reply to Aiceberg, linked back to the Gate 1 event_id. If rejected, we swap the user-facing answer for your fallback.
These two gates bookend the entire conversation turn—what comes in and what goes out.
Agent-to-LLM (agt_llm)
This is where the agent communicates with the language model. We monitor two events:
BeforeModelCallEvent (Gate 2): Before calling the LLM, we send the exact messages array Strands will send to the model. This includes system prompts, conversation history, and tool definitions. We capture the event_id and can block if needed.
AfterModelCallEvent (Gate 3): After the LLM responds, we send the raw response to Aiceberg. This includes text content and any tool call directives. Both halves are linked via the stored event_id, and either can be blocked.
These gates control what the LLM sees and what it produces.
Agent-to-Tool (agt_tool, agt_mem, A2A)
This is where the agent executes tools. The LLM decides when to call tools based on the user's question. We monitor two events:
BeforeToolCallEvent: Before executing a tool, we send the tool name and input parameters to Aiceberg and capture the event_id.
AfterToolCallEvent: After the tool completes, we send the result back to Aiceberg, linked to the same event_id.
We don't block tool execution by default — if a tool returns unsafe content, it gets caught at Gate 3 (when LLM processes the result) or Gate 4 (before showing to user). The tool hooks provide audit visibility.
What counts as a tool?
Regular tools: Calculator, web search, database queries, API calls — anything the LLM can invoke as a function.
Memory operations (Agent-to-memory): Memory uses the same tool hooks. The LLM calls mem0_memory(action="store") or mem0_memory(action="retrieve") just like any other tool. If you set AB_monitoring_profile_A2MEM, they show as agt_mem events instead of agt_tool for a dedicated dashboard view.
Agent-to-agent communication (Agent-to-agent): When one agent calls another, it happens through tool calls. According to the Strands documentation, this triggers the same BeforeToolCallEvent and AfterToolCallEvent hooks we already monitor. This needs more extensive testing on our side, but the monitoring pattern is the same. Configure AB_monitoring_profile_A2A if you want A2A calls to show as dedicated events on the dashboard.
Can we block tools?
Yes, technically. The hook system allows raising SafetyException at BeforeToolCallEvent. If you add self._check_safety(result, "TOOL_INPUT") after sending to Aiceberg, the tool won't execute.
Why we don't block by default: Blocking tools mid-flight breaks the agent's flow. The LLM expects tool results. If you block a tool, you have three bad options:
Send error message as tool result → Confuses the LLM
Send empty/fake data → Breaks logic
Stop entire agent → User gets incomplete response
Better approach: Log tools for audit visibility but don't block. If a tool returns unsafe content, it gets caught at Gate 3 (when LLM processes the result) or Gate 4 (before showing to user).
When to block tools: If your use case requires it (e.g., preventing database writes), add the safety check. Just handle what happens next—typically show an error message to the user.
Walking through a query
Example: here's what happens when a user asks "What's 10 + 5?".
Step-by-step breakdown
User query arrives (Safety Gate 1)
Strands calls on_user_query with a MessageAddedEvent. We grab the user's question and send it to Aiceberg:
def on_user_query(self, event: MessageAddedEvent):
    # Extract user query text from the message that was just added
    message = getattr(event, "message", {}) or {}
    content = message.get("content", [])
    user_query = self._extract_text_from_content(content)
    print(f"🔍 USER QUERY (Safety Gate 1)")

    # Send to Aiceberg
    config = self.monitor.EVENT_CONFIGS["user_agent"]
    result = self.monitor.send_event(
        content=user_query,
        event_type=config.aice_event_type,
        is_input=True,
        profile_id=self.monitor.profiles["user_agent"]
    )

    # SAFETY CHECK: Block if user query is rejected
    self._check_safety(result, "USER_QUERY")

    # Store event ID for linking final response
    self.event_ids["user_agent"] = result.get("event_id")

Aiceberg receives:
{
  "profile_id": "01K5A720XSXX4TZYJTCED3ENVB",
  "event_type": "user_agt",
  "forward_to_llm": false,
  "input": "What's 10 + 5?"
}

The response includes event_id: "evt_user_123", which we store for later.
LLM input prepared (Safety Gate 2)
Your agent builds a prompt with system instructions and the user question. Strands fires BeforeModelCallEvent:
def on_llm_input(self, event: BeforeModelCallEvent):
    print(f"🔍 LLM INPUT (Safety Gate 2)")

    # Get the raw messages being sent to the LLM
    messages = event.agent.messages or []

    # Send raw messages as JSON (no extra formatting)
    content = json.dumps(messages, indent=None)
    config = self.monitor.EVENT_CONFIGS["agent_llm"]
    result = self.monitor.send_event(
        content=content,
        event_type=config.aice_event_type,
        is_input=True,
        profile_id=self.monitor.profiles["agent_llm"]
    )

    # SAFETY CHECK: Halt if LLM input is rejected
    self._check_safety(result, "LLM_INPUT")

    # Track this specific LLM call with a unique counter
    self.llm_call_counter += 1
    llm_call_id = f"agent_llm_{self.llm_call_counter}"
    self.event_ids[llm_call_id] = result.get("event_id")
    self._current_llm_call_id = llm_call_id

We send the entire messages array to Aiceberg under the agt_llm event type. No extra formatting—just the raw data Strands is sending to the model.
LLM responds with tool call (Safety Gate 3)
The model decides it needs to use a calculator tool. Strands fires AfterModelCallEvent:
def on_llm_output(self, event: AfterModelCallEvent):
    print(f"🔍 LLM OUTPUT (Safety Gate 3)")

    # Extract raw LLM response message
    response_content = {}
    if event.stop_response and hasattr(event.stop_response, "message"):
        msg = event.stop_response.message
        if isinstance(msg, dict):
            response_content = msg  # Send the entire message object
    content = json.dumps(response_content, indent=None)

    # Get the correct link_id for this specific LLM call
    current_call_id = getattr(self, '_current_llm_call_id', None)
    link_id = self.event_ids.get(current_call_id) if current_call_id else None

    config = self.monitor.EVENT_CONFIGS["agent_llm"]
    result = self.monitor.send_event(
        content=content,
        event_type=config.aice_event_type,
        is_input=False,
        profile_id=self.monitor.profiles["agent_llm"],
        link_event_id=link_id
    )

    # SAFETY CHECK: Halt if LLM output is rejected
    self._check_safety(result, "LLM_OUTPUT")

The LLM's response includes a tool use request. Aiceberg checks it for safety before we proceed.
Tool execution (monitoring with full context)
The agent executes the calculator tool. Strands fires BeforeToolCallEvent and AfterToolCallEvent. We log both but don't block:
def on_tool_input(self, event: BeforeToolCallEvent):
    print(f"🔍 TOOL INPUT")

    # Get raw tool_use object
    tool_use = getattr(event, "tool_use", {}) or {}

    # Send raw tool_use as JSON (no extra formatting)
    content = json.dumps(tool_use, indent=None)
    config = self.monitor.EVENT_CONFIGS["agent_tool"]
    result = self.monitor.send_event(
        content=content,
        event_type=config.aice_event_type,
        is_input=True,
        profile_id=self.monitor.profiles["agent_tool"]
    )

    # Store event ID
    tool_id = tool_use.get("toolUseId", "unknown_tool_id")
    tool_key = f"agent_tool_{tool_id}"
    self.event_ids[tool_key] = result.get("event_id")
    # Note: We don't check safety for tools to avoid breaking agent flow

def on_tool_output(self, event: AfterToolCallEvent):
    print(f"🔍 TOOL OUTPUT")

    tool_use = getattr(event, "tool_use", {}) or {}
    tool_id = tool_use.get("toolUseId", "unknown_tool_id")

    # Get raw result (or the exception if the tool failed)
    result_obj = {"error": str(event.exception)} if event.exception else getattr(event, "result", {})
    content = json.dumps(result_obj, indent=None)

    config = self.monitor.EVENT_CONFIGS["agent_tool"]
    tool_key = f"agent_tool_{tool_id}"
    link_id = self.event_ids.get(tool_key)
    self.monitor.send_event(
        content=content,
        event_type=config.aice_event_type,
        is_input=False,
        profile_id=self.monitor.profiles["agent_tool"],
        link_event_id=link_id
    )
    # Note: We don't check safety for tools to avoid breaking agent flow

Tool events are sent to Aiceberg under the agt_tool type. We skip the safety check here to keep the agent flow smooth.
Can we block tool calls?
Yes. The hook system allows raising exceptions at BeforeToolCallEvent. The current implementation does not do this by design because blocking tools mid-flight breaks the agentic flow. If your use case requires blocking tools based on Aiceberg moderation, add one line after sending to Aiceberg:
self._check_safety(result, "TOOL_INPUT")
This will raise SafetyException if Aiceberg blocks the tool, preventing execution. You need to handle what happens next—typically show an error to the user.
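If you do enable that check, one way to handle the blocked call is sketched below; the wrapper, the example prompt, and the error message are illustrative, not part of the shipped handler:

# Illustrative handling for a deployment that blocks tools at the TOOL_INPUT check.
# Assumes you've added self._check_safety(result, "TOOL_INPUT") in on_tool_input.
try:
    response = agent("Delete all rows from the orders table")  # hypothetical risky request
    print(response)
except SafetyException as e:
    # The tool never executed; tell the user why instead of returning partial output
    print(f"That action was blocked by policy: {e}")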
Final response to user (Safety Gate 4)
The agent wraps up and returns the answer. Strands fires AfterInvocationEvent:
def on_final_response(self, event: AfterInvocationEvent):
    print(f"🔍 FINAL RESPONSE (Safety Gate 4)")

    # Extract the final assistant response from the conversation history
    messages = event.agent.messages or []
    final_response = "No response"
    for msg in reversed(messages):
        if isinstance(msg, dict) and msg.get("role") == "assistant":
            content = msg.get("content", [])
            final_response = self._extract_text_from_content(content)
            break

    config = self.monitor.EVENT_CONFIGS["user_agent"]
    result = self.monitor.send_event(
        content=final_response,
        event_type=config.aice_event_type,
        is_input=False,
        profile_id=self.monitor.profiles["user_agent"],
        link_event_id=self.event_ids.get("user_agent")
    )

    # SAFETY CHECK: Final safety gate before the user sees the response
    self._check_safety(result, "FINAL_RESPONSE")

This is the last safety gate. If everything passes, the answer goes to the user. If blocked, a SafetyException is raised and your app shows a fallback message.
The dashboard view
After this flow completes, you'll see three event types in Aiceberg (all under the same profile if configured that way):
User to Agent (type: user_agt)
Input: "What's 10 + 5?"
Output: "10 + 5 equals 15."

Agent to LLM (type: agt_llm, two pairs in this case)
Pair 1: Input: [messages array] → Output: [tool use request]
Pair 2: Input: [messages with tool result] → Output: "10 + 5 equals 15."

Agent to Tool (type: agt_tool)
Input: {"name": "calculator", "toolUseId": "call_123", "input": {"operation": "add", "a": 10, "b": 5}}
Output: {"status": "success", "content": [{"text": "15.0"}]}
Each input/output pair is linked by the event_id, so you can trace a single user question through the entire agent pipeline.
Key design choices
Why raw content with no formatting?
We send exactly what Strands sends—no prefixes, no labels, no wrapper strings. This keeps Aiceberg's signal clean and makes debugging easier. What you see in the dashboard is exactly what the agent processed.
Why three event types?
Each event type (user↔agent, agent↔LLM, agent↔tool) has different moderation needs. You might allow certain language from users but block it in LLM prompts, or apply stricter policies to final responses. Separate event types give you that flexibility without complicated conditional logic.
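For reference, here is a sketch of how the monitor might map event types to profiles. The env-var names and event-type strings come from elsewhere in this doc; the EventConfig dataclass and the exact internals of EVENT_CONFIGS are assumptions for illustration:

import os
from dataclasses import dataclass

@dataclass
class EventConfig:
    aice_event_type: str  # Aiceberg event type string ("user_agt", "agt_llm", "agt_tool")

# Assumed structure of the mapping the handler reads from
EVENT_CONFIGS = {
    "user_agent": EventConfig(aice_event_type="user_agt"),
    "agent_llm": EventConfig(aice_event_type="agt_llm"),
    "agent_tool": EventConfig(aice_event_type="agt_tool"),
}

# Per-event-type profiles, each falling back to the shared profile if unset
shared = os.getenv("AICEBERG_PROFILE_ID")
profiles = {
    "user_agent": os.getenv("AB_monitoring_profile_U2A", shared),
    "agent_llm": os.getenv("AB_monitoring_profile_A2M", shared),
    "agent_tool": os.getenv("AB_monitoring_profile_A2T", shared),
}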
Why don't we block tool execution?
Blocking a tool mid-flight can break Strands' event state. The agent expects tools to complete, and interrupting that can leave things in a weird state. Instead, we log tool activity but don't enforce safety there. If a tool returns something unsafe, we catch it at Safety Gate 3 (when the LLM processes the result) or Safety Gate 4 (before the final response goes to the user).
This is a design choice, not a technical limitation. The hook system allows raising exceptions at BeforeToolCallEvent, and raising SafetyException there would prevent the tool from executing. We don't do that by default because the LLM expects to receive tool results. If you block a tool mid-execution, you have three bad options: send an error message as the tool result (which confuses the LLM), send empty or fake data (which breaks the logic), or stop the entire agent (which leaves the user with an incomplete response). So the design logs tools for observability but does not block them. If you need to block tools, add this one line in on_tool_input after sending to Aiceberg: self._check_safety(result, "TOOL_INPUT").
Why forward_to_llm: false?
We're observing the data flow, not proxying it. Strands already handles LLM calls; Aiceberg just gets a copy for monitoring and policy enforcement. If Aiceberg blocks something, we raise an error—the application decides what to do next.
How memory tools work
Memory tools are just like any other tool—the LLM decides when to use them. What makes them different is what they do:
Storing memories:
# You just chat - LLM decides when to remember
agent("Remember that I prefer window seats on flights")
→ LLM thinks: "User said 'remember' - I should store this"
→ LLM calls: mem0_memory(action="store", content="prefers window seats")

Retrieving memories:
# Later, new conversation with no history
agent("What are my seating preferences?")
→ LLM thinks: "No context about seats... I should search memories"
→ LLM calls: mem0_memory(action="retrieve", query="seating preferences")
→ Returns: "prefers window seats"
→ LLM responds: "You prefer window seats on flights"

What's special about memory:
Same tool, different actions — mem0_memory can store OR retrieve, controlled by the action parameter
Persistence — Unlike calculator or other stateless tools, memory persists across agent instances
Semantic search — Retrieval uses RAG (embeddings + vector search), not exact matching
The LLM orchestrates everything—when to store, when to retrieve, what search query to use. Just like with calculator, you give it the tool and let it decide.
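To make this concrete, here is a hedged sketch of handing the agent a memory tool alongside the monitor. It assumes the mem0_memory tool from the strands-agents-tools package; your memory tool and backend may differ:

from strands import Agent
from strands_tools import mem0_memory  # assumption: mem0 memory tool from strands-agents-tools
from src.strands_aiceberg.aiceberg_monitor import StrandsAicebergHandler

agent = Agent(
    system_prompt="You are a helpful travel assistant. Remember user preferences.",
    tools=[mem0_memory],               # the LLM decides when to store/retrieve
    hooks=[StrandsAicebergHandler()],  # memory calls surface as agt_tool (or agt_mem) events
)

# The LLM chooses to call mem0_memory(action="store", ...) for this request
agent("Remember that I prefer window seats on flights")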
Where memory shows up in monitoring
Memory operations are tool calls, so they appear in your agt_tool events (or agt_mem if you use the dedicated profile).
Same tool, different actions:
mem0_memory(action="store") — Stores a fact
mem0_memory(action="retrieve") — Searches with semantic similarity (RAG)
mem0_memory(action="list") — Shows all stored memories
Each one triggers BeforeToolCallEvent → AfterToolCallEvent, just like calculator or any other tool.
What the code does
AicebergMonitor is the simple HTTP client. It loads profile IDs from environment variables, builds the payload, calls base_url/eap/v0/event with your API key, and returns the response. If there's no API key, it returns {"event_result": "passed"} so local dev works without credentials. Network errors also return "passed" so monitoring failures don't break your agent.
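A minimal sketch of that client, assuming the endpoint path and fail-open behavior described above; the AICEBERG_BASE_URL env var, the "output" payload key, and the linking field name are assumptions:

import os
import requests

class AicebergMonitor:
    """Thin HTTP client for the Aiceberg event API (sketch, not the shipped code)."""

    def __init__(self):
        self.api_key = os.getenv("AICEBERG_API_KEY")         # e.g. "Bearer ..."
        self.base_url = os.getenv("AICEBERG_BASE_URL", "")   # assumed env var name

    def send_event(self, content, event_type, is_input, profile_id, link_event_id=None):
        # No credentials configured: fail open so local dev keeps working
        if not self.api_key or not profile_id:
            return {"event_result": "passed"}

        payload = {
            "profile_id": profile_id,
            "event_type": event_type,
            "forward_to_llm": False,
        }
        direction = "input" if is_input else "output"   # "output" key is an assumption
        payload[direction] = content
        if link_event_id:
            payload["link_event_id"] = link_event_id    # assumption: linking field name

        try:
            resp = requests.post(
                f"{self.base_url}/eap/v0/event",
                json=payload,
                headers={"Authorization": self.api_key},
                timeout=10,
            )
            return resp.json()
        except requests.RequestException:
            # Network errors also fail open so monitoring never breaks the agent
            return {"event_result": "passed"}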
StrandsAicebergHandler implements the hook lifecycle. It registers callbacks for six Strands events, extracts content from each one, sends it to AicebergMonitor, checks the result, and maintains event ID mappings for linking inputs to outputs.
_check_safety checks the response from Aiceberg. If event_result is "blocked" or "rejected", it raises SafetyException with a descriptive message. This stops the agent immediately so you can catch the exception and show a safe fallback.
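In sketch form (the blocking result values and message format follow the behavior described in this doc):

class SafetyException(Exception):
    """Raised when Aiceberg blocks content at a safety gate."""

def check_safety(result: dict, gate: str) -> None:
    """Sketch of the handler's _check_safety logic."""
    # Raise immediately if Aiceberg flagged the content at this gate
    if result.get("event_result") in ("blocked", "rejected"):
        raise SafetyException(f"Content blocked by Aiceberg safety filter at {gate}")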
_extract_text_from_content handles Strands' message formats. Messages can be lists of content blocks or simple strings. This helper normalizes everything to plain text for user-facing events.
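A sketch of that normalization, assuming the content-block shape seen elsewhere in this doc ({"text": ...} entries in a list, or a plain string):

def extract_text_from_content(content) -> str:
    """Sketch of the handler's _extract_text_from_content helper."""
    # Strands messages carry content as either a plain string or a list of blocks
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            # Keep only text blocks; tool-use and other block types are skipped
            if isinstance(block, dict) and "text" in block:
                parts.append(block["text"])
        return " ".join(parts)
    return str(content)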
Quick setup (5 minutes)
Install deps (once):
pip install -r requirements.txt

Add environment variables (.env):
AICEBERG_API_KEY=Bearer ...
AICEBERG_PROFILE_ID=... (use this for all events)

Or use specific profiles for each event type:
AB_monitoring_profile_U2A=... (user↔agent events)
AB_monitoring_profile_A2M=... (agent↔LLM events)
AB_monitoring_profile_A2T=... (agent↔tool events)
AB_monitoring_profile_A2MEM=... (agent↔memory, optional, defaults to A2T)
Register the handler:
import os

from strands import Agent, tool
from strands.models.litellm import LiteLLMModel
from src.strands_aiceberg.aiceberg_monitor import StrandsAicebergHandler, SafetyException

# Define your tools
@tool
def calculator(operation: str, a: float, b: float) -> float:
    """Perform basic math operations"""
    if operation == "add":
        return a + b
    # ... other operations

# Create model
model = LiteLLMModel(
    client_args={"api_key": os.getenv("OPENAI_API_KEY")},
    model_id="gpt-4o-mini",
    params={"temperature": 0.2}
)

# Create agent with monitoring
agent = Agent(
    system_prompt="You are a helpful math assistant",
    model=model,
    tools=[calculator],
    hooks=[StrandsAicebergHandler()],  # Aiceberg monitoring handler
)

# Use normally
try:
    response = agent("What is 25 + 37?")
    print(f"Response: {response}")
except SafetyException as e:
    print(f"BLOCKED: {e}")

Run your agent and check the Aiceberg dashboard for events.
Event flow at a glance
| Hook event | Input | Output | Event type | Safety gate |
| --- | --- | --- | --- | --- |
| MessageAddedEvent | User question | — | user_agt | Gate 1 |
| BeforeModelCallEvent | Messages array | — | agt_llm | Gate 2 |
| AfterModelCallEvent | — | LLM response | agt_llm | Gate 3 |
| BeforeToolCallEvent | Tool use object | — | agt_tool | Log only |
| BeforeToolCallEvent (memory) | Memory tool call | — | agt_mem** | Optional* |
| AfterToolCallEvent (memory) | — | Memory result | agt_mem** | Optional* |
| AfterToolCallEvent | — | Tool result | agt_tool | Log only |
| AfterInvocationEvent | — | Final answer | user_agt | Gate 4 |
Optional tool blocking: Tool safety checks are technically possible but not enabled by default. Add self._check_safety(result, "TOOL_INPUT") to block tools. By default, we log for audit but don't block—unsafe tool outputs get caught at Gate 3 or Gate 4.
Memory event type: Memory operations use agt_mem if you set AB_monitoring_profile_A2MEM in your environment. Otherwise, they appear as agt_tool alongside regular tools.
Logging & observability
Startup banner shows whether credentials were found and which profiles are loaded.
Each event prints a short preview: event type, profile ID (truncated), and whether it's input or output.
Successful sends show Aiceberg's response: "passed", "blocked", or "rejected".
Safety violations raise SafetyException with clear messages like "Content blocked by Aiceberg safety filter at LLM_OUTPUT".
Tool execution is logged but not blocked to avoid breaking agent state.
Safety exceptions surface as SafetyException; wrap your agent calls to show user-friendly messages.
Missing environment variables cause events to skip silently (they return "passed" so your agent keeps working).
Additional Info
The tool events include a feature to capture and monitor the invocation state, which holds metadata about tool calls. This can be used to enhance observability across multi-agent scenarios, enabling more robust coordination patterns.
This was discussed in a Strands feature request issue about making the related events available.
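As a hedged sketch of what capturing that might look like; whether the tool hook events expose an invocation_state attribute, and its shape, are assumptions to verify against your Strands version:

import json

def build_tool_audit_payload(event) -> str:
    """Hypothetical helper: bundle invocation-state metadata with the tool input for audit."""
    tool_use = getattr(event, "tool_use", {}) or {}
    invocation_state = getattr(event, "invocation_state", {}) or {}  # assumed attribute
    return json.dumps(
        {"tool_use": tool_use, "invocation_state": invocation_state},
        default=str,  # invocation state may hold objects that aren't JSON-serializable
    )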
Memory in Strands
Your agent can remember things across conversations. Strands gives you two ways to do this:
FileSessionManager — Simple session persistence. Saves the entire conversation to a file. When you create a new agent with the same session ID, it loads the history. Good for basic chatbots where you just need conversation context.
mem0 with vector storage — Smart memory using embeddings. Stores facts in a vector database and retrieves them with semantic search (RAG). The LLM decides what to remember and when to recall it, which is better for complex apps where you need long-term memory.
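For the first option, a minimal sketch; it assumes Strands' FileSessionManager API and import path, which may differ by version:

from strands import Agent
from strands.session.file_session_manager import FileSessionManager  # assumed import path

# Persist the conversation to disk under this session ID
session_manager = FileSessionManager(session_id="user-123")

agent = Agent(session_manager=session_manager)
agent("My name is Alex")

# Later: a new agent with the same session ID reloads the saved history
agent2 = Agent(session_manager=FileSessionManager(session_id="user-123"))
agent2("What's my name?")  # has the prior conversation as context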