Deep Agents Trilogy #3: From Laptop to Production — Deploy & Monitor with Amazon Bedrock AgentCore, LangFuse & Terraform

"The biggest challenge in AI isn't building agents—it's deploying them reliably at scale while maintaining visibility into their behavior."

📚 The Deep Agents Trilogy

This is Part 3 of a three-part series on building production-ready AI agent systems.

  • #1 Foundations — Understanding deep vs shallow agents and multi-agent orchestration

  • #2 Building DeepSearch — Implement a multi-agent research system from scratch

  • #3 Production Deployment — You are here: Deploy to Amazon Bedrock AgentCore with LangFuse observability

If you can't deploy your agent in a scalable, resilient, and replicable way, it's just a personal toy that will never go to production. Plus, if you can't monitor it, it's a ghost—you have no idea what it's doing, why it's failing, or how to improve it.

The two previous articles in this trilogy covered the conceptual foundations (Part 1: Foundations) and the practical implementation (Part 2: Building DeepSearch). This third and final article completes the journey by tackling what matters most for real-world adoption: production deployment and observability.

We'll deploy our Deep Research Agent to Amazon Bedrock AgentCore using Terraform and instrument it with LangFuse for comprehensive monitoring—all following Infrastructure as Code (IaC) best practices.


Deployment

Why Deployment Matters

Building an agent that works on your laptop is one thing. Making it available to users, scalable under load, and maintainable over time is an entirely different challenge. Production deployment addresses critical concerns that local development ignores:

  • Scalability: Your agent needs to handle multiple concurrent users without degradation
  • Reliability: Session isolation, automatic restarts, and fault tolerance become essential
  • Security: API keys, credentials, and user data must be protected
  • Reproducibility: Every deployment should be identical, traceable, and reversible
  • Cost Management: Clear visibility into resource consumption is key

Without proper deployment infrastructure, your brilliant agent remains a demo. With it, you have a production-ready service (all that's left is to find the users, lol).

Bedrock AgentCore

What is Bedrock AgentCore?

Amazon Bedrock AgentCore is AWS's latest service for productionizing AI agents. I see it as the successor to Bedrock Agents, their first attempt at agent productionization. That first service was simple in some ways but also restrictive: it primarily treated tools as Lambdas, was not very flexible, and was not really suitable for complex agents. After testing it myself, I found that it offered some customization, but not enough. It sat awkwardly between ClickOps-style simplicity and full-fledged agent platforms, without fully serving either audience. The former help non-technical users create agents with little to no coding knowledge, while the latter are built for deep-tech teams and developers who want to build complex agents, orchestrate them themselves, and keep almost full control. With AgentCore, AWS clearly decided to go all-in on the full-fledged agent-platform side, while still supporting ClickOps-style experiences through Amazon Q.

Amazon Bedrock AgentCore is a secure, serverless runtime purpose-built for deploying and scaling dynamic AI agents. It's framework-agnostic, supporting Strands Agents, LangChain, LangGraph, CrewAI, and any other Python-based agent framework.

Key capabilities include:

  • Serverless Execution: No infrastructure management—just deploy your code
  • Session Isolation: Each user session runs in a dedicated microVM, protecting sensitive state
  • Auto-Scaling: Scales from zero to thousands of sessions in seconds
  • Protocol Support: Native support for MCP, A2A, and custom protocols
  • Model Flexibility: Use any model from any provider (Bedrock, OpenAI, Anthropic, etc.)
AgentCore Overview

We will focus on two core components: Runtime and Memory.

Bedrock AgentCore Runtime

The AgentCore Runtime is the execution environment where your agent code runs. It provides:

  • HTTP Interface: Your agent exposes /invocations (POST) and /ping (GET) endpoints
  • Environment Injection: Secrets, configuration, and credentials are injected at runtime
  • Logging Integration: You can set up automatic log collection to CloudWatch

Think of it as a Lambda with a much longer maximum timeout (8 hours) and ready-to-use HTTP interfaces. The runtime expects your agent to follow a simple contract:

from bedrock_agentcore import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context=None):
    """Process user input and return agent response."""
    prompt = payload.get("prompt", "")
    # Your agent logic here (call your agent and capture its output)
    response = f"You said: {prompt}"  # placeholder until you plug in a real agent
    return {"result": response}

if __name__ == "__main__":
    app.run()

This decorator-based approach means you can wrap any existing agent with minimal code changes. This entrypoint can execute whatever agent logic you need.
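
As an illustration, wrapping an existing Strands agent might look roughly like the sketch below. This is a minimal, hypothetical example: the system prompt is a placeholder, and the real DeepSearch orchestrator wires up models, tools, and sub-agents differently.

from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()

# Placeholder agent: swap in your own model, system prompt, and tools.
research_agent = Agent(system_prompt="You are a research assistant.")

@app.entrypoint
def invoke(payload, context=None):
    """Delegate the incoming prompt to the wrapped Strands agent."""
    prompt = payload.get("prompt", "")
    result = research_agent(prompt)  # run the agent synchronously
    return {"result": str(result)}   # serialize the agent's final answer

if __name__ == "__main__":
    app.run()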

Bedrock AgentCore Memory

AgentCore Memory provides persistent storage for agent state across sessions. For our Deep Research Agent, memory enables:

  • Conversation Continuity: Resume research tasks across multiple interactions
  • Todo Persistence: The orchestrator's task list survives session boundaries
  • Knowledge Accumulation: Build on previous research without starting from scratch

Memory is optional but highly valuable for complex, long-running research tasks.

# Terraform resource for AgentCore Memory
resource "aws_bedrockagentcore_memory" "this" {
  count = var.enable_memory ? 1 : 0

  name                  = local.agent_name_sanitized
  description           = "${var.agent_description} Memory"
  event_expiry_duration = var.memory_event_expiry_duration

  tags = local.tags
}
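
On the runtime side, the agent still has to know which memory resource to talk to. A minimal sketch, assuming the Terraform module injects the memory ID through an environment variable named MEMORY_ID (that variable name is an assumption, not a guaranteed part of the module's contract):

import os

# Hypothetical variable name: check how your Terraform module wires env vars.
memory_id = os.environ.get("MEMORY_ID")

if memory_id:
    # Hand the ID to whatever memory client your agent uses
    # (for example, the AgentCore Memory data-plane API).
    print(f"AgentCore Memory enabled: {memory_id}")
else:
    print("Running without persistent memory.")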

Deploying Agents using Bedrock AgentCore

There are multiple paths to deploy agents to AgentCore:

Method | Best For | Complexity
CLI (agentcore launch) | Quick prototyping, manual deployments | Low
boto3 SDK | Programmatic deployments, CI/CD pipelines | Medium
Terraform | Production infrastructure, IaC workflows | Medium-High
CloudFormation | AWS-native IaC, existing CFN pipelines | Medium-High
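
As a point of reference, the boto3 path from the table might look roughly like this. Treat it as a hedged sketch based on the control-plane API: the runtime name, container URI, and role ARN are placeholders, and the exact request shape may evolve.

import boto3

# Control-plane client for managing AgentCore runtimes.
control = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

# Placeholder values: replace with your own ECR image, IAM role, and name.
response = control.create_agent_runtime(
    agentRuntimeName="deepsearch",
    agentRuntimeArtifact={
        "containerConfiguration": {
            "containerUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/deepsearch:latest"
        }
    },
    roleArn="arn:aws:iam::123456789012:role/deepsearch-agentcore-role",
    networkConfiguration={"networkMode": "PUBLIC"},
)
print(response["agentRuntimeArn"])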

For production workloads, IaC (Terraform in our case, mainly because my company is biased, and so am I, lol) offers the best balance of power, flexibility, and maintainability. It provides:

  • Declarative Infrastructure: Describe what you want, not how to build it
  • State Management: Track deployed resources and detect drift
  • Plan/Apply Workflow: Preview changes before applying them
  • Module Reusability: Package common patterns for reuse across projects

AgentCore Alternatives

Before committing to AgentCore, consider these alternatives:

Platform | Strengths | Considerations
AWS Lambda | Familiar, cost-effective for light loads | Cold starts, 15-min timeout, no session isolation
AWS Fargate | Full container control, long-running tasks | More infrastructure to manage
Amazon EKS | Maximum flexibility, multi-cloud | Significant operational overhead
Modal | Developer-friendly, fast iterations | Vendor lock-in, less AWS integration
Replicate | Simple deployment, built-in scaling | Limited customization

AgentCore wins for AI-native workloads at companies already using AWS, because it's designed specifically for agents—session isolation, long-running execution, and integrated observability come built-in, and it sits squarely in the AWS ecosystem. You can further strengthen security by deploying your agents in a VPC and using AgentCore Identity to pass user identity when interacting with your agents.


Monitoring

Why Monitoring Matters

Agents are mostly non-deterministic systems. The same input can produce different outputs, take different execution paths, and consume varying amounts of resources. Without observability, you're flying blind:

  • Debugging: When an agent fails, you need traces to understand why
  • Optimization: Identify slow sub-agents, expensive model calls, and inefficient tool usage
  • Quality Assurance: Track response quality over time and detect regressions
  • Cost Control: Understand token consumption and optimize model selection

Observability isn't optional for production agents—it's a fundamental requirement.

LangFuse

What is LangFuse?

LangFuse is an open-source LLM engineering platform that provides comprehensive observability for AI applications. It offers:

  • Distributed Tracing: Visualize the complete execution flow of your agent
  • Token Analytics: Track input/output tokens and costs per model
  • Evaluation Framework: Score and assess agent responses systematically
  • Prompt Management: Version and A/B test system prompts
  • Session Tracking: Group related traces by user session

I like it because it is open source, framework agnostic, and has a lot of the features production needs, like datasets and LLM-as-a-judge for non-regression testing, prompt management, and more. Maybe I will do a deep-dive article on this. What I like the most is the simplicity, the dashboards, and the overall UI—it is one of the nicest UIs I've ever seen for LLM observability.

LangFuse Trace View

LangFuse integrates via OpenTelemetry (OTEL), making it compatible with any framework that supports OTEL instrumentation—including Strands Agents and Bedrock AgentCore.

Integrating LangFuse with Strands Agents & Bedrock AgentCore

Strands Agents has first-class support for OpenTelemetry through the StrandsTelemetry module. Integration requires just a few lines:

from strands.telemetry import StrandsTelemetry

def initialize_telemetry():
    """Initialize OTEL exporter for LangFuse."""
    strands_telemetry = StrandsTelemetry()
    strands_telemetry.setup_otlp_exporter()

The magic happens through environment variables that configure the OTEL exporter:

OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64-encoded-credentials>
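
The header value is simply your LangFuse public and secret keys joined as public:secret and base64-encoded. A minimal sketch for building it in Python, assuming the keys are already available as environment variables:

import base64
import os

# LangFuse project keys, e.g. loaded from your environment or Secrets Manager.
public_key = os.environ["LANGFUSE_PUBLIC_KEY"]
secret_key = os.environ["LANGFUSE_SECRET_KEY"]

# Basic auth credentials are "public_key:secret_key", base64-encoded.
token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {token}"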

In our Terraform module, these are automatically configured when you enable LangFuse:

# From terraform/modules/agentcore/locals.tf
langfuse_env_vars = var.enable_langfuse ? {
  DISABLE_ADOT_OBSERVABILITY  = "true"
  OTEL_EXPORTER_OTLP_ENDPOINT = "https://cloud.langfuse.com/api/public/otel"
  OTEL_EXPORTER_OTLP_PROTOCOL = "http/protobuf"
  OTEL_EXPORTER_OTLP_HEADERS  = local.langfuse_auth_header
} : {}

Et voilà, you are ready to go. Don't forget any of these variables—if one is missing, it will not work.

An obvious question at this point: how do you deploy LangFuse and authenticate your tenant?

You have two options for deployment. You can use the cloud version, so there is nothing to deploy, but all your data will go to LangFuse's servers, and you will need to pay if you want all the features without limits. The other option is to deploy your own LangFuse instance on ECS or EKS. For our demo purposes, we will go with the cloud version: we do not have any critical information, and the free tier is more than sufficient.

What You'll See in LangFuse

Once deployed, every agent invocation creates a detailed trace of all your agent actions and interactions:

LangFuse Trace View

Trace Structure for DeepSearch:

  1. Root Span: The main agent invocation
  2. Orchestrator Spans: Lead searcher planning and coordination
  3. Sub-Agent Spans: Individual research agents executing as a task tool call
  4. Tool Spans: Internet searches, file writes
  5. Model Spans: Each LLM call with tokens and latency

You really have everything you need to debug your agent and understand what it did and how it did it: the intermediate outputs as well as the final ones.

It is really useful for debugging too: you can see your errors and the retries, for example:

LangFuse LLM Interaction with Error

For all our internet tool calls, we can, for example, get the inputs and the outputs. This is very powerful. As Anthropic said:

Think like your agents. To iterate on prompts, you must understand their effects, watch your agents work step-by-step. This immediately revealed failure modes: agents continuing when they already had sufficient results, using overly verbose search queries, or selecting incorrect tools. Effective prompting relies on developing an accurate mental model of the agent, which can make the most impactful changes obvious.

Use LangFuse for this.

LangFuse Tool Call Detail

Plus, you get some key metrics out of the box:

  • Latency: Total execution time and per-step breakdown
  • Token Usage: Input/output tokens by model
  • Cost: Estimated cost per trace and aggregate
  • Success Rate: Track failures and error patterns

LangFuse Alternatives

Other observability platforms worth considering:

Platform | Strengths | Considerations
LangSmith | Deep LangChain integration | Closed source, LangChain-focused
AWS CloudWatch | Native AWS integration, free tier | Less AI-specific, manual setup
Helicone | Simple proxy-based approach | Limited multi-agent support
Arize Phoenix | Strong evaluation features | Steeper learning curve
Weights & Biases | Excellent experiment tracking | More ML-focused than LLM-focused

LangFuse stands out for its open-source nature, framework agnosticism, excellent OTEL integration, and excellent UI/UX—making it ideal for Strands Agents on AgentCore.


The Complete Terraform Module


Deployment Walkthrough

Prerequisites

  1. AWS Account with permissions to create IAM roles, S3 buckets, and AgentCore resources
  2. Terraform >= 1.0 installed
  3. uv (Python package manager) installed
  4. AWS CLI configured with appropriate credentials
  5. Secrets Created in AWS Secrets Manager (see the sketch after this list):
    • langfuse/api-key containing LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY
    • linkup/api-key (or your preferred search tool API key)
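
If you prefer to create those secrets programmatically rather than through the console, a minimal sketch with boto3 is shown below. The JSON shape of the LangFuse secret mirrors the key names listed above; adjust it if your module expects a different structure, and the values here are placeholders.

import boto3
import json

sm = boto3.client("secretsmanager", region_name="us-east-1")

# LangFuse project keys (the values here are placeholders).
sm.create_secret(
    Name="langfuse/api-key",
    SecretString=json.dumps({
        "LANGFUSE_PUBLIC_KEY": "pk-lf-...",
        "LANGFUSE_SECRET_KEY": "sk-lf-...",
    }),
)

# Search tool API key (Linkup in this example).
sm.create_secret(
    Name="linkup/api-key",
    SecretString="your-linkup-api-key",
)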

Step 1: Configure Variables

Create your terraform.tfvars:

agent_name = "deepsearch"
region     = "us-east-1"

environment_variables = {
  LOG_LEVEL            = "INFO"
  BYPASS_TOOL_CONSENT  = "true"
}

secrets_names = {
  LINKUP_API_KEY = "linkup/api-key"
}

enable_memory         = true
create_outputs_bucket = true

tags = {
  Environment = "production"
  Project     = "deepsearch"
}

Step 2: Initialize and Plan

cd terraform
terraform init
terraform plan

Review the plan carefully—it will create:

  • IAM role with scoped permissions
  • S3 bucket for deployment artifacts
  • S3 bucket for research outputs
  • AgentCore Memory resource
  • AgentCore Runtime with your agent code

Step 3: Apply

terraform apply

Terraform will:

  1. Package your Python dependencies for ARM64
  2. Create the ZIP deployment package
  3. Upload to S3
  4. Create the AgentCore runtime
  5. Configure environment variables and secrets access

Step 4: Verify Deployment

Check the AgentCore console to confirm your agent is running:
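
If you prefer the SDK to the console, a quick sanity check might look like the sketch below (assuming the list_agent_runtimes control-plane call; the exact response fields may vary):

import boto3

control = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

# List the deployed runtimes and print their details, including status.
response = control.list_agent_runtimes()
for runtime in response.get("agentRuntimes", []):
    print(runtime)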

Step 5: Test the Agent

Use the provided invoke script:

import boto3
import json
import uuid

client = boto3.client("bedrock-agentcore", region_name="us-east-1")
payload = json.dumps({"prompt": "What is the current state of AI safety in 2025?"})

runtime_session_id = f"session-{uuid.uuid4()}"

response = client.invoke_agent_runtime(
    agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:YOUR_ACCOUNT:runtime/deepsearch-XXXXX",
    runtimeSessionId=runtime_session_id,
    payload=payload,
    qualifier="DEFAULT",
)

response_data = json.loads(response["response"].read())
print("Agent Response:", response_data)

Step 6: View Traces in LangFuse

Navigate to your LangFuse project to see the traces.


My Personal Flow for Agent Deployment

After many iterations, here's the workflow that works best for me:

Agent Development and Deployment Flow

Phase 1: Local Development

  1. Create the agent code, test it locally, don't think about deployment yet
  2. Focus on functionality—get the agent working correctly first
  3. Iterate rapidly with short feedback loops

Phase 2: Add Observability

  1. Add monitoring (LangFuse integration) to the agent
  2. Test locally with tracing enabled
  3. Review traces to understand agent behavior and identify issues

Phase 3: AgentCore Integration

  1. Create the AgentCore runtime file (runtime.py)
  2. Test locally using python runtime.py and curl:
    curl -X POST http://localhost:8080/invocations \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Test query"}'
    
  3. Fix all errors until the agent works identically to standalone mode

Phase 4: Infrastructure

  1. Create Terraform code (you can use existing module)
  2. Deploy to AgentCore
  3. Test the deployed agent with real requests
  4. Monitor in LangFuse and iterate

Production Considerations

Cost Optimization

  • Model Selection: Use Haiku or other lightweight models for simple tasks (citations), and Sonnet or other frontier models for complex reasoning (orchestration)
  • Session Timeouts: Configure appropriate idle_runtime_session_timeout to avoid paying for idle sessions
  • Memory Expiry: Set memory_event_expiry_duration based on your retention needs

Security Best Practices

  • Never hardcode credentials—use Secrets Manager (see the sketch after this list)
  • Scope IAM policies to minimum required permissions
  • Enable VPC mode for sensitive workloads (network_mode = "PRIVATE")
  • Audit CloudWatch logs regularly
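
If your agent needs to read a secret itself at runtime rather than relying on injected environment variables, here is a minimal sketch using Secrets Manager (the secret name matches the one created in the prerequisites):

import boto3
import json

def get_secret(name, region="us-east-1"):
    """Fetch a secret value, parsing JSON payloads when possible."""
    client = boto3.client("secretsmanager", region_name=region)
    value = client.get_secret_value(SecretId=name)["SecretString"]
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value

# Example: the LangFuse keys stored during the prerequisites step.
langfuse_keys = get_secret("langfuse/api-key")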

Scaling Considerations

  • AgentCore scales automatically, but be mindful of:
    • Bedrock quotas: Request limit increases for production traffic
    • Secrets Manager: Rate limits on GetSecretValue calls
    • S3 throughput: Use appropriate storage class for outputs

Key Takeaways

  • AgentCore Runtime provides serverless, session-isolated execution for AI agents—perfect for production workloads

  • Terraform modules enable reproducible, version-controlled infrastructure that follows IaC best practices

  • LangFuse integration via OpenTelemetry gives you complete visibility into agent behavior, costs, and performance

  • Automated packaging handles the complexity of Python dependencies for ARM64 architecture

  • Secrets management keeps credentials secure while making them available to your agent at runtime

  • The complete pipeline—from local development to production deployment—can be accomplished with minimal code changes thanks to the AgentCore runtime contract


This concludes the Deep Agents trilogy. We've journeyed from the conceptual foundations of multi-agent orchestration, through the practical implementation of a DeepSearch system, to production-ready deployment with comprehensive observability.

The code for everything discussed in this series is available on GitHub: strands-deep-agents

Feel free to reach out if you have questions or feedback.

PA,