Deep Agents Trilogy #3: From Laptop to Production — Deploy & Monitor with Amazon Bedrock AgentCore, LangFuse & Terraform
"The biggest challenge in AI isn't building agents—it's deploying them reliably at scale while maintaining visibility into their behavior."
📚 The Deep Agents Trilogy
This is Part 3 of a three-part series on building production-ready AI agent systems.
#1 Foundations — Understanding deep vs shallow agents and multi-agent orchestration
#2 Building DeepSearch — Implement a multi-agent research system from scratch
#3 Production Deployment — You are here: Deploy to Amazon Bedrock AgentCore with LangFuse observability
If you can't deploy your agent in a scalable, resilient, and replicable way, it's just a personal toy that will never go to production. Plus, if you can't monitor it, it's a ghost—you have no idea what it's doing, why it's failing, or how to improve it.
The two previous articles in this trilogy covered the conceptual foundations (Part 1: Foundations) and the practical implementation (Part 2: Building DeepSearch). This third and final article completes the journey by tackling what matters most for real-world adoption: production deployment and observability.
We'll deploy our Deep Research Agent to Amazon Bedrock AgentCore using Terraform and instrument it with LangFuse for comprehensive monitoring—all following Infrastructure as Code (IaC) best practices.
Deployment
Why Deployment Matters
Building an agent that works on your laptop is one thing. Making it available to users, scalable under load, and maintainable over time is an entirely different challenge. Production deployment addresses critical concerns that local development ignores:
- Scalability: Your agent needs to handle multiple concurrent users without degradation
- Reliability: Session isolation, automatic restarts, and fault tolerance become essential
- Security: API keys, credentials, and user data must be protected
- Reproducibility: Every deployment should be identical, traceable, and reversible
- Cost Management: Clear visibility into resource consumption is key
Without proper deployment infrastructure, your brilliant agent remains a demo. With it, you have a production-ready service (all that's left is finding the users, lol).
Bedrock AgentCore
What is Bedrock AgentCore?
Amazon Bedrock AgentCore is AWS's latest service for productionizing AI agents. I see it as the successor to Bedrock Agents, their first attempt at agent productionization. That first service was simple in some ways but also restrictive: it primarily treated tools as Lambdas, was not very flexible, and was not really suitable for complex agents. After testing it myself, I found that it offered some customization, but not enough. It sat awkwardly between ClickOps-style simplicity and full-fledged agent platforms, without fully serving either audience. The former help non-technical users create agents with little to no coding knowledge, while the latter are built for deep-tech teams and developers who want to build complex agents, orchestrate them themselves, and keep almost full control. With AgentCore, AWS clearly decided to go all-in on the full-fledged agent-platform side, while still supporting ClickOps-style experiences through Amazon Q.
Amazon Bedrock AgentCore is a secure, serverless runtime purpose-built for deploying and scaling dynamic AI agents. It's framework-agnostic, supporting Strands Agents, LangChain, LangGraph, CrewAI, and any other Python-based agent framework.
Key capabilities include:
- Serverless Execution: No infrastructure management—just deploy your code
- Session Isolation: Each user session runs in a dedicated microVM, protecting sensitive state
- Auto-Scaling: Scales from zero to thousands of sessions in seconds
- Protocol Support: Native support for MCP, A2A, and custom protocols
- Model Flexibility: Use any model from any provider (Bedrock, OpenAI, Anthropic, etc.)

We will focus on two core components: Runtime and Memory.
Bedrock AgentCore Runtime
The AgentCore Runtime is the execution environment where your agent code runs. It provides:
- HTTP Interface: Your agent exposes `/invocations` (POST) and `/ping` (GET) endpoints
- Environment Injection: Secrets, configuration, and credentials are injected at runtime
- Logging Integration: You can set up automatic log collection to CloudWatch
Think of it as a Lambda with a longer maximum timeout (8 hours) and ready-to-use HTTP interfaces. The runtime expects your agent to follow a simple contract:
```python
from bedrock_agentcore import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context=None):
    """Process user input and return agent response."""
    prompt = payload.get("prompt", "")
    response = ...  # Your agent logic here, using `prompt`
    return {"result": response}

if __name__ == "__main__":
    app.run()
```
This decorator-based approach means you can wrap any existing agent with minimal code changes. This entrypoint can execute whatever agent logic you need.
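For example, here is a minimal sketch of what that wrapping can look like with a plain Strands agent. The agent definition is a placeholder, not the actual DeepSearch orchestrator from Part 2.

```python
# Minimal sketch: wiring an existing Strands agent into the entrypoint.
# The agent below is a placeholder, not the actual DeepSearch orchestrator.
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()
research_agent = Agent(system_prompt="You are a deep research assistant.")

@app.entrypoint
def invoke(payload, context=None):
    """Forward the user prompt to the existing agent and return its answer."""
    prompt = payload.get("prompt", "")
    result = research_agent(prompt)
    return {"result": str(result)}
```

Keeping the agent at module level means it is constructed once and reused across invocations, so only the entrypoint glue is new code.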
Bedrock AgentCore Memory
AgentCore Memory provides persistent storage for agent state across sessions. For our Deep Research Agent, memory enables:
- Conversation Continuity: Resume research tasks across multiple interactions
- Todo Persistence: The orchestrator's task list survives session boundaries
- Knowledge Accumulation: Build on previous research without starting from scratch
Memory is optional but highly valuable for complex, long-running research tasks.
```hcl
# Terraform resource for AgentCore Memory
resource "aws_bedrockagentcore_memory" "this" {
  count = var.enable_memory ? 1 : 0

  name                  = local.agent_name_sanitized
  description           = "${var.agent_description} Memory"
  event_expiry_duration = var.memory_event_expiry_duration

  tags = local.tags
}
```
Deploying Agents using Bedrock AgentCore
There are multiple paths to deploy agents to AgentCore:
| Method | Best For | Complexity |
|---|---|---|
| CLI (`agentcore launch`) | Quick prototyping, manual deployments | Low |
| boto3 SDK | Programmatic deployments, CI/CD pipelines | Medium |
| Terraform | Production infrastructure, IaC workflows | Medium-High |
| CloudFormation | AWS-native IaC, existing CFN pipelines | Medium-High |
For production workloads, IaC (Terraform in our case, mainly because my company is biased, and so am I, lol) offers the best balance of power, flexibility, and maintainability. It provides:
- Declarative Infrastructure: Describe what you want, not how to build it
- State Management: Track deployed resources and detect drift
- Plan/Apply Workflow: Preview changes before applying them
- Module Reusability: Package common patterns for reuse across projects
AgentCore Alternatives
Before committing to AgentCore, consider these alternatives:
| Platform | Strengths | Considerations |
|---|---|---|
| AWS Lambda | Familiar, cost-effective for light loads | Cold starts, 15-min timeout, no session isolation |
| AWS Fargate | Full container control, long-running tasks | More infrastructure to manage |
| Amazon EKS | Maximum flexibility, multi-cloud | Significant operational overhead |
| Modal | Developer-friendly, fast iterations | Vendor lock-in, less AWS integration |
| Replicate | Simple deployment, built-in scaling | Limited customization |
AgentCore wins for AI-native workloads at companies already using AWS: it's designed specifically for agents, so session isolation, long-running execution, and integrated observability come built-in, and it sits natively in the AWS ecosystem. You can further strengthen security by deploying your agents in a VPC and using AgentCore Identity to pass user identity when interacting with your agents.
Monitoring
Why Monitoring Matters
Agents are mostly non-deterministic systems. The same input can produce different outputs, take different execution paths, and consume varying amounts of resources. Without observability, you're flying blind:
- Debugging: When an agent fails, you need traces to understand why
- Optimization: Identify slow sub-agents, expensive model calls, and inefficient tool usage
- Quality Assurance: Track response quality over time and detect regressions
- Cost Control: Understand token consumption and optimize model selection
Observability isn't optional for production agents—it's a fundamental requirement.
LangFuse
What is LangFuse?
LangFuse is an open-source LLM engineering platform that provides comprehensive observability for AI applications. It offers:
- Distributed Tracing: Visualize the complete execution flow of your agent
- Token Analytics: Track input/output tokens and costs per model
- Evaluation Framework: Score and assess agent responses systematically
- Prompt Management: Version and A/B test system prompts
- Session Tracking: Group related traces by user session
I like it because it is open source, framework agnostic, and has a lot of the features production demands, like datasets and LLM-as-a-judge for non-regression testing, prompt management, and more. Maybe I will do a deep-dive article on this. What I like the most is the simplicity, the dashboards, and the overall UI; it is one of the nicest UIs I've ever seen for LLM observability.

LangFuse integrates via OpenTelemetry (OTEL), making it compatible with any framework that supports OTEL instrumentation—including Strands Agents and Bedrock AgentCore.
Integrating LangFuse with Strands Agents & Bedrock AgentCore
Strands Agents has first-class support for OpenTelemetry through the StrandsTelemetry module. Integration requires just a few lines:
```python
from strands.telemetry import StrandsTelemetry

def initialize_telemetry():
    """Initialize OTEL exporter for LangFuse."""
    strands_telemetry = StrandsTelemetry()
    strands_telemetry.setup_otlp_exporter()
```
The magic happens through environment variables that configure the OTEL exporter:
```
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64-encoded-credentials>
```
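If you want to set this up by hand (outside of Terraform), the header value is just your LangFuse public and secret keys joined with a colon and base64-encoded. A small Python sketch, assuming the keys are already available as environment variables:

```python
# Sketch: building the OTLP Authorization header from LangFuse keys.
# Assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are already set
# (in production they come from Secrets Manager, never hardcoded).
import base64
import os

public_key = os.environ["LANGFUSE_PUBLIC_KEY"]
secret_key = os.environ["LANGFUSE_SECRET_KEY"]
token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {token}"
```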
In our Terraform module, these are automatically configured when you enable LangFuse:
```hcl
# From terraform/modules/agentcore/locals.tf
langfuse_env_vars = var.enable_langfuse ? {
  DISABLE_ADOT_OBSERVABILITY  = "true"
  OTEL_EXPORTER_OTLP_ENDPOINT = "https://cloud.langfuse.com/api/public/otel"
  OTEL_EXPORTER_OTLP_PROTOCOL = "http/protobuf"
  OTEL_EXPORTER_OTLP_HEADERS  = local.langfuse_auth_header
} : {}
```
And voilà, you are ready to go. Don't forget any of these variables; if one is missing, it will not work.
And obviously you may ask: how do I deploy LangFuse and authenticate against my tenant?
You have two options for deployment. You can use the cloud version, so there is nothing to deploy, but all your data will go to LangFuse's servers, and you will need to pay if you want all the features without limits. The other option is to deploy your own LangFuse instance on ECS or EKS. For our demo purposes, we will go with the cloud version: we do not have any critical information, and the free tier is more than sufficient.
What You'll See in LangFuse
Once deployed, every agent invocation creates a detailed trace of all your agent actions and interactions:

Trace Structure for DeepSearch:
- Root Span: The main agent invocation
- Orchestrator Spans: Lead searcher planning and coordination
- Sub-Agent Spans: Individual research agents executing as a task tool call
- Tool Spans: Internet searches, file writes
- Model Spans: Each LLM call with tokens and latency
You really have everything you need to debug your agent and understand what it did and how it did it: the intermediate outputs as well as the final ones.

For all our internet tool calls, we can, for example, get the inputs and the outputs. This is very powerful. As Anthropic said:
> Think like your agents. To iterate on prompts, you must understand their effects, watch your agents work step-by-step. This immediately revealed failure modes: agents continuing when they already had sufficient results, using overly verbose search queries, or selecting incorrect tools. Effective prompting relies on developing an accurate mental model of the agent, which can make the most impactful changes obvious.

Plus, you get some key metrics out of the box:
- Latency: Total execution time and per-step breakdown
- Token Usage: Input/output tokens by model
- Cost: Estimated cost per trace and aggregate
- Success Rate: Track failures and error patterns
LangFuse Alternatives
Other observability platforms worth considering:
| Platform | Strengths | Considerations |
|---|---|---|
| LangSmith | Deep LangChain integration | Closed source, LangChain-focused |
| AWS CloudWatch | Native AWS integration, free tier | Less AI-specific, manual setup |
| Helicone | Simple proxy-based approach | Limited multi-agent support |
| Arize Phoenix | Strong evaluation features | Steeper learning curve |
| Weights & Biases | Excellent experiment tracking | More ML-focused than LLM-focused |
LangFuse stands out for its open-source nature, framework agnosticism, excellent OTEL integration, and excellent UI/UX, making it ideal for Strands Agents on AgentCore.
The Complete Terraform Module
Deployment Walkthrough
Prerequisites
- AWS Account with permissions to create IAM roles, S3 buckets, and AgentCore resources
- Terraform >= 1.0 installed
- uv (Python package manager) installed
- AWS CLI configured with appropriate credentials
- Secrets Created in AWS Secrets Manager:
  - `langfuse/api-key` containing `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`
  - `linkup/api-key` (or your preferred search tool API key)
Step 1: Configure Variables
Create your `terraform.tfvars`:

```hcl
agent_name = "deepsearch"
region     = "us-east-1"

environment_variables = {
  LOG_LEVEL           = "INFO"
  BYPASS_TOOL_CONSENT = "true"
}

secrets_names = {
  LINKUP_API_KEY = "linkup/api-key"
}

enable_memory         = true
create_outputs_bucket = true

tags = {
  Environment = "production"
  Project     = "deepsearch"
}
```
Step 2: Initialize and Plan
```bash
cd terraform
terraform init
terraform plan
```
Review the plan carefully—it will create:
- IAM role with scoped permissions
- S3 bucket for deployment artifacts
- S3 bucket for research outputs
- AgentCore Memory resource
- AgentCore Runtime with your agent code
Step 3: Apply
```bash
terraform apply
```
Terraform will:
- Package your Python dependencies for ARM64
- Create the ZIP deployment package
- Upload to S3
- Create the AgentCore runtime
- Configure environment variables and secrets access
Step 4: Verify Deployment
Check the AgentCore console to confirm your agent is running.
Step 5: Test the Agent
Use the provided invoke script:
```python
import json
import uuid

import boto3

client = boto3.client("bedrock-agentcore", region_name="us-east-1")

payload = json.dumps({"prompt": "What is the current state of AI safety in 2025?"})
runtime_session_id = f"session-{uuid.uuid4()}"

response = client.invoke_agent_runtime(
    agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:YOUR_ACCOUNT:runtime/deepsearch-XXXXX",
    runtimeSessionId=runtime_session_id,
    payload=payload,
    qualifier="DEFAULT",
)

response_data = json.loads(response["response"].read())
print("Agent Response:", response_data)
```
Step 6: View Traces in LangFuse
Navigate to your LangFuse project to see the traces.
My Personal Flow for Agent Deployment
After many iterations, here's the workflow that works best for me:

Phase 1: Local Development
- Create the agent code, test it locally, don't think about deployment yet
- Focus on functionality—get the agent working correctly first
- Iterate rapidly with short feedback loops
Phase 2: Add Observability
- Add monitoring (LangFuse integration) to the agent
- Test locally with tracing enabled
- Review traces to understand agent behavior and identify issues
Phase 3: AgentCore Integration
- Create the AgentCore runtime file (`runtime.py`)
- Test locally by running `python runtime.py` and hitting the endpoint with curl (see the Python sketch after this list):

  ```bash
  curl -X POST http://localhost:8080/invocations \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Test query"}'
  ```

- Fix all errors until the agent works identically to standalone mode
Phase 4: Infrastructure
- Create Terraform code (you can reuse the existing module)
- Deploy to AgentCore
- Test the deployed agent with real requests
- Monitor in LangFuse and iterate
Production Considerations
Cost Optimization
- Model Selection: Use Haiku or other lightweight models for simple tasks (citations), and Sonnet or other frontier models for complex reasoning (orchestration)
- Session Timeouts: Configure an appropriate `idle_runtime_session_timeout` to avoid paying for idle sessions
- Memory Expiry: Set `memory_event_expiry_duration` based on your retention needs
Security Best Practices
- Never hardcode credentials—use Secrets Manager
- Scope IAM policies to minimum required permissions
- Enable VPC mode for sensitive workloads (`network_mode = "PRIVATE"`)
- Audit CloudWatch logs regularly
Scaling Considerations
AgentCore scales automatically, but be mindful of:
- Bedrock quotas: Request limit increases for production traffic
- Secrets Manager: Rate limits on `GetSecretValue` calls
- S3 throughput: Use appropriate storage class for outputs
Key Takeaways
AgentCore Runtime provides serverless, session-isolated execution for AI agents—perfect for production workloads
Terraform modules enable reproducible, version-controlled infrastructure that follows IaC best practices
LangFuse integration via OpenTelemetry gives you complete visibility into agent behavior, costs, and performance
Automated packaging handles the complexity of Python dependencies for ARM64 architecture
Secrets management keeps credentials secure while making them available to your agent at runtime
The complete pipeline—from local development to production deployment—can be accomplished with minimal code changes thanks to the AgentCore runtime contract
This concludes the Deep Agents trilogy. We've journeyed from the conceptual foundations of multi-agent orchestration, through the practical implementation of a DeepSearch system, to production-ready deployment with comprehensive observability.
The code for everything discussed in this series is available on GitHub: strands-deep-agents
Feel free to reach out if you have questions or feedback.
PA,