Deep Agents Trilogy #3: From Laptop to Production — Deploy & Monitor with Amazon Bedrock AgentCore, LangFuse & Terraform

"The biggest challenge in AI isn't building agents—it's deploying them reliably at scale while maintaining visibility into their behavior."

📚 The Deep Agents Trilogy

This is Part 3 of a three-part series on building production-ready AI agent systems.

  • #1 Foundations — Understanding deep vs shallow agents and multi-agent orchestration

  • #2 Building DeepSearch — Implement a multi-agent research system from scratch

  • #3 Production Deployment — You are here: Deploy to Amazon Bedrock AgentCore with LangFuse observability

If you can't deploy your agent in a scalable, resilient, and replicable way, it's just a personal toy that will never go to production. Plus, if you can't monitor it, it's a ghost—you have no idea what it's doing, why it's failing, or how to improve it.

The two previous articles in this trilogy covered the conceptual foundations (Part 1: Foundations) and the practical implementation (Part 2: Building DeepSearch). This third and final article completes the journey by tackling what matters most for real-world adoption: production deployment and observability.

We'll deploy our Deep Research Agent to Amazon Bedrock AgentCore using Terraform and instrument it with LangFuse for comprehensive monitoring—all following Infrastructure as Code (IaC) best practices.


Deployment

Why Deployment Matters

Building an agent that works on your laptop is one thing. Making it available to users, scalable under load, and maintainable over time is an entirely different challenge. Production deployment addresses critical concerns that local development ignores:

  • Scalability: Your agent needs to handle multiple concurrent users without degradation
  • Reliability: Session isolation, automatic restarts, and fault tolerance become essential
  • Security: API keys, credentials, and user data must be protected
  • Reproducibility: Every deployment should be identical, traceable, and reversible
  • Cost Management: Clear visibility into resource consumption is key

Without proper deployment infrastructure, your brilliant agent remains a demo. With it, you have a production-ready service (all that's left is to find the users, lol).

Bedrock AgentCore

What is Bedrock AgentCore?

Amazon Bedrock AgentCore is AWS's latest service for productionizing AI agents. I see it as the successor to Bedrock Agents, their first attempt at agent productionization. That first service was simple in some ways but also restrictive: it primarily treated tools as Lambdas, was not very flexible, and was not really suitable for complex agents. After testing it myself, I found that it offered some customization, but not enough. It sat awkwardly between ClickOps-style simplicity and full-fledged agent platforms, without fully serving either audience. The former help non-technical users create agents with little to no coding knowledge, while the latter are built for deep-tech teams and developers who want to build complex agents, orchestrate them themselves, and keep almost full control. With AgentCore, AWS clearly decided to go all-in on the full-fledged agent-platform side, while still supporting ClickOps-style experiences through Amazon Q.

Amazon Bedrock AgentCore is a secure, serverless runtime purpose-built for deploying and scaling dynamic AI agents. It's framework-agnostic, supporting Strands Agents, LangChain, LangGraph, CrewAI, and any other Python-based agent framework.

Key capabilities include:

  • Serverless Execution: No infrastructure management—just deploy your code
  • Session Isolation: Each user session runs in a dedicated microVM, protecting sensitive state
  • Auto-Scaling: Scales from zero to thousands of sessions in seconds
  • Protocol Support: Native support for MCP, A2A, and custom protocols
  • Model Flexibility: Use any model from any provider (Bedrock, OpenAI, Anthropic, etc.)
AgentCore Overview

We will focus on two core components: Runtime and Memory.

Bedrock AgentCore Runtime

The AgentCore Runtime is the execution environment where your agent code runs. It provides:

  • HTTP Interface: Your agent exposes /invocations (POST) and /ping (GET) endpoints
  • Environment Injection: Secrets, configuration, and credentials are injected at runtime
  • Logging Integration: You can set up automatic log collection to CloudWatch

Think of it as a Lambda with a much longer maximum timeout (8 hours) and ready-to-use HTTP interfaces. The runtime expects your agent to follow a simple contract:

from bedrock_agentcore import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context=None):
    """Process user input and return agent response."""
    prompt = payload.get("prompt", "")
    # Your agent logic here (call your agent and capture its output)
    response = f"You said: {prompt}"  # placeholder until you plug in a real agent
    return {"result": response}

if __name__ == "__main__":
    app.run()

This decorator-based approach means you can wrap any existing agent with minimal code changes. This entrypoint can execute whatever agent logic you need.
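
As an illustration, wrapping an existing Strands agent might look roughly like the sketch below. This is a minimal, hypothetical example: the system prompt is a placeholder, and the real DeepSearch orchestrator wires up models, tools, and sub-agents differently.

from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()

# Placeholder agent: swap in your own model, system prompt, and tools.
research_agent = Agent(system_prompt="You are a research assistant.")

@app.entrypoint
def invoke(payload, context=None):
    """Delegate the incoming prompt to the wrapped Strands agent."""
    prompt = payload.get("prompt", "")
    result = research_agent(prompt)  # run the agent synchronously
    return {"result": str(result)}   # serialize the agent's final answer

if __name__ == "__main__":
    app.run()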

Bedrock AgentCore Memory

AgentCore Memory provides persistent storage for agent state across sessions. For our Deep Research Agent, memory enables:

  • Conversation Continuity: Resume research tasks across multiple interactions
  • Todo Persistence: The orchestrator's task list survives session boundaries
  • Knowledge Accumulation: Build on previous research without starting from scratch

Memory is optional but highly valuable for complex, long-running research tasks.

# Terraform resource for AgentCore Memory
resource "aws_bedrockagentcore_memory" "this" {
  count = var.enable_memory ? 1 : 0

  name                  = local.agent_name_sanitized
  description           = "${var.agent_description} Memory"
  event_expiry_duration = var.memory_event_expiry_duration

  tags = local.tags
}
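
On the runtime side, the agent still has to know which memory resource to talk to. A minimal sketch, assuming the Terraform module injects the memory ID through an environment variable named MEMORY_ID (that variable name is an assumption, not a guaranteed part of the module's contract):

import os

# Hypothetical variable name: check how your Terraform module wires env vars.
memory_id = os.environ.get("MEMORY_ID")

if memory_id:
    # Hand the ID to whatever memory client your agent uses
    # (for example, the AgentCore Memory data-plane API).
    print(f"AgentCore Memory enabled: {memory_id}")
else:
    print("Running without persistent memory.")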

Deploying Agents using Bedrock AgentCore

There are multiple paths to deploy agents to AgentCore:

Method | Best For | Complexity
CLI (agentcore launch) | Quick prototyping, manual deployments | Low
boto3 SDK | Programmatic deployments, CI/CD pipelines | Medium
Terraform | Production infrastructure, IaC workflows | Medium-High
CloudFormation | AWS-native IaC, existing CFN pipelines | Medium-High
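
As a point of reference, the boto3 path from the table might look roughly like this. Treat it as a hedged sketch based on the control-plane API: the runtime name, container URI, and role ARN are placeholders, and the exact request shape may evolve.

import boto3

# Control-plane client for managing AgentCore runtimes.
control = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

# Placeholder values: replace with your own ECR image, IAM role, and name.
response = control.create_agent_runtime(
    agentRuntimeName="deepsearch",
    agentRuntimeArtifact={
        "containerConfiguration": {
            "containerUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/deepsearch:latest"
        }
    },
    roleArn="arn:aws:iam::123456789012:role/deepsearch-agentcore-role",
    networkConfiguration={"networkMode": "PUBLIC"},
)
print(response["agentRuntimeArn"])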

For production workloads, IaC (Terraform in our case, mainly because my company is biased, and so am I, lol) offers the best balance of power, flexibility, and maintainability. It provides:

  • Declarative Infrastructure: Describe what you want, not how to build it
  • State Management: Track deployed resources and detect drift
  • Plan/Apply Workflow: Preview changes before applying them
  • Module Reusability: Package common patterns for reuse across projects

AgentCore Alternatives

Before committing to AgentCore, consider these alternatives:

Platform | Strengths | Considerations
AWS Lambda | Familiar, cost-effective for light loads | Cold starts, 15-min timeout, no session isolation
AWS Fargate | Full container control, long-running tasks | More infrastructure to manage
Amazon EKS | Maximum flexibility, multi-cloud | Significant operational overhead
Modal | Developer-friendly, fast iterations | Vendor lock-in, less AWS integration
Replicate | Simple deployment, built-in scaling | Limited customization

AgentCore wins for AI-native workloads at companies already using AWS, because it's designed specifically for agents—session isolation, long-running execution, and integrated observability come built-in, and it sits squarely in the AWS ecosystem. You can further strengthen security by deploying your agents in a VPC and using AgentCore Identity to pass user identity when interacting with your agents.


Monitoring

Why Monitoring Matters

Agents are mostly non-deterministic systems. The same input can produce different outputs, take different execution paths, and consume varying amounts of resources. Without observability, you're flying blind:

  • Debugging: When an agent fails, you need traces to understand why
  • Optimization: Identify slow sub-agents, expensive model calls, and inefficient tool usage
  • Quality Assurance: Track response quality over time and detect regressions
  • Cost Control: Understand token consumption and optimize model selection

Observability isn't optional for production agents—it's a fundamental requirement.

LangFuse

What is LangFuse?

LangFuse is an open-source LLM engineering platform that provides comprehensive observability for AI applications. It offers:

  • Distributed Tracing: Visualize the complete execution flow of your agent
  • Token Analytics: Track input/output tokens and costs per model
  • Evaluation Framework: Score and assess agent responses systematically
  • Prompt Management: Version and A/B test system prompts
  • Session Tracking: Group related traces by user session

I like it because it is open source, framework agnostic, and has a lot of the features production needs, like datasets and LLM-as-a-judge for non-regression testing, prompt management, and more. Maybe I will do a deep-dive article on this. What I like the most is the simplicity, the dashboards, and the overall UI—it is one of the nicest UIs I've ever seen for LLM observability.

LangFuse Trace View

LangFuse integrates via OpenTelemetry (OTEL), making it compatible with any framework that supports OTEL instrumentation—including Strands Agents and Bedrock AgentCore.

Integrating LangFuse with Strands Agents & Bedrock AgentCore

Strands Agents has first-class support for OpenTelemetry through the StrandsTelemetry module. Integration requires just a few lines:

from strands.telemetry import StrandsTelemetry

def initialize_telemetry():
    """Initialize OTEL exporter for LangFuse."""
    strands_telemetry = StrandsTelemetry()
    strands_telemetry.setup_otlp_exporter()

The magic happens through environment variables that configure the OTEL exporter:

OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64-encoded-credentials>
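
The header value is simply your LangFuse public and secret keys joined as public:secret and base64-encoded. A minimal sketch for building it in Python, assuming the keys are already available as environment variables:

import base64
import os

# LangFuse project keys, e.g. loaded from your environment or Secrets Manager.
public_key = os.environ["LANGFUSE_PUBLIC_KEY"]
secret_key = os.environ["LANGFUSE_SECRET_KEY"]

# Basic auth credentials are "public_key:secret_key", base64-encoded.
token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {token}"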

In our Terraform module, these are automatically configured when you enable LangFuse:

# From terraform/modules/agentcore/locals.tf
langfuse_env_vars = var.enable_langfuse ? {
  DISABLE_ADOT_OBSERVABILITY  = "true"
  OTEL_EXPORTER_OTLP_ENDPOINT = "https://cloud.langfuse.com/api/public/otel"
  OTEL_EXPORTER_OTLP_PROTOCOL = "http/protobuf"
  OTEL_EXPORTER_OTLP_HEADERS  = local.langfuse_auth_header
} : {}

Et voilà, you are ready to go. Don't forget any of these variables—if one is missing, it will not work.

An obvious question at this point: how do you deploy LangFuse and authenticate your tenant?

You have two options for deployment. You can use the cloud version, so there is nothing to deploy, but all your data will go to LangFuse's servers, and you will need to pay if you want all the features without limits. The other option is to deploy your own LangFuse instance on ECS or EKS. For our demo purposes, we will go with the cloud version: we do not have any critical information, and the free tier is more than sufficient.

What You'll See in LangFuse

Once deployed, every agent invocation creates a detailed trace of all your agent actions and interactions:

LangFuse Trace View

Trace Structure for DeepSearch:

  1. Root Span: The main agent invocation
  2. Orchestrator Spans: Lead searcher planning and coordination
  3. Sub-Agent Spans: Individual research agents executing as a task tool call
  4. Tool Spans: Internet searches, file writes
  5. Model Spans: Each LLM call with tokens and latency

You really have everything you need to debug your agent and understand what it did and how it did it: the intermediate outputs as well as the final ones.

It is really useful for debugging too: you can see your errors and the retries, for example:

LangFuse LLM Interaction with Error

For all our internet tool calls, we can, for example, get the inputs and the outputs. This is very powerful. As Anthropic said:

Think like your agents. To iterate on prompts, you must understand their effects, watch your agents work step-by-step. This immediately revealed failure modes: agents continuing when they already had sufficient results, using overly verbose search queries, or selecting incorrect tools. Effective prompting relies on developing an accurate mental model of the agent, which can make the most impactful changes obvious.

Use LangFuse for this.

LangFuse Tool Call Detail

Plus, you get some key metrics out of the box:

  • Latency: Total execution time and per-step breakdown
  • Token Usage: Input/output tokens by model
  • Cost: Estimated cost per trace and aggregate
  • Success Rate: Track failures and error patterns

LangFuse Alternatives

Other observability platforms worth considering:

Platform | Strengths | Considerations
LangSmith | Deep LangChain integration | Closed source, LangChain-focused
AWS CloudWatch | Native AWS integration, free tier | Less AI-specific, manual setup
Helicone | Simple proxy-based approach | Limited multi-agent support
Arize Phoenix | Strong evaluation features | Steeper learning curve
Weights & Biases | Excellent experiment tracking | More ML-focused than LLM-focused

LangFuse stands out for its open-source nature, framework agnosticism, excellent OTEL integration, and excellent UI/UX—making it ideal for Strands Agents on AgentCore.


The Complete Terraform Module


Deployment Walkthrough

Prerequisites

  1. AWS Account with permissions to create IAM roles, S3 buckets, and AgentCore resources
  2. Terraform >= 1.0 installed
  3. uv (Python package manager) installed
  4. AWS CLI configured with appropriate credentials
  5. Secrets Created in AWS Secrets Manager (see the sketch after this list):
    • langfuse/api-key containing LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY
    • linkup/api-key (or your preferred search tool API key)
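
If you prefer to create those secrets programmatically rather than through the console, a minimal sketch with boto3 is shown below. The JSON shape of the LangFuse secret mirrors the key names listed above; adjust it if your module expects a different structure, and the values here are placeholders.

import boto3
import json

sm = boto3.client("secretsmanager", region_name="us-east-1")

# LangFuse project keys (the values here are placeholders).
sm.create_secret(
    Name="langfuse/api-key",
    SecretString=json.dumps({
        "LANGFUSE_PUBLIC_KEY": "pk-lf-...",
        "LANGFUSE_SECRET_KEY": "sk-lf-...",
    }),
)

# Search tool API key (Linkup in this example).
sm.create_secret(
    Name="linkup/api-key",
    SecretString="your-linkup-api-key",
)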

Step 1: Configure Variables

Create your terraform.tfvars:

agent_name = "deepsearch"
region     = "us-east-1"

environment_variables = {
  LOG_LEVEL            = "INFO"
  BYPASS_TOOL_CONSENT  = "true"
}

secrets_names = {
  LINKUP_API_KEY = "linkup/api-key"
}

enable_memory         = true
create_outputs_bucket = true

tags = {
  Environment = "production"
  Project     = "deepsearch"
}

Step 2: Initialize and Plan

cd terraform
terraform init
terraform plan

Review the plan carefully—it will create:

  • IAM role with scoped permissions
  • S3 bucket for deployment artifacts
  • S3 bucket for research outputs
  • AgentCore Memory resource
  • AgentCore Runtime with your agent code

Step 3: Apply

terraform apply

Terraform will:

  1. Package your Python dependencies for ARM64
  2. Create the ZIP deployment package
  3. Upload to S3
  4. Create the AgentCore runtime
  5. Configure environment variables and secrets access

Step 4: Verify Deployment

Check the AgentCore console to confirm your agent is running:
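
If you prefer the SDK to the console, a quick sanity check might look like the sketch below (assuming the list_agent_runtimes control-plane call; the exact response fields may vary):

import boto3

control = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

# List the deployed runtimes and print their details, including status.
response = control.list_agent_runtimes()
for runtime in response.get("agentRuntimes", []):
    print(runtime)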

Step 5: Test the Agent

Use the provided invoke script:

import boto3
import json
import uuid

client = boto3.client("bedrock-agentcore", region_name="us-east-1")
payload = json.dumps({"prompt": "What is the current state of AI safety in 2025?"})

runtime_session_id = f"session-{uuid.uuid4()}"

response = client.invoke_agent_runtime(
    agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:YOUR_ACCOUNT:runtime/deepsearch-XXXXX",
    runtimeSessionId=runtime_session_id,
    payload=payload,
    qualifier="DEFAULT",
)

response_data = json.loads(response["response"].read())
print("Agent Response:", response_data)

Step 6: View Traces in LangFuse

Navigate to your LangFuse project to see the traces.


My Personal Flow for Agent Deployment

After many iterations, here's the workflow that works best for me:

Agent Development and Deployment Flow

Phase 1: Local Development

  1. Create the agent code, test it locally, don't think about deployment yet
  2. Focus on functionality—get the agent working correctly first
  3. Iterate rapidly with short feedback loops

Phase 2: Add Observability

  1. Add monitoring (LangFuse integration) to the agent
  2. Test locally with tracing enabled
  3. Review traces to understand agent behavior and identify issues

Phase 3: AgentCore Integration

  1. Create the AgentCore runtime file (runtime.py)
  2. Test locally using python runtime.py and curl:
    curl -X POST http://localhost:8080/invocations \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Test query"}'
    
  3. Fix all errors until the agent works identically to standalone mode

Phase 4: Infrastructure

  1. Create Terraform code (you can use existing module)
  2. Deploy to AgentCore
  3. Test the deployed agent with real requests
  4. Monitor in LangFuse and iterate

Production Considerations

Cost Optimization

  • Model Selection: Use Haiku or other lightweight models for simple tasks (citations), and Sonnet or other frontier models for complex reasoning (orchestration)
  • Session Timeouts: Configure appropriate idle_runtime_session_timeout to avoid paying for idle sessions
  • Memory Expiry: Set memory_event_expiry_duration based on your retention needs

Security Best Practices

  • Never hardcode credentials—use Secrets Manager (see the sketch after this list)
  • Scope IAM policies to minimum required permissions
  • Enable VPC mode for sensitive workloads (network_mode = "PRIVATE")
  • Audit CloudWatch logs regularly
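
If your agent needs to read a secret itself at runtime rather than relying on injected environment variables, here is a minimal sketch using Secrets Manager (the secret name matches the one created in the prerequisites):

import boto3
import json

def get_secret(name, region="us-east-1"):
    """Fetch a secret value, parsing JSON payloads when possible."""
    client = boto3.client("secretsmanager", region_name=region)
    value = client.get_secret_value(SecretId=name)["SecretString"]
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value

# Example: the LangFuse keys stored during the prerequisites step.
langfuse_keys = get_secret("langfuse/api-key")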

Scaling Considerations

  • AgentCore scales automatically, but be mindful of:
    • Bedrock quotas: Request limit increases for production traffic
    • Secrets Manager: Rate limits on GetSecretValue calls
    • S3 throughput: Use appropriate storage class for outputs

Key Takeaways

  • AgentCore Runtime provides serverless, session-isolated execution for AI agents—perfect for production workloads

  • Terraform modules enable reproducible, version-controlled infrastructure that follows IaC best practices

  • LangFuse integration via OpenTelemetry gives you complete visibility into agent behavior, costs, and performance

  • Automated packaging handles the complexity of Python dependencies for ARM64 architecture

  • Secrets management keeps credentials secure while making them available to your agent at runtime

  • The complete pipeline—from local development to production deployment—can be accomplished with minimal code changes thanks to the AgentCore runtime contract


This concludes the Deep Agents trilogy. We've journeyed from the conceptual foundations of multi-agent orchestration, through the practical implementation of a DeepSearch system, to production-ready deployment with comprehensive observability.

The code for everything discussed in this series is available on GitHub: strands-deep-agents

Feel free to reach out if you have questions or feedback.

PA,