Deploy AI Agents with Terraform: AWS Bedrock AgentCore Guide

You've built a beautiful AI agent. It runs perfectly on your laptop, answers questions intelligently, and handles edge cases gracefully. Now comes the natural next step: deploying it to the cloud.

This should be straightforward. You're just moving working code from one environment to another. The value is in your agent's intelligence, not in the packaging mechanics.

Yet sometimes, reality has other plans. You run terraform apply, watch the infrastructure spin up, and then see your AgentCore Runtime stuck in CREATE_FAILED. The error is cryptic: "unexpected state 'CREATE_FAILED', wanted target 'READY'". Your code is identical to what worked locally. Your dependencies are declared. Your configuration looks correct. But something in the deployment pipeline has gone wrong.

Screenshot: the dreaded CREATE_FAILED state with its cryptic error message

These errors rarely come from fundamental problems. Usually, it's something subtle: dependencies built for the wrong architecture, binary wheels resolved incorrectly, or packages compiled for your Mac instead of AWS processors.

This is exactly why I built the terraform-aws-bedrock-agentcore modules. They handle these edge cases automatically: correct architecture targeting, proper dependency resolution, security best practices. Deployment becomes boring, and boring is good. You shouldn't spend hours debugging packaging issues when your agent already works.

Two Paths to Production

AWS Bedrock AgentCore Runtime offers two deployment methods:

ZIP deployment packages your Python code and dependencies into an archive that gets uploaded to S3. It's fast, needs no Docker, and works great for straightforward dependency trees. The packaging process targets the correct ARM64 architecture, bundles your code, and creates an artifact that should work identically in the cloud.

Container deployment captures your entire runtime environment in a Docker image. You define everything—base OS, system libraries, environment variables—in a Dockerfile. The image gets built, pushed to ECR, and deployed to AgentCore. This gives you complete control, especially valuable for custom configurations or non-Python dependencies.

Both methods, when done right, eliminate the gap between local development and cloud. Your agent should behave identically whether responding locally or handling production traffic.

Why Terraform?

AWS provides a CLI toolkit for quick deployments, but Terraform offers something better for production systems: your deployment becomes code. Version-controlled, reviewable, consistently applied across environments.

The Terraform modules handle packaging, IAM roles, networking, and optional features like AgentCore Memory. They hide the complexity while exposing what you need to customize.

A minimal ZIP deployment looks simple in your Terraform config: agent name, AWS region, path to your code. Behind this, the module examines your pyproject.toml, uses uv to install dependencies for ARM64, packages everything, uploads to S3, creates an IAM role with precise permissions, and spins up the AgentCore Runtime.
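
To make that concrete, here's a rough sketch of what such a call could look like. The module path and input names are illustrative, not the module's exact interface:

```hcl
# Minimal ZIP deployment sketch. Module path and input names are
# illustrative; check the module's own variables for the exact interface.
module "research_agent" {
  source = "./modules/agentcore-runtime-zip" # hypothetical local path

  agent_name  = "deep-research-agent"
  aws_region  = "us-east-1"
  source_path = "${path.module}/agent" # directory with main.py and pyproject.toml
}
```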

Container deployments follow the same pattern but with ECR instead of S3. The module manages the repository, builds your Docker image for the right architecture, pushes it, creates IAM permissions, and deploys.
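
Sketched the same way (again with illustrative names), the container variant swaps the code path for a Docker build context:

```hcl
# Container deployment sketch: same shape, but the module builds and
# pushes a Docker image to ECR instead of uploading a ZIP to S3.
module "research_agent" {
  source = "./modules/agentcore-runtime-container" # hypothetical local path

  agent_name    = "deep-research-agent"
  aws_region    = "us-east-1"
  build_context = "${path.module}/agent" # directory containing the Dockerfile
}
```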

Security is built in: IAM roles start minimal—just logs and Bedrock access—and add permissions only when you enable features. Enable Memory? Memory permissions added. Create an S3 outputs bucket? Scoped S3 permissions added. No wildcards unless absolutely necessary.
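
Conceptually, the pattern is feature-gated policies. A simplified sketch, not the module's actual resources:

```hcl
# The outputs-bucket policy only exists when that feature is enabled,
# and it is scoped to the one bucket rather than using wildcards.
resource "aws_iam_role_policy" "outputs_bucket" {
  count = var.create_outputs_bucket ? 1 : 0

  name = "agentcore-outputs-access"
  role = aws_iam_role.agent_runtime.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
      Resource = [
        aws_s3_bucket.outputs[0].arn,
        "${aws_s3_bucket.outputs[0].arn}/*",
      ]
    }]
  })
}
```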

They aim to eliminate the trial-and-error cycle of debugging deployment failures by handling architecture compatibility, dependency resolution, and security configuration automatically. This is how deployment should work: you focus on your agent's intelligence, the infrastructure handles the mechanics.

Each module is self-contained and well-documented. You can browse the repository and pick exactly what fits your use case: use the full ZIP deployment module, cherry-pick the container deployment approach, or extract just the packaging scripts. The modular design means you're not forced into a specific architecture; take what's useful, adapt what needs customization.

Available here: terraform-aws-bedrock-agentcore

The Real Culprits

The most common issue? Binary compatibility. When you pip install a package, you get a wheel file built for your platform. M-series Mac? ARM64 wheels. Windows? x86_64 wheels. These contain compiled binary code that only runs on their target architecture.

AgentCore currently runs on AWS Graviton—ARM64 architecture. Every binary dependency must be compiled for ARM64. Package without targeting the right platform? You'll bundle incompatible binaries.

The ZIP module's packaging script solves this by telling uv to install for the aarch64-manylinux2014 platform, matching AgentCore's environment.
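
Inside the module this is just a packaging step. A rough sketch of the idea, with illustrative paths, a hypothetical build directory, and flags that may differ from the module's actual script:

```hcl
# Re-package whenever pyproject.toml changes, forcing uv to resolve
# wheels for AgentCore's ARM64 Linux platform instead of the local machine.
resource "null_resource" "package_dependencies" {
  triggers = {
    pyproject = filemd5("${path.module}/agent/pyproject.toml")
  }

  provisioner "local-exec" {
    command = <<-EOT
      uv pip install \
        --python-platform aarch64-manylinux2014 \
        --target ${path.module}/build/python \
        ${path.module}/agent
    EOT
  }
}
```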

This automatic platform targeting is one of the key reasons the modules eliminate deployment errors. You don't need to remember the correct platform string or debug why your local wheels don't work in production. The module handles it.

For packages without ARM64 wheels, container deployment is your friend. The Dockerfile specifies an ARM64 base image and compiles dependencies during build. You control everything.

Secrets management can trip people up too. Environment variables work fine locally but create security risks in production, where they're visible in console interfaces. The modules enforce a distinction: non-sensitive config in environment variables, secrets in AWS Secrets Manager accessed at runtime.
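
In practice the split can look like this sketch (names are hypothetical): the secret's value never appears in the Terraform config or the console, only a reference to it does, and the agent fetches the value at runtime:

```hcl
# The secret lives in Secrets Manager; Terraform only looks it up.
data "aws_secretsmanager_secret" "search_api_key" {
  name = "research-agent/search-api-key" # hypothetical secret name
}

locals {
  # Plain settings travel as environment variables; only the ARN of the
  # secret is exposed, and the agent reads the value at startup.
  agent_environment = {
    LOG_LEVEL          = "INFO"
    SEARCH_API_KEY_ARN = data.aws_secretsmanager_secret.search_api_key.arn
  }
}
```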

Package size matters for ZIP deployments: there's a 250MB compressed limit. Most agents with standard libraries stay well under this. Need large ML models or extensive scientific computing packages? Use containers, which allow up to 1GB instead of 250MB.

From Config to Running Agent

The four-phase agent development and deployment workflow: we're focusing on Phase 4, where Terraform automates infrastructure provisioning

Let's say you've built a Deep Research agent following an earlier guide. Your main.py imports the AgentCore SDK, creates an agent, defines an entrypoint with @app.entrypoint. Your pyproject.toml lists dependencies. Simple and focused.

To deploy with the ZIP module, you create a Terraform config calling the module. Provide your agent name, region, code path. Reference secrets for API credentials. Add environment variables for endpoints and log levels.
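
Extending the minimal sketch from earlier (input names still illustrative), that configuration might look like:

```hcl
# Fuller ZIP deployment sketch: code path, secret reference, and plain
# environment variables. Input names and flags are illustrative.
module "research_agent" {
  source = "./modules/agentcore-runtime-zip" # hypothetical local path

  agent_name  = "deep-research-agent"
  aws_region  = "us-east-1"
  source_path = "${path.module}/agent"

  enable_memory = true # hypothetical flag for AgentCore Memory

  environment_variables = {
    LOG_LEVEL          = "INFO"
    SEARCH_API_KEY_ARN = data.aws_secretsmanager_secret.search_api_key.arn # from the earlier sketch
  }
}
```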

Run terraform init to prepare your workspace. Run terraform plan to see what Terraform will create: S3 bucket, packaged ZIP, IAM role, memory resource, AgentCore Runtime.

Run terraform apply to execute the deployment. The packaging script installs dependencies for ARM64, creates the ZIP, and uploads it to S3. The IAM role is created, Memory is provisioned (if configured), and the AgentCore Runtime spins up, pulls your code, and initializes. Minutes later: READY state, accepting requests.

Container deployment? Same flow, different details. Your directory includes a Dockerfile. Terraform builds the image for ARM64, pushes to ECR, creates a runtime referencing the image. Same agent code inside—just packaged differently.

The GitHub repo provides examples for both deployment methods.

Making Deployment Boring

Deploying agents should be boring—in the best sense. It should fade into the background, letting you focus on improving capabilities, expanding knowledge, refining behavior. Friction-free deployment means faster iteration, more experimentation, consistent value delivery.

The terraform-aws-bedrock-agentcore modules exist specifically to make deployment boring. They aim to eliminate the common pitfalls.

AgentCore provides a managed runtime designed for AI agents: scaling, networking, AWS integration handled. Terraform provides declarative infrastructure: repeatable, reviewable, version-controlled. These modules connect the two seamlessly.

When issues arise (and they will), well-structured infrastructure code makes debugging easier. You can examine IAM permissions, verify packaging parameters, check binary architecture, trace the workflow step by step. Error messages might be cryptic, but you have visibility and reproducibility.

We should focus our energy on making our agent intelligent and useful.

PA,