jeremy-adk-terraform
by jeremylongshore · v1.0.0
Terraform infrastructure as code for ADK and Vertex AI Agent Engine deployments
# Jeremy ADK Terraform
**🎯 VERTEX AI AGENT ENGINE INFRASTRUCTURE ONLY**
Terraform infrastructure-as-code specialist for deploying **ADK agents to Vertex AI Agent Engine** with complete observability, security, and production-ready configurations.
## ⚠️ Important: What This Plugin Is For
**✅ THIS PLUGIN IS FOR:**
- **Vertex AI Agent Engine** infrastructure (fully-managed runtime)
- **ADK agents** Terraform deployment (`google_vertex_ai_reasoning_engine`)
- **Agent Engine features**: Code Execution Sandbox, Memory Bank, VPC-SC, IAM
- **Observability infrastructure**: Cloud Monitoring dashboards, alert policies, BigQuery connectors
- **Production ADK deployments** with security hardening
**❌ THIS PLUGIN IS NOT FOR:**
- Cloud Run deployments (use `jeremy-genkit-terraform`)
- LangChain/LlamaIndex on other platforms
- Self-hosted agent infrastructure
- Non-Agent Engine Terraform
## Overview
This plugin provides Terraform modules and configurations for deploying ADK agents to Vertex AI Agent Engine with production-ready infrastructure including VPC Service Controls, IAM least privilege, Code Execution Sandbox configuration, Memory Bank setup, observability dashboards, and BigQuery analytics connectors.
**Key Infrastructure Components:**
- `google_vertex_ai_reasoning_engine` resource for Agent Engine
- VPC Service Controls perimeter
- IAM roles and service accounts
- Cloud Monitoring dashboards and alerts
- BigQuery datasets for agent analytics
- Cloud Storage buckets for artifacts
- Secret Manager for credentials
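As a sketch of the Secret Manager piece, a secret plus an accessor binding for the agent's runtime service account might look like the following (the secret name and the `agent_sa` reference are illustrative assumptions, matching the IAM configuration shown later):

```hcl
# secrets.tf (illustrative sketch)
resource "google_secret_manager_secret" "agent_api_key" {
  secret_id = "${var.agent_name}-api-key"
  project   = var.project_id

  replication {
    auto {}
  }
}

# Allow the agent's runtime service account to read the secret
resource "google_secret_manager_secret_iam_member" "agent_secret_accessor" {
  secret_id = google_secret_manager_secret.agent_api_key.id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.agent_sa.email}"
}
```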
## Installation
```bash
/plugin install jeremy-adk-terraform@claude-code-plugins-plus
```
## Prerequisites & Dependencies
### Required Tools
**1. Terraform:**
```bash
# Install Terraform 1.5+
wget https://releases.hashicorp.com/terraform/1.6.0/terraform_1.6.0_linux_amd64.zip
unzip terraform_1.6.0_linux_amd64.zip
sudo mv terraform /usr/local/bin/
# Verify installation
terraform version  # should report v1.6.0 (1.5.0+ required)
```
**2. gcloud CLI:**
```bash
# Install gcloud
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
# Update to latest
gcloud components update
# Install alpha commands (for Agent Engine)
gcloud components install alpha
# Authenticate
gcloud auth application-default login
```
**3. Terraform Google Provider:**
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
  }
}
```
### Required Google Cloud APIs
```bash
# Enable all required APIs
gcloud services enable \
  aiplatform.googleapis.com \
  discoveryengine.googleapis.com \
  logging.googleapis.com \
  monitoring.googleapis.com \
  cloudtrace.googleapis.com \
  bigquery.googleapis.com \
  storage.googleapis.com \
  secretmanager.googleapis.com \
  compute.googleapis.com \
  servicenetworking.googleapis.com \
  --project=YOUR_PROJECT_ID
```
### Required IAM Permissions
```yaml
# Service account for Terraform needs:
- roles/aiplatform.admin # Deploy Agent Engine resources
- roles/iam.serviceAccountAdmin # Create service accounts
- roles/resourcemanager.projectIamAdmin # Manage IAM bindings
- roles/compute.networkAdmin # VPC configuration
- roles/monitoring.admin # Create dashboards/alerts
- roles/bigquery.admin # Create datasets
- roles/storage.admin # Create buckets
```
## Features
✅ **Agent Engine Deployment**: `google_vertex_ai_reasoning_engine` with ADK framework
✅ **Code Execution Sandbox**: Secure isolation, state persistence (1-14 days TTL)
✅ **Memory Bank**: Firestore-backed persistent memory with retention policies
✅ **VPC Service Controls**: Perimeter security for Agent Engine
✅ **IAM Least Privilege**: Service accounts with minimal permissions
✅ **Observability Infrastructure**: Dashboards, alerts, SLOs, token tracking
✅ **BigQuery Analytics**: Automated log export and analytics datasets
✅ **Cloud Storage**: Artifact storage with lifecycle policies
✅ **Secret Management**: API keys and credentials in Secret Manager
✅ **Multi-Region**: Agent Engine deployments in multiple regions
## Quick Start
### Natural Language Activation
Simply mention what you need:
```
"Create Terraform for ADK agent deployment to Agent Engine"
"Provision Agent Engine infrastructure with VPC-SC"
"Deploy ADK agent with Code Execution and Memory Bank"
"Set up observability infrastructure for Agent Engine"
"Create multi-region Agent Engine deployment"
```
The skill auto-activates and generates production-ready Terraform.
## Terraform Module Structure
### Core Agent Engine Resource
```hcl
# main.tf
resource "google_vertex_ai_reasoning_engine" "adk_agent" {
  display_name = var.agent_name
  region       = var.region
  project      = var.project_id

  spec {
    # REQUIRED: Specify ADK framework
    agent_framework = "google-adk"

    # Agent package (ADK agent code); build gs:// URIs rather than
    # using self_link, which yields an https:// URL
    package_spec {
      pickle_object_gcs_uri = "gs://${google_storage_bucket_object.agent_package.bucket}/${google_storage_bucket_object.agent_package.name}"
      python_version        = "3.12"
      requirements_gcs_uri  = "gs://${google_storage_bucket_object.requirements.bucket}/${google_storage_bucket_object.requirements.name}"
    }

    # Runtime configuration
    runtime_config {
      # Code Execution Sandbox
      code_execution_config {
        enabled                    = true
        state_persistence_ttl_days = var.code_exec_ttl_days # 1-14 days
      }

      # Memory Bank
      memory_bank_config {
        enabled        = true
        max_memories   = var.memory_bank_max_memories
        retention_days = var.memory_bank_retention_days
      }

      # Auto-scaling
      auto_scaling_config {
        min_replica_count = var.min_replicas
        max_replica_count = var.max_replicas
      }

      # VPC configuration
      vpc_config {
        network    = google_compute_network.agent_vpc.id
        subnetwork = google_compute_subnetwork.agent_subnet.id
      }
    }

    # Security
    encryption_config {
      kms_key_name = google_kms_crypto_key.agent_key.id
    }
  }

  # Service account
  service_account = google_service_account.agent_sa.email

  # Model Armor (prompt injection protection)
  model_armor_enabled = true

  depends_on = [
    google_project_service.aiplatform,
    google_storage_bucket_object.agent_package
  ]
}
```
### IAM Configuration
```hcl
# iam.tf
resource "google_service_account" "agent_sa" {
  account_id   = "${var.agent_name}-sa"
  display_name = "Service Account for ${var.agent_name}"
  project      = var.project_id
}

# Least privilege permissions
resource "google_project_iam_member" "agent_permissions" {
  for_each = toset([
    "roles/aiplatform.user",
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/cloudtrace.agent",
    "roles/discoveryengine.editor" # For Memory Bank
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.agent_sa.email}"
}

# Code Execution Sandbox permissions (minimal)
resource "google_service_account" "code_exec_sa" {
  account_id   = "${var.agent_name}-code-exec-sa"
  display_name = "Code Execution SA for ${var.agent_name}"
  project      = var.project_id
}

resource "google_project_iam_member" "code_exec_permissions" {
  for_each = toset([
    "roles/storage.objectViewer", # Read artifacts only
    "roles/logging.logWriter"
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.code_exec_sa.email}"
}
```
### VPC Service Controls
```hcl
# vpc_sc.tf
resource "google_access_context_manager_service_perimeter" "agent_perimeter" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/${var.agent_name}_perimeter"
  title  = "${var.agent_name} Agent Engine Perimeter"

  status {
    restricted_services = [
      "aiplatform.googleapis.com",
      "discoveryengine.googleapis.com",
      "storage.googleapis.com"
    ]

    resources = [
      "projects/${data.google_project.project.number}"
    ]

    vpc_accessible_services {
      enable_restriction = true
      allowed_services = [
        "aiplatform.googleapis.com",
        "discoveryengine.googleapis.com"
      ]
    }
  }
}
```
### Observability Infrastructure (2025 Features)
```hcl
# monitoring.tf
# Cloud Monitoring Dashboard
resource "google_monitoring_dashboard" "agent_dashboard" {
  dashboard_json = jsonencode({
    displayName = "${var.agent_name} Agent Engine Dashboard"
    mosaicLayout = {
      columns = 12
      tiles = [
        # Request Volume
        {
          width  = 6
          height = 4
          widget = {
            title = "Request Volume"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/agent/request_count\" resource.type=\"aiplatform.googleapis.com/Agent\""
                  }
                }
              }]
            }
          }
        },
        # Error Rate
        {
          xPos   = 6
          width  = 6
          height = 4
          widget = {
            title = "Error Rate"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/agent/error_count\" resource.type=\"aiplatform.googleapis.com/Agent\""
                  }
                }
              }]
            }
          }
        },
        # Latency Distribution
        {
          yPos   = 4
          width  = 6
          height = 4
          widget = {
            title = "Latency (p50, p95, p99)"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/agent/prediction_latencies\" resource.type=\"aiplatform.googleapis.com/Agent\""
                  }
                }
              }]
            }
          }
        },
        # Token Usage
        {
          xPos   = 6
          yPos   = 4
          width  = 6
          height = 4
          widget = {
            title = "Token Usage"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/agent/token_count\" resource.type=\"aiplatform.googleapis.com/Agent\""
                  }
                }
              }]
            }
          }
        }
      ]
    }
  })
}
# Alert Policy: High Error Rate
resource "google_monitoring_alert_policy" "high_error_rate" {
  display_name = "${var.agent_name} - High Error Rate"
  combiner     = "OR"

  conditions {
    display_name = "Error rate above threshold for 5 minutes"

    condition_threshold {
      filter          = "metric.type=\"aiplatform.googleapis.com/agent/error_count\" resource.type=\"aiplatform.googleapis.com/Agent\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.05 # errors per second (ALIGN_RATE), not a percentage

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]

  alert_strategy {
    auto_close = "86400s"
  }
}
# Alert Policy: High Latency
resource "google_monitoring_alert_policy" "high_latency" {
  display_name = "${var.agent_name} - High Latency"
  combiner     = "OR"

  conditions {
    display_name = "P95 latency > 10s"

    condition_threshold {
      filter          = "metric.type=\"aiplatform.googleapis.com/agent/prediction_latencies\" resource.type=\"aiplatform.googleapis.com/Agent\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 10000 # milliseconds

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_PERCENTILE_95" # p95 from the latency distribution
        cross_series_reducer = "REDUCE_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]
}
# Notification Channel
resource "google_monitoring_notification_channel" "email" {
  display_name = "${var.agent_name} Alerts"
  type         = "email"

  labels = {
    email_address = var.alert_email
  }
}
```
### BigQuery Analytics (2025 Feature)
```hcl
# bigquery.tf
# Dataset for agent analytics
resource "google_bigquery_dataset" "agent_analytics" {
  dataset_id  = "${replace(var.agent_name, "-", "_")}_analytics"
  location    = var.region
  project     = var.project_id
  description = "Analytics dataset for ${var.agent_name}"

  delete_contents_on_destroy = false
}

# Table for agent logs
resource "google_bigquery_table" "agent_logs" {
  dataset_id          = google_bigquery_dataset.agent_analytics.dataset_id
  table_id            = "agent_logs"
  project             = var.project_id
  deletion_protection = true

  time_partitioning {
    type  = "DAY"
    field = "timestamp"
  }

  schema = jsonencode([
    {
      name = "timestamp"
      type = "TIMESTAMP"
      mode = "REQUIRED"
    },
    {
      name = "agent_id"
      type = "STRING"
      mode = "REQUIRED"
    },
    {
      name = "task_id"
      type = "STRING"
      mode = "NULLABLE"
    },
    {
      name = "session_id"
      type = "STRING"
      mode = "NULLABLE"
    },
    {
      name = "latency_ms"
      type = "FLOAT"
      mode = "NULLABLE"
    },
    {
      name = "input_tokens"
      type = "INTEGER"
      mode = "NULLABLE"
    },
    {
      name = "output_tokens"
      type = "INTEGER"
      mode = "NULLABLE"
    },
    {
      name = "error_count"
      type = "INTEGER"
      mode = "NULLABLE"
    },
    {
      name = "error_message"
      type = "STRING"
      mode = "NULLABLE"
    },
    {
      name        = "component"
      type        = "STRING"
      mode        = "NULLABLE"
      description = "AGENT_QUERIES, MEMORY_BANK_OPERATIONS, CODE_EXECUTION_EVENTS, A2A_PROTOCOL_CALLS"
    }
  ])
}

# Log sink to BigQuery
resource "google_logging_project_sink" "agent_to_bigquery" {
  name        = "${var.agent_name}-to-bigquery"
  destination = "bigquery.googleapis.com/projects/${var.project_id}/datasets/${google_bigquery_dataset.agent_analytics.dataset_id}"
  filter      = "resource.type=\"aiplatform.googleapis.com/Agent\" resource.labels.agent_id=\"${google_vertex_ai_reasoning_engine.adk_agent.name}\""

  unique_writer_identity = true

  bigquery_options {
    use_partitioned_tables = true
  }
}

# Grant sink permission to write to BigQuery
resource "google_bigquery_dataset_iam_member" "sink_writer" {
  dataset_id = google_bigquery_dataset.agent_analytics.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = google_logging_project_sink.agent_to_bigquery.writer_identity
}
```
### Cloud Storage for Artifacts
```hcl
# storage.tf
# Bucket for agent artifacts
resource "google_storage_bucket" "agent_artifacts" {
  name     = "${var.project_id}-${var.agent_name}-artifacts"
  location = var.region
  project  = var.project_id

  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  lifecycle_rule {
    condition {
      age            = 90
      matches_prefix = ["executions/", "logs/"]
    }
    action {
      type = "Delete"
    }
  }

  lifecycle_rule {
    condition {
      age                = 30
      matches_prefix     = ["executions/"]
      num_newer_versions = 3
    }
    action {
      type = "Delete"
    }
  }
}

# Grant agent access to artifacts
resource "google_storage_bucket_iam_member" "agent_artifacts_viewer" {
  bucket = google_storage_bucket.agent_artifacts.name
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.agent_sa.email}"
}

# Upload agent package
resource "google_storage_bucket_object" "agent_package" {
  name   = "agent-packages/${var.agent_name}/agent.pkl"
  bucket = google_storage_bucket.agent_artifacts.name
  source = var.agent_package_path
}

# Upload requirements
resource "google_storage_bucket_object" "requirements" {
  name    = "agent-packages/${var.agent_name}/requirements.txt"
  bucket  = google_storage_bucket.agent_artifacts.name
  content = file(var.requirements_path)
}
```
## Variables
```hcl
# variables.tf
variable "project_id" {
  description = "Google Cloud project ID"
  type        = string
}

variable "region" {
  description = "Region for Agent Engine deployment"
  type        = string
  default     = "us-central1"
}

variable "agent_name" {
  description = "Name of the ADK agent"
  type        = string
}

variable "agent_package_path" {
  description = "Local path to agent.pkl file"
  type        = string
}

variable "requirements_path" {
  description = "Local path to requirements.txt"
  type        = string
}

# Code Execution Sandbox
variable "code_exec_ttl_days" {
  description = "State persistence TTL for Code Execution Sandbox (1-14 days)"
  type        = number
  default     = 7

  validation {
    condition     = var.code_exec_ttl_days >= 1 && var.code_exec_ttl_days <= 14
    error_message = "TTL must be between 1 and 14 days."
  }
}

# Memory Bank
variable "memory_bank_max_memories" {
  description = "Maximum number of memories to retain"
  type        = number
  default     = 100
}

variable "memory_bank_retention_days" {
  description = "Memory retention period in days"
  type        = number
  default     = 90
}

# Auto-scaling
variable "min_replicas" {
  description = "Minimum number of replicas"
  type        = number
  default     = 1
}

variable "max_replicas" {
  description = "Maximum number of replicas"
  type        = number
  default     = 10
}

# Alerting
variable "alert_email" {
  description = "Email for monitoring alerts"
  type        = string
}
```
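Rather than passing `-var` flags on every run, the variables above can be pinned in a `terraform.tfvars` file. A sketch with placeholder values (substitute your own project, paths, and email):

```hcl
# terraform.tfvars (placeholder values)
project_id         = "my-project"
region             = "us-central1"
agent_name         = "my-adk-agent"
agent_package_path = "./agent.pkl"
requirements_path  = "./requirements.txt"

code_exec_ttl_days         = 7
memory_bank_max_memories   = 100
memory_bank_retention_days = 90

min_replicas = 1
max_replicas = 10

alert_email = "[email protected]"
```

Terraform loads `terraform.tfvars` automatically, so the plan/apply commands below then need no `-var` flags.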
## Deployment Workflow
### 1. Initialize Terraform
```bash
terraform init
```
### 2. Plan Infrastructure
```bash
terraform plan \
  -var="project_id=my-project" \
  -var="agent_name=my-adk-agent" \
  -var="agent_package_path=./agent.pkl" \
  -var="requirements_path=./requirements.txt" \
  -var="[email protected]"
```
### 3. Apply Configuration
```bash
terraform apply \
  -var="project_id=my-project" \
  -var="agent_name=my-adk-agent" \
  -var="agent_package_path=./agent.pkl" \
  -var="requirements_path=./requirements.txt" \
  -var="[email protected]"
```
### 4. Verify Deployment
```bash
# Check Agent Engine status
gcloud alpha ai agent-engines describe \
  projects/my-project/locations/us-central1/reasoningEngines/my-adk-agent

# Check monitoring dashboard
gcloud monitoring dashboards list --filter="displayName:my-adk-agent"

# Query BigQuery logs
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS total_queries FROM `my-project.my_adk_agent_analytics.agent_logs`'
```
## Multi-Region Deployment
```hcl
# Multi-region deployment
module "agent_us_central1" {
  source = "./modules/agent-engine"

  project_id         = var.project_id
  region             = "us-central1"
  agent_name         = "${var.agent_name}-us-central1"
  agent_package_path = var.agent_package_path
  requirements_path  = var.requirements_path
}

module "agent_europe_west1" {
  source = "./modules/agent-engine"

  project_id         = var.project_id
  region             = "europe-west1"
  agent_name         = "${var.agent_name}-europe-west1"
  agent_package_path = var.agent_package_path
  requirements_path  = var.requirements_path
}

# Global load balancer for multi-region
resource "google_compute_global_forwarding_rule" "agent_lb" {
  name       = "${var.agent_name}-global-lb"
  target     = google_compute_target_http_proxy.agent_proxy.id
  port_range = "80"
}
```
## Integration with Other Plugins
### jeremy-vertex-engine
- Terraform provisions → Engine inspector validates
- Infrastructure deployment → Runtime inspection
### jeremy-adk-orchestrator
- Terraform creates Agent Engine resources → Orchestrator manages A2A communication
- Infrastructure layer → Communication layer
### jeremy-vertex-validator
- Terraform generates configs → Validator checks production readiness
- Infrastructure code → Validation checks
## Use Cases
### Basic ADK Agent Deployment
```
"Create Terraform for ADK agent with Code Execution and Memory Bank"
"Deploy ADK agent to Agent Engine in us-central1"
```
### Production Infrastructure
```
"Create production-ready Agent Engine infrastructure with VPC-SC"
"Deploy ADK agent with observability dashboards and alerts"
```
### Multi-Region Deployment
```
"Deploy ADK agent to multiple regions with global load balancer"
"Create multi-region Agent Engine infrastructure"
```
### Observability Setup
```
"Add monitoring dashboard and alerts to Agent Engine deployment"
"Configure BigQuery analytics for agent logs"
```
## Best Practices
✅ **State Management**: Use remote backend (GCS or Terraform Cloud)
✅ **Secret Management**: Store API keys in Secret Manager, never in code
✅ **IAM Least Privilege**: Grant minimal permissions to service accounts
✅ **VPC Service Controls**: Always enable for production
✅ **Encryption**: Use CMEK keys for data at rest
✅ **Monitoring**: Deploy dashboards and alerts with every agent
✅ **Multi-Region**: Deploy to 2+ regions for high availability
✅ **Lifecycle Policies**: Configure auto-cleanup for old artifacts
✅ **Code Execution TTL**: Set to 7-14 days for production
✅ **Memory Bank**: Configure appropriate retention (90+ days recommended)
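For the state-management item above, a GCS backend block is the usual pattern. A minimal sketch, assuming a pre-existing state bucket (the bucket name is a placeholder; enable versioning on it so state can be recovered):

```hcl
# backend.tf (illustrative; create the bucket before `terraform init`)
terraform {
  backend "gcs" {
    bucket = "my-project-tfstate"              # assumed pre-existing, versioned bucket
    prefix = "agent-engine/my-adk-agent"       # isolates this deployment's state
  }
}
```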
## Requirements
- Terraform >= 1.5.0
- Google Cloud Provider >= 5.0
- Google Cloud Project with billing enabled
- Appropriate IAM permissions for Terraform service account
- ADK agent package (agent.pkl) and requirements.txt
- gcloud CLI with alpha components
## License
MIT
## Support
- Issues: https://github.com/jeremylongshore/claude-code-plugins/issues
- Discussions: https://github.com/jeremylongshore/claude-code-plugins/discussions
## Version
1.0.1 (2025) - Agent Engine Terraform with 2025 observability and storage features