Architecture Overview
AutoDB is built around three specialized agents and a text-to-SQL pipeline that work together: the Migration Agent (analyzes and executes schema changes), the Optimizer Agent (analyzes query performance), the Context Engine (provides schema and business context to the other agents), and the Text-to-SQL pipeline (generates validated SQL from natural language questions).
Every operation runs through a FastAPI service backed by PostgreSQL with pgvector for document storage. Migrations are tested in an ephemeral sandbox before anything touches production. The system is designed so that each agent can operate independently, but they produce better results when the Context Engine has been set up with schema snapshots and business documentation.
Agent Interaction Flow
The following diagram shows how data flows through the system when a client makes an API request. Each agent handles a specific type of analysis, and they all draw on the RAG pipeline for business context when it is available.
graph TD
    Client[Client Application] -->|API Request| ALB[API Gateway]
    ALB -->|Route| API[FastAPI Service]
    API -->|POST /analyze| MA[Migration Agent]
    API -->|POST /optimize| OA[Optimizer Agent]
    API -->|POST /context/query| CA[Context Agent]
    API -->|POST /queries/generate| TTS[Text-to-SQL Pipeline]
    MA -->|Clone schema + test| SB[Sandbox Environment]
    MA -->|Get business context| RAG[RAG Pipeline]
    OA -->|EXPLAIN query| DB[(Client Database)]
    OA -->|Get business context| RAG
    CA -->|Schema metadata| SS[Schema Snapshot]
    CA -->|Semantic search| RAG
    TTS -->|Schema + doc context| RAG
    TTS -->|Generate SQL| BDK[Bedrock Claude Sonnet]
    TTS -->|Execute validated SQL| DB
    RAG -->|Embedded chunks| VS[(pgvector Store)]
    RAG -->|Uploaded docs| DOCS[Document Store]
    MA -->|Risk report + approval token| API
    OA -->|Plan + alternatives + indexes| API
    CA -->|Schema + document context| API
    TTS -->|SQL + results as markdown| API
    API -->|JSON Response| Client
    API -->|POST /execute| EX[Executor]
    EX -->|Apply migration| DB
Migration Agent parses SQL migrations into individual statements, scores each for risk (0–100), maps blast radius across foreign key relationships, and tests the migration in an ephemeral sandbox. It returns a risk report with an approval token for the two-step execution flow: analyze first, then execute only after review.
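The analyze side of that two-step flow can be sketched in a few lines. This is illustrative, not AutoDB's actual implementation: the keyword heuristics are toy stand-ins for the real risk rules, and the token scheme (a hash bound to the exact migration text, so an approval cannot be replayed against different SQL) is an assumption.

```python
import hashlib

# Toy heuristics: (keyword, risk score on the 0-100 scale).
RISK_RULES = [
    ("DROP TABLE", 95),
    ("DROP COLUMN", 80),
    ("ALTER TABLE", 50),
    ("CREATE INDEX", 20),
]

def analyze_migration(sql: str) -> dict:
    """Split a migration into statements, score each, and mint an approval token."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    scored = []
    for stmt in statements:
        score = max((pts for kw, pts in RISK_RULES if kw in stmt.upper()), default=10)
        scored.append({"sql": stmt, "risk": score})
    # Binding the token to the SQL text means /execute can verify the
    # approved migration is byte-for-byte what was analyzed.
    token = hashlib.sha256(sql.encode()).hexdigest()[:16]
    return {"statements": scored, "approval_token": token}

report = analyze_migration(
    "ALTER TABLE users ADD COLUMN age int; DROP TABLE temp_users;"
)
```

A hypothetical `/execute` endpoint would then accept the migration only when accompanied by a token matching a prior analysis.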
Optimizer Agent runs EXPLAIN (or EXPLAIN ANALYZE) on the client database, parses the execution plan into a structured tree, generates ranked rewrite alternatives validated by pglast, and produces index recommendations with storage and write-overhead estimates. It combines rule-based heuristics with LLM analysis.
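Postgres can emit the execution plan as JSON (`EXPLAIN (FORMAT JSON)`), which makes the "structured tree" straightforward to walk. The sketch below is a simplified stand-in for the Optimizer Agent's parser, using a hand-written sample plan rather than a live database; the node and key names (`Node Type`, `Plans`, `Relation Name`) are the ones Postgres actually emits.

```python
# A fragment of what EXPLAIN (FORMAT JSON) returns for a two-table join.
sample_plan = {
    "Node Type": "Nested Loop",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders"},
        {"Node Type": "Index Scan", "Relation Name": "customers"},
    ],
}

def find_seq_scans(node: dict) -> list:
    """Recursively collect relations read via sequential scan --
    prime candidates for an index recommendation."""
    hits = []
    if node.get("Node Type") == "Seq Scan":
        hits.append(node.get("Relation Name", "?"))
    for child in node.get("Plans", []):
        hits.extend(find_seq_scans(child))
    return hits

candidates = find_seq_scans(sample_plan)
```

Rule-based passes like this one surface obvious problems cheaply; the LLM analysis layers on judgments the rules cannot make, such as whether a rewrite preserves the query's intent.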
Context Engine combines schema metadata from database introspection with business documentation uploaded by the user. Other agents query the Context Engine to enrich their analysis with business understanding — for example, knowing which business processes depend on a table being altered.
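The semantic search behind that enrichment is a nearest-neighbor ranking by cosine distance, which pgvector exposes as the `<=>` operator. Here is the same computation in plain Python on made-up three-dimensional embeddings, purely to show what the database-side `ORDER BY embedding <=> query` is doing; real embeddings have hundreds of dimensions.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

# Hypothetical stored chunks and their (tiny, fake) embeddings.
chunks = {
    "orders schema notes": [0.9, 0.1, 0.0],
    "refund policy doc": [0.1, 0.9, 0.1],
}
query_embedding = [0.8, 0.2, 0.0]

# Rank chunks by distance to the query -- closest first.
ranked = sorted(chunks, key=lambda k: cosine_distance(chunks[k], query_embedding))
```

In the actual pipeline this ranking happens inside Postgres, so only the top-k chunks cross the wire to the agent.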
Text-to-SQL Pipeline takes a natural language question, retrieves relevant schema and document context via pgvector similarity search, generates a validated SQL query using Claude Sonnet on AWS Bedrock, and executes it in a read-only transaction that is always rolled back. It includes pglast syntax validation, AST-level hallucination detection (with one self-correction retry), a 10-second statement timeout, and a 500-row result cap.
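Two of those guardrails, the rolled-back transaction and the row cap, follow a pattern small enough to show directly. The sketch below uses stdlib SQLite so it is self-contained and runnable; AutoDB targets Postgres, where the read-only guarantee would come from `SET TRANSACTION READ ONLY` and the timeout from `statement_timeout` (SQLite has no statement timeout, so that guardrail is omitted here).

```python
import sqlite3

ROW_CAP = 500

def run_readonly(conn, sql):
    """Execute sql, cap the result set, and always roll back so even an
    accidental write leaves no trace."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchmany(ROW_CAP)   # never fetch past the cap
    finally:
        conn.rollback()

# Demo data: more rows than the cap allows through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(600)])
conn.commit()

rows = run_readonly(conn, "SELECT x FROM t")
```

Because the rollback sits in a `finally` block, it runs whether the query succeeds, errors, or returns nothing at all.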
Infrastructure
The following diagram shows the AWS deployment topology. All components run within a single VPC across two availability zones.
graph TD
subgraph VPC["AWS VPC (2 AZs)"]
subgraph Public["Public Subnets"]
ALB[Application Load Balancer]
end
subgraph Private["Private Subnets"]
subgraph ECS["ECS Cluster"]
API["API Service<br/>FastAPI container"]
SB["Sandbox Task<br/>Postgres 16 Alpine"]
end
RDS[("RDS Postgres 16<br/>pgvector")]
end
end
Internet -->|HTTPS| ALB
ALB -->|:8000| API
API -->|:5432| RDS
API -->|RunTask| SB
SM[Secrets Manager] -.->|DB creds + API keys| API
S3[S3 Bucket] -.->|Document ingestion| API
ECS on EC2: The API runs as a container with 1024 MB memory and 512 CPU units. Sandbox tasks run as separate ECS tasks, each with its own Postgres 16 Alpine container that is created on demand and destroyed after use.
RDS PostgreSQL with pgvector: Stores all application data plus document content for semantic search. The pgvector extension enables cosine similarity queries for the RAG pipeline.
Secrets Manager: Stores database credentials and API keys. The API service retrieves secrets at startup — no credentials are baked into container images.
S3: Stores uploaded documents for asynchronous ingestion into the vector store.
Sandbox Isolation
Every migration analysis spins up an ephemeral Postgres container that clones the client’s schema from the latest introspection snapshot, applies the migration, and reports results. The sandbox is destroyed after each analysis, ensuring migrations are tested without any risk to production data.
If the migration fails in the sandbox (syntax errors, constraint violations, or timeouts), the risk report flags sandbox_result.passed: false with the error details. Only migrations that pass sandbox validation receive an approval token for execution.