Architecture Overview
AutoDB is built around three specialized agents and a text-to-SQL pipeline that work together: the Migration Agent (analyzes and executes schema changes), the Optimizer Agent (analyzes query performance), the Context Engine (provides schema and business context to the other agents), and the Text-to-SQL pipeline (generates validated SQL from natural language questions).
Every operation runs through a FastAPI service backed by PostgreSQL with pgvector for document storage. Migrations are tested in an ephemeral sandbox before anything touches production. The system is designed so that each agent can operate independently, but they produce better results when the Context Engine has been set up with schema snapshots and business documentation.
Agent Interaction Flow
The following diagram shows how data flows through the system when a client makes an API request. Each agent handles a specific type of analysis, and they all draw on the RAG pipeline for business context when it is available.
graph TD
    Client[Client Application] -->|API Request| ALB[API Gateway]
    ALB -->|Route| API[FastAPI Service]
    API -->|POST /analyze| MA[Migration Agent]
    API -->|POST /optimize| OA[Optimizer Agent]
    API -->|POST /context/query| CA[Context Agent]
    API -->|POST /queries/generate| TTS[Text-to-SQL Pipeline]
    MA -->|Clone schema + test| SB[Sandbox Environment]
    MA -->|Get business context| RAG[RAG Pipeline]
    OA -->|EXPLAIN query| DB[(Client Database)]
    OA -->|Get business context| RAG
    CA -->|Schema metadata| SS[Schema Snapshot]
    CA -->|Semantic search| RAG
    TTS -->|Schema + doc context| RAG
    TTS -->|Generate SQL| BDK[Bedrock Claude Sonnet]
    TTS -->|Execute validated SQL| DB
    RAG -->|Embedded chunks| VS[(pgvector Store)]
    RAG -->|Uploaded docs| DOCS[Document Store]
    MA -->|Risk report + approval token| API
    OA -->|Plan + alternatives + indexes| API
    CA -->|Schema + document context| API
    TTS -->|SQL + results as markdown| API
    API -->|JSON Response| Client
    API -->|POST /execute| EX[Executor]
    EX -->|Apply migration| DB
Migration Agent parses SQL migrations into individual statements, scores each for risk (0–100), maps blast radius across foreign key relationships, and tests the migration in an ephemeral sandbox. It returns a risk report with an approval token for the two-step execution flow: analyze first, then execute only after review.
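The analyze side of that two-step flow can be sketched in a few lines. This is illustrative, not AutoDB's actual implementation: the keyword heuristics are toy stand-ins for the real risk rules, and the token scheme (a hash bound to the exact migration text, so an approval cannot be replayed against different SQL) is an assumption.

```python
import hashlib

# Toy heuristics: (keyword, risk score on the 0-100 scale).
RISK_RULES = [
    ("DROP TABLE", 95),
    ("DROP COLUMN", 80),
    ("ALTER TABLE", 50),
    ("CREATE INDEX", 20),
]

def analyze_migration(sql: str) -> dict:
    """Split a migration into statements, score each, and mint an approval token."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    scored = []
    for stmt in statements:
        score = max((pts for kw, pts in RISK_RULES if kw in stmt.upper()), default=10)
        scored.append({"sql": stmt, "risk": score})
    # Binding the token to the SQL text means /execute can verify the
    # approved migration is byte-for-byte what was analyzed.
    token = hashlib.sha256(sql.encode()).hexdigest()[:16]
    return {"statements": scored, "approval_token": token}

report = analyze_migration(
    "ALTER TABLE users ADD COLUMN age int; DROP TABLE temp_users;"
)
```

A hypothetical `/execute` endpoint would then accept the migration only when accompanied by a token matching a prior analysis.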
Optimizer Agent runs EXPLAIN (or EXPLAIN ANALYZE) on the client database, parses the execution plan into a structured tree, generates ranked rewrite alternatives validated by pglast, and produces index recommendations with storage and write-overhead estimates. It combines rule-based heuristics with LLM analysis.
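Postgres can emit the execution plan as JSON (`EXPLAIN (FORMAT JSON)`), which makes the "structured tree" straightforward to walk. The sketch below is a simplified stand-in for the Optimizer Agent's parser, using a hand-written sample plan rather than a live database; the node and key names (`Node Type`, `Plans`, `Relation Name`) are the ones Postgres actually emits.

```python
# A fragment of what EXPLAIN (FORMAT JSON) returns for a two-table join.
sample_plan = {
    "Node Type": "Nested Loop",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders"},
        {"Node Type": "Index Scan", "Relation Name": "customers"},
    ],
}

def find_seq_scans(node: dict) -> list:
    """Recursively collect relations read via sequential scan --
    prime candidates for an index recommendation."""
    hits = []
    if node.get("Node Type") == "Seq Scan":
        hits.append(node.get("Relation Name", "?"))
    for child in node.get("Plans", []):
        hits.extend(find_seq_scans(child))
    return hits

candidates = find_seq_scans(sample_plan)
```

Rule-based passes like this one surface obvious problems cheaply; the LLM analysis layers on judgments the rules cannot make, such as whether a rewrite preserves the query's intent.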
Context Engine combines schema metadata from database introspection with business documentation uploaded by the user. Other agents query the Context Engine to enrich their analysis with business understanding — for example, knowing which business processes depend on a table being altered.
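The semantic search behind that enrichment is a nearest-neighbor ranking by cosine distance, which pgvector exposes as the `<=>` operator. Here is the same computation in plain Python on made-up three-dimensional embeddings, purely to show what the database-side `ORDER BY embedding <=> query` is doing; real embeddings have hundreds of dimensions.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

# Hypothetical stored chunks and their (tiny, fake) embeddings.
chunks = {
    "orders schema notes": [0.9, 0.1, 0.0],
    "refund policy doc": [0.1, 0.9, 0.1],
}
query_embedding = [0.8, 0.2, 0.0]

# Rank chunks by distance to the query -- closest first.
ranked = sorted(chunks, key=lambda k: cosine_distance(chunks[k], query_embedding))
```

In the actual pipeline this ranking happens inside Postgres, so only the top-k chunks cross the wire to the agent.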
Text-to-SQL Pipeline takes a natural language question, retrieves relevant schema and document context via pgvector similarity search, generates a validated SQL query using Claude Sonnet on AWS Bedrock, and executes it in a read-only transaction that is always rolled back. It includes pglast syntax validation, AST-level hallucination detection (with one self-correction retry), a 10-second statement timeout, and a 500-row result cap.
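Two of those guardrails, the rolled-back transaction and the row cap, follow a pattern small enough to show directly. The sketch below uses stdlib SQLite so it is self-contained and runnable; AutoDB targets Postgres, where the read-only guarantee would come from `SET TRANSACTION READ ONLY` and the timeout from `statement_timeout` (SQLite has no statement timeout, so that guardrail is omitted here).

```python
import sqlite3

ROW_CAP = 500

def run_readonly(conn, sql):
    """Execute sql, cap the result set, and always roll back so even an
    accidental write leaves no trace."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchmany(ROW_CAP)   # never fetch past the cap
    finally:
        conn.rollback()

# Demo data: more rows than the cap allows through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(600)])
conn.commit()

rows = run_readonly(conn, "SELECT x FROM t")
```

Because the rollback sits in a `finally` block, it runs whether the query succeeds, errors, or returns nothing at all.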
Infrastructure
The following diagram shows the AWS deployment topology. All components run within a single VPC across two availability zones.
graph TD
subgraph VPC["AWS VPC (2 AZs)"]
subgraph Public["Public Subnets"]
ALB[Application Load Balancer]
end
subgraph Private["Private Subnets"]
subgraph ECS["ECS Cluster"]
API["API Service<br/>FastAPI container"]
SB["Sandbox Task<br/>Postgres 16 Alpine"]
end
RDS[("RDS Postgres 16<br/>pgvector")]
end
end
Internet -->|HTTPS| ALB
ALB -->|:8000| API
API -->|:5432| RDS
API -->|RunTask| SB
SM[Secrets Manager] -.->|DB creds + API keys| API
S3[S3 Bucket] -.->|Document ingestion| API
ECS on EC2: The API runs as a container with 1024 MB memory and 512 CPU units. Sandbox tasks run as separate ECS tasks, each with its own Postgres 16 Alpine container that is created on demand and destroyed after use.
RDS PostgreSQL with pgvector: Stores all application data plus document content for semantic search. The pgvector extension enables cosine similarity queries for the RAG pipeline.
Secrets Manager: Stores database credentials and API keys. The API service retrieves secrets at startup — no credentials are baked into container images.
S3: Stores uploaded documents for asynchronous ingestion into the vector store.
Sandbox Isolation
Every migration analysis spins up an ephemeral Postgres container that clones the client’s schema from the latest introspection snapshot, applies the migration, and reports results. The sandbox is destroyed after each analysis, ensuring migrations are tested without any risk to production data.
If the migration fails in the sandbox (syntax errors, constraint violations, or timeouts), the risk report flags sandbox_result.passed: false with the error details. Only migrations that pass sandbox validation receive an approval token for execution.