Query Optimization

Overview

Submit a slow SQL query and AutoDB returns a structured execution plan, ranked rewrite alternatives validated by pglast for syntactic correctness, and index recommendations with storage and write-overhead impact estimates. When RAG documents are available, the optimizer uses your business context to inform its analysis.

Execution Plans

AutoDB parses the PostgreSQL EXPLAIN output into a structured tree that you can traverse programmatically. Key fields in execution_plan:

root — the root node of the plan tree, with node_type (e.g., "Seq Scan", "Index Scan", "Hash Join"), cost estimates, and children for nested operations
total_cost — total estimated cost in PostgreSQL cost units
raw_plan_text — the original EXPLAIN text from PostgreSQL
analyzed — whether real timing data is included (see below)

EXPLAIN ANALYZE

Add ?analyze=true to the query string to run EXPLAIN ANALYZE, which returns actual execution timing alongside the estimates.

curl -X POST "https://api.autodb.app/api/v1/connections/550e8400-e29b-41d4-a716-446655440000/queries/optimize?analyze=true" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"sql": "SELECT id FROM orders WHERE created_at > '\''2024-01-01'\'';"}'

When analyzed is true, plan nodes include actual_startup_time, actual_total_time, actual_rows, and actual_loops in addition to the estimated values.

Rewrite Alternatives

AutoDB generates ranked query rewrites in the alternatives array. Each alternative includes:

rank — position in the ranking (1 = best recommendation)
rewritten_sql — the optimized query, validated with pglast for syntactic correctness before inclusion
explanation — plain-English description of what was changed and why
estimated_improvement — rough performance estimate (e.g., "~1.5-2x faster")

All rewrites are validated with pglast to ensure they are syntactically correct SQL. Invalid rewrites are filtered out before they reach you.

Index Recommendations

The index_recommendations array combines two detection methods:

rule_based — heuristic detection of sequential scans and missing indexes from the execution plan
llm_suggested — AI-generated recommendations based on query patterns and schema context

Each recommendation includes:

create_sql — a ready-to-execute CREATE INDEX statement
reason — why this index would help the query
estimated_storage_impact — approximate additional storage needed (e.g., "~50MB")
write_overhead_warning — impact on write performance (e.g., "Adds ~5-10% overhead on INSERT/UPDATE")
benefits — list of query patterns that would benefit from this index
detection_method — either "rule_based" or "llm_suggested"

Example: Optimizing a Query

curl -X POST https://api.autodb.app/api/v1/connections/550e8400-e29b-41d4-a716-446655440000/queries/optimize \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"sql": "SELECT o.id, c.name, SUM(oi.quantity * oi.unit_price) AS total FROM orders o JOIN customers c ON o.customer_id = c.id JOIN order_items oi ON oi.order_id = o.id WHERE o.created_at > '\''2024-01-01'\'' GROUP BY o.id, c.name ORDER BY total DESC LIMIT 100;"}'

Response:

{
  "connection_id": "550e8400-e29b-41d4-a716-446655440000",
  "original_sql": "SELECT * FROM orders WHERE created_at > '2024-01-01'",
  "execution_plan": {
    "root": {
      "node_type": "Seq Scan",
      "relation_name": "orders",
      "total_cost": 1250.0,
      "plan_rows": 5000,
      "plan_width": 48,
      "children": []
    },
    "raw_plan_text": "Seq Scan on orders  (cost=0.00..1250.00 rows=5000 width=48)\n  Filter: (created_at > '2024-01-01'::date)",
    "total_cost": 1250.0,
    "analyzed": false
  },
  "alternatives": [
    {
      "rank": 1,
      "original_sql": "SELECT * FROM orders WHERE created_at > '2024-01-01'",
      "rewritten_sql": "SELECT id, customer_id, total, created_at FROM orders WHERE created_at > '2024-01-01'",
      "explanation": "Replace SELECT * with specific columns to reduce I/O.",
      "estimated_improvement": "~1.5-2x faster"
    }
  ],
  "index_recommendations": [
    {
      "table_name": "orders",
      "columns": ["created_at"],
      "index_type": "btree",
      "create_sql": "CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);",
      "reason": "Sequential scan on orders filtered by created_at",
      "estimated_storage_impact": "~50MB",
      "write_overhead_warning": "Adds ~5-10% overhead on INSERT/UPDATE",
      "benefits": ["Date range queries on orders"],
      "detection_method": "rule_based"
    }
  ],
  "context_used": true
}

Context-Aware Optimization

The context_used field indicates whether RAG document context was available to inform the analysis. When you upload business documentation via the /documents endpoint, the optimizer uses that context to make more relevant recommendations — for example, understanding which columns are frequently queried together or which tables are performance-critical.

When context_used is false, the optimizer relies solely on the execution plan and schema structure.

Full API Schema: Optimize See complete request and response fields