Skip to content

Query Optimization

Submit a slow SQL query and AutoDB returns a structured execution plan, ranked rewrite alternatives validated by pglast for syntactic correctness, and index recommendations with storage and write-overhead impact estimates. When RAG documents are available, the optimizer uses your business context to inform its analysis.

AutoDB parses the PostgreSQL EXPLAIN output into a structured tree that you can traverse programmatically. Key fields in execution_plan:

  • root — the root node of the plan tree, with node_type (e.g., "Seq Scan", "Index Scan", "Hash Join"), cost estimates, and children for nested operations
  • total_cost — total estimated cost in PostgreSQL cost units
  • raw_plan_text — the original EXPLAIN text from PostgreSQL
  • analyzed — whether real timing data is included (see below)

Add ?analyze=true to the query string to run EXPLAIN ANALYZE, which returns actual execution timing alongside the estimates.

Terminal window
curl -X POST "https://api.autodb.app/api/v1/connections/550e8400-e29b-41d4-a716-446655440000/queries/optimize?analyze=true" \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"sql": "SELECT id FROM orders WHERE created_at > '\''2024-01-01'\'';"}'

When analyzed is true, plan nodes include actual_startup_time, actual_total_time, actual_rows, and actual_loops in addition to the estimated values.

AutoDB generates ranked query rewrites in the alternatives array. Each alternative includes:

  • rank — position in the ranking (1 = best recommendation)
  • rewritten_sql — the optimized query, validated with pglast for syntactic correctness before inclusion
  • explanation — plain-English description of what was changed and why
  • estimated_improvement — rough performance estimate (e.g., "~1.5-2x faster")

All rewrites are validated with pglast to ensure they are syntactically correct SQL. Invalid rewrites are filtered out before they reach you.

The index_recommendations array combines two detection methods:

  • rule_based — heuristic detection of sequential scans and missing indexes from the execution plan
  • llm_suggested — AI-generated recommendations based on query patterns and schema context

Each recommendation includes:

  • create_sql — a ready-to-execute CREATE INDEX statement
  • reason — why this index would help the query
  • estimated_storage_impact — approximate additional storage needed (e.g., "~50MB")
  • write_overhead_warning — impact on write performance (e.g., "Adds ~5-10% overhead on INSERT/UPDATE")
  • benefits — list of query patterns that would benefit from this index
  • detection_method — either "rule_based" or "llm_suggested"
Terminal window
curl -X POST https://api.autodb.app/api/v1/connections/550e8400-e29b-41d4-a716-446655440000/queries/optimize \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"sql": "SELECT o.id, c.name, SUM(oi.quantity * oi.unit_price) AS total FROM orders o JOIN customers c ON o.customer_id = c.id JOIN order_items oi ON oi.order_id = o.id WHERE o.created_at > '\''2024-01-01'\'' GROUP BY o.id, c.name ORDER BY total DESC LIMIT 100;"}'

Response:

{
"connection_id": "550e8400-e29b-41d4-a716-446655440000",
"original_sql": "SELECT * FROM orders WHERE created_at > '2024-01-01'",
"execution_plan": {
"root": {
"node_type": "Seq Scan",
"relation_name": "orders",
"total_cost": 1250.0,
"plan_rows": 5000,
"plan_width": 48,
"children": []
},
"raw_plan_text": "Seq Scan on orders (cost=0.00..1250.00 rows=5000 width=48)\n Filter: (created_at > '2024-01-01'::date)",
"total_cost": 1250.0,
"analyzed": false
},
"alternatives": [
{
"rank": 1,
"original_sql": "SELECT * FROM orders WHERE created_at > '2024-01-01'",
"rewritten_sql": "SELECT id, customer_id, total, created_at FROM orders WHERE created_at > '2024-01-01'",
"explanation": "Replace SELECT * with specific columns to reduce I/O.",
"estimated_improvement": "~1.5-2x faster"
}
],
"index_recommendations": [
{
"table_name": "orders",
"columns": ["created_at"],
"index_type": "btree",
"create_sql": "CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);",
"reason": "Sequential scan on orders filtered by created_at",
"estimated_storage_impact": "~50MB",
"write_overhead_warning": "Adds ~5-10% overhead on INSERT/UPDATE",
"benefits": ["Date range queries on orders"],
"detection_method": "rule_based"
}
],
"context_used": true
}

The context_used field indicates whether RAG document context was available to inform the analysis. When you upload business documentation via the /documents endpoint, the optimizer uses that context to make more relevant recommendations — for example, understanding which columns are frequently queried together or which tables are performance-critical.

When context_used is false, the optimizer relies solely on the execution plan and schema structure.