Phase 2 told me exactly what was wrong — missing indexes and insufficient memory. In this phase I fixed both, re-ran every baseline query, and measured the results. One missing index produced a 4,136× improvement. Not every query improved dramatically, and I documented that honestly too.
I created all indexes using CREATE INDEX CONCURRENTLY. This is the production-safe approach — it builds the index without holding a lock on the table, meaning reads and writes can continue uninterrupted during the build. On a table with 9.6 million rows, locking during index creation would be unacceptable in a live environment.
Phase 2 showed sort and hash operations spilling up to 307 MB per worker to disk — entirely because of the 4MB default work_mem. I tested with 256MB at the session level, which confirmed the spill was memory-related. However, I discovered that setting work_mem too high globally suppresses parallel query execution — the planner becomes conservative about total memory usage across all workers.
After testing I settled on 64MB as the permanent global setting. This eliminates most disk spill while preserving parallel execution for the heaviest workloads.
random_page_cost was reduced from 4.0 (tuned for spinning disk) to 1.5 to reflect the SSD storage on this system. This tells the query planner that random page access is relatively cheap, encouraging it to favor index scans over sequential scans where appropriate.
I re-ran all five baseline queries after applying the indexes and configuration changes. Every execution time is measured from EXPLAIN ANALYZE — actual runtime, not estimates.
| Query | Description | Before | After | Result |
|---|---|---|---|---|
| Q1 | Provider lookup by NPI | 1 ms | 0.053 ms | 19× faster |
| Q2 | Services for a specific provider | 455 ms | 0.110 ms | 4,136× faster |
| Q3 | Top 10 providers by payments | 10,701 ms | 8,795 ms | 18% faster |
| Q4 | Services by state & provider type | 25,190 ms | 25,082 ms | Marginal |
| Q5 | High value procedures | 9,821 ms | 5,527 ms | 44% faster |
Not every query can be dramatically improved — and documenting that is as important as the big wins. Q4 processes all 9.6 million service rows, joins to all 1.17 million providers, and sorts the entire result set. There is no selective filter to exploit. I also discovered that setting work_mem too high (256MB) actually made Q4 slower by suppressing parallel execution. The final 64MB setting restores parallel workers while reducing disk spill. Some queries have an optimization floor, and a DBA needs to know when that floor has been reached.