An end-to-end PostgreSQL database administration project I built using real-world public healthcare data from CMS. I designed the schema, loaded 9.6 million records, diagnosed performance problems, optimized queries by up to 4,136×, implemented a backup and recovery strategy, and secured the database with role-based access control, row-level security, and audit logging.
I wanted to build a PostgreSQL project that goes beyond writing queries — one that demonstrates the full scope of what a Database Administrator does in a real production environment. Schema design, bulk data loading, performance baselining, index optimization, backup and recovery, monitoring, and security aren't separate skills. They're a discipline, and this project treats them that way.
I used the CMS Medicare Physician & Other Practitioners dataset for 2023 — the same family of public data I worked with in my Medicare pipeline project. The dataset contains one row per provider per procedure code, capturing utilization and payment information for Medicare fee-for-service claims. At 3 GB raw and 9.6 million rows, it's large enough to make performance decisions matter.
I designed a normalized schema with a dedicated cms schema, two production tables, and a staging table. Loaded 9.6M rows using a staged COPY approach, validated data integrity, and documented every design decision.
Before optimizing anything I measured everything — table sizes, index inventory, cache hit ratios, sequential scan counts, and execution plans for five representative queries. Q4 took 25 seconds. That's the baseline.
View PhaseI created five targeted indexes using CONCURRENTLY, tuned work_mem and random_page_cost, and re-ran every baseline query. One missing index turned a 455ms full table scan into a 0.11ms index lookup — a 4,136× improvement.
View PhaseI implemented a two-tier backup strategy — logical backups with pg_dump and physical backups with pg_basebackup — tested a full end-to-end restore, verified backup integrity, and built a production health monitoring script.
View PhaseI implemented role-based access control with four least-privilege roles, row-level security restricting app users to 54,351 of 1.17M rows, and trigger-based audit logging capturing every data modification with full before/after context.
View PhaseI ran the same five queries before and after adding indexes and tuning configuration. The results below are measured execution times from EXPLAIN ANALYZE — real numbers, not estimates.
| Query | Description | Before | After | Result |
|---|---|---|---|---|
| Q1 | Provider lookup by NPI | 1 ms | 0.053 ms | 19× faster |
| Q2 | Services for a specific provider | 455 ms | 0.110 ms | 4,136× faster |
| Q3 | Top 10 providers by payments | 10,701 ms | 8,795 ms | 18% faster |
| Q4 | Services by state & provider type | 25,190 ms | 25,082 ms | Marginal |
| Q5 | High value procedures | 9,821 ms | 5,527 ms | 44% faster |
rndrng_npi in the services table. One CREATE INDEX statement fixed it.app_user to Virginia and Maryland only — enforced transparently by the database engine regardless of how the query is written.Walk through all five phases — schema design, diagnostics, optimization, operations, and security — with full technical narrative, results, and decision rationale.
Start with Phase 1Full source code — all SQL scripts organized by phase, complete documentation, and commit history showing every step of the project from day one.
View Source