Database Administration Portfolio Project

PostgreSQL DBA
End-to-End Project

An end-to-end PostgreSQL database administration project I built using real-world public healthcare data from CMS. I designed the schema, loaded 9.6 million records, diagnosed performance problems, optimized queries by up to 4,136×, implemented a backup and recovery strategy, and secured the database with role-based access control, row-level security, and audit logging.

PostgreSQL 18.2 Schema Design Performance Tuning Backup & Recovery Security & RBAC CMS 2023 Data

What This Project Demonstrates

The Goal

I wanted to build a PostgreSQL project that goes beyond writing queries — one that demonstrates the full scope of what a Database Administrator does in a real production environment. Schema design, bulk data loading, performance baselining, index optimization, backup and recovery, monitoring, and security aren't separate skills. They're a discipline, and this project treats them that way.

The Dataset

I used the CMS Medicare Physician & Other Practitioners dataset for 2023 — the same family of public data I worked with in my Medicare pipeline project. The dataset contains one row per provider per procedure code, capturing utilization and payment information for Medicare fee-for-service claims. At 3 GB raw and 9.6 million rows, it's large enough to make performance decisions matter.

9.6M
Service Rows Loaded
1.17M
Unique Providers
5.5 GB
Database Size
5
Project Phases

Five Phases, One Complete Project

Before & After Optimization

I ran the same five queries before and after adding indexes and tuning configuration. The results below are measured execution times from EXPLAIN ANALYZE — real numbers, not estimates.

Query Description Before After Result
Q1 Provider lookup by NPI 1 ms 0.053 ms 19× faster
Q2 Services for a specific provider 455 ms 0.110 ms 4,136× faster
Q3 Top 10 providers by payments 10,701 ms 8,795 ms 18% faster
Q4 Services by state & provider type 25,190 ms 25,082 ms Marginal
Q5 High value procedures 9,821 ms 5,527 ms 44% faster

What the Project Actually Shows

4,136×
A single missing index turned a 455ms sequential scan into a 0.11ms index lookup. PostgreSQL was launching 2 parallel workers and scanning 3.2 million rows to return 7 records — because no index existed on rndrng_npi in the services table. One CREATE INDEX statement fixed it.
85%
pg_dump compressed a 5.5 GB database down to 845 MB. The custom format backup restored cleanly with all 9.6M rows and all 7 indexes intact. A backup that has never been tested is not a backup — the restore test is documented in full.
54,351
Row-level security restricted the application user to 54,351 of 1,175,281 providers. The RLS policy limits app_user to Virginia and Maryland only — enforced transparently by the database engine regardless of how the query is written.
Q4
Not every query can be dramatically improved — and documenting that honestly matters. The state/provider-type aggregation processes all 9.6M rows with no selective filter. Indexes cannot eliminate the cost of reading every row. Setting work_mem too high actually made this query slower by suppressing parallel execution.

Built with PostgreSQL 18

PostgreSQL 18.2 Core database engine
DataGrip Primary query tool
pg_dump Logical backup
pg_basebackup Physical cluster backup
pg_stat_statements Query statistics
EXPLAIN ANALYZE Execution plan analysis
Row-Level Security Data visibility control
Git & GitHub Version control
PowerShell CLI & automation
VS Code Script authoring

Two Ways In