Case Study

This case study is built on a synthetic clinic operations dataset. The pipeline and dashboard are designed to mirror real-world workflows without using any real patient or organizational data. All SQL shown is written for Azure SQL and formatted for readability on this site.
Context
This project is the first of two parts in a compact end-to-end build that converts operational activity into a format suitable for day-to-day clinic management. Appointment activity and post-visit feedback were simulated, landed in Azure Blob Storage, ingested into Azure SQL through Azure Data Factory, and transformed into daily KPI tables available for refresh in Power BI.
The objective is not to demonstrate tooling in isolation, but to showcase a full analytics workflow: (1) defining operational questions, (2) engineering defensible metrics, and (3) delivering a dashboard intended to support decisions for busy medical professionals. Patient feedback analysis is reserved for a follow-up post (stay tuned for this!).
The Problem at Hand
Two files were generated to reflect common clinic operations data. One file contains appointment scheduling records and attendance outcomes. The second file contains post-visit feedback linked to appointments. These files serve as the source system for the pipeline.
Clinic operations are governed by measurable levers that affect capacity and service delivery, including missed appointments, cancellations, wait time, and patient experience. Variation in these measures disrupts staffing alignment with demand, reduces effective capacity, and produces inconsistent service quality across clinic locations and service lines. The dashboard is designed to support routine operational review by identifying increases in no-show activity, isolating services associated with elevated delays, tracking movement in experience measures over time, and characterizing tradeoffs between throughput and quality. And before you ask — yes, it can be totally normal for dentists and physiotherapists to live on the same dashboard.
Generating Synthetic Data
cat > generate_csvs.py <<'PY'
import csv, random, datetime

random.seed(42)

N_APPTS = 20000
FEEDBACK_RATE = 0.30
start = datetime.date(2025, 8, 15)
end = datetime.date(2026, 2, 10)
days = (end - start).days

locations = ["Toronto - Downtown","Toronto - North York","Toronto - Scarborough","Mississauga","Vaughan"]
services = ["Physiotherapy","Counselling","Dental Hygiene","Family Medicine","Massage Therapy","Chronic Care Follow-up"]
channels = ["phone","web","referral"]
payers = ["OHIP","insurance","private","WSIB"]
providers = ["Provider Patel","Provider Chen","Provider Singh","Provider Nguyen","Provider Ali","Provider Kim","Provider Brown","Provider Martin","Provider Rossi","Provider Zhao"]
dur = {"Physiotherapy":45,"Counselling":50,"Dental Hygiene":40,"Family Medicine":20,"Massage Therapy":60,"Chronic Care Follow-up":30}
cancel_reasons = ["schedule conflict","symptoms improved","transportation issue","forgot","work conflict","weather","other"]
topic_tags = ["wait_time","communication","scheduling","billing","facilities","outcomes","staff"]

def rand_dt():
    d = start + datetime.timedelta(days=random.randint(0, days))
    if d.weekday() == 6:
        d = d + datetime.timedelta(days=1)
    hour = random.choice([8,9,10,11,12,13,14,15,16,17,18])
    minute = random.choice([0,10,15,20,30,40,45,50])
    return datetime.datetime(d.year, d.month, d.day, hour, minute, 0)

def fmt(ts): return ts.strftime("%Y-%m-%d %H:%M:%S")

appointments = []
for i in range(1, N_APPTS + 1):
    appt_id = f"A{i:07d}"
    patient_id = f"P{random.randint(1, 6000):06d}"
    clinic_location = random.choices(locations, weights=[28,22,18,18,14], k=1)[0]
    service_type = random.choices(services, weights=[22,18,16,20,14,10], k=1)[0]
    booking_channel = random.choices(channels, weights=[34,52,14], k=1)[0]
    payer_type = random.choices(payers, weights=[40,33,22,5], k=1)[0]
    provider = random.choice(providers)
    appt_dt = rand_dt()
    lead = max(0, min(60, random.gammavariate(2.0, 4.0)))
    if booking_channel == "web": lead = min(60, lead + random.uniform(0,4))
    if booking_channel == "phone": lead = max(0, lead + random.uniform(-1,2))
    booked_at = appt_dt - datetime.timedelta(days=lead, hours=random.randint(0,23))
    duration_minutes = dur[service_type]
    late = lead < 2
    p_no = min(0.22, max(0.03, 0.08 + (0.05 if late else 0) + (0.01 if booking_channel=="phone" else 0)))
    p_ca = min(0.25, max(0.05, 0.10 + (0.04 if late else 0) + (0.01 if booking_channel=="web" else 0)))
    r = random.random()
    if r < p_no:
        status = "no_show"
    elif r < p_no + p_ca:
        status = "cancelled"
    else:
        status = "attended"
    cancel_reason = random.choice(cancel_reasons) if status=="cancelled" else ""
    wait = ""
    if status == "attended":
        base = 8 + random.randint(0,12)
        peak = 5 if appt_dt.hour in [10,11,12,13,14,15] else 0
        wait_val = max(0, min(90, int(round(base + peak + random.gauss(0,6)))))
        wait = str(wait_val)
    appointments.append({
        "appointment_id": appt_id,
        "patient_id": patient_id,
        "clinic_location": clinic_location,
        "provider": provider,
        "service_type": service_type,
        "booking_channel": booking_channel,
        "payer_type": payer_type,
        "booked_at": fmt(booked_at),
        "appointment_datetime": fmt(appt_dt),
        "duration_minutes": duration_minutes,
        "status": status,
        "cancel_reason": cancel_reason,
        "wait_minutes": wait,
        "age_group": random.choices(["0-17","18-34","35-49","50-64","65+"], weights=[6,34,26,22,12], k=1)[0],
        "follow_up_needed": random.choice([True, False]),
    })

with open("appointments.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=list(appointments[0].keys()))
    w.writeheader()
    w.writerows(appointments)

attended = [a for a in appointments if a["status"]=="attended"]
n_feedback = int(len(attended) * FEEDBACK_RATE)
sample = random.sample(attended, n_feedback)

feedback_rows = []
for i, a in enumerate(sample, start=1):
    wait = int(a["wait_minutes"]) if a["wait_minutes"] else 0
    raw = 4.4 - (wait/35.0) + random.gauss(0, 0.55)
    rating = max(1, min(5, int(round(raw))))
    nps = max(0, min(10, int(round(rating*2 + random.gauss(0,1.5)))))
    would_recommend = (nps >= 8)
    topic = "wait_time" if (wait >= 25 and rating <= 3) else random.choice(topic_tags)
    submitted = datetime.datetime.strptime(a["appointment_datetime"], "%Y-%m-%d %H:%M:%S") + datetime.timedelta(hours=random.randint(1,72))
    feedback_rows.append({
        "feedback_id": f"F{i:07d}",
        "appointment_id": a["appointment_id"],
        "submitted_at": fmt(submitted),
        "rating": rating,
        "nps": nps,
        "would_recommend": would_recommend,
        "topic_tag": topic,
        "feedback_text": "Synthetic feedback generated for portfolio use."
    })

with open("feedback.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=list(feedback_rows[0].keys()))
    w.writeheader()
    w.writerows(feedback_rows)

print("appointments.csv rows:", len(appointments))
print("feedback.csv rows:", len(feedback_rows))
PY
python generate_csvs.py
ls -lh appointments.csv feedback.csv
1. Provision Azure Resources
This is a good point to note the overall structure: the pipeline follows a common three-layer pattern consisting of (1) a storage landing zone for raw files, (2) an orchestration layer for ingestion and transforms, and (3) a relational layer for analytics-ready tables. The implementation is intentionally compact, but the structure matches what is used in many real deployments. The snippet below generates unique resource names at runtime so the commands remain reusable and no resources in the post are tied to a specific subscription.
export LOC="canadacentral"
export RG="rg-clinicops-pulse"
az group create -n "$RG" -l "$LOC" >/dev/null
SUFFIX="$(date +%y%m%d%H%M%S)$RANDOM"
export STORAGE="clinicops$SUFFIX"
export CONTAINER="landing"
export SQLSERVER="sql-clinicops-$SUFFIX"
export SQLDB="clinicopsdb"
export ADF="adf-clinicops-$SUFFIX"
az storage account create -g "$RG" -n "$STORAGE" -l "$LOC" --sku Standard_LRS >/dev/null
STORAGE_KEY="$(az storage account keys list -g "$RG" -n "$STORAGE" --query '[0].value' -o tsv --only-show-errors)"
az storage container create -n "$CONTAINER" --account-name "$STORAGE" --account-key "$STORAGE_KEY" >/dev/null
az provider register -n Microsoft.Sql >/dev/null
echo "Enter a strong SQL admin password (input hidden):"
read -s SQLPASS; echo
export SQLPASS
az sql server create -g "$RG" -n "$SQLSERVER" -l "$LOC" -u "sqladmin" -p "$SQLPASS" >/dev/null
az sql db create -g "$RG" -s "$SQLSERVER" -n "$SQLDB" --service-objective S0 >/dev/null
az config set extension.use_dynamic_install=yes_without_prompt >/dev/null 2>&1 || true
az extension add --name datafactory --upgrade --only-show-errors >/dev/null 2>&1 || true
az provider register -n Microsoft.DataFactory >/dev/null
az datafactory factory create -g "$RG" -n "$ADF" -l "$LOC" >/dev/null
echo "Resources created."
echo "RG=$RG"
echo "STORAGE=$STORAGE"
echo "CONTAINER=$CONTAINER"
echo "SQLSERVER=$SQLSERVER"
echo "SQLDB=$SQLDB"
echo "ADF=$ADF"2. Upload Data to Blob Storage
The storage key is assigned to a variable and is not printed to output or written to disk. This limits accidental exposure through terminal history, logs, screenshots, or shared snippets, which is a common failure mode during quick prototypes. The command sequence is structured so the key is used only as an input to the upload and listing steps, and the surrounding output remains non-sensitive. This approach reflects baseline secret-handling practice even in a lightweight demonstration, where the goal is to show end-to-end workflow without normalizing insecure handling of credentials.
STORAGE_KEY="$(az storage account keys list -g "$RG" -n "$STORAGE" --query '[0].value' -o tsv --only-show-errors)"
az storage blob upload --account-name "$STORAGE" --account-key "$STORAGE_KEY" -c "$CONTAINER" -f "appointments.csv" -n "appointments.csv" --overwrite true >/dev/null
az storage blob upload --account-name "$STORAGE" --account-key "$STORAGE_KEY" -c "$CONTAINER" -f "feedback.csv" -n "feedback.csv" --overwrite true >/dev/null
az storage blob list --account-name "$STORAGE" --account-key "$STORAGE_KEY" -c "$CONTAINER" -o table
3. Create SQL Staging Tables
Staging tables were used to keep ingestion simple and auditable. This design supports repeatable loads without altering downstream objects, enables straightforward validation through row counts and spot checks, and provides a clear boundary between raw ingested structure and curated analytics structure. The separation also reduces coupling between source formatting and reporting logic, allowing changes to ingestion or source files without requiring immediate redesign of KPI tables or Power BI measures.
if ! command -v sqlcmd >/dev/null 2>&1 && [ ! -x /opt/mssql-tools18/bin/sqlcmd ]; then
source /etc/os-release
CODENAME="${VERSION_CODENAME:-jammy}"
curl -sSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/microsoft.gpg] https://packages.microsoft.com/ubuntu/${VERSION_ID}/prod ${CODENAME} main" | sudo tee /etc/apt/sources.list.d/microsoft-prod.list >/dev/null
sudo apt-get update -y >/dev/null
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 mssql-tools18 >/dev/null
fi
SQLCMD="$(command -v sqlcmd 2>/dev/null || echo /opt/mssql-tools18/bin/sqlcmd)"
OPTC=""
if "$SQLCMD" -? 2>/dev/null | grep -qE '(^|\s)-C(\s|,|$)'; then OPTC="-C"; fi
SQLFQDN="$(az sql server show -g "$RG" -n "$SQLSERVER" --query fullyQualifiedDomainName -o tsv --only-show-errors)"
MYIP="$(curl -s https://api.ipify.org)"
az sql server firewall-rule create -g "$RG" -s "$SQLSERVER" -n "AllowAzureServices" --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0 2>/dev/null || az sql server firewall-rule update -g "$RG" -s "$SQLSERVER" -n "AllowAzureServices" --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0 >/dev/null
az sql server firewall-rule create -g "$RG" -s "$SQLSERVER" -n "AllowCloudShell" --start-ip-address "$MYIP" --end-ip-address "$MYIP" 2>/dev/null || az sql server firewall-rule update -g "$RG" -s "$SQLSERVER" -n "AllowCloudShell" --start-ip-address "$MYIP" --end-ip-address "$MYIP" >/dev/null
cat > create_tables.sql <<'SQL'
IF OBJECT_ID('dbo.stg_appointments','U') IS NOT NULL DROP TABLE dbo.stg_appointments;
IF OBJECT_ID('dbo.stg_feedback','U') IS NOT NULL DROP TABLE dbo.stg_feedback;
CREATE TABLE dbo.stg_appointments (
    appointment_id varchar(20) NOT NULL,
    patient_id varchar(20) NOT NULL,
    clinic_location varchar(80) NOT NULL,
    provider varchar(80) NOT NULL,
    service_type varchar(80) NOT NULL,
    booking_channel varchar(20) NOT NULL,
    payer_type varchar(20) NOT NULL,
    booked_at datetime2 NOT NULL,
    appointment_datetime datetime2 NOT NULL,
    duration_minutes int NOT NULL,
    status varchar(20) NOT NULL,
    cancel_reason varchar(120) NULL,
    wait_minutes int NULL,
    age_group varchar(20) NULL,
    follow_up_needed bit NULL
);
CREATE TABLE dbo.stg_feedback (
    feedback_id varchar(20) NOT NULL,
    appointment_id varchar(20) NOT NULL,
    submitted_at datetime2 NOT NULL,
    rating int NOT NULL,
    nps int NOT NULL,
    would_recommend bit NOT NULL,
    topic_tag varchar(50) NULL,
    feedback_text varchar(400) NULL
);
SQL
"$SQLCMD" -S "tcp:${SQLFQDN},1433" -d "$SQLDB" -U "sqladmin" -P "$SQLPASS" -b -N ${OPTC} -i create_tables.sql
"$SQLCMD" -S "tcp:${SQLFQDN},1433" -d "$SQLDB" -U "sqladmin" -P "$SQLPASS" -b -N ${OPTC} -Q "SELECT name FROM sys.tables WHERE name IN ('stg_appointments','stg_feedback');"4. Ingest Blob to SQL with Azure Data Factory
In ADF Studio, a linked service was created for Azure Blob Storage and a second linked service was created for Azure SQL Database. Delimited-text datasets were then created for the two CSV files, and table datasets were created for the corresponding SQL staging tables. The pipeline was implemented with two Copy Data activities, one to load appointments.csv into dbo.stg_appointments and one to load feedback.csv into dbo.stg_feedback, followed by a SQL activity that executes the KPI build script to populate dbo.analytics_kpi_daily. Pipeline execution was validated using Debug runs to confirm connectivity, mappings, and row counts, and then the pipeline was published so execution history and outputs were retained and visible in the Monitor view.
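After a successful Debug run, a few quick checks in the database confirm that the staged row counts match the generated files and that fields landed the way the generator intended. A minimal validation sketch, assumed to be run through the same sqlcmd session used above; these queries are illustrative and not part of the pipeline itself:

-- Row counts should match the counts printed by generate_csvs.py
SELECT 'stg_appointments' AS table_name, COUNT(*) AS row_count FROM dbo.stg_appointments
UNION ALL
SELECT 'stg_feedback', COUNT(*) FROM dbo.stg_feedback;

-- Every feedback row should join back to an appointment
SELECT COUNT(*) AS orphaned_feedback
FROM dbo.stg_feedback f
LEFT JOIN dbo.stg_appointments a ON a.appointment_id = f.appointment_id
WHERE a.appointment_id IS NULL;

-- wait_minutes should only be populated for attended visits
SELECT status, COUNT(*) AS rows_with_wait
FROM dbo.stg_appointments
WHERE wait_minutes IS NOT NULL
GROUP BY status;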
5. Build the KPI Table in SQL
The KPI layer is defined as a daily rollup at the grain of clinic location (clinic_location) and service type (service_type). The design is intentionally simple, but it establishes a consistent semantic base for Power BI by concentrating metric logic in one place and standardizing the level of aggregation used across visuals. This structure reduces the risk of misleading results that occur when already-aggregated values are averaged again during report interactions, and it supports predictable filtering and drill-down behavior when slicing by date, clinic location, or service line. As a quick illustration of that risk: if one day records 100 attended visits with an average wait of 10 minutes and another records 10 attended visits with an average wait of 40 minutes, a simple average of the two daily averages reports 25 minutes, while the volume-weighted figure is roughly 12.7 minutes ((100×10 + 10×40) / 110).
cat > build_kpis.sql <<'SQL'
IF OBJECT_ID('dbo.analytics_kpi_daily','U') IS NULL
BEGIN
    CREATE TABLE dbo.analytics_kpi_daily (
        [date] date NOT NULL,
        clinic_location varchar(80) NOT NULL,
        service_type varchar(80) NOT NULL,
        total_appts int NOT NULL,
        attended int NOT NULL,
        cancelled int NOT NULL,
        no_show int NOT NULL,
        no_show_rate decimal(6,4) NOT NULL,
        avg_wait_minutes decimal(10,2) NULL,
        avg_rating decimal(10,2) NULL,
        avg_nps decimal(10,2) NULL,
        PRIMARY KEY ([date], clinic_location, service_type)
    );
END;
WITH appt AS (
    SELECT
        CAST(appointment_datetime AS date) AS [date],
        clinic_location,
        service_type,
        COUNT(*) AS total_appts,
        SUM(CASE WHEN status='attended' THEN 1 ELSE 0 END) AS attended,
        SUM(CASE WHEN status='cancelled' THEN 1 ELSE 0 END) AS cancelled,
        SUM(CASE WHEN status='no_show' THEN 1 ELSE 0 END) AS no_show,
        AVG(CASE WHEN status='attended' THEN CAST(wait_minutes AS float) END) AS avg_wait_minutes
    FROM dbo.stg_appointments
    GROUP BY CAST(appointment_datetime AS date), clinic_location, service_type
),
fb AS (
    SELECT
        CAST(a.appointment_datetime AS date) AS [date],
        a.clinic_location,
        a.service_type,
        AVG(CAST(f.rating AS float)) AS avg_rating,
        AVG(CAST(f.nps AS float)) AS avg_nps
    FROM dbo.stg_feedback f
    JOIN dbo.stg_appointments a
        ON a.appointment_id = f.appointment_id
    GROUP BY CAST(a.appointment_datetime AS date), a.clinic_location, a.service_type
)
MERGE dbo.analytics_kpi_daily AS tgt
USING (
    SELECT
        appt.[date],
        appt.clinic_location,
        appt.service_type,
        appt.total_appts,
        appt.attended,
        appt.cancelled,
        appt.no_show,
        CAST(appt.no_show AS decimal(10,4)) / NULLIF(appt.total_appts,0) AS no_show_rate,
        appt.avg_wait_minutes,
        fb.avg_rating,
        fb.avg_nps
    FROM appt
    LEFT JOIN fb
        ON fb.[date]=appt.[date]
        AND fb.clinic_location=appt.clinic_location
        AND fb.service_type=appt.service_type
) AS src
ON (tgt.[date]=src.[date] AND tgt.clinic_location=src.clinic_location AND tgt.service_type=src.service_type)
WHEN MATCHED THEN UPDATE SET
    total_appts=src.total_appts,
    attended=src.attended,
    cancelled=src.cancelled,
    no_show=src.no_show,
    no_show_rate=src.no_show_rate,
    avg_wait_minutes=src.avg_wait_minutes,
    avg_rating=src.avg_rating,
    avg_nps=src.avg_nps
WHEN NOT MATCHED THEN INSERT (
    [date], clinic_location, service_type,
    total_appts, attended, cancelled, no_show, no_show_rate,
    avg_wait_minutes, avg_rating, avg_nps
) VALUES (
    src.[date], src.clinic_location, src.service_type,
    src.total_appts, src.attended, src.cancelled, src.no_show, src.no_show_rate,
    src.avg_wait_minutes, src.avg_rating, src.avg_nps
);
SQL
"$SQLCMD" -S "tcp:${SQLFQDN},1433" -d "$SQLDB" -U "sqladmin" -P "$SQLPASS" -b -N ${OPTC} -i build_kpis.sql
"$SQLCMD" -S "tcp:${SQLFQDN},1433" -d "$SQLDB" -U "sqladmin" -P "$SQLPASS" -b -N ${OPTC} -Q "SELECT COUNT(*) AS kpi_rows FROM dbo.analytics_kpi_daily;"6. Power BI Report Build
Power BI connects directly to the Azure SQL database and uses the KPI table as the primary semantic source. Measures are defined instead of relying on raw columns so that totals, rollups, and slicer-driven interactions remain logically consistent across the report. Weighted averages are used for wait time (Avg Wait (min)), rating (Avg Rating (1–5)), and the 0–10 recommend score (Avg Recommend Score (0–10)) to prevent aggregation bias when combining multiple daily rows across dates, clinic locations, and service lines, where simple averaging of pre-aggregated values can misstate the underlying experience.
KPI visuals are implemented using a value measure for the headline metric, a date field for the trend axis, and a target measure to support threshold-based interpretation. Slicers filter the report by date range, service_type, and clinic_location, allowing the same KPI set to function both as a high-level operational summary and as a drill-down view for identifying where performance differs by site or service.
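Each of the three weighted measures below divides by a shared base measure, [Attended Appts], which is referenced but not defined in the snippets. A minimal sketch of that supporting measure, plus a ratio-of-sums no-show rate in the same spirit (the No-Show Rate name is an assumption for illustration, not a measure copied from the report):

// Denominator shared by the weighted experience measures
Attended Appts = SUM(analytics_kpi_daily[attended])

// Ratio of sums so the rate stays correct under any slicer combination
No-Show Rate =
DIVIDE(
    SUM(analytics_kpi_daily[no_show]),
    SUM(analytics_kpi_daily[total_appts])
)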
Avg Wait (min) =
DIVIDE(
    SUMX(analytics_kpi_daily, analytics_kpi_daily[attended] * analytics_kpi_daily[avg_wait_minutes]),
    [Attended Appts]
)

Avg Rating (1–5) =
DIVIDE(
    SUMX(analytics_kpi_daily, analytics_kpi_daily[attended] * analytics_kpi_daily[avg_rating]),
    [Attended Appts]
)

Avg Recommend Score (0–10) =
DIVIDE(
    SUMX(analytics_kpi_daily, analytics_kpi_daily[attended] * analytics_kpi_daily[avg_nps]),
    [Attended Appts]
)

In this demo, the term NPS refers to the average 0–10 recommend score rather than the classic Net Promoter Score calculation, which is defined as the percentage of promoters minus the percentage of detractors. The average 0–10 score provides a stable, interpretable experience signal for an operational clinic dashboard, particularly when tracked over time and compared against an explicit target. This choice also simplifies communication of results in a stakeholder setting, since the metric can be explained directly as an average recommendation rating while still supporting trend and segment analysis across locations and service lines.
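For completeness, the classic Net Promoter Score can still be recovered from the staged feedback when needed, since individual 0–10 responses live in dbo.stg_feedback. A sketch using the conventional promoter (9–10) and detractor (0–6) bands; this query is illustrative only and is not part of the KPI build:

-- Classic NPS: % promoters (nps >= 9) minus % detractors (nps <= 6)
SELECT
    100.0 * SUM(CASE WHEN nps >= 9 THEN 1 ELSE 0 END) / COUNT(*)
  - 100.0 * SUM(CASE WHEN nps <= 6 THEN 1 ELSE 0 END) / COUNT(*) AS classic_nps
FROM dbo.stg_feedback;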
Signals Delivered in this Project
Power BI Dashboard Preview
After establishing a stable backend for the clinic operations KPI console, the natural next step was a client-facing reporting layer designed for quick weekly monitoring and simple drill-down.
A date-range control (located at the top of the card stack) and slicers for service line and clinic location (located to the right) filter the entire page.
The headline cards surface the core operational signals for the selected slice: metrics such as no-show rate, attendance rate, average wait time in minutes, and average rating on a 1–5 scale.
The KPI tiles add short trend context against explicit targets, showing whether cancellation rate and the 0–10 recommend score are moving toward or away from goal.
A total appointments card provides volume context so changes in rates can be interpreted alongside throughput.

The deliverable is a set of decision-ready signals that support concrete operational decisions.
A daily KPI grain by location and service supports staffing and scheduling adjustments when no-show rates rise. It also supports targeted confirmation or reminder workflows for higher-risk booking channels.
Trend views help separate one-day volatility from sustained drift. That enables earlier intervention such as adding coverage during peak hours, shifting appointment lengths, or tightening triage rules when wait times trend upward, even in a small clinic setting.
Filters isolate the slice driving the change, so follow-up can focus on the relevant clinic, provider group, or service type instead of broad fixes.
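As a concrete example of that kind of targeted follow-up, a no-show cut by booking channel can be taken straight from the staging table, since the daily KPI table does not carry booking_channel at its grain. A sketch, not part of the published pipeline:

-- No-show rate by booking channel, highest first
SELECT
    booking_channel,
    COUNT(*) AS total_appts,
    SUM(CASE WHEN status = 'no_show' THEN 1 ELSE 0 END) AS no_shows,
    CAST(SUM(CASE WHEN status = 'no_show' THEN 1 ELSE 0 END) AS decimal(10,4))
        / NULLIF(COUNT(*), 0) AS no_show_rate
FROM dbo.stg_appointments
GROUP BY booking_channel
ORDER BY no_show_rate DESC;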
A logical extension is seasonal tracking to add context. Holidays, winter weather, school-year cycles, and illness waves can shift demand, cancellations, and no-shows in predictable ways.
Monitoring month-of-year and day-of-week patterns, rolling averages, and period-over-period comparisons helps distinguish calendar effects from true performance deterioration within a given clinic. That supports proactive planning, such as adjusting staffing and appointment templates ahead of known peak periods.
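One way to sketch such a rolling view on top of the existing KPI table is a rolling ratio of sums, so low-volume days do not swing the rate. A hedged example of a roughly 7-day rolling no-show rate per clinic location; the window is rows-based, so it assumes a reasonably complete daily calendar per location:

WITH daily AS (
    -- Collapse the service-type dimension to one row per date and location
    SELECT
        [date],
        clinic_location,
        SUM(no_show) AS no_show,
        SUM(total_appts) AS total_appts
    FROM dbo.analytics_kpi_daily
    GROUP BY [date], clinic_location
)
SELECT
    [date],
    clinic_location,
    -- Ratio of sums over the trailing 7 rows (approximately 7 days)
    CAST(SUM(no_show) OVER (
            PARTITION BY clinic_location ORDER BY [date]
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS decimal(10,4))
    / NULLIF(SUM(total_appts) OVER (
            PARTITION BY clinic_location ORDER BY [date]
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 0) AS no_show_rate_7d
FROM daily
ORDER BY clinic_location, [date];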
The build keeps the basics tight. Ingestion is repeatable, metric definitions stay auditable, and reporting refreshes without manual spreadsheet work, so decisions come from numbers that stay consistent and traceable as new data flows in. Together, the pipeline and dashboard form a compact operational layer that is easy to monitor, easy to explain, and easy to improve as new questions show up. The next post will pick up from here.