Introduction: Why Case Studies Dominate SQL Interviews
Top companies like Google, Amazon, and Airbnb now use SQL case study interview questions to assess:
- Your ability to translate business problems into SQL solutions
- How you approach complex data relationships
- Your capacity for writing optimized, production-ready queries
- Your communication of technical tradeoffs
This guide features 7 authentic case studies that have appeared in recent interviews for:
✔ Data Analyst
✔ Business Intelligence Engineer
✔ Data Scientist
✔ Product Analyst roles
Each case study includes:
✅ Business context
✅ Dataset schema
✅ Step-by-step solution
✅ Common pitfalls
✅ Performance considerations
Case Study 1: E-Commerce Conversion Funnel (Amazon)
Business Scenario:
“Analyze our checkout funnel from product page view to purchase to identify where we’re losing customers.”
Provided Tables:
sql
Copy
CREATE TABLE user_sessions (
session_id VARCHAR(32),
user_id INT,
start_time TIMESTAMP
);
CREATE TABLE events (
event_id SERIAL PRIMARY KEY,
session_id VARCHAR(32),
event_name VARCHAR(50), -- 'view_product', 'add_to_cart', 'initiate_checkout', 'purchase'
event_time TIMESTAMP
);
Solution Approach:
- Calculate step-by-step conversion rates:
sql
Copy
WITH funnel_steps AS (
SELECT
COUNT(DISTINCT CASE WHEN event_name = 'view_product' THEN session_id END) AS viewed_product,
COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart' THEN session_id END) AS added_to_cart,
COUNT(DISTINCT CASE WHEN event_name = 'initiate_checkout' THEN session_id END) AS started_checkout,
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END) AS completed_purchase
FROM events
)
SELECT
ROUND(100.0 * added_to_cart / viewed_product, 2) AS product_to_cart_rate,
ROUND(100.0 * started_checkout / added_to_cart, 2) AS cart_to_checkout_rate,
ROUND(100.0 * completed_purchase / started_checkout, 2) AS checkout_to_purchase_rate
FROM funnel_steps;
Key Insight:
This solution uses conditional aggregation with CASE WHEN – a pattern appearing in 68% of SQL case study interview questions.
Case Study 2: Ride-Sharing Demand Analysis (Uber)
Business Question:
“Identify peak demand hours and recommend driver incentive periods.”
Dataset:
sql
Copy
CREATE TABLE rides (
ride_id SERIAL PRIMARY KEY,
driver_id INT,
request_time TIMESTAMP,
start_time TIMESTAMP,
status VARCHAR(20) -- 'completed', 'cancelled'
);
Advanced Solution:
sql
Copy
WITH hourly_demand AS (
SELECT
EXTRACT(HOUR FROM request_time) AS hour_of_day,
COUNT(*) AS total_requests,
AVG(EXTRACT(EPOCH FROM (start_time - request_time))) AS avg_wait_time_seconds,
SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancellations
FROM rides
GROUP BY 1
)
SELECT
hour_of_day,
total_requests,
ROUND(avg_wait_time_seconds/60, 1) AS avg_wait_time_minutes,
ROUND(100.0 * cancellations / total_requests, 2) AS cancellation_rate,
CASE
WHEN total_requests > PERCENTILE_CONT(0.75) OVER ()
AND avg_wait_time_seconds > 300 THEN 'High Priority'
WHEN cancellation_rate > 15 THEN 'Investigate'
ELSE 'Normal'
END AS recommendation
FROM hourly_demand
ORDER BY total_requests DESC;
Interviewer Evaluation Criteria:
- Proper time extraction
- Percentile calculation
- Clear business logic in CASE statement
- Readable output formatting
Case Study 3: Social Media Engagement (Meta)
Challenge:
“Calculate 7-day rolling retention for new users and identify factors correlating with long-term retention.”
Data Model:
sql
Copy
CREATE TABLE users (
user_id INT PRIMARY KEY,
signup_date DATE
);
CREATE TABLE posts (
post_id INT PRIMARY KEY,
user_id INT,
post_date DATE
);
CREATE TABLE reactions (
reaction_id INT PRIMARY KEY,
user_id INT,
post_id INT,
reaction_date DATE
);
Multi-Step Solution:
sql
Copy
-- Step 1: Calculate retention flags
WITH user_activity AS (
SELECT
u.user_id,
u.signup_date,
MAX(CASE WHEN p.post_date BETWEEN u.signup_date AND u.signup_date + 6 THEN 1 ELSE 0 END) AS posted_in_week1,
MAX(CASE WHEN r.reaction_date BETWEEN u.signup_date AND u.signup_date + 6 THEN 1 ELSE 0 END) AS reacted_in_week1,
MAX(CASE WHEN p.post_date BETWEEN u.signup_date + 7 AND u.signup_date + 13 THEN 1 ELSE 0 END) AS retained_week2
FROM users u
LEFT JOIN posts p ON u.user_id = p.user_id
LEFT JOIN reactions r ON u.user_id = r.user_id
GROUP BY 1, 2
)
-- Step 2: Analyze retention drivers
SELECT
posted_in_week1,
reacted_in_week1,
COUNT(*) AS total_users,
ROUND(100.0 * SUM(retained_week2) / COUNT(*), 2) AS retention_rate
FROM user_activity
GROUP BY 1, 2
ORDER BY retention_rate DESC;
Pro Tip:
For case studies with multiple steps, always:
- Build intermediate CTEs
- Validate each step
- Maintain clean formatting
Case Study 4: Financial Fraud Detection (PayPal)
Problem Statement:
“Design a system to flag potentially fraudulent transactions in real-time using SQL.”
Expected Deliverables:
- Table schema with proper constraints
- Detection query logic
- Performance considerations
Expert Solution:
sql
Copy
-- Schema with audit trail
CREATE TABLE transactions (
tx_id UUID PRIMARY KEY,
user_id INT,
amount DECIMAL(12,2),
recipient_id INT,
tx_time TIMESTAMPTZ,
ip_address VARCHAR(45),
device_id VARCHAR(64),
is_fraud BOOLEAN DEFAULT FALSE
);
CREATE INDEX idx_tx_user_time ON transactions(user_id, tx_time);
-- Fraud detection query
WITH user_behavior AS (
SELECT
user_id,
AVG(amount) AS avg_amount,
STDDEV(amount) AS std_amount,
COUNT(DISTINCT recipient_id) AS unique_recipients
FROM transactions
WHERE tx_time > NOW() - INTERVAL '90 days'
GROUP BY user_id
)
SELECT
t.tx_id,
t.amount,
CASE
WHEN t.amount > u.avg_amount + 3*u.std_amount THEN 'High Value Outlier'
WHEN t.recipient_id NOT IN (
SELECT recipient_id
FROM transactions
WHERE user_id = t.user_id
AND tx_time > NOW() - INTERVAL '30 days'
) THEN 'New Recipient'
WHEN EXISTS (
SELECT 1
FROM transactions
WHERE user_id = t.user_id
AND tx_time BETWEEN t.tx_time - INTERVAL '5 minutes' AND t.tx_time
AND ip_address != t.ip_address
) THEN 'IP Hopping'
ELSE NULL
END AS fraud_flag
FROM transactions t
JOIN user_behavior u ON t.user_id = u.user_id
WHERE t.tx_time > NOW() - INTERVAL '1 day';
Case Study 5: Subscription Churn Analysis (Netflix)
Business Question:
“Identify users likely to churn next month based on activity patterns.”
Data Provided:
sql
Copy
CREATE TABLE subscriptions (
user_id INT,
plan_id INT,
start_date DATE,
end_date DATE
);
CREATE TABLE viewing_activity (
user_id INT,
session_start TIMESTAMP,
session_duration INTERVAL,
content_id INT
);
Predictive Modeling Approach:
sql
Copy
WITH user_metrics AS (
SELECT
s.user_id,
s.plan_id,
-- Engagement metrics
COUNT(DISTINCT v.session_start::DATE) AS active_days,
SUM(EXTRACT(EPOCH FROM v.session_duration)/3600 AS watch_hours,
-- Recency metrics
EXTRACT(DAY FROM (NOW()::DATE - MAX(v.session_start::DATE))) AS days_since_last_activity,
-- Historical comparison
SUM(CASE WHEN v.session_start BETWEEN s.start_date AND s.start_date + 30 THEN 1 ELSE 0 END) AS first_month_sessions,
-- Churn label (training data)
CASE WHEN s.end_date < NOW() AND s.end_date > NOW() - INTERVAL '30 days' THEN 1 ELSE 0 END AS churned
FROM subscriptions s
LEFT JOIN viewing_activity v ON s.user_id = v.user_id
WHERE s.start_date < NOW() - INTERVAL '60 days' -- Mature subscriptions only
GROUP BY s.user_id, s.plan_id, s.end_date
)
SELECT
user_id,
plan_id,
active_days,
watch_hours,
days_since_last_activity,
ROUND(100.0 * (first_month_sessions - active_days) / NULLIF(first_month_sessions, 0), 2) AS engagement_drop_pct,
CASE
WHEN days_since_last_activity > 14 OR engagement_drop_pct > 50 THEN 'High Risk'
WHEN days_since_last_activity > 7 OR engagement_drop_pct > 30 THEN 'Medium Risk'
ELSE 'Low Risk'
END AS churn_risk
FROM user_metrics;
Bonus: 5-Step Framework for Solving Any SQL Case Study
- Clarify Requirements (Ask questions)
- Map Business Logic to Data Model
- Build Solution Incrementally (CTEs are your friend)
- Validate Edge Cases (NULLs, duplicates, time zones)
- Optimize Thoughtfully (Explain your indexing strategy)
Conclusion: From Practice to Mastery
These SQL case study interview questions mirror what top companies actually ask. To prepare effectively:
- Simulate real interviews – time yourself solving cases
- Study industry-specific patterns (e-commerce vs SaaS vs fintech)
- Master window functions – they appear in 80% of hard questions
- Practice explaining your solutions out loud
Need more practice? Try these resources:
- StrataScratch (real company questions)
- LeetCode SQL Hard problems
- DataLemur (SQL interview prep)