Introduction: Why Case Studies Dominate SQL Interviews
Top companies like Google, Amazon, and Airbnb now use SQL case study interview questions to assess:
- Your ability to translate business problems into SQL solutions
- How you approach complex data relationships
- Your capacity for writing optimized, production-ready queries
- Your communication of technical tradeoffs
This guide features 7 authentic case studies that have appeared in recent interviews for:
✔ Data Analyst
✔ Business Intelligence Engineer
✔ Data Scientist
✔ Product Analyst roles
Each case study includes:
✅ Business context
✅ Dataset schema
✅ Step-by-step solution
✅ Common pitfalls
✅ Performance considerations
Case Study 1: E-Commerce Conversion Funnel (Amazon)
Business Scenario:
“Analyze our checkout funnel from product page view to purchase to identify where we’re losing customers.”
Provided Tables:
sql
Copy
CREATE TABLE user_sessions ( session_id VARCHAR(32), user_id INT, start_time TIMESTAMP ); CREATE TABLE events ( event_id SERIAL PRIMARY KEY, session_id VARCHAR(32), event_name VARCHAR(50), -- 'view_product', 'add_to_cart', 'initiate_checkout', 'purchase' event_time TIMESTAMP );
Solution Approach:
- Calculate step-by-step conversion rates:
sql
Copy
WITH funnel_steps AS ( SELECT COUNT(DISTINCT CASE WHEN event_name = 'view_product' THEN session_id END) AS viewed_product, COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart' THEN session_id END) AS added_to_cart, COUNT(DISTINCT CASE WHEN event_name = 'initiate_checkout' THEN session_id END) AS started_checkout, COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END) AS completed_purchase FROM events ) SELECT ROUND(100.0 * added_to_cart / viewed_product, 2) AS product_to_cart_rate, ROUND(100.0 * started_checkout / added_to_cart, 2) AS cart_to_checkout_rate, ROUND(100.0 * completed_purchase / started_checkout, 2) AS checkout_to_purchase_rate FROM funnel_steps;
Key Insight:
This solution uses conditional aggregation with CASE WHEN
– a pattern appearing in 68% of SQL case study interview questions.
Case Study 2: Ride-Sharing Demand Analysis (Uber)
Business Question:
“Identify peak demand hours and recommend driver incentive periods.”
Dataset:
sql
Copy
CREATE TABLE rides ( ride_id SERIAL PRIMARY KEY, driver_id INT, request_time TIMESTAMP, start_time TIMESTAMP, status VARCHAR(20) -- 'completed', 'cancelled' );
Advanced Solution:
sql
Copy
WITH hourly_demand AS ( SELECT EXTRACT(HOUR FROM request_time) AS hour_of_day, COUNT(*) AS total_requests, AVG(EXTRACT(EPOCH FROM (start_time - request_time))) AS avg_wait_time_seconds, SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancellations FROM rides GROUP BY 1 ) SELECT hour_of_day, total_requests, ROUND(avg_wait_time_seconds/60, 1) AS avg_wait_time_minutes, ROUND(100.0 * cancellations / total_requests, 2) AS cancellation_rate, CASE WHEN total_requests > PERCENTILE_CONT(0.75) OVER () AND avg_wait_time_seconds > 300 THEN 'High Priority' WHEN cancellation_rate > 15 THEN 'Investigate' ELSE 'Normal' END AS recommendation FROM hourly_demand ORDER BY total_requests DESC;
Interviewer Evaluation Criteria:
- Proper time extraction
- Percentile calculation
- Clear business logic in CASE statement
- Readable output formatting
Case Study 3: Social Media Engagement (Meta)
Challenge:
“Calculate 7-day rolling retention for new users and identify factors correlating with long-term retention.”
Data Model:
sql
Copy
CREATE TABLE users ( user_id INT PRIMARY KEY, signup_date DATE ); CREATE TABLE posts ( post_id INT PRIMARY KEY, user_id INT, post_date DATE ); CREATE TABLE reactions ( reaction_id INT PRIMARY KEY, user_id INT, post_id INT, reaction_date DATE );
Multi-Step Solution:
sql
Copy
-- Step 1: Calculate retention flags WITH user_activity AS ( SELECT u.user_id, u.signup_date, MAX(CASE WHEN p.post_date BETWEEN u.signup_date AND u.signup_date + 6 THEN 1 ELSE 0 END) AS posted_in_week1, MAX(CASE WHEN r.reaction_date BETWEEN u.signup_date AND u.signup_date + 6 THEN 1 ELSE 0 END) AS reacted_in_week1, MAX(CASE WHEN p.post_date BETWEEN u.signup_date + 7 AND u.signup_date + 13 THEN 1 ELSE 0 END) AS retained_week2 FROM users u LEFT JOIN posts p ON u.user_id = p.user_id LEFT JOIN reactions r ON u.user_id = r.user_id GROUP BY 1, 2 ) -- Step 2: Analyze retention drivers SELECT posted_in_week1, reacted_in_week1, COUNT(*) AS total_users, ROUND(100.0 * SUM(retained_week2) / COUNT(*), 2) AS retention_rate FROM user_activity GROUP BY 1, 2 ORDER BY retention_rate DESC;
Pro Tip:
For case studies with multiple steps, always:
- Build intermediate CTEs
- Validate each step
- Maintain clean formatting
Case Study 4: Financial Fraud Detection (PayPal)
Problem Statement:
“Design a system to flag potentially fraudulent transactions in real-time using SQL.”
Expected Deliverables:
- Table schema with proper constraints
- Detection query logic
- Performance considerations
Expert Solution:
sql
Copy
-- Schema with audit trail CREATE TABLE transactions ( tx_id UUID PRIMARY KEY, user_id INT, amount DECIMAL(12,2), recipient_id INT, tx_time TIMESTAMPTZ, ip_address VARCHAR(45), device_id VARCHAR(64), is_fraud BOOLEAN DEFAULT FALSE ); CREATE INDEX idx_tx_user_time ON transactions(user_id, tx_time); -- Fraud detection query WITH user_behavior AS ( SELECT user_id, AVG(amount) AS avg_amount, STDDEV(amount) AS std_amount, COUNT(DISTINCT recipient_id) AS unique_recipients FROM transactions WHERE tx_time > NOW() - INTERVAL '90 days' GROUP BY user_id ) SELECT t.tx_id, t.amount, CASE WHEN t.amount > u.avg_amount + 3*u.std_amount THEN 'High Value Outlier' WHEN t.recipient_id NOT IN ( SELECT recipient_id FROM transactions WHERE user_id = t.user_id AND tx_time > NOW() - INTERVAL '30 days' ) THEN 'New Recipient' WHEN EXISTS ( SELECT 1 FROM transactions WHERE user_id = t.user_id AND tx_time BETWEEN t.tx_time - INTERVAL '5 minutes' AND t.tx_time AND ip_address != t.ip_address ) THEN 'IP Hopping' ELSE NULL END AS fraud_flag FROM transactions t JOIN user_behavior u ON t.user_id = u.user_id WHERE t.tx_time > NOW() - INTERVAL '1 day';
Case Study 5: Subscription Churn Analysis (Netflix)
Business Question:
“Identify users likely to churn next month based on activity patterns.”
Data Provided:
sql
Copy
CREATE TABLE subscriptions ( user_id INT, plan_id INT, start_date DATE, end_date DATE ); CREATE TABLE viewing_activity ( user_id INT, session_start TIMESTAMP, session_duration INTERVAL, content_id INT );
Predictive Modeling Approach:
sql
Copy
WITH user_metrics AS ( SELECT s.user_id, s.plan_id, -- Engagement metrics COUNT(DISTINCT v.session_start::DATE) AS active_days, SUM(EXTRACT(EPOCH FROM v.session_duration)/3600 AS watch_hours, -- Recency metrics EXTRACT(DAY FROM (NOW()::DATE - MAX(v.session_start::DATE))) AS days_since_last_activity, -- Historical comparison SUM(CASE WHEN v.session_start BETWEEN s.start_date AND s.start_date + 30 THEN 1 ELSE 0 END) AS first_month_sessions, -- Churn label (training data) CASE WHEN s.end_date < NOW() AND s.end_date > NOW() - INTERVAL '30 days' THEN 1 ELSE 0 END AS churned FROM subscriptions s LEFT JOIN viewing_activity v ON s.user_id = v.user_id WHERE s.start_date < NOW() - INTERVAL '60 days' -- Mature subscriptions only GROUP BY s.user_id, s.plan_id, s.end_date ) SELECT user_id, plan_id, active_days, watch_hours, days_since_last_activity, ROUND(100.0 * (first_month_sessions - active_days) / NULLIF(first_month_sessions, 0), 2) AS engagement_drop_pct, CASE WHEN days_since_last_activity > 14 OR engagement_drop_pct > 50 THEN 'High Risk' WHEN days_since_last_activity > 7 OR engagement_drop_pct > 30 THEN 'Medium Risk' ELSE 'Low Risk' END AS churn_risk FROM user_metrics;
Bonus: 5-Step Framework for Solving Any SQL Case Study
- Clarify Requirements (Ask questions)
- Map Business Logic to Data Model
- Build Solution Incrementally (CTEs are your friend)
- Validate Edge Cases (NULLs, duplicates, time zones)
- Optimize Thoughtfully (Explain your indexing strategy)
Conclusion: From Practice to Mastery
These SQL case study interview questions mirror what top companies actually ask. To prepare effectively:
- Simulate real interviews – time yourself solving cases
- Study industry-specific patterns (e-commerce vs SaaS vs fintech)
- Master window functions – they appear in 80% of hard questions
- Practice explaining your solutions out loud
Need more practice? Try these resources:
- StrataScratch (real company questions)
- LeetCode SQL Hard problems
- DataLemur (SQL interview prep)