Topic 28: Business Use Cases & Solutions
What You'll Learn
This topic covers real-world business problems with detailed solutions:
- Customer Churn Prediction
- Recommendation Systems
- Fraud Detection
- Price Optimization
- Demand Forecasting
- Click-Through Rate (CTR) Prediction
- Customer Lifetime Value (CLV)
- A/B Testing for ML Models
Why We Need This
Interview Importance
- Common questions: "How would you solve X business problem?"
- Practical knowledge: Shows real-world experience
- Problem-solving: Demonstrates end-to-end thinking
Real-World Application
- Production systems: These are real problems
- Business impact: ML drives business value
- System design: Need to consider scalability, monitoring
Core Intuition
Business ML questions are usually testing whether you can connect technical modeling to a real decision.
A strong answer connects:
- business objective
- target definition
- features
- metric
- intervention
- deployment and measurement
The interviewer usually cares less about the fanciest model than about whether the whole system makes sense.
Technical Details Interviewers Often Want
Target Definition Is Critical
If the target is poorly defined, the rest of the pipeline is compromised.
Examples:
- churn window too short or too long
- fraud labels delayed or noisy
- forecast horizon misaligned with the decision horizon
Metrics Must Match Actionability
The right metric depends on the action.
Examples:
- Precision@K for limited outreach budgets
- recall for high-cost missed positives
- ranking metrics for recommender surfaces
Intervention Matters
A model is only valuable if there is a useful downstream action and a way to measure impact.
That is why applied interview answers should mention:
- intervention
- cost
- A/B testing or lift measurement
Common Failure Modes
- choosing a model before defining the business problem
- using the wrong evaluation metric
- ignoring leakage or delayed labels
- forgetting intervention cost
- discussing deployment without monitoring or ROI measurement
Edge Cases and Follow-Up Questions
- Why is target definition often the hardest part?
- Why can a good offline metric still fail to create business value?
- How do you choose the right metric under a fixed intervention budget?
- Why is A/B testing often needed even after a strong offline result?
- Why is prediction quality not the same as business impact?
What to Practice Saying Out Loud
- How to frame a business problem before choosing a model
- Why metric choice depends on actionability and cost
- Why an ML system needs intervention and measurement, not just prediction
Business Use Cases
1. Customer Churn Prediction
Problem Statement: A subscription-based company wants to predict which customers will cancel their subscription in the next month.
Business Impact:
- Reduce churn by 10% → Save $X million annually
- Identify at-risk customers early
- Target retention campaigns effectively
Solution Approach:
Step 1: Define Problem
Target Variable: We need to define what "churn" means. For a subscription service, churn typically means a customer canceled their subscription. We create a binary target:
- Churn = 1: Customer canceled subscription in the next 30 days
- Churn = 0: Customer did not cancel in the next 30 days
Time Window: We predict churn in the next 30 days. This gives the business time to intervene. The window should be:
- Long enough to allow intervention (30 days is reasonable)
- Short enough to be actionable (predicting 1 year ahead is less useful)
- Aligned with business cycles (monthly subscriptions → 30 days)
Evaluation Metric: We use Precision@K because:
- Business has limited resources for retention campaigns
- Can only reach out to top K customers (e.g., top 1000)
- Want to maximize the proportion of those K who actually churn
- Precision@K = (Number of churned customers in top K) / K
Why not just accuracy?
- Churn is imbalanced (maybe 5% churn rate)
- Accuracy would be high (95%) even if we predict "no churn" for everyone
- Precision@K focuses on the actionable customers
Step 2: Data Collection
Feature Categories:
Demographics:
- Age, location, subscription tier, account age
- Why important: Different segments have different churn patterns (e.g., students vs professionals)
Usage Patterns:
- Login frequency (daily, weekly, monthly)
- Feature usage (which features used, how often)
- Session duration (average time per session)
- Why important: Declining usage often precedes churn. If someone stops logging in, they're likely to churn.
Engagement Metrics:
- Support ticket count (more tickets might indicate dissatisfaction)
- Feedback scores (low satisfaction → higher churn)
- Response to emails/notifications (engagement with communications)
- Why important: Engagement is a strong indicator of satisfaction and retention
Payment History:
- Payment method, payment frequency
- Failed payment count (payment issues → churn)
- Payment delays (late payments might indicate financial stress)
- Why important: Payment problems are a direct cause of churn
Behavioral Patterns:
- Time since last login (inactive users → churn)
- Feature adoption rate (tried new features? engaged with product?)
- Usage trend (increasing or decreasing?)
- Why important: Behavioral changes signal intent to churn
Target Variable Creation: For each customer at time T, we look forward 30 days:
- If they canceled between T and T+30 days → churn = 1
- Otherwise → churn = 0
Important: We must only use information available at time T. No future information (data leakage).
Step 3: Feature Engineering
Time-Based Features: These capture recency and frequency patterns:
# Days since last login - critical feature
# If someone hasn't logged in for 30 days, high churn risk
features['days_since_last_login'] = (today - last_login_date).days
# Login frequency - measures engagement
features['login_frequency_7d'] = count_logins_last_7_days
features['login_frequency_30d'] = count_logins_last_30_days
# Trend: Is usage increasing or decreasing?
features['usage_trend'] = login_frequency_7d / (login_frequency_30d + 1)
# If trend < 1, usage is declining → churn risk
Why these matter: Inactive users are at high risk. If someone stops using the product, they'll likely cancel.
Engagement Features:
# Support tickets - more tickets might indicate problems
features['support_ticket_count'] = count_tickets_last_month
features['avg_ticket_resolution_time'] = average_resolution_hours
# Satisfaction - direct measure of happiness
features['satisfaction_score'] = average_feedback_score
features['satisfaction_trend'] = recent_score - historical_score
# Feature adoption - engaged users try new features
features['feature_adoption_rate'] = features_used / total_features
features['new_features_tried_last_month'] = count_new_features_tried
Why these matter: Dissatisfaction and lack of engagement are strong predictors of churn.
Payment Features:
# Payment problems - direct cause of churn
features['failed_payment_count'] = count_failed_payments_last_3_months
features['payment_delay_days'] = average_delay_days
features['payment_method_changes'] = count_payment_method_changes
# Subscription details
features['subscription_age_days'] = days_since_signup
features['price_per_month'] = monthly_subscription_cost
features['discount_applied'] = is_using_discount
Why these matter: Payment issues are a direct cause of involuntary churn. Also, newer customers and those on discounts might churn more.
Behavioral Patterns:
# Feature usage patterns
features['most_used_feature'] = feature_with_highest_usage
features['feature_diversity'] = number_of_different_features_used
# Session patterns
features['avg_session_duration'] = average_minutes_per_session
features['sessions_per_week'] = average_sessions_per_week
# Communication engagement
features['email_open_rate'] = emails_opened / emails_sent
features['notification_response_rate'] = notifications_clicked / notifications_sent
Why these matter: Declining engagement across multiple dimensions indicates churn risk.
Step 4: Model Selection
- Baseline: Logistic regression (interpretable)
- Production: Gradient Boosting (XGBoost) for accuracy
- Ensemble: Combine multiple models
Step 5: Evaluation Metrics
- Primary: Precision@K (for top K at-risk customers)
- Secondary: Recall, F1-score, AUC-ROC
- Business: Cost of intervention vs cost of churn
Step 6: Deployment
- Real-time: Score customers daily
- Action: Send retention offers to top K at-risk customers
- Monitoring: Track actual churn rate, model drift
Step 7: Business Impact Measurement
- A/B Test: Random sample gets no intervention (control)
- Measure: Churn rate reduction in treatment group
- ROI: (Churn reduction × Customer value) - Intervention cost
Code Structure:
# 1. Data preparation
def prepare_churn_data(df):
# Feature engineering
# Target creation
return features, target
# 2. Model training
def train_churn_model(X_train, y_train):
model = XGBClassifier()
model.fit(X_train, y_train)
return model
# 3. Prediction
def predict_churn_risk(model, customer_features):
risk_score = model.predict_proba(customer_features)[:, 1]
return risk_score
# 4. Action
def get_top_at_risk_customers(model, customers, k=1000):
scores = predict_churn_risk(model, customers)
top_k_indices = np.argsort(scores)[-k:]
return top_k_indices
2. Recommendation System
Problem Statement: E-commerce platform wants to recommend products to users to increase sales.
Business Impact:
- Increase conversion rate by 15%
- Improve user engagement
- Increase average order value
Solution Approach:
Step 1: Problem Definition
- Type: Collaborative filtering + Content-based
- Evaluation: Precision@K, Recall@K, NDCG@K
- Real-time: Need <100ms latency
Step 2: Data
- User-item interactions: Views, purchases, ratings
- Item features: Category, price, brand, description
- User features: Demographics, purchase history
Step 3: Model Architecture
- Hybrid approach:
- Collaborative filtering (matrix factorization)
- Content-based (item similarity)
- Deep learning (neural collaborative filtering)
- Ensemble: Combine all three
Step 4: Implementation
# Matrix factorization
def train_collaborative_filtering(interactions):
# User-item matrix
# Factorize: R ≈ P @ Q^T
model = MatrixFactorization()
model.fit(interactions)
return model
# Content-based
def content_based_recommendations(user_history, item_features):
# Find items similar to user's past purchases
similarities = cosine_similarity(user_history, item_features)
return top_k_items(similarities)
# Hybrid
def hybrid_recommendations(user_id, collaborative_scores, content_scores):
# Weighted combination
final_scores = 0.6 * collaborative_scores + 0.4 * content_scores
return top_k_items(final_scores)
Step 5: Evaluation
- Offline: Train/test split, cross-validation
- Online: A/B test with real users
- Metrics: CTR, conversion rate, revenue
Step 6: Deployment
- Caching: Pre-compute recommendations
- Real-time: Update for new users/items
- Cold start: Use content-based for new users
3. Fraud Detection
Problem Statement: Payment processor needs to detect fraudulent transactions in real-time.
Business Impact:
- Reduce fraud losses by $X million
- Maintain low false positive rate (<0.1%)
- Process transactions in <50ms
Solution Approach:
Step 1: Problem Definition
- Type: Anomaly detection + Classification
- Imbalance: 99.9% legitimate, 0.1% fraud
- Latency: Real-time (<50ms)
Step 2: Features
- Transaction: Amount, merchant, location, time
- User: Historical behavior, device, IP
- Pattern: Velocity (transactions/hour), unusual patterns
Step 3: Model
- Primary: Isolation Forest (anomaly detection)
- Secondary: Gradient Boosting (classification)
- Ensemble: Combine both
Step 4: Handling Imbalance
- Sampling: SMOTE (oversample minority)
- Cost-sensitive: Higher penalty for false negatives
- Threshold tuning: Optimize for business metric
Step 5: Evaluation
- Primary: Precision (minimize false positives)
- Secondary: Recall (catch fraud)
- Business: Fraud detection rate, false positive rate
Step 6: Deployment
- Real-time scoring: Model API
- Rule-based fallback: For edge cases
- Human review: Top risk transactions
4. Price Optimization
Problem Statement: Ride-sharing company wants to optimize pricing to maximize revenue.
Business Impact:
- Increase revenue by 5-10%
- Balance supply and demand
- Maintain customer satisfaction
Solution Approach:
Step 1: Problem
- Objective: Maximize revenue = price × demand(price)
- Constraints: Price bounds, competitor prices
- Dynamic: Update prices in real-time
Step 2: Data
- Historical: Past prices, demand, revenue
- Context: Time, location, weather, events
- Competitor: Competitor prices
Step 3: Model
- Demand model: Predict demand as function of price
- Price elasticity: How demand changes with price
- Optimization: Find price that maximizes revenue
Step 4: Implementation
def predict_demand(price, features):
# Demand = f(price, time, location, ...)
model = GradientBoostingRegressor()
demand = model.predict([[price] + features])
return demand
def optimize_price(features, price_bounds):
# Revenue = price × demand(price)
# Find price that maximizes revenue
best_price = None
best_revenue = 0
for price in np.linspace(price_bounds[0], price_bounds[1], 100):
demand = predict_demand(price, features)
revenue = price * demand
if revenue > best_revenue:
best_revenue = revenue
best_price = price
return best_price
Step 5: Evaluation
- A/B test: Compare optimized vs fixed pricing
- Metrics: Revenue, customer satisfaction, supply/demand balance
5. Demand Forecasting
Problem Statement: Retailer needs to forecast product demand to optimize inventory.
Business Impact:
- Reduce inventory costs by 20%
- Minimize stockouts
- Improve customer satisfaction
Solution Approach:
Step 1: Problem
- Type: Time series forecasting
- Horizon: Next 7, 30, 90 days
- Granularity: Product × Store × Day
Step 2: Features
- Historical: Past sales, trends, seasonality
- External: Holidays, promotions, weather
- Product: Category, price, lifecycle stage
Step 3: Models
- Baseline: Moving average, exponential smoothing
- Advanced: ARIMA, Prophet, LSTM
- Ensemble: Combine multiple models
Step 4: Evaluation
- Metrics: MAPE, RMSE, MAE
- Business: Stockout rate, overstock rate
General Problem-Solving Framework
1. Understand Business Problem
- What is the business goal?
- What are the constraints?
- What is success?
2. Define ML Problem
- Classification, regression, ranking?
- What is the target variable?
- What are the features?
3. Data Collection
- What data is available?
- What data is needed?
- How to label data?
4. Feature Engineering
- Domain knowledge
- Time-based features
- Interactions
5. Model Selection
- Start simple (baseline)
- Try multiple models
- Ensemble if needed
6. Evaluation
- Offline metrics
- Online A/B test
- Business metrics
7. Deployment
- Real-time vs batch
- Latency requirements
- Monitoring
8. Iteration
- Monitor performance
- Collect feedback
- Improve model
Exercises
- Design solution for a new business problem
- Implement end-to-end pipeline
- Evaluate business impact
- Design A/B test
Next Steps
- Topic 29: System design for ML
- Topic 30: A/B testing and experimentation