Expansion Propensity Model Card

- Model name: `expansion_model`
- Version: 1.1.0
- Type: XGBoost `XGBClassifier` wrapped in scikit-learn `CalibratedClassifierCV` (isotonic, cv=5)
- Task: binary classification, P(upgrade to next plan tier within 90 days)
- Artifacts: `models/expansion_model.pkl` and `models/expansion_model_metadata.json`

Purpose

The expansion propensity model is the offensive complement to the defensive churn model; together they cover both levers of net revenue retention (NRR): retain and expand. It is built for Customer Success and Sales teams who need to identify accounts with genuine upgrade intent, distinguish them from accounts that look active but are actually churning, and prioritise outreach by expected ARR (annual recurring revenue) uplift.
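
Prioritising by expected ARR uplift can be sketched as the upgrade probability multiplied by the annualised MRR increase an upgrade would bring. This is a minimal illustration, not the production scoring logic: the `Account` record, `expected_arr_uplift`, `prioritise`, and the assumption that an upgrade lifts MRR to the floor of the next tier are all hypothetical. The tier floors come from the boundaries listed under Known Limitations, and the 1.2× Enterprise multiplier is the seat/add-on estimate mentioned there.

```python
from dataclasses import dataclass

# Hypothetical assumption: an upgrade lifts MRR to the floor of the next tier.
# Floors (USD MRR) follow the tier boundaries listed under Known Limitations;
# the free tier is assumed to upgrade to Starter (floor 500).
NEXT_TIER_FLOOR_MRR = {"free": 500.0, "starter": 2000.0, "growth": 8000.0}

# Enterprise -> Custom seat/add-on expansion multiplier (Known Limitations).
ENTERPRISE_EXPANSION_MULTIPLIER = 1.2


@dataclass
class Account:
    account_id: str
    plan_tier: str    # free / starter / growth / enterprise / custom
    mrr: float        # current MRR in USD
    p_upgrade: float  # calibrated model output, P(upgrade within 90 days)


def expected_arr_uplift(acct: Account) -> float:
    """Expected ARR uplift = P(upgrade) x (target MRR - current MRR) x 12."""
    if acct.plan_tier == "enterprise":
        target_mrr = acct.mrr * ENTERPRISE_EXPANSION_MULTIPLIER
    elif acct.plan_tier in NEXT_TIER_FLOOR_MRR:
        # max() keeps the delta non-negative if MRR already exceeds the next floor.
        target_mrr = max(acct.mrr, NEXT_TIER_FLOOR_MRR[acct.plan_tier])
    else:
        return 0.0  # custom (or unknown) tier: no uplift rule assumed here
    return acct.p_upgrade * (target_mrr - acct.mrr) * 12


def prioritise(accounts: list[Account]) -> list[Account]:
    """Order accounts for outreach, highest expected ARR uplift first."""
    return sorted(accounts, key=expected_arr_uplift, reverse=True)


if __name__ == "__main__":
    accounts = [
        Account("a", "starter", 1500.0, 0.5),      # 0.5 * 500 * 12 = 3000
        Account("b", "enterprise", 10000.0, 0.2),  # 0.2 * 2000 * 12 = 4800
    ]
    print([a.account_id for a in prioritise(accounts)])  # ['b', 'a']
```

Ranking on expected value rather than raw probability lets a lower-probability Enterprise account outrank a likelier but smaller Starter upgrade.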

Features (22 total)

Base churn features (16), reused from `mart_customer_churn_features`

| Feature | Type | Description |
|---|---|---|
| `mrr` | numeric | Monthly recurring revenue (USD) |
| `tenure_days` | numeric | Days since signup |
| `total_events` | numeric | Lifetime product event count |
| `events_last_30d` | numeric | Product event count, last 30 days |
| `events_last_7d` | numeric | Product event count, last 7 days |
| `avg_adoption_score` | numeric | Average feature adoption score |
| `days_since_last_event` | numeric | Days since the last product interaction (recency) |
| `retention_signal_count` | numeric | Count of high-value events (API, integrations, monitoring) |
| `integration_connects_first_30d` | numeric | Integration connects in the first 30 days (onboarding window) |
| `tickets_last_30d` | numeric | Support tickets, last 30 days |
| `high_priority_tickets` | numeric | Lifetime count of high/critical tickets |
| `avg_resolution_hours` | numeric | Average ticket resolution time (hours) |
| `is_early_stage` | binary | Customer is in the first 90 days of tenure |
| `activated_at_30d` | binary | ≥3 integration connects in the first 30 days (2.7× lower churn, log-rank p<0.001) |
| `plan_tier` | categorical | free / starter / growth / enterprise / custom |
| `industry` | categorical | Industry vertical |
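
The two binary base features are thresholds on other columns. A minimal sketch, assuming they can be expressed as plain Python functions (where they are actually built in the pipeline is not stated here); the function and constant names are hypothetical, and treating day 90 as the first non-early-stage day is an assumption.

```python
EARLY_STAGE_DAYS = 90         # "first 90 days of tenure"; strict < is an assumption
ACTIVATION_MIN_CONNECTS = 3   # ">=3 integration connects in first 30 days"


def is_early_stage(tenure_days: int) -> int:
    """1 if the customer is within the first 90 days of tenure (days 0-89), else 0."""
    return int(tenure_days < EARLY_STAGE_DAYS)


def activated_at_30d(integration_connects_first_30d: int) -> int:
    """1 if the customer made at least 3 integration connects in the first 30 days."""
    return int(integration_connects_first_30d >= ACTIVATION_MIN_CONNECTS)
```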

Expansion-specific signals (6), from `mart_customer_expansion_features`

| # | Feature | Type | Description | Business meaning |
|---|---|---|---|---|
| 16 | `premium_feature_trials_30d` | numeric | `premium_feature_trial` events, last 30 days | Customer is trialing capabilities above their tier |
| 17 | `feature_request_tickets_90d` | numeric | Feature-request tickets, last 90 days | Customer is asking for capabilities they don't have yet |
| 18 | `has_open_expansion_opp` | binary | Active expansion GTM opportunity | Sales already sees an upgrade signal |
| 19 | `expansion_opp_amount` | numeric | USD value of the open expansion opportunity | Dollar size of the Sales-identified opportunity |
| 20 | `mrr_tier_ceiling_pct` | numeric in [0, 1] | (MRR − floor) / (ceiling − floor) | How close MRR is to the top of the current tier; always 0.0 on the free tier |
| 21 | `feature_limit_hit_30d` | numeric | `feature_limit_hit` events, last 30 days | Primary free-tier signal: the customer has hit a data-sharing/export limit |
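
Feature 20 can be computed from the hardcoded tier boundaries listed under Known Limitations. This is a hypothetical sketch: clamping to [0, 1] follows the stated range of the feature, and returning 0.0 for the custom tier (for which no boundaries are listed) is an assumption rather than documented behaviour.

```python
# Tier boundaries in USD MRR, as hardcoded per Known Limitations
# (Starter: 500-2000, Growth: 2000-8000, Enterprise: 8000-50000).
TIER_BOUNDS = {
    "starter": (500.0, 2000.0),
    "growth": (2000.0, 8000.0),
    "enterprise": (8000.0, 50000.0),
}


def mrr_tier_ceiling_pct(plan_tier: str, mrr: float) -> float:
    """(MRR - floor) / (ceiling - floor), clamped to [0, 1]."""
    bounds = TIER_BOUNDS.get(plan_tier)
    if bounds is None:
        # Free tier has no MRR, so the card fixes this at 0.0. Treating the
        # custom tier (or any unknown tier) the same way is an assumption.
        return 0.0
    floor, ceiling = bounds
    pct = (mrr - floor) / (ceiling - floor)
    return min(max(pct, 0.0), 1.0)
```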

Feature 21 rationale: for free-tier customers, `mrr_tier_ceiling_pct` is always 0.0 because they have no MRR, so `feature_limit_hit_30d` provides the direct behavioural signal that the customer has outgrown their tier. CS practitioners report 38–45% conversion when contacting free customers who hit limits during active audit cycles (see `docs/stakeholder-notes.md` Section 5.1).

Leakage Guard

`has_open_expansion_opp` must NOT be the #1 SHAP feature.

If it is, the Sales team may be creating expansion opportunities in response to usage signals that the model should be discovering independently. That creates a circular dependency: the feature is a consequence of upgrade intent, not a cause. The training script logs a warning if this occurs. In that case:

1. Retrain without `has_open_expansion_opp`.
2. Compare AUC with and without the feature; if the difference is < 0.03, remove the feature permanently.
3. See `notebooks/expansion_propensity_modeling.ipynb` Section 5 for the SHAP beeswarm leakage check.
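
The guard and the follow-up decision rule can be sketched as below, assuming "#1 SHAP feature" means the feature with the highest mean absolute SHAP value across the evaluation set, and that the "AUC delta" is the AUC with the feature minus the AUC without it. The function names are hypothetical; the training script's actual check may differ.

```python
LEAKY_FEATURE = "has_open_expansion_opp"
AUC_DELTA_THRESHOLD = 0.03


def mean_abs_shap(shap_rows: list[list[float]], feature_names: list[str]) -> dict[str, float]:
    """Average |SHAP| per feature over an (n_samples x n_features) matrix."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }


def leakage_alert(importance: dict[str, float]) -> bool:
    """True if the Sales-opportunity flag is the single most important feature."""
    return max(importance, key=importance.get) == LEAKY_FEATURE


def drop_leaky_feature(auc_with: float, auc_without: float) -> bool:
    """Drop the feature permanently if removing it costs less than 0.03 AUC."""
    return (auc_with - auc_without) < AUC_DELTA_THRESHOLD
```

A small AUC cost on removal suggests the feature mostly echoes signals the model already captures, which is why the rule favours dropping it.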

Training Data

| Item | Criteria | Notes |
|---|---|---|
| Train split | `signup_date < 2025-06-01` | ~18 months of cohorts |
| Test split | `signup_date >= 2025-06-01` | ~9 months of cohorts, out-of-time |
| Scope | Active, never-churned customers only | Churned customers excluded |
| Label | `upgrade_date IS NOT NULL` | Upgraded = 1, never upgraded = 0 |
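
The split, scope, and label rules in the table can be expressed as a small filter. A sketch under assumptions: the `Customer` record and its `churned` flag are hypothetical stand-ins for whatever the mart actually exposes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SPLIT_DATE = date(2025, 6, 1)  # out-of-time boundary from the table above


@dataclass
class Customer:
    customer_id: str
    signup_date: date
    churned: bool                 # hypothetical flag: has the customer ever churned?
    upgrade_date: Optional[date]  # None if the customer never upgraded


def label(c: Customer) -> int:
    """Upgraded = 1, never upgraded = 0 (mirrors `upgrade_date IS NOT NULL`)."""
    return int(c.upgrade_date is not None)


def out_of_time_split(customers: list[Customer]) -> tuple[list[Customer], list[Customer]]:
    """Drop churned customers, then split the rest by signup date."""
    in_scope = [c for c in customers if not c.churned]
    train = [c for c in in_scope if c.signup_date < SPLIT_DATE]
    test = [c for c in in_scope if c.signup_date >= SPLIT_DATE]
    return train, test
```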

Point-in-time correctness: for upgraded customers, all features are computed as of `upgrade_date`; for customers who have not upgraded, as of `REFERENCE_DATE`. No data from after that cut-off enters the features.
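
Windowed features such as `events_last_30d`, `premium_feature_trials_30d`, and `feature_limit_hit_30d` can be computed against that per-customer cut-off. A minimal sketch, assuming a half-open window `[as_of - days, as_of)`; the card does not state the boundary convention or the value of `REFERENCE_DATE`, so the latter is passed in as a parameter, and both helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def feature_as_of(upgrade_date: Optional[date], reference_date: date) -> date:
    """Upgraded customers: cut-off is upgrade_date; others: REFERENCE_DATE."""
    return upgrade_date if upgrade_date is not None else reference_date


def count_in_window(event_dates: list[date], as_of: date, days: int) -> int:
    """Count events in [as_of - days, as_of). Events on or after as_of are
    excluded, so nothing from the cut-off day onward reaches the feature."""
    start = as_of - timedelta(days=days)
    return sum(1 for d in event_dates if start <= d < as_of)
```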

Performance Metrics

| Metric | Threshold | Notes |
|---|---|---|
| AUC-ROC | > 0.75 | Acceptance gate in the training script |
| Brier score | < 0.10 | Calibration quality |
| Precision @ decile 1 | Reported only | Fraction of upgraders in the top 10% of accounts by score |
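
A dependency-free sketch of the three metrics, useful for checking the thresholds by hand. In practice scikit-learn's `roc_auc_score` and `brier_score_loss` compute the first two; sizing the top decile as ceil(n / 10) accounts is an assumption about how "top 10%" is rounded, and the function names are hypothetical.

```python
def auc_roc(y_true: list[int], y_score: list[float]) -> float:
    """AUC as the probability a random upgrader outscores a random non-upgrader
    (ties count half). O(n_pos * n_neg): fine for a sketch, slow at scale."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def precision_at_top_decile(y_true: list[int], y_score: list[float]) -> float:
    """Fraction of upgraders among the top 10% of accounts by score."""
    k = max(1, (len(y_true) + 9) // 10)  # ceil(n / 10) in integer arithmetic
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


def meets_thresholds(auc: float, brier: float) -> dict[str, bool]:
    """Compare against the card's thresholds (AUC > 0.75, Brier < 0.10)."""
    return {"auc_roc": auc > 0.75, "brier": brier < 0.10}
```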

Known Limitations

- Synthetic data only: the model is trained on generated data with causal correlations, not real customer histories. Expect AUC to differ when the model is evaluated on real data.
- Enterprise → Custom tier: the 1.2× uplift multiplier for seat/add-on expansion is a conservative estimate; actual enterprise expansion deals vary widely.
- No seasonality: monthly MRR cycles and annual contract renewals are not captured.
- Static tier ceilings: `mrr_tier_ceiling_pct` uses hardcoded tier boundaries in USD MRR (Starter: 500–2000, Growth: 2000–8000, Enterprise: 8000–50000). Update them in `train_expansion_model.py` if pricing changes.

Retraining Cadence

| Trigger | Action |
|---|---|
| Monthly | Evaluate AUC on the newest cohort; flag if < 0.72 |
| Quarterly | Full retrain on a rolling 24-month window |
| Pricing change | Immediate retrain; update the `mrr_tier_ceiling_pct` tier boundaries |
| SHAP leakage alert | Investigate `has_open_expansion_opp` dominance; retrain if needed |
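
The monthly flag and the quarterly training window can be sketched as follows. This assumes the 24-month window is measured in calendar months back from the retrain date; which date column the window filters on (signup date or event date) is not specified in the card, and the helper names are hypothetical.

```python
import calendar
from datetime import date

MONTHLY_AUC_FLOOR = 0.72     # flag threshold from the cadence table
ROLLING_WINDOW_MONTHS = 24   # quarterly retrain window


def flag_monthly_auc(auc_new_cohort: float) -> bool:
    """True if the monthly evaluation on the newest cohort should be flagged."""
    return auc_new_cohort < MONTHLY_AUC_FLOOR


def rolling_window_start(as_of: date, months: int = ROLLING_WINDOW_MONTHS) -> date:
    """Date `months` calendar months before `as_of`, clamping the day to the
    target month's length (e.g. 29 Feb -> 28 Feb in a non-leap year)."""
    total = as_of.year * 12 + (as_of.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(as_of.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```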