# Expansion Propensity Model Card

- **Model name:** `expansion_model`
- **Version:** 1.1.0
- **Type:** `XGBClassifier` + `CalibratedClassifierCV` (isotonic, cv=5)
- **Task:** Binary classification: P(upgrade to next plan tier within 90 days)
- **Artifact:** `models/expansion_model.pkl` + `models/expansion_model_metadata.json`

## Purpose
The expansion propensity model is the offensive complement to the defensive churn model. Together they cover both levers of net revenue retention (NRR): retain and expand. The model is built for Customer Success and Sales teams that need to identify accounts with genuine upgrade intent, distinguish them from accounts that look active but are actually churning, and prioritise outreach by expected ARR uplift.
## Features (22 total)
### Base churn features (16), reused from `mart_customer_churn_features`
| Feature | Type | Description |
|---|---|---|
| `mrr` | numeric | Monthly Recurring Revenue (USD) |
| `tenure_days` | numeric | Days since signup |
| `total_events` | numeric | Lifetime product event count |
| `events_last_30d` | numeric | Product activity last 30 days |
| `events_last_7d` | numeric | Product activity last 7 days |
| `avg_adoption_score` | numeric | Average feature adoption score |
| `days_since_last_event` | numeric | Recency of last product interaction |
| `retention_signal_count` | numeric | High-value events (API, integrations, monitoring) |
| `integration_connects_first_30d` | numeric | Integrations in onboarding window |
| `tickets_last_30d` | numeric | Support tickets last 30 days |
| `high_priority_tickets` | numeric | Lifetime high/critical ticket count |
| `avg_resolution_hours` | numeric | Average ticket resolution time |
| `is_early_stage` | binary | In first 90 days of tenure |
| `activated_at_30d` | binary | ≥3 integration connects in first 30 days (2.7× lower churn, log-rank p<0.001) |
| `plan_tier` | categorical | free / starter / growth / enterprise / custom |
| `industry` | categorical | Industry vertical |

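
Two of the binary base features are thresholds on other columns. The sketch below recomputes them from a plain dict row, assuming `is_early_stage` means `tenure_days < 90` and `activated_at_30d` means `integration_connects_first_30d >= 3`. The helper name `derive_binary_flags` is hypothetical; the actual definitions live upstream in `mart_customer_churn_features` and may treat the boundaries differently.

```python
# Hypothetical helper: recomputes the two binary base flags from the
# numeric columns they summarise. Boundary handling (< vs <=) is assumed.

EARLY_STAGE_DAYS = 90         # "In first 90 days of tenure"
ACTIVATION_MIN_CONNECTS = 3   # ">=3 integration connects in first 30 days"


def derive_binary_flags(row: dict) -> dict:
    """Return is_early_stage and activated_at_30d as 0/1 integers."""
    return {
        "is_early_stage": int(row["tenure_days"] < EARLY_STAGE_DAYS),
        "activated_at_30d": int(
            row["integration_connects_first_30d"] >= ACTIVATION_MIN_CONNECTS
        ),
    }


new_account = {"tenure_days": 45, "integration_connects_first_30d": 3}
print(derive_binary_flags(new_account))
# {'is_early_stage': 1, 'activated_at_30d': 1}
```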
### Expansion-specific signals (6), from `mart_customer_expansion_features`
| # | Feature | Type | Description | Business meaning |
|---|---|---|---|---|
| 16 | `premium_feature_trials_30d` | numeric | `premium_feature_trial` events last 30d | Customers trialing capabilities above their tier |
| 17 | `feature_request_tickets_90d` | numeric | Feature request tickets last 90d | Asking for capabilities they don't have yet |
| 18 | `has_open_expansion_opp` | binary | Active expansion GTM opportunity | Sales team already sees upgrade signal |
| 19 | `expansion_opp_amount` | numeric | USD value of open expansion opp | Dollar size of Sales-identified opportunity |
| 20 | `mrr_tier_ceiling_pct` | numeric [0,1] | (MRR − floor) / (ceiling − floor) | How close MRR is to the top of current tier. Free tier: always 0.0 |
| 21 | `feature_limit_hit_30d` | numeric | `feature_limit_hit` events last 30d | Primary free-tier signal: customer has hit a data-sharing/export limit |

Feature 21 rationale: For free-tier customers, mrr_tier_ceiling_pct is always 0.0 (no MRR). feature_limit_hit_30d provides a direct behavioural signal that the customer has outgrown their tier. CS practitioners report 38–45% conversion when contacting free customers who hit limits during active audit cycles (see docs/stakeholder-notes.md Section 5.1).
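
A minimal sketch of feature 20, `mrr_tier_ceiling_pct`, using the hardcoded tier boundaries listed under Known Limitations. Clipping the result to [0, 1] and returning 0.0 for `custom` (which has no listed boundaries) are assumptions made for illustration; `TIER_BOUNDS` and the function name are hypothetical and not taken from `train_expansion_model.py`.

```python
# Sketch of mrr_tier_ceiling_pct = (MRR - floor) / (ceiling - floor).
# Boundaries (USD MRR) match the card; clipping and the custom-tier
# fallback are assumptions.

TIER_BOUNDS = {
    "starter": (500.0, 2000.0),
    "growth": (2000.0, 8000.0),
    "enterprise": (8000.0, 50000.0),
}


def mrr_tier_ceiling_pct(plan_tier: str, mrr: float) -> float:
    bounds = TIER_BOUNDS.get(plan_tier)
    if bounds is None:
        # Free tier has no MRR, so the feature is always 0.0.
        # Custom also lands here (assumption: no boundaries defined).
        return 0.0
    floor, ceiling = bounds
    pct = (mrr - floor) / (ceiling - floor)
    return min(max(pct, 0.0), 1.0)  # keep within the documented [0, 1] range


print(mrr_tier_ceiling_pct("growth", 6500.0))  # 0.75
print(mrr_tier_ceiling_pct("free", 0.0))       # 0.0
```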
## Leakage Guard
`has_open_expansion_opp` must NOT be the #1 SHAP feature.

If it is, the Sales team may be creating expansion opportunities in response to usage signals that the model should be discovering independently. This creates a circular dependency: the feature is a consequence of upgrade intent, not a cause. The training script logs a warning if this occurs. In that case:

- Retrain without `has_open_expansion_opp`.
- Compare the AUC delta; if it is < 0.03, remove the feature permanently.
- See `notebooks/expansion_propensity_modeling.ipynb` Section 5 for the SHAP beeswarm leakage check.
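
The guard's decision rule can be expressed as a small function. This sketch assumes mean absolute SHAP values per feature have already been computed (for example from the data behind the notebook's beeswarm plot) and reads "AUC delta" as AUC with the feature minus AUC without it. The function names are hypothetical, and the `keep_and_investigate` outcome for deltas of 0.03 or more is an illustrative assumption; the card only specifies the removal case.

```python
# Hypothetical sketch of the leakage-guard decision rule.

LEAKY_FEATURE = "has_open_expansion_opp"
AUC_DELTA_THRESHOLD = 0.03


def top_shap_feature(mean_abs_shap: dict) -> str:
    """Feature with the largest mean |SHAP| value."""
    return max(mean_abs_shap, key=mean_abs_shap.get)


def leakage_decision(mean_abs_shap: dict, auc_with: float, auc_without: float) -> str:
    if top_shap_feature(mean_abs_shap) != LEAKY_FEATURE:
        return "ok"  # guard not triggered
    # The opportunity flag dominates: check how much AUC it actually adds.
    if auc_with - auc_without < AUC_DELTA_THRESHOLD:
        return "remove_feature"  # per the card: remove permanently
    return "keep_and_investigate"  # assumption: card does not cover this case


shap_means = {"has_open_expansion_opp": 0.41, "premium_feature_trials_30d": 0.22, "mrr": 0.18}
print(leakage_decision(shap_means, auc_with=0.80, auc_without=0.78))  # remove_feature
```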
## Training Data
| Split | Criteria | Notes |
|---|---|---|
| Train | signup_date < 2025-06-01 | ~18 months of cohorts |
| Test | signup_date ≥ 2025-06-01 | ~9 months of cohorts, out-of-time |
| Scope | Active + never-churned customers only | Churned customers excluded |
| Label | `upgrade_date IS NOT NULL` | Upgraded = 1, never upgraded = 0 |

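
The split, scope, and label rules in the table translate directly into a filter. The `Customer` record, its `churned` flag, and `split_and_label` are hypothetical stand-ins for the real data model; only the 2025-06-01 cutoff, the churned-customer exclusion, and the `upgrade_date IS NOT NULL` label come from the card.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

SPLIT_DATE = date(2025, 6, 1)  # out-of-time cutoff from the table


@dataclass
class Customer:  # hypothetical record shape
    customer_id: str
    signup_date: date
    churned: bool
    upgrade_date: Optional[date]


def split_and_label(customers: List[Customer]) -> Tuple[list, list]:
    """Return (train, test) lists of (customer_id, label) pairs."""
    train, test = [], []
    for c in customers:
        if c.churned:
            continue  # scope: active + never-churned customers only
        label = int(c.upgrade_date is not None)  # upgraded = 1, never upgraded = 0
        bucket = train if c.signup_date < SPLIT_DATE else test
        bucket.append((c.customer_id, label))
    return train, test


customers = [
    Customer("a", date(2024, 3, 1), False, date(2024, 9, 1)),
    Customer("b", date(2025, 7, 1), False, None),
    Customer("c", date(2024, 1, 1), True, None),
]
train, test = split_and_label(customers)
print(train, test)  # [('a', 1)] [('b', 0)]
```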
Point-in-time correctness: For upgraded customers, all features are computed AS OF upgrade_date. For active customers, AS OF REFERENCE_DATE. No future data leaks.
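
One way to honour the point-in-time rule is to pick an as-of date per customer and count only events on or before it. The sketch assumes raw event dates are available per customer and that a "last N days" window is the half-open interval (as_of − N days, as_of]. The function names are hypothetical, and the `REFERENCE_DATE` value shown is a placeholder, not the one used in training.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def as_of_date(upgrade_date: Optional[date], reference_date: date) -> date:
    """Upgraded customers: upgrade_date; everyone else: the reference date."""
    return upgrade_date if upgrade_date is not None else reference_date


def events_in_window(event_dates: Iterable[date], as_of: date, days: int) -> int:
    """Count events in (as_of - days, as_of]; anything after as_of is ignored."""
    start = as_of - timedelta(days=days)
    return sum(1 for d in event_dates if start < d <= as_of)


REFERENCE_DATE = date(2026, 3, 1)  # placeholder value, not the training date
events = [date(2025, 8, 10), date(2025, 8, 28), date(2025, 9, 5)]
as_of = as_of_date(date(2025, 9, 1), REFERENCE_DATE)

print(events_in_window(events, as_of, 30))  # 2 (the 2025-09-05 event is future data)
print(events_in_window(events, as_of, 7))   # 1
```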
## Performance Metrics
| Metric | Threshold | Notes |
|---|---|---|
| AUC-ROC | > 0.75 | Acceptance gate in training script |
| Brier score | < 0.10 | Calibration quality |
| Precision @decile 1 | Reported | Fraction of upgraders in top 10% by score |
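
The three metrics can be computed without an ML library, which is handy for sanity-checking the training script's numbers. This sketch uses the pairwise (Mann-Whitney) definition of AUC, which is O(n²) and only suitable for small samples; for real evaluations `sklearn.metrics.roc_auc_score` and `sklearn.metrics.brier_score_loss` are the usual choices. The function names and the toy labels and scores are made up for illustration.

```python
import math


def auc_roc(y_true, y_score):
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between calibrated probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def precision_at_decile_1(y_true, y_score):
    """Fraction of upgraders among the top 10% of accounts by score."""
    n_top = max(1, math.ceil(len(y_true) / 10))
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_top]) / n_top


y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

auc, brier = auc_roc(y, scores), brier_score(y, scores)
print(round(auc, 4), round(brier, 4), precision_at_decile_1(y, scores))
# Acceptance gates from the table: AUC > 0.75 and Brier < 0.10.
print("passes gates:", auc > 0.75 and brier < 0.10)
```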
## Known Limitations
- Synthetic data only: the model is trained on generated data with causal correlations, not real customer histories. Expect AUC to differ when the model is deployed against real data.
- Enterprise → Custom tier: the 1.2× uplift multiplier for seat/add-on expansion is a conservative estimate; actual enterprise expansion deals vary widely.
- No seasonality: monthly MRR cycles and annual contract renewals are not captured.
- Static tier ceilings: `mrr_tier_ceiling_pct` uses hardcoded tier boundaries in USD MRR (Starter: 500–2000, Growth: 2000–8000, Enterprise: 8000–50000). Update them in `train_expansion_model.py` if pricing changes.
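
The Purpose section says outreach is prioritised by expected ARR uplift, but the card does not spell out the formula. The sketch below is one plausible reading, not the documented method: it assumes an upgrade moves a customer to the floor MRR of the next tier (using the boundaries above, with Free → Starter at 500), applies the 1.2× multiplier for Enterprise → Custom, and multiplies the annualised MRR gain by the calibrated P(upgrade). All names and account values are hypothetical.

```python
# Hypothetical expected-ARR-uplift ranking; every constant is an assumption.

NEXT_TIER_FLOOR_MRR = {"free": 500.0, "starter": 2000.0, "growth": 8000.0}
ENTERPRISE_UPLIFT_MULTIPLIER = 1.2  # Enterprise -> Custom seat/add-on expansion


def expected_arr_uplift(plan_tier: str, mrr: float, p_upgrade: float) -> float:
    if plan_tier == "enterprise":
        next_mrr = mrr * ENTERPRISE_UPLIFT_MULTIPLIER
    elif plan_tier in NEXT_TIER_FLOOR_MRR:
        next_mrr = NEXT_TIER_FLOOR_MRR[plan_tier]
    else:
        return 0.0  # e.g. custom: no next tier defined in this sketch
    # Annualise the monthly gain and weight it by the upgrade probability.
    return p_upgrade * 12 * max(next_mrr - mrr, 0.0)


accounts = [  # (account, plan_tier, mrr, calibrated P(upgrade))
    ("acme", "growth", 5000.0, 0.6),
    ("globex", "enterprise", 20000.0, 0.3),
    ("initech", "free", 0.0, 0.9),
]
ranked = sorted(accounts, key=lambda a: expected_arr_uplift(*a[1:]), reverse=True)
print([name for name, *_ in ranked])  # ['acme', 'globex', 'initech']
```

Weighting by probability rather than ranking on P(upgrade) alone keeps a large, moderately likely enterprise expansion from being buried under many small, near-certain free-tier conversions.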
## Retraining Cadence
| Trigger | Action |
|---|---|
| Monthly | Evaluate AUC on new cohort; flag if < 0.72 |
| Quarterly | Full retrain on rolling 24-month window |
| Pricing change | Immediate retrain + update mrr_tier_ceiling_pct tier boundaries |
| SHAP leakage alert | Investigate has_open_expansion_opp dominance; retrain if needed |
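
The cadence table can be encoded as a trigger-to-action mapping, for example in a scheduled monitoring job. The trigger flags, action strings, and function name below are hypothetical; only the 0.72 AUC flag threshold and the trigger types come from the table.

```python
AUC_FLAG_THRESHOLD = 0.72  # monthly evaluation flag from the table


def retraining_actions(monthly_auc: float, quarter_end: bool = False,
                       pricing_changed: bool = False,
                       shap_leakage_alert: bool = False) -> list:
    """Map the cadence-table triggers to (hypothetical) action names."""
    actions = []
    if monthly_auc < AUC_FLAG_THRESHOLD:
        actions.append("flag_auc_degradation")
    if quarter_end:
        actions.append("full_retrain_rolling_24_months")
    if pricing_changed:
        actions += ["update_tier_boundaries", "immediate_retrain"]
    if shap_leakage_alert:
        actions.append("investigate_has_open_expansion_opp")
    return actions


print(retraining_actions(0.70))                        # ['flag_auc_degradation']
print(retraining_actions(0.78, pricing_changed=True))  # ['update_tier_boundaries', 'immediate_retrain']
```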