Model Card — SaaSGuard Churn Prediction Model v0.5¶

Intended for CS operations teams, product analytics, and technical stakeholders. Updated: 2026-03-20. Artifact: models/churn_model.pkl (DVC-tracked).

Model Summary¶

Property	Value
Task	Binary classification — P(customer churns within next 90 days)
Algorithm	XGBoost inside sklearn `Pipeline`, wrapped in `CalibratedClassifierCV` (isotonic, cv=5)
Feature count	16 (14 numerical + 2 categorical encoded as ordinal)
Training data	5,000 customers (1,471 churned / 3,529 active); RANDOM_SEED=42
Validation strategy	Out-of-time split — train: `signup_date < 2025-06-01`, test: `≥ 2025-06-01`
Output	Calibrated probability ∈ [0, 1] + top-5 SHAP drivers + risk tier + recommended CS action

Intended Use¶

Primary use case: Customer Success teams prioritising outreach for at-risk accounts.

Input: A customer_id for any active customer in marts.mart_customer_churn_features.

Output consumed by: - POST /predictions/churn — individual prediction with SHAP explanation - Superset Churn Heatmap dashboard — account-level risk ranked by churn_probability × MRR - CS intervention queue — GET /customers?tier=critical&limit=20

Not intended for: - Predicting churn for customers with < 7 days tenure (insufficient usage signal) - Automated contract termination decisions (human-in-the-loop required) - Any use outside CS prioritisation without bias audit (see below)

Features¶

Feature	Type	Source	EDA Signal
`mrr`	Numerical	`raw.customers`	Revenue-at-risk weighting
`tenure_days`	Numerical	Derived: `signup_date → reference_date`	Time-in-product proxy
`total_events`	Numerical	`raw.usage_events`	Lifetime engagement volume
`events_last_30d`	Numerical	`raw.usage_events`	Primary decay signal — r = −0.38
`events_last_7d`	Numerical	`raw.usage_events`	Leading disengagement indicator
`avg_adoption_score`	Numerical	`raw.usage_events`	Feature depth — r = −0.34
`days_since_last_event`	Numerical	`raw.usage_events`	Recency decay
`retention_signal_count`	Numerical	`raw.usage_events`	Deep product adoption — r = −0.32
`integration_connects_first_30d`	Numerical	`raw.usage_events`	Activation gate: ≥3 → 2.7× lower churn
`tickets_last_30d`	Numerical	`raw.support_tickets`	Pre-churn frustration signal
`high_priority_tickets`	Numerical	`raw.support_tickets`	r = +0.27
`avg_resolution_hours`	Numerical	`raw.support_tickets`	CS experience quality
`is_early_stage`	Binary (int)	Derived: `tenure_days ≤ 90`	First-90-day cohort flag
`activated_at_30d`	Binary (int)	Derived: `integration_connects_first_30d ≥ 3`	Onboarding activation gate — 2.7× lower churn (log-rank p<0.001)
`plan_tier`	Categorical → ordinal	`raw.customers`	free=0, starter=1, growth=2, enterprise=3, custom=4
`industry`	Categorical → ordinal	`raw.customers`	fintech=0, healthtech=1, legaltech=2, proptech=3, saas=4

All features are pre-aggregated by dbt in mart_customer_churn_features. Feature engineering lives entirely in dbt — not duplicated in Python.

Performance Metrics (RANDOM_SEED=42, out-of-time test set)¶

Metric	Target	Status
AUC-ROC	> 0.80	✅ Met
Brier score	< 0.15	✅ Met
Precision @ top decile	> 0.60	✅ Met
Calibration per tier	±15pp of KM baseline	✅ Met

Calibration note: predict_proba() uses CalibratedClassifierCV. SHAP values are computed on the underlying XGBoost base model — relative rankings and directions are preserved (calibration is monotonic). See docs/shap-analysis.md for details.

Training Strategy¶

Point-in-time correctness: churned customers' features are computed as of their churn_date. Active customers use REFERENCE_DATE = 2026-03-14. This prevents leakage where post-churn behaviour contaminates the feature vector.

Label: is_churned — a churned-vs-active discriminator. The model learns the pre-churn signal pattern (usage decay, ticket spikes) from all 1,471 labelled churn examples. The "90-day horizon" is the CS intervention window communicated to business stakeholders, not the label definition.

Class imbalance: handled with scale_pos_weight = n_negative / n_positive in XGBoost. The calibration layer further corrects probability estimates per tier.

Reproducibility: dvc repro retrains from scratch deterministically. All hyperparameters are logged in models/churn_model_metadata.json.

SHAP Explainability¶

Every prediction returns top_shap_features — the 5 features with largest |SHAP impact|:

Top driver	Direction	CS action
Low `events_last_30d`	↓ risk when high	Schedule product walkthrough
Low `avg_adoption_score`	↓ risk when high	Assign onboarding specialist
High `days_since_last_event`	↑ risk	Re-engagement campaign — silent churn risk
High `high_priority_tickets`	↑ risk	Escalate to senior CSM
Low `integration_connects_first_30d`	↓ risk when high	Integration health check call

Full global importance rankings and individual waterfall charts: notebooks/churn_model_training_and_calibration.ipynb § SHAP Analysis.

Risk Tiers¶

Tier	Probability range	Recommended action
`low`	< 0.30	Monitor quarterly; standard CS cadence
`medium`	0.30 – 0.60	Proactive check-in within 30 days
`high`	0.60 – 0.80	Escalate to CSM; schedule EBR within 14 days
`critical`	> 0.80	Escalate to senior CSM immediately; schedule EBR within 7 days

Known Limitations & Bias Considerations¶

Synthetic data: The model is trained on Faker-generated data with baked-in correlations. Real-world performance should be validated on production data before acting on predictions for live customers.
Industry imbalance: FinTech and HealthTech customers dominate the training set. Industries with fewer examples (PropTech, LegalTech) have wider prediction uncertainty.
No temporal drift detection: Model training cutoff is 2025-06-01. After 90 days from any deployment date, a data drift check should be triggered. See ADR-004 for the drift monitoring implementation.
Label definition: The model distinguishes churned from active customers — it does not predict future churn probability for customers who have never churned. Calibration per tier addresses this but does not eliminate uncertainty.
Human-in-the-loop required: Model output must be reviewed by a CS manager before automated outreach or contract actions. recommended_action is advisory only.

Reproducing the Model¶

# 1. Generate data (if not already done)
dvc repro generate_data build_duckdb

# 2. Build the dbt feature mart
docker compose exec dbt dbt run --select mart_customer_churn_features

# 3. Train (deterministic — RANDOM_SEED=42 fixed in params.yaml)
uv run python -m src.infrastructure.ml.train_churn_model

# 4. Run accuracy gates
pytest tests/model_accuracy/test_churn_model.py -v --no-cov

# 5. Check metadata
cat models/churn_model_metadata.json

Versioning¶

Model artifacts are DVC-tracked and not committed to git: - models/churn_model.pkl — serialised CalibratedClassifierCV wrapping the sklearn Pipeline - models/churn_model_metadata.json — version, training date, AUC, Brier, feature list, data cutoff

To pull a specific model version: dvc pull models/churn_model.pkl.