# Model Card
See ML Models for full documentation of the XGBoost and Isolation Forest models.
## Quick reference
| Property | XGBoost | Isolation Forest |
|---|---|---|
| Type | Supervised classifier | Unsupervised anomaly detector |
| Target | Known fraud patterns | Novel/zero-day clusters |
| Features | 4 (`amount`, `geo_velocity`, `typing_entropy`, `device_is_emulator`) | Same 4 |
| File | `data/models/xgb_fraud.joblib` | `data/models/iso_anomaly.joblib` |
| Used in API | Yes (`predict_proba`) | No (backtest only) |
| Metric | AUPRC | Recall/precision vs `is_fraud` labels |
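Both models consume the same four features listed above. As a sketch of how a transaction record might be mapped into that feature vector (the `to_features` helper and the dict keys are hypothetical; the actual pipeline may order or encode features differently):

```python
def to_features(txn: dict) -> list[float]:
    """Map a transaction record to the 4-feature vector, in table order.

    Hypothetical helper for illustration; assumes txn carries the four
    feature names used in the quick-reference table.
    """
    return [
        float(txn["amount"]),
        float(txn["geo_velocity"]),
        float(txn["typing_entropy"]),
        float(txn["device_is_emulator"]),  # 0/1 flag cast to float
    ]
```

Keeping the feature order fixed in one place matters because both the XGBoost classifier and the Isolation Forest were trained on the same column order.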
## Intended use
- In scope: Real-time fraud scoring for payment transactions at neobanks and fintechs.
- Out of scope: Credit underwriting, identity verification as a primary KYC method, non-financial fraud.
## Limitations
- Models are trained on synthetic data generated by the DIBB simulation engine. Performance on real production transaction data will differ.
- The Isolation Forest is evaluated in `backtest_flow.py`, but its score is not yet fused into the API fast path. The ensemble decision is driven solely by the XGBoost score.
- Metrics reported by the training pipeline are computed on synthetic simulation data. They should be used for relative comparison (before/after a rule change), not as absolute production performance claims.
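If the Isolation Forest score were eventually fused into the fast path, one simple option is a weighted blend of the two scores. This is purely a hypothetical sketch (the `fused_score` function and the `weight` default are not part of the codebase; both inputs are assumed to be normalized to [0, 1]):

```python
def fused_score(xgb_prob: float, iso_score: float, weight: float = 0.2) -> float:
    """Blend the supervised fraud probability with the anomaly score.

    Hypothetical fusion rule: a convex combination, so the result stays
    in [0, 1] when both inputs do. weight=0.0 reproduces today's
    XGBoost-only behavior.
    """
    return (1 - weight) * xgb_prob + weight * iso_score
```

A convex combination is a conservative starting point because setting `weight=0.0` makes the fused decision identical to the current XGBoost-only fast path, which simplifies A/B comparison in the backtest.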
## Fallback behavior
If `data/models/xgb_fraud.joblib` does not exist (e.g. before running `make train`):

- `load_model()` returns a `MockModel` with a `RuntimeWarning`
- `MockModel` returns `predict_proba = [0.98, 0.02]` for all inputs (2% fraud probability)
- SHAP computation is skipped (`MockModel` has no `get_booster()` method)
- Rule-based decisions still function normally

Run `make train` to build the production model.
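The fallback described above can be sketched roughly as follows. This is a minimal illustration of the documented behavior, not the actual implementation; the real `load_model()` signature and warning message may differ:

```python
import os
import warnings


class MockModel:
    """Stand-in used when no trained model file is present (sketch)."""

    def predict_proba(self, X):
        # Fixed [not-fraud, fraud] probabilities for every input row:
        # 2% fraud probability regardless of features.
        return [[0.98, 0.02] for _ in X]

    # Note: no get_booster() method, so SHAP computation is skipped.


def load_model(path="data/models/xgb_fraud.joblib"):
    """Load the trained model, falling back to MockModel if absent."""
    if not os.path.exists(path):
        warnings.warn(
            f"{path} not found; falling back to MockModel. Run `make train`.",
            RuntimeWarning,
        )
        return MockModel()
    import joblib  # deferred import; only needed on the real-model path

    return joblib.load(path)
```

Because `MockModel` exposes the same `predict_proba` shape as the real classifier, the API and the rule-based decision path keep working unchanged before the first training run.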