Model Card

See ML Models for full documentation of the XGBoost and Isolation Forest models.


Quick reference

| Property | XGBoost | Isolation Forest |
| --- | --- | --- |
| Type | Supervised classifier | Unsupervised anomaly detector |
| Target | Known fraud patterns | Novel/zero-day clusters |
| Features | 4 (amount, geo_velocity, typing_entropy, device_is_emulator) | Same 4 |
| File | data/models/xgb_fraud.joblib | data/models/iso_anomaly.joblib |
| Used in API | Yes (predict_proba) | No (backtest only) |
| Metric | AUPRC | Recall/precision vs is_fraud labels |
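As a rough sketch of how the four features in the table might be assembled and scored: the feature order and helper names below are assumptions, and the in-memory IsolationForest stands in for the serialized data/models/iso_anomaly.joblib artifact so the example runs end to end.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Feature order is an assumption; the model card only lists these four names.
FEATURES = ["amount", "geo_velocity", "typing_entropy", "device_is_emulator"]

def to_vector(txn: dict) -> np.ndarray:
    """Build the single 4-feature row both models expect from a transaction dict."""
    return np.array([[txn[f] for f in FEATURES]], dtype=float)

# In the real pipeline the model would be deserialized, e.g.:
#   iso = joblib.load("data/models/iso_anomaly.joblib")
# Here we fit a stand-in on synthetic rows so the sketch is self-contained.
rng = np.random.default_rng(0)
iso = IsolationForest(random_state=0).fit(rng.normal(size=(200, 4)))

txn = {"amount": 42.0, "geo_velocity": 0.1,
       "typing_entropy": 3.2, "device_is_emulator": 0.0}
score = iso.score_samples(to_vector(txn))[0]  # higher = less anomalous
```

Note that sklearn's score_samples returns the negated anomaly score, so values are negative and more-negative means more anomalous.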

Intended use

  • In scope: Real-time fraud scoring for payment transactions at neobanks and fintechs.
  • Out of scope: Credit underwriting, identity verification as a primary KYC method, non-financial fraud.

Limitations

  • Models are trained on synthetic data generated by the DIBB simulation engine. Performance on real production transaction data will differ.
  • The Isolation Forest is evaluated in backtest_flow.py but its score is not yet fused into the API fast path. The ensemble decision is driven solely by the XGBoost score.
  • Metrics reported by the training pipeline are on synthetic simulation data. They should be used for relative comparison (before/after a rule change), not as absolute production performance claims.

Fallback behavior

If data/models/xgb_fraud.joblib does not exist (e.g. before running make train):

  • load_model() returns a MockModel with a RuntimeWarning
  • MockModel returns predict_proba = [0.98, 0.02] for all inputs (2% fraud probability)
  • SHAP computation is skipped (MockModel has no get_booster() method)
  • Rule-based decisions still function normally
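The fallback bullets above can be sketched as follows. The MockModel and load_model names come from the bullets; the loader's internals (the os.path.exists check, the joblib.load call) are assumptions about how such a fallback is typically wired.

```python
import os
import warnings
import numpy as np

class MockModel:
    """Stand-in returned when the trained model file is missing."""
    def predict_proba(self, X):
        # Fixed [P(legit), P(fraud)] = [0.98, 0.02] for every input row.
        return np.tile([0.98, 0.02], (len(X), 1))
    # No get_booster() method, so SHAP computation is skipped downstream.

def load_model(path="data/models/xgb_fraud.joblib"):
    # Assumed loader internals: fall back with a RuntimeWarning when the
    # artifact is absent, otherwise deserialize it with joblib.
    if not os.path.exists(path):
        warnings.warn(f"{path} not found; falling back to MockModel "
                      "(run make train to build the real model)", RuntimeWarning)
        return MockModel()
    import joblib
    return joblib.load(path)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = load_model("data/models/does_not_exist.joblib")

probs = model.predict_proba(np.zeros((3, 4)))
has_shap_hook = hasattr(model, "get_booster")  # False -> SHAP is skipped
```

Because the mock answers every input with a 2% fraud probability, rule-based decisions keep working while model-driven scores are effectively neutral.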

Run make train to build the production model.