Model Card

See ML Models for full documentation of the XGBoost and Isolation Forest models.


Quick reference

| Property | XGBoost | Isolation Forest |
| --- | --- | --- |
| Type | Supervised classifier | Unsupervised anomaly detector |
| Target | Known fraud patterns | Novel/zero-day clusters |
| Features | 4 (amount, geo_velocity, typing_entropy, device_is_emulator) | Same 4 |
| File | data/models/xgb_fraud.joblib | data/models/iso_anomaly.joblib |
| Used in API | Yes (predict_proba) | No (backtest only) |
| Metric | AUPRC | Recall/precision vs is_fraud labels |
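As a rough sketch of how the four features in the table might be assembled and scored: the feature order and helper names below are assumptions, and the in-memory IsolationForest stands in for the serialized data/models/iso_anomaly.joblib artifact so the example runs end to end.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Feature order is an assumption; the model card only lists these four names.
FEATURES = ["amount", "geo_velocity", "typing_entropy", "device_is_emulator"]

def to_vector(txn: dict) -> np.ndarray:
    """Build the single 4-feature row both models expect from a transaction dict."""
    return np.array([[txn[f] for f in FEATURES]], dtype=float)

# In the real pipeline the model would be deserialized, e.g.:
#   iso = joblib.load("data/models/iso_anomaly.joblib")
# Here we fit a stand-in on synthetic rows so the sketch is self-contained.
rng = np.random.default_rng(0)
iso = IsolationForest(random_state=0).fit(rng.normal(size=(200, 4)))

txn = {"amount": 42.0, "geo_velocity": 0.1,
       "typing_entropy": 3.2, "device_is_emulator": 0.0}
score = iso.score_samples(to_vector(txn))[0]  # higher = less anomalous
```

Note that sklearn's score_samples returns the negated anomaly score, so values are negative and more-negative means more anomalous.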

Intended use

  • In scope: Real-time fraud scoring for payment transactions at neobanks and fintechs.
  • Out of scope: Credit underwriting, identity verification as a primary KYC method, non-financial fraud.

Limitations

  • Models are trained on synthetic data generated by the DIBB simulation engine. Performance on real production transaction data will differ.
  • The Isolation Forest is evaluated in backtest_flow.py but its score is not yet fused into the API fast path. The ensemble decision is driven solely by the XGBoost score.
  • Metrics reported by the training pipeline are on synthetic simulation data. They should be used for relative comparison (before/after a rule change), not as absolute production performance claims.

Fallback behavior

If data/models/xgb_fraud.joblib does not exist (e.g. before running make train):

  • load_model() returns a MockModel with a RuntimeWarning
  • MockModel returns predict_proba = [0.98, 0.02] for all inputs (2% fraud probability)
  • SHAP computation is skipped (MockModel has no get_booster() method)
  • Rule-based decisions still function normally
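The fallback bullets above can be sketched as follows. The MockModel and load_model names come from the bullets; the loader's internals (the os.path.exists check, the joblib.load call) are assumptions about how such a fallback is typically wired.

```python
import os
import warnings
import numpy as np

class MockModel:
    """Stand-in returned when the trained model file is missing."""
    def predict_proba(self, X):
        # Fixed [P(legit), P(fraud)] = [0.98, 0.02] for every input row.
        return np.tile([0.98, 0.02], (len(X), 1))
    # No get_booster() method, so SHAP computation is skipped downstream.

def load_model(path="data/models/xgb_fraud.joblib"):
    # Assumed loader internals: fall back with a RuntimeWarning when the
    # artifact is absent, otherwise deserialize it with joblib.
    if not os.path.exists(path):
        warnings.warn(f"{path} not found; falling back to MockModel "
                      "(run make train to build the real model)", RuntimeWarning)
        return MockModel()
    import joblib
    return joblib.load(path)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = load_model("data/models/does_not_exist.joblib")

probs = model.predict_proba(np.zeros((3, 4)))
has_shap_hook = hasattr(model, "get_booster")  # False -> SHAP is skipped
```

Because the mock answers every input with a 2% fraud probability, rule-based decisions keep working while model-driven scores are effectively neutral.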

Run make train to build the production model.