Overview

A complete credit risk scorecard pipeline built on Lending Club loan data (380K+ records), implementing the industry-standard Weight of Evidence / Information Value (WOE/IV) methodology used in consumer lending. The system engineers features, trains a regularized model, addresses selection bias through reject inference, and produces a deployable mapping from predicted probabilities to credit scores.

Pipeline

1. Data Preparation — Keeps only loans with a terminal status (whose final outcome is known), creates a binary Good/Bad target, and computes Information Value (IV) to rank features by predictive power.

2. Feature Engineering — Transforms all raw features into Weight of Evidence (WOE) values using parallelized computation, which places every feature on the same log-odds scale.

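3. Model Training — Fits an elastic net logistic regression with glmnet, tuning the regularization by 3-fold cross-validation on ROC AUC. The L1 component of the penalty performs automatic feature selection by shrinking the coefficients of weak features exactly to zero.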

4. Reject Inference — Applies the model to rejected applicants (who have no observed outcome), addressing the selection bias inherent in lending models that only observe approved loans.

5. Scorecard Scaling — Maps predicted probabilities to a credit score between 100 and 1000 through a linear transformation of the log-odds, where higher scores indicate lower risk.
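The IV ranking in step 1 and the WOE transform in step 2 can be sketched as follows. This is a minimal Python sketch, not the project's code: it assumes features are already binned, uses the common convention WOE = ln(%Good / %Bad) (so positive WOE marks lower-risk bins), and adds a hypothetical smoothing constant so bins with no Goods or no Bads stay finite. The function names are illustrative, and a thread pool stands in for whatever parallel backend the pipeline actually uses.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def woe_table(bins, target, smoothing=0.5):
    """Per-bin (%Good, %Bad, WOE) for one already-binned feature.

    bins: bin or category label for each loan; target: 1 = Bad, 0 = Good.
    WOE = ln(%Good / %Bad), so positive WOE marks lower-risk bins.
    """
    labels = sorted(set(bins))
    goods = Counter(b for b, y in zip(bins, target) if y == 0)
    bads = Counter(b for b, y in zip(bins, target) if y == 1)
    # Additive smoothing (an assumption) keeps empty cells from producing log(0).
    total_good = sum(goods.values()) + smoothing * len(labels)
    total_bad = sum(bads.values()) + smoothing * len(labels)
    table = {}
    for b in labels:
        pct_good = (goods[b] + smoothing) / total_good
        pct_bad = (bads[b] + smoothing) / total_bad
        table[b] = (pct_good, pct_bad, math.log(pct_good / pct_bad))
    return table

def information_value(table):
    """IV = sum over bins of (%Good - %Bad) * WOE; larger means more predictive."""
    return sum((g - b) * w for g, b, w in table.values())

def woe_encode(features, target, max_workers=4):
    """WOE-encode every feature and rank features by IV, one feature per task.

    Threads keep the sketch simple; under CPython's GIL, pure-Python work like
    this needs a process pool or a native backend for a real speedup.
    """
    def encode(item):
        name, bins = item
        table = woe_table(bins, target)
        return name, [table[b][2] for b in bins], information_value(table)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(encode, features.items()))
    encoded = {name: column for name, column, _ in results}
    iv_ranking = sorted(((iv, name) for name, _, iv in results), reverse=True)
    return encoded, iv_ranking

# Toy data (hypothetical): 1 = Bad, 0 = Good; features are already binned.
target = [0, 0, 0, 1, 0, 1, 1, 0]
features = {
    "grade": ["A", "A", "B", "B", "A", "C", "C", "B"],
    "term": ["36", "60", "36", "60", "36", "60", "36", "36"],
}
encoded, iv_ranking = woe_encode(features, target)
print(iv_ranking)
```

Because every encoded column is a log-odds contribution, a linear model on the WOE columns yields coefficients that are directly comparable across features.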
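The model in step 3 can be illustrated without glmnet. The sketch below is a hypothetical pure-Python stand-in that minimizes the same elastic net objective glmnet documents for binomial models, but by proximal gradient descent (ISTA) rather than glmnet's coordinate descent; the penalty strength lam, the mixing weight alpha = 0.5, and the toy data are assumptions. It shows why the L1 part of the penalty selects features: the soft-threshold step sets a coefficient to exactly zero whenever its gradient is too weak to overcome the penalty.

```python
import math

def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrinks z toward 0 and returns exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_elastic_net_logit(X, y, lam, alpha=0.5, n_iter=2000):
    """Elastic net logistic regression by proximal gradient descent (ISTA).

    Minimizes  -(1/N) * log-likelihood + lam * ((1 - alpha) / 2 * ||b||^2 + alpha * ||b||_1);
    the intercept b0 is left unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    # Safe step 1/L: the log-loss Hessian is bounded by 0.25 * trace(Z'Z / N), where Z is
    # X plus an intercept column of ones; the ridge term adds lam * (1 - alpha).
    trace = 1.0 + sum(v * v for row in X for v in row) / n
    step = 1.0 / (0.25 * trace + lam * (1 - alpha))
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            resid = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
            grad0 += resid / n
            for j in range(p):
                grad[j] += resid * row[j] / n
        b0 -= step * grad0
        # Gradient step on the smooth part (log-loss + ridge), then the L1 proximal step.
        b = [soft_threshold(bj - step * (gj + lam * (1 - alpha) * bj), step * lam * alpha)
             for bj, gj in zip(b, grad)]
    return b0, b

# Toy WOE-encoded data (1 = Bad): column 0 is associated with the outcome, column 1 is noise.
X = [[1, 1], [1, -1], [1, 1], [-1, -1], [-1, 1], [-1, -1], [-1, 1], [1, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b = fit_elastic_net_logit(X, y, lam=0.1, alpha=0.5)
print(b)  # the noise coefficient b[1] is exactly 0.0
```

The informative column keeps a negative coefficient (higher WOE, lower probability of Bad), while the noise column never receives enough gradient to leave zero.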
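The cross-validation in step 3 scores each candidate penalty by its mean held-out ROC AUC. glmnet's own cv.glmnet offers this search (type.measure = "auc", nfolds = 3); the Python sketch below spells out the mechanics under stated assumptions: folds are stratified by outcome so every fold contains both Goods and Bads, AUC is computed by pairwise comparison, and fit/predict are hypothetical callables standing in for the model. Running cv_auc once per candidate penalty and keeping the highest mean AUC completes the search.

```python
import random

def roc_auc(y_true, scores):
    """AUC as the probability that a random Bad scores higher than a random Good (ties count half)."""
    bads = [s for s, y in zip(scores, y_true) if y == 1]
    goods = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bads for g in goods)
    return wins / (len(bads) * len(goods))

def stratified_folds(y, k=3, seed=0):
    """Shuffle Bad and Good row indices separately and deal them round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cv_auc(X, y, fit, predict, k=3):
    """Mean held-out AUC over k folds; fit(X, y) -> model, predict(model, X) -> risk scores."""
    aucs = []
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = predict(model, [X[i] for i in fold])
        aucs.append(roc_auc([y[i] for i in fold], scores))
    return sum(aucs) / len(aucs)

# Usage with a placeholder "model" (hypothetical) that scores risk as negative WOE.
def fit(X_train, y_train):
    return None  # nothing to learn in this placeholder

def predict(model, X_test):
    return [-row[0] for row in X_test]

X = [[1.2], [0.8], [0.5], [0.9], [1.1], [0.3], [-0.7], [-1.0], [-0.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(cv_auc(X, y, fit, predict))                     # 1.0: every Bad outranks every Good
```

The pairwise AUC costs O(Bads × Goods) comparisons, which is fine for a sketch; at the scale of 380K loans a rank-based (Mann-Whitney) formula computed after one sort is the practical choice.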
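Step 4 does not say how the rejects' predictions feed back into training. One common option, shown here as an assumption rather than the project's documented method, is fuzzy augmentation: each reject is added twice, once as Bad with weight p and once as Good with weight 1 - p, where p is the accepts-only model's predicted probability of Bad, and the model is then refit with case weights (glmnet accepts observation weights through its weights argument). The scoring function, coefficients, and data below are hypothetical.

```python
import math

def fuzzy_augment(accepts, rejects, predict_bad_prob, reject_weight=1.0):
    """Fuzzy augmentation for reject inference.

    accepts: (features, label) pairs with observed outcomes (1 = Bad, 0 = Good).
    rejects: feature vectors with no observed outcome.
    Each reject becomes one Bad row (weight p) and one Good row (weight 1 - p).
    Returns (features, label, weight) rows for refitting with case weights.
    """
    rows = [(x, y, 1.0) for x, y in accepts]
    for x in rejects:
        p_bad = predict_bad_prob(x)
        rows.append((x, 1, reject_weight * p_bad))
        rows.append((x, 0, reject_weight * (1.0 - p_bad)))
    return rows

def weighted_bad_rate(rows):
    """Share of total weight carried by Bad rows."""
    total = sum(w for _, _, w in rows)
    return sum(w for _, y, w in rows if y == 1) / total

# Hypothetical accepts-only model: P(Bad) = sigmoid(-1.5 - 1.0 * woe).
def predict_bad_prob(x):
    return 1.0 / (1.0 + math.exp(1.5 + 1.0 * x[0]))

accepts = [([1.0], 0), ([0.5], 0), ([0.8], 0), ([-0.2], 1)]
rejects = [[-1.0], [-1.5]]
rows = fuzzy_augment(accepts, rejects, predict_bad_prob)
print(weighted_bad_rate([(x, y, 1.0) for x, y in accepts]))
print(weighted_bad_rate(rows))  # higher: the rejects are riskier than the accepts
```

The augmented sample restores some of the risky population that approval screened out, so the refit model is no longer trained only on the applicants who were approved.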
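The log-odds transformation in step 5 is commonly parameterized by a base score, the Good:Bad odds at that score, and the points needed to double the odds (PDO). The sketch below uses that parameterization with illustrative values (600 points at 50:1 odds, 20 PDO) and clips results to the 100–1000 range; these constants and the function name are assumptions, not the project's settings.

```python
import math

def make_score_scaler(base_score=600.0, base_odds=50.0, pdo=20.0, low=100.0, high=1000.0):
    """Linear map from log-odds to points, parameterized by points-to-double-the-odds (PDO).

    score = offset + factor * ln(Good:Bad odds), with factor = pdo / ln 2 and offset chosen
    so that odds of base_odds:1 score exactly base_score. Results are clipped to [low, high].
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)

    def score(p_bad):
        if not 0.0 < p_bad < 1.0:
            raise ValueError("p_bad must lie strictly between 0 and 1")
        log_odds_good = math.log((1.0 - p_bad) / p_bad)
        return min(high, max(low, offset + factor * log_odds_good))

    return score

score = make_score_scaler()
print(round(score(1 / 51)))   # 600: odds of 50:1 hit the base score
print(round(score(1 / 101)))  # 620: doubling the odds adds pdo points
print(score(0.999999))        # 100.0: very risky applicants are clipped to the floor
```

Because the map is linear in the log-odds, every PDO points on the scale corresponds to the same doubling of the odds anywhere in the range, which is what makes score bands easy to interpret.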

Tech Stack
