Overview
A complete credit risk scorecard pipeline built on Lending Club loan data (380K+ records), implementing the industry-standard Weight of Evidence / Information Value (WOE/IV) methodology used in consumer lending. The pipeline engineers features, trains a regularized model, addresses selection bias through reject inference, and produces a deployable credit score mapping.
Pipeline
1. Data Preparation — Filters to terminal loan statuses, creates binary target (Good/Bad), and computes Information Value (IV) to rank feature predictive power.
2. Feature Engineering — Transforms all raw features into Weight of Evidence (WOE) values using parallelized computation, placing all features on a common predictive scale.
3. Model Training — Elastic net logistic regression via glmnet, tuned with 3-fold cross-validation to maximize ROC AUC. The L1 component of the elastic net penalty shrinks some coefficients to exactly zero, which performs automatic feature selection.
4. Reject Inference — Applies the model to rejected applicants (who have no observed outcome), addressing the selection bias inherent in lending models that only observe approved loans.
5. Scorecard Scaling — Maps predicted probabilities to a 100–1000 credit score via log-odds linear transformation, where higher scores indicate lower risk.
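To make steps 1 and 2 concrete, here is a minimal base-R sketch of the WOE/IV calculation for a single categorical (or already binned) feature. It is not the project's code, which relies on the Information package. The helper name `woe_iv`, the 0/1 coding of the target, the sign convention WOE = ln(%good / %bad), and the toy data are all assumptions made for illustration.

```r
# Hypothetical helper: WOE and IV for one categorical (or pre-binned) feature.
# Assumes `bad` is coded 1 for a bad loan and 0 for a good one.
woe_iv <- function(feature, bad) {
  counts <- table(feature, factor(bad, levels = c(0, 1)))
  good_dist <- counts[, "0"] / sum(counts[, "0"])  # share of all goods in each level
  bad_dist  <- counts[, "1"] / sum(counts[, "1"])  # share of all bads in each level
  woe <- log(good_dist / bad_dist)                 # assumed convention: ln(%good / %bad)
  iv  <- sum((good_dist - bad_dist) * woe)         # Information Value of the feature
  list(woe = woe, iv = iv)
}

# Toy data, invented for illustration only (not Lending Club records).
grade <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C")
bad   <- c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1)
res <- woe_iv(grade, bad)

# Step 2 of the pipeline: replace each raw level with its WOE value.
grade_woe <- unname(res$woe[grade])
```

A level with zero goods or zero bads would produce an infinite WOE in this sketch, so real data usually needs bin merging or a small smoothing constant first.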
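Step 3 could look roughly like the following caret setup, shown here on synthetic data. This is a sketch under assumptions, not the project's training script: the column names, the simulated outcome, the seed, and `tuneLength = 5` are invented, and the real tuning grid is not stated in this README.

```r
library(caret)   # train(), trainControl(), twoClassSummary()
library(glmnet)  # elastic net backend used by method = "glmnet"

set.seed(42)
# Synthetic stand-in for WOE-transformed features (not Lending Club data).
n <- 600
x <- data.frame(woe_grade = rnorm(n), woe_dti = rnorm(n), woe_noise = rnorm(n))
p_bad <- plogis(-1 + 1.2 * x$woe_grade - 0.8 * x$woe_dti)
y <- factor(ifelse(runif(n) < p_bad, "Bad", "Good"), levels = c("Bad", "Good"))

ctrl <- trainControl(
  method = "cv", number = 3,          # 3-fold cross-validation
  classProbs = TRUE,                  # class probabilities are needed for ROC AUC
  summaryFunction = twoClassSummary   # reports ROC, Sens, Spec per candidate
)

fit <- train(
  x = x, y = y,
  method = "glmnet",
  metric = "ROC",                     # choose alpha and lambda by cross-validated AUC
  trControl = ctrl,
  tuneLength = 5                      # assumed grid size
)

fit$bestTune                                    # selected alpha and lambda
coef(fit$finalModel, s = fit$bestTune$lambda)   # zeroed coefficients are dropped features
```

Coefficients printed as `.` in the sparse output are exactly zero, which is how the L1 part of the penalty removes features.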
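The scaling in step 5 is commonly written as score = offset + factor × ln(odds of good), with factor = PDO / ln(2), where PDO is the number of points that doubles the odds. The sketch below uses that standard form. The constants (`base_score`, `base_odds`, `pdo`) and the function name `prob_to_score` are hypothetical, since this README does not state the values the project uses. Only the 100–1000 range and the "higher score means lower risk" direction come from the description above.

```r
# Hypothetical scaling constants; the actual values are not given in this README.
base_score <- 600   # score assigned at the base odds
base_odds  <- 50    # good:bad odds at the base score
pdo        <- 50    # points needed to double the odds

prob_to_score <- function(p_bad, lo = 100, hi = 1000) {
  factor_ <- pdo / log(2)
  offset  <- base_score - factor_ * log(base_odds)
  log_odds_good <- log((1 - p_bad) / p_bad)   # higher odds of good means lower risk
  score <- offset + factor_ * log_odds_good   # linear in the log-odds
  pmin(pmax(round(score), lo), hi)            # clamp to the 100-1000 range
}

# Predicted probabilities of default, from low risk to high risk.
scores <- prob_to_score(c(0.01, 1 / 51, 0.10, 0.50))
```

With these assumed constants, an applicant at 50:1 odds (a default probability of 1/51) lands on the base score of 600, and scores fall as the predicted probability of default rises.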
Tech Stack
- Language: R
- Modeling: caret, glmnet (elastic net), pROC
- Feature Engineering: Information (WOE/IV)
- Parallelization: parallel, doParallel
- Visualization: ggplot2
- Data Source: Lending Club open dataset