Comprehensive credit scoring datasets for robust testing: Out-of-sample, out-of-time, and out-of-universe evaluation



Abstract

This data article curates datasets from the quarterly snapshots of Freddie Mac's Single-Family Loan-Level Dataset (SFLLD). The SFLLD tracks loan originations in the USA along with the ensuing repayment behavior, and it is a live dataset that is updated quarterly. The current work is based on over 50 million fully amortized fixed-rate mortgage loans originated from 1999 through June 2022. Monthly performance records for these loans span from 1999 to September 30, 2022. Loan origination and repayment data were merged on a unique loan ID, and a loan was labeled as a default when three payments were missed within a specific performance window (12, 24, 36, 48, or 60 months). To ensure rigorous model evaluation, only loans originated after 2008 and their performance up to 2019 were considered, intentionally sidestepping external influences from the 2007-2008 financial crisis and the COVID-19 pandemic. The data were stratified by credit score, leading to 10 folders with three distinct datasets for model training, out-of-sample testing, and out-of-time testing. We designed the out-of-time testing dataset to mimic real-life conditions as closely as possible. A unique "out-of-universe" test dataset was further constructed from loans originated in 2019, capturing their performance throughout the pandemic. Each dataset contains 1464 covariates and a binary target label. By releasing these datasets, we hope to enable researchers to work with common datasets, especially in credit scoring, where access to proprietary datasets is limited.
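The labeling and splitting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the function names (`label_defaults`, `split_by_origination_year`), the toy loans, and the cutoff year are all hypothetical. It also assumes that "three payments missed" means a monthly missed-payment count reaching 3 at any point inside the window, and that the out-of-time set is separated by origination year.

```python
from collections import defaultdict

# Hypothetical origination records: loan_id -> origination year.
originations = {"A1": 2010, "B2": 2010, "C3": 2014}

# Hypothetical monthly performance records:
# (loan_id, months since origination, number of payments currently missed).
performance = [
    ("A1", 1, 0), ("A1", 2, 1), ("A1", 3, 2), ("A1", 4, 3),
    ("B2", 1, 0), ("B2", 2, 1), ("B2", 3, 0),
    ("C3", 1, 0), ("C3", 14, 3),
]


def label_defaults(originations, performance, window_months, missed_threshold=3):
    """Return {loan_id: 0 or 1}; 1 means the loan reached the missed-payment
    threshold within its first `window_months` months (assumed definition)."""
    worst = defaultdict(int)
    for loan_id, age, missed in performance:
        # Join on loan ID and keep only records inside the performance window.
        if loan_id in originations and age <= window_months:
            worst[loan_id] = max(worst[loan_id], missed)
    return {lid: int(worst[lid] >= missed_threshold) for lid in originations}


def split_by_origination_year(originations, cutoff_year):
    """Loans originated up to `cutoff_year` form the development set;
    later originations form the out-of-time set (assumed split rule)."""
    dev = sorted(l for l, y in originations.items() if y <= cutoff_year)
    oot = sorted(l for l, y in originations.items() if y > cutoff_year)
    return dev, oot


labels_12 = label_defaults(originations, performance, window_months=12)
labels_24 = label_defaults(originations, performance, window_months=24)
dev_ids, oot_ids = split_by_origination_year(originations, cutoff_year=2012)
```

In this toy data, loan `C3` only reaches three missed payments in month 14, so it counts as a default under the 24-month window but not under the 12-month one. The same loan can therefore carry different target labels depending on the performance window. An out-of-sample set would additionally be held out from the development loans.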
