Learning to Defer (L2D) algorithms improve human-AI collaboration by deferring decisions to human experts when those experts are likely to be more accurate than the AI model. Such systems can be crucial in high-stakes tasks like fraud detection, where false negatives can cost victims their life savings. The primary challenge in training and evaluating these systems is the high cost of acquiring expert predictions, which often leads benchmarks to rely on simplistic simulated expert behavior. We introduce OpenL2D, a framework that generates synthetic experts with adjustable decision-making processes and work-capacity constraints for more realistic L2D testing. Applied to a public fraud detection dataset, OpenL2D produces the financial fraud alert review dataset (FiFAR), which contains predictions from 50 fraud analysts for 30K instances. We show that FiFAR's synthetic experts resemble real experts on metrics such as consistency and inter-expert agreement. Our L2D benchmark reveals that the performance rankings of L2D algorithms vary significantly with the available experts, highlighting the need to consider diverse expert behavior in L2D benchmarking.
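The deferral idea described above can be sketched as follows. This is an illustrative confidence-based triage rule with a work-capacity budget, not the paper's actual L2D algorithms or the OpenL2D API; the function name and thresholds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def defer_decisions(model_conf, expert_capacity):
    """Confidence-based deferral sketch: route the model's least-confident
    cases to a human expert, up to the expert's work-capacity budget.

    model_conf: array of model confidences in [0.5, 1.0]
                (confidence in the predicted class).
    expert_capacity: maximum number of instances the expert can review.
    Returns a boolean mask where True means "defer to the expert".
    """
    defer = np.zeros(len(model_conf), dtype=bool)
    # Sort indices so the least-confident predictions come first,
    # then defer as many of those as the capacity budget allows.
    order = np.argsort(model_conf)
    defer[order[:expert_capacity]] = True
    return defer

# Toy batch of 10 fraud alerts with simulated model confidences.
conf = rng.uniform(0.5, 1.0, size=10)
mask = defer_decisions(conf, expert_capacity=3)
print(mask.sum())  # exactly 3 cases are deferred
```

Real L2D methods learn the deferral policy jointly with the classifier and must respect per-expert capacity constraints across many experts, which is precisely what makes realistic expert simulation (as in FiFAR) necessary for benchmarking.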
A benchmarking framework and dataset for learning to defer in human-AI decision-making.
Authors: Alves Jean V, Leitão Diogo, Jesus Sérgio, Sampaio Marco O P, Liébana Javier, Saleiro Pedro, Figueiredo Mário A T, Bizarro Pedro
| Journal: | Scientific Data | Impact factor: | 6.900 |
| Date: | 2025 | Citation: | 2025 Apr 23; 12(1):506 |
| DOI: | 10.1038/s41597-025-04664-y | | |
