Abstract
Financial regulation requires financial institutions to submit diverse and often highly granular data to regulators. Regulators, in turn, face the challenge of condensing this data into a comprehensive map that captures the mutual similarity or distance between institutions and identifies clusters or outliers based on features such as size, credit portfolio, or business model. Missing data, which arises because regulatory requirements vary across types of institutions, further complicates this task. To address these challenges, we interpret the credit data of financial institutions as probability distributions whose pairwise distances can be assessed through optimal transport theory. Specifically, we propose a variant of Lloyd's algorithm that applies to probability distributions and uses generalized Wasserstein barycenters to construct a metric space. Our approach provides a solution for mapping the banking landscape, enabling regulators to identify clusters of financial institutions and assess their relative similarity or distance.

Supplementary information: The online version contains supplementary material available at 10.1007/s11579-025-00394-2.
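To make the clustering idea concrete, the following is a minimal illustrative sketch of a Lloyd-type iteration on probability distributions. It is not the method of this paper: it assumes one-dimensional empirical distributions with no missing data, and it uses the standard fact that in one dimension the 2-Wasserstein distance is the L2 distance between quantile functions, so the ordinary Wasserstein barycenter is the average of the quantile functions. The generalized barycenters used in the paper are not reproduced here. The function names `quantile_embedding` and `wasserstein_lloyd` and all parameters are hypothetical.

```python
import numpy as np


def quantile_embedding(samples, n_quantiles=50):
    """Represent a 1-D empirical distribution by its quantile function on a fixed grid."""
    grid = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(np.asarray(samples, dtype=float), grid)


def assign(Q, centers):
    """Assign each distribution to the nearest center in (approximate) squared W2 distance."""
    # In 1-D, squared W2 is the mean squared difference of the quantile functions.
    d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
    return d2.argmin(axis=1)


def wasserstein_lloyd(distributions, k, n_iter=20, n_quantiles=50, seed=0):
    """Lloyd-type clustering of 1-D distributions (illustrative assumption, not the paper's variant)."""
    rng = np.random.default_rng(seed)
    Q = np.stack([quantile_embedding(s, n_quantiles) for s in distributions])
    # Initialize centers with k distinct input distributions.
    centers = Q[rng.choice(len(Q), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(Q, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = Q[labels == j]
            if len(members) > 0:
                # 1-D W2 barycenter: average the quantile functions of the members.
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(Q, centers), centers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for institutions: two groups with different loan-size profiles.
    data = [rng.normal(0.0, 1.0, 200) for _ in range(3)]
    data += [rng.normal(10.0, 1.0, 200) for _ in range(3)]
    labels, _ = wasserstein_lloyd(data, k=2)
    print(labels)
```

In this simplified setting the assignment and update steps mirror classical k-means, with Euclidean means replaced by barycenters in Wasserstein space. Handling multivariate credit data and institution-specific missing features would require the generalized barycenters described above.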