Abstract
Creating generalizable models is a conserved aim in deep learning; however, misleading claims of transferability threaten to obfuscate reliable performance evaluation. We outline the severity of this issue in the biosciences and suggest potential solutions.