Abstract
Nanobody binding is largely governed by the HCDR3 loop, which adopts one of two distinct placement regimes relative to the framework: compact and framework-contacting (kinked blueprint) or solvent-exposed (extended blueprint). Many nanobodies also contain additional cysteines that form non-canonical disulphide bonds, imposing covalent constraints on binding-loop conformations. Current structure predictors are typically trained and benchmarked with smooth coordinate-based objectives, so a model may appear reasonable under root-mean-square deviation (RMSD) while adopting an incorrect HCDR3 blueprint or failing to recover the native disulphide connectivity, which distorts paratope geometry and functional interpretation. Here, we show that the HCDR3 blueprint is predictable from sequence alone, which allows explicit constraints to be imposed during modelling. We implement these principles in NbForge, a lightweight nanobody folding model that incorporates blueprint- and disulphide-aware inductive biases and is trained with filtered self-distillation. NbForge improves recovery of the HCDR3 blueprint and of non-canonical disulphide formation over previous lightweight models, and achieves coordinate accuracy on par with large, resource-intensive state-of-the-art predictors while running with sub-second inference time. We also show that using NbForge monomer models as templates improves the success rate of predicting nanobody-antigen complexes. Together, these results motivate blueprint- and disulphide-aware benchmarks for nanobody modelling beyond RMSD, and show that appropriate inductive biases can close the performance gap to heavyweight predictors. We make the sequence classifier (NbFrame) and NbForge available for download and via a user-friendly web server.