Abstract
Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven: bone repositioning is defined first, and the soft-tissue outcome is then predicted from it. However, this approach relies on surgeon-defined bone plans and cannot directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm first predicts a patient-specific optimal facial appearance and then derives the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans to enable soft-tissue-driven planning. FAPOS uses a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, addressing the limitations of data scarcity and lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods, with improved facial symmetry and identity preservation. These results mark an important step toward soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating skeletal adjustments in subsequent stages.