Abstract
BACKGROUND: Life course factors play an important role in the multifactorial etiology of obesity, yet quantitative analysis of patient-originated, unstructured narratives about the causes of weight gain remains a challenge. This study automated the thematic labeling of such narratives with a large language model to assess the clinical relevance of patient-reported weight gain cause data for weight loss prediction and patient phenotyping. SUBJECTS AND METHODS: A total of 2,463 patients with overweight or obesity provided open-ended narratives on the causes of their weight gain before starting a multidisciplinary medical-nutritional weight loss treatment, and were followed until they reached a pre-defined weight loss target or dropped out. Narratives were labeled with 12 thematic categories using the GPT-4.1 large language model. Associations of reported causal themes with age, sex, BMI class and treatment outcomes were evaluated using group-wise statistical comparisons and a Random Forest classifier. Weight gain cause co-occurrence patterns were modeled with a direct association network and pairwise risk ratio analyses. A partitional unsupervised clustering model integrating age, sex, baseline BMI and weight gain cause themes was designed to elucidate patient phenotypes defined by reported weight gain trajectories. Cluster-specific outcomes were compared using descriptive tests and linear mixed models. RESULTS: Mean weight loss was 9.2 ± 6.8% over 108.6 ± 111.6 days. Automated weight gain narrative categorization achieved precision and recall of 0.906 and 0.897, respectively, against a reference sample. Reported weight gain causes were associated with age and sex but not with BMI class. Associations between attributed causes and treatment outcomes were moderate, whereas associations among individual causes were strong.
Disrupted schedules, mental health and external circumstances increased the risk of unhealthy eating habits [risk ratios 3.65 (2.63-5.65), 2.16 (1.89-2.48) and 1.51 (1.25-1.81), respectively], while medical issues and external circumstances increased the risk of physical inactivity [risk ratios 1.58 (1.31-1.90) and 1.49 (1.23-1.82), respectively]. Based on weight gain cause reports, age, sex and BMI class, seven clusters were identified that differed in demographic, clinical, treatment outcome and adherence characteristics. CONCLUSION: Patient-reported weight gain narrative analysis can be accurately automated using large language models, providing clinically relevant insights into obesity heterogeneity. While individual causes show modest associations with weight loss, their combined patterns allow the identification of distinct behavioral phenotypes with differential treatment responses. Integrating patient narratives into data-driven frameworks supports more precise, person-centered obesity management.
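To illustrate the pairwise risk ratio analysis mentioned above, the following is a minimal sketch of how a risk ratio and its interval could be computed from a 2x2 co-occurrence table of two reported themes. The abstract does not state how the reported intervals were obtained, so the log-scale Wald (Katz) interval used here is an assumption, the function name `risk_ratio` is hypothetical, and the counts are invented for illustration rather than taken from the study.

```python
import math


def risk_ratio(exposed_with, exposed_total, unexposed_with, unexposed_total, z=1.96):
    """Risk ratio with a log-scale Wald (Katz) interval.

    Assumption: this interval method is illustrative only; the study's
    actual method is not stated in the abstract.
    """
    risk_exposed = exposed_with / exposed_total
    risk_unexposed = unexposed_with / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR) from the four cell counts.
    se_log = math.sqrt(
        1 / exposed_with - 1 / exposed_total
        + 1 / unexposed_with - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: among 100 patients reporting theme A, 60 also report
# theme B; among 100 patients not reporting theme A, 30 report theme B.
rr, lower, upper = risk_ratio(60, 100, 30, 100)
print(f"RR = {rr:.2f} ({lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 suggests that reporting the first theme is associated with a different probability of reporting the second. Intervals from this method are symmetric around the risk ratio on the log scale.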