Abstract
Background/Objectives: Trichoscopy is an important diagnostic tool for hair and scalp disorders, but it requires significant expertise. Publicly available large language models (LLMs) are becoming more popular among both physicians and patients, yet their usefulness in trichology is unknown. We aimed to evaluate the diagnostic accuracy of four publicly available LLMs when interpreting trichoscopic images and to compare their performance with that of dermatology residents, board-certified dermatologists, and trichology experts.
Methods: In this prospective comparative study, a preprocessed set of trichoscopic images was assessed in an online image-based survey. To reduce recognition bias from public image repositories, all images were structurally transformed while preserving diagnostic features. Fifteen dermatologists (five residents, four board-certified dermatologists, six trichology experts) provided a suspected diagnosis (SD) and up to three differential diagnoses (DD). Four LLMs (ChatGPT-4o, Claude Sonnet 4, Gemini 2.5 Flash, and Grok-3) evaluated the images under the same conditions.
Results: The overall diagnostic accuracy among the 15 dermatologists was 58.1% (95% CI, 53.0-63.0) for SD and 68.3% (95% CI, 63.4-72.8) for SD + DD. Experts significantly outperformed residents and board-certified dermatologists. The AI models achieved an accuracy of 18.2% (95% CI, 11.8-26.9) for SD and 44.4% (95% CI, 35.0-54.3) for SD + DD. Gemini 2.5 Flash performed best, with an accuracy of 62.5% for SD + DD. Agreement among dermatologists increased with experience (AC1 up to 0.65 for experts), while agreement among AI models was moderate to good (AC1 up to 0.70). Agreement between AI models and dermatologists was only slight to fair (AC1 = 0.06 for SD and 0.21 for SD + DD). All human-AI differences were statistically significant (p < 0.001).
Conclusions: In trichology, publicly available LLMs currently underperform compared to human experts, especially in providing a single correct diagnosis. These models require further development and specialized training before they can reliably assist with trichological diagnoses in routine care.
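For readers less familiar with the two statistics reported above, the following is a minimal Python sketch of how an accuracy with a 95% confidence interval and a Gwet's AC1 agreement coefficient can be computed. It rests on several assumptions that are not stated in the abstract: the interval method (a Wilson score interval is used here), the count of 18 correct out of 99 ratings (a hypothetical count chosen only because it gives 18.2%), the two-rater form of AC1 (the study compares larger rater groups), and the diagnosis labels, which are illustrative placeholders.

```python
import math
from collections import Counter


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z defaults to ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters (a simplification of the multi-rater case)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # pi_k: average share of category k across both raters.
    counts = Counter(ratings_a) + Counter(ratings_b)
    shares = [counts[k] / (2 * n) for k in categories]
    # Chance agreement as defined for AC1.
    chance = sum(s * (1 - s) for s in shares) / (q - 1)
    return (observed - chance) / (1 - chance)


if __name__ == "__main__":
    # Hypothetical counts: 18 correct out of 99 ratings (assumption).
    low, high = wilson_interval(18, 99)
    print(f"accuracy 95% CI: {100 * low:.1f}-{100 * high:.1f}")

    # Hypothetical diagnosis labels for four images (assumption).
    rater_a = ["AGA", "AA", "AGA", "LPP"]
    rater_b = ["AGA", "AA", "LPP", "LPP"]
    print(f"AC1: {gwet_ac1(rater_a, rater_b, ['AGA', 'AA', 'LPP']):.3f}")
```

Under these assumptions, the Wilson interval for 18 of 99 comes out close to the 11.8-26.9 range reported for SD, but this agreement does not confirm the denominators or the interval method used in the study.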