Abstract
Single-view-based anomaly detection approaches present challenges due to the lack of context, particularly for multi-label problems. In this work, we demonstrate the efficacy of using multiview image data for improved classification using a hierarchical learning approach. Using 170,958 images from the International Mouse Phenotyping Consortium (IMPC) repository, a specimen-wise multiview dataset comprising 54,046 specimens was curated. Next, two hierarchical classification frameworks were developed by customizing ConvNeXT and a convolutional autoencoder (CAE) as CNN backbones, respectively. The customized architectures were trained at three hierarchy levels with increasing anatomical granularity, enabling specialized layers to learn progressively more detailed features. At the top level (L1), multiview (MV) classification performed about the same as single views, with a high mean AUC of 0.95. However, using MV images in the hierarchical model greatly improved classification at levels 2 and 3. The model showed consistently higher average AUC scores with MV compared to single views such as dorsoventral or lateral. For example, at Level 2 (L2), the model divided abnormal cases into three subclasses, achieving AUCs of 0.65 for DV, 0.76 for LV, and 0.87 for MV. Then, at Level 3 (L3), it further divided these into ten specific abnormalities, with AUCs of 0.54 for DV, 0.59 for LV, and 0.82 for MV. A similar performance was achieved by the CAE-driven architecture, with mean AUCs of 0.87, 0.88, and 0.89 at Level 2 (L2) and 0.74, 0.78, and 0.81 at Level 3 (L3), respectively, for DV, LV, and MV views. The overall results demonstrate the advantage of multiview image data coupled with hierarchical learning for skeletal abnormality detection in a multi-label context.