Abstract
Heterogeneous graphs are widely used in applications such as social networks, recommendation systems, and bioinformatics. However, node attributes in real-world heterogeneous graphs are often missing or corrupted, which substantially degrades representation quality and downstream task performance. Existing approaches typically rely on deterministic imputation or static masking schemes, which limits their ability to model both the uncertainty induced by attribute missingness and the complex multi-relational dependencies in such graphs. To address these challenges, we propose HGGAE (Heterogeneous Graph Generative Autoencoder), a generative autoencoder framework based on a perturbation-recovery paradigm for heterogeneous graphs with incomplete attributes. HGGAE explicitly models attribute missingness as a controllable perturbation process and performs progressive attribute restoration and representation learning by jointly designing a schedulable noise generator and relation-specific structural perturbation modules. Unlike traditional masking-based methods, HGGAE adaptively adjusts the perturbation intensity during training, enabling more effective modeling of the stochastic nature of attribute degradation. To improve training efficiency, HGGAE adopts a sparse-target objective and a local reconstruction design, which reduce the supervision and gradient-accumulation cost of attribute reconstruction; the overall computation remains dominated by full-graph message passing in the encoder. Experiments on four benchmark heterogeneous graph datasets show that HGGAE achieves strong node classification performance, with gains of up to 7.8% in Macro-F1 and 8.5% in Micro-F1 on IMDB and competitive or superior results on Yelp, ACM, and DBLP. These results validate the effectiveness, robustness, and generalization capability of HGGAE under attribute-missing scenarios.
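As an informal illustration of the perturbation-recovery idea described above, the following minimal sketch shows one way a scheduled perturbation and a sparse-target reconstruction loss could fit together. It is not the HGGAE implementation: the linear schedule, the additive Gaussian noise, the mean-squared-error loss, and all function and parameter names (`perturbation_rate`, `perturb_attributes`, `sparse_target_loss`, `start`, `end`, `noise_std`) are assumptions made for this example, and the encoder and relation-specific structural perturbation are omitted.

```python
import random


def perturbation_rate(epoch, total_epochs, start=0.2, end=0.6):
    """Hypothetical linear schedule: the fraction of perturbed nodes
    moves from `start` to `end` over the course of training."""
    if total_epochs <= 1:
        return end
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + t * (end - start)


def perturb_attributes(features, rate, noise_std=0.1, rng=None):
    """Add Gaussian noise to a random subset of node attribute vectors.

    Returns a perturbed copy of `features` and the sorted indices of the
    perturbed (target) nodes. Additive Gaussian noise is an assumption;
    other corruption processes would slot in here the same way.
    """
    rng = rng or random.Random()
    n = len(features)
    k = max(1, round(rate * n)) if n else 0
    targets = sorted(rng.sample(range(n), k))
    target_set = set(targets)
    perturbed = [
        [x + rng.gauss(0.0, noise_std) for x in row] if i in target_set else list(row)
        for i, row in enumerate(features)
    ]
    return perturbed, targets


def sparse_target_loss(reconstructed, original, targets):
    """Mean squared error computed over the perturbed nodes only, so
    unperturbed nodes contribute no reconstruction supervision."""
    total, count = 0.0, 0
    for i in targets:
        for r, o in zip(reconstructed[i], original[i]):
            total += (r - o) ** 2
            count += 1
    return total / count if count else 0.0
```

In a training loop, `perturbation_rate(epoch, total_epochs)` would be evaluated once per epoch, the perturbed attributes passed through the encoder and decoder, and `sparse_target_loss` applied to the decoder output against the original attributes of the target nodes.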