Abstract
Current document understanding methods struggle with complex layouts and fail to capture the deep logical connections between elements such as text, figures, and tables. To address this, we introduce the Document Relationship Entity Embedding Learner (D-REEL), a novel representation learning framework designed to model intricate semantic relationships within documents. D-REEL generates extraction candidates for each article and learns dense vector representations (embeddings) for them; by comparing these embeddings, the system accurately assesses semantic correlations between document fields, allowing it to determine whether articles are related regardless of their position on the page. The approach uniquely combines spatial information with domain-specific schemas, enabling precise extraction and robust correlation scoring even across diverse and irregular document layouts. To quantify these connections, we also propose Semantic Structural Congruence (SSC), a new metric that uses location-agnostic localization to measure relationships effectively. Experiments on public datasets show significant improvements in correlation accuracy and extraction performance: we achieve an average mAP gain of 2-3% and an SSC improvement of nearly 10% on the PRIMA dataset.