Abstract
Automated literature mining is key to building structured biomedical materials databases, yet current methods struggle with large publication volumes, complex entity relations and domain-specific terminology. We propose a hierarchical natural language processing (NLP) framework for extracting structured data from biomedical materials texts. The pipeline first applies named entity recognition (NER) to identify entities such as compositions, synthesis methods and properties. Sentence-level relation extraction then captures direct associations between entities (e.g. those involving temperature or morphology), while a paragraph-level graph convolutional network (GCN) module resolves cross-sentence co-references. Rule-based templates improve precision in specific cases. The extracted relations are integrated into a biomedical materials knowledge graph, giving a scalable and extensible data representation. In our experiments, the sentence-level model achieves 84.7% accuracy and the GCN-based module achieves 84.0%. The approach offers an efficient pipeline for structuring complex scientific texts, reducing manual effort and supporting large-scale knowledge extraction in biomedical materials and related domains.
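As a rough illustration of the last two stages described above (rule-based templates and knowledge-graph integration), the following minimal sketch shows how a single template might emit triples and how those triples could be merged into a graph. It is not the authors' implementation: the regular expression, the relation names `synthesis_method` and `temperature_C`, the helper names `extract_triples` and `build_graph`, and the sample sentence are all hypothetical and chosen only for illustration. The NER and GCN stages are not modeled here.

```python
import re
from collections import defaultdict

# Hypothetical template (an assumption, not taken from the paper):
# "<material> was <method> at <number> °C"
TEMPLATE = re.compile(
    r"(?P<material>[A-Za-z0-9\-]+) was "
    r"(?P<method>sintered|calcined|annealed) at "
    r"(?P<temp>\d+(?:\.\d+)?)\s*°C"
)


def extract_triples(text):
    """Return (head, relation, tail) triples for every template match."""
    triples = []
    for m in TEMPLATE.finditer(text):
        # Relation names are illustrative placeholders.
        triples.append((m["material"], "synthesis_method", m["method"]))
        triples.append((m["material"], "temperature_C", m["temp"]))
    return triples


def build_graph(triples):
    """Merge triples into an adjacency map: head -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        edge = (relation, tail)
        if edge not in graph[head]:  # skip duplicate edges
            graph[head].append(edge)
    return dict(graph)


text = "Hydroxyapatite was sintered at 1200 °C. The scaffold was porous."
graph = build_graph(extract_triples(text))
print(graph)
```

Storing the graph as a plain adjacency map keeps the sketch dependency-free. A real system would more likely map entities to canonical identifiers and use a dedicated graph store, but those choices are outside what the abstract specifies.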