Abstract
Internet of Things (IoT) application systems play an essential role in smart cities, industry, healthcare, agriculture, and smart homes. For non-expert users, designing and implementing such systems remains challenging, especially when configuring sensors, edge devices, and server platforms. To support these configuration tasks, we have developed an AI-based setup assistance tool. However, AI models still fail to reliably support newly released or previously unseen devices, sometimes producing incomplete or erroneous outputs that may lead to configuration failures. Incorporating the technical documentation of such devices into Retrieval-Augmented Generation (RAG) is an effective way to supplement the model's knowledge and improve reliability. In this paper, we propose a generative AI-based technical data extraction tool to address this challenge. It applies schema-based extraction to given PDF or HTML datasheets to obtain essential technical information and converts it into a structured format suitable for AI-supported configuration. A local vector database enables semantic similarity retrieval and provides document-grounded evidence for RAG-based answering, giving consistent support for previously unseen IoT devices. For evaluation, we first applied the proposal to several sensor and device datasheets and compared the extracted specifications with ground-truth values to measure accuracy and completeness. We then compared end-to-end configuration QA reliability against a commercial baseline (ChatPDF) using a golden benchmark. The results show that the proposed tool reliably acquires key specifications and significantly improves end-to-end configuration QA reliability. Across 960 golden QA pairs, the proposed method achieves a Recall of 0.926 and an Accuracy of 0.807, compared with 0.636 and 0.595, respectively, for ChatPDF.