AI-assisted patient matching for personalized cancer medicine



Authors: Lin, Yupei; Mekala, Venugopalareddy; Li, Jianrong; Wang, Xiang; Aminu, Muhammad; Wu, Jia; Zhang, Jianjun; Ripley, Robert Taylor; Amos, Christopher I; Cheng, Chao; Eder, Joseph Paul; Macdonald, Susan; Webster, Irena; Xu, Wenxin (Vincent); Xie, Wanling; Liu, Fuguo; Birch, Grace; Welch, Casey; Nguyen, Maily; Stanojevic, Mila; Viswanathan, Srinivas; McGregor, Bradley A; Ritz, Jerome; Braun, David; Nikiforow, Sarah; Romee, Rizwan; Choueri, Toni; Saxena, Sumedha; Ma, Edmond S K; El Helali, Aya; Shih, David J H

Year:	2025	Volume and pages:	2025 Dec;26(Suppl 1):i12-4
DOI:	10.1101/2025.07.29.664792	Research area:	Oncology

Abstract

BACKGROUND: Advanced sequencing techniques have facilitated the implementation of cancer precision medicine, in which treatments are targeted against specific genomic alterations. To advance the development of precision oncology drugs, it is important to improve the participation of cancer patients in clinical trials. In this context, the Molecular Tumor Board (MTB) plays a crucial role in matching patients with appropriate clinical trials [1], but several challenges remain unresolved. Trials can have complex eligibility criteria that are often presented in an unstructured format, making it difficult to extract key information for checking patient eligibility. Similarly, patient data often arrive in heterogeneous formats such as free text, tabular data, and images. Disease diagnoses in clinical notes also do not use standardized terminologies. These technical challenges complicate patient-trial matching.

The primary objective of this project is to match cancer patients in Hong Kong to clinical trials in Hong Kong and nearby regions with better accuracy and efficiency. To achieve this, we seek to harmonize patient data from multiple sources and formats and convert them to a structured format. Additionally, we aim to harmonize clinical trials and their eligibility criteria from different regions, facilitating a more efficient patient-trial matching process, in line with the recent regional consensus on precision oncology practices [2].

METHODOLOGY:

DEPLOYMENT OF THE MATCHMINER APPLICATION: We set up MatchMiner, an open-source platform developed by Dana-Farber Cancer Institute, to match patient-specific genomic and clinical data with trial eligibility criteria [3]. It employs MatchEngine to match patient and trial data and produce patient-specific trial options. Fig. 1 shows an overview of the MatchMiner system.
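
The matching step described above can be illustrated with a minimal Python sketch. It is not MatchMiner's or MatchEngine's actual data model or API: the `Patient` and `TrialArm` classes, their fields, and the trial identifiers are hypothetical, and real eligibility logic covers far more than a diagnosis code and gene-level alterations.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Patient:
    """Hypothetical structured patient record (not the MatchMiner schema)."""
    diagnosis_code: str                       # standardized diagnosis code, e.g. "LUAD"
    altered_genes: Set[str] = field(default_factory=set)


@dataclass
class TrialArm:
    """Hypothetical structured eligibility criteria for one trial arm."""
    trial_id: str
    diagnosis_codes: Set[str]                 # accepted diagnosis codes
    required_genes: Set[str] = field(default_factory=set)   # empty means no genomic requirement
    excluded_genes: Set[str] = field(default_factory=set)


def is_match(patient: Patient, arm: TrialArm) -> bool:
    """Return True if the patient satisfies the arm's simplified criteria."""
    if patient.diagnosis_code not in arm.diagnosis_codes:
        return False
    # Assumption: an alteration in any one required gene is sufficient.
    if arm.required_genes and not (patient.altered_genes & arm.required_genes):
        return False
    # Any alteration in an excluded gene disqualifies the patient.
    if patient.altered_genes & arm.excluded_genes:
        return False
    return True


def match_trials(patient: Patient, arms: List[TrialArm]) -> List[str]:
    """Return the identifiers of all arms the patient matches, in input order."""
    return [arm.trial_id for arm in arms if is_match(patient, arm)]


# Illustrative data only; identifiers and criteria are made up.
example_patient = Patient("LUAD", {"EGFR"})
example_arms = [
    TrialArm("TRIAL-A", {"LUAD"}, required_genes={"EGFR"}),
    TrialArm("TRIAL-B", {"BRCA"}),
    TrialArm("TRIAL-C", {"LUAD"}, required_genes={"ALK"}),
    TrialArm("TRIAL-D", {"LUAD"}, excluded_genes={"EGFR"}),
]
print(match_trials(example_patient, example_arms))
```

Keeping inclusion and exclusion criteria as separate sets mirrors the distinction the abstract draws between the two, but the "any required gene" rule is a simplification chosen for this sketch.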

CUSTOMIZATION OF MATCHMINER FOR THE LOCAL CONTEXT: We extended MatchMiner’s matching platform to include additional clinical criteria, including HER2, ER, PR, and PD-L1 status. To standardize diagnoses, we use the stable release 2025_04_08 of the OncoTree ontology [4]. We further refined the variant classification for genomic criteria matching.

DATA EXTRACTION WITH LLMS: To convert patient data and trial eligibility criteria into a structured format, we evaluated multiple large language models (LLMs) on data extraction tasks. Key tasks include mapping diagnoses to OncoTree terms and extracting biomarkers and genomic criteria from the unstructured text of trial eligibility criteria. As input, we pulled 81 trials from ClinicalTrials.gov and prepared ground truths for these tasks. We engineered prompts for all tasks and tested them with multiple LLMs, comparing the results on timing and on correctness against the ground truth. Based on these results, DeepSeek-R1-Distill-Qwen-32B-quantized.w4a16 was selected for its superior accuracy and was subsequently used to extract information from raw data. Fig. 2 summarizes the tested models and associated tasks.

WEB INTERFACE FOR PATIENT DATA ENTRY: To capture patient data, we developed a web-based user interface application (matchminer-patient) that facilitates the input and conversion of the following data:

Clinical data: Diagnosis and patient history in free-form text, which are processed by LLMs to identify an OncoTree diagnostic term and extract disease characteristics.

Genomic data: Clinical sequencing results in the form of screenshot images. We use SuryaOCR to extract text from each image, which is then processed by LLMs to identify actionable genes and genetic alterations. Images are deleted immediately to ensure that no protected health information (PHI) is stored.

The output is structured patient data compatible with MatchMiner.
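
The model comparison described under DATA EXTRACTION WITH LLMS (correctness against ground truth, plus timing) could be scored along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: `evaluate` and `keyword_stub` are hypothetical names, exact string match is only one possible correctness criterion, and the stub stands in for a real model call so the example runs without any LLM.

```python
import time
from typing import Callable, Dict, List, Tuple


def evaluate(extract: Callable[[str], str],
             cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score an extraction function against (input text, expected label) pairs.

    Correctness is a case-insensitive exact match, which is an assumption
    made for this sketch. Timing is wall-clock time averaged over all cases.
    """
    if not cases:
        raise ValueError("at least one ground-truth case is required")
    correct = 0
    start = time.perf_counter()
    for text, expected in cases:
        predicted = extract(text)
        if predicted.strip().upper() == expected.strip().upper():
            correct += 1
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / len(cases),
        "seconds_per_case": elapsed / len(cases),
    }


def keyword_stub(text: str) -> str:
    """Stand-in for an LLM call: maps free-text diagnoses by keyword lookup."""
    lowered = text.lower()
    if "lung adenocarcinoma" in lowered:
        return "LUAD"
    if "breast" in lowered:
        return "BRCA"
    return "UNKNOWN"


# Illustrative ground truth only; not taken from the 81-trial evaluation set.
ground_truth = [
    ("Stage IV lung adenocarcinoma", "LUAD"),
    ("Invasive breast carcinoma, HER2 positive", "BRCA"),
    ("Metastatic colorectal adenocarcinoma", "COADREAD"),
]
report = evaluate(keyword_stub, ground_truth)
print(report)
```

Passing the extractor in as a callable lets the same harness be reused for each candidate model and each task, so that accuracy and per-case latency are measured the same way every time.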

HARMONIZATION OF CLINICAL TRIAL DATA: To harmonize clinical trial data, we developed nct2ctml, an application that retrieves data from ClinicalTrials.gov and other international registries, facilitating local synchronization of eligibility criteria and trial status in our database. Fig. 3 shows the workflow for processing trial documents from ClinicalTrials.gov and converting them to CTML format. The process involves:

1) Data extraction: Extract key trial details such as free-text diagnoses and eligibility criteria expressed as clinical and genomic features.
2) LLM-based processing: Use a locally hosted LLM to convert unstructured trial data into structured formats by mapping free-text diagnoses to OncoTree terms, extracting biomarkers such as HER2, ER, PR, and PD-L1 with their values from eligibility criteria, and identifying inclusion and exclusion criteria for gene alterations using a curated COSMIC gene list.

The output of nct2ctml is a Clinical Trial Markup Language (CTML) document containing metadata such as trial status, principal investigator, trial arms, and drugs, along with the structured clinical and genomic criteria of the trial.

SYNCHRONIZATION OF LOCAL AND INTERNATIONAL TRIAL REGISTRIES: To keep our local database synchronized, we continuously aggregate trial data from local sources such as HKU Clinical Oncology, Medical Oncology, HKU Clinical Trial Registry, and CUHK’s Comprehensive Cancer Trial Unit, as well as from international repositories such as ClinicalTrials.gov (US), the EU Clinical Trials Register, and national registries in South Korea, Singapore, and Taiwan. A key challenge is that local trial registries usually do not include international identifiers, which necessitates manual lookup and curation to match local trials to international trial registries. Our nct2ctml application also detects duplicates and merges local and international trial data in the database.
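
The duplicate detection and merging of local and international records might be sketched as follows. This is not the nct2ctml implementation: the record fields (`nct_id`, `local_id`, `title`, `status`, `site`), the `merge_trials` function, and the rule that international values win on conflict are all assumptions made for illustration. The curated mapping reflects the manual lookup that the abstract says is needed when local registries omit international identifiers.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_trials(international: List[Record],
                 local: List[Record],
                 curated_ids: Dict[str, str]) -> Tuple[Dict[str, Record], List[Record]]:
    """Merge local records into international ones, keyed by international ID.

    curated_ids maps a local registry ID to its international ID and stands
    in for the manually curated lookup. Local records that cannot be linked
    are returned separately so they can be curated by hand.
    """
    merged: Dict[str, Record] = {rec["nct_id"]: dict(rec) for rec in international}
    unmatched: List[Record] = []

    for rec in local:
        nct_id = rec.get("nct_id") or curated_ids.get(rec["local_id"])
        if nct_id is None:
            unmatched.append(rec)
            continue
        target = merged.setdefault(nct_id, {"nct_id": nct_id})
        for key, value in rec.items():
            if key == "nct_id" or value in (None, ""):
                continue
            # Assumption: international values win; local data only fills gaps.
            target.setdefault(key, value)

    return merged, unmatched


# Illustrative records only; "NCT00000000" is a placeholder identifier.
international_records = [
    {"nct_id": "NCT00000000", "title": "Example trial", "status": "Recruiting"},
]
local_records = [
    {"local_id": "HK-001", "title": "Example trial (local entry)", "site": "Site A"},
    {"local_id": "HK-002", "title": "Trial without a curated identifier"},
]
merged, unmatched = merge_trials(international_records, local_records,
                                 {"HK-001": "NCT00000000"})
print(merged)
print(unmatched)
```

Returning the unlinked local records rather than dropping them keeps the manual curation queue explicit, which matters when identifiers are missing at the source.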

RESULTS: We harmonized data for 124 cancer clinical trials in Hong Kong and nearby regions, converting them into CTML format for MatchMiner. AI-assisted CTML creation required manual intervention in approximately 11% of trials. Patient data from multiple sources, including sequencing reports from multiple vendors and free-form clinical notes, were integrated and converted to PHI-free structured data. AI-assisted generation of structured patient data required manual intervention in about 30% of cases. With our aggregated clinical trial database, MatchMiner generates 7–10 trial matches on average per patient. Since Q1 2025, these trial matches have been reviewed at monthly MTB meetings to support personalized treatment planning.

CONCLUSION: We developed a suite of AI-enabled software tools to collect unstructured patient and clinical trial data and convert them into structured formats, enabling effective matching of cancer patients to relevant clinical trials. This software suite supports multidisciplinary MTBs by providing actionable insights for personalized treatment planning and clinical trial enrollment.

CODE AVAILABILITY: Source code for this project is available at:
https://github.com/sumedhasaxena/matchminer-setup
https://github.com/sumedhasaxena/nct2ctml
https://github.com/sumedhasaxena/matchminer-patient

ACKNOWLEDGMENT: This work was supported by the Health and Medical Research Fund, the Health Bureau, The Government of the Hong Kong Special Administrative Region (project 11222156).

REFERENCES:
1. El Helali A, Lam TC, Ko EY, Shih DJH, et al. ‘The impact of the multi-disciplinary molecular tumour board and integrative next generation sequencing on clinical outcomes in advanced solid tumours.’ Lancet Reg Health West Pac. 2023;36:100775.
2. Lam TC, Cho WC, Au JS, Ma ES, Lam ST, et al.; Precision Oncology Working Group (POWG). ‘Consensus statements on precision oncology in the China Greater Bay Area.’ JCO Precis Oncol. 2023;7:e2200649.
3. Klein H, Mazor T, Siegel E, et al. ‘MatchMiner: an open-source platform for cancer precision medicine.’ npj Precis Onc. 2022;6:69.
4. Smyth LM, et al. ‘OncoTree: a cancer classification system for precision oncology.’ JCO Clin Cancer Inform. 2021;5:221–230.
