Abstract
Plasma cell-free RNA (cfRNA) metagenomics is increasingly explored for blood-based pathogen detection, but the structure of the shared background "blood microbiome", the reproducibility of reported signals, and the practical limits of the approach remain unclear. We performed a critical re-analysis and benchmarking ("stress test") of host-filtered blood RNA sequencing data from two cohorts: a bacteriologically confirmed tuberculosis (TB) cohort (n = 51) previously used only to derive host cfRNA signatures, and a coronary artery disease (CAD) cohort (n = 16) previously reported to show a CAD-shifted "blood microbiome" enriched for periodontal taxa. Both datasets were processed with a unified pipeline combining stringent human read removal with taxonomic profiling by Kraken2 and MetaPhlAn4. Across both cohorts, only a minority of non-host reads were classifiable; under strict host filtering, classified non-host reads comprised 7.3% (5.0–12.0%) in CAD and 21.8% (5.4–31.5%) in TB, still representing only a small fraction of total cfRNA. Classified non-host communities were dominated by recurrent, low-abundance taxa from skin, oral, and environmental lineages, forming a largely shared, low-complexity background in both TB and CAD. Background-derived bacterial signatures showed only modest separation between disease and control groups, with wide intra-group variability. Mycobacterium tuberculosis-assigned reads were detectable in many TB-positive samples but accounted for ≤0.001% of total cfRNA and occurred at similar orders of magnitude in a subset of TB-negative samples, precluding robust discrimination. Phylogeny-aware visualization confirmed that taxa appearing "enriched" in TB-positive plasma arose mainly from background-associated clades rather than from a distinct pathogen-specific cluster.
Collectively, these findings provide a quantitative benchmark of the background-dominated regime and the practical limits of plasma cfRNA metagenomics for pathogen detection. Performance is constrained more by a shared, low-complexity background and sparse pathogen-derived fragments than by large disease-specific shifts, which underscores the need for transparent host filtering, explicit background modeling, and integration with targeted or orthogonal assays.