Abstract
Comparative biology seeks to unlock the power of cross-species trait variation to learn the rules of life. In this venture, modern studies increasingly leverage large datasets spanning many traits and levels of biological organization and complexity. To analyze these complex data in a statistically sound manner, researchers must choose a phylogeny that is assumed to model the mean trait values across species, an assumption that may be tenuous depending on the true evolutionary architecture of the traits. Yet the consequences of this decision remain poorly understood, particularly for modern studies seeking to analyze multiple, distinct traits within the same framework. Here, we conduct a comprehensive simulation study to examine how tree choice impacts phylogenetic regression in large-scale analyses of many traits and species. We find that regression outcomes are highly sensitive to the assumed tree, sometimes yielding alarmingly high false positive rates as the numbers of traits and species increase together. Counterintuitively, adding more data exacerbates rather than mitigates this issue, highlighting the risks inherent in the high-throughput analyses typical of modern comparative research. Experimental manipulations of tree topology in an empirical case study of gene expression and longevity traits further reveal extreme sensitivity to tree choice. While significant challenges remain in aligning traits with appropriate trees, we find compelling promise in robust estimators, which can mitigate the effects of tree misspecification under realistic evolutionary scenarios. Collectively, our findings underscore the critical need for careful tree selection in comparative studies while pointing to robust regression as a powerful tool for navigating phylogenetic uncertainty in modern evolutionary research.