Abstract
Differential gene expression (DGE) analysis enables researchers to investigate the link between gene expression and the phenotypic responses observed in organisms over time or under different experimental or field conditions. Accurate quantification of gene expression is essential to DGE experiments, and a range of methods has been developed for studying gene expression within a single species. Quantifying differences in expression across multiple species, not just within one, can also reveal the genetic mechanisms underlying phenotypic differences between species. Accurate quantification across multiple species requires a suitable reference: it should include each species' own expressed transcripts to mitigate reference bias, and it should use the orthology relationships among transcripts to enable comparison of expression at the gene level. Producing such a reference remains a challenge, despite its necessity for minimising bias in multispecies DGE analysis. Our software, BINge, addresses this need through a novel approach to modelling orthology that produces multispecies transcript clusters which accurately reflect locus orthology. Evaluation experiments demonstrate the effectiveness of this approach over existing clustering methods, which were not designed to produce a reference suitable for multispecies DGE analysis. Source code and documentation for BINge are available from the GitHub repository at https://github.com/zkstewart/BINge.