Abstract
Giant viruses are distinguished not only by their large particle size but also by their extensive genomes, which often reach the megabase scale. Many sequences within these genomes are thought to have been acquired from hosts, surrounding organisms, or other viruses. Since the natural hosts of many giant viruses remain unidentified, analyzing sequences potentially derived from other organisms may help clarify their hosts. In the present study, we identified eukaryote-homologous sequences by isolating those not shared among viruses, an aspect previously overlooked. Our primary focus was on pandoravirus, which, with a genome size of ~2 Mb, is the largest among giant viruses. We obtained 375 BLAST hits with an average sequence identity of ~90%. Among the 102 detected species, those with the most hits included Mus musculus, Lampetra planeri, Melanogrammus aeglefinus, Lampetra fluviatilis, Scylla paramamosain, Cardiocondyla obscurior, Monodelphis domestica, Vespula pensylvanica, Micromonas pusilla, Physcomitrium patens, and Peromyscus californicus. Similar analyses of Cedratvirus and Pithovirus, which share an amphora-shaped particle structure with pandoraviruses, yielded fewer hits (48 and 5, respectively), with no common taxa at the order level. Thirteen BLAST hits exceeded 100 bp, including conserved non-coding elements (CNEs) in fish and other taxa, along with sequences of unknown function. These results indicate that the non-shared sequences contain short regions of sequence similarity to eukaryotic sequences, although direct host identification proved difficult.