Abstract
There is increasing support for reporting evidential strength as a likelihood ratio (LR) and increasing interest in (semi-)automated LR systems. The log-likelihood ratio cost (C(llr)) is a popular metric for such systems, penalizing misleading LRs more heavily the further they are from 1. C(llr) = 0 indicates a perfect system, while C(llr) = 1 indicates an uninformative one. Beyond this, however, what constitutes a "good" C(llr) is unclear. To provide guidance on when a C(llr) is "good", we studied 136 publications on (semi-)automated LR systems. Results show that C(llr) use depends heavily on the field; it is absent, for example, in DNA analysis. Although publications on automated LR systems have increased over time, the proportion reporting C(llr) remains stable. Notably, reported C(llr) values show no clear patterns and depend on the area, the analysis, and the dataset. As LR systems become more prevalent, comparing them becomes crucial, yet such comparison is hampered by different studies using different datasets. We advocate the use of public benchmark datasets to advance the field.
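For readers unfamiliar with the metric, the behaviour summarized above (C(llr) = 0 for a perfect system, C(llr) = 1 for an uninformative one, and heavier penalties for misleading LRs further from 1) follows from the standard definition of the log-likelihood ratio cost (Brümmer and du Preez, 2006). A minimal sketch, assuming the LRs are grouped by ground truth into same-source and different-source comparisons:

```python
import math

def cllr(lrs_same_source, lrs_diff_source):
    """Log-likelihood ratio cost (Cllr).

    lrs_same_source: LRs for comparisons where the same-source
                     hypothesis is in fact true (should be large).
    lrs_diff_source: LRs for comparisons where the different-source
                     hypothesis is in fact true (should be small).

    A misleading LR (small when same-source, large when different-
    source) contributes more the further it lies from 1.
    """
    # Penalty for same-source LRs: grows as LR shrinks below 1.
    term_ss = sum(math.log2(1 + 1 / lr) for lr in lrs_same_source)
    term_ss /= len(lrs_same_source)
    # Penalty for different-source LRs: grows as LR rises above 1.
    term_ds = sum(math.log2(1 + lr) for lr in lrs_diff_source)
    term_ds /= len(lrs_diff_source)
    return 0.5 * (term_ss + term_ds)

# An uninformative system (every LR = 1) scores exactly 1;
# strongly informative LRs drive the cost toward 0.
print(cllr([1.0, 1.0], [1.0, 1.0]))        # → 1.0
print(cllr([1000.0, 1000.0], [0.001]))
```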