Abstract
Funding agencies rely on panel or consensus meetings to summarise individual evaluations of grant proposals into a final ranking. However, previous research has shown inconsistency in decisions and inefficiency of consensus meetings. Using data from the Marie Skłodowska-Curie Actions, we aimed at investigating the differences between an algorithmic approach to summarise the information from grant proposal individual evaluations to decisions after consensus meetings, and we present an exploratory comparative analysis. The algorithmic approach employed was a Bayesian hierarchical model resulting in a Bayesian ranking of the proposals using the individual evaluation reports cast prior to the consensus meeting. Parameters from the Bayesian hierarchical model and the subsequent ranking were compared to the scores, ranking and decisions established in the consensus meeting reports. The results from the evaluation of 1,006 proposals submitted to three panels (Life Science, Mathematics, Social Sciences and Humanities) in two call years (2015 and 2019) were investigated in detail. Overall, we found large discrepancies between the consensus reports and the scores a Bayesian hierarchical model would have predicted. The discrepancies were less pronounced when the scores were aggregated into funding rankings or decisions. The best agreement between the final funding ranking can be observed in the case of funding schemes with very low success rates. While we set out to understand if algorithmic approaches, with the aim of summarising individual evaluation scores, could replace consensus meetings, we concluded that currently individual scores assigned prior to the consensus meetings are not useful to predict the final funding outcomes of the proposals. Following our results, we would suggest to use individual evaluations for a triage and subsequently not discuss the weakest proposals in panel or consensus meetings. This would allow a more nuanced evaluation of a smaller set of proposals and help minimise the uncertainty and biases when allocating funding.