Abstract
Clinical prediction models are statistical models or machine learning algorithms that combine information on a set of predictor variables for an individual to estimate their risk of a given clinical outcome. It is crucial to ensure that the sample size of the data used to develop or validate a clinical prediction model is large enough. If the data are inadequate, developed models can be unstable and estimates of predictive performance imprecise, which can lead to models that are unfit or even harmful for clinical practice. Recently, a series of sample size formulae has been developed to estimate the minimum required sample size for prediction model development or external validation. The aim of this statistical primer is to provide an overview of these criteria, describe what information is required to make the calculations, and illustrate their implementation through worked examples. The software available to implement the sample size criteria is reviewed, and code is provided for all the worked examples.