Abstract
Empirical studies often have to work with incomplete samples, yet scholars rarely account for under-registration: in cultural heritage, for example, the centuries-long loss of artefacts can lead to an underestimation of the original richness of assemblages. Recently, it has been argued that unseen species models from ecology can be used to estimate the unobserved diversity in cultural collections. We report an extension of this approach to shared diversity, i.e. the number of types that are common to two assemblages. As a case study, we use stories in medieval French and Dutch (ca. 1150-1450), which were frequently shared between the two languages. We apply an established estimator (Chao-shared) with a novel bootstrap procedure. The estimator suggests that the surviving data underestimate the original number of shared stories: for example, a translation can no longer be identified as such once its source is no longer extant. Interestingly, there is less evidence for the total loss of shared stories: precisely because of the redundancy caused by inter-vernacular translation, shared stories were less likely to be lost in both languages simultaneously. These results go beyond previous studies in that they provide more insight into the composition of the unobserved share of cultural diversity (instead of its mere size).