Abstract
Computational linguistic phylogenetics has so far relied heavily on cognate data. In contrast, the potential of morphosyntactic characters as a valuable source for phylogenetic analysis has been largely overlooked. We argue that morphosyntactic characters may conflate historical signal with the effects of homoplasy, horizontal transfer, and universal tendencies, and must therefore be scrutinized for their propensity for change and borrowing, analogously to the curation of lexical data that produced the Swadesh lists. In this paper we make a start by evaluating a set of morphosyntactic characters based on the World Atlas of Language Structures using three methods: we (1) calculated Pearson correlation coefficients for each character against different language groupings, reflecting either shared ancestry (genera) or contact (geographical proximity); (2) counted the minimum number of mutations needed to account for the distribution of each character's states on a cognate-based reference tree (parsimony score), testing whether the characters correctly reflect language change known from historical linguistics; and (3) ran a classic hill-climbing algorithm to determine which random subsets of characters produced a phylogeny closest to a reference tree. We conclude that these are useful tools, but expect that making the definitions of the characters more theoretically informed will produce a stronger historical signal.
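To make the parsimony score in method (2) concrete, the following is a minimal sketch of how the minimum number of state changes for a single character can be counted on a fixed tree. It assumes Fitch's small-parsimony algorithm on a rooted, strictly binary tree with unordered states; the abstract does not say which parsimony procedure was used, so this is an illustration and not the authors' implementation. The function name `fitch_score`, the languages `A` to `D`, and the word-order states are hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character (Fitch parsimony).

    tree:   rooted binary tree as nested 2-tuples, with leaf names (str) at the tips
    states: dict mapping each leaf name to its observed character state
    """
    def visit(node):
        # A leaf contributes its observed state and no changes.
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:
            # The children can agree on a state: no extra change at this node.
            return shared, left_cost + right_cost
        # The children cannot agree: one change is required at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical reference tree ((A, B), (C, D)) and two hypothetical characters.
tree = (("A", "B"), ("C", "D"))
congruent = {"A": "SOV", "B": "SOV", "C": "SVO", "D": "SVO"}
conflicting = {"A": "SOV", "B": "SVO", "C": "SOV", "D": "SVO"}

print(fitch_score(tree, congruent))    # 1
print(fitch_score(tree, conflicting))  # 2
```

In this toy case the character whose states follow the two clades needs a single change, while the character that cuts across them needs two. A higher score on the reference tree therefore indicates a distribution that is harder to explain by inheritance alone.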