Abstract
In survey data, some questions are asked only of a subset of applicable participants. This frequently occurs together with floor effects in the responses. For example, in the longitudinal Population Assessment of Tobacco and Health (PATH) survey, nicotine dependence is assessed only for a sub-sample of individuals at each occasion and, when assessed, often takes a value at the lower end of the scale. To capture trends over time in an unbiased and efficient way, it is important to jointly model the probability of being asked the questions of interest, the probability of giving a response at the lower end of the scale, and the mean response when it is above the lower end of the scale. We propose a three-part model for such data, consisting of two logistic sub-models and a truncated normal model. Correlations among repeated observations on the same individual are induced by random effects. Maximum likelihood estimation and inference are performed in SAS PROC NLMIXED. The PATH data on young adults are used for illustration. A simulation study investigates the bias and efficiency of the three-part model compared with simpler models. The three-part model has much lower bias and better coverage probabilities for the regression coefficients than the simpler models.
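
As a rough illustration of how the three parts combine, the Python sketch below computes the log-likelihood contribution of a single observation. It is a minimal sketch under stated assumptions, not the model as fitted in the paper: the random effects are omitted, the truncated normal is assumed to be truncated from below at the floor value, and all names (`three_part_loglik`, `eta_asked`, `eta_floor`, `mu`, `sigma`, `floor`) are hypothetical.

```python
import math


def log_expit(eta):
    """Numerically stable log of the logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def three_part_loglik(asked, y, eta_asked, eta_floor, mu, sigma, floor=0.0):
    """Log-likelihood of one observation (random effects omitted; assumption).

    eta_asked: linear predictor for the probability of being asked
    eta_floor: linear predictor for the probability of a response at the floor
    mu, sigma: location and scale of the normal before truncation at `floor`
    """
    if not asked:
        # Part 1: not asked, contributes log(1 - p_asked).
        return log_expit(-eta_asked)
    ll = log_expit(eta_asked)  # Part 1: asked, contributes log(p_asked).
    if y <= floor:
        # Part 2: response at the floor, contributes log(p_floor).
        return ll + log_expit(eta_floor)
    # Part 2: response above the floor, contributes log(1 - p_floor).
    ll += log_expit(-eta_floor)
    # Part 3: normal density renormalised to the region above the floor.
    z = (y - mu) / sigma
    log_pdf = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    log_tail = math.log(1.0 - std_normal_cdf((floor - mu) / sigma))
    return ll + log_pdf - log_tail
```

In a full analysis, each linear predictor would include subject-level random effects, and the marginal likelihood would be obtained by integrating over them, which is the step that software such as PROC NLMIXED handles numerically.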