Abstract
BACKGROUND: Sensor-based digital health technologies (sDHTs) are increasingly used to support scientific and clinical decision-making. The digital measures (DMs) they generate offer significant potential to accelerate the drug development timeline, decrease clinical trial costs, and improve access to care. However, choosing an appropriate statistical methodology for the analytical validation (AV) of a DM is complicated, particularly for novel DMs, for which appropriate, established reference measures (RMs) may not exist. A better understanding of, and a standardized approach to, AV in these scenarios are needed.

OBJECTIVE: In a prior simulation study, 3 statistical methods were tested for their ability to estimate a simulated relationship between an sDHT-derived DM and several clinical outcome assessment (COA) RMs. The aim of this work was to assess the feasibility of implementing these methods in real data and to examine the impact of AV study design factors on the estimated relationships.

METHODS: Four real-world datasets, captured using sDHTs, were used to prepare hypothetical AV studies representing a range of scenarios with respect to 3 key study design properties: temporal coherence, construct coherence, and data completeness. The datasets analyzed were as follows: Urban Poor (comparing nighttime awakenings to measures of psychological well-being), STAGES (comparing daily step count to psychological and fatigue measures), mPower (comparing daily smartphone screen taps to measures of function in Parkinson’s disease), and Brighten (comparing smartphone communication activity to measures of psychological well-being). For each hypothetical AV study, 4 statistical methods were applied: the Pearson correlation coefficient (PCC) between DM and RM; simple linear regression (SLR) between DM and RM; multiple linear regression (MLR) between DMs and combinations of RMs; and 2-factor, correlated-factor confirmatory factor analysis (CFA) models. Performance measures were the PCC magnitudes (for PCC), the R² and adjusted R² statistics (for SLR and MLR, respectively), and the factor correlations (for CFA).

RESULTS: Most of the CFA models exhibited acceptable fit according to the majority of the fit statistics employed, and each model was able to estimate a factor correlation. For each model, these correlations were greater than or equal to the corresponding PCC in magnitude. Correlations were strongest in the hypothetical studies with strong temporal and construct coherence.

CONCLUSIONS: The performance of the selected statistical methods in this work supports their feasibility when implemented in real-world data. In particular, our findings support the use of CFA to assess the relationship between a novel DM and a COA RM. The observed impact of AV study design factors on the estimated relationships allowed the authors to formulate practical recommendations for study design in the AV of novel DMs. With a standardized methodology for evaluating novel DMs, sDHT developers, biostatisticians, and clinical researchers can navigate the complex validation landscape more easily, with more certainty, and with more tools at their disposal.
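To make the regression-style performance measures named above concrete, the sketch below computes a PCC, the equivalent SLR R², and an MLR adjusted R² on synthetic data. All variable names, noise levels, and the latent-construct data-generating process are illustrative assumptions, not taken from the study datasets; CFA fitting is omitted, as it requires a dedicated structural equation modeling library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical setup: one latent construct drives both the sensor-derived
# digital measure (DM) and two clinical outcome assessment reference
# measures (RMs), each observed with independent noise.
latent = rng.normal(size=n)
dm = latent + rng.normal(scale=0.8, size=n)   # digital measure
rm1 = latent + rng.normal(scale=0.8, size=n)  # reference measure 1
rm2 = latent + rng.normal(scale=1.0, size=n)  # reference measure 2

# Pearson correlation coefficient between DM and one RM
pcc = np.corrcoef(dm, rm1)[0, 1]

# Simple linear regression of DM on one RM: R^2 equals PCC^2
r2_slr = pcc ** 2

# Multiple linear regression of DM on both RMs, with adjusted R^2
X = np.column_stack([np.ones(n), rm1, rm2])
beta, *_ = np.linalg.lstsq(X, dm, rcond=None)
resid = dm - X @ beta
ss_res = resid @ resid
ss_tot = ((dm - dm.mean()) ** 2).sum()
r2_mlr = 1 - ss_res / ss_tot
p = 2  # number of predictors
adj_r2 = 1 - (1 - r2_mlr) * (n - 1) / (n - p - 1)

print(f"PCC={pcc:.3f}  SLR R^2={r2_slr:.3f}  MLR adjusted R^2={adj_r2:.3f}")
```

Because the single-predictor model is nested in the two-predictor model, the MLR R² can never fall below the SLR R²; the adjusted R² penalizes that gain for the extra predictor.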