Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps

Authors: Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B

Journal: JMIR mHealth and uHealth
Impact factor: 6.200
Year: 2016
Citation: 2016 Feb 10;4(1):e15
DOI: 10.2196/mhealth.5176
Research area: Neuroscience
Disease type: Depression

Abstract

BACKGROUND: There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves.

OBJECTIVE: We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability.

METHODS: We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff's alpha was calculated for each of the measures and reported by app category and in aggregate.

RESULTS: The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps.

CONCLUSIONS: We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews.
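
For readers unfamiliar with the statistic named in the methods, the following is a minimal sketch of Krippendorff's alpha for nominal (categorical) ratings, computed from a coincidence matrix. It is an illustration only: the abstract does not state which software or which level of measurement the authors used for each measure, so the nominal metric, the function name `krippendorff_alpha_nominal`, and the sample ratings are all assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (illustrative sketch).

    units: list of lists; each inner list holds the ratings that the
    raters gave to one unit (e.g. one app). Use None for a missing rating.
    """
    # Coincidence matrix: every ordered pair of ratings from two different
    # raters within a unit contributes 1 / (m - 1), where m is the number
    # of ratings that unit received.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two ratings are not pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the overall number of pairable values n.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (nominal metric: any mismatch
    # counts as 1); the common factor 1 / n is cancelled from both.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    if n < 2:
        return float("nan")  # no pairable ratings at all
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one category was ever used
    return 1.0 - observed / expected


# Hypothetical data: 6 raters answering a yes/no measure for 4 apps.
hypothetical_ratings = [
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, None, 1],
    [0, 0, 1, 0, 0, 0],
]
print(krippendorff_alpha_nominal(hypothetical_ratings))
```

An alpha of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, which is the scale on which the reported values (.15 to .69) should be read.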
