Abstract
Amazon's Mechanical Turk (AMT) is a Web application that provides instant access to thousands of potential participants for survey-based psychology experiments, such as the acceptability judgment task used extensively in syntactic theory. Because AMT is a Web-based system, syntacticians may worry that the move out of the experimenter-controlled environment of the laboratory and into the user-controlled environment of AMT could adversely affect the quality of the judgment data collected. This article reports a quantitative comparison of two identical acceptability judgment experiments, each with 176 participants (352 total): one conducted in the laboratory and one conducted on AMT. Crucial indicators of data quality, such as participant rejection rates, statistical power, and the shape of the judgment distributions for each sentence type, are compared between the two samples. The results suggest that, aside from slightly higher participant rejection rates, AMT data are almost indistinguishable from laboratory data.