A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation



Abstract

Machine translation achieves only marginal accuracy for low-resource languages, but its deep learning models are expected to yield improved accuracy over time. This longitudinal study investigates how Google Translate's Urdu-to-English output evolved between 2018 and 2021. The accuracy and acceptability of the translations were determined by (a) an interlinear gloss that identifies the core semantic units and grammatical functions to be translated, and (b) a descriptive comparison of the translated text's syntactic and semantic properties with those of the source text. Despite a 50% error rate that persists over the three-year interval, the research reports significant improvement in the overall intelligibility of the translations, in contrast to the initial results from 2018, which exhibited rampant non-localized errors. Working backwards from instances of errors to the morphosyntactic and semantic patterns underlying them, the study concludes that Urdu's pro-drop feature, its case-marking system, the identification of clause boundaries, polysemous terms, and orthographically similar words pose the greatest difficulty for neural machine translation. These results point to the need to incorporate syntactic information into training data.
