Training large language models on narrow tasks can lead to broad misalignment
Journal: Nature
Impact factor: 48.5
DOI: 10.1038/s41586-025-09937-5
Betley, Jan; Warncke, Niels; Sztyber-Betley, Anna; Tan, Daniel; Bao, Xuchan; Soto, Martín; Srivastava, Megha; Labenz, Nathan; Evans, Owain