APBench and benchmarking large language model performance in fundamental astrodynamics problems for space engineering

Abstract

The problem-solving abilities of Large Language Models (LLMs) have become a major focus of research across STEM fields, including mathematics and physics, and substantial progress has been made in both measuring and enhancing these abilities. Among the many ways to advance space engineering research, one promising direction is applying LLMs and foundation models to help solve Ph.D.-level research problems. To understand the full potential of LLMs in astrodynamics, we have developed the first Astrodynamics Problems Benchmark (APBench) to evaluate the capabilities of LLMs in this field. For the first time, we crafted a collection of questions and answers spanning a wide variety of subfields in astrodynamics, astronautics, and space engineering. Using this first dataset built for space engineering, we evaluate the performance of both open-source and closed-source foundation models to assess their current capabilities in space engineering, paving the road for further advances toward a definition of intelligence for space.
