From Concept to Representation: Modeling Driving Capability and Task Demand with a Multimodal Large Language Model

Abstract

Driving safety hinges on the dynamic interplay between task demand and driving capability, yet these concepts lack a unified, quantifiable formulation. In this work, we present a framework based on a multimodal large language model that transforms heterogeneous driving signals (scene images, maneuver descriptions, control inputs, and surrounding traffic states) into low-dimensional embeddings of task demand and driving capability. By projecting both embeddings into a shared latent space, the framework yields an interpretable measure of task difficulty that flags capability shortfalls before unsafe behavior arises. Built on a customized BLIP-2 backbone and fine-tuned on diverse simulated driving scenarios, the model preserves consistency within tasks, captures impairment-related capability degradation, and transfers to real-world motorway data without additional training. These findings support the framework as a concise yet effective step toward proactive, explainable risk assessment in intelligent vehicles.
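
To make the shared-latent-space idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the two embeddings are projected or compared, so everything here is an assumption for illustration: the linear projection matrices, the function names (`project`, `difficulty`, `shortfall_alert`), the use of the norm gap as a difficulty score, and the zero threshold.

```python
import numpy as np


def project(embedding, weight):
    """Map an embedding into the shared latent space (assumed linear map)."""
    return weight @ embedding


def difficulty(demand_emb, capability_emb, w_demand, w_capability):
    """Hypothetical difficulty score: demand magnitude minus capability
    magnitude, both measured in the shared space. Positive values mean
    the task demands more than the driver can currently supply."""
    z_demand = project(demand_emb, w_demand)
    z_capability = project(capability_emb, w_capability)
    return float(np.linalg.norm(z_demand) - np.linalg.norm(z_capability))


def shortfall_alert(score, threshold=0.0):
    """Flag a capability shortfall when the score exceeds the threshold
    (the threshold value is an assumption, not taken from the paper)."""
    return score > threshold


# Toy projections: keep the first two coordinates of each embedding.
w_demand = np.eye(2, 3)
w_capability = np.eye(2, 4)

demand = np.array([3.0, 4.0, 0.0])         # one task, fixed demand
alert_driver = np.array([6.0, 8.0, 0.0, 0.0])
impaired_driver = np.array([0.6, 0.8, 0.0, 0.0])

score_alert = difficulty(demand, alert_driver, w_demand, w_capability)
score_impaired = difficulty(demand, impaired_driver, w_demand, w_capability)

print(shortfall_alert(score_alert), shortfall_alert(score_impaired))
```

In this toy setup the same task yields a negative score for the capable driver and a positive score for the impaired one, which mirrors the abstract's claim that capability degradation should surface as a shortfall before unsafe behavior occurs. A trained model would replace the fixed matrices with learned projections.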
