Improved image reconstruction from brain activity through automatic image captioning

通过自动图像描述改进基于脑活动的图像重建

阅读:1

Abstract

Significant progress has been made in the field of image reconstruction using functional magnetic resonance imaging (fMRI). Certain investigations reconstructed images with visual information decoded from brain signals, yielding insufficient accuracy and quality. The combination of semantic information in the reconstruction was recommended to improve performance. However, this issue continues to come across numerous difficulties. To address such problems, we proposed an approach that combines semantically complex details with visual details for reconstruction. Our proposed method consists of two main modules: visual reconstruction and semantic reconstruction. In the visual reconstruction module, visual information is decoded from brain data using a decoder. This module employs a deep generator network (DGN) to produce images and utilizes a VGG19 network to extract visual features from the generated images. Image optimization is performed iteratively to minimize the error between features decoded from brain data and features extracted from the generated image. In the semantic reconstruction module, two models BLIP and LDM are employed. Using the BLIP model, we generate 10 captions for each training image. The semantic features extracted from the image captions, along with brain data obtained from training sessions, are used to train a decoder. The trained decoder is then utilized to decode semantic features from human brain activity. Finally, the reconstructed image from the visual reconstruction module is used as input to the LDM model, while the semantic features decoded from brain activity are provided as conditional input for semantic reconstruction. Including decoded semantic features improves reconstruction quality, as confirmed by our ablation study. Our strategy is superior both qualitatively and quantitatively to Shen et al.'s method, which utilizes a similar dataset. Our methodology achieved an accuracy of 0.812 and 0.815 for the inception and contrastive language-image pre-training (CLIP) metrics, respectively, which are excellent for the quantitative evaluation of semantic content. We achieved an accuracy of 0.328 in the structural similarity index measure (SSIM), indicating superior performance as a low-level metric. Moreover, our proposed approach for semantic reconstruction of artificial shapes and imagined images achieved acceptable success, attaining accuracies of 0.566 and 0.627 based on the CLIP metric, and 0.671 and 0.565 based on the SSIM metric, respectively.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。