A Comparative Analysis of Data Synthesis Techniques to Improve Classification Accuracy of Raman Spectroscopy Data


Abstract

Raman spectra are examples of high-dimensional data that are often limited in the number of samples. This is a primary concern when Deep Learning frameworks are developed for tasks such as chemical species identification, quantification, and diagnostics. Open-source data are difficult to obtain and often sparse; furthermore, collecting and curating new spectra require expertise and resources. Deep generative modeling uses Deep Learning architectures to approximate high-dimensional distributions and aims to generate realistic synthetic data. The synthetic data and the performance of the deep models are usually evaluated on a per-task basis, which provides no indication of an increase in robustness, or generalization, on a wider scale. In this study, we compare the benefits and limitations of a standard statistical approach to data synthesis (weighted blending) with a popular deep generative model, the Variational Autoencoder. Two binary data sets are divided into three folds to simulate small, limited samples. Synthetic data distributions are created per fold using the two methods and then augmented into the training of two Deep Learning algorithms, a Convolutional Neural Network and a Fully-Connected Neural Network. The goal of this study is to observe the trends in learning as synthetic data are continually augmented into the training data in increasing batches. To determine the impact of each synthetic method, Principal Component Analysis and the discrete Fréchet distance are used to visualize and measure the distance between the source and synthetic distributions, and balanced accuracy, a Machine Learning metric suited to imbalanced data, is used to evaluate classification performance.
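As a rough illustration of two of the techniques named above, the sketch below shows a weighted-blending step (a random convex combination of same-class spectra, so the synthetic spectrum stays inside the convex hull of the source distribution) and a balanced-accuracy computation (the mean of per-class recalls). Function names, the Dirichlet choice of blend weights, and array shapes are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def blend_spectra(spectra, rng=None):
    """Synthesize one spectrum as a random weighted blend of
    same-class source spectra.

    `spectra` is an (n_samples, n_wavenumbers) array. Weights are
    drawn from a Dirichlet distribution so they are non-negative
    and sum to 1, i.e. the blend is a convex combination.
    (Hypothetical helper -- the weighting scheme is an assumption.)
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = rng.dirichlet(np.ones(spectra.shape[0]))
    return weights @ spectra  # shape: (n_wavenumbers,)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; robust to class imbalance."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))
```

Because the blend weights sum to 1, a blended spectrum of two source spectra is bounded channel-wise by the source values, which is one reason weighted blending is a conservative baseline relative to a generative model such as a VAE.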
