Abstract
OBJECTIVES: The automated classification of electrocardiogram (ECG) scatter plots has significant clinical value for the rapid diagnosis of cardiac arrhythmias. However, existing methods based on convolutional neural networks (CNNs) are constrained by their local receptive fields, which limit their ability to capture the non-local contextual relationships and global features essential for classification. This study aims to design and validate a deep learning model that overcomes these limitations by learning long-range dependencies and key discriminative regions within ECG scatter plots, thereby improving the accuracy of arrhythmia classification.

METHODS: This study proposes a vision transformer (ViT) network based on token selection. The method first segments the ECG scatter plot into a sequence of patches and uses the self-attention mechanism of the transformer encoder to model global contextual information. The core innovation is a token selection module placed deep within the network, which dynamically selects the most discriminative patches (tokens) and passes only those to the final classification stage. This enables the model to focus on the regions that are decisive for diagnosis while reducing interference from redundant information.

RESULTS: The proposed model was validated on real ECG scatter plot datasets. Experimental results demonstrate that the method achieves higher classification accuracy than traditional CNN-based models.

CONCLUSION: This study establishes a vision transformer model with token selection, providing an effective and precise solution for the automated classification of ECG scatter plots. By overcoming the limitations of conventional CNN methods, the model captures both global and key local features, offering a novel approach for advancing automated arrhythmia diagnosis.
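The pipeline described under METHODS (patch segmentation, self-attention, token selection) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the selection criterion, so the sketch assumes that tokens are ranked by the attention they receive from a class token and that the top k are kept. The function names, the random weights, the 32x32 input, the 8x8 patch size, and k = 4 are all hypothetical choices made for illustration.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into flattened, non-overlapping patches."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    grid = image.reshape(h // patch, patch, w // patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v, attn

def select_tokens(tokens, attn, k):
    """Assumed criterion: keep the class token (index 0) plus the k patch
    tokens that receive the most attention from it."""
    cls_scores = attn[0, 1:]                     # class token -> each patch
    top = np.argsort(cls_scores)[::-1][:k] + 1   # +1 skips the class token slot
    keep = np.concatenate(([0], np.sort(top)))   # preserve spatial order
    return tokens[keep], keep

rng = np.random.default_rng(0)
image = rng.random((32, 32))                     # stand-in for a scatter plot
patches = patchify(image, 8)                     # 16 patches of 64 pixels
dim = 16
embed = rng.standard_normal((64, dim)) * 0.1     # random patch embedding
cls = rng.standard_normal((1, dim))              # random class token
tokens = np.concatenate([cls, patches @ embed])  # 17 tokens in total
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

encoded, attn = self_attention(tokens, wq, wk, wv)
selected, keep = select_tokens(encoded, attn, k=4)
print(selected.shape, keep)
```

In a trained model the embedding and attention weights would be learned, and the selected tokens would feed a further transformer layer or classification head. Only the selection step is the point of this sketch.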