Abstract
EEG-based subject identification is an emerging biometric approach with strong potential for secure authentication, but reliable performance requires optimization of the entire processing pipeline. The key difficulty lies in improving signal quality while preserving the subtle neural signatures that uniquely distinguish individuals. In this study, we propose a complete framework that integrates lenient preprocessing, spectral feature extraction, and ensemble classification. Using the Brain Encoding Dataset (BED), we evaluated three data variants: raw EEG recordings, signals processed with a modified Pre-processing (PREP) pipeline using relaxed thresholds, and expert-curated pre-extracted features. Features were extracted from all three variants as mel-frequency cepstral coefficients (MFCC), and classification was performed within an ensemble architecture that combined decision trees, random forests, support vector machines, and XGBoost. The experiments covered 21 subjects, 33 sessions, and twelve stimulus conditions including resting state, cognitive tasks, and visual evoked potentials. XGBoost achieved a peak accuracy of 98.00% using Visual Evoked Potential Complex stimulation at 10 Hz on cleaned data, representing a 5.3% improvement over raw signals and an 8.4% improvement over pre-extracted features. Statistical validation confirmed that these improvements are robust across all experimental conditions at ([Formula: see text]). Cross-session evaluation further demonstrated the expected temporal variability in EEG-based biometrics but showed that the proposed pipeline improves robustness compared with both raw and conventionally processed data, with Rest Closed Eyes emerging as the most stable paradigm. These findings establish a principled framework for EEG-based subject identification and provide practical guidelines for optimizing preprocessing, feature extraction, classification, and stimulus paradigms for real-world deployment with consumer-grade hardware and a system-level approach.
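
As an illustration of the spectral feature extraction step, the following minimal sketch computes MFCCs for a single EEG frame using only the Python standard library. It is not the implementation used in the study. The sampling rate, frame length, number of mel filters, number of coefficients, and the 1 to 40 Hz band are assumptions chosen for the example, as are all function names. The mel mapping is the commonly used 2595 log10(1 + f/700) form, which the abstract does not specify.

```python
import math

def hz_to_mel(f):
    # Common mel-scale mapping (assumed; the abstract does not state which variant is used).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, fs, f_min, f_max):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    hz_points = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_filters + 1))
                 for i in range(n_filters + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc_frame(frame, fs, n_filters=8, n_coeffs=6, f_min=1.0, f_max=40.0):
    """MFCCs of one frame: power spectrum -> mel filterbank -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), fs, f_min, f_max)
    # A small floor keeps the logarithm finite if a filter captures no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-12)
                    for row in bank]
    return dct2(log_energies, n_coeffs)

# Usage on a synthetic 10 Hz sinusoid (a stand-in for one EEG channel, not real data).
fs = 256                      # assumed sampling rate in Hz
frame = [math.sin(2 * math.pi * 10.0 * i / fs) for i in range(128)]
coeffs = mfcc_frame(frame, fs)
print(len(coeffs), [round(c, 3) for c in coeffs])
```

In a full pipeline, one such coefficient vector would be computed per frame and per channel, then concatenated or summarized into the feature vector passed to the classifiers. How the study aggregates frames and channels is not stated in the abstract.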