Abstract
This data article presents the CubeSat Cybersecurity Dataset for Intrusion Detection (CuCD-ID), a collection of labelled command and telemetry data designed to support machine learning-based security research for space systems. The data were generated in a high-fidelity software-in-the-loop environment using NASA's Operational Simulator for Small Satellites (NOS3) running the core Flight System (cFS). Telemetry was captured across five scripted scenarios: one nominal case and four adversarial tactics aligned with the Space Attack Research and Tactic Analysis (SPARTA) framework, namely command flooding, false data injection, storage exhaustion, and defence impairment. All scenarios were driven by commands issued from the COSMOS v4 ground station software. The repository contains two primary tabular datasets in Comma-Separated Values (CSV) format: a raw, balanced dataset with 25,000 records and 31 features, and an augmented, noise-injected dataset with 22,465 records and 23 features. Each record contains features parsed from Consultative Committee for Space Data Systems (CCSDS) packet headers or engineered from a 20-second sliding window, alongside system-level metrics and a numeric class label. The augmented dataset incorporates nine documented noise categories that emulate plausible in-orbit disturbances and are intended to improve model robustness against benign variability: White Noise, Analog Outliers, Gaps, Trends, Signal Shifts, Frequency Changes, Sensor Dropout, Magnitude Warping, and Window Time Warping. The dataset is suitable for developing and benchmarking supervised and unsupervised intrusion detection methods, including on-board and Tiny Machine Learning applications. All COSMOS v4 scripts used to generate the scenarios are also provided to support reproducibility and to enable extension of the data collection.
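As an illustration of the two feature sources named above, the following minimal sketch decodes the standard 6-byte CCSDS Space Packet primary header and counts packets in a trailing 20-second window. It is not the pipeline used to build CuCD-ID: the function names, the dictionary keys, and the choice of a packet count as the windowed feature are assumptions made for this example, and the dataset's actual column names and window definition may differ.

```python
import struct
from collections import deque


def parse_ccsds_primary_header(packet: bytes) -> dict:
    """Decode the 6-byte CCSDS Space Packet primary header.

    Key names are hypothetical; they are not the dataset's column names.
    """
    if len(packet) < 6:
        raise ValueError("a CCSDS primary header needs at least 6 bytes")
    # Three big-endian 16-bit words: identification, sequence control, length.
    ident, seq_ctrl, length = struct.unpack(">HHH", packet[:6])
    return {
        "version": ident >> 13,              # 3 bits
        "packet_type": (ident >> 12) & 0x1,  # 0 = telemetry, 1 = telecommand
        "sec_hdr_flag": (ident >> 11) & 0x1, # 1 = secondary header present
        "apid": ident & 0x07FF,              # 11-bit application process ID
        "seq_flags": seq_ctrl >> 14,         # 2 bits
        "seq_count": seq_ctrl & 0x3FFF,      # 14 bits
        # The length field stores (octets in the packet data field) - 1.
        "data_length": length + 1,
    }


def packets_in_window(timestamps, window_s=20.0):
    """For each arrival time t (sorted, in seconds), count the packets
    whose arrival time lies in the trailing window (t - window_s, t]."""
    window = deque()
    counts = []
    for t in timestamps:
        window.append(t)
        # Drop arrivals that have fallen out of the trailing window.
        while window[0] <= t - window_s:
            window.popleft()
        counts.append(len(window))
    return counts


# Example: a telecommand header with APID 6, followed by a window count.
header = parse_ccsds_primary_header(bytes([0x18, 0x06, 0xC0, 0x01, 0x00, 0x01]))
counts = packets_in_window([0.0, 5.0, 10.0, 25.0, 30.0])
```

A windowed packet count of this kind is one plausible signal for a command flooding scenario, since a burst of commands raises the count well above its nominal level; other windowed statistics could be derived in the same way.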