Abstract
Malware has become harder to trace as attackers use obfuscation, polymorphism, and automated generation of near-identical variants. Security software must therefore not only detect malicious files but also identify their families and specific variants to support effective analysis and correlation. In this paper, we present a three-level deep learning architecture that performs malware-versus-benign detection, malware family classification, and subfamily assignment based solely on grayscale images extracted from Windows PE executable files. Each file is statically and dynamically analyzed and then represented as a normalized 224 × 224 grayscale image. The labelled dataset consists of benign samples, the five most prevalent malware families, and 33 subfamilies. We compare three CNN-based hybrid models under a common multi-output framework: a CNN with a Temporal Convolutional Network (TCN) head, a CNN with a Capsule Network (CapsNet) block, and a CNN with a Bidirectional LSTM (BiLSTM) layer. A single forward pass yields predictions for all three levels of the classification hierarchy. Experimental results indicate that CNN+TCN reaches 99% binary accuracy, 98% family accuracy, and 94% subfamily accuracy, while CNN+CapsNet reaches 100%, 97%, and 93%, and CNN+BiLSTM reaches 100%, 98%, and 94%, respectively.
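To make the image representation concrete, the following minimal sketch shows one common way to turn a byte sequence into a normalized 224 × 224 grayscale image. It is an illustration under stated assumptions, not the pipeline used in this paper: the function name `bytes_to_grayscale`, the square row-major layout, the zero padding, and the nearest-neighbour resampling are all hypothetical choices, and the sketch covers only raw bytes, not the output of the static and dynamic analysis.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, size: int = 224) -> List[List[float]]:
    """Map raw bytes to a size x size grayscale image with values in [0, 1].

    Hypothetical helper for illustration; the layout and resampling
    choices here are assumptions, not the paper's method.
    """
    if not data:
        raise ValueError("input must contain at least one byte")

    # Smallest square grid that holds every byte: side = ceil(sqrt(len(data))).
    side = math.isqrt(len(data) - 1) + 1

    # Zero-pad the tail so the bytes fill the grid in row-major order.
    padded = data + bytes(side * side - len(data))

    # Nearest-neighbour resample to the target resolution and
    # scale each byte value (0..255) into [0, 1].
    image = []
    for r in range(size):
        src_r = r * side // size
        row = []
        for c in range(size):
            src_c = c * side // size
            row.append(padded[src_r * side + src_c] / 255.0)
        image.append(row)
    return image
```

For example, `bytes_to_grayscale(open(path, "rb").read())` would return a 224-row list of 224 floats for a file at a caller-supplied `path`; in practice an array library and a smoother interpolation would likely replace the nested loops.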