Abstract
Multiply-Accumulate (MAC) units are essential components in real-time Digital Signal Processing (DSP) applications such as image filtering, audio processing, and neural networks, where speed and energy efficiency are critical. This work presents a scalable, automated Hardware Description Language (HDL) generation framework for efficient MAC unit design, enabling faster integration into System-on-Chip (SoC) architectures. A Python automation script generates synthesizable Verilog code for user-defined bit-widths and computation cycles. The framework integrates a Grouping and Decomposition (GD) multiplier with a carry-skip adder, increasing computational speed and reducing energy consumption through parallelized data processing. The generated Verilog code is functionally verified using Cadence® NCLaunch, while synthesis and physical implementation are performed using the Cadence® Genus and Innovus tools across multiple technology nodes. Experimental evaluation shows that the 8-bit GD-based MAC implemented in 180 nm technology achieves 14.96× lower power, a 9.86× improvement in Power Delay Product (PDP) and Energy per Operation (EOP), and a 6.52× improvement in Energy Delay Product (EDP) compared to existing designs. Likewise, the 16-bit MAC in 90 nm technology achieves a 70.08% reduction in delay and a 91.2% improvement in EDP. Overall, the proposed framework delivers a scalable, energy-efficient, and automation-driven solution for high-performance MAC design.
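To make the idea of script-driven HDL generation concrete, the following is a minimal sketch of how a Python function might emit a parameterized Verilog MAC module. It is not the framework's actual script: the function name `generate_mac_verilog`, the module and port names, and the accumulator sizing rule are all hypothetical, and the behavioral `*` and `+` operators stand in for the GD multiplier and carry-skip adder, which are not reproduced here.

```python
def generate_mac_verilog(width: int = 8, cycles: int = 4,
                         name: str = "mac_unit") -> str:
    """Return Verilog text for a simple behavioral MAC (illustrative only).

    width  -- operand bit-width (user-defined)
    cycles -- number of accumulation cycles the accumulator must absorb
    name   -- hypothetical module name
    """
    if width < 1 or cycles < 1:
        raise ValueError("width and cycles must be positive integers")

    # Each unsigned product needs 2*width bits; summing `cycles` of them
    # needs ceil(log2(cycles)) extra guard bits to avoid overflow.
    guard_bits = (cycles - 1).bit_length()
    acc_width = 2 * width + guard_bits

    return f"""\
module {name} (
    input  wire clk,
    input  wire rst,
    input  wire [{width - 1}:0] a,
    input  wire [{width - 1}:0] b,
    output reg  [{acc_width - 1}:0] acc
);
    // Assumption: behavioral multiply/add used as placeholders for the
    // GD multiplier and carry-skip adder described in the text.
    always @(posedge clk) begin
        if (rst)
            acc <= {acc_width}'d0;
        else
            acc <= acc + a * b;
    end
endmodule
"""


if __name__ == "__main__":
    # Emit an 8-bit MAC sized for 4 accumulation cycles.
    print(generate_mac_verilog(width=8, cycles=4))
```

Calling the function with different `width` and `cycles` values yields a new module with matching port and accumulator widths, which is the kind of regeneration step a template-based flow would hand to simulation and synthesis.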