Abstract
This work explores methods of visual communication based on generative artificial intelligence in the context of new media. It proposes a model for automatic image generation and recognition that integrates the Conditional Generative Adversarial Network (CGAN) with the Transformer algorithm. The generator takes a noise vector and a conditional variable as inputs; a Transformer module is then incorporated, whose multi-head self-attention mechanism enables the model to establish complex relationships among different data points. These representations are further refined through linear transformations and activation functions to enhance the learned features. In this way, the self-attention mechanism captures long-range dependencies within images, facilitating the generation of high-quality images that satisfy the given conditions. Evaluation shows that the proposed model reaches an accuracy of 95.69%, exceeding the baseline Generative Adversarial Network (GAN) by more than 4%. In addition, the model achieves a Peak Signal-to-Noise Ratio (PSNR) of 33 dB and a Structural Similarity Index (SSIM) of 0.83, indicating higher image generation quality and recognition accuracy than the baseline. The proposed model therefore delivers high recognition and prediction accuracy for generated images together with improved image quality, promising significant application value for visual communication in the new media era.
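To make the described generator pipeline concrete, the following is a minimal NumPy sketch of its forward pass: noise and a conditional variable are concatenated, projected into a token sequence, passed through multi-head self-attention (so tokens can model pairwise dependencies), and refined by a linear transformation with an activation. All dimensions, function names, and the use of random matrices in place of learned weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, n_heads, rng):
    """Scaled dot-product attention per head; tokens: (seq_len, d_model)."""
    seq_len, d_model = tokens.shape
    d_head = d_model // n_heads
    out = np.zeros_like(tokens)
    for h in range(n_heads):
        # Random projections stand in for learned Q/K/V weights (assumption).
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
                      for _ in range(3))
        Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
        # Attention matrix relates every token to every other token,
        # which is how long-range dependencies are captured.
        attn = softmax(Q @ K.T / np.sqrt(d_head))
        out[:, h * d_head:(h + 1) * d_head] = attn @ V
    return out

def generator_forward(z, cond_onehot, rng,
                      seq_len=8, d_model=32, n_heads=4, img_dim=64):
    """Conditional generator sketch: noise + condition -> image vector."""
    x = np.concatenate([z, cond_onehot])                 # noise + conditional variable
    W_in = rng.standard_normal((x.size, seq_len * d_model)) / np.sqrt(x.size)
    tokens = (x @ W_in).reshape(seq_len, d_model)        # project to token sequence
    tokens = tokens + multi_head_self_attention(tokens, n_heads, rng)  # residual
    W1 = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    tokens = np.maximum(tokens @ W1, 0.0)                # linear transform + ReLU
    W_out = rng.standard_normal((seq_len * d_model, img_dim)) / np.sqrt(seq_len * d_model)
    return np.tanh(tokens.reshape(-1) @ W_out)           # image values in (-1, 1)

rng = np.random.default_rng(0)
z = rng.standard_normal(16)       # noise vector (dimension assumed)
c = np.eye(10)[3]                 # one-hot conditional variable (10 classes assumed)
img = generator_forward(z, c, rng)
```

In a trained CGAN, the random matrices above would be parameters updated adversarially against a discriminator; the sketch only illustrates how conditioning and self-attention fit into the generator's data flow.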