Abstract
The evaluation and enhancement of image aesthetics play a pivotal role in the development of visual media, impacting fields including photography, design, and computer vision. Composition, a key factor shaping visual aesthetics, significantly influences an image's vividness and expressiveness. However, existing image optimization methods face practical challenges: distortion induced by compression, imprecise object extraction, and unnatural proportions or content loss caused by cropping. To tackle these issues, this paper proposes an image aesthetic evaluation with composition and similarity (IACS) method, which combines composition aesthetics and image similarity in a single unified function. To evaluate composition aesthetics, the method calculates the distance between the main semantic line (or salient object) and the nearest rule-of-thirds line or central line. For images featuring prominent semantic lines, a modified Hough transform is used to detect the main semantic line; for images containing salient objects, a salient object detection method based on luminance channel salience features (LCSF) is applied to determine the salient object region. To evaluate similarity, edge similarity measured with the Canny operator is combined with the structural similarity index (SSIM). Furthermore, we introduce a Framework for Image Aesthetic Evaluation with Composition and Similarity-Based Optimization (FIACSO), which uses semantic segmentation and generative adversarial networks (GANs) to optimize composition while preserving the original content. Compared with prior approaches, the proposed method improves both the aesthetic appeal and the fidelity of optimized images. A subjective evaluation involving 30 participants further confirms that FIACSO outperforms existing methods in overall aesthetics, compositional harmony, and content integrity.
Beyond methodological contributions, this study also offers practical value: it supports photographers in refining image composition without losing context, assists designers in creating balanced layouts with minimal distortion, and provides computational tools to enhance the efficiency and quality of visual media production.
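As a rough illustration of the composition measure summarized above, the following minimal Python sketch computes the distance from a position along one image axis (for example, the row of a detected horizon line or the centroid coordinate of a salient object) to the nearest rule-of-thirds or central line. The abstract does not give the exact form of the unified function, so the names `composition_distance` and `unified_score`, the normalization by image extent, the linear mapping from distance to a score, and the weight `alpha` are all illustrative assumptions, not the formulation used in IACS.

```python
def composition_distance(position: float, extent: float) -> float:
    """Normalized distance from `position` to the nearest guide line.

    `position` is a pixel coordinate along one axis and `extent` is the
    image size along that axis. The guide lines are the two rule-of-thirds
    lines and the central line. Dividing by `extent` is an assumption made
    here so that images of different sizes are comparable.
    """
    if extent <= 0:
        raise ValueError("extent must be positive")
    guides = (extent / 3.0, extent / 2.0, 2.0 * extent / 3.0)
    return min(abs(position - g) for g in guides) / extent


def unified_score(comp_distance: float, similarity: float, alpha: float = 0.5) -> float:
    """Hypothetical combination of composition and similarity terms.

    Inside the image, the normalized distance to the nearest guide is at
    most 1/3 (reached at the image border), so 1 - 3 * d maps it to a
    composition score in [0, 1]. The convex weighting by `alpha` is an
    assumption for illustration only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    composition = 1.0 - 3.0 * comp_distance
    return alpha * composition + (1.0 - alpha) * similarity


if __name__ == "__main__":
    # A horizon at row 125 in a 300-pixel-high image; similarity in [0, 1]
    # would come from an edge-plus-SSIM measure computed elsewhere.
    d = composition_distance(125, 300)
    print(d, unified_score(d, similarity=0.8))
```

Under these assumptions, moving the semantic line or salient object toward a guide line raises the composition term, while the similarity term penalizes candidates that drift too far from the original image.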