Abstract
INTRODUCTION: Developing short-form psychological measures is essential for reducing respondent burden, saving time, and conserving resources. However, existing approaches to short-form development typically require administering the full scale and rely on factor analysis or machine learning techniques applied to response data. METHODS: This study proposes a novel, data-independent method for item reduction using transformer-based semantic embeddings. Items from the International Personality Item Pool Big-Five Factor Markers (IPIP-50) were embedded with the sentence-t5-xxl model to generate dense semantic representations. These embeddings were clustered via K-means, and representative items were selected based on their proximity to cluster centroids. RESULTS: The resulting 30-item short form preserved the original five-factor structure and demonstrated strong psychometric properties. Relative to item reduction based on Classical Test Theory and on a Genetic Algorithm, the proposed method achieved comparable levels of reliability, convergent validity, and predictive performance. DISCUSSION: These findings highlight the potential of transformer-based embedding approaches for efficient item reduction and item development. The results support the feasibility of a resource-efficient, linguistically grounded alternative to data-dependent reduction methods.
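
The selection step described in METHODS (cluster the item embeddings with K-means, then keep the items nearest each centroid) can be illustrated with a minimal sketch. It is not the study's implementation: the function names (`kmeans`, `select_items`), the number of clusters `k`, the number of items kept per cluster `per_cluster`, and the two-dimensional toy vectors are all hypothetical, and the item embeddings are assumed to have been computed beforehand (for example, with sentence-t5-xxl). K-means is written out with the standard library only so that the example is self-contained.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct items (an assumed, simple scheme).
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [None] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each item goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return centroids, labels


def select_items(vectors, k, per_cluster, seed=0):
    """Return sorted indices of the `per_cluster` items nearest each centroid."""
    centroids, labels = kmeans(vectors, k, seed=seed)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: squared_distance(vectors[i], centroids[c]))
        selected.extend(members[:per_cluster])
    return sorted(selected)


# Toy stand-ins for item embeddings: two well-separated groups of three items.
toy_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],  # items 0-2
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.3],  # items 3-5
]
short_form = select_items(toy_embeddings, k=2, per_cluster=1)
print(short_form)  # [0, 3]
```

In this toy case, items 0 and 3 are retained because each lies closest to the mean of its own group. With real item embeddings, an established implementation such as scikit-learn's `KMeans` would normally replace the hand-written loop, and the choice of `k` and `per_cluster` would determine the length of the resulting short form.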