Abstract
Self-harm is a growing public health problem with high prevalence among adolescents. It can also indicate various mental health disorders (e.g., depression). Recently, various computational methods have used user-generated social media data to study and detect self-harm, framing it as a text classification problem. In this context, previous work has shown that personal statements (phrases containing first-person pronouns) carry valuable information for modeling author profiles, including some mental health conditions. Motivated by these findings, this paper examines the relevance of personal statements for self-harm detection on social media. We adapt approaches that pay special attention to the words in such statements to reveal user characteristics, specifically self-harming behavior. These approaches currently assign equal importance to all words in personal statements, without distinguishing which are more closely associated with the author's personal context. We therefore introduce a novel weighting factor that uses the proximity between a word and personal pronouns to quantify the word's relevance to the task. This weighting factor, inspired by findings from author profiling and depression detection studies, is evaluated here for the first time in the context of self-harm detection. Our experimental results show significant improvements over state-of-the-art self-harm detection methods, including transformer-based methods and language models pretrained for mental healthcare. This refined approach also surpasses the previous weighting factor in its application to depression detection, by a margin of more than 3.9%.