A platform for connecting social media data to domain-specific topics using large language models: an application to student mental health

利用大型语言模型将社交媒体数据与特定领域主题连接起来的平台:一个应用于学生心理健康的案例

阅读:1

Abstract

OBJECTIVES: To design a novel artificial intelligence-based software platform that allows users to analyze text data by identifying various coherent topics and parts of the data related to a specific research theme-of-interest (TOI). MATERIALS AND METHODS: Our platform uses state-of-the-art unsupervised natural language processing methods, building on top of a large language model, to analyze social media text data. At the center of the platform's functionality is BERTopic, which clusters social media posts, forming collections of words representing distinct topics. A key feature of our platform is its ability to identify whole sentences corresponding to topic words, vastly improving the platform's ability to perform downstream similarity operations with respect to a user-defined TOI. RESULTS: Two case studies on mental health among university students are performed to demonstrate the utility of the platform, focusing on signals within social media (Reddit) data related to depression and their connection to various emergent themes within the data. DISCUSSION AND CONCLUSION: Our platform provides researchers with a readily available and inexpensive tool to parse large quantities of unstructured, noisy data into coherent themes, as well as identifying portions of the data related to the research TOI. While the development process for the platform was focused on mental health themes, we believe it to be generalizable to other domains of research as well.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。