Abstract
Artificial intelligence (AI) text detection tools are considered a means of preserving the integrity of scholarly publication by identifying whether a text is written by humans or generated by AI. This study evaluates three popular tools (GPTZero, ZeroGPT, and DetectGPT) through two experiments: first, distinguishing human-written abstracts from those generated by ChatGPT o1 and Gemini 2.0 Pro Experimental; second, evaluating AI-assisted abstracts where the original text has been enhanced by these large language models (LLMs) to improve readability. Results reveal notable trade-offs in accuracy and bias, disproportionately affecting non-native speakers and certain disciplines. This study highlights the limitations of detection-focused approaches and advocates a shift toward ethical, responsible, and transparent use of LLMs in scholarly publication.