Abstract
Proteins are fundamental biological macromolecules that regulate nearly all cellular processes, and their functions are largely determined by their underlying amino acid sequences. Understanding the relationship between protein sequences and their structural and functional properties remains a major challenge in molecular biology. Although experimental techniques provide accurate insights, they are often time-consuming and expensive, which has led to increasing reliance on computational approaches for protein sequence analysis and structure prediction. Numerous techniques, including homology modelling, machine learning, deep learning, natural language processing and other artificial intelligence methods, have been developed for analysing protein sequences and predicting protein-protein interactions. However, the rapidly growing number of computational methods makes it difficult for researchers to systematically evaluate and select suitable approaches for specific biological applications. This study presents a comprehensive review of recent computational techniques for protein sequence analysis, focusing on machine learning- and deep learning-based frameworks used for predicting protein structure and interactions. It systematically categorises existing approaches by their methodological foundations, datasets and performance characteristics, and provides a comparative discussion of their advantages and limitations. Furthermore, it highlights current research gaps, challenges in large-scale protein analysis and emerging trends in AI-driven protein modelling. The review offers a structured reference for researchers and practitioners, facilitating the selection of appropriate computational techniques and supporting future advances in protein sequence analysis, disease research and drug discovery.