
AI-enabled exploration of the "dark matter" of the protein universe
1 Beijing No. 159 High School
* Author to whom correspondence should be addressed.
Abstract
The "dark matter" of the protein universe, consisting of proteins that lack structural information or functional annotation, represents a significant challenge in understanding the complexity of life. Recent breakthroughs in artificial intelligence (AI), particularly in protein structure prediction, have transformed our ability to illuminate this uncharted territory. AI-based methods such as AlphaFold and RoseTTAFold can predict protein structures with unprecedented accuracy and at unprecedented scale, while large-scale databases provide access to predicted structural models for hundreds of millions of proteins. Leveraging these AI tools and databases, researchers can uncover novel protein families, folds, and functions, and even design new proteins, paving the way for advances in basic biology, biotechnology, and medicine. This review discusses recent progress in AI-enabled exploration of the "dark matter" of the protein universe and outlines future challenges and opportunities in this field.
Keywords
protein universe, AI-driven structure prediction, protein structural and functional annotation, de novo protein design
Cite this article
Song, E.Z. (2024). AI-enabled exploration of the "dark matter" of the protein universe. Theoretical and Natural Science, 59, 85-92.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
About volume
Volume title: Proceedings of the 4th International Conference on Biological Engineering and Medical Science
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.