Application of Artificial Intelligence in Drug Discovery and Development: Targeted Design and Toxicological Property Prediction

Xinran Wang

doi:10.54254/2753-8818/2025.AU27100

1. Introduction

The application of artificial intelligence (AI) in the pharmaceutical industry has advanced considerably over the past few years. Continuous advances in AI technologies have driven innovation in drug discovery and development, as well as in clinical treatment models [1]. In the early stages, AI-generated visualization systems for molecular structures and chemical properties were developed. These tools gave researchers more intuitive representations of molecular forms and laid a solid foundation for drug design. With the progression of deep learning algorithms and complex computational models, AI has achieved notable breakthroughs in drug discovery and development: by predicting molecular interactions more precisely, it significantly shortens the time required to identify candidate drugs [2]. AI has also demonstrated high efficiency in critical processes such as tumor target identification and high-throughput screening (HTS) [2,3]. Together, these developments have shifted new drug development from traditional experience-driven approaches to data-driven methods, improving the overall effectiveness of pharmaceutical research.

Currently, AI can integrate multiple types of data to optimize drug dosage and manufacturing processes, reduce research and development costs, accelerate technology transfer, and provide technological support [1]. The deep integration of AI into the end-to-end drug discovery process has become a key direction for future development. In clinical precision therapy, the fusion of AI and medical big data has given rise to a new model of personalized treatment. By drawing on comprehensive genomic sequencing databases and electronic medical record (EMR) systems, AI can integrate multidimensional data, including patient genomic features, clinical phenotypes, and treatment response feedback, to construct accurate models for predicting medication use [4-6]. In summary, the application of AI to drug target design, property prediction, and clinical practice has been widely recognized. This article focuses on two of these areas: drug target design and the prediction of pharmacotoxicological properties.

2. AI-driven strategies for cancer drug target identification and optimization

Protein three-dimensional structure analysis is fundamental to drug target design, but traditional methods are time-consuming and rely heavily on experiments, which limits drug development efficiency. In the study by Ren et al., AlphaFold predicted the structure of CDK20, and the Chemistry42 platform designed 8,918 candidate compounds against it [4]. Biological tests were then conducted on the seven most promising compounds. In the initial screening, the dissociation constant (Kd) of compound ISM042-2-001 with CDK20 was 8.9 ± 1.6 μM. In the second round of AI-driven compound generation, a more active compound, ISM042-2-048, was identified with a Kd of 210.0 ± 42.4 nM. The entire process, from target screening to inhibitor identification, took only 30 days and required the synthesis of only 13 compounds. The study is presented as the first application of AlphaFold in drug discovery, and it demonstrates the potential of predicted structures in drug design. By integrating AlphaFold with AI platforms, it provides a new framework for moving efficiently from target selection to inhibitor identification.
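To make the size of the improvement between the two rounds concrete, the reported Kd values can be compared directly. The sketch below uses only the mean Kd values quoted above and ignores their uncertainties. The conversion to a binding free energy uses the standard relation ΔG = RT ln(Kd) at an assumed temperature of 298.15 K, which is not stated in the source and is included purely for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_improvement(kd_start_molar: float, kd_end_molar: float) -> float:
    """Return how many times lower the second Kd is than the first."""
    return kd_start_molar / kd_end_molar


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from dG = RT ln(Kd), 1 M reference state.

    The default temperature is an assumption made for this illustration.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


kd_first = 8.9e-6      # ISM042-2-001: 8.9 uM (mean value from the text)
kd_second = 210.0e-9   # ISM042-2-048: 210.0 nM (mean value from the text)

fold = fold_improvement(kd_first, kd_second)
ddg = binding_free_energy(kd_second) - binding_free_energy(kd_first)

print(f"fold improvement in Kd: {fold:.1f}")
print(f"change in binding free energy: {ddg:.2f} kcal/mol")
```

On these mean values, the second-round compound binds roughly 42 times more tightly than the first hit.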

Generative drug design can explore vast chemical spaces and discover novel compounds, but molecules generated by existing methods are often impractical because of poor physicochemical properties or a lack of biological validation. To address this, Wu et al. proposed TamGen, a target-aware molecule generation method based on a chemical language model [5]. TamGen consists of three core modules: a GPT-like compound decoder (pre-trained on 10 million SMILES) responsible for molecule generation, a Transformer protein encoder that processes target binding pocket information, and a VAE contextual encoder that assists in compound optimization. On the CrossDocked2020 dataset, TamGen ranked among the top two methods on 5 metrics. It was also efficient, generating 100 compounds in just 9 seconds, 394 times faster than 3D-AR and much faster than ResGen, TargetDiff, and Pocket2Mol. Compared with 3D generation methods such as liGAN, TamGen compounds contain an average of 1.78 fused rings (close to FDA-approved drugs) and have better synthetic accessibility, a clear advantage for target-specific drug design. For Mycobacterium tuberculosis ClpP, TamGen generated 2,612 unique compounds, from which four leads were identified. Refinement produced 8,635 derivatives, with 296 tested experimentally. From a commercial library of 446,000 molecules, 159 structural analogs were identified, and five showed strong inhibitory activity (IC50 < 20 μM). In total, 14 inhibitors were identified, and the best candidate, Analog-005, achieved an IC50 of 1.9 μM.
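The throughput and screening figures quoted for TamGen can be put on a common scale with simple arithmetic. The sketch below derives a per-compound generation time, the 3D-AR runtime implied by the stated 394-fold speedup, and two funnel fractions. The derived values are illustrative consequences of the numbers in the text, not figures reported by the TamGen authors, and the helper names are hypothetical.

```python
def seconds_per_compound(n_compounds: int, total_seconds: float) -> float:
    """Average generation time per compound."""
    return total_seconds / n_compounds


def implied_runtime(total_seconds: float, speedup: float) -> float:
    """Runtime of a slower method implied by a stated speedup factor."""
    return total_seconds * speedup


def fraction(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


tamgen_seconds = 9.0  # time to generate 100 compounds, from the text

per_compound = seconds_per_compound(100, tamgen_seconds)
baseline_seconds = implied_runtime(tamgen_seconds, 394)  # implied 3D-AR time
analog_active_fraction = fraction(5, 159)     # active analogs among those identified
tested_fraction = fraction(296, 8635)         # derivatives taken to experiment

print(f"TamGen: {per_compound:.2f} s per compound")
print(f"Implied 3D-AR time for 100 compounds: {baseline_seconds / 60:.1f} min")
print(f"Active share of identified analogs: {analog_active_fraction:.1%}")
print(f"Share of derivatives tested: {tested_fraction:.1%}")
```

Read this way, a batch that TamGen produces in seconds would take 3D-AR close to an hour under the stated speedup.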

Protein-ligand binding prediction is a critical step in drug discovery, but existing deep learning models often rely on the topology of the protein-ligand bipartite network rather than on molecular features, which results in poor generalizability to novel structures. To address this, AI-Bind combines network sampling strategies with unsupervised pre-training: it balances positive and negative samples by drawing negatives from protein-ligand pairs with a shortest path distance of ≥ 7, and it pre-trains molecular embeddings on larger chemical libraries to learn more structural patterns [6]. Its VecNet model performs well in inductive tests, with an AUROC of 0.75 ± 0.032 and an AUPRC of 0.718 ± 0.029, significantly outperforming DeepPurpose (AUROC 0.61 ± 0.074) and MolTrans (AUROC 0.612 ± 0.028) [6]. In predictions for COVID-19-related proteins, 74 of the 84 top predictions were validated by docking (F1-score = 0.82), and the model can identify active binding sites such as pockets in Trim59 [6]. Compared with models that rely on topological shortcuts, AI-Bind has significant advantages in predicting binding for novel molecules and provides a high-throughput tool for drug-target screening.

ScreenDL is a deep learning framework designed for clinical precision oncology [7]. The model uses two separate branches to capture drug chemical information and tumor transcriptomic features, which are then integrated in a shared sub-network. This network predicts drug response, expressed as the half-maximal inhibitory concentration (IC50). ScreenDL uses a three-step training process: pre-training, transfer learning, and patient-specific fine-tuning. In real-world tests, the model used data from 50 patients with triple-negative breast cancer (TNBC), along with patient-derived xenografts (PDX) and patient-derived tumor organoids (PDxO). Predictive performance improved with each training step. After pre-training, ScreenDL achieved a median per-drug Pearson correlation of 0.15, compared with 0.03 for the baseline model. After transfer learning, the correlation rose to 0.39, and after patient-specific fine-tuning, it reached 0.51. This demonstrates a clear improvement in clinically relevant predictions [7].
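The evaluation metric quoted here, a median Pearson correlation per drug, can be computed as follows. This is a minimal sketch of the metric only, under the assumption that correlations are calculated separately for each drug across tumor samples and then summarized by the median. The drug names and response values are invented for illustration and do not come from the ScreenDL study.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def median_per_drug_pearson(observed, predicted):
    """Median over drugs of the observed-vs-predicted correlation."""
    return median(pearson(observed[drug], predicted[drug]) for drug in observed)


# Hypothetical responses for three drugs across three tumor samples.
observed = {
    "drug_a": [1.0, 2.0, 3.0],
    "drug_b": [1.0, 2.0, 3.0],
    "drug_c": [1.0, 2.0, 3.0],
}
predicted = {
    "drug_a": [1.0, 2.0, 3.0],   # perfectly ranked
    "drug_b": [3.0, 2.0, 1.0],   # perfectly reversed
    "drug_c": [1.0, 3.0, 2.0],   # partly correct
}

score = median_per_drug_pearson(observed, predicted)
print(f"median per-drug Pearson r: {score:.2f}")
```

Summarizing by the median keeps a few very well or very poorly predicted drugs from dominating the overall score.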

3. Advances in AI-driven tools for drug toxicology prediction

Drug-induced liver injury (DILI) is a major reason for drug development failure and withdrawal, so rapid, reliable prediction tools are critical. Tan et al. developed an AI model called DILITracer, which predicts DILI levels from brightfield images of human liver organoids (HLOs) [8]. The study included three steps: drug treatment, image collection, and model training. The researchers selected 30 drugs with varying liver toxicity levels from the FDA DILIrank database, treated HLOs with these drugs, and collected numerous brightfield images across different times and focal planes to build training and validation datasets. DILITracer uses the BEiT-V2 vision transformer to extract image features, with pre-training on approximately 700,000 cell images to enhance recognition of HLO morphological changes. The researchers added a spatial encoding module (ViT) and a temporal encoding module (Bi-LSTM) so that the model captures both spatial features and changes over time. The model can make predictions from single images and can track the development of liver toxicity using time-series data. It classifies drugs into three levels (Most-DILI, Less-DILI, and No-DILI), achieving an overall prediction accuracy of 82.34% and an accuracy of 90.16% for No-DILI drugs. This graded output is better aligned with clinical needs than a simple yes/no classification. The study shows that combining organoid models, brightfield imaging, and AI provides a fast, scalable, human-relevant method for predicting liver toxicity. Future work should integrate pharmacokinetic and toxicity data to enhance clinical utility [8].
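The difference between an overall accuracy and an accuracy for a single DILI level can be shown with a small calculation. The sketch below assumes that the per-level figure is the fraction of samples of that true level that are classified correctly. That reading is an assumption, and the labels are invented toy data rather than results from the DILITracer study.

```python
LEVELS = ("Most-DILI", "Less-DILI", "No-DILI")


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all samples whose predicted level matches the true level."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_level_accuracy(true_labels, predicted_labels):
    """For each true level, the fraction of its samples predicted correctly."""
    result = {}
    for level in LEVELS:
        pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == level]
        if pairs:  # skip levels that do not occur in the data
            result[level] = sum(t == p for t, p in pairs) / len(pairs)
    return result


# Hypothetical labels for eight organoid image series.
true_labels = ["Most-DILI"] * 2 + ["Less-DILI"] * 2 + ["No-DILI"] * 4
predicted_labels = [
    "Most-DILI", "Less-DILI",                      # one Most-DILI missed
    "Less-DILI", "Less-DILI",                      # both Less-DILI correct
    "No-DILI", "No-DILI", "No-DILI", "Less-DILI",  # one No-DILI missed
]

overall = overall_accuracy(true_labels, predicted_labels)
by_level = per_level_accuracy(true_labels, predicted_labels)
print(f"overall accuracy: {overall:.2f}")
for level, value in by_level.items():
    print(f"{level}: {value:.2f}")
```

Reporting both numbers matters for a three-level classifier, because a strong overall score can hide weak performance on the least common level.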

Di Stefano et al. developed VenomPred 2.0 [10], which aggregates data from databases such as ToxCast/Tox21 [9] and ChEMBL. It applies 4 classification algorithms, including random forest and support vector machine, to Morgan, RDKit, and PubChem chemical fingerprints, yielding 12 models for each toxicity endpoint. Performance was excellent for androgenic activity prediction, with a Matthews correlation coefficient (MCC) above 0.90. For acute oral toxicity, the average MCC was ~0.50 (best model = 0.55), and more than 75% of test set compounds were predicted correctly (accuracy = 0.78). For eye and skin irritation, the average MCC was ~0.40 (highest = 0.44 and 0.49, respectively). The best models for these two endpoints are based on the MLP algorithm. Their precision indicates that at least 65% of compounds predicted to be toxic are correctly labeled, and their specificity scores exceed 0.80, which shows high reliability in identifying harmless compounds. Both models achieved an accuracy above 0.75. VenomPred 2.0 provides visual results and uses the SHAP method to explain toxicity predictions, identifying toxicity-related structures such as the dihydrofuran moiety in AFB1 and the phenolic fragment in 6-ketoestrone. Compared with other tools, it covers a wider range of toxicity endpoints and performs better on most prediction metrics. These features offer significant advantages for toxicological prediction in drug development and help researchers evaluate small-molecule toxicity efficiently [10].
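Two quantitative points in this paragraph can be made concrete: how 4 algorithms and 3 fingerprints give 12 models per endpoint, and how MCC, precision, and specificity are obtained from a confusion matrix. The sketch below is illustrative only. The text names three algorithms (random forest, support vector machine, MLP) and treating MLP as one of the four is an assumption; the fourth is not named, so a placeholder label is used. The confusion-matrix counts are invented.

```python
import math
from itertools import product

# "fourth_algorithm" is a placeholder: the text does not name it.
ALGORITHMS = ("random_forest", "support_vector_machine", "mlp", "fourth_algorithm")
FINGERPRINTS = ("morgan", "rdkit", "pubchem")


def model_grid():
    """Every algorithm/fingerprint pairing for one toxicity endpoint."""
    return list(product(ALGORITHMS, FINGERPRINTS))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


def precision(tp, fp):
    """Share of compounds predicted toxic that are truly toxic."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """Share of truly harmless compounds predicted harmless."""
    return tn / (tn + fp)


models = model_grid()
print(f"models per endpoint: {len(models)}")

# Hypothetical counts for one model on a 100-compound test set.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"MCC: {mcc(tp, tn, fp, fn):.3f}")
print(f"precision: {precision(tp, fp):.3f}")
print(f"specificity: {specificity(tn, fp):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a stricter summary when toxic and non-toxic compounds are unevenly represented.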

4. Conclusion

Traditional drug development has long been constrained by the inefficiency of experience-driven models, with notable bottlenecks in target discovery accuracy, molecular design innovation, and forward-looking safety assessment, resulting in lengthy development cycles and substantial resource consumption. In recent years, the deep integration of AI has brought major breakthroughs to this field. AI accelerates target validation through precise analysis of protein structures, expands the design space for novel molecular scaffolds through generative models, and improves the generalization of drug activity and toxicity predictions through multimodal data integration. These advances are reshaping the workflow of drug development and driving the industry toward a data-driven, precision-oriented paradigm. In the development of drugs for complex diseases such as cancer, AI not only connects target identification efficiently to candidate compound optimization but also shows potential to surpass traditional methods in critical areas such as toxicological evaluation, providing new ways to balance drug safety and efficacy. These gains are reflected not only in improved research efficiency but also in the opening of previously inaccessible chemical spaces and biological mechanisms to exploration.

However, the application of AI in drug development still faces challenges on several fronts. Issues such as the standardized integration of multi-source data, the mechanistic interpretation of complex biological systems, and the limits of model generalization have not yet been fully resolved. Going forward, interdisciplinary integration should serve as the pathway. Through iterative algorithm development and the construction of a data ecosystem, the interpretability and reliability of models must be strengthened. This will move AI from an auxiliary tool toward a central role in decision-making, ultimately enabling intelligent, personalized, and accessible drug development and providing sustainable momentum for innovation in global health.

References

[1]. Bajwa, J., Munir, U., Nori, A. and Williams, B. (2021) Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthcare Journal, 8(2), e188-e194.

[2]. Bender, A. and Cortes-Ciriano, I. (2021) Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discovery Today, 26(4), 1040-1052.

[3]. Wu, H.C., Chang, D.K. and Huang, C.T. (2006) Targeted therapy for cancer. Journal of Cancer Molecules, 2(2), 57-66.

[4]. Ren, F., Ding, X., Zheng, M., Korzinkin, M., Cai, X., Zhu, W., Mantsyzov, A., Aliper, A., Aladinskiy, V., Cao, Z., Kong, S., Long, X., Liu, B. H. M., Liu, Y., Naumov, V., Shneyderman, A., Ozerov, I. V., Wang, J., Pun, F. W., Polykovskiy, D. A., Sun, C., Levitt, M., Aspuru-Guzik, A. and Zhavoronkov, A. (2023) AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor. Chemical Science, 14(6), 1443-1452.

[5]. Wu, K., Xia, Y., Deng, P., Liu, R., Zhang, Y., Guo, H., Cui, Y., Pei, Q., Wu, L., Xie, S., Chen, S., Lu, X., Hu, S., Wu, J., Chan, C.-K., Chen, S., Zhou, L., Yu, N., Chen, E., Liu, H., Guo, J., Qin, T. and Liu, T-Y. (2024) TamGen: drug design with target-aware molecule generation through a chemical language model. Nature Communications, 15, Article 9360.

[6]. Chatterjee, A., Walters, R., Shafi, Z., Ahmed, O. S., Sebek, M., Gysi, D., Yu, R., Eliassi-Rad, T., Barabási, A.-L. and Menichetti, G. (2023) Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nature Communications, 14(1), 1989.

[7]. Sederman, C., Di Sera, T., Qiao, Y., Huang, X., Welm, B. E., Welm, A. L. and Marth, G. (2024) Abstract 907: ScreenDL: A transfer learning framework integrating tumor omics and functional drug screening for personalized clinical drug response prediction. Cancer Research, 84(6_Supplement), 907.

[8]. Tan, S., Ding, Y., Wang, W., Rao, J., Cheng, F., Zhang, Q., Xu, T., Hu, T., Hu, Q., Ye, Z., Yan, X., Wang, X., Li, M., Xie, P., Chen, Z., Liang, G., Pu, Y., Zhang, J. and Gu, Z. (2025) Development of an AI Model for DILI-Level Prediction Using Liver Organoid Brightfield Images. Communications Biology, 8, Article 886.

[9]. Richard, A. M., Huang, R., Waidyanatha, S., Shinn, P., Collins, B. J., Thillainadarajah, I., Grulke, C. M., Williams, A. J., Lougee, R. R., Judson, R. S., Houck, K. A., Shobair, M., Yang, C., Rathman, J. F., Yasgar, A., Fitzpatrick, S. C., Simeonov, A., Thomas, R. S., Crofton, K. M., Paules, R. S., Bucher, J. R., Austin, C. P., Kavlock, R. J. and Tice, R. R. (2021) The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology. Chemical Research in Toxicology, 34(2), 189-216.

[10]. Di Stefano, M., Galati, S., Piazza, L., Granchi, C., Mancini, S., Fratini, F., Macchia, M., Poli, G. and Tuccinardi, T. (2024) VenomPred 2.0: A Novel In Silico Platform for an Extended and Human Interpretable Toxicological Profiling of Small Molecules. Journal of Chemical Information and Modeling, 64(7), 2275-2289.

Cite this article

Wang, X. (2025). Application of Artificial Intelligence in Drug Discovery and Development: Targeted Design and Toxicological Property Prediction. Theoretical and Natural Science, 137, 8-12.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of ICBioMed 2025 Symposium: AI for Healthcare: Advanced Medical Data Analytics and Smart Rehabilitation

ISBN: 978-1-80590-371-0 (Print) / 978-1-80590-372-7 (Online)

Editor: Alan Wang

Conference website: https://2025.icbiomed.org/auckland.html

Conference date: 17 October 2025

Series: Theoretical and Natural Science

Volume number: Vol.137

ISSN: 2753-8818 (Print) / 2753-8826 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
