1. Introduction
Alzheimer's Disease (AD) is a serious neurodegenerative disease characterized by memory loss, cognitive decline, language impairment, and loss of daily living skills [1]. Because early symptoms are often subtle and easily overlooked, many patients are not diagnosed in time and miss the window for early treatment. Once an early diagnosis is missed, the condition often deteriorates rapidly, imposing heavy physical, psychological, and economic burdens on patients and their families. At the societal level, delayed treatment drives up costs, straining the healthcare system and hindering the rational allocation and efficient use of medical resources. Early diagnosis is therefore crucial to slowing disease progression and improving treatment outcomes.

In recent years, advances in imaging technology have made magnetic resonance imaging (MRI) and positron emission tomography (PET) the main imaging tools for the early diagnosis of AD. MRI detects structural changes in the brain, such as hippocampal atrophy, which supports early diagnosis and the differentiation of dementia subtypes. PET detects early metabolic abnormalities by tracking glucose metabolism or amyloid deposition, making it an important tool for identifying AD-related pathological changes. Multimodal image fusion (e.g., combining MRI and PET) further improves diagnostic accuracy [2]. Meanwhile, deep learning (e.g., convolutional neural networks, CNNs) has developed rapidly in medical image analysis: AI techniques can automatically analyze large volumes of imaging data and capture early disease features, showing great potential for early diagnosis.

In this study, we design a new AI architecture to improve the accuracy of early AD detection. We fuse multimodal images (MRI and PET) and introduce a self-attention mechanism (Transformer) to capture dependencies between brain regions. The main innovations are twofold: multimodal image fusion and the introduction of the self-attention mechanism, which together improve detection performance [3].
2. Related Work
2.1. Traditional Alzheimer's Disease Image Analysis Methods
Traditional AD image analysis methods primarily rely on hand-crafted feature extraction and classical machine learning models [4]. These methods depend on expert-designed feature extraction steps, such as extracting brain structure volumes or texture features from MRI images and then applying classical classification algorithms like Support Vector Machine (SVM) or Random Forest for classification. Although these methods have achieved some success on small datasets, they suffer from dependence on hand-crafted features, the need for extensive prior knowledge, and poor generalization ability. Furthermore, these methods often struggle with multimodal image fusion and complex feature extraction.
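As a concrete illustration of this classical pipeline (a minimal sketch, not the specific systems cited above), the following code trains an SVM on precomputed volumetric features; the feature files and their contents are hypothetical placeholders.

```python
# Minimal sketch of a classical AD classification pipeline:
# hand-crafted volumetric features -> SVM. Feature extraction from MRI
# (e.g., hippocampal volume) is assumed to have been done beforehand.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical precomputed features: one row per subject, columns such as
# [hippocampal volume, entorhinal thickness, ...].
X = np.load("features.npy")      # shape: (n_subjects, n_features)
y = np.load("labels.npy")        # 0 = healthy control, 1 = AD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```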
2.2. AI Applications in Alzheimer's Disease
In recent years, Artificial Intelligence (AI), particularly deep learning, has made significant progress in medical image analysis for AD [5]. Deep learning models such as Convolutional Neural Networks (CNNs) automatically extract high-dimensional features, removing the need for hand-crafted features, and have shown high accuracy and robustness in tasks such as brain image classification and segmentation. For example, U-Net and other deep networks have effectively detected brain atrophy and related pathological features when segmenting MRI images of AD patients. However, current AI models often lack interpretability and tend to perform worse when confronted with data differences between imaging modalities (such as MRI and PET). Improving model interpretability and adaptability while maintaining high accuracy therefore remains an open problem.
2.3. Multimodal Fusion and Joint Analysis
Combining multimodal images (such as MRI and PET) for joint analysis is a significant research focus [6]. This method can integrate complementary information from different imaging modalities to improve the diagnostic performance for AD. For example, MRI provides detailed information about brain structure, while PET can show brain metabolism and amyloid-beta (Aβ) deposition. By fusing these two modalities, more accurate early-stage diagnoses can be made. Additionally, multimodal fusion-based joint learning helps mitigate the issues of data scarcity and noise, enhancing both diagnostic robustness and accuracy.
3. Methodology
3.1. Dataset
This research uses the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS) datasets [7].
• ADNI: A longitudinal, multicenter research project designed to collect MRI, PET imaging, and clinical assessment data to develop biomarkers for the early detection and tracking of AD.
• OASIS: Provides multimodal neuroimaging data, including MRI, PET, and cognitive and biomarker data, covering normal aging and AD patients.
3.2. Data Preprocessing and Augmentation
Before training the model, the following preprocessing and augmentation steps were applied to the image data (a minimal code sketch follows the list):
• Denoising: Spatial domain denoising methods, such as median filtering, were used to reduce noise and improve image quality.
• Normalization: Image data was normalized to a consistent intensity range to ensure comparability across different samples.
• Segmentation: Automatic or manual segmentation techniques were used to extract relevant brain regions and remove irrelevant background information.
• Data Augmentation: Operations such as rotation, flipping, and scaling were applied to expand the dataset and improve the model's generalization ability.
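The sketch below illustrates the denoising, normalization, and augmentation steps on a single 2D slice. The exact filter sizes and augmentation parameters used in the study are not reported, so the values here are assumptions, and the segmentation step is omitted for brevity.

```python
# Representative preprocessing/augmentation sketch for one image slice.
# Parameter choices (filter size, flip probability) are assumptions.
import numpy as np
from scipy.ndimage import median_filter

def preprocess(img: np.ndarray) -> np.ndarray:
    # Denoising: median filter, a spatial-domain method as in Sec. 3.2.
    img = median_filter(img, size=3)
    # Normalization: rescale intensities to [0, 1] for cross-sample comparability.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return img

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Random horizontal flip.
    if rng.random() < 0.5:
        img = np.fliplr(img)
    # Random 90-degree rotation (a simple stand-in for arbitrary-angle rotation).
    img = np.rot90(img, k=rng.integers(0, 4))
    return img.copy()

rng = np.random.default_rng(0)
slice_ = rng.random((128, 128))          # placeholder for an MRI/PET slice
out = augment(preprocess(slice_), rng)
```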
3.3. Model Architecture
The AI model designed in this research includes the following components (a simplified code sketch follows the list):
• Convolutional Neural Network (CNN): A ResNet architecture was used, leveraging its deep feature extraction capability to capture complex patterns in brain images.
• Multimodal Joint Learning: MRI and PET image data were fused using parallel CNN branches, with each branch processing one modality’s input. The features were then fused at the feature layer to combine information from both modalities, improving diagnostic accuracy.
• Self-Attention Mechanism (Transformer): After feature fusion, a Transformer module was introduced to capture long-range dependencies between brain regions, enhancing the model’s understanding of complex patterns.
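A simplified sketch of this architecture is shown below. The study uses ResNet backbones; for brevity, small CNN branches stand in for them here, and the two fused modality features serve as the Transformer's tokens (a fuller implementation might instead tokenize spatial patches of brain regions). All layer sizes and input shapes are assumptions.

```python
# Simplified sketch of the dual-branch multimodal fusion model (Sec. 3.3).
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One modality-specific CNN branch (MRI or PET); 2D slices assumed."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, out_dim),
        )
    def forward(self, x):
        return self.net(x)

class FusionModel(nn.Module):
    def __init__(self, dim: int = 128, n_classes: int = 2):
        super().__init__()
        self.mri_branch, self.pet_branch = Branch(dim), Branch(dim)
        # Transformer encoder over the modality tokens captures
        # long-range/cross-modal dependencies after feature fusion.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)
    def forward(self, mri, pet):
        # Fuse at the feature level: stack each modality as one token.
        tokens = torch.stack([self.mri_branch(mri), self.pet_branch(pet)], dim=1)
        fused = self.encoder(tokens).mean(dim=1)   # pool over tokens
        return self.head(fused)

model = FusionModel()
logits = model(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128))
```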
3.4. Loss Function and Optimization
• Loss Function: The cross-entropy loss function was used to measure the difference between predicted and true labels.
• Optimization Method: The Adam optimizer was used, with an adaptive learning rate adjustment mechanism to accelerate model convergence and improve training efficiency (a minimal training step is sketched below).
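A minimal training step consistent with these choices might look as follows, reusing the FusionModel instance from the previous sketch; the learning rate is an assumed value, since the paper tunes it by grid search.

```python
# Minimal training step: cross-entropy loss + Adam (Sec. 3.4).
# `model` is the FusionModel instance from the previous sketch.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr

def train_step(mri, pet, labels):
    optimizer.zero_grad()
    loss = criterion(model(mri, pet), labels)  # labels: LongTensor of class ids
    loss.backward()
    optimizer.step()
    return loss.item()
```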
4. Experiments
4.1. Experimental Setup
The experiment was conducted using the ADNI dataset, which includes MRI and PET imaging data. The dataset was split as follows (a code sketch of the split follows the list):
• Training Set: 70% of the data, used for model training.
• Validation Set: 15% of the data, used for hyperparameter tuning and preventing overfitting.
• Test Set: 15% of the data, used for evaluating the model's generalization performance.
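A stratified split along these proportions can be sketched as follows. Splitting at the subject level is assumed here to prevent slices from one patient leaking across sets; the label file is a hypothetical placeholder.

```python
# Stratified 70/15/15 split of subject indices (Sec. 4.1).
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.load("subject_labels.npy")      # hypothetical per-subject labels
idx = np.arange(len(labels))

train_idx, rest_idx = train_test_split(
    idx, test_size=0.30, stratify=labels, random_state=42)
val_idx, test_idx = train_test_split(
    rest_idx, test_size=0.50, stratify=labels[rest_idx], random_state=42)
# -> 70% train, 15% validation, 15% test
```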
Evaluation metrics include (see the computation sketch after this list):
• Accuracy: The proportion of correctly classified samples out of total samples.
• Sensitivity: The proportion of AD patients correctly identified.
• Specificity: The proportion of healthy individuals correctly identified.
• AUC (Area Under the Curve): A comprehensive metric for evaluating the model’s discrimination ability.
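These four metrics can be computed from test-set predictions as in the sketch below, where y_true and y_score are hypothetical arrays of ground-truth labels and predicted AD probabilities.

```python
# Computing the four reported metrics from test-set predictions (Sec. 4.1).
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_score, threshold=0.5):
    y_pred = (y_score >= threshold).astype(int)   # 1 = AD, 0 = healthy
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # AD patients correctly identified
        "specificity": tn / (tn + fp),   # healthy subjects correctly identified
        "auc":         roc_auc_score(y_true, y_score),
    }
```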
4.2. Model Training and Tuning
Training details (an early-stopping sketch follows the list):
• Hyperparameter Tuning: Grid search was used to optimize hyperparameters such as learning rate, batch size, and regularization parameters.
• Data Augmentation: Data augmentation techniques such as rotation, translation, scaling, and flipping were applied to increase data diversity and improve model robustness.
• Early Stopping: The validation loss was monitored, and if there was no improvement after several rounds, training was stopped early to prevent overfitting.
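The early-stopping logic can be sketched as follows. The epoch budget and patience values are assumptions, and the two callables are placeholders for the actual training and validation passes.

```python
# Early-stopping skeleton (Sec. 4.2): stop when validation loss has not
# improved for `patience` consecutive epochs.
def fit_with_early_stopping(train_one_epoch, validate,
                            max_epochs=200, patience=10):
    """train_one_epoch() runs one pass over the training set;
    validate() returns the mean validation loss."""
    best_val, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0   # checkpoint could be saved here
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f"early stop at epoch {epoch}")
                break
    return best_val
```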
4.3. Experimental Results
Performance on the test set:
• Accuracy: 92.5%
• Sensitivity: 90.8%
• Specificity: 93.7%
• AUC: 0.95
Comparison with traditional methods:
• Traditional CNN Model: Accuracy = 85.3%, AUC = 0.88
• Classic Machine Learning Methods (e.g., SVM): Accuracy = 80.2%, AUC = 0.82
The proposed multimodal fusion model outperforms traditional methods in all metrics, particularly in AUC, demonstrating superior discrimination ability.
4.4. Visualization Analysis
To help clinicians understand the model’s decision-making process, Grad-CAM was used to generate heatmaps that highlight the brain regions the model focuses on. For AD patients, the heatmap showed the model focusing on regions like the hippocampus and entorhinal cortex, which are known to be affected in AD.
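A minimal Grad-CAM sketch over one CNN branch of the model from Section 3.3 is shown below. Hooks capture the activations and gradients of a chosen convolutional layer; the choice of target layer and the input shapes are assumptions, not the study's exact configuration.

```python
# Minimal Grad-CAM sketch (Sec. 4.4), reusing `model` from the Sec. 3.3 sketch.
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, mri, pet, class_idx):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))            # save layer activations
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))     # save output gradients
    logits = model(mri, pet)
    model.zero_grad()
    logits[:, class_idx].sum().backward()
    h1.remove(); h2.remove()
    # Weight each channel by its spatially averaged gradient, then ReLU.
    w = grads["g"].mean(dim=(2, 3), keepdim=True)    # (B, C, 1, 1)
    cam = F.relu((w * acts["a"]).sum(dim=1))         # (B, H, W)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return cam   # upsample to image size before overlaying on the scan

# Target layer: second conv of the MRI branch in the sketch above (an assumed choice).
cam = grad_cam(model, model.mri_branch.net[3],
               torch.randn(1, 1, 128, 128), torch.randn(1, 1, 128, 128),
               class_idx=1)
```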
5. Discussion
5.1. Performance Analysis
With an accuracy of 92.5% and an AUC of 0.95, the proposed multimodal fusion model outperformed conventional CNN models and standard machine learning methods in early AD detection [8]. Two factors drive this improvement. First, fusing MRI and PET data allows the model to exploit both structural and functional information, strengthening its discriminative capacity. Second, the self-attention mechanism captures long-range dependencies across brain regions, improving the model's grasp of complex patterns. Despite this strong performance, the model has several limitations. One is its dependence on data quality: because performance relies heavily on high-quality imaging data, noise or artifacts may degrade diagnostic accuracy. Another is computational cost: the self-attention mechanism and multimodal fusion substantially increase model complexity, demanding considerable hardware resources.
5.2. Model Interpretability
To enhance model interpretability, Grad-CAM was employed to generate heatmaps indicating the brain regions the model attends to during diagnosis [9]. The heatmaps showed that the model focused predominantly on the hippocampus and entorhinal cortex, regions typically severely affected by AD. This visualization offers clinicians insight into the model's decision-making process, increasing confidence in AI-driven diagnoses and supporting smoother integration of AI into clinical practice for diagnosing neurodegenerative diseases such as AD.
5.3. Application Challenges and Limitations
Although the model performs well, several issues remain for practical deployment. First, the model relies heavily on diverse, high-quality data, whereas real-world clinical data often contain bias, missing values, and noise that may degrade performance [10]. Second, the self-attention mechanism and multimodal fusion add computational complexity that may limit use in resource-constrained settings. Third, further validation is needed to determine whether the model generalizes across populations, scanning technologies, and clinical contexts. Several improvements could address these issues: data augmentation and transfer learning can improve adaptability to different data distributions; model compression and acceleration strategies such as pruning and quantization can reduce computational cost; and cross-domain validation across clinical settings and demographics can help ensure robustness and generalizability.
6. Conclusion
In this study, an AI model based on multimodal fusion and a self-attention mechanism was designed for the early detection of Alzheimer's disease. The model fuses MRI and PET image data to capture the complex relationships between brain regions, improving diagnostic accuracy, and Grad-CAM was used to enhance interpretability, providing clinicians with effective decision support.

Despite its strong experimental performance, the model still faces challenges in practical applications. It requires high data quality and diversity, while real clinical data may be noisy, incomplete, or biased, which can degrade performance. Multimodal fusion and self-attention increase computational complexity, potentially limiting deployment in resource-constrained environments, and generalizability to different populations, scanning devices, and clinical environments remains to be verified. Planned improvements include data augmentation and transfer learning to improve adaptability to different data distributions, model compression techniques such as pruning and quantization to reduce computational complexity, and cross-domain validation in different populations and clinical environments to ensure generalizability and robustness.

Future research can proceed in three directions: first, expanding clinical datasets to validate performance and improve generalization on larger, more diverse data; second, exploring cross-modal data fusion, such as incorporating genomic data and clinical records, to further improve diagnostic accuracy; and third, developing real-time diagnostic systems that can be integrated into clinical workflows to help physicians make rapid and accurate AD diagnoses. With these improvements, the model is expected to play a greater role in clinical practice and support the early detection and intervention of Alzheimer's disease.
References
[1]. World Health Organization. (2021). Dementia. Retrieved from https://www.who.int/news-room/fact-sheets/detail/dementia
[2]. Alzheimer's Disease International. (2019). World Alzheimer Report 2019: Attitudes to dementia. Retrieved from https://www.alzint.org/
[3]. Brodaty, H., & Donkin, M. (2009). Family caregivers of people with dementia. International Psychogeriatrics, 21(3), 317-332. doi:10.1017/S1041610209008375
[4]. McKhann, G. M., Knopman, D. S., Chertkow, H., et al. (2011). The diagnosis of dementia due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups. Alzheimer's & Dementia, 7(3), 263-269. doi:10.1016/j.jalz.2011.03.005
[5]. Sperling, R. A., Aisen, P. S., Beckett, L. A., et al. (2011). Toward defining the preclinical stages of Alzheimer's disease: Recommendations from the National Institute on Aging-Alzheimer's Association workgroups. Alzheimer's & Dementia, 7(3), 280-292. doi:10.1016/j.jalz.2011.03.003
[6]. Frisoni, G. B., Fox, N. C., Jack, C. R., et al. (2010). The clinical use of structural MRI in Alzheimer disease. Nature Reviews Neurology, 6(2), 67-77.
[7]. Jack, C. R., Knopman, D. S., Jagust, W. J., et al. (2010). Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. The Lancet Neurology, 9(1), 119-128.
[8]. Liu, M., Cheng, D., Wang, K., & Wang, Y. (2018). Multi-modality cascaded convolutional neural networks for Alzheimer's disease diagnosis. Neuroinformatics, 16(3-4), 295-308.
[9]. Zhang, D., Wang, Y., Zhou, L., Yuan, H., & Shen, D. (2011). Multimodal classification of Alzheimer's disease and mild cognitive impairment. NeuroImage, 55(3), 856-867.
[10]. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 618-626).
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.