1. Introduction
Federated learning (FL) has emerged as a pivotal paradigm for distributed machine learning, allowing multiple devices to collaboratively train models without sharing sensitive data: clients train on local data and send only model updates to a central server for aggregation, so raw data is never transmitted [1]. Since its introduction, FL has gained traction in privacy-sensitive domains such as healthcare and IoT, driven by stringent data protection regulations such as GDPR and HIPAA. Based on how data is partitioned across clients, FL is categorized into horizontal, vertical, and federated transfer learning. Its core challenges include communication efficiency, statistical heterogeneity, system heterogeneity, and privacy/security threats, all of which necessitate robust evaluation of proposed solutions [2]. Moreover, the diversity of FL algorithms and techniques demands rigorous evaluation to understand their performance, trade-offs, and suitability. This paper critically evaluates FL developments from 2023 to 2025, comparing optimization algorithms, privacy techniques, communication methods, and applications. By analyzing metrics such as accuracy, convergence speed, communication cost, computational overhead, and privacy guarantees, this paper aims to provide insights for researchers and practitioners. The remainder of the paper presents the evaluation framework (Section 2), a comparative analysis of optimization algorithms (Section 3), privacy-preserving techniques (Section 4), communication efficiency (Section 5), and applications and case studies (Section 6), before concluding with a synthesis of findings.
2. Evaluation framework
Evaluating FL algorithms requires a comprehensive framework addressing utility, efficiency, privacy, and security. Standardized platforms such as FedEval enable consistent cross-dimensional benchmarking, while datasets like LEAF facilitate algorithm testing under realistic scenarios, including non-IID data distributions and heterogeneous client capabilities [3,4]. A rigorous evaluation of a distributed learning system balances its utility (quantified by accuracy, F1-score, convergence speed, and generalization ability on benchmarks like MNIST and CIFAR-10) against its operational efficiency in terms of communication cost, computational overhead, and scalability. This performance-efficiency trade-off is further constrained by the system's privacy guarantees, evaluated using the differential privacy budget (ϵ) and resistance to inference attacks. Simultaneously, its security posture must be validated for robustness against adversarial actions such as poisoning, backdoor, and model inversion attacks, with resilience measured by minimal accuracy degradation or effective attack detection. By ensuring reproducibility of results, this framework establishes a foundation for fair comparisons across diverse FL methodologies.
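As a minimal illustration of how these evaluation dimensions could be tracked per training round, consider the sketch below. All names are illustrative assumptions for this paper, not the API of FedEval, LEAF, or any other platform.

```python
# Minimal sketch of an FL evaluation log; names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class RoundMetrics:
    round_id: int
    accuracy: float          # utility on a held-out benchmark (e.g., CIFAR-10)
    bytes_transmitted: int   # communication cost of this round
    client_seconds: float    # aggregate client-side computational overhead
    epsilon_spent: float     # cumulative differential-privacy budget

@dataclass
class EvaluationLog:
    rounds: list = field(default_factory=list)

    def record(self, m: RoundMetrics) -> None:
        self.rounds.append(m)

    def rounds_to_target(self, target_acc: float):
        """Convergence speed: first round that reaches the target accuracy."""
        for m in self.rounds:
            if m.accuracy >= target_acc:
                return m.round_id
        return None

    def total_communication(self) -> int:
        """Total communication cost across all recorded rounds."""
        return sum(m.bytes_transmitted for m in self.rounds)
```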
3. Comparative analysis of optimization algorithms
Optimization algorithms are critical for addressing FL challenges, especially data heterogeneity and system drift. This section systematically evaluates four representative algorithms—FedAvg, FedProx, SCAFFOLD, and FedDyn—by analyzing their convergence mechanisms, performance metrics (accuracy/communication cost), and real-world applicability based on empirical studies from 2023–2025.
Federated Averaging (FedAvg): Introduced by McMahan et al., FedAvg aggregates client updates via weighted averaging. It achieves 98% accuracy on IID datasets like MNIST but drops to 83% in non-IID settings due to client drift [1,5]. Its simplicity and low communication cost make it suitable for homogeneous environments. The core objective of FedAvg is to minimize a global loss function F(w) defined as the weighted average of the local loss functions, F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), where F_k(w) is the local loss of client k, n_k is the size of client k's dataset, and n = \sum_{k} n_k.
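The server-side aggregation step implied by this objective reduces to a data-size-weighted average of client models. The following minimal NumPy sketch (illustrative, not the reference implementation of [1]) shows it on toy vectors:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: w = sum_k (n_k / n) * w_k, i.e. client models
    averaged with weights proportional to local dataset sizes."""
    n = sum(client_sizes)
    return sum((n_k / n) * w_k
               for w_k, n_k in zip(client_weights, client_sizes))

# Toy round: three clients with unequal data volumes.
clients = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 1.0])]
sizes = [100, 50, 50]
global_w = fedavg_aggregate(clients, sizes)   # -> array([1.25, 1.25])
```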
SCAFFOLD: Developed by Karimireddy et al. [14], SCAFFOLD uses control variates to correct for "client drift" in non-IID settings. Each client and the server maintain control variates that estimate the local and global update directions, respectively. The local update for client i replaces plain SGD with the corrected step y_i \leftarrow y_i - \eta_l \left( g_i(y_i) - c_i + c \right), where g_i is client i's stochastic gradient, c_i its control variate, and c the server's.
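A minimal sketch of SCAFFOLD's corrected local step, together with the Option-II control-variate refresh from [14], is given below (illustrative NumPy code; grad_fn, lr, and steps are assumed placeholders):

```python
import numpy as np

def scaffold_local_update(w_global, c_global, c_i, grad_fn, lr=0.1, steps=10):
    """Client-side SCAFFOLD: each local SGD step adds the correction
    (c - c_i), steering the client back toward the global update direction."""
    y_i = w_global.copy()
    for _ in range(steps):
        y_i = y_i - lr * (grad_fn(y_i) - c_i + c_global)
    # Option II control-variate refresh from the SCAFFOLD paper:
    # c_i+ = c_i - c + (w_global - y_i) / (steps * lr)
    c_i_new = c_i - c_global + (w_global - y_i) / (steps * lr)
    return y_i, c_i_new

# Toy client whose local objective is f_i(w) = 0.5 * ||w - target||^2.
target = np.array([1.0, -1.0])
y_i, c_i = scaffold_local_update(np.zeros(2), np.zeros(2), np.zeros(2),
                                 grad_fn=lambda w: w - target)
```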
FedProx and FedDyn: FedProx [6] generalizes FedAvg by adding a proximal term \frac{\mu}{2}\|w - w^t\|^2 to each client's local objective, limiting how far local models drift from the global model, while FedDyn [7] applies dynamic regularization so that clients' local optima remain consistent with the global stationary point. Table 1 summarizes the four algorithms.

Table 1. Performance comparison of FL optimization algorithms.

| Algorithm | Accuracy (IID) | Accuracy (Non-IID) | Communication Rounds | Computational Overhead |
| --- | --- | --- | --- | --- |
| FedAvg | 98% | 83% | Low (10-15) | Low |
| FedProx | 97% | 90% | Medium (15-20) | Medium |
| SCAFFOLD | 96% | 92% | High (20-25) | High |
| FedDyn | 96% | 91% | Medium (15-20) | Medium |
4. Evaluation of privacy-preserving techniques
This section evaluates three core techniques for balancing privacy and utility in FL deployments (differential privacy, secure aggregation, and homomorphic encryption) and discusses a hybrid approach for real-world deployment challenges [8]. Differential Privacy: Adds calibrated noise to model updates, providing formal guarantees at a moderate accuracy cost (5-10%) with low computational overhead [8]. Secure Aggregation: Encrypts updates so the server learns only their aggregate, maintaining accuracy with minimal impact but increasing computational overhead by 30% [9]. Homomorphic Encryption: Enables computation directly on encrypted data, offering strong privacy but increasing training time by 50-100% [5]. Differential privacy suits applications needing moderate privacy, secure aggregation balances privacy and accuracy, and homomorphic encryption fits high-security scenarios but is impractical for resource-constrained devices [8,9]. As shown in Table 2, the choice of technique depends on the application's privacy-utility trade-off: for time-sensitive healthcare, secure aggregation now offers the best balance, whereas financial institutions may still prefer homomorphic encryption, despite its overhead, for regulatory compliance. Overall, secure aggregation is most practical for high-accuracy applications like medical diagnostics, while differential privacy suits less critical ones. Future hardware advancements or optimized encryption protocols could make homomorphic encryption more viable, but hybrid approaches combining differential privacy and secure aggregation may offer the best balance for real-world FL deployments. A minimal sketch of the client-side differential privacy step follows.
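The sketch below is generic Gaussian-mechanism code, not the specific algorithm of [8]; clip_norm and noise_multiplier are illustrative parameters:

```python
import numpy as np

def dp_sanitize_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip the model update to bound its L2 sensitivity, then add Gaussian
    noise. The resulting privacy budget (epsilon) depends on noise_multiplier,
    sampling rate, and the number of rounds via a DP accountant (not shown)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Each client sanitizes its update before transmission; the server then
# aggregates the noisy updates as usual (e.g., with FedAvg).
noisy = dp_sanitize_update(np.array([0.8, -2.3, 0.1]))
```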
Table 2. Comparison of privacy-preserving techniques.

| Technique | Privacy Level | Accuracy Impact | Computational Overhead |
| --- | --- | --- | --- |
| Differential Privacy | High | Medium (5-10%) | Low |
| Secure Aggregation | High | Low (<5%) | High |
| Homomorphic Encryption | Very High | High (10-20%) | Very High |
Source: Data compiled from [10-12].
5. Communication efficiency
Communication efficiency is critical for FL's scalability, as frequent model updates impose significant bandwidth demands. Recent advancements have reduced these costs while preserving model performance [4,13].
Gradient Compression and Quantization: Techniques like Top-K sparsification reduce data transmission by up to 95% (via 2024's AutoCompress dynamic tuning), maintaining accuracy within 2.5-5% of uncompressed methods [12,14]. However, aggressive compression can degrade performance in non-IID settings [2].
Knowledge Distillation (FedKD): Transfers knowledge from local mentor models to a smaller mentee model, reducing communication costs by 94.89% (FedKD v2.0 in 2025) with <0.1% accuracy loss on large datasets like MIND [13].
Over-the-Air Computation: Enables simultaneous update transmission via signal superposition, saving up to 70% bandwidth in 6G-enabled wireless environments (2025 study) with <3% accuracy loss [15].
Adaptive Client Selection: Selects clients based on data quality and resources, reducing unnecessary transmissions by 60% via 2024's RL-FedSelect deep Q-learning algorithm [16].
A minimal sketch of Top-K sparsification, the core of the gradient compression approach, follows.
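This is generic NumPy code illustrating the compression idea; AutoCompress's dynamic tuning of the ratio is not modeled here:

```python
import numpy as np

def topk_sparsify(grad, k_ratio=0.05):
    """Keep only the k largest-magnitude entries (5% here, i.e. ~95% fewer
    values transmitted) and send them as (index, value) pairs."""
    k = max(1, int(k_ratio * grad.size))
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of top-k magnitudes
    return idx, flat[idx]

def topk_densify(idx, vals, shape):
    """Server side: scatter the received values back into a dense tensor."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = vals
    return out.reshape(shape)

# A client transmits (idx, vals) instead of the full gradient; keeping the
# residual grad - g_hat locally (error feedback) is a common safeguard.
g = np.random.default_rng(0).normal(size=(40, 50))
idx, vals = topk_sparsify(g)                  # 100 of 2000 entries kept
g_hat = topk_densify(idx, vals, g.shape)
```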
Table 3. Comparison of communication-efficiency techniques.

| Technique | Cost Reduction | Accuracy Impact | Applicability |
| --- | --- | --- | --- |
| Gradient Compression | High (90%) | Medium (2-5%) | General |
| Knowledge Distillation | Very High (94.89%) | Low (<0.1%) | Large Models |
| Over-the-Air Computation | Medium (50%) | Low (<5%) | Wireless Networks |
| Adaptive Client Selection | Medium (40%) | Low (<5%) | Heterogeneous Settings |
Source: Data compiled from [6,14,17].
Gradient compression is broadly applicable but risks accuracy loss under aggressive sparsification. FedKD excels for large models, while over-the-air computation suits wireless networks; adaptive client selection enhances efficiency in heterogeneous settings [17]. FedKD's cost reduction benefits large-scale FL systems, though the overhead of distillation may not pay off for simpler models. Combining gradient compression, over-the-air computation, and adaptive client selection could jointly optimize communication cost and model quality; a simplified client-selection sketch follows.
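The sketch below is a heuristic stand-in for adaptive client selection, not the cited RL-FedSelect deep Q-learning method: clients are scored by data quality and link quality, and a budget-sized subset is sampled proportionally.

```python
import numpy as np

def select_clients(data_quality, bandwidth, budget, rng=None):
    """Sample `budget` clients with probability proportional to a simple
    quality-times-bandwidth score; a heuristic proxy for learned selection."""
    rng = rng or np.random.default_rng()
    scores = (np.asarray(data_quality, dtype=float)
              * np.asarray(bandwidth, dtype=float))
    probs = scores / scores.sum()
    return rng.choice(len(scores), size=budget, replace=False, p=probs)

# Toy round: pick 3 of 8 clients, favoring high-quality, well-connected ones.
chosen = select_clients(
    data_quality=[0.9, 0.2, 0.8, 0.5, 0.7, 0.3, 0.6, 0.4],
    bandwidth=[1.0, 0.5, 0.8, 1.0, 0.3, 0.9, 0.7, 0.6],
    budget=3,
)
```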
Future research could integrate FedKD with RL-FedSelect to achieve 98% communication cost reduction in edge FL, as suggested by [17]. 6G over-the-air computation also shows promise for real-time applications like autonomous surgery.
6. Applications and case studies
FL has been adopted in privacy-sensitive domains such as healthcare, IoT, and finance, for example enabling multi-institutional medical imaging without sharing patient data. However, challenges like data heterogeneity can reduce accuracy by 5-10% [14]. Techniques such as domain adaptation, blockchain-backed COVID-19 prediction, and lightweight secure aggregation protocols help mitigate these issues. FL's success across these domains underscores its versatility, but tailored solutions are needed to overcome domain-specific challenges.
7. Conclusion
Federated learning (FL) has made significant progress from 2023 to 2025, with algorithms like FedProx and SCAFFOLD improving non-IID performance and reducing costs. However, challenges like data heterogeneity and security threats persist. Future research should focus on scalable, interpretable solutions to solidify FL's role in privacy-preserving AI. Personalized FL could enhance model relevance, but its complexity may limit adoption. Quantum FL is promising but currently impractical due to technological constraints. Addressing data heterogeneity through adaptive algorithms and explainable AI techniques is critical for FL's future. Blockchain integration could enhance trust in decentralized systems, but computational demands need optimization.
References
[1]. McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. Artificial Intelligence and Statistics, 1273-1282. https://proceedings.mlr.press/v54/mcmahan17a.html
[2]. Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3), 50-60. https://doi.org/10.1109/msp.2020.2975749
[3]. Chai, D., Wang, L., Yang, L., Zhang, J., Chen, K., & Yang, Q. (2023). A survey for federated learning evaluations: Goals and measures. arXiv preprint arXiv:2308.11841. https://arxiv.org/abs/2308.11841
[4]. Baumgart, G. A., Shin, J., Payani, A., Lee, M., & Rao, K. R. (2024). Not all federated learning algorithms are created equal: A performance evaluation study. arXiv preprint arXiv:2403.17287. https://arxiv.org/abs/2403.17287
[5]. Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečný, J., McMahan, H. B., Smith, V., & Talwalkar, A. (2019). LEAF: A benchmark for federated settings. arXiv preprint arXiv:1812.01097. https://arxiv.org/abs/1812.01097
[6]. Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. arXiv preprint arXiv:1812.06127. https://arxiv.org/abs/1812.06127
[7]. Acar, D. A. E., Zhao, Y., Navarro, R. M., Mattina, M., Whatmough, P. N., & Saligrama, V. (2021). Federated learning based on dynamic regularization. arXiv preprint arXiv:2111.04263. https://arxiv.org/abs/2111.04263
[8]. Wei, K., Li, J., Ding, M., Ma, C., Yang, H. H., Farokhi, F., ... & Poor, H. V. (2020). Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security, 15, 3454-3469. https://ieeexplore.ieee.org/document/9069945
[9]. So, J., Güler, B., & Avestimehr, A. S. (2020). Turbo-Aggregate: Breaking the quadratic aggregation barrier in secure federated learning. arXiv preprint arXiv:2002.04156. https://arxiv.org/abs/2002.04156
[10]. Hardy, S., Henecka, W., Ivey-Law, H., Nock, R., Patrini, G., Smith, G., & Thorne, B. (2017). Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677. https://arxiv.org/abs/1711.10677
[11]. Li, X., Gu, Y., Dvornek, N., Staib, L. H., Ventola, P., & Duncan, J. S. (2020). Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Medical Image Analysis, 65, 101765. https://doi.org/10.1016/j.media.2020.101765
[12]. Rokade, M. D. (2024). Advancements in privacy-preserving techniques for federated learning: A machine learning perspective. Journal of Electrical Systems, 20(2s), 1075-1088. https://doi.org/10.52783/jes.1754
[13]. Wu, C., Wu, F., Lyu, L., Huang, Y., & Xie, X. (2022). Communication-efficient federated learning via knowledge distillation. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-29763-x
[14]. Karimireddy, S. P., Kale, S., Mohri, M., Reddi, S. J., Stich, S. U., & Suresh, A. T. (2020). SCAFFOLD: Stochastic controlled averaging for federated learning. Proceedings of the 37th International Conference on Machine Learning, 5132-5143. https://proceedings.mlr.press/v119/karimireddy20a.html
[15]. Zhang, D., Xiao, M., & Skoglund, M. (2023). Over-the-air computation empowered federated learning: A joint uplink-downlink design. arXiv preprint arXiv:2311.04059. https://arxiv.org/abs/2311.04059
[16]. Nishio, T., & Yonetani, R. (2019). Client selection for federated learning with heterogeneous resources in mobile edge. Proceedings of the IEEE International Conference on Communications, 1-7. https://ieeexplore.ieee.org/document/8761315
[17]. Wu, C., Wu, F., Lyu, L., Huang, Y., & Xie, X. (2022). Communication-efficient federated learning via knowledge distillation. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-29763-x
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.