
High dimensional sports statistics and machine learning in NBA
- 1 Chinese University of Hong Kong
* Author to whom correspondence should be addressed.
Abstract
This study examines the intersection of high-dimensional statistics and machine learning in sports analytics, focusing on real-time prediction of NBA game outcomes. We combine a data processing pipeline with machine learning and deep learning models to improve predictive accuracy and real-time performance across a range of game scenarios. Bayesian statistical methods are used to quantify prediction uncertainty, which supports robust and interpretable models. The models include traditional machine learning methods, such as Random Forest and Logistic Regression, alongside deep learning architectures, including CNNs, RNNs, LSTMs, and Transformer networks. The preprocessing pipeline handles missing values and outliers, enforces data consistency, and applies feature selection and dimensionality reduction methods such as Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA). Real-time data streaming with Apache Kafka and distributed storage with Apache Cassandra provide high availability, scalability, and efficient handling of large data volumes. The results highlight the potential of integrating high-dimensional statistics and deep learning in sports analytics to deliver deeper insights, more accurate predictions, and real-time analysis, and they point toward future applications in the field.
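The abstract does not say which Bayesian method is used to quantify prediction uncertainty. As a minimal sketch of the general idea only, and not the study's model, the following uses a conjugate Beta-Binomial update for a single team's win probability. The function names, the uniform Beta(1, 1) prior, and the 7-3 record are all hypothetical choices made for illustration.

```python
import math


def beta_posterior(wins, losses, prior_a=1.0, prior_b=1.0):
    """Return the (a, b) parameters of the Beta posterior over a win probability.

    Assumes a Beta(prior_a, prior_b) prior and independent game outcomes;
    both are simplifying assumptions, not claims about the paper's method.
    """
    return prior_a + wins, prior_b + losses


def posterior_summary(a, b):
    """Return the mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# Hypothetical record: 7 wins and 3 losses under a uniform prior.
a, b = beta_posterior(wins=7, losses=3)
mean, std = posterior_summary(a, b)
print(f"posterior mean={mean:.3f}, std={std:.3f}")
```

The standard deviation is the uncertainty estimate here: it shrinks as more games are observed, so a prediction based on ten games is reported with a wider spread than one based on a full season.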
Keywords
high-dimensional statistics, machine learning, sports analytics, deep learning, Bayesian method
Cite this article
Zhu, Z. (2024). High dimensional sports statistics and machine learning in NBA. Advances in Engineering Innovation, 11, 78-94.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.