
Forecasting red wine quality: A comparative examination of machine learning approaches
- 1 Art and Science, University of Rochester, Rochester, 14627, United States
* Author to whom correspondence should be addressed.
Abstract
This study predicts red wine quality with machine learning algorithms, focusing on the influence of alcohol content, sulphates, total sulfur dioxide, and citric acid. The dataset, which describes Portuguese "Vinho Verde" red wines and dates from 2009, was relabeled into two classes: low quality (ratings 1-5) and high quality (ratings 6-10). A correlation heatmap indicated that the four chosen variables were strongly correlated with wine quality, which motivated their inclusion in the analysis. Four machine learning techniques were applied: Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, and Naive Bayes. Each was trained and evaluated using performance metrics and graphical visualizations, with several different proportions of the data assigned to training and testing. Logistic Regression achieved an accuracy of 72.08%, and KNN slightly surpassed it at 74%. The Decision Tree achieved the highest accuracy, 74.7%, while Naive Bayes performed worst at 60.2%. In this comparison the Decision Tree performed best, which makes it a viable tool for future wine quality prediction. The ability to predict wine quality has implications for wine production, marketing, customer satisfaction, and quality control. It can help identify the factors that contribute to high-quality wine, optimize production processes, refine marketing strategies, improve customer service, and flag substandard wines before they reach consumers, thereby protecting a winery's brand reputation.
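The workflow summarized in the abstract (binary relabeling of the quality ratings, a train/test split, and an accuracy comparison of four classifiers) can be illustrated with a minimal sketch. It is not the author's code. It assumes scikit-learn and NumPy, uses synthetic placeholder data instead of the real wine measurements, and picks hypothetical settings that the abstract does not state: a 70/30 split, feature scaling, `n_neighbors=5`, `max_depth=5`, and the Gaussian variant of Naive Bayes. The feature names follow the four variables named in the abstract. The accuracies it prints describe the synthetic data only and are not comparable to the reported results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# The four predictors named in the abstract (column order is an assumption).
FEATURES = ["alcohol", "sulphates", "total sulfur dioxide", "citric acid"]


def binarize_quality(ratings):
    """Map ratings 1-5 to 0 (low quality) and 6-10 to 1 (high quality)."""
    return (np.asarray(ratings) >= 6).astype(int)


def make_placeholder_data(n_samples=600, seed=0):
    """Synthetic stand-in for the wine table; NOT the real measurements."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, len(FEATURES)))
    # Hypothetical rule: the rating rises with the first two columns plus noise.
    noise = rng.normal(scale=0.8, size=n_samples)
    score = 5.5 + 0.8 * X[:, 0] + 0.5 * X[:, 1] + noise
    ratings = np.clip(np.rint(score), 1, 10).astype(int)
    return X, ratings


def compare_models(X, y, test_size=0.3, seed=0):
    """Fit each classifier on the training split; return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Hyperparameters and scaling are illustrative assumptions.
    models = {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "Naive Bayes": GaussianNB(),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


X, ratings = make_placeholder_data()
y = binarize_quality(ratings)
accuracies = compare_models(X, y)
for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

To try other train/test proportions, as the study did, call `compare_models` with a different `test_size`. To run it on real data, replace `make_placeholder_data` with a loader that returns the four feature columns and the raw quality ratings.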
Keywords
red wine quality, Logistic Regression, decision tree, Naive Bayes, machine learning
Cite this article
Zhan, B. (2024). Forecasting red wine quality: A comparative examination of machine learning approaches. Applied and Computational Engineering, 32, 58-65.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).