Research Article
Open access
Published on 5 July 2024
Xie, C. (2024). A refined approach to early movie box office prediction leveraging ensemble learning and feature encoding. Applied and Computational Engineering, 75, 273-284.

A refined approach to early movie box office prediction leveraging ensemble learning and feature encoding

Chuang Xie *,1
  • 1 Hefei University of Technology

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/75/20240555

Abstract

Predicting a movie's revenue before its release is a significant challenge. Accurate pre-release forecasts enable production companies to devise effective marketing strategies and mitigate the risk of box office failure. The primary difficulties are managing the many factors that influence box office outcomes and forecasting revenue accurately before the movie is available to the public. To address these challenges, we introduce an Early Movie Box Office Prediction Model that incorporates Ensemble Learning and Feature Encoding techniques. The model combines multiple base models, using regression and decision trees to forecast box office revenue. The composite model outperforms its constituent models, achieving an accuracy of 91.4%.
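
The idea of combining a regression model with a tree-based model over encoded features can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the base models are combined or which encoding is used, so the sketch assumes one-hot encoding and simple averaging, and it substitutes a one-feature least-squares line and a depth-1 regression tree for the actual base learners. All function names and the toy budget, genre, and revenue values are hypothetical.

```python
from statistics import mean


def one_hot(values):
    """Encode categorical values as 0/1 indicator columns (categories sorted)."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories


def fit_linear(rows, ys):
    """Least-squares line y = a + b*x fitted on column 0 only (assumed numeric)."""
    xs = [r[0] for r in rows]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return lambda row: a + b * row[0]


def fit_stump(rows, ys):
    """Depth-1 regression tree: choose the (column, threshold) split that
    minimizes the summed squared error of the two leaf means."""
    best = None
    for j in range(len(rows[0])):
        # Skip the largest value so the right-hand leaf is never empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum(
                (y - mr) ** 2 for y in right
            )
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_ensemble(rows, ys):
    """Average the predictions of the base models (an assumed combination rule)."""
    models = [fit_linear(rows, ys), fit_stump(rows, ys)]
    return lambda row: mean([m(row) for m in models])


# Made-up toy data for illustration only (not from the paper's dataset).
budgets = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
genres = ["drama", "action", "drama", "action", "drama", "action"]
revenue = [15.0, 50.0, 35.0, 90.0, 55.0, 130.0]

encoded, categories = one_hot(genres)
rows = [[b] + e for b, e in zip(budgets, encoded)]
ensemble = fit_ensemble(rows, revenue)

if __name__ == "__main__":
    print(categories, ensemble(rows[0]))
```

Averaging is the simplest way to combine base models; a weighted blend or a stacked meta-model are common alternatives, and the full article should be consulted for the combination rule actually used.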

Keywords

Early Movie Box Office Prediction, Ensemble Learning, XGBoost, GBDT

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2nd International Conference on Software Engineering and Machine Learning

Conference website: https://www.confseml.org/
ISBN: 978-1-83558-509-2 (Print) / 978-1-83558-510-8 (Online)
Conference date: 15 May 2024
Editor: Stavros Shiaeles
Series: Applied and Computational Engineering
Volume number: Vol. 75
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).