Research Article
Open access
Published on 12 October 2024

Predictive modeling in high-frequency trading using machine learning

Yanbo Hou 1, *
  • 1 David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/90/20241764

Abstract

High-frequency trading (HFT) has transformed financial markets by enabling rapid trade execution and exploiting minute market inefficiencies. This study explores the application of machine learning (ML) techniques to predictive modeling in HFT. Four ensemble boosting methods—Adaptive Boosting, Logic Boosting, Robust Boosting, and Random Under-Sampling (RUS) Boosting—were evaluated using order book data from Euronext Paris. The models were trained and validated on data from a single trading day, with performance assessed using precision, recall, ROC curves, and feature importance analysis. Results indicate that Robust Boosting achieves the highest precision (90%), while Adaptive Boosting and RUS Boosting demonstrate higher recall (94% and 93%, respectively). This research highlights the potential of ML in enhancing HFT strategies, with implications for future trading system developments.
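To make the evaluation protocol concrete, the following is a minimal illustrative sketch, not the author's actual pipeline. The Euronext Paris order book data are not provided here, and Robust Boosting and Logic Boosting are not available in scikit-learn or imbalanced-learn, so the sketch trains only AdaBoost and RUS Boosting on synthetic stand-in features and reports precision, recall, ROC AUC, and feature importances; all dataset and model parameters are assumptions chosen for illustration.

# Illustrative sketch of the evaluation protocol described in the abstract.
# Synthetic data stands in for engineered order book features; only the two
# boosting methods packaged for Python are shown.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.ensemble import RUSBoostClassifier  # pip install imbalanced-learn

# Stand-in for order book features (e.g., spreads, depths, imbalances) with
# an imbalanced target, as is typical for short-horizon price-move labels.
X, y = make_classification(n_samples=20_000, n_features=10, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=0),
    "RUSBoost": RUSBoostClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    print(f"{name}: precision={precision_score(y_test, pred):.2f} "
          f"recall={recall_score(y_test, pred):.2f} "
          f"roc_auc={roc_auc_score(y_test, proba):.2f}")
    # Rank stand-in features by learned importance, mirroring the paper's
    # feature importance analysis.
    print("  top features:", np.argsort(model.feature_importances_)[::-1][:3])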

Keywords

High-frequency trading, Predictive modeling, Ensemble boosting, Order book data


Cite this article

Hou, Y. (2024). Predictive modeling in high-frequency trading using machine learning. Applied and Computational Engineering, 90, 61-65.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

Conference website: https://2024.confcds.org/
ISBN: 978-1-83558-609-9 (Print) / 978-1-83558-610-5 (Online)
Conference date: 12 September 2024
Editors: Alan Wang, Ammar Alazab
Series: Applied and Computational Engineering
Volume number: Vol. 90
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).