User clustering-based GAN-LSTM model for viewport prediction in 360-degree video streaming

Research Article
Open access


Yi Liang 1*
  • 1 Guilin University of Electronic Technology
  • *Corresponding author: liangyiguet@163.com
Published on 10 June 2025 | https://doi.org/10.54254/2977-3903/2025.24025
AEI Vol. 16, Issue 6
ISSN (Print): 2977-3911
ISSN (Online): 2977-3903

Abstract

Accurate viewport prediction is crucial for enhancing user experience in 360-degree video streaming. However, because viewing behavior differs significantly across user groups, a traditional single LSTM model tends to fall into local optima and fails to achieve precise predictions. To address this, this paper proposes a hybrid prediction model based on user clustering. First, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group users with similar behavioral patterns. Then, a hybrid prediction model combining Generative Adversarial Networks (GANs) and Long Short-Term Memory (LSTM) networks is designed, which effectively mitigates data imbalance and overfitting through collaborative training. Experiments on three real-world YouTube datasets demonstrate that this approach significantly outperforms existing methods based on user trajectories or video saliency in both prediction accuracy and stability.
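The clustering step described in the abstract can be illustrated with a minimal sketch. The paper's actual per-user features, distance metric, and DBSCAN parameters are not given on this page, so the feature vectors below (e.g. mean head-movement speed and vertical-exploration range per user) and the `eps`/`min_pts` values are purely illustrative assumptions; a small dependency-free DBSCAN stands in for a library implementation.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = unvisited

    def neighbors(i):
        # Indices within eps of point i (includes i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may be claimed as a border point
            continue
        cluster += 1        # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point -> border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:
                seeds.extend(neighbors(j))  # expand only through core points
    return labels

# Hypothetical per-user features: (mean yaw speed, pitch-exploration range).
explorers = [(0.80, 0.10), (0.82, 0.12), (0.78, 0.09), (0.81, 0.11)]
focusers  = [(0.10, 0.02), (0.12, 0.01), (0.09, 0.03), (0.11, 0.00)]
outlier   = [(0.50, 0.90)]  # atypical viewer, expected to be labeled noise

labels = dbscan(explorers + focusers + outlier, eps=0.05, min_pts=3)
print(labels)  # two behavior groups plus one noise user
```

Each resulting cluster would then train its own GAN-LSTM predictor on that group's viewport traces, rather than fitting one model to all users.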

Keywords:

360-degree video streaming, viewport prediction, LSTM

Liang, Y. (2025). User clustering-based GAN-LSTM model for viewport prediction in 360-degree video streaming. Advances in Engineering Innovation, 16(6), 65–73.

References

[1]. Bastug, E., Bennis, M., Medard, M., & Debbah, M. (2017). Toward interconnected virtual reality: Opportunities, challenges, and enablers. IEEE Communications Magazine, 55(6), 110–117. https://doi.org/10.1109/MCOM.2017.1601089

[2]. Mao, Y., Sun, L., Liu, Y., & Wang, Y. (2020). Low-latency FoV-adaptive coding and streaming for interactive 360° video streaming. Proceedings of the 28th ACM International Conference on Multimedia (pp. 3696–3704). https://doi.org/10.1145/3394171.3413751

[3]. Zhang, X., Cheung, G., Zhao, Y., Le Callet, P., Lin, C., & Tan, J. Z. G. (2021). Graph learning based head movement prediction for interactive 360 video streaming. IEEE Transactions on Image Processing, 30, 4622–4636. https://doi.org/10.1109/TIP.2021.3073283

[4]. Wang, S., Yang, S., Su, H., Zhao, C., Xu, C., Qian, F., Wang, N., & Xu, Z. (2024). Robust saliency-driven quality adaptation for mobile 360-degree video streaming. IEEE Transactions on Mobile Computing, 23(2), 1312–1329. https://doi.org/10.1109/TMC.2023.3294698

[5]. Wang, S., Yang, S., Su, H., Zhao, C., Xu, C., Qian, F., Wang, N., & Xu, Z. (2024). Robust saliency-driven quality adaptation for mobile 360-degree video streaming. IEEE Transactions on Mobile Computing, 23(2), 1312–1329. https://doi.org/10.1109/TMC.2024.3360123

[6]. Tu, J., Chen, C., Yang, Z., Li, M., Xu, Q., & Guan, X. (2023). PSTile: Perception-sensitivity-based 360° tiled video streaming for industrial surveillance. IEEE Transactions on Industrial Informatics, 19(9), 9777–9789. https://doi.org/10.1109/TII.2022.3216812

[7]. Jiang, Z., Ji, B., Wu, F., Liu, Y., & Zhang, Y. (2020). Reinforcement learning based rate adaptation for 360-degree video streaming. IEEE Transactions on Broadcasting, 67(2), 409–423. https://doi.org/10.1109/TBC.2020.3034157

[8]. Zhang, J., Qin, Q., Wan, T., & Luo, X. (2022). Perception-based pseudo-motion response for 360-degree video streaming. IEEE Signal Processing Letters, 29, 1973–1977. https://doi.org/10.1109/LSP.2022.3200882

[9]. Park, S., Lee, J., & Choi, J. (2021). Mosaic: Advancing user quality of experience in 360-degree video streaming with machine learning. IEEE Transactions on Network and Service Management, 18(1), 1000–1015. https://doi.org/10.1109/TNSM.2020.3047821

[10]. Hou, X., Wang, S., Zhou, Z., Wang, Y., & Wu, D. (2020). Predictive adaptive streaming to enable mobile 360-degree and VR experiences. IEEE Transactions on Multimedia, 23, 716–731. https://doi.org/10.1109/TMM.2020.2990446

[11]. Zou, J., Hao, T., Yu, C., & Sun, H. (2019). Probabilistic tile visibility-based server-side rate adaptation for adaptive 360-degree video streaming. IEEE Journal of Selected Topics in Signal Processing, 14(1), 161–176. https://doi.org/10.1109/JSTSP.2019.2951538

[12]. Yuan, H., Chen, Y., Yu, M., & Zhu, W. (2019). Spatial and temporal consistency-aware dynamic adaptive streaming for 360-degree videos. IEEE Journal of Selected Topics in Signal Processing, 14(1), 177–193. https://doi.org/10.1109/JSTSP.2019.2952001

[13]. Hassan, S. M. H. U., Lee, Y., & Kim, M. (2023). User profile-based viewport prediction using federated learning in real-time 360-degree video streaming. 2023 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (pp. 1–7). https://doi.org/10.1109/BMSB58369.2023.10211189

[14]. Chen, J., Sun, Y., He, D., & Wu, E. (2021). Sparkle: User-aware viewport prediction in 360-degree video streaming. IEEE Transactions on Multimedia, 23, 3853–3866. https://doi.org/10.1109/TMM.2020.3030440

[15]. Dong, P., Zhang, S., Zhang, H., & Xu, Y. (2023). Predicting long-term field of view in 360-degree video streaming. IEEE Network, 37(1), 26–33. https://doi.org/10.1109/MNET.123.2200371

[16]. Jin, Y., Zhang, Z., Wang, W., & Li, S. (2023). Ebublio: Edge-assisted multiuser 360° video streaming. IEEE Internet of Things Journal, 10(17), 15408–15419. https://doi.org/10.1109/JIOT.2023.3283859

[17]. Chen, J., Sun, Y., He, D., & Wu, E. (2023). Live360: Viewport-aware transmission optimization in live 360-degree video streaming. IEEE Transactions on Broadcasting, 69(1), 85–96. https://doi.org/10.1109/TBC.2022.3220616

[18]. Nguyen, A., & Yan, Z. (2023). Enhancing 360 video streaming through salient content in head-mounted displays. Sensors, 23(5), Article 2470. https://doi.org/10.3390/s23052470

[19]. Xu, X., Chen, L., & Liu, Y. (2023). Multi-features fusion based viewport prediction with GNN for 360-degree video streaming. 2023 IEEE International Conference on Metaverse Computing, Networking and Applications (pp. 57–64). https://doi.org/10.1109/MetaCom57706.2023.00020

[20]. Wang, W., Shen, J., Guo, F., Cheng, M., & Borji, A. (2018). Revisiting video saliency: A large-scale benchmark and a new model. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4894–4903). https://doi.org/10.1109/CVPR.2018.00514

[21]. Xu, M., Song, Y., Wang, J., Qiao, M., Huo, L., & Wang, Z. (2017). Predicting head movement in panoramic video: A deep reinforcement learning approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(11), 2693–2708. https://doi.org/10.1109/TPAMI.2018.2858783

[22]. Chopra, L., Chakraborty, D., & Mondal, A. (2021). PARIMA: Viewport adaptive 360-degree video streaming. Proceedings of the Web Conference 2021 (pp. 2379–2391). https://doi.org/10.1145/3442381.3449857

[23]. Li, J., Zhu, C., Xu, Z., Liu, Y., & Zhang, Z. (2023). Spherical convolution empowered FoV prediction in 360-degree video multicast with limited FoV feedback. IEEE Transactions on Circuits and Systems for Video Technology, 33(12), 7245–7259. https://doi.org/10.1109/TCSVT.2023.3275976

[24]. Li, C., Xu, M., Zhang, S., & Le Callet, P. (2019). Very long term field of view prediction for 360-degree video streaming. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (pp. 297–302). https://doi.org/10.1109/MIPR.2019.00061

[25]. Fan, C., Lee, J., Lo, W., Huang, C., Chen, K., & Hsu, C. (2017). Fixation prediction for 360° video streaming in head-mounted virtual reality. Proceedings of the 27th Workshop on Network and Operating Systems Support for Digital Audio and Video (pp. 1–6). https://doi.org/10.1145/3083165.3083181



Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Journal: Advances in Engineering Innovation

Volume number: Vol. 16
Issue number: Issue 6
ISSN: 2977-3911 (Print) / 2977-3903 (Online)

© 2025 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
