
A Review of 3D Human Pose Estimation
1 Northeastern University, CPS, Boston, Massachusetts, USA, 002199
* Author to whom correspondence should be addressed.
Abstract
Visual perception and human body recognition are fundamental capabilities for effective and safe interaction between humans and artificial intelligence (AI) and computer vision systems in real-world scenarios. Recent breakthroughs in AI and computer vision have driven major advances in human body recognition technology. However, research in human body recognition is still at an early stage of the product lifecycle. 3D human pose estimation is the task of identifying the three-dimensional locations of the joints of the human body from images or videos. Although it is widely used in areas such as human motion analysis and robotics, it remains difficult because of challenges such as depth ambiguity and the scarcity of robust datasets. Over the past decade, numerous methods have been developed, many of them based on deep learning, and they have significantly improved performance on existing benchmarks. A comprehensive literature review of this field is therefore important for its future development. However, many existing reviews have concentrated mainly on traditional techniques, leaving a need for a comprehensive examination of deep learning-based methods. This paper provides a thorough overview of current deep learning-based 3D pose estimation algorithms, outlining their advantages and limitations and giving a detailed picture of the field. It also examines commonly used benchmark datasets and methods for analyzing human poses in unlabeled images captured in the wild, and presents a comparative analysis. Finally, it offers insights to aid the design of future models and algorithms.
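The depth ambiguity mentioned above can be made concrete with a small numeric sketch. It assumes an idealized pinhole camera (a hypothetical focal length FOCAL with the principal point at the origin) and a toy three-joint leg with made-up coordinates; the helper names project, root_align and mpjpe are illustrative and not taken from any of the reviewed methods. Scaling every joint about the camera centre yields a different 3D pose with exactly the same 2D projection, so a single image cannot tell the two apart. The mean per-joint position error (MPJPE), a distance measure commonly reported on 3D pose benchmarks, shows how far apart the two poses are, both in absolute camera coordinates and after aligning the root (hip) joint.

```python
import math

# Hypothetical pinhole camera: focal length in pixels, principal point at (0, 0).
FOCAL = 1000.0


def project(joint, focal=FOCAL):
    """Perspective-project a camera-space joint (x, y, z) with z > 0 to 2D pixels."""
    x, y, z = joint
    if z <= 0:
        raise ValueError("joint must lie in front of the camera (z > 0)")
    return (focal * x / z, focal * y / z)


def root_align(pose, root=0):
    """Express every joint relative to the root joint (index 0, the hip, here)."""
    rx, ry, rz = pose[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in pose]


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between matched joints."""
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


# Toy leg (hip, knee, ankle) in metres, camera space; values are made up for illustration.
skeleton = [(0.0, 0.0, 4.0), (0.1, 0.45, 4.05), (0.1, 0.9, 4.0)]

# Scaling every joint by s about the camera centre gives a larger, farther-away pose ...
s = 1.5
scaled = [(s * x, s * y, s * z) for x, y, z in skeleton]

# ... that projects to the same pixels, because s cancels in x/z and y/z.
proj_a = [project(j) for j in skeleton]
proj_b = [project(j) for j in scaled]
same_2d = all(
    math.isclose(u1, u2, abs_tol=1e-9) and math.isclose(v1, v2, abs_tol=1e-9)
    for (u1, v1), (u2, v2) in zip(proj_a, proj_b)
)

print("identical 2D projections:", same_2d)
print(f"MPJPE (absolute):     {1000 * mpjpe(scaled, skeleton):.1f} mm")
print(f"MPJPE (root-aligned): {1000 * mpjpe(root_align(scaled), root_align(skeleton)):.1f} mm")
```

Because s cancels in the ratios x/z and y/z, geometry alone cannot recover absolute depth or scale from one view, and a monocular method has to fall back on priors such as plausible limb lengths. Root alignment removes the global offset, yet the two poses still differ, since their limb lengths differ by the factor s.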
Keywords
Visual perception, Body recognition, 3D human pose estimation, Benchmark datasets.
Cite this article
Zhai, L. (2024). A Review of 3D Human Pose Estimation. Applied and Computational Engineering, 109, 44-49.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish with this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).