Research Article
Open access
Published on 23 October 2023

Dimensionality reduction techniques for high dimensional data: State of the art

Suixin Jiang 1, *
  • 1 American University

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/13/20230707

Abstract

Over the past decades, enormous volumes of data have been generated every day as people increasingly rely on electronic devices and networks in their daily lives. Advances in computing techniques and storage capacity provide the foundation for analyzing these large datasets. Each captured feature represents a dimension of the data, and high-dimensional data (HDD) analysis has become a challenging task across many fields of study. Dimensionality reduction techniques (DRTs) can remove redundant and irrelevant features, and choosing the proper method enables effective information extraction. This paper reviews the most widely used DRTs and their application fields. Although DRTs have been successfully applied in many areas (e.g., image, audio/video, and biomedical data), they still need to be improved and developed to achieve better classification and prediction accuracy, and combinations of methods will remain a focus of future research. Computation time and cost may no longer be a limitation as computing power continues to grow, so the development of algorithms is becoming particularly important. This study provides a brief introduction to widely used DRTs and their variants; it should be helpful for understanding HDD analysis, and more DRTs will be examined in future work.
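To make the idea of removing redundant dimensions concrete, the following is a minimal sketch (not taken from the paper) of principal component analysis, one of the most widely used DRTs reviewed here. It assumes a toy dataset in which only two of five features carry signal; the function name `pca_reduce` is illustrative, not from the article.

```python
import numpy as np

def pca_reduce(X, k):
    """Project an n x d data matrix X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    # SVD of the centered data; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # n x k reduced representation

# Toy example: 100 samples in 5 dimensions, only 2 of which carry signal
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 2))
X = np.hstack([signal, 0.01 * rng.normal(size=(100, 3))])
Z = pca_reduce(X, 2)
print(Z.shape)  # (100, 2)
```

Because the three noise columns have negligible variance, the two retained components capture nearly all of the information in the original five-dimensional data.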

Keywords

high dimensional data, dimensionality reduction technique

Cite this article

Jiang, S. (2023). Dimensionality reduction techniques for high dimensional data: State of the art. Applied and Computational Engineering, 13, 37-45.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 5th International Conference on Computing and Data Science

Conference website: https://2023.confcds.org/
ISBN: 978-1-83558-017-2 (Print) / 978-1-83558-018-9 (Online)
Conference date: 14 July 2023
Editors: Roman Bauer, Marwan Omar, Alan Wang
Series: Applied and Computational Engineering
Volume number: Vol. 13
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).