Research Article
Open access
Published on 12 October 2024

Evolution and future directions of Artificial Intelligence Generated Content (AIGC): A comprehensive review

Yihan Xu 1,*
  • 1 School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/95/2024BJ0056

Abstract

Artificial Intelligence Generated Content (AIGC) has evolved rapidly, revolutionizing the creation of text, image, audio, and video content. Despite these advances, systematic reviews of the development of AIGC technology remain scarce, motivating a structured discussion of its current state and future directions. This paper examines the significant advancements and foundational technologies driving AIGC, emphasizing the contributions of state-of-the-art models such as DALL-E 3 [1] and Sora [2]. We trace the evolution of generative models from single-modal approaches to today's multimodal generative models. The paper further explores the application prospects of AIGC across domains such as office work, art, education, and film, while addressing the field's existing limitations and challenges. We propose potential directions for improvement, including more efficient model architectures and enhanced multimodal capabilities, and place particular emphasis on the environmental impact of AIGC technologies and the need for sustainable practices. This comprehensive review aims to give researchers and practitioners a deeper understanding of AIGC and to inspire further exploration and innovation in this transformative domain.

Keywords

Artificial Intelligence Generated Content, Neural Networks, Multimodal Generative Models, Large AI Models

[1]. Betker Goh Jing Brooks Wang Li et al. Improving image generation with better captions. Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf. 2023;2(3):8.

[2]. Liu Zhang Li Yan Gao Chen et al. Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models [Internet]. arXiv; 2024 [cited 2024 Jun 21]. Available from: http://arxiv.org/abs/2402.17177

[3]. Introducing Meta Llama 3: The most capable openly available LLM to date [Internet]. Meta AI. [cited 2024 Jun 21]. Available from: https://ai.meta.com/blog/meta-llama-3/

[4]. Yu Xu Koh Luong Baid Wang et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [Internet]. arXiv; 2022 [cited 2024 Jun 21]. Available from: http://arxiv.org/abs/2206.10789

[5]. Bao Xiang Yue He Zhu Zheng et al. Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models [Internet]. arXiv; 2024 [cited 2024 Jun 21]. Available from: http://arxiv.org/abs/2405.04233

[6]. Wang Fu He Hao Wu. A survey on large-scale machine learning. IEEE Transactions on Knowledge and Data Engineering. 2020;34(6):2574–94.

[7]. Li Gan Yang Yang Li Wang et al. Multimodal foundation models: From specialists to general-purpose assistants. Foundations and Trends® in Computer Graphics and Vision. 2024;16(1–2):1–214.

[8]. Mahajan Girshick Ramanathan He Paluri Li et al. Exploring the Limits of Weakly Supervised Pretraining. In 2018 [cited 2024 Jun 21]. p. 181–96. Available from: https://openaccess.thecvf.com/content_ECCV_2018/html/Dhruv_Mahajan_Exploring_the_Limits_ECCV_2018_paper.html

[9]. Deng Dong Socher Li Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE; 2009. p. 248–55.

[10]. Schuhmann Vencu Beaumont Kaczmarczyk Mullis Katta et al. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114. 2021;

[11]. Sun Shrivastava Singh Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE international conference on computer vision. 2017. p. 843–52.

[12]. Zhan Yu Wu Zhang Lu Liu et al. Multimodal image synthesis and editing: The generative AI era. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(12):15098–119.

[13]. Stauffer Grimson. Adaptive background mixture models for real-time tracking. In: Proceedings 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat No PR00149) [Internet]. 1999 [cited 2024 Jun 22]. p. 246-252 Vol. 2. Available from: https://ieeexplore.ieee.org/abstract/document/784637

[14]. Bengio Ducharme Vincent. A Neural Probabilistic Language Model. In: Advances in Neural Information Processing Systems [Internet]. MIT Press; 2000 [cited 2024 Jun 21]. Available from: https://proceedings.neurips.cc/paper_files/paper/2000/hash/728f206c2a01bf572b5940d7d9a8fa4c-Abstract.html

[15]. Efros Leung. Texture synthesis by non-parametric sampling. In: Proceedings of the Seventh IEEE International Conference on Computer Vision [Internet]. 1999 [cited 2024 Jun 22]. p. 1033–8 vol.2. Available from: https://ieeexplore.ieee.org/abstract/document/790383

[16]. Pérez Gangnet Blake. Poisson Image Editing. In: Seminal Graphics Papers: Pushing the Boundaries, Volume 2 [Internet]. 1st ed. New York, NY, USA: Association for Computing Machinery; 2023 [cited 2024 Jun 22]. p. 577–82. Available from: https://doi.org/10.1145/3596711.3596772

[17]. Rumelhart Hinton Williams. Learning Internal Representations by Error Propagation. In: Rumelhart DE, McClelland JL, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press; 1986.

[18]. Elman. Finding Structure in Time. Cognitive Science. 1990;14(2):179–211.

[19]. Hochreiter Schmidhuber. Long short-term memory. Neural computation. 1997;9(8):1735–80.

[20]. Cho Van Merriënboer Gulcehre Bahdanau Bougares Schwenk et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014;

[21]. Kingma Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. 2013;

[22]. Goodfellow Pouget-Abadie Mirza Xu Warde-Farley Ozair et al. Generative Adversarial Nets. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2014 [cited 2024 Jun 21]. Available from: https://proceedings.neurips.cc/paper_files/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html

[23]. Ho Jain Abbeel. Denoising Diffusion Probabilistic Models. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2020 [cited 2024 Jun 21]. p. 6840–51. Available from: https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html

[24]. Kullback Leibler. On information and sufficiency. The annals of mathematical statistics. 1951;22(1):79–86.

[25]. Tolstikhin Bousquet Gelly Schoelkopf. Wasserstein Auto-Encoders [Internet]. arXiv; 2019 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/1711.01558

[26]. Higgins Matthey Pal Burgess Glorot Botvinick et al. beta-vae: Learning basic visual concepts with a constrained variational framework. ICLR (Poster). 2017;3.

[27]. van den Oord Vinyals kavukcuoglu. Neural Discrete Representation Learning. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2017 [cited 2024 Jun 22]. Available from: https://proceedings.neurips.cc/paper/2017/hash/7a98af17e63a0ac09ce2e96d03992fbc-Abstract.html

[28]. Razavi van den Oord Vinyals. Generating Diverse High-Fidelity Images with VQ-VAE-2. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2019 [cited 2024 Jun 22]. Available from: https://proceedings.neurips.cc/paper/2019/hash/5f8e2fa1718d1bbcadf1cd9c7a54fb8c-Abstract.html

[29]. Mirza Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. 2014;

[30]. Radford Metz Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. 2015;

[31]. Arjovsky Chintala Bottou. Wasserstein GAN [Internet]. arXiv; 2017 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/1701.07875

[32]. Zhu Park Isola Efros. Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks. In 2017 [cited 2024 Jun 22]. p. 2223–32. Available from: https://openaccess.thecvf.com/content_iccv_2017/html/Zhu_Unpaired_Image-To-Image_Translation_ICCV_2017_paper.html

[33]. Zhang Goodfellow Metaxas Odena. Self-attention generative adversarial networks. In: International conference on machine learning. PMLR; 2019. p. 7354–63.

[34]. Ho. Autoregressive Models in Deep Learning—A Brief Survey [blog post]. 2019.

[35]. Oord Kalchbrenner Kavukcuoglu. Pixel Recurrent Neural Networks. In: Proceedings of The 33rd International Conference on Machine Learning [Internet]. PMLR; 2016 [cited 2024 Jun 22]. p. 1747–56. Available from: https://proceedings.mlr.press/v48/oord16.html

[36]. van den Oord Kalchbrenner Espeholt kavukcuoglu Vinyals Graves. Conditional Image Generation with PixelCNN Decoders. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2016 [cited 2024 Jun 22]. Available from: https://proceedings.neurips.cc/paper_files/paper/2016/hash/b1301141feffabac455e1f90a7de2054-Abstract.html

[37]. Esser Rombach Ommer. Taming Transformers for High-Resolution Image Synthesis. In 2021 [cited 2024 Jun 22]. p. 12873–83. Available from: https://openaccess.thecvf.com/content/CVPR2021/html/Esser_Taming_Transformers_for_High-Resolution_Image_Synthesis_CVPR_2021_paper.html

[38]. Song Meng Ermon. Denoising Diffusion Implicit Models [Internet]. arXiv; 2022 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2010.02502

[39]. Rombach Blattmann Lorenz Esser Ommer. High-Resolution Image Synthesis With Latent Diffusion Models. In 2022 [cited 2024 Jun 22]. p. 10684–95. Available from: https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html

[40]. Ho Salimans. Classifier-Free Diffusion Guidance [Internet]. arXiv; 2022 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2207.12598

[41]. Song Sohl-Dickstein Kingma Kumar Ermon Poole. Score-Based Generative Modeling through Stochastic Differential Equations [Internet]. arXiv; 2021 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2011.13456

[42]. Bahdanau Cho Bengio. Neural Machine Translation by Jointly Learning to Align and Translate [Internet]. arXiv; 2016 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/1409.0473

[43]. Vaswani Shazeer Parmar Uszkoreit Jones Gomez et al. Attention is All you Need. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2017 [cited 2024 Jun 22]. Available from: https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

[44]. Zhang Xu Li Zhang Wang Huang et al. StackGAN: Text to Photo-Realistic Image Synthesis With Stacked Generative Adversarial Networks. In 2017 [cited 2024 Jun 22]. p. 5907–15. Available from: https://openaccess.thecvf.com/content_iccv_2017/html/Zhang_StackGAN_Text_to_ICCV_2017_paper.html

[45]. Xu Zhang Huang Zhang Gan Huang et al. AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks. In 2018 [cited 2024 Jun 22]. p. 1316–24. Available from: https://openaccess.thecvf.com/content_cvpr_2018/html/Xu_AttnGAN_Fine-Grained_Text_CVPR_2018_paper.html

[46]. Kang Zhu Zhang Park Shechtman Paris et al. Scaling Up GANs for Text-to-Image Synthesis. In 2023 [cited 2024 Jun 22]. p. 10124–34. Available from: https://openaccess.thecvf.com/content/CVPR2023/html/Kang_Scaling_Up_GANs_for_Text-to-Image_Synthesis_CVPR_2023_paper.html

[47]. Dosovitskiy Beyer Kolesnikov Weissenborn Zhai Unterthiner et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Internet]. arXiv; 2021 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2010.11929

[48]. Chang Zhang Barber Maschinot Lezama Jiang et al. Muse: Text-To-Image Generation via Masked Generative Transformers [Internet]. arXiv; 2023 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2301.00704

[49]. Ramesh Pavlov Goh Gray Voss Radford et al. Zero-Shot Text-to-Image Generation. In: Proceedings of the 38th International Conference on Machine Learning [Internet]. PMLR; 2021 [cited 2024 Jun 22]. p. 8821–31. Available from: https://proceedings.mlr.press/v139/ramesh21a.html

[50]. Radford Kim Hallacy Ramesh Goh Agarwal et al. Learning Transferable Visual Models From Natural Language Supervision. In: Proceedings of the 38th International Conference on Machine Learning [Internet]. PMLR; 2021 [cited 2024 Jun 22]. p. 8748–63. Available from: https://proceedings.mlr.press/v139/radford21a.html

[51]. Nichol Dhariwal Ramesh Shyam Mishkin McGrew et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [Internet]. arXiv; 2022 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2112.10741

[52]. Saharia Chan Saxena Li Whang Denton et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. Advances in Neural Information Processing Systems. 2022 Dec 6;35:36479–94.

[53]. Raffel Shazeer Roberts Lee Narang Matena et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. 2020;21(140):1–67.

[54]. Radford Narasimhan Salimans Sutskever others. Improving language understanding by generative pre-training. 2018;

[55]. Radford Wu Child Luan Amodei Sutskever et al. Language models are unsupervised multitask learners. OpenAI blog. 2019;1(8):9.

[56]. Brown Mann Ryder Subbiah Kaplan Dhariwal et al. Language Models are Few-Shot Learners. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2020 [cited 2024 Jun 22]. p. 1877–901. Available from: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html

[57]. Achiam Adler Agarwal Ahmad Akkaya Aleman et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774. 2023;

[58]. Chen Tworek Jun Yuan Pinto Kaplan et al. Evaluating Large Language Models Trained on Code [Internet]. arXiv; 2021 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2107.03374

[59]. Bass. Microsoft is rolling out generative AI in Windows and Office apps [Internet]. Tech Xplore. [cited 2024 Jun 22]. Available from: https://techxplore.com/news/2023-09-microsoft-generative-ai-windows-office.html

[60]. What’s next for AI in 2024 [Internet]. MIT Technology Review. [cited 2024 Jun 22]. Available from: https://www.technologyreview.com/2024/01/04/1086046/whats-next-for-ai-in-2024/

[61]. Ramesh Dhariwal Nichol Chu Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125. 2022;1(2):3.

[62]. Colorize Photo | Try Free | Realistic Colors [Internet]. [cited 2024 Jun 22]. Available from: http://www.palette.fm

[63]. Lewkowycz Andreassen Dohan Dyer Michalewski Ramasesh et al. Solving Quantitative Reasoning Problems with Language Models. Advances in Neural Information Processing Systems. 2022 Dec 6;35:3843–57.

[64]. HeyGen Raises $60M Series A to Scale Visual Storytelling for Businesses | HeyGen Blog [Internet]. [cited 2024 Jun 22]. Available from: https://www.heygen.com/article/announcing-our-series-a

[65]. Child Gray Radford Sutskever. Generating Long Sequences with Sparse Transformers [Internet]. arXiv; 2019 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/1904.10509

[66]. Zhang Lemoine Mitchell. Mitigating Unwanted Biases with Adversarial Learning. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society [Internet]. New York, NY, USA: Association for Computing Machinery; 2018 [cited 2024 Jun 22]. p. 335–40. (AIES ’18). Available from: https://dl.acm.org/doi/10.1145/3278721.3278779

[67]. Lapuschkin Wäldchen Binder Montavon Samek Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nat Commun. 2019 Mar 11;10(1):1096.

[68]. Bourtoule Chandrasekaran Choquette-Choo Jia Travers Zhang et al. Machine unlearning. In: 2021 IEEE Symposium on Security and Privacy (SP). IEEE; 2021. p. 141–59.

[69]. Su Zhu Gao Song. Utilizing Greedy Nature for Multimodal Conditional Image Synthesis in Transformers. IEEE Transactions on Multimedia. 2024;26:2354–66.

[70]. Patterson Gonzalez Le Liang Munguia Rothchild et al. Carbon Emissions and Large Neural Network Training [Internet]. arXiv; 2021 [cited 2024 Jun 22]. Available from: http://arxiv.org/abs/2104.10350

[71]. Henderson Hu Romoff Brunskill Jurafsky Pineau. Towards the systematic reporting of the energy and carbon footprints of machine learning. Journal of Machine Learning Research. 2020;21(248):1–43.

Cite this article

Xu, Y. (2024). Evolution and future directions of Artificial Intelligence Generated Content (AIGC): A comprehensive review. Applied and Computational Engineering, 95, 1-13.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

Conference website: https://2024.confcds.org/
ISBN: 978-1-83558-641-9 (Print) / 978-1-83558-642-6 (Online)
Conference date: 12 September 2024
Editors: Alan Wang, Roman Bauer
Series: Applied and Computational Engineering
Volume number: Vol. 95
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).