Research on the Application of Generative Artificial Intelligence in Games

1. Introduction

With the rapid advancement of artificial intelligence, generative AI is showing its potential across many fields, and in the game industry in particular it can drive transformative change [1]. Game development is a multidimensional and complex process. Because games have vast state spaces and high complexity, they have also become an excellent benchmark for evaluating AI techniques [2]. Building the game world, character design, mechanics development, story planning, sound creation, and interaction design all require creative effort as well as substantial time and cost. In the early stages of creation, generating new scenes, characters, or storylines is often the most challenging and innovative part, and the later stages of development are equally intricate and time-consuming. Whether designing a unique character or building a vivid virtual world, developers usually need extensive experimentation and revision to arrive at the most suitable design. At the same time, as the gaming market becomes increasingly competitive, developers face higher expectations from players. Balancing creativity with development efficiency has therefore become a critical challenge in game development.

Traditional game development often requires a great deal of time and manpower because of its diverse processes and complex content. With generative artificial intelligence, developers can use more automated and intelligent methods to accelerate this process, improve efficiency, and draw more inspiration during creation. Generative artificial intelligence relies on algorithmic models trained on existing data. Given user input, it autonomously generates logically coherent content, and the output can take the form of graphics, text, audio, and other media. Its basic principle is to let a computer turn relatively abstract instructions into new content through deep learning models, notably generative adversarial networks and variational autoencoders. With pre-trained generative models, game developers can create character images, generate realistic character dialogue, write plots, and produce scene music. Related examples include creating AI characters with genetic programming and providing interactive simulated dialogue with generative AI as scaffolding [3]. This technology provides powerful auxiliary tools for game development: it can reduce development costs, lower the entry threshold, and shorten development cycles, while also inspiring developers in content creation [4].

The main body of this article is divided into three parts. The first part briefly introduces the development of the basic technologies behind generative artificial intelligence, presents the mainstream generative models, and discusses the role and advantages of generative artificial intelligence in different aspects of game development. The second part illustrates the first through game examples and explores further possibilities for generative artificial intelligence in the game industry. The third part points out challenges that generative artificial intelligence still faces.

2. Overview of Generative Artificial Intelligence Technology and Analysis of Its Advantages in Assisting Game Development

2.1. Overview of the Development of Generative Model Technology

This section gives an overview of generative technology in two areas: natural language processing (NLP) and computer vision (CV).

2.1.1. Natural Language Processing

The goal of natural language processing is to enable computers to understand, generate, and interact with human language. Early statistical language models such as the n-gram model learn the distribution of word sequences and use it to search for the most probable sequence. This approach has clear limitations, especially in handling vagueness, ambiguity, and long-range context. Further probabilistic models, such as hidden Markov models (HMM) and conditional random fields (CRF), were then applied to tasks such as speech recognition, named entity recognition (NER), and machine translation, with remarkable results. However, statistical methods are still not flexible enough to handle complex language structures, which limits their effectiveness in practice. With the development of deep learning, NLP underwent revolutionary change. Deep neural networks (DNN) and recurrent neural networks (RNN), especially long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks, capture contextual information in long sequences more effectively and significantly improved NLP performance. Later, the BERT (Bidirectional Encoder Representations from Transformers) model proposed by Google made pre-trained language models the mainstream approach in NLP.

The Transformer is a deep learning model based on the attention mechanism. Unlike RNNs and LSTMs, it abandons the recurrent structure entirely and relies on self-attention and parallel computation to process sequence data. It consists of an encoder and a decoder. Each encoder layer includes a self-attention sublayer and a feedforward neural network sublayer, and the encoder as a whole processes the input sequence and produces a series of hidden representations. Each decoder layer contains three sublayers: a self-attention sublayer, an encoder-decoder attention sublayer, and a feedforward neural network sublayer. The decoder generates the target sequence step by step from the encoder's representations and the previously generated output. The Transformer parallelizes well, captures long-distance dependencies, and is highly flexible. It has achieved great success in natural language processing and has become the foundation of pre-trained models such as BERT, GPT, and T5.
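
To make the self-attention sublayer described above more concrete, the following is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not a full Transformer layer: the learned query, key, and value projections are omitted (the token vectors are used directly for all three), there is a single head, and the toy embeddings are made-up numbers.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Each argument is a list of equal-length vectors, one per token.
    Returns the output vectors and the attention weights.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query with every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(a * v[j] for a, v in zip(attn, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights


# Toy input (assumption): three tokens with two-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Every token attends to every other token in one step, and each row of `weights` sums to 1. Because no step depends on the result of a previous position, the rows can be computed in parallel, which is the property the text contrasts with recurrent models.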

BERT is a pre-trained language model proposed by Google in 2018. Its core feature is bidirectional encoding. During training, the masked language model (MLM) objective randomly masks some words and requires the model to predict them from the surrounding context, which gives the model an understanding of context in both directions. BERT promoted the widespread application of pre-trained models in natural language processing, and models such as GPT and BERT now lead the development of NLP.
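
The masking step of the MLM objective can be sketched as follows. This is a simplified illustration rather than BERT's actual preprocessing: every selected token is replaced by a `[MASK]` placeholder, whereas the original procedure also sometimes keeps the token or swaps in a random one. The function name, the example sentence, and the masking probability used in the example are assumptions made for illustration.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly hide tokens and record what the model should predict.

    Returns (masked_tokens, labels). labels[i] holds the original token
    at masked positions and None elsewhere, so a training loss would be
    computed only on the masked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical example sentence; a high mask_prob makes the effect visible.
sentence = "the knight opened the ancient gate".split()
masked_sentence, targets = mask_tokens(sentence, mask_prob=0.5, seed=1)
```

Since the model sees the unmasked words on both sides of each gap, predicting the hidden words forces it to use left and right context together.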

2.1.2. Computer Vision

The goal of computer vision is to enable computers to understand and interpret the real world through digital images or videos. Early computer vision research focused on basic techniques such as image processing, edge detection, and feature extraction, which had obvious limitations. With the rise of deep learning, the generative adversarial network (GAN) was proposed, and generative CV made a major breakthrough.

A GAN is a deep learning model that generates data through adversarial training. It consists of a generator and a discriminator. The generator produces realistic data to "deceive" the discriminator, while the discriminator judges whether data is real or generated. The two compete during training: the generator keeps improving the realism of its output, and the discriminator keeps improving its ability to tell the difference. When the generator produces data realistic enough that the discriminator can no longer distinguish it from real data, the adversarial process reaches equilibrium. GANs are widely used in image generation, video generation, data augmentation, and other fields, and have greatly promoted the development of generative models.
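
The competition between the two networks is usually written as a single value function that the discriminator tries to maximize and the generator tries to minimize. The sketch below only evaluates that value function on given discriminator outputs. It does not train any network, and the probabilities in the example are made-up numbers.

```python
import math


def gan_value(d_real, d_fake):
    """V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))).

    d_real: discriminator outputs (probability of "real") on real samples.
    d_fake: discriminator outputs on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates well: high score on real, low on generated.
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# A discriminator that cannot tell the difference: 0.5 everywhere.
balanced = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

When the discriminator outputs 0.5 for every sample, the value equals -log 4, which corresponds to the balance point described above: the generator's output has become indistinguishable from real data.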

Later, variational autoencoders (VAE), which learn a probabilistic mapping between data and a latent distribution, and diffusion models, which generate data by learning to reverse a gradual noising process, were also developed. These allow finer control over the generation process and produce more detailed images. In 2020, the Vision Transformer (ViT) proposed by Google made it possible to process images in the way NLP models process text. It divides the image into fixed-size patches, flattens each patch and maps it to a feature vector, and then feeds these vectors to a Transformer. Although it lacks the convolution operations of a traditional convolutional neural network (CNN), it can match or exceed CNNs on tasks such as image classification when pre-trained on enough data. ViT offered a new approach to computer vision and promoted the development of Transformer-based visual models. The masked autoencoder (MAE) is a self-supervised learning method for vision. Its core idea is to randomly mask part of the image and train the model to reconstruct the complete image from the visible patches only. This is similar to the masked language model in NLP (as in BERT), with the masking mechanism carried over to visual tasks. It can improve the performance of visual Transformer models and reduce training time and data requirements, demonstrating the strong potential of self-supervised learning in computer vision.
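
The patch splitting used by ViT and the random masking used by MAE can be sketched in a few lines. This is an illustration under simplifying assumptions: the image is a small single-channel grid of numbers, and the learned linear projection and position embeddings that follow in a real ViT are omitted. The 75% mask ratio is the value commonly associated with MAE and is used here only as a default.

```python
import random


def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened square patches.

    Patches are returned in row-major order, each as a flat list.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Return (visible, masked) patch indices for MAE-style training."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    return sorted(indices[num_masked:]), sorted(indices[:num_masked])


# Toy 4x4 "image" whose pixel values are 0..15 (assumption for illustration).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)
visible, masked = random_mask(len(patches))
```

Here the 4x4 image yields four patches of four values each. The encoder would see only the patches listed in `visible`, and the reconstruction target would be the patches listed in `masked`.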

2.2. Analysis of Advantages in the Game Development Process

This section breaks game development into steps and analyzes the advantages of generative AI-assisted development at each step.

2.2.1. Content Generation

Generative AI can assist in designing game content. According to the developer's requirements, it can generate game scenes, character designs, or storylines for different game worlds while keeping in-world language fluent and the worldview consistent. It can also provide developers with inspiration and materials for content creation [5].

2.2.2. Coding

Traditional game production, especially for large-scale games, has a very high technical threshold: it requires complex algorithms and a large amount of code, which is very labor-intensive [6]. Generative AI models can serve as code generation tools, especially for highly repetitive work. Having learned from existing code samples, AI can automatically generate basic game functions [7], such as enemy behavior logic or item drop mechanisms. This improves development efficiency, assists developers with the time-consuming work of writing and optimizing code, and lowers the threshold for programming, freeing developers' time and energy for more creative work.
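
As an illustration of the kind of basic, repetitive routine meant here, the following is a minimal sketch of an item drop mechanism based on weighted random selection. It is a hand-written example, not the output of any particular tool, and the item names and weights are hypothetical.

```python
import random

# Hypothetical drop table: item name -> relative weight.
DROP_TABLE = {"nothing": 60, "potion": 30, "rare_sword": 10}


def roll_drop(table, rng):
    """Pick one item, with probability proportional to its weight."""
    items = list(table)
    weights = [table[item] for item in items]
    return rng.choices(items, weights=weights, k=1)[0]


# Simulate 1000 defeated enemies with a fixed seed for repeatability.
rng = random.Random(42)
drops = [roll_drop(DROP_TABLE, rng) for _ in range(1000)]
counts = {item: drops.count(item) for item in DROP_TABLE}
```

Routines like this are simple but appear in many variations across a game, which is why they are a natural target for automated generation followed by developer review.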

2.2.3. Plot and Dialogue Generation

In traditional game production, writers need to invest a great deal of time and energy in designing plots and writing dialogue [8]. Generative AI can automatically generate coherent, in-depth plot developments and character dialogue based on the theme and setting of the game. It can also adjust the plot flexibly according to the player's progress to create a personalized gaming experience.
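
One way to picture how theme, setting, and player progress feed into generation is as a prompt assembled for a text-generation model. The sketch below only builds the prompt text and does not call any model or API. The function name, field layout, and example values are assumptions made for illustration.

```python
def build_dialogue_prompt(theme, setting, npc_name, player_progress):
    """Assemble a text prompt for a (hypothetical) text-generation model."""
    lines = [
        f"Game theme: {theme}",
        f"Setting: {setting}",
        f"You are {npc_name}, a character in this world. Stay in character.",
        "Player progress so far:",
    ]
    # One bullet per event, so the model can react to what the player did.
    lines += [f"- {event}" for event in player_progress]
    lines.append(
        "Write the character's next line of dialogue, consistent with the above."
    )
    return "\n".join(lines)


# Hypothetical example values.
prompt = build_dialogue_prompt(
    theme="post-apocalyptic mystery",
    setting="a flooded coastal city",
    npc_name="Mara the ferry pilot",
    player_progress=["repaired the ferry engine", "refused to pay the toll"],
)
```

Because the player's recorded progress is part of the input, two players with different histories would give the model different prompts, which is the basis of the personalized plot adjustment described above.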

2.2.4. Game Voice-over and Sound Effects Production

Sound effects, character voices, and scene music are important for player immersion and game atmosphere. However, composing music, hiring voice actors, and producing sound effects involves a heavy workload and high cost. Generative AI can automatically generate music and sound effects that match the emotional needs of a scene, and can even simulate complex sounds such as ambient audio and character voices, greatly reducing development cost and lowering the barrier to entry [9].

2.2.5. Player Experience Optimization

Artificial intelligence can also adjust a game's difficulty, pacing, and plot by analyzing player behavior data, strengthening the player's sense of participation. For example, AI can automatically adjust enemy difficulty according to the player's play habits, or adapt the plot route according to the player's choices [10].
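
A very simple form of such difficulty adjustment can be sketched as a rule that nudges difficulty toward a target win rate. This is a hand-written heuristic for illustration, not a generative model and not the method of any cited work. The target win rate, step size, and tolerance are assumed values.

```python
def adjust_difficulty(current, recent_results, target_win_rate=0.5,
                      step=0.1, tolerance=0.15, lo=0.0, hi=1.0):
    """Nudge difficulty (between lo and hi) toward a target win rate.

    recent_results: list of booleans, True when the player won an encounter.
    """
    if not recent_results:
        return current
    win_rate = sum(recent_results) / len(recent_results)
    if win_rate > target_win_rate + tolerance:
        current += step  # player is winning too easily: make it harder
    elif win_rate < target_win_rate - tolerance:
        current -= step  # player is struggling: make it easier
    return max(lo, min(hi, current))


# Hypothetical session: the player won four of the last five encounters.
new_level = adjust_difficulty(0.5, [True, True, False, True, True])
```

A learned model could replace the fixed rule, for example by predicting the win rate from richer behavior data, but the feedback loop of observing the player and then adjusting the game stays the same.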

3. Game Application Examples and New Possibilities

3.1. Game Example: Echoes of Somewhere

Echoes of Somewhere is an experimental point-and-click adventure game created by a team of industry veterans in their free time. It relies heavily on AI-generated content. As an experimental game, it illustrates well the application and advantages of generative artificial intelligence in game development [11].

In the early stages of development, the developer used Midjourney and Stable Diffusion to create a character model sheet with turnaround images, and completed the character design draft after many adjustments. For modeling, the developer built a first draft of the character in Modo from the side and front views, and created deformation maps for each projection so that the front, side, and back views matched as closely as possible on the mesh. The deformation maps were also UV-unwrapped. Once the model was locked, the developer used Mixamo to rig the character, and then used fSpy to help blend the 2D backgrounds with the 3D characters. These tasks took the developer only 33 hours and, by the developer's own account, AI saved at least 5 days of work.

For code, the developer used ChatGPT as an assistant. The developer copied the generated code into the project, and it worked properly and did exactly what was intended, without any need to search the web. Examples include, but are not limited to, reworking the footstep audio and the in-world camera controllers.

For sound, the developer used Audition for production and ElevenLabs for AI text-to-speech. The developer only needs to paste in the script text, select a suitable voice, and click Generate.

For graphics, the developer used Midjourney to create icons, Modo to model and render icons, and Midjourney for the game's overall visuals, and used Magnific AI to upscale the game's low-resolution background images to 4K and above. In Echoes of Somewhere, almost every background image was processed in Magnific from images generated with Midjourney or Stable Diffusion (via Scenario or Leonardo), which greatly helped unify art from different sources.

For UI design, the developer also relied on Midjourney for the initial design and then refined it in Photoshop.

For character dialogue, the developer had ChatGPT role-play the game's human and non-human characters to generate dialogue.

This project clearly reflects the advantages of generative artificial intelligence in lowering barriers and saving costs, and shows how it can support different parts of game development. It also shows that AI still has limitations: the clarity of the output, modeling accuracy, and audio accuracy all fall short in places, and repeated debugging or manual adjustment is still required.

3.2. Application Possibilities of Generative Artificial Intelligence: Taking “Treacherous Waters Online” as an Example

Generative AI can assist not only in development but also during play, enabling new game modes and personalized experiences. The game "Ni Shui Han" (Treacherous Waters Online), developed with the help of NetEase Fuxi AI, contains many new directions for generative AI applications and is a useful reference [12].

Intelligent NPC: Based on NLP, NPCs in the game can converse with players autonomously, understand their intentions, and respond with logical actions. They can also remember past dialogue, evolve their own behavior logic accordingly, and thereby affect the final direction of the plot.

Virtual human generation: The in-game character customization ("face-pinching") system can change a character's appearance from a text description or an input picture. It also lets players create highly personalized virtual characters, using generative AI to produce the appearance, voice, personality, and life story of a character unique to the player.

Intelligent scene generation: Based on CV image generation technology, players can customize scenes from hand-drawn line sketches, which improves immersion.

In addition, generative AI can generate AI companions for competition or cooperation, automatically update the virtual world, and more, providing more creative modes of play.

4. Remaining Challenges

Challenges remain both in generative AI itself and in its application to game development.

The first is income distribution and job competition. Generative AI greatly reduces developers' workload, which may lead to technological unemployment. Its spread may also substantially change the distribution of wealth and complicate the problem of inequality.

In addition, the high barriers to developing generative AI may lead to monopoly. Companies have models of different quality, so the finished products may differ widely.

Intellectual property is another concern. The output of generative AI can be regarded as a recombination of its training materials, and whether the training data is fully authorized is a problem that deserves attention.

Security and privacy are also at stake. Generative AI may produce large amounts of false information and mislead the public. Training generative AI also requires large amounts of data, some of which may contain users' private information, causing losses to those users.

Finally, there are ethical issues. The training materials of generative AI come from human works, which may contain discriminatory content, and a model may produce inappropriate remarks when induced by users.

5. Conclusion

Although challenges remain, generative AI has shown great potential in game development, bringing a new perspective to game design, content generation, and player experience. It can significantly reduce the workload of game development and the amount of repetitive work, let developers focus on design and innovation, lower the threshold for game production, and ease cost constraints so that more creators can make better games. For players, the highly personalized experience it brings lets each player have a unique journey in a more immersive and realistic virtual world. In summary, generative AI has had a far-reaching influence on the game industry in both development and player experience, and points toward a more diversified and intelligent future for the industry.

References

[1]. Taylor, S. (2021). Q&A with James Gwertzman, Seattle gaming vet who just joined Andreessen Horowitz as an investor. GeekWire.

[2]. Xia, B., Ye, X., Abuassba, A. O. M. (2020). Recent research on ai in games. 2020 International Wireless Communications and Mobile Computing (IWCMC), 505-510.

[3]. Togelius, J., Yannakakis, G. N. (2016). General general game AI. 2016 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1-8.

[4]. Chien, C. C., Chan, H. Y., Hou, H. T. (2024). Learning by playing with generative AI: design and evaluation of a role-playing educational game with generative AI as scaffolding for instant feedback interaction. Journal of Research on Technology in Education, 1-20.

[5]. Genesereth, M., Love, N., Pell, B. (2005). General game playing: Overview of the AAAI competition. AI magazine, 26(2): 62-62.

[6]. Gallotta, R., Todd, G., Zammit, M., et al. (2024). Large language models and games: A survey and roadmap. arXiv preprint arXiv:2402.18659.

[7]. Sefeni, D., Johnson, M., Lee, J. (2024). Game-theoretic approaches for stepwise controllable text generation in large language models.

[8]. McIntosh, T. R., Susnjak, T., Liu, T., et al. (2024). A game-theoretic approach to containing artificial general intelligence: Insights from highly autonomous aggressive malware. IEEE Transactions on Artificial Intelligence.

[9]. Fukaya, K., Daylamani-Zad, D., Agius, H. (2024). Evaluation metrics for intelligent generation of graphical game assets: a systematic survey-based framework. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]. Wamba, S. F., Queiroz, M. M., Jabbour, C. J. C., et al. (2023). Are both generative AI and ChatGPT game changers for 21st-Century operations and supply chain excellence?. International Journal of Production Economics, 265: 109015.

[11]. Zhang, B., Zhu, J., Su, H. (2023). Toward the third generation artificial intelligence. Science China Information Sciences, 66(2): 121101.

[12]. Robertson, J., et al. (2024). Game changers: A generative AI prompt protocol to enhance human-AI knowledge co-construction. Business Horizons.

Cite this article

Qi, Z. (2024). Research on the Application of Generative Artificial Intelligence in Games. Applied and Computational Engineering, 120, 59-65.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 5th International Conference on Signal Processing and Machine Learning

ISBN: 978-1-83558-809-3 (Print) / 978-1-83558-810-9 (Online)

Editor: Stavros Shiaeles

Conference website: https://2025.confspml.org/

Conference date: 12 January 2025

Series: Applied and Computational Engineering

Volume number: Vol.120

ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.