
A Comprehensive Review of Transformer and Diffusion Models in Game Design: Applications, Challenges, and Future Directions
1 School of Computer Science, Sun Yat-Sen University, 132 Waihuan Rd E, Guangzhou, China
* Author to whom correspondence should be addressed.
Abstract
With the rapid advancement of artificial intelligence (AI) technologies, Transformer and Diffusion models have emerged as powerful tools across many fields, including game design. These models open new possibilities for generating high-quality content and enhancing user experiences through personalized narratives. Integrating AI into game development promises to change how games are designed and experienced, addressing limitations of traditional methods and enabling more dynamic and engaging gameplay. Despite this potential, however, the use of these models in game design remains underexplored, and systematic understanding of their capabilities, challenges, and future directions is limited. This scoping review provides a comprehensive overview of the current landscape of Transformer and Diffusion models in game design. By mapping existing research, identifying key concepts, and highlighting gaps in the literature, it seeks to establish a foundation for further exploration. The review analyzes a broad range of studies to understand the diverse applications of these models, assess the methodologies employed, and examine emerging trends in AI-enhanced game development. In doing so, it contributes to the broader discourse on AI's role in creative industries and informs future research efforts.
Keywords
Transformer models, Diffusion models, LLMs, Game design, Procedural Content Generation
Cite this article
Zhong, C. (2025). A Comprehensive Review of Transformer and Diffusion Models in Game Design: Applications, Challenges, and Future Directions. Applied and Computational Engineering, 133, 135-141.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 5th International Conference on Signal Processing and Machine Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see the Open access policy for details).