Automatic extraction of playable narrative units coupled with level generation

1. Introduction

There is an inherent contradiction between programmatic generation and narrative generation: narrative emphasizes time progression, causal logic, and character continuity, while programmatic levels pursue structural diversity, reusability, and computational feasibility [1]. To integrate the two, it is necessary to reconcile the completely different representational logics behind them. Narrative relies on symbolic abstraction and story progression, while programmatic levels are based on spatial geometry, mechanism design and system interaction. This study aims to explore the connection between text narrative corpora and programmatic level generators by automatically obtaining playable narrative units [2]. Among them, the "playable narrative units" are particularly crucial. Unlike the original narrative segments, these units are accompanied by constraint annotations, enabling them to be gamified and transformed upon extraction. For instance, when extracting a story fragment of a "locked room guarded by puzzles", not only is the text retained, but also structured labels such as "Entry restrictions", "obstacles", and "knowledge challenges" are attached. These tags help transform narratives into level geometry, enemy configuration, and interaction rules, thereby enabling text narratives to become formal norms for procedural instance generation [3].

The motivations of this research mainly come from three aspects. Firstly, at present, most programmatic level generation methods still rely on abstract syntax or random combinations, and the generation results often lack semantic coherence. Secondly, existing narrative generation systems usually struggle to go beyond text interaction and cannot be effectively mapped into playable content. Thirdly, both in the industrial and academic fields, the demand for adaptive game systems that can effectively match the story experience with the dynamic replayable environment is constantly increasing. The key contributions of this study are as follows: (a) the proposal of a dual-phase pipeline combining narrative extraction with level generation; (b) the introduction of new evaluation metrics capturing narrative-mechanics alignment and coherence; (c) large-scale experiments with diverse corpora demonstrating measurable performance improvements. By coupling symbolic narrative analysis with procedural generative algorithms, this study advances the possibility of scalable interactive storytelling where narrative coherence and gameplay adaptability are not mutually exclusive but systematically integrated.

2. Literature review

2.1. Narrative extraction in computational creativity

Computational creativity research has focused on segmenting stories into structural units using semantic role labeling, dependency parsing, and discourse frameworks [4]. While early systems centered on character and event extraction, recent transformer-based models parse hierarchical structures and map narratives into reusable graphs. This enables plot fragments and quest templates to be instantiated in interactive settings. Yet, most methods overlook playability constraints, leaving a gap between theoretical segmentation and practical design.

2.2. Procedural content and level generation

Procedural Content Generation (PCG) has progressed from rule-based dungeon creation to machine learning systems that balance difficulty, novelty, and pacing. Grammar-based and reinforcement models produce diverse levels but lack semantic coherence, often resulting in arbitrary environments [5]. The absence of direct mapping between narrative and spatial geometry remains a central limitation.

2.3. Integration of narrative and gameplay

Efforts to connect narrative and gameplay include quest generation and co-creation of characters and environments. These highlight the need for narrative-mechanics alignment but often rely on oversimplification or manual mappings. Playable narrative units offer a systematic solution, enabling scalable integration of story semantics with procedural level design beyond handcrafted approaches [6].

3. Methodology

3.1. Data collection and narrative corpus

The dataset was created from three types of narrative sources: mythological texts (32,400 samples), role-playing game fantasy scripts (58,200 samples), and interactive fiction corpora (34,000 samples). These samples were preprocessed by tokenization, lemmatization, and parsing of dependency, resulting in a 89.6 million tokens and 12.4 million dependency arcs set [7]. Narrative units have been identified by a hybrid segmentation model of discourse relation parsing and detection of semantic frame. Units have been annotated by tags that belong to gameplay category like “exploration,” “combat,” “dialogue,” “puzzle.”

3.2. Playability constraints and annotation scheme

A constraint engine was developed to ensure extracted units were translatable into gameplay. Constraints included: (a) spatial feasibility (units must describe environments that can be mapped into level geometry of ≤500×500 tiles); (b) actionability (at least one player interaction type defined, such as “unlock,” “attack,” or “dialogue”); (c) resource economy (at most five resource types per unit to maintain computational tractability). The annotation scheme was refined through iterative pilot testing on 3,000 samples, achieving inter-annotator agreement of 0.89 (Cohen’s κ) [8].

3.3. Level generation and coupling mechanism

Levels were generated through a constraint-driven grammar system combined with reinforcement learning optimization. Grammar rules defined structural templates, while a reinforcement agent adjusted parameters such as enemy density, puzzle complexity, and branching depth based on playability feedback. The coupling mechanism linked narrative annotations with grammar non-terminals: for example, a “locked gate” narrative tag triggered the insertion of a “barrier + key” subgraph. This ensured that extracted narrative units directly shaped level geometry and challenge design. Narrative-Mechanics Alignment Index as Formula (1):

$N M A I = \frac{1}{n} \sum_{i = 1}^{n} \frac{| M_{i} \cap N_{i} |}{| M_{i} \cup N_{i} |} + ϵ$ (1)

where $M_{i}$ represents mechanics tags instantiated in the level, $N_{i}$ represents narrative tags in the extracted unit, and $ϵ = 1 0^{- 6}$ prevents division by zero [9].

4. Experiments

4.1. Experimental setup

Experiments were carried out on a high-performance computing cluster equipped with 64 NVIDIA A100 GPUs, 2 TB of RAM, and 1.5 PB of storage, ensuring sufficient computational resources for large-scale narrative processing and level generation tasks. The experimental pipeline processed a total of 124,600 narrative samples, which were drawn from three distinct corpora designed to capture different stylistic and structural features of storytelling: mythological texts, fantasy role-playing game scripts, and interactive fiction datasets. By including corpora that differ significantly in linguistic complexity, character interaction, and structural conventions, the setup enabled a comprehensive evaluation of how the proposed narrative-coupled generation system performs across varied narrative domains. The three generation methods compared were: (a) a grammar-only baseline, which relied on structural rules without semantic grounding; (b) a random assembly method, which combined fragments without coherence constraints; and (c) the proposed narrative-coupled system, designed to align semantic units with level design.The assessment adopted four core indicators: narrative coherence score (on a 0-10 scale), diversity index (measuring the variability of generated levels), Narse-Mechanism Alignment Index (NMAI, used to measure the consistency between mechanisms and narratives), and average playability duration (in minutes). These indicators, when combined, offer a multi-dimensional perspective on narrative and game quality [10]. Table 1 provides an overview of the distribution of the corpora, and the differences among the three types of corpora can be clearly seen. The mythological corpus contains 32,400 samples, with a total of 4,780 narrative units extracted, an average length of 138 ± 12 words, and 3.2 ± 0.6 gameplay labels marked. Fantasy RPG texts are the most numerous category, with a total of 58,200 samples, generating 9,920 units. The average length of each unit is 154 ± 15 bytes, and it has the highest 4.1 ± 0.7 tags, demonstrating the richness of its game mechanics. Interactive novels provided 34,000 samples and generated 4,720 units, with an average of 129 ± 11 word units, which were shorter than other categories. Each unit was labeled with 2.9 ± 0.5 tags, reflecting its more dialogation-driven feature. The above results demonstrate the scalability of the extraction process and also reflect the diversity of content that the system can handle. By rationally constructing a corpus, equipping with powerful computing resources and adopting clear evaluation metrics, this experiment was able to provide reliable, repeatable and complete analysis, thereby verifying the potential value of the combination of narrative extraction and programmatic level generation.

Table 1. Experimental dataset distribution
Corpus Type	Samples	Extracted Units	Avg. Tokens/Unit	Annotated Tags/Unit
Mythological	32,400	4,780	138 ± 12	3.2 ± 0.6
Fantasy RPG Scripts	58,200	9,920	154 ± 15	4.1 ± 0.7
Interactive Fiction	34,000	4,720	129 ± 11	2.9 ± 0.5

4.2. Quantitative results

The narrative-coupled approach consistently outperformed both baselines across all defined metrics, demonstrating its ability to combine semantic coherence with procedural adaptability. Coherence scores averaged 7.83 ± 0.42*, which is markedly higher than the grammar-only system at 6.11 ± 0.51 and the random assembly approach at 5.02 ± 0.67. This improvement reflects the advantage of integrating story-derived constraints directly into the generative pipeline, ensuring that levels not only exhibit structural variation but also retain narrative logic (see Figure 1). The diversity index further confirmed this benefit, with the narrative-coupled system reaching 0.68 ± 0.04, while the grammar-based method remained at 0.55 ± 0.06, suggesting a broader range of content expression when narrative units guide generation.

Additional metrics also reinforced the superiority of the proposed framework. The Narrative-Mechanics Alignment Index (NMAI) achieved a mean of 0.74 ± 0.03, substantially above the 0.61 ± 0.05 obtained by the baseline, confirming that extracted story elements were successfully translated into gameplay mechanics. Session length data also supported this conclusion, with players engaging for an average of 42.6 ± 3.4 minutes in narrative-coupled levels, compared to 34.7 ± 4.2 minutes for grammar-only levels and 29.9 ± 5.0 minutes for random assembly. Collectively, these findings demonstrate statistically reliable improvements in narrative coherence, content diversity, and player engagement. To capture the combined effect of these indicators, performance was formally modeled through the Level Playability Function, presented as Formula (2):

$L P F (L) = α \cdot Coherence (L) + β \cdot Diversity (L) + γ \cdot Session (L)$ (2)

With parameter weights of $α$ = 0.4, $β$ = 0.3, and $γ$ = 0.3.

Figure 1. Comparative results across methods with error bars

4.3. Qualitative observations

Generated levels displayed distinct thematic alignment with narrative units. For instance, mythological narratives involving labyrinths produced maze-like structures with branching puzzles, while fantasy dialogues yielded village hub levels with NPC interaction nodes. Playtesting feedback provided additional validation. As shown in the Table 2, a total of 120 participants completed structured sessions, and results highlighted stronger perceived immersion with narrative-coupled levels, achieving an average satisfaction score of 8.2 ± 0.5, compared to 6.4 ± 0.6 for grammar-only levels and 5.7 ± 0.7 for random assembly.

Table 2. Quantitative evaluation metrics
Method	Coherence Score	Diversity Index	NMAI	Session Length (min)
Narrative-Coupled	7.83 ± 0.42*	0.68 ± 0.04	0.74 ± 0.03	42.6 ± 3.4
Grammar-Only Baseline	6.11 ± 0.51	0.55 ± 0.06	0.61 ± 0.05	34.7 ± 4.2
Random Assembly	5.02 ± 0.67	0.47 ± 0.07	0.52 ± 0.06	29.9 ± 5.0

Figure 2. Effect of coupling strength (λ) on NMAI and LPF

Participants reported reduced dissonance between story and mechanics, as actions such as unlocking gates or solving riddles were perceived as natural extensions of the storyline rather than arbitrary obstacles. Observations also indicated smoother pacing, with fewer instances of abrupt difficulty spikes or incoherent transitions, further confirming that thematic grounding improved both experiential quality and narrative–gameplay alignment (see Figure 2).

5. Conclusion

This study introduced a dual-phase framework for coupling automatic extraction of playable narrative units with procedural level generation. The approach achieved robust improvements in coherence, diversity, and alignment, demonstrating that narrative integrity and procedural adaptability can coexist in scalable systems. The findings provide a pathway for next-generation narrative-driven games, educational simulations, and serious games requiring both storytelling depth and structural variability. Beyond entertainment, the methodology could be applied to training scenarios, cultural heritage applications, and interactive learning environments. Limitations include reliance on English corpora, computational demands, and the need for richer semantic ontologies. Future research will explore multilingual corpora, hybrid symbolic-neural architectures, and adaptive difficulty balancing. Longitudinal user studies are planned to assess retention and learning outcomes.

References

[1]. Zhong, Q. (2023, November). Enhancing Interactive Storytelling: A Computational Approach to Autonomously Generating Role-Playing Game Scripts with Natural Language Processing. In 2023 3rd International Signal Processing, Communications and Engineering Management Conference (ISPCEM) (pp. 714-719). IEEE.

[2]. Chitnis, G., & Shaikh, T. (2024, October). StoryPlay: End-to-End 8-Bit Game Level Generation Using Large Language Models. In International Conference on Information Technology and Applications (pp. 337-346). Singapore: Springer Nature Singapore.

[3]. Alhussain, A. I., & Azmi, A. M. (2021). Automatic story generation: A survey of approaches.ACM Computing Surveys(CSUR), 54(5), 1-38.

[4]. Wen, Y., Huang, C., Zhou, H., Zeng, Z., Po, C. M. L., Togelius, J., ... & Earle, S. (2025). All stories are one story: Emotional arc guided procedural game level generation. arXiv preprint arXiv: 2508.02132.

[5]. Kumaran, V., Rowe, J., Mott, B., & Lester, J. (2023, October). Scenecraft: Automating interactive narrative scene generation in digital games with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (Vol. 19, No. 1, pp. 86-96).

[6]. Saffari, S., Dorrigiv, M., & Yaghmaee, F. (2024). Harnessing Machine Learning for Procedural Content Generation in Gaming: A Comprehensive Review.Journal of AI and Data Mining, 12(4), 583-597.

[7]. Yannakakis, G. N., & Togelius, J. (2025). Procedural Content Generation by Content Type. In Artificial Intelligence and Games (pp. 287-312). Cham: Springer Nature Switzerland.

[8]. Cavaliere, F. (2024). Natural language understanding for interaction with digital characters [Master’s thesis, ETH Zurich]. ETH Zurich Research Collection.

[9]. Joon, L. H. (2024). Advancing Level Generation in Super Mario Bros.: A Proposal for Combining Diffusion Models with Real-Time Playability Assessment (Doctoral dissertation, Waseda University).

[10]. Puchal, C. H., & López, M. J. J. P. (2021). To create a game master: a decalogue for procedural generation of interactive stories. In Image Processing and Multimedia Technology Centre. Polytechnic University of Catalonia.

Cite this article

Zhang,Y. (2025). Automatic extraction of playable narrative units coupled with level generation. Advances in Operation Research and Production Management,4(2),59-63.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Journal：Advances in Operation Research and Production Management

Volume number: Vol.4

Issue number: Issue 2

ISSN：3029-0880(Print) / 3029-0899(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).

References

[3]. Alhussain, A. I., & Azmi, A. M. (2021). Automatic story generation: A survey of approaches.ACM Computing Surveys(CSUR), 54(5), 1-38.

[6]. Saffari, S., Dorrigiv, M., & Yaghmaee, F. (2024). Harnessing Machine Learning for Procedural Content Generation in Gaming: A Comprehensive Review.Journal of AI and Data Mining, 12(4), 583-597.

[7]. Yannakakis, G. N., & Togelius, J. (2025). Procedural Content Generation by Content Type. In Artificial Intelligence and Games (pp. 287-312). Cham: Springer Nature Switzerland.

[8]. Cavaliere, F. (2024). Natural language understanding for interaction with digital characters [Master’s thesis, ETH Zurich]. ETH Zurich Research Collection.

[9]. Joon, L. H. (2024). Advancing Level Generation in Super Mario Bros.: A Proposal for Combining Diffusion Models with Real-Time Playability Assessment (Doctoral dissertation, Waseda University).