1. Introduction
Augmented reality (AR) alters the conventional paradigm of user interaction with digital content by superimposing virtual objects onto the physical world, thereby enabling contextual immersion across multiple domains. As developers accelerate the development of AR devices and applications, they are increasingly examining ways to improve the user experience from an interaction perspective. Recent advances in artificial intelligence (AI) and computer vision are emerging as important solutions for this purpose. By tracking, recognising and analysing user inputs as well as information gathered from the environment, these technologies enable AR applications to interact with users and their surroundings in a dynamic and adaptive manner. Gesture recognition enables users to manipulate virtual objects with natural hand motions. Visual object tracking ensures that virtual content remains anchored to the appropriate locations in the physical scene. Facial recognition is frequently used across AR applications to recognise users' facial expressions and features, thereby providing personalised AR content. AI-driven content adaptation makes AR content dynamic and contextually aware, and narrative technologies combined with AI can create dynamic, emotionally rich stories and characters for immersive AR experiences. This paper explores the potential of AI and computer vision for enhancing human-computer interaction in AR. Based on a comprehensive review and a set of case studies, we analyse three key technologies and representative use cases in the AR domain, and present performance metrics illustrating how AI and computer vision are reshaping user interaction in AR [1].
2. Enhancing User Interaction with Computer Vision
2.1. Gesture Recognition
Computer vision-powered gesture recognition enables users to interact with AR environments through natural hand movements, opening up a highly intuitive and immersive mode of interaction. Pattern-recognition systems analyse the user's hand movements and gestures and translate them into commands and actions within the AR environment. This differs from earlier interaction models that rely on physical controllers or touchscreens: a gestural interaction model embeds a more natural feeling of control into the experience, making interaction seem more intuitive and lifelike. Consider an educational AR application in which a learner examines a 3D model of the human body: pointing at an organ selects it, opening the palm expands it for closer inspection, and rotating the hand turns the model. For engaging with a complex anatomy model, such gestures feel remarkably natural. Building gesture-recognition technology, however, requires highly precise computer vision systems capable of tracking and recognising a vast number of hand movements under a variety of environmental conditions [2]. Research in the field explores ways to make these systems faster and more accurate so that they can support more sophisticated and diverse interactions, opening the door to new use cases and eventually broadening the scope of gesture-based AR applications.
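To make the pipeline concrete, the sketch below maps detected hand landmarks to two of the gestures described above. It is a minimal sketch, assuming the open-source MediaPipe Hands library for landmark detection; the fingertip-above-knuckle rules in classify_gesture are illustrative heuristics, not a production classifier.

```python
# A minimal sketch of hand-gesture detection for AR input, assuming the
# MediaPipe Hands library; the gesture rules here are illustrative only.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def classify_gesture(landmarks) -> str:
    """Classify a single hand as 'point', 'open_palm', or 'other'
    using a crude fingertip-above-knuckle heuristic."""
    # Landmark indices: (tip, pip) pairs for the index..pinky fingers.
    fingers = [(8, 6), (12, 10), (16, 14), (20, 18)]
    # In image coordinates y decreases upward, so an extended finger
    # has its tip above (smaller y than) its middle joint.
    extended = [landmarks[tip].y < landmarks[pip].y for tip, pip in fingers]
    if extended[0] and not any(extended[1:]):
        return "point"          # could trigger 'select organ' in the AR scene
    if all(extended):
        return "open_palm"      # could trigger 'expand model'
    return "other"

cap = cv2.VideoCapture(0)  # a webcam stands in for the AR device camera
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            gesture = classify_gesture(result.multi_hand_landmarks[0].landmark)
            print(gesture)  # a real AR app would dispatch this to the scene
cap.release()
```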
2.2. Object Tracking
Object tracking is an essential part of interactive AR experiences, in which digital content must be anchored to and accurately aligned with the physical world. Computer vision algorithms can track the position and orientation of objects in real time, ensuring that virtual content is stably rendered in place as a user moves and interacts with their environment. Without object tracking, even a simple AR experience, such as adding a floating text label to a real-world product on a shelf, would lose its integrity as soon as the user moves their head and breaks the line of sight with the physical object being augmented. In retail, AR can track products sitting on a shelf and place information over them, such as a price, user reviews or coupons. In manufacturing, object tracking may be used to guide workers along an assembly line, keeping them informed of the correct placement and sequence of physical parts. In all of these cases, the goal is to enable users to seamlessly interact with a smart, digitally enhanced reality [3]. Research in this area is heavily focused on improving the accuracy and reducing the latency of object-tracking systems, especially in dynamic, cluttered environments containing many moving objects.
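The anchor-and-follow loop described above can be sketched with OpenCV's CSRT tracker, as below. The video file name is hypothetical, and depending on the OpenCV build the tracker factory may be exposed as cv2.legacy.TrackerCSRT_create instead.

```python
# A minimal sketch of frame-to-frame object tracking with OpenCV's CSRT
# tracker; "shelf.mp4" is a hypothetical retail-shelf clip.
import cv2

cap = cv2.VideoCapture("shelf.mp4")
ok, frame = cap.read()

# Manually select the physical product that virtual content should anchor to.
bbox = cv2.selectROI("select product", frame, showCrosshair=False)

tracker = cv2.TrackerCSRT_create()  # or cv2.legacy.TrackerCSRT_create()
tracker.init(frame, bbox)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)
    if found:
        x, y, w, h = map(int, bbox)
        # An AR renderer would place the price/review overlay at (x, y);
        # here we just draw the tracked box as a stand-in.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracked", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
```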
2.3. Facial Recognition
Facial recognition is a technique that can leverage the pervasiveness of camera-equipped mobile devices to create 'me-centric' AR experiences. From the camera feed, the system can identify facial features, extract expressions, and use this information to tailor AR content to the user's context. Applying facial recognition to AR can increase user engagement by creating a personalised experience. Such experiences might include customising an AR avatar based on a user's facial features; generating the dynamic faces of avatars or game characters in accordance with the user's facial expressions, synchronised in real time; and building emotion into AR content, where an avatar's expression is mapped to the user's [4]. Existing applications of facial recognition in social media focus on creating personalised emojis or avatars that track a user's facial expressions in real time to reflect the user's emotions. In marketing, tools exploiting facial recognition might gauge a user's reactions to an advertisement. For instance, an advertisement could adapt to the user's emotional state and respond with content intended to improve their mood, such as music the system predicts they will enjoy, or an animated character that laughs when the user appears unhappy. The key to applying facial recognition in AR is to develop algorithms that can operate on consumer-grade devices while ensuring privacy and secure data storage. Current research on improving the accuracy of facial recognition, as well as its execution time and energy consumption on resource-constrained devices, is directly applicable here [5].
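As a minimal illustration of expression-driven personalisation, the sketch below derives a coarse avatar state from facial landmarks, assuming the MediaPipe Face Mesh library; the landmark indices and the openness threshold are illustrative, uncalibrated choices.

```python
# A minimal sketch of driving an avatar expression from facial landmarks,
# assuming MediaPipe Face Mesh; indices and threshold are illustrative.
import cv2
import mediapipe as mp

mp_face = mp.solutions.face_mesh

def mouth_openness(landmarks) -> float:
    # Landmarks 13/14 approximate the inner upper/lower lip midpoints.
    return abs(landmarks[13].y - landmarks[14].y)

cap = cv2.VideoCapture(0)
with mp_face.FaceMesh(max_num_faces=1, refine_landmarks=True) as face_mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_face_landmarks:
            lm = result.multi_face_landmarks[0].landmark
            # Map the raw measurement to a coarse avatar state; a real
            # system would blend full expression weights instead.
            state = "surprised" if mouth_openness(lm) > 0.05 else "neutral"
            print(state)  # the AR layer would update the avatar here
cap.release()
```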
Table 1 below presents performance metrics and user-satisfaction scores for different applications of AI and computer vision technologies in AR.
Table 1. AI and Computer Vision in AR Applications
| Technology | Use Case | Application | Interaction Enhancement | Precision (%) | User Satisfaction (out of 10) | Response Time (ms) |
|---|---|---|---|---|---|---|
| Gesture Recognition | Education | 3D Model Manipulation | Hands-on Learning | 93.25 | 8.90 | 53.69 |
| Gesture Recognition | Gaming | Complex Actions | Natural Interaction | 93.07 | 9.27 | 67.86 |
| Object Tracking | Retail | Product Information Overlay | Enhanced Shopping | 97.22 | 9.01 | 51.78 |
| Object Tracking | Manufacturing | Assembly Line Guidance | Real-time Instructions | 96.68 | 9.10 | 29.46 |
| Facial Recognition | Social Media | Personalized Emojis | Authentic Interactions | 88.08 | 8.92 | 55.10 |
| Facial Recognition | Marketing | Emotion-based Content | Adaptive Advertising | 90.00 | 8.02 | 51.53 |
3. AI-Driven Content Adaptation
3.1. Contextual Awareness
Contextual awareness is a major advantage that AR applications can exploit. AI can contextualise data collected from sensors and cameras and adjust content according to the situation in which the user operates. A detailed understanding of the user's context allows for more meaningful and relevant interactions, from delivering location-based information to adapting to ambient light, detecting environmental features, and reconfiguring the displayed content accordingly. AR navigation apps can generate directions in real time based on the user's location and orientation. In museums, a context-aware AR system can assist visitors and personalise the tour based on the surrounding exhibits and each visitor's interests [6]. Enabling contextual awareness requires integrating diverse data sources with algorithms that can process and interpret these data in real time. Research in this area focuses on improving accuracy and reducing computational overhead so that more powerful context-aware systems can run on mobile and wearable devices.
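A hypothetical sketch of this kind of context fusion is shown below for the museum scenario: sensor readings are combined into a context record that selects the depth and styling of the displayed content. All names and thresholds are illustrative.

```python
# A hypothetical sketch of rule-based context fusion for an AR museum
# guide; every field, threshold, and content choice is illustrative.
from dataclasses import dataclass

@dataclass
class Context:
    nearest_exhibit: str    # from indoor positioning / marker detection
    ambient_lux: float      # from the device light sensor
    user_interests: set     # from the visitor's profile

def select_content(ctx: Context) -> dict:
    # Prefer deep-dive content when the exhibit matches a stated interest.
    depth = "detailed" if ctx.nearest_exhibit in ctx.user_interests else "summary"
    # Switch to a high-contrast theme in bright surroundings so the
    # overlay stays legible.
    theme = "high_contrast" if ctx.ambient_lux > 10_000 else "standard"
    return {"exhibit": ctx.nearest_exhibit, "depth": depth, "theme": theme}

print(select_content(Context("impressionism", 14_500.0, {"impressionism"})))
# -> {'exhibit': 'impressionism', 'depth': 'detailed', 'theme': 'high_contrast'}
```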
3.2. Real-Time Data Processing
Real-time data processing is equally important for AR applications. To provide fast and responsive interactions, AI algorithms must be able to process and analyse enormous amounts of data quickly. Through advances in computing hardware and AI algorithms, real-time data processing can collect, store, analyse and filter relevant information in a fraction of a second, allowing an AR application to respond to user inputs and environmental changes instantly. For example, an AR application in sports training can process large volumes of data to analyse an athlete's motion and immediately provide feedback on technique, improving performance. Similarly, in retail, AR mirrors can process customer data and instantly generate outfit recommendations tailored to the customer. To deliver real-time data processing that improves the overall user experience, AI researchers are experimenting with new techniques to reduce processing latency. For instance, edge computing processes data close to where they are generated and used, eliminating the need to upload data to the cloud before processing [7]. This technique offers faster and more reliable data-processing times, which is crucial for maintaining the AR experience. Figure 1 shows performance metrics for real-time data processing across different AR applications [8]; these metrics are based on synthetic data.

Figure 1. Performance Metrics for Real-Time Data Processing in AR Applications
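To illustrate the latency constraint discussed above, the hypothetical sketch below runs each frame through an on-device ("edge") processing step and uses the result only if it fits within a 30 fps frame budget. The processing function is a stand-in, not a real model.

```python
# A hypothetical sketch of a latency-budgeted AR frame pipeline: a result
# is rendered only if it arrives within the per-frame time budget.
import time

FRAME_BUDGET_S = 1 / 30  # ~33 ms per frame for a 30 fps AR session

def process_on_edge(frame):
    # Stand-in for an on-device model (pose analysis, outfit matching, ...).
    time.sleep(0.005)
    return {"recommendation": "jacket_42"}

def run(frames):
    for frame in frames:
        start = time.perf_counter()
        result = process_on_edge(frame)
        elapsed = time.perf_counter() - start
        if elapsed <= FRAME_BUDGET_S:
            yield result   # render the overlay this frame
        else:
            yield None     # too slow: skip and reuse the previous overlay

for out in run(range(3)):
    print(out)
```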
3.3. Personalized Content Delivery
One of the most celebrated features of AI-augmented AR is the delivery of personalised content. If an AR application analyses a user's preferences, behaviour and interaction patterns, the AI can tailor the app's content to the user, increasing user satisfaction and retention. Personalised content delivery can support more meaningful interactions between the user and the device, such as customised recommendations, adaptive learning paths, and timely marketing messages. For instance, AR education applications could modify their lessons based on each student's individual learning pace and style by providing tailored resources, suggestions and pointers. In marketing, by aggregating prior consumer data (purchases, preferences, demographics, geographical and contextual information, etc.), AI algorithms can predict the timing and content of a promotion (graphics, sound, product displayed, etc.) that is most likely to result in a conversion, which in turn leads to increased sales [9]. Retention is another aspect that can be enhanced through personalised content delivery: once consumers are provided with a product or an experience tailored to their preferences, they are more likely to return for more [10].
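As a minimal, hypothetical illustration of such personalisation, the sketch below ranks candidate AR content by its overlap with the tags of items the user has previously engaged with. The tags and items are invented for the example.

```python
# A hypothetical sketch of preference-weighted content ranking; all tags,
# items, and weights are illustrative.
from collections import Counter

def rank_content(history: list[str], candidates: dict[str, set[str]]):
    """history: tags of items the user engaged with;
    candidates: candidate item -> its tag set."""
    weights = Counter(history)  # frequently-engaged tags count more
    scored = {
        item: sum(weights[tag] for tag in tags)
        for item, tags in candidates.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

history = ["anatomy", "anatomy", "quiz", "3d-model"]
candidates = {
    "heart_quiz": {"anatomy", "quiz"},
    "physics_demo": {"3d-model", "physics"},
}
print(rank_content(history, candidates))
# -> [('heart_quiz', 3), ('physics_demo', 1)]
```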
4. Immersive Storytelling in AR
4.1. Narrative Integration
The integration of narrative elements can enhance the user experience in AR by creating meaningful interactions with stories and thereby deepening engagement. AI technologies can facilitate narrative integration in AR by creating immersive environments in which users interact with dynamic stories, with AR toolkits using narratives as contexts for activities in the real world. AI-driven narrative can help build interactive, adaptive and context-aware stories by responding to data or triggers from sensors in the AR environment. For instance, tourism applications can use AI-driven narratives in AR guides that tell historical stories and legends, establishing meaningful connections with informational content tied to the visitor's location. By incorporating narrative content, users can situate themselves within an augmented historical space when visiting ancient sites [11]. Table 2 shows various case studies of AI-driven narrative integration in AR applications [12], and a minimal code sketch of such trigger-driven narration follows the table. These metrics demonstrate the effectiveness of narrative integration in enhancing user experiences across different sectors.
Table 2. AI-Driven Narrative Integration Case Studies
| Case Study | Application | Narrative Type | Interactivity Level | User Engagement Score (out of 10) | Learning/Experience Enhancement (%) | Adaptability Index |
|---|---|---|---|---|---|---|
| Historical Tourism | AR Guides | Historical Stories | High | 8.68 | 22.50 | 0.84 |
| Education | Interactive Lessons | Educational Stories | Medium | 8.84 | 19.63 | 0.86 |
| Gaming | AR Adventures | Fantasy Narratives | Very High | 9.18 | 26.77 | 0.96 |
| Retail | Virtual Showrooms | Product Stories | Low | 7.79 | 15.79 | 0.78 |
| Healthcare | Patient Education | Medical Scenarios | Medium | 8.26 | 20.09 | 0.83 |
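The trigger-driven narration referenced above can be sketched as a small story graph that only advances when the user physically reaches the expected site. This is a hypothetical illustration; the sites and story beats are invented.

```python
# A hypothetical sketch of trigger-driven narrative progression for an AR
# tour guide; STORY and the site names are invented for the example.
STORY = {
    "gate":   {"line": "Legend says the city gate...", "next": "forum"},
    "forum":  {"line": "Here merchants once gathered...", "next": "temple"},
    "temple": {"line": "The temple marks the tour's end.", "next": None},
}

class NarrativeEngine:
    def __init__(self, start: str):
        self.current = start

    def on_trigger(self, site: str) -> str | None:
        # Only advance when the user reaches the expected site, keeping
        # the story anchored to the real location.
        if site == self.current:
            beat = STORY[site]
            self.current = beat["next"]
            return beat["line"]
        return None

engine = NarrativeEngine("gate")
for detected_site in ["forum", "gate", "forum"]:  # simulated GPS/marker hits
    line = engine.on_trigger(detected_site)
    if line:
        print(line)  # an AR guide would narrate this over the live view
```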
4.2. Character Development
AI-driven character development is critical to creating AR stories and interactions that feel real and dynamic. Through natural language processing and machine learning algorithms, AI-driven characters can be developed that respond naturally to user inputs, adjust their behaviour to different users, and exhibit realistic emotions. In gaming, such characters can enhance the experience by holding conversations with users and providing personalised advice and companionship throughout the game. In educational AR apps, virtual tutors can dynamically adjust their style and pace in response to users' feedback and progress, thereby personalising the learning experience. Realising AI-driven character development requires sophisticated algorithms that can understand and generate natural language and, importantly, produce believable emotional responses [13]. The challenge for researchers in this area lies in improving the plausibility and interactivity of AI-driven characters so that they can engage users more effectively across different contexts and applications.
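As a minimal, hypothetical illustration of the adaptation loop, the sketch below gives an AR companion a lightweight mood state that shifts with user inputs and shapes its replies. A production system would use NLP models as described above; the keyword-based sentiment cue here is a deliberate stand-in.

```python
# A hypothetical sketch of an adaptive AR companion character; the keyword
# cue and canned replies stand in for real NLP and dialogue generation.
class Companion:
    def __init__(self):
        self.mood = 0.0  # -1.0 (user seems discouraged) .. +1.0 (doing well)

    def respond(self, user_msg: str) -> str:
        # Crude keyword sentiment cue; a real system would classify the
        # utterance with an NLP model.
        if any(w in user_msg.lower() for w in ("stuck", "can't", "hard")):
            self.mood = max(-1.0, self.mood - 0.5)
        else:
            self.mood = min(1.0, self.mood + 0.5)
        if self.mood < 0:
            return "Let's slow down. Try the hint on the glowing tile."
        return "Nice progress! Ready for the next challenge?"

buddy = Companion()
print(buddy.respond("I'm stuck on this puzzle"))  # supportive, slower pace
print(buddy.respond("Got it, that worked!"))      # mood recovers, upbeat reply
```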
5. Conclusion
The marriage of AI and computer vision with AR can transform the potential for interactive and immersive real-time user experiences. Gesture recognition, object tracking and facial recognition engage users in a whole new way: by precisely tracking human movements, they make AR interfaces more intuitive and engaging. AI-driven content adaptation and narrative make AR more immersive and contextually aware; Convolutional Neural Networks (CNNs) enable computers to recognise patterns in visual data, while Natural Language Processing (NLP) enables them to interpret language, including tone and intent. Ultimately, these capabilities could awaken AR users to a more profound digital reality. While there are technical and ethical barriers to overcome, such as processing data in real time and respecting user privacy, ongoing R&D will continue to expand what is possible in AR. The case studies presented here show how AI-enhanced AR is already changing sectors in ways that were not conceivable without computer vision and AI, and how these technologies might define the future of new media: a world where users interact with technology that can learn about and understand them more profoundly. As AI and CV technologies progress further, they will continue pushing AR innovation forward, allowing users to engage with ever more immersive experiences.
References
[1]. Habil, Sandra Gamil Metry, Sara El-Deeb, and Noha El-Bassiouny. "The metaverse era: leveraging augmented reality in the creation of novel customer experience." Management & Sustainability: An Arab Review 3.1 (2024): 1-15.
[2]. Buchner, Josef, and Michael Kerres. "Media comparison studies dominate comparative research on augmented reality in education." Computers & Education 195 (2023): 104711.
[3]. Gong, Taeshik, and JungKun Park. "Effects of augmented reality technology characteristics on customer citizenship behavior." Journal of Retailing and Consumer Services 75 (2023): 103443.
[4]. Kumar, Harish, Parul Gupta, and Sumedha Chauhan. "Meta-analysis of augmented reality marketing." Marketing Intelligence & Planning 41.1 (2023): 110-123.
[5]. Dargan, Shaveta, et al. "Augmented reality: A comprehensive review." Archives of Computational Methods in Engineering 30.2 (2023): 1057-1080.
[6]. Walk, Jannis, et al. "Artificial intelligence for sustainability: Facilitating sustainable smart product-service systems with computer vision." Journal of Cleaner Production 402 (2023): 136748.
[7]. Khang, Alex, et al. "Application of Computer Vision (CV) in the Healthcare Ecosystem." Computer Vision and AI-Integrated IoT Technologies in the Medical Ecosystem. CRC Press, 2024. 1-16.
[8]. Waelen, Rosalie A. "The ethics of computer vision: an overview in terms of power." AI and Ethics 4.2 (2024): 353-362.
[9]. Khang, Alex, et al., eds. Computer vision and AI-integrated IoT technologies in the medical ecosystem. CRC Press, 2024.
[10]. William, P., et al. "Crime analysis using computer vision approach with machine learning." Mobile Radio Communications and 5G Networks: Proceedings of Third MRCN 2022. Singapore: Springer Nature Singapore, 2023. 297-315.
[11]. Górriz, Juan M., et al. "Computational approaches to explainable artificial intelligence: advances in theory, applications and trends." Information Fusion 100 (2023): 101945.
[12]. Oliveira, Rosana Cavalcante de, and Rogério Diogne de Souza E. Silva. "Artificial intelligence in agriculture: benefits, challenges, and trends." Applied Sciences 13.13 (2023): 7405.
[13]. Svanberg, Maja, et al. "Beyond AI Exposure: Which Tasks are Cost-Effective to Automate with Computer Vision?." Available at SSRN 4700751 (2024).