1. Introduction
The rapid growth of digital consumer finance platforms has ushered in an era of multimodal, high-frequency, and complex data production. Each user interaction, inquiry, or transaction on a digital consumer finance platform leaves a footprint in multiple forms: text queries, picture uploads, behavioral navigation patterns, and purchase histories. Conventional pricing mechanisms cannot harness such diverse signals and instead rely on structured credit histories and demographic tables. As a result, pricing models tend to underfit behavioral patterns that are closely linked to repayment intention and ability. Advances in artificial intelligence and representation learning offer new ways to extract semantic and temporal value from unstructured financial data [1]. However, integrating such methods into practical, interpretable pricing systems is still in its infancy.
Despite growing interest in deep learning applications for fintech, academic and industry research has focused mostly on fraud detection and credit approval rather than price optimization. Multimodal learning applications often treat each data modality in isolation and rarely consider cross-modal dependencies or interpretability for pricing. Moreover, revenue-oriented models often pay little attention to regulatory compliance, fairness, and user-centered transparency, which hinders practical deployment. What is needed is an integrated framework that not only exploits the predictive power of multimodal signals but also satisfies ethical, operational, and legal requirements [2].
To address these issues, we introduce a multimodal intelligent pricing architecture built on three core contributions. First, we construct a high-resolution embedding space that integrates language, visual, and behavioral signals through gated attention mechanisms. Second, we formulate a reinforcement learning module for price adjustment that optimizes revenue subject to strict compliance constraints. Third, we conduct extensive experiments on a real-world dataset of 120,000 loan applications, demonstrating strong accuracy, interpretability, and robustness. The findings offer actionable guidance for practitioners and pave the way toward responsible AI for financial pricing.
2. Literature review
2.1. Pricing models in consumer finance
Traditional pricing in consumer lending is anchored in parametric frameworks that map borrower traits to default risk. Logistic regression, proportional-hazards specifications, and scorecard segmentation dominate because they align with underwriting workflows and offer clear coefficient interpretation. Yet these models assume monotonic or linear relationships and treat variables as largely independent, overlooking interaction effects such as feedback between debt-to-income ratios and income volatility. Machine-learning upgrades such as gradient-boosted trees and multilayer perceptrons introduce nonlinearities and capture higher-order patterns, but they still operate almost exclusively on summary tables compiled from credit-bureau records and bank statements [3]. Consequently, rich behavioural signals from mobile clickstreams, social-graph interactions, and real-time spending rhythms remain untapped.
2.2. Multimodal data integration techniques
Three principal strategies integrate heterogeneous signals. Early fusion concatenates raw or pre-processed features into a single vector for holistic learning, but dimensionality growth and modality imbalance often impair generalisation. Late fusion trains specialised predictors for each modality and merges their posterior scores, which improves robustness when certain channels degrade but limits fine-grained cross-signal reasoning [4]. Hybrid designs seek the advantages of both paradigms by first encoding each modality into compact embeddings and then aligning them through cross-attention, gated units, or contrastive objectives. Methods such as canonical correlation analysis learn shared latent spaces that summarise joint semantics, while the per-modality embeddings preserve modality-specific detail. Adaptive weighting schemes further down-weight unreliable inputs at inference time, preventing a single noisy modality from dominating decisions [5]. Loan applications impose additional requirements for audit trails, bias control, and stability under macroeconomic regime shifts, which make generalisation beyond the training data more difficult.
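To make the contrast between early and late fusion concrete, the toy sketch below scores one borrower with both strategies. The linear scorers, the weights, and the function names (`early_fusion`, `late_fusion`, `linear_score`) are hypothetical stand-ins for trained modality models, not components of the framework described in this paper.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Hypothetical scorer: weighted sum plus bias, squashed to (0, 1)."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(modalities: Dict[str, List[float]], weights: List[float]) -> float:
    """Early fusion: concatenate all modality features (in name order), score once."""
    joint = [v for name in sorted(modalities) for v in modalities[name]]
    return linear_score(joint, weights)


def late_fusion(
    modalities: Dict[str, List[float]],
    per_modality_weights: Dict[str, List[float]],
    modality_weights: Dict[str, float],
) -> float:
    """Late fusion: score each available modality separately, then take a
    weighted average of the scores over the modalities actually present."""
    total = sum(modality_weights[name] for name in modalities)
    return sum(
        modality_weights[name] * linear_score(feats, per_modality_weights[name])
        for name, feats in modalities.items()
    ) / total


# Usage: a borrower with text and behavior features (image channel unavailable).
borrower = {"text": [0.8, 0.1], "behavior": [0.3]}
print(early_fusion(borrower, [0.4, 1.2, -0.5]))
print(late_fusion(
    borrower,
    {"text": [1.2, -0.5], "behavior": [0.4]},
    {"text": 0.6, "behavior": 0.4},
))
```

Because `late_fusion` averages only over the modalities that are present, dropping a degraded channel still yields a usable score, whereas `early_fusion` would silently pair the remaining features with the wrong weights. This mirrors the robustness trade-off described above.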
2.3. Intelligent mechanisms in fintech
Adaptive decision engines are transforming product development in lending, payments, insurance, and wealth management. Reinforcement-learning agents calibrate peer-to-peer marketplace bids, optimize credit limits, and set interest-rate offers by balancing profit against customer retention. Explainable AI methods such as SHAP value attribution, counterfactual dashboards, and causal graphs translate black-box model outputs into actionable narratives for risk officers, enabling regulatory review. Deep causal inference techniques separate spurious correlations from causal drivers, so that stress tests can be run under simulated macroeconomic shocks and policy decisions remain valid when data distributions shift [6]. Large-scale deployment in consumer lending, however, is still in progress. Practical challenges include meeting real-time latency requirements within traditional banking stacks, demonstrating fairness across demographic segments, and satisfying governance mandates modeled on pending AI-act templates. This work combines these advances by coupling a risk-aware reinforcement learner with interpretable multimodal embeddings, yielding pricing that is both behavior-adaptive and transparent to auditors.
3. Methodology
3.1. Data collection and preprocessing
The dataset includes 120,000 anonymized loan applications collected from a fintech lender over two years. For each user, we collect three primary modalities: (i) application text, such as purpose declarations and chat interactions; (ii) profile images scraped from publicly available data and verified with face-detection filters; and (iii) behavioral logs, including session timestamps, scrolling speed, page transitions, and abandonment patterns [7]. Text is tokenized with the BERT tokenizer, image vectors are extracted with a pre-trained ResNet-50 encoder, and behavioral logs are converted into 64-dimensional statistical summaries per session using a temporal segmentation algorithm. Missing values are handled with modality-specific imputation strategies. Table 1 presents a descriptive summary of the multimodal dataset. Behavioral logs have the highest feature sparsity (42.3%) and missing rate (6.9%), while image features show the largest dispersion (standard deviation 233.6). The average of 214 tokens per user (standard deviation 47.8) also points to considerable semantic diversity in the text, underscoring the need for robust language modeling [8].
Table 1. Descriptive summary of the multimodal dataset
Modality | Avg. Feature Length | Std. Dev. | Missing Rate (%) | Avg. Feature Sparsity (%)
Text (tokens) | 214 | 47.8 | 1.3 | 18.2
Image (ResNet-50 features) | 2048 | 233.6 | 4.5 | 5.7
Behavior Logs | 64 | 12.4 | 6.9 | 42.3
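The paper does not specify the temporal segmentation algorithm or which statistics make up the 64-dimensional behavioral summary. The sketch below is one hypothetical realization: each session is cut into 16 equal-duration segments, and four statistics of scrolling speed (event count, mean, standard deviation, maximum) are computed per segment, giving 16 × 4 = 64 features. Column-mean imputation stands in for the unspecified modality-specific imputation strategy; `summarize_session` and `impute_column_means` are illustrative names.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

N_SEGMENTS = 16         # hypothetical: 16 segments x 4 statistics = 64 features
STATS_PER_SEGMENT = 4   # event count, mean, std. dev., and max scrolling speed


def summarize_session(events: Sequence[Tuple[float, float]]) -> List[float]:
    """Map (timestamp_seconds, scroll_speed) events to a 64-dim summary vector.

    Segments with no events get a count of 0 and NaN statistics, which are
    filled in later by imputation.
    """
    if not events:
        return [0.0, math.nan, math.nan, math.nan] * N_SEGMENTS
    start = min(t for t, _ in events)
    end = max(t for t, _ in events)
    # Guard: a zero-length session puts all events in segment 0.
    width = (end - start) / N_SEGMENTS or 1.0
    buckets: List[List[float]] = [[] for _ in range(N_SEGMENTS)]
    for t, speed in events:
        idx = min(int((t - start) / width), N_SEGMENTS - 1)  # last event -> last segment
        buckets[idx].append(speed)
    features: List[float] = []
    for speeds in buckets:
        if speeds:
            features += [float(len(speeds)), mean(speeds), pstdev(speeds), max(speeds)]
        else:
            features += [0.0, math.nan, math.nan, math.nan]
    return features


def impute_column_means(rows: List[List[float]]) -> List[List[float]]:
    """Replace each NaN with the mean of the observed values in its column
    (0.0 if the whole column is missing)."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(mean(observed) if observed else 0.0)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)] for row in rows]


# Usage: two toy sessions, summarized and then imputed column by column.
sessions = [
    summarize_session([(0.0, 1.2), (4.5, 0.8), (9.0, 2.5)]),
    summarize_session([(0.0, 0.4), (30.0, 0.6)]),
]
features = impute_column_means(sessions)
print(len(features), len(features[0]))
```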
3.2. Multimodal feature fusion framework
Each modality is encoded into a 128-dimensional embedding vector. Gated Multimodal Units (GMUs) selectively integrate information across modalities by computing gate weights that prioritize high-risk indicators. The fused representation then passes through a cross-modal attention block that identifies predictive signal combinations, such as risk-laden textual claims coupled with erratic browsing, that are associated with higher default likelihood. The resulting joint embedding is fed into the downstream pricing module, as illustrated in Figure 1.
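A minimal sketch of gated fusion over the three 128-dimensional modality embeddings follows. The original GMU formulation is bimodal, with a single sigmoid gate; extending it to three modalities with a per-dimension softmax over modality gates is an assumption made for this sketch, as are the class name `GatedMultimodalUnit` and the random initialization used in place of trained weights. The downstream cross-modal attention block is not shown.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class GatedMultimodalUnit:
    """Softmax-gated fusion of K modality embeddings into one out_dim vector.

    For each modality k: h_k = tanh(W_k x_k) and gate logits g_k = G_k [x_1; ...; x_K].
    A softmax over k, taken separately for each output dimension, turns the logits
    into gates, so every fused dimension is a convex combination of the h_k.
    Weights are random here; in practice they would be learned end to end.
    """

    def __init__(self, in_dims: List[int], out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        total_in = sum(in_dims)
        self.out_dim = out_dim
        self.proj = [random_matrix(out_dim, d, rng) for d in in_dims]         # W_k
        self.gate = [random_matrix(out_dim, total_in, rng) for _ in in_dims]  # G_k

    def __call__(self, xs: List[Vector]) -> Vector:
        concat = [v for x in xs for v in x]
        hidden = [[math.tanh(a) for a in matvec(W, x)] for W, x in zip(self.proj, xs)]
        logits = [matvec(G, concat) for G in self.gate]
        fused = []
        for j in range(self.out_dim):
            top = max(g[j] for g in logits)  # subtract max for numerical stability
            weights = [math.exp(g[j] - top) for g in logits]
            norm = sum(weights)
            fused.append(sum(w / norm * h[j] for w, h in zip(weights, hidden)))
        return fused


# Usage: fuse hypothetical 128-dim text, image, and behavior embeddings.
rng = random.Random(42)
text, image, behavior = ([rng.uniform(-1, 1) for _ in range(128)] for _ in range(3))
gmu = GatedMultimodalUnit([128, 128, 128], 128)
joint = gmu([text, image, behavior])
print(len(joint))
```

Each output dimension is a convex combination of the per-modality `tanh` activations, so the gates can shift weight toward whichever modality carries the stronger risk signal for that dimension.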

3.3. Intelligent pricing algorithm design
The core objective is to maximize the following expected reward function:
where
To stabilize learning and ensure regulatory alignment, we include a dynamic risk regularization term defined as:
where
4. Experiments and results
4.1. Experimental setup and metrics
To evaluate the proposed model, we split the dataset into 84,000 training, 18,000 validation, and 18,000 test examples (a 70/15/15 split), stratified to preserve the class distribution. We use four performance metrics: (1) Mispricing Loss (ML), the squared deviation between optimal and predicted prices, where optimal prices are based on ground-truth margins derived from repayment histories; (2) Area Under the Receiver Operating Characteristic Curve (AUC), which quantifies discrimination performance for default risk; (3) Compliance Rate (CR), the percentage of price quotes that fall within jurisdictional limits across five regional regulatory scenarios; and (4) Average Pricing Latency (APL), the time in milliseconds required to return a price quote per instance through a cloud-deployed inference pipeline. All model variants were trained with the Adam optimizer (learning rate 3e-4, batch size 512), with early stopping after five epochs without improvement, and were tested on an NVIDIA A100 GPU. All hyperparameters were tuned via Bayesian optimization.
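The three accuracy and compliance metrics above can be computed as in the following sketch, which rests on stated assumptions: ML is averaged over instances (the text specifies only a squared deviation), AUC uses the Mann-Whitney formulation with ties counted as one half, and CR treats a quote as compliant when it does not exceed its jurisdiction's cap. The function names are hypothetical.

```python
from typing import Sequence


def mispricing_loss(optimal: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared deviation between optimal and predicted prices."""
    return sum((o - p) ** 2 for o, p in zip(optimal, predicted)) / len(optimal)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random defaulter (label 1) is scored above a
    random non-defaulter (label 0); ties count as one half (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compliance_rate(prices: Sequence[float], caps: Sequence[float]) -> float:
    """Percentage of price quotes at or below the applicable jurisdictional cap."""
    within = sum(1 for p, c in zip(prices, caps) if p <= c)
    return 100.0 * within / len(prices)


# Usage with toy values (not data from the study).
print(mispricing_loss([0.12, 0.18, 0.25], [0.13, 0.17, 0.27]))
print(auc([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.7]))
print(compliance_rate([0.20, 0.24, 0.36], [0.24, 0.24, 0.24]))
```

The pairwise AUC loop is quadratic in the number of examples; for an 18,000-example test set, a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.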
4.2. Comparative evaluation
The proposed multimodal model achieves an ML of 1.82 ± 0.003, outperforming logistic regression (3.21 ± 0.004), GBDT (2.64 ± 0.003), and unimodal deep networks (2.11 ± 0.003 for text-only; 2.27 ± 0.004 for behavior-only; 2.53 ± 0.005 for image-only). In terms of AUC, the multimodal architecture reaches 0.879, surpassing GBDT (0.834), the text-only DNN (0.857), and ensemble voting baselines (0.861). The CR is 98.31%; that is, 98.31% of price quotes fall within the upper-bound pricing thresholds mandated by China, Singapore, the EU, the U.S., and Brazil. APL is 94.2 ms, well below the 150 ms industry benchmark for real-time lending decisions. Ablation studies clarify each modality's contribution: removing behavioral features increases ML by 14.5% and reduces AUC by 0.041, excluding text lowers CR by 9.2%, and eliminating image data reduces AUC by only 4.1%. Price-error distributions are also more tightly concentrated for the multimodal model (standard deviation 0.078) than for logistic regression (0.124) and GBDT (0.096). In high-risk borrower segments (defined as π > 0.7), the proposed model reduces overpricing occurrences by 32.4% and underpricing incidents by 47.1%, indicating robustness where pricing errors are most costly.
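The relative ML reductions implied by the reported means can be checked with a few lines of arithmetic. The figures are taken directly from the comparative results above; standard deviations are ignored, and `relative_reduction` is an illustrative helper name.

```python
# Mean mispricing loss (ML) values reported in Section 4.2.
ML = {
    "logistic regression": 3.21,
    "GBDT": 2.64,
    "text-only DNN": 2.11,
    "behavior-only DNN": 2.27,
    "image-only DNN": 2.53,
}
ML_PROPOSED = 1.82


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` lowers the loss relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


for name, value in ML.items():
    print(f"{name}: {relative_reduction(value, ML_PROPOSED):.1f}% lower ML")
```

Running it gives reductions of about 43.3% relative to logistic regression, 31.1% relative to GBDT, and 13.7%, 19.8%, and 28.1% relative to the text-only, behavior-only, and image-only networks.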
4.3. Discussion of findings
Qualitative analysis of attention heatmaps reveals that applications containing emotional cues such as "urgent medical bills" or "family crisis" receive higher weights when they co-occur with erratic session navigation and blurred profile images. For instance, users who browse pricing pages between 00:00 and 03:00 with a median dwell time under 3.5 seconds are 1.9 times more likely to default. Similarly, applicants who submit grayscale or low-resolution images have a 7.4% higher average risk score, possibly reflecting socioeconomic hardship. Text sentiment classified as negative by a BERT sentiment head correlates strongly (Pearson's r = 0.63) with pricing anomalies.
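The two statistics quoted above can be sketched as follows. The text does not say whether "1.9 times more likely" is measured against the remaining applicants or the whole population; the `relative_risk` function below assumes the former. The segment predicate treats "between 00:00 and 03:00" as visit hours 0 to 2, and all names are hypothetical.

```python
import math
from typing import Sequence


def in_late_night_fast_segment(visit_hour: int, median_dwell_s: float) -> bool:
    """Hypothetical predicate for the segment described in the text:
    pricing-page visits from 00:00 to before 03:00 with median dwell < 3.5 s."""
    return 0 <= visit_hour < 3 and median_dwell_s < 3.5


def relative_risk(defaulted: Sequence[bool], in_segment: Sequence[bool]) -> float:
    """Default rate inside the segment divided by the default rate outside it."""
    inside = [d for d, s in zip(defaulted, in_segment) if s]
    outside = [d for d, s in zip(defaulted, in_segment) if not s]
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Usage with toy applicants: (visit hour, median dwell seconds, defaulted).
applicants = [
    (1, 2.0, True), (2, 3.0, False), (14, 8.0, False),
    (20, 2.5, True), (10, 6.0, False), (9, 4.0, False),
]
segment = [in_late_night_fast_segment(h, d) for h, d, _ in applicants]
print(relative_risk([y for _, _, y in applicants], segment))
```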
The RL pricing layer converges stably within 97 episodes, with the variance of the average reward falling below 0.002. Comparative price heatmaps show that the model applies higher spreads in urban Tier-3 regions with lower median incomes, offset by lower mispricing in well-banked urban Tier-1 centers. Table 2 reports the average price spread, default rate, and correlation with spread for combinations of the risk signals identified above.
Table 2. Price spread, default rate, and spread correlation by risk-feature combination
Feature Combination | Avg. Price Spread (%) | Default Rate (%) | Correlation with Spread (r)
Negative Sentiment Only | 3.81 | 18.2 | 0.41
Behavioral Irregularity Only | 4.23 | 21.7 | 0.46
Blurred Profile Image Only | 3.52 | 16.4 | 0.37
Negative Sentiment + Behavioral Irregularity | 5.67 | 26.3 | 0.61
All Three Combined | 6.44 | 31.1 | 0.74
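One way to operationalize the convergence criterion reported for the RL pricing layer (stable convergence within 97 episodes, with reward variance below 0.002) is a rolling-variance check on per-episode rewards, sketched below. The 0.002 threshold comes from the text; the 10-episode window and the function name `converged_episode` are assumptions, since the text does not define how the reward variance is measured.

```python
from statistics import pvariance
from typing import Optional, Sequence


def converged_episode(
    rewards: Sequence[float], window: int = 10, threshold: float = 0.002
) -> Optional[int]:
    """Return the first episode count (1-based) at which the variance of the
    trailing `window` episode rewards falls below `threshold`, else None."""
    for end in range(window, len(rewards) + 1):
        if pvariance(rewards[end - window:end]) < threshold:
            return end
    return None


# Usage: noisy early rewards that settle around 0.5.
history = [0.0, 1.0] * 5 + [0.5] * 10
print(converged_episode(history))
```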
5. Conclusion and future work
This study introduces a multimodal intelligent pricing framework for consumer finance that substantially reduces mispricing loss and improves predictive accuracy over traditional and single-modal baselines, while meeting real-time latency and regulatory compliance requirements. By jointly leveraging behavioral logs, textual inputs, and profile images, the model uncovers latent risk indicators that remain hidden in conventional pricing systems. Experiments across five regional regulatory scenarios support its scalability and robustness, while interpretability analyses support its fairness and auditability.
Future work will expand the data modalities to include voice biometrics and geospatial trajectories, enabling richer borrower profiling. Federated learning architectures will be explored to ensure data privacy and model generalization in cross-border contexts. Longitudinal studies are also planned to assess the impact of adaptive pricing on borrower behavior, loan retention, and financial inclusion over time.
References
[1]. Rezaei, Esfandyar, and Masoud Hatami. "Innovative Pricing Mechanisms: From Predictive Modeling to Real-Time Adjustments in Multi-Generation Products." International Journal of Industrial Engineering and Construction Management (IJIECM) 1.1 (2024): 30-43.
[2]. Farimani, Saeede Anbaee, Majid Vafaei Jahan, and Amin Milani Fard. "An Adaptive Multimodal Learning Model for Financial Market Price Prediction." IEEE Access (2024).
[3]. Kalisetty, Srinivas, and Phanish Lakkarasu. "Deep Learning Frameworks for Multi-Modal Data Fusion in Retail Supply Chains: Enhancing Forecast Accuracy and Agility." American Journal of Analytics and Artificial Intelligence 2.1 (2024).
[4]. Shrestha, Yash Raj, and Vivianna Fang He. "Integrating multimodal data and machine learning for entrepreneurship research." Strategic Entrepreneurship Journal (2023).
[5]. Nweke, Obinna. "Integrating Consumer Behavior Tracking, Competitive Analysis, and Smart Algorithms for Smarter Business Strategies." International Journal of Computer Applications Technology and Research (IJCATR) 14.1 (2025): 38-45.
[6]. Sarisa, Manikanth, et al. "Stock Market Prediction Through AI: Analyzing Market Trends With Big Data Integration." Migration Letters 21.4 (2024): 1846-1859.
[7]. Annapareddy, Venkata Narasareddy, et al. "Emotional Intelligence in Artificial Agents: Leveraging Deep Multimodal Big Data for Contextual Social Interaction and Adaptive Behavioral Modelling." (April 14, 2025).
[8]. Lăzăroiu, George, et al. "Digital twin-based cyber-physical manufacturing systems, extended reality metaverse enterprise and production management algorithms, and Internet of Things financial and labor market technologies in generative artificial intelligence economics." Oeconomia Copernicana 15.3 (2024).
[9]. Debbadi, Rama Krishna, and Obed Boateng. "Optimizing end-to-end business processes by integrating machine learning models with UiPath for predictive analytics and decision automation." Int J Sci Res Arch 14.2 (2025): 778-796.
[10]. Rane, Nitin, Saurabh Choudhary, and Jayesh Rane. "Artificial intelligence, machine learning, and deep learning for sentiment analysis in business to enhance customer experience, loyalty, and satisfaction." Available at SSRN 4846145 (2024).
Cite this article
Chen, X. (2025). Intelligent Pricing Mechanism of Consumer Finance Behavior Based on Multimodal Data Integration. Theoretical and Natural Science, 130, 1-6.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: The 3rd International Conference on Applied Physics and Mathematical Modeling
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.