Applying self-attention model to learn both Empirical Risk Minimization and Invariant Risk Minimization for multimedia recommendation

Research Article
Open access

Hanyu Zhao 1*, Yangqi Huang 2, Kunqi Zhao 3, Sizhuo Wang 4
  • 1 Bachelor of Engineering in Computer Science and Technology (Honors), Xiamen University Malaysia, Sepang Selangor Malaysia, 43900
  • 2 Bachelor of Economics in Finance (Honors), Xiamen University Malaysia, Sepang Selangor Malaysia, 43900
  • 3 Sauder School of Business, The University of British Columbia, Vancouver BC V6T 1Z2, Canada
  • 4 Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, China
  • *corresponding author CST2009155@xmu.edu.my
ACE Vol.44
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-83558-327-2
ISBN (Online): 978-1-83558-328-9

Abstract

Multimedia recommendation systems have many applications in daily life. However, accurately capturing a customer's preferences remains difficult. Invariant Risk Minimization (IRM) and Empirical Risk Minimization (ERM) are two frameworks for learning a customer's preferences, but both have limitations: although ERM performs well in a single environment, it fails to generalize to multiple or new domains. IRM, on the other hand, learns invariant features across heterogeneous environments, but it lacks theoretical guarantees and performs less effectively where the invariants are unclear. This paper proposes the ERM and IRM Optimized Rating Framework (EIOR), a recommender model that outputs direct rating scores. EIOR improves the accuracy and functionality of multimedia recommendation systems by using a self-attention mechanism to combine IRM and ERM with adjusted attention weights. Specifically, IRM learns the parts that are invariant across environments, while ERM learns the variant parts. With self-attention, we adaptively allocate attention weights to the two parts and seek the optimal pair of weights based on the loss function. We implement EIOR on top of the cutting-edge recommender model UltraGCN and run all experiments on the open TikTok multimedia dataset. The results validate the effectiveness of EIOR compared with using invariant representations alone under the IRM framework.
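
To make the fusion idea in the abstract concrete, the following is a minimal sketch of how two representations could be combined with scaled dot-product self-attention. It is not the authors' implementation: the function name `fuse_with_self_attention`, the use of identity query/key/value projections, the mean pooling step, the dot-product rating score, and the random 8-dimensional vectors are all assumptions made for illustration. In a trained model the projections would be learned by minimizing the loss, which is how the attention weights get adjusted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_with_self_attention(z_inv, z_var):
    """Fuse an invariant and a variant representation (both of shape (d,)).

    Assumption: identity Q/K/V projections, so the attention weights depend
    only on the two input vectors. This is a hypothetical simplification.
    """
    x = np.stack([z_inv, z_var])          # (2, d): one row per representation
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)         # (2, 2) scaled dot-product scores
    weights = softmax(scores, axis=-1)    # each row sums to 1
    attended = weights @ x                # (2, d) attention output
    fused = attended.mean(axis=0)         # (d,) pooled user representation
    return fused, weights

# Hypothetical inputs standing in for learned embeddings.
rng = np.random.default_rng(0)
z_inv = rng.normal(size=8)   # invariant part (the role IRM plays)
z_var = rng.normal(size=8)   # variant part (the role ERM plays)
item = rng.normal(size=8)    # item embedding

fused, weights = fuse_with_self_attention(z_inv, z_var)
score = float(fused @ item)  # illustrative rating score for this user-item pair
```

Each row of `weights` shows how much one representation attends to the invariant and the variant part; training would shift these weights toward whichever mix lowers the loss.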

Keywords:

Invariant Risk Minimization (IRM), Empirical Risk Minimization (ERM), Self-attention Mechanisms, Invariant Learning

Zhao, H.; Huang, Y.; Zhao, K.; Wang, S. (2024). Applying self-attention model to learn both Empirical Risk Minimization and Invariant Risk Minimization for multimedia recommendation. Applied and Computational Engineering, 44, 33-47.


Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation

ISBN: 978-1-83558-327-2 (Print) / 978-1-83558-328-9 (Online)
Editor: Mustafa İSTANBULLU
Conference website: https://2023.confmla.org/
Conference date: 18 October 2023
Series: Applied and Computational Engineering
Volume number: Vol.44
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
