A Prompt-based Vision-language Model for Offensive Meme Detection  
Author Xiaoyu Guo


Co-Author(s) Jing Ma; Xufeng Zhao; Yu Bai; Yongwei Chi


Abstract Internet memes have become a prevalent means of sharing public opinion on social media by combining text and images. Research on the automated analysis of memes has gained attention in recent years, including the task of detecting offensive memes. In this paper, we propose a novel model, the prompt-based vision-language model (PVM), for detecting offensive memes. PVM is a multi-modal model that combines deep learning with prompt learning to enhance text features. We use cloze questions as a prefix to the meme text to unlock the potential of language models, and fuse the prompt-enhanced text features with image features.
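
The cloze-prefix idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the template wording, the function names `add_cloze_prefix` and `fuse_features`, and the use of plain concatenation as the fusion step are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List

# Hypothetical cloze template; the abstract does not give the actual wording.
CLOZE_TEMPLATE = "This meme is [MASK]. "


def add_cloze_prefix(meme_text: str, template: str = CLOZE_TEMPLATE) -> str:
    """Prepend a cloze question to the meme text (assumed prompt format)."""
    return template + meme_text.strip()


def fuse_features(text_features: List[float],
                  image_features: List[float]) -> List[float]:
    """Fuse the two modalities by concatenation (an assumed fusion strategy)."""
    return list(text_features) + list(image_features)


# Toy inputs standing in for real encoder outputs.
prompted = add_cloze_prefix("  when you see the bill  ")
fused = fuse_features([0.1, 0.2], [0.3, 0.4, 0.5])
```

In a real system the prompted text would be passed to a language model and the image to a vision encoder; the toy vectors here only stand in for those encoder outputs.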


Keywords Internet Memes, multi-modal, prompt learning, disinformation detection, vision-language model
Article #: DSBFI23-76
Proceedings of 2nd ISSAT International Conference on Data Science in Business, Finance and Industry
January 8-10, 2023 - Da Nang, Vietnam