AI Model Combines Text and Images to Detect Fake Online Reviews with an F1-Score of 0.934

To combat the growing threat of fake online reviews, researchers have introduced a multimodal deep learning framework that analyzes review text and images together. The system uses BERT to encode the text and ResNet-50 to extract visual features from review images, then fuses the two representations to predict whether a review is authentic. The framework was trained and tested on a curated dataset of 21,142 user-uploaded images spanning the food delivery, hospitality, and e-commerce domains.

In experiments, the multimodal model outperformed unimodal baselines, achieving an F1-score of 0.934 on the test set. Its ability to detect subtle inconsistencies, such as exaggerated textual praise paired with unrelated or low-quality images, highlights the role of multimodal learning in safeguarding digital trust.

According to the researchers, the approach is scalable and addresses a limitation of existing detection models, which rely primarily on textual data, making it a significant advance for content moderation across online platforms. The architecture is designed to be extensible, so that more advanced AI models and additional modalities, such as user behavior data, can be integrated in the future. Ethical considerations are also being addressed to ensure fairness and transparency in content moderation.
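The fusion of BERT and ResNet-50 features can be illustrated with a small sketch. The article does not say how the two representations are combined, so the code below assumes the simplest option, concatenation followed by a logistic scoring head. It is not the researchers' implementation. The names `fuse` and `authenticity_score` are hypothetical, the 768-dimensional text vector assumes a BERT-base encoder (the article does not name the variant), and random numbers stand in for real encoder outputs and trained weights.

```python
import math
import random

TEXT_DIM = 768    # hidden size of BERT-base; the BERT variant is an assumption
IMAGE_DIM = 2048  # size of ResNet-50's pooled feature vector before its classifier


def fuse(text_vec, image_vec):
    """Feature-level fusion by concatenation (an assumed strategy)."""
    if len(text_vec) != TEXT_DIM or len(image_vec) != IMAGE_DIM:
        raise ValueError("unexpected embedding size")
    return list(text_vec) + list(image_vec)


def authenticity_score(fused, weights, bias):
    """Logistic head: maps the fused vector to a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder inputs standing in for real BERT and ResNet-50 outputs.
rng = random.Random(0)
text_vec = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
image_vec = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]

# Untrained placeholder weights; a real head would be learned from labeled reviews.
weights = [rng.gauss(0.0, 0.01) for _ in range(TEXT_DIM + IMAGE_DIM)]
bias = 0.0

fused = fuse(text_vec, image_vec)
score = authenticity_score(fused, weights, bias)
print(len(fused), round(score, 3))
```

Because the scoring head sees both feature sets at once, it can in principle learn cross-modal cues such as glowing text paired with an unrelated image, which a text-only or image-only model cannot observe. With the untrained weights used here, the printed score carries no meaning.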