Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion
arXiv cs.AI · April 28, 2026
Key Points
- The paper studies how visual content in video influences a viewer’s cognitive appraisal and produces specific affective experiences like pleasure, addressing a gap in multimodal affective computing.
- It proposes a new computational model that predicts video-induced pleasure by estimating cognitive appraisal variables, aiming to clarify why “positive emotions” differ from “pleasure.”
- The method tackles practical research challenges including inconsistent/noisy human labels, limited availability of pleasure-specific datasets, and poor interpretability of existing black-box multimodal fusion approaches.
- The model combines transformer-based multimodal feature extraction, attention mechanisms, and an interpretable fusion design to capture both inter- and intra-modal dynamics relevant to pleasure.
- Experiments report a peak accuracy of 0.6624 for predicting pleasure levels, and the results suggest potential applications in affective recommendation and more explainable intelligent media creation.
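The summary does not give the paper's exact architecture, but the attention-based, interpretable cross-modal fusion it describes is commonly built on scaled dot-product attention, where one modality's tokens attend over another's and the attention weights themselves serve as an interpretable map. The following is a minimal NumPy sketch under that assumption; all function names, feature shapes, and the visual/audio pairing are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats):
    """One cross-attention pass: tokens of one modality (query) attend
    over tokens of another (key/value). Returns fused features plus the
    attention map, which can be inspected for interpretability."""
    d_k = key_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)        # each row: a distribution over key tokens
    return weights @ key_feats, weights

# Toy features: 4 visual tokens and 6 audio tokens, dimension 8 (hypothetical).
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 8))
audio = rng.normal(size=(6, 8))

fused, attn = cross_modal_attention(visual, audio)
print(fused.shape, attn.shape)               # (4, 8) (4, 6)
print(np.allclose(attn.sum(axis=1), 1.0))    # True: rows are proper distributions
```

In a full model, the fused features would feed a classifier over discrete pleasure levels, and the per-token attention maps would provide the inter-modal interpretability the paper highlights; a second self-attention pass within each modality would cover the intra-modal dynamics.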