RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian

arXiv cs.CL / 4/21/2026



Key Points

The paper introduces RoIt-XMASA, a multilingual sentiment analysis dataset that adds Italian and Romanian to a cross-lingual, multi-domain Amazon reviews setting.
The dataset contains 36,000 labeled reviews across three domains (books, movies, music) plus 202,141 unlabeled samples, enabling both supervised and unsupervised or semi-supervised workflows.
To handle cross-lingual and cross-domain transfer, the authors propose a multi-target adversarial training method using loss reversal with meta-learned coefficients to balance sentiment accuracy against domain/language invariance.
Experiments show XLM-R reaching an F1 of 66.23% with the proposed method, 4.64% above the baseline, while few-shot tests give Llama-3.1-8B 58.43% F1, pointing to a trade-off between the efficiency of prompting and the higher performance of task-specific fine-tuning.
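
For readers less familiar with the metric behind these numbers, here is a minimal sketch of how an F1-score can be computed from predictions. The summary does not say which averaging the paper uses, so the macro average below is an assumption, and the label names and toy predictions are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the source)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical toy data, not taken from the dataset.
gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.4f}")
```

Macro averaging weights each class equally, which is a common choice for balanced sentiment data, but the paper may report a different variant.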

Abstract

We present RoIt-XMASA, a multilingual dataset that extends the Cross-lingual Multi-domain Amazon Sentiment Analysis to Italian and Romanian, comprising 36,000 labeled reviews across three domains (books, movies, and music) and 202,141 unlabeled samples. To address cross-lingual and cross-domain challenges, we propose a multi-target adversarial training framework that employs loss reversal with meta-learned coefficients to dynamically balance sentiment discrimination with domain and language invariance. XLM-R achieves an F1-score of 66.23% with our approach, outperforming the baseline by 4.64%. Few-shot evaluation shows that Llama-3.1-8B achieves 58.43% F1-score, revealing a meaningful trade-off between the efficiency of prompting-based approaches and the higher performance of task-specific fine-tuning.
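
The loss-reversal idea in the abstract can be illustrated with a small numeric sketch. It assumes one plausible reading: the shared encoder minimizes the sentiment loss while the domain and language losses enter with a negative sign, scaled by coefficients. The function names and the coefficient values are hypothetical, and the coefficients are passed in as plain numbers here; the paper meta-learns them, which this sketch does not reproduce.

```python
def encoder_objective(sentiment_loss, domain_loss, language_loss,
                      coef_domain, coef_language):
    """Objective for the shared encoder under loss reversal (assumed form).

    The domain and language losses are subtracted, so lowering this value
    means classifying sentiment well while making domain and language
    harder to predict from the shared representation.
    """
    return (sentiment_loss
            - coef_domain * domain_loss
            - coef_language * language_loss)


def discriminator_objective(domain_loss, language_loss):
    """The domain and language classifiers still minimize their own losses."""
    return domain_loss + language_loss


# Hypothetical loss values for one training step.
enc = encoder_objective(sentiment_loss=0.5, domain_loss=0.7, language_loss=0.6,
                        coef_domain=0.1, coef_language=0.2)
disc = discriminator_objective(domain_loss=0.7, language_loss=0.6)
print(f"encoder objective: {enc:.2f}, discriminator objective: {disc:.2f}")
```

Larger coefficients push the encoder harder toward domain and language invariance at the expense of sentiment accuracy, which is the balance the abstract says the meta-learned coefficients adjust dynamically.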

