BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs
arXiv cs.CL / 3/13/2026
Key Points
- BTZSC is introduced as a comprehensive zero-shot text classification benchmark spanning 22 public datasets across sentiment, topic, intent, and emotion classification.
- It benchmarks four major model families—NLI cross-encoders, embedding models, rerankers, and instruction-tuned LLMs—across 38 public and custom checkpoints.
- Among the key findings: modern rerankers (e.g., Qwen3-Reranker-8B) set a new state of the art at 0.72 macro F1, while embedding models such as GTE-large-en-v1.5 deliver strong accuracy at much lower latency.
- Instruction-tuned LLMs with 4–12B parameters reach up to 0.67 macro F1, performing well on topic classification but lagging behind specialized rerankers, while NLI cross-encoders plateau as backbone size grows.
- The authors publicly release BTZSC and evaluation code to support fair and reproducible progress in zero-shot text understanding.
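The embedding-model recipe the benchmark evaluates can be sketched in a few lines: embed the input text and a natural-language description of each candidate label, then pick the label with the highest cosine similarity. The sketch below uses a toy bag-of-words embedder as a stand-in for a real model such as GTE-large-en-v1.5; the label descriptions and function names are illustrative, not from the paper.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real sentence embedding model:
    # a bag-of-words count vector over lowercased tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_classify(text, labels):
    # Score the input against each label's description and return
    # the best-scoring label -- no task-specific training involved.
    doc = embed(text)
    scores = {lab: cosine(doc, embed(desc)) for lab, desc in labels.items()}
    return max(scores, key=scores.get)

labels = {
    "sports": "this text is about sports and games",
    "finance": "this text is about finance and markets",
}
print(zero_shot_classify("the markets rallied after the finance report", labels))
```

Swapping the toy `embed` for a real encoder (or scoring text–label pairs jointly, as a cross-encoder or reranker does) yields the other model families the benchmark compares.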