Mezura
Compare and evaluate LLM performance across multiple benchmarks
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval