arxiv:2602.18964

Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language

Published on Feb 21

Authors:

Abstract

A new gold-standard dataset for sarcasm detection in Yorùbá is introduced with high inter-annotator agreement and soft labels for uncertainty-aware modeling.

AI-generated summary

Sarcasm detection poses a fundamental challenge in computational semantics, requiring models to resolve disparities between literal and intended meaning. The challenge is amplified in low-resource languages where annotated datasets are scarce or nonexistent. We present Yor-Sarc, the first gold-standard dataset for sarcasm detection in Yorùbá, a tonal Niger-Congo language spoken by over 50 million people. The dataset comprises 436 instances annotated by three native speakers from diverse dialectal backgrounds using an annotation protocol specifically designed for Yorùbá sarcasm by taking culture into account. This protocol incorporates context-sensitive interpretation and community-informed guidelines and is accompanied by a comprehensive analysis of inter-annotator agreement to support replication in other African languages. Substantial to almost perfect agreement was achieved (Fleiss' κ= 0.7660; pairwise Cohen's κ= 0.6732--0.8743), with 83.3% unanimous consensus. One annotator pair achieved almost perfect agreement (κ= 0.8743; 93.8% raw agreement), exceeding a number of reported benchmarks for English sarcasm research works. The remaining 16.7% majority-agreement cases are preserved as soft labels for uncertainty-aware modelling. Yor-Sarchttps://github.com/toheebadura/yor-sarc is expected to facilitate research on semantic interpretation and culturally informed NLP for low-resource African languages.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.18964 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.18964 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.