Swati Swati and Dunja Mladenić
Abstract
The rise of digital media enhances information accessibility but
also introduces challenges related to the quality and impartiality
of news reporting, particularly regarding biases that influence
public perception during key global events. In response, this
study introduces LLNewsBias, a dataset designed to detect and
analyze political bias in multilingual news headlines, covering
four major events from 2019 to 2022: Brexit, COVID-19, the 2020
U.S. election, and the Ukraine-Russia war. The dataset contains over
350,000 headlines in 17 languages, annotated with bias labels, and
is compiled using Media Bias/Fact Check and Event Registry. Our
contributions include a structured framework for data collection
and organization, enabling event-wise and year-wise analysis
while supporting lifelong learning. We also highlight potential
use cases that demonstrate the dataset’s utility in advancing bias
prediction models, multilingual adaptation, and model robustness.
Additionally, we discuss the dataset’s limitations, addressing
potential biases, sample size constraints, and contextual factors.
This work provides a valuable resource for improving bias
detection in dynamic, multilingual news environments, contributing
to the development of more accurate and adaptable models in
natural language processing and media studies.