Borrowing Words: Transfer Learning for Reported Speech Detection in Slovenian News Texts

Zoran Fijavž

Abstract
This paper describes the development of a reported speech classifier for Slovenian news texts using transfer learning. Due to a lack of Slovenian training data, multilingual models were trained on English and German reported speech datasets, reaching an F-score of 66.8 on a small, manually annotated Slovenian news dataset. A manual error analysis was also performed. While the developed model captures many aspects of reported speech, further refinement and more annotated data would be needed to reliably predict less frequent instances, such as indirect speech and nominalizations.