Predicting Pronunciation Types in the Sloleks Morphological Lexicon of Slovene

Jaka Čibej

Abstract
In this paper, we present an experiment in the automatic prediction
of pronunciation types for lemmas in the Sloleks Morphological
Lexicon of Slovene. We perform a statistical analysis of a number
of mostly n-gram-based features and use the statistically
significant ones to train and test several machine learning
models. The models discriminate between lemmas whose pronunciation
transcription can be generated automatically using Slovene
grapheme-to-phoneme (G2P) conversion rules (e.g. Novak) and
lemmas whose pronunciation follows other G2P rules (e.g.
Shakespeare).
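
As a rough illustration of what an n-gram-based feature for a lemma might look like, the sketch below counts character bigrams and trigrams. The function names, the boundary markers (^ and $), the lowercasing, and the choice of n-gram sizes are all assumptions made for this example; the abstract does not specify the exact feature set used in the experiment.

```python
from collections import Counter


def char_ngrams(lemma: str, n: int) -> list[str]:
    """Return the character n-grams of a lowercased lemma.

    The lemma is padded with '^' and '$' (an assumption of this sketch)
    so that word-initial and word-final n-grams stay distinguishable.
    """
    padded = "^" + lemma.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_features(lemma: str, sizes: tuple[int, ...] = (2, 3)) -> Counter:
    """Count all character n-grams of the given sizes in one lemma."""
    counts: Counter = Counter()
    for n in sizes:
        counts.update(char_ngrams(lemma, n))
    return counts


# The two example lemmas from the abstract.
novak = ngram_features("Novak")
shakespeare = ngram_features("Shakespeare")
```

Counts of this kind could then be tested for statistical significance and passed to a classifier; for instance, the bigram "sh" occurs in Shakespeare but not in Novak.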