Jaka Čibej
Abstract
In this paper, we present an experiment in the automatic prediction of pronunciation types for lemmas in the Sloleks Morphological Lexicon of Slovene. We perform a statistical analysis of a number of mostly n-gram-based features and use a set of statistically significant features to train and test several machine learning models that discriminate between two groups of lemmas: those for which a pronunciation transcription can be generated automatically using Slovene grapheme-to-phoneme (G2P) conversion rules (e.g. Novak), and those whose pronunciation follows other G2P rules (e.g. Shakespeare).
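The abstract does not specify the exact features or models, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes character 1- to 3-grams with `^`/`$` word-boundary markers as features, and it uses multinomial naive Bayes with add-one smoothing as a stand-in classifier, which is not necessarily one of the models evaluated in the paper. The function `char_ngrams`, the class `NaiveBayes`, the labels `slovene_g2p`/`other_g2p`, and the training lemmas are all hypothetical toy choices, not data taken from Sloleks.

```python
import math
from collections import Counter


def char_ngrams(lemma: str, n_values=(1, 2, 3)) -> Counter:
    """Count character n-grams of a lemma padded with ^ (start) and $ (end)."""
    padded = f"^{lemma.lower()}$"
    counts = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over n-gram counts (stand-in model)."""

    def fit(self, lemmas, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        # Log prior probability of each class, from label frequencies.
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Aggregate n-gram counts per class.
        self.counts = {c: Counter() for c in self.classes}
        for lemma, label in zip(lemmas, labels):
            self.counts[label].update(char_ngrams(lemma))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, lemma):
        features = char_ngrams(lemma)
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.prior[c]
            for gram, k in features.items():
                # Add-one smoothed probability of this n-gram given the class.
                p = (self.counts[c][gram] + 1) / (self.totals[c] + v)
                score += k * math.log(p)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy training data (not from Sloleks).
lemmas = ["novak", "kovač", "potočnik", "zupan",
          "shakespeare", "washington", "wright", "sheffield"]
labels = ["slovene_g2p"] * 4 + ["other_g2p"] * 4
model = NaiveBayes().fit(lemmas, labels)
```

Character n-grams at word boundaries are a plausible choice because spellings such as `sh` or a final `-ght` rarely occur in native Slovene lemmas; a real setup would select features by a significance test and evaluate on held-out lemmas rather than on the training set.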