Tomi Božak, Mitja Luštrek and Gašper Slapničar
Abstract
The field of emotion recognition from eye-tracking data is well-established and offers near-real-time insights into human affective states. Eye tracking is less obtrusive than some other modalities often used in emotion recognition tasks, such
as electroencephalogram (EEG), electrocardiogram (ECG), and galvanic skin response (GSR).
emotion recognition using an eye-tracker with a lower sampling frequency than that typically employed in similar research. Using ocular features, we explored the efficacy of classical machine learning
(ML) models in classifying four emotions (anger, disgust, sadness, and tenderness) as well as neutral and “undefined” emotions. The features included gaze direction, pupil size, saccadic movements, fixations, and blink data. The data from the “emotional State Estimation based on Eye-tracking database” were preprocessed and segmented into time windows of various lengths, with 22 features extracted for model training. Feature importance analysis revealed that pupil size and fixation duration were the most important features for
emotion classification. The efficacy of different window lengths (1 to 10 seconds) was evaluated using Leave-One-Subject-Out (LOSO) and 10-fold cross-validation (CV). The results demonstrated that accuracies of up to 0.76 could be achieved with 10-fold CV when differentiating between positive, negative, and neutral emotions. The analysis of model performance across different window lengths revealed that longer time windows generally resulted in improved model performance. When the data were split using a marginally personalised 10-fold CV within videos, the Random Forest (RF) classifier achieved an accuracy of 0.60 in differentiating between the six aforementioned emotions.
Some challenges remain, particularly with regard to data granularity, model generalization across subjects, and the impact of downsampling on feature dynamics.
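For illustration, the following is a minimal sketch of the kind of pipeline summarised above: raw eye-tracking samples are aggregated into fixed-length windows and a Random Forest classifier is evaluated with Leave-One-Subject-Out cross-validation. The column names ("subject", "pupil_size", "fixation_duration", "label"), the window length, the sampling rate, and the three aggregate features are illustrative assumptions only and do not reflect the study's exact 22-feature configuration.

# Hypothetical sketch: window-level feature aggregation and LOSO evaluation
# of a Random Forest classifier on eye-tracking data. All column names and
# parameters are assumptions for illustration, not the study's configuration.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

def segment_into_windows(df, window_s=5, rate_hz=60):
    """Aggregate raw samples into fixed-length windows with simple statistics."""
    samples_per_window = window_s * rate_hz
    rows = []
    # Assumes samples for each (subject, label) pair are contiguous in time.
    for (subject, label), group in df.groupby(["subject", "label"]):
        for start in range(0, len(group) - samples_per_window + 1, samples_per_window):
            win = group.iloc[start:start + samples_per_window]
            rows.append({
                "subject": subject,
                "label": label,
                "pupil_mean": win["pupil_size"].mean(),
                "pupil_std": win["pupil_size"].std(),
                "fixation_mean": win["fixation_duration"].mean(),
            })
    return pd.DataFrame(rows)

def loso_evaluate(windows):
    """Leave-One-Subject-Out cross-validation of a Random Forest classifier."""
    feature_cols = ["pupil_mean", "pupil_std", "fixation_mean"]
    X = windows[feature_cols].to_numpy()
    y = windows["label"].to_numpy()
    groups = windows["subject"].to_numpy()
    accuracies = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        accuracies.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(accuracies))

In this sketch, grouping the splits by subject ensures that no windows from a test subject appear in the training folds, which is what distinguishes the LOSO protocol from the (more optimistic) 10-fold CV results reported above.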