Alexander Perko and Franz Wotawa
Abstract
Large Language Models and chat interfaces like ChatGPT have
recently become increasingly important, receiving considerable attention
even from the general public. People use these tools not only
to summarize or translate text but also to answer questions, including
medical ones. For the latter, providing reliable feedback is of
utmost importance, yet this reliability is hard to assess. Therefore, we focus
on validating the feedback of ChatGPT and propose a testing procedure
utilizing other medical sources to determine the quality
of feedback for more straightforward medical diagnostic tasks.
This paper outlines the problem, discusses available sources, and
introduces the validation method. Moreover, we present the first
results obtained when applying the testing framework to ChatGPT.