{"id":16779,"date":"2024-10-01T10:50:19","date_gmt":"2024-10-01T08:50:19","guid":{"rendered":"https:\/\/is.ijs.si\/?p=16779"},"modified":"2025-03-26T09:02:47","modified_gmt":"2025-03-26T08:02:47","slug":"pandachat-rag-towards-the-benchmark-for-slovenian-rag-applications","status":"publish","type":"post","link":"https:\/\/is.ijs.si\/?p=16779","title":{"rendered":"PandaChat-RAG: Towards the Benchmark for Slovenian RAG Applications"},"content":{"rendered":"\n<p>Taja Kuzman, Tanja Pavleska, Urban Rupnik and Primo\u017e Cigoj<\/p>\n<p>Abstract<br \/>Retrieval-augmented generation (RAG) is a recent method for enriching the text generation abilities of large language models with external knowledge through document retrieval. Because it is useful for a wide range of applications, it already powers multiple products. However, despite this widespread adoption, there is a notable lack of evaluation benchmarks for RAG systems, particularly for less-resourced languages. This paper introduces PandaChat-RAG, the first Slovenian RAG benchmark, established on a newly developed test dataset. The test dataset is based on the semi-automatic extraction of authentic questions and answers from a genre-annotated web corpus. The methodology for constructing the test dataset can be efficiently applied to any of the comparable corpora available in numerous European languages. The test dataset is used to assess the RAG system\u2019s performance in retrieving the relevant sources essential for providing accurate answers to the given questions. The evaluation involves comparing the performance of eight open- and closed-source embedding models, and investigating how the retrieval performance is influenced by factors such as the document chunk size and the number of retrieved sources. 
These findings contribute to establishing guidelines for optimal RAG system configurations, not only for Slovenian but also for other languages.<\/p>\n<p>\u00a0<\/p>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/is.ijs.si\/wp-content\/uploads\/2024\/10\/SCAI_2024_paper_0538.pdf\" type=\"application\/pdf\" style=\"width:100%;height:600px\" aria-label=\"Embed of SCAI_2024_paper_0538.\"><\/object><a id=\"wp-block-file--media-84128833-2adb-423a-8bc9-a9f234ffaff3\" href=\"https:\/\/is.ijs.si\/wp-content\/uploads\/2024\/10\/SCAI_2024_paper_0538.pdf\">SCAI_2024_paper_0538<\/a><a href=\"https:\/\/is.ijs.si\/wp-content\/uploads\/2024\/10\/SCAI_2024_paper_0538.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-84128833-2adb-423a-8bc9-a9f234ffaff3\">Download<\/a><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":29,"featured_media":24966,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[105,102],"tags":[],"class_list":["post-16779","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-doi-skui-2024","category-papers"],"_links":{"self":[{"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/posts\/16779","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/is.ijs.si\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16779"}],"version-history":[{"count":1,"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/posts\/1677
9\/revisions"}],"predecessor-version":[{"id":16782,"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/posts\/16779\/revisions\/16782"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/is.ijs.si\/index.php?rest_route=\/wp\/v2\/media\/24966"}],"wp:attachment":[{"href":"https:\/\/is.ijs.si\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16779"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/is.ijs.si\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16779"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/is.ijs.si\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16779"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}