Elham Motamedi, Inna Novalija and Luis Rei
Abstract
The rapid advancement and sharing of knowledge present both
opportunities and challenges. Although the extensive range of
frameworks and resources available on the internet facilitates
tracking developments within particular knowledge fields, the
sheer volume of these resources can make the process overwhelming.
Various platforms, including patent systems, news platforms,
code-sharing repositories such as GitHub, and preprint repositories
such as arXiv, support the dissemination of knowledge across
different domains. Furthermore, as knowledge increasingly spans
multiple disciplines, there is a growing need to track innovations
that intersect several fields. Despite the richness of available data,
the literature lacks a comprehensive knowledge taxonomy that
enables users to effectively track and understand innovations
across different domains. Developing such a taxonomy and
employing automated methods to classify textual data into
relevant knowledge fields would enhance the ability to track
the knowledge shared on the internet.
To generate such a taxonomy, various platforms can be leveraged
to ensure broad coverage of diverse knowledge areas. This
study addresses this gap by focusing on patent datasets, which
are classified into detailed groups using systems such as the
Cooperative Patent Classification (CPC). The CPC system organises
patents into a hierarchical taxonomy, which helps streamline
internal processes and makes search queries more efficient.
Each patent document can be assigned multiple
labels, reflecting its relevance to several knowledge fields.
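
To illustrate how a hierarchical scheme with multiple labels per patent can be collapsed into coarser knowledge fields, the following is a minimal sketch. It assumes the standard CPC symbol layout (section letter, two-digit class, subclass letter, main group, subgroup, as in "G06N 3/08"). The function names `cpc_levels` and `labels_at_level` and the example patent are hypothetical, and the abstract does not state which hierarchy level the taxonomy is built from.

```python
import re

# Pattern for a full CPC symbol such as "G06N 3/08":
# section (letter), class (2 digits), subclass (letter), main group / subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")


def cpc_levels(symbol):
    """Return the hierarchy path of a CPC symbol, from section down to subgroup."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a full CPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": f"{section}{cls}{subclass} {group}/00",
        "subgroup": f"{section}{cls}{subclass} {group}/{subgroup}",
    }


def labels_at_level(symbols, level):
    """Collapse a patent's CPC symbols to the set of labels at one hierarchy level."""
    return sorted({cpc_levels(s)[level] for s in symbols})


# Hypothetical patent carrying three CPC symbols (illustrative only).
patent_symbols = ["G06N 3/08", "G06F 16/35", "H04L 9/32"]
print(labels_at_level(patent_symbols, "section"))   # ['G', 'H']
print(labels_at_level(patent_symbols, "subclass"))  # ['G06F', 'G06N', 'H04L']
```

Choosing a coarser level yields fewer, broader labels per patent, while a finer level keeps more detail at the cost of a larger label space.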
In this work, we first developed a knowledge taxonomy based
on the CPC schema. We formulated the classification of textual
data into defined knowledge fields as a multi-label problem. We
then fine-tuned pre-trained transformer language models and
evaluated their effectiveness as classifiers. The multi-label
framework enables the tracking of knowledge trends at the
intersection of various disciplines.
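
The multi-label formulation can be sketched as follows: each document receives a 0/1 target vector over the knowledge fields, and each field is scored by an independent sigmoid. This is a minimal illustration under stated assumptions, not the authors' implementation. The field names, the binary cross-entropy loss, the 0.5 threshold, and the logits are all hypothetical; in practice the logits would come from a fine-tuned transformer, which is not shown here.

```python
import math

# Hypothetical knowledge fields; the paper's actual taxonomy is derived from CPC.
FIELDS = ["chemistry", "computing", "electricity", "physics"]


def multi_hot(fields):
    """Encode a document's set of fields as a 0/1 vector over FIELDS."""
    return [1.0 if f in fields else 0.0 for f in FIELDS]


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(logits, targets):
    """Mean binary cross-entropy, one independent binary decision per field."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(targets)


def predict(logits, threshold=0.5):
    """Return every field whose probability reaches the threshold."""
    return [f for f, z in zip(FIELDS, logits) if sigmoid(z) >= threshold]


target = multi_hot({"computing", "physics"})
logits = [-2.0, 3.0, -1.0, 0.5]  # made-up model outputs for one document
print(target)           # [0.0, 1.0, 0.0, 1.0]
print(predict(logits))  # ['computing', 'physics']
```

Because each field is decided independently, a document can be assigned to several fields at once, which is what allows activity at the intersection of disciplines to be counted.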