On Domain-specific Topic Modelling Using the Case of a Humanities Journal

Redzuan, Nadja; Möller, Ralf; Gehrke, Marcel; Braun, Tanya

Research article in online collection (conference)

Abstract

Topic modelling techniques have been an important tool for meaningful information retrieval. They also hold the potential to support researchers in areas such as the humanities in exploring corpora of different topics in an automated way. One prominent method, latent Dirichlet allocation (LDA), describes documents as distributions over topics and topics as distributions over words. Most applications of LDA focus on sets of tweets, news articles, Wikipedia entries, or academic publications covering various topics in a large corpus. In this article, LDA is used in nearly the opposite setting: a domain-specific, small-scale corpus in the form of an academic journal concerned with the study of modern and ancient manuscripts. From this case study, we infer steps specific to dealing with domain-specific corpora.
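
To make the phrase "documents as distributions over topics and topics as distributions over words" concrete, the following is a minimal sketch of LDA's generative view using only the Python standard library. It is not the authors' pipeline: the vocabulary, the two topic-word distributions, and the names `sample_dirichlet` and `generate_document` are hypothetical and chosen purely for illustration. The sketch only generates a toy document from fixed topics; it does not fit a model to a corpus.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocab, doc_alpha, length, rng):
    """Generate one toy document following LDA's generative story."""
    # Each document gets its own distribution over topics.
    theta = sample_dirichlet(doc_alpha, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word[z])[0]
        words.append(w)
    return theta, words


# Hypothetical toy vocabulary and two hand-picked topics (illustrative only).
VOCAB = ["manuscript", "scribe", "parchment", "ink", "pigment", "spectroscopy"]
TOPIC_WORD = [
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],  # topic 0: each row sums to 1
    [0.02, 0.03, 0.05, 0.30, 0.30, 0.30],  # topic 1: each row sums to 1
]

rng = random.Random(0)  # fixed seed so repeated runs give the same draw
theta, words = generate_document(TOPIC_WORD, VOCAB, [0.5, 0.5], 10, rng)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: given only the observed words, it estimates the per-document topic mixtures and the per-topic word distributions.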

Publication details

Year of publication: 2023
Language of publication: English
Link to full text: https://ceur-ws.org/Vol-3580/paper4.pdf