Time Period Categorization in Fiction: A Comparative Analysis of Machine Learning Techniques

Author
Fereshta Westin
Publication date
Published online: 23 Mar 2024
Abstract

This study investigates the automatic categorization of time period metadata in fiction, a critical but often overlooked aspect of cataloging. Using a comparative analysis approach, the performance of three machine learning techniques, namely Latent Dirichlet Allocation (LDA), Sentence-BERT (SBERT), and Term Frequency-Inverse Document Frequency (TF-IDF), was assessed by examining their precision, recall, F1 scores, and confusion matrix results. LDA identifies underlying topics within the text, TF-IDF measures word importance, and SBERT measures sentence semantic similarity. Based on F1-score analysis and confusion matrix outcomes, TF-IDF and LDA categorized text data by time period effectively, while SBERT performed poorly across all time period categories.
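The evaluation measures named in the abstract (precision, recall, F1 score, confusion matrix) can be illustrated with a minimal sketch. The category names and the true and predicted labels below are invented toy values for illustration only; they are not data or results from the study, and the helper functions `confusion_matrix` and `per_class_scores` are hypothetical names, not the author's code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(matrix, labels):
    """Derive precision, recall, and F1 for each class from the confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                         # correctly predicted as this class
        fn = sum(matrix[i]) - tp                  # this class, predicted as another
        fp = sum(row[i] for row in matrix) - tp   # another class, predicted as this
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy example: hypothetical time period categories and labels (not from the study).
labels = ["medieval", "victorian", "contemporary"]
y_true = ["medieval", "medieval", "victorian", "victorian", "contemporary", "contemporary"]
y_pred = ["medieval", "victorian", "victorian", "victorian", "contemporary", "medieval"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(matrix, labels)

for label in labels:
    s = scores[label]
    print(f"{label}: precision={s['precision']:.2f} recall={s['recall']:.2f} f1={s['f1']:.2f}")
```

Reading the matrix row by row shows which time periods a classifier confuses with one another, which is the kind of detail a single F1 score hides.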

Journal
Cataloging & Classification Quarterly
Volume and issue
vol. 62, no. 2
Pages
124-153
Keywords
Cataloging for digital resources ; time period categorization ; machine learning ; text analysis ; fiction ; LDA ; SBERT ; TF-IDF
Posted: 2024-07-27  Last updated: 2024-07-30