MAchine Readable Cataloging to MAchine Understandable Data with Distributed Big Data Management

Authors
Sharma, Kumar ; Marjit, Ujjal ; Biswas, Utpal
Publisher
Taylor & Francis
Publication date
2018.04
Abstract
In recent years, the library domain has used semantic web technologies to publish data-centric information that machines can process directly. Several efforts have been made to convert data from MAchine-Readable Cataloging (MARC) formats into the Resource Description Framework (RDF). Storing library data in RDF makes resources easier to interlink and reuse on the web, and the richer semantics let machines interpret library resources meaningfully. Existing approaches run in a single-node environment and fail when the input data volume is large. Some bibliographic records in MARC 21 format are so large that traditional data-management tools cannot process them, and they require more storage space. Such data call for systems that can perform tasks in parallel. In this article, we propose a distributed approach to converting legacy library data into RDF using Apache Spark and Hadoop. We describe the conversion from the MARC 21 format for bibliographic data into RDF and present preliminary results on processing speed and storage. The approach improves conversion performance in terms of processing time and storage size.
Journal
Journal of Library Metadata
Volume (issue)
18(1)
Pages
13-29
Posted: 2018-09-13  Last updated: 2018-12-18
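
The abstract describes converting MARC 21 bibliographic records into RDF in parallel. The record does not give the authors' mapping or code, so the following is only a minimal illustrative sketch under stated assumptions: the base URI, the three-field mapping to Dublin Core elements, the dictionary representation of a record, and the sample records are all hypothetical. The per-partition function has the shape that Spark's `RDD.mapPartitions` expects (an iterator in, an iterable out), but here it is run over plain Python lists so the example has no dependencies.

```python
from typing import Dict, Iterable, Iterator, List

BASE = "http://example.org/record/"      # hypothetical base URI for subjects
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace

# Hypothetical, deliberately tiny mapping: (MARC tag, subfield) -> RDF predicate.
FIELD_MAP: Dict[tuple, str] = {
    ("245", "a"): DC + "title",      # title statement
    ("100", "a"): DC + "creator",    # main entry, personal name
    ("260", "b"): DC + "publisher",  # name of publisher
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record: dict) -> List[str]:
    """Turn one record (control number under '001') into N-Triples lines."""
    subject = f"<{BASE}{record['001']}>"
    triples = []
    for (tag, code), predicate in FIELD_MAP.items():
        value = record.get(tag, {}).get(code)
        if value:
            triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


def convert_partition(records: Iterable[dict]) -> Iterator[str]:
    """Convert every record in one partition; each partition is independent."""
    for record in records:
        yield from record_to_ntriples(record)


# Invented sample records, split into two "partitions" to mimic a cluster.
sample = [
    {"001": "rec1", "245": {"a": "An Example Title"}, "100": {"a": "Doe, Jane"}},
    {"001": "rec2", "245": {"a": "Another Title"}},
]
partitions = [sample[:1], sample[1:]]
lines = [line for part in partitions for line in convert_partition(part)]
```

Because each record is converted without reference to any other, the partitions can be processed on separate workers and their outputs concatenated, which is the property a distributed conversion relies on.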