Comparison of document similarity algorithms in extracting document keywords from an academic paper
This study validates a list of keywords that a domain expert derived from a scientific article, drawing on years of knowledge, against prominent document similarity algorithms. A list of handcrafted keywords generated by Electric Double Layer Capacitor (EDLC) experts is chosen,...
| Main Authors: | Miah, M. Saef Ullah; Junaida, Sulaiman; Saiful, Azad; Kamal Z., Zamli; Jose, Rajan |
|---|---|
| Format: | Conference or Workshop Item |
| Language: | English |
| Published: | IEEE 2021 |
| Subjects: | QA76 Computer software |
| Online Access: | https://umpir.ump.edu.my/id/eprint/33162/ |
| _version_ | 1848827296276807680 |
|---|---|
| author | Miah, M. Saef Ullah; Junaida, Sulaiman; Saiful, Azad; Kamal Z., Zamli; Jose, Rajan |
| author_sort | Miah, M. Saef Ullah |
| building | UMP Institutional Repository |
| collection | Online Access |
| description | This study validates a list of keywords that a domain expert derived from a scientific article, drawing on years of knowledge, against prominent document similarity algorithms. A list of handcrafted keywords generated by Electric Double Layer Capacitor (EDLC) experts is chosen, and documents relevant to EDLC are considered for the comparison. Different similarity calculation algorithms are then applied to the documents in different settings: using the whole text of each document, selecting only the positive sentences of the documents, and generating similarity scores with keywords automatically extracted from the documents. The experiment shows that the machine-generated keywords are mostly similar to the list curated by the domain experts. The study also suggests preferable algorithms for similarity calculation and automated key-phrase extraction in the EDLC domain. |
| first_indexed | 2025-11-15T03:58:27Z |
| format | Conference or Workshop Item |
| id | ump-33162 |
| institution | Universiti Malaysia Pahang |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T03:58:27Z |
| publishDate | 2021 |
| publisher | IEEE |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | ump-33162 2025-10-17T02:43:54Z https://umpir.ump.edu.my/id/eprint/33162/ Comparison of document similarity algorithms in extracting document keywords from an academic paper Miah, M. Saef Ullah; Junaida, Sulaiman; Saiful, Azad; Kamal Z., Zamli; Jose, Rajan QA76 Computer software This study validates a list of keywords that a domain expert derived from a scientific article, drawing on years of knowledge, against prominent document similarity algorithms. A list of handcrafted keywords generated by Electric Double Layer Capacitor (EDLC) experts is chosen, and documents relevant to EDLC are considered for the comparison. Different similarity calculation algorithms are then applied to the documents in different settings: using the whole text of each document, selecting only the positive sentences of the documents, and generating similarity scores with keywords automatically extracted from the documents. The experiment shows that the machine-generated keywords are mostly similar to the list curated by the domain experts. The study also suggests preferable algorithms for similarity calculation and automated key-phrase extraction in the EDLC domain. IEEE 2021 Conference or Workshop Item PeerReviewed pdf en https://umpir.ump.edu.my/id/eprint/33162/2/Comparison%20of%20document%20similarity%20algorithms%20in%20extracting.pdf Miah, M. Saef Ullah and Junaida, Sulaiman and Saiful, Azad and Kamal Z., Zamli and Jose, Rajan (2021) Comparison of document similarity algorithms in extracting document keywords from an academic paper. In: Proceedings - 2021 International Conference on Software Engineering and Computer Systems and 4th International Conference on Computational Science and Information Management, ICSECS-ICOCSIM 2021. 7th International Conference on Software Engineering and Computer Systems and 4th International Conference on Computational Science and Information Management, ICSECS-ICOCSIM 2021, 24-26 Aug. 2021, Pekan, Malaysia. pp. 631-636. ISBN 978-166541407-4 (Published) https://doi.org/10.1109/ICSECS52883.2021.00121 |
| title | Comparison of document similarity algorithms in extracting document keywords from an academic paper |
| title_sort | comparison of document similarity algorithms in extracting document keywords from an academic paper |
| topic | QA76 Computer software |
| url | https://umpir.ump.edu.my/id/eprint/33162/ |