Summary: | Introduction: In Information Retrieval (IR), the search process involves matching a
query to relevant documents using various techniques. Information retrieval for the
Al-Qur'an involves retrieving verses related to specific concepts of interest, but
contributions to query matching remain relatively limited owing to the nature of the
Qur'an itself. Extracting information from the Al-Qur'an text is complicated, and the
challenges take many forms: the same concept may be mentioned in different verses, a
verse may allude to many themes, a concept may be expressed using different words, and
a term may refer to different things and may have different names. However, semantic
query matching for Al-Qur'an
text can be improved by emphasizing the processes of text extraction and similarity
analysis. Therefore, this study aims to contribute to the process of semantic query
matching, focusing on the domain of pilgrimage, by proposing a model called Concept-Based
Lattice Mining (CBLM). Methodology: The research methodology involves four main stages:
key-term extraction, preparation of two datasets, Formal Concept Analysis (FCA) and the
concept-based lattice mining process, and finally measurement of lattice similarity
between FCA concept lattices. Before proposing the similarity algorithm, a comparison
with a base model was conducted; it showed that the base model's similarity formula
gives answers similar to those of this research but measures only first-level similarity
between graphs. This research extends the algorithm by a further step that refines the
degree of similarity within a dataset up to the second level.
The dataset under study comprised 53 verses related to Hajj and Umrah from the Al-Qur'an
(taken from the Al-Hilali extended English translation of the Qur'an) and related hadiths.
The reference dataset was built from questions and answers related to Hajj and Umrah on
the website of 'Jabatan Agama dan Kemajuan Islam Malaysia' (JAKIM).
The categorization of the datasets and the results were validated by domain experts, and
the implementation of the CBLM model on both datasets was evaluated by comparing accuracy
and Kappa values. Results: Across several experiments, the accuracy obtained ranged from
70% to 83%, in line with improvements in the Kappa values. Overall, the performance on
the JAKIM dataset is consistent with the judgment of the domain experts, demonstrating
its validity as the reference dataset for testing the proposed CBLM technique. A similar
justification applies to the Al-Qur'an and Hadith dataset, on which superior performance
in terms of average precision, F-measure, and accuracy was observed, indicating its
potential for use in conjunction with the CBLM model. Since, to date, there is no
published standard for the acceptable range of accuracy on non-standard datasets such as
those in this study, the accuracy obtained, supported by improved Kappa statistics, is
deemed satisfactory for this study. Conclusion: Overall, this research not only
contributed to keyword extraction from Qur'anic text by proposing a hybrid text
extraction model but also highlighted the importance of FCA theory in determining the
underlying concepts in Qur'anic text. It also indicates that the CBLM model is a useful
technique for similarity analysis using Formal Concept Analysis and graph theory.
|
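The Methodology rests on FCA concept lattices. As a minimal sketch of what FCA computes, the following Python enumerates every formal concept (extent, intent) of a small formal context and the cover edges of the resulting lattice. It assumes a toy context in which objects stand in for verses and attributes for extracted key terms. The context, the term names, and the functions `formal_concepts` and `hasse_edges` are hypothetical illustrations, not the CBLM implementation. The enumeration uses a naive closure-by-intersection approach that suits only small contexts; it is not an efficient algorithm such as NextClosure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy formal context: objects stand in for verses,
# attributes for extracted key terms (not data from this study).
CONTEXT: Dict[str, Set[str]] = {
    "v1": {"ihram", "kaabah", "tawaf"},
    "v2": {"kaabah", "tawaf"},
    "v3": {"arafah", "ihram"},
    "v4": {"kaabah", "sai"},
}

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def extent_of(intent: FrozenSet[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Derivation operator B': objects that have every attribute in B."""
    return frozenset(g for g, attrs in context.items() if intent <= attrs)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a small context.

    Every concept intent is an intersection of some set of object intents
    (the empty intersection being the full attribute set), so we start from
    the full attribute set and close it under intersection with each object.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {b & frozenset(attrs) for b in intents}
    concepts = [(extent_of(b, context), b) for b in intents]
    # Order from the top of the lattice (largest extent) downwards.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def hasse_edges(concepts: List[Concept]) -> List[Tuple[Concept, Concept]]:
    """Cover relation (child, parent): the child extent is a proper subset of
    the parent extent, with no concept strictly in between."""
    edges = []
    for lo in concepts:
        for hi in concepts:
            if lo[0] < hi[0] and not any(lo[0] < mid[0] < hi[0] for mid in concepts):
                edges.append((lo, hi))
    return edges


if __name__ == "__main__":
    lattice = formal_concepts(CONTEXT)
    for extent, intent in lattice:
        print(sorted(extent), sorted(intent))
    print(len(hasse_edges(lattice)), "cover edges")
```

A graph-similarity measure such as the one described above would then compare two such lattices, for example one built from the Al-Qur'an and Hadith dataset and one from the JAKIM dataset. That first- and second-level comparison is specific to CBLM and is not reproduced in this sketch.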
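The Results report accuracy alongside Kappa values, together with average precision and F-measure. The following is a hedged sketch of how such figures can be computed. It assumes that "Kappa" means Cohen's kappa between the model's labels and the domain experts' labels over the same items, since the abstract does not name the variant, and it assumes binary relevance labels. The label values and example data are hypothetical. The per-label precision computed here is not a ranking-based average precision.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Fraction of items on which the prediction matches the expert label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(gold)
    p_o = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    labels = set(gold_counts) | set(pred_counts)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(gold_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def precision_recall_f1(
    gold: Sequence[Hashable], pred: Sequence[Hashable], positive: Hashable = "relevant"
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) for one positive label."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for ten query-verse pairs (not results from this study).
GOLD = ["relevant", "relevant", "relevant", "irrelevant", "irrelevant",
        "relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]
PRED = ["relevant", "relevant", "irrelevant", "irrelevant", "irrelevant",
        "relevant", "relevant", "relevant", "irrelevant", "irrelevant"]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(GOLD, PRED), 3))
    print("kappa:", round(cohen_kappa(GOLD, PRED), 3))
    print("precision/recall/F1:", [round(x, 3) for x in precision_recall_f1(GOLD, PRED)])
```

With these toy labels, accuracy is 0.8 but Cohen's kappa is only 0.6, because half of the agreement is expected by chance from the label frequencies alone. This illustrates why kappa is a useful companion to raw accuracy.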