| Summary: | Advances in information technology have made information overload a global problem. Automatic text summarization, a branch of natural language processing, is one technique for addressing this challenge: it condenses a text without losing its essential information. Although many commercial text summarization tools are available online, there is limited research on summarizing text automatically in Rausa. This study was conducted to develop a system that summarizes Rausa text automatically. Rausa, a Chadic language widely spoken in West Africa, is a low-resource language. A data set of 10 Rausa documents was extracted from two newspapers, 'Aminiya' and 'Leadership Rausa'. Each document was given to three linguistic experts, who produced human-made summaries. The study adopted five features (keyword, length, title, cue phrases and location of a sentence) in the summarization process. Rausa morphological rules were reviewed, and Porter's algorithm was modified to fit the language. A stemming algorithm was developed to stem Rausa terms. The Term Frequency Inverse Sentence Frequency and K-mixture Probabilistic models were used to weight each word before and after stemming. A set of words was chosen as keywords based on a threshold value. The keywords were used to produce summaries with each model, in order to determine the fitness of the models and the impact of stemming on automatic text summarization for the Rausa language. In addition, a Naive Bayes model was employed to weight each sentence based on its features. The system produced a summary made up of sentences selected according to the threshold value. Taking the human-made summaries as the reference standard, the researcher compared the system-generated output with them. 
The results show that, with stemming, the Term Frequency Inverse Sentence Frequency model, with an average F-score of 56.0%, outperforms the K-mixture Probabilistic model, with 38.9%. Without stemming, the Term Frequency Inverse Sentence Frequency model, with an F-score of 34.8%, also outperforms the K-mixture Probabilistic model, with 26.3%. The overall system testing revealed an average F-score of 78.1%. These results indicate that automatic text summarization for the Rausa language performs better when the text has been stemmed.
|
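The keyword-weighting and evaluation steps described in the summary can be illustrated with a minimal sketch. It is not the study's implementation: it assumes one common formulation of Term Frequency Inverse Sentence Frequency (a word's count in the document multiplied by the logarithm of the number of sentences over the number of sentences containing the word) and a sentence-level F-score computed from the overlap between system-selected and human-selected sentences. The function names, the threshold value and the placeholder tokens are all hypothetical, and the K-mixture model, the stemmer and the Naive Bayes sentence scoring are not covered.

```python
import math
from collections import Counter


def tf_isf_weights(sentences):
    """Weight each word as tf * log(N / sf).

    sentences: list of tokenized sentences (lists of strings).
    tf is the word's count over the whole document, N the number of
    sentences, and sf the number of sentences containing the word.
    This is an assumed formulation, not necessarily the study's.
    """
    n = len(sentences)
    tf = Counter(word for sent in sentences for word in sent)
    sf = Counter(word for sent in sentences for word in set(sent))
    return {word: tf[word] * math.log(n / sf[word]) for word in tf}


def select_keywords(weights, threshold):
    """Keep the words whose weight reaches the (hypothetical) threshold."""
    return {word for word, weight in weights.items() if weight >= threshold}


def f_score(system, reference):
    """Harmonic mean of precision and recall over selected sentence ids."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens standing in for stemmed terms; these are not Rausa words.
document = [["a", "b"], ["a", "c"], ["a", "c", "c"]]
weights = tf_isf_weights(document)
keywords = select_keywords(weights, threshold=1.0)

# Compare hypothetical system-selected sentence ids with a human selection.
score = f_score(system=[0, 2], reference=[0, 1, 2])
```

In this sketch a word that occurs in every sentence receives a weight of zero, so it can never pass a positive threshold. That matches the intuition that such words do not help distinguish one sentence from another.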