Comparing two corpus-based methods for extracting paraphrases to dictionary-based method
| Main Authors: | Ho, Chuk Fong; Azmi Murad, Masrah Azrifah; Abdul Kadir, Rabiah; C. Doraisamy, Shyamala |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | World Scientific Publishing, 2011 |
| Online Access: | http://psasir.upm.edu.my/id/eprint/22466/ http://psasir.upm.edu.my/id/eprint/22466/1/Comparing%20two%20corpus-based%20methods%20for%20extracting%20paraphrases%20to%20dictionary-based%20method.pdf |
| _version_ | 1848844492306644992 |
|---|---|
| author | Ho, Chuk Fong; Azmi Murad, Masrah Azrifah; Abdul Kadir, Rabiah; C. Doraisamy, Shyamala |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | Paraphrase extraction plays an increasingly important role in language-related research and in applications such as information retrieval, question answering and automatic machine evaluation. Most existing methods extract paraphrases from various types of corpora using syntactic-based approaches. Because a syntactic-based approach relies on contextual similarity to identify and capture paraphrases, it also tends to extract non-paraphrases that appear in similar contexts, such as loosely related terms and functionally similar yet unrelated terms. In addition, each type of corpus suffers from its own problems, such as limited availability and domain bias. This paper presents a purely semantic-based paraphrase extraction model. The model collects paraphrases from multiple lexical resources and validates them semantically in three ways: by computing domain similarity, definition similarity and word similarity. The model is benchmarked against two leading syntactic-based approaches. Results from a manual evaluation show that the proposed model outperforms both benchmarks. The results indicate that a semantic-based approach should be preferred over a syntactic-based approach for paraphrase extraction, and further suggest that a hybrid of the two approaches should be applied if one targets strictly precise paraphrases. |
| first_indexed | 2025-11-15T08:31:47Z |
| format | Article |
| id | upm-22466 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T08:31:47Z |
| publishDate | 2011 |
| publisher | World Scientific Publishing |
| recordtype | eprints |
| repository_type | Digital Repository |
| citation | Ho, Chuk Fong and Azmi Murad, Masrah Azrifah and Abdul Kadir, Rabiah and C. Doraisamy, Shyamala (2011) Comparing two corpus-based methods for extracting paraphrases to dictionary-based method. International Journal of Semantic Computing, 5 (2). pp. 133-178. ISSN 1793-351X; ESSN: 1793-7108. DOI: 10.1142/S1793351X11001225 (http://www.worldscientific.com/doi/abs/10.1142/S1793351X11001225). Peer reviewed. Full text: application/pdf, English. Record updated 2016-06-08T09:00:40Z. |
| title | Comparing two corpus-based methods for extracting paraphrases to dictionary-based method |
| url | http://psasir.upm.edu.my/id/eprint/22466/ http://psasir.upm.edu.my/id/eprint/22466/1/Comparing%20two%20corpus-based%20methods%20for%20extracting%20paraphrases%20to%20dictionary-based%20method.pdf |