Corpus-based analysis on cross-domain experiments in classification-and-ranking generation

Problem statement: The overgeneration-and-ranking architecture works well for written language, where the sentence is the basic unit. However, in spoken language, where the utterance is the basic unit, its disadvantage becomes critical because spoken language also renders intentions, hence short strings may be of equiva...

Bibliographic Details
Main Authors:	Aida, Mustapha; Sulaiman, Md. Nasir; Mahmod, Ramlan; Selamat, Mohd. Hasan
Format:	Article
Language:	English
Published:	Science Publications 2010
Subjects:	Computational linguistics. Natural language processing (Computer science).
Online Access:	http://psasir.upm.edu.my/id/eprint/13804/ http://psasir.upm.edu.my/id/eprint/13804/1/Corpus.pdf

_version_	1848842214997753856
author	Aida, Mustapha; Sulaiman, Md. Nasir; Mahmod, Ramlan; Selamat, Mohd. Hasan
author_facet	Aida, Mustapha; Sulaiman, Md. Nasir; Mahmod, Ramlan; Selamat, Mohd. Hasan
author_sort	Aida, Mustapha
building	UPM Institutional Repository
collection	Online Access
description	Problem statement: The overgeneration-and-ranking architecture works well for written language, where the sentence is the basic unit. However, in spoken language, where the utterance is the basic unit, its disadvantage becomes critical because spoken language also renders intentions, hence short strings may be of equivalent impact. Approach: In classification-and-ranking, the response was deliberately chosen from a dialogue corpus rather than wholly generated, which allows short, ungrammatical utterances as long as they satisfy the intended meaning of the input utterance. Because the architecture is intention-based, it adopted an open-domain knowledge representation, whereby response utterances were semantically represented using an ontology general enough for future reuse in another domain. Results: This study presented a corpus-based analysis of cross-domain experiments using different types of corpora to validate the consistency of the response classifier that delimits the search space for ranking. The open-domain quality of the classification-and-ranking architecture was tested on two mixed-initiative, transaction dialogue corpora in theater reservation and emergency planning. Results showed consistent distribution accuracies in both the classification and ranking experiments, indicating that the approach is viable for cross-domain implementations. Conclusion: The ability of a response generation system to learn response utterances directly from the domain corpus suggests that a dialogue system can be built by feeding the learning module a target corpus, so that the system learns its response behavior directly from the training corpus.
first_indexed	2025-11-15T07:55:35Z
format	Article
id	upm-13804
institution	Universiti Putra Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-15T07:55:35Z
publishDate	2010
publisher	Science Publications
recordtype	eprints
repository_type	Digital Repository
spelling	upm-13804 2015-10-28T04:36:39Z http://psasir.upm.edu.my/id/eprint/13804/ Science Publications 2010 Article PeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/13804/1/Corpus.pdf Aida, Mustapha and Sulaiman, Md. Nasir and Mahmod, Ramlan and Selamat, Mohd. Hasan (2010) Corpus-based analysis on cross-domain experiments in classification-and-ranking generation. Journal of Computer Science, 6 (11). pp. 1305-1312. ISSN 1549-3636 Computational linguistics. Natural language processing (Computer science). 10.3844/jcssp.2011.59.64 English
spellingShingle	Computational linguistics. Natural language processing (Computer science). Aida, Mustapha Sulaiman, Md. Nasir Mahmod, Ramlan Selamat, Mohd. Hasan Corpus-based analysis on cross-domain experiments in classification-and-ranking generation
title	Corpus-based analysis on cross-domain experiments in classification-and-ranking generation
title_full	Corpus-based analysis on cross-domain experiments in classification-and-ranking generation
title_fullStr	Corpus-based analysis on cross-domain experiments in classification-and-ranking generation
title_full_unstemmed	Corpus-based analysis on cross-domain experiments in classification-and-ranking generation
title_short	Corpus-based analysis on cross-domain experiments in classification-and-ranking generation
title_sort	corpus-based analysis on cross-domain experiments in classification-and-ranking generation
topic	Computational linguistics. Natural language processing (Computer science).
url	http://psasir.upm.edu.my/id/eprint/13804/ http://psasir.upm.edu.my/id/eprint/13804/1/Corpus.pdf

Similar Items