CANELC: constructing an e-language corpus

This paper reports on the construction of CANELC: the Cambridge and Nottingham e-language Corpus. CANELC is a one-million-word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when plannin...

Bibliographic Details
Main Authors:	Knight, Dawn; Adolphs, Svenja; Carter, Ronald
Format:	Article
Published:	Edinburgh University Press 2014
Subjects:	Blogs; Tweets; SMS; Discussion boards; e-language; Corpus linguistics
Online Access:	https://eprints.nottingham.ac.uk/35781/

_version_	1848795160245174272
author	Knight, Dawn; Adolphs, Svenja; Carter, Ronald
author_facet	Knight, Dawn; Adolphs, Svenja; Carter, Ronald
author_sort	Knight, Dawn
building	Nottingham Research Data Repository
collection	Online Access
description	This paper reports on the construction of CANELC: the Cambridge and Nottingham e-language Corpus (see Note 3). CANELC is a one-million-word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used, as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in ways both similar to and different from spoken and written records of communication (as evidenced by the BNC, the British National Corpus). Note 3: CANELC stands for Cambridge and Nottingham e-language Corpus. This corpus has been built as part of a collaborative project between The University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. CANELC comprises one million words of digital English taken from SMS messages, blogs, tweets, discussion board content and private/business emails. Plans to extend the corpus are under discussion. The legal dimension to corpus ‘ownership’ of some forms of unannotated data is a complex one and is under constant review. At the present time the annotated corpus is only available to authors and researchers working for CUP and is not more generally available.
first_indexed	2025-11-14T19:27:40Z
format	Article
id	nottingham-35781
institution	University of Nottingham Malaysia Campus
institution_category	Local University
last_indexed	2025-11-14T19:27:40Z
publishDate	2014
publisher	Edinburgh University Press
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-35781 2020-05-04T20:14:35Z https://eprints.nottingham.ac.uk/35781/ Knight, Dawn, Adolphs, Svenja and Carter, Ronald (2014) CANELC: constructing an e-language corpus. Corpora, 9 (1). pp. 29-56. ISSN 1755-1676. Edinburgh University Press 2014-05 Article PeerReviewed http://dx.doi.org/10.3366/cor.2014.0050 doi:10.3366/cor.2014.0050
title	CANELC: constructing an e-language corpus
topic	Blogs; Tweets; SMS; Discussion boards; e-language; Corpus linguistics
url	https://eprints.nottingham.ac.uk/35781/

Similar Items