Extending Our Sense of Cyberspace Language Plurality: The Value of the Language Observatory (LO) Project
| Main Authors: | , |
|---|---|
| Format: | Conference or Workshop Item |
| Language: | English |
| Published: | 2005 |
| Subjects: | |
| Online Access: | http://eprints.utm.my/3403/ http://eprints.utm.my/3403/1/Extending_Our_Sense_of_Cyberspace_Language_Plurality.pdf |
| Summary: | As the World Wide Web (WWW) grows exponentially, multilingual web pages are flooding cyberspace at a tremendous rate. Many of us would probably guess that the main language of the Web is English. On the contrary, according to glreach.com [1], of the 801.4 million people online, at least 510 million are non-English speakers and the remainder are English speakers. As a major step towards understanding the language dimensions of web pages in cyberspace, we officially launched a project called "Language Observatory (LO)" in February 2004. We have made several experimental runs using Ubicrawler, some of which were dedicated to the 57 country code Top Level Domains (ccTLDs) of the Organization of the Islamic Conference. We covered at least 42 million web pages, compared with almost 17 million indexed by two well-known search engines; this is nearly triple the amount, and the collection spans multiple dimensions such as language, script and character set encoding. Furthermore, data mining activities by LO yield significant findings that provide a further snapshot of cyberspace. These findings show the content that is typically created in particular domains and thus provide practical information: language preferences and source documents in cyberspace. The capacity of LO to produce such indispensable information must be taken into account, because these factors should not be absent from the value chain of translation activities. |