Web page classification method using neural networks
Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network with inputs obtained by both the principal components and class profile-based featur...
| Main Authors: | Selamat, Ali; Omatu, Sigeru; Yanagimoto, Hidekazu; Fujinaka, Toru; Yoshioka, Michifumi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Institute of Electrical Engineers of Japan, 2003 |
| Subjects: | ZA4050 Electronic information resources |
| Online Access: | http://eprints.utm.my/3093/ http://eprints.utm.my/3093/1/ieej-2.pdf |
| _version_ | 1848890504113029120 |
|---|---|
| author | Selamat, Ali Omatu, Sigeru Yanagimoto, Hidekazu Fujinaka, Toru Yoshioka, Michifumi |
| author_sort | Selamat, Ali |
| building | UTeM Institutional Repository |
| collection | Online Access |
| description | Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network whose inputs are obtained from both principal components and class profile-based features (CPBF). Each news web page is represented using a term-weighting scheme. Because the number of unique words in the collection is large, principal component analysis (PCA) is used to select the most relevant features for classification. The output of the PCA is then combined with feature vectors from the class profile, which contains the most regular words in each class, before being fed to the neural network. We manually selected the most regular words in each class and weighted them using an entropy weighting scheme. A fixed number of regular words from each class is used as feature vectors, together with the reduced principal components from the PCA. These feature vectors are then used as the input to the neural network for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy on the sports news datasets. |
| first_indexed | 2025-11-15T20:43:07Z |
| format | Article |
| id | utm-3093 |
| institution | Universiti Teknologi Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T20:43:07Z |
| publishDate | 2003 |
| publisher | Institute of Electrical Engineers of Japan |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | utm-3093 2010-06-01T03:07:12Z http://eprints.utm.my/3093/ Web page classification method using neural networks Selamat, Ali Omatu, Sigeru Yanagimoto, Hidekazu Fujinaka, Toru Yoshioka, Michifumi ZA4050 Electronic information resources Institute of Electrical Engineers of Japan 2003 Article PeerReviewed application/pdf en http://eprints.utm.my/3093/1/ieej-2.pdf Selamat, Ali and Omatu, Sigeru and Yanagimoto, Hidekazu and Fujinaka, Toru and Yoshioka, Michifumi (2003) Web page classification method using neural networks. IEEJ Transactions on Electronics, Information and Systems, 123 (5). pp. 1020-1026. http://www2.iee.or.jp/~english/publish/cont2003.html |
| title | Web page classification method using neural networks |
| topic | ZA4050 Electronic information resources |
| url | http://eprints.utm.my/3093/ http://eprints.utm.my/3093/1/ieej-2.pdf |
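
The feature construction described in the abstract can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the record does not give the exact term-weighting or entropy formulas, so TF-IDF and the common log-entropy weighting are assumed here, and the toy count matrix, `profile_idx`, and all function names are hypothetical.

```python
import numpy as np

def tfidf(counts):
    """Weight raw term counts (n_docs x n_terms) by inverse document frequency.
    Assumption: the record only says "term-weighting scheme"; TF-IDF is a stand-in."""
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)        # documents containing each term
    idf = np.log(n_docs / np.maximum(df, 1))     # guard against unseen terms
    return counts * idf

def pca_reduce(x, k):
    """Project the rows of x onto the top-k principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def log_entropy(counts):
    """Log-entropy weighting (assumed variant of the "entropy weighting scheme")."""
    n_docs = counts.shape[0]
    gf = counts.sum(axis=0)                      # global frequency of each term
    p = counts / np.maximum(gf, 1)
    # p * log(p) is treated as 0 where p == 0.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=0) / np.log(n_docs)
    return np.log1p(counts) * global_w

def build_features(counts, profile_idx, k):
    """Concatenate k principal components with class-profile term weights."""
    pcs = pca_reduce(tfidf(counts), k)
    cpbf = log_entropy(counts)[:, profile_idx]   # manually chosen profile terms
    return np.hstack([pcs, cpbf])

# Hypothetical toy data: 6 pages x 5 terms of raw counts.
counts = np.array([
    [3, 0, 1, 2, 0],
    [2, 1, 0, 3, 0],
    [4, 0, 0, 1, 1],
    [0, 3, 2, 0, 1],
    [0, 2, 3, 0, 2],
    [1, 4, 1, 0, 0],
])
profile_idx = [0, 3]   # hypothetical indices of hand-picked class-profile words
features = build_features(counts, profile_idx, k=2)
print(features.shape)
```

Each row of `features` corresponds to one page; its columns (two principal components followed by two class-profile weights in this toy case) would serve as the input vector to the neural network classifier, which is omitted here.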