Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference

In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge of identifying a language of infinite cardinality from a finite set of examples should know when to generalize and specialize the training data. The minimum descri...

Full description

Bibliographic Details
Main Authors: Pandey, Hari Mohan, Chaudhary, Ankit, Mehrotra, Deepti, Kendall, Graham
Format: Article
Published: Elsevier 2016
Subjects:
Online Access:https://eprints.nottingham.ac.uk/49535/
_version_ 1848798018575269888
author Pandey, Hari Mohan
Chaudhary, Ankit
Mehrotra, Deepti
Kendall, Graham
author_facet Pandey, Hari Mohan
Chaudhary, Ankit
Mehrotra, Deepti
Kendall, Graham
author_sort Pandey, Hari Mohan
building Nottingham Research Data Repository
collection Online Access
description In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge of identifying a language of infinite cardinality from a finite set of examples should know when to generalize and specialize the training data. The minimum description length principle that has been incorporated addresses this issue is discussed in this paper. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL, which incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach to handle a problem which has a large search space such the grammatical inference problem. The computational capability, the genetic algorithm poses is not questionable, but it still suffers from premature convergence mainly arising due to lack of population diversity. The proposed GAWMDL incorporates a bit mask oriented data structure that performs the reproduction operations, creating the mask, then Boolean based procedure is applied to create an offspring in a generative manner. The Boolean based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied in the context free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. Two fold performance analysis have been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm which was proposed to introduce diversity and to address premature convergence. GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm not using the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves the performance in three main aspects: maintains regularity of the data, alleviates premature convergence and is capable in grammatical inference from both positive and negative corpora.
first_indexed 2025-11-14T20:13:06Z
format Article
id nottingham-49535
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T20:13:06Z
publishDate 2016
publisher Elsevier
recordtype eprints
repository_type Digital Repository
spelling nottingham-495352020-05-04T17:51:38Z https://eprints.nottingham.ac.uk/49535/ Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference Pandey, Hari Mohan Chaudhary, Ankit Mehrotra, Deepti Kendall, Graham In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge of identifying a language of infinite cardinality from a finite set of examples should know when to generalize and specialize the training data. The minimum description length principle that has been incorporated addresses this issue is discussed in this paper. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL, which incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach to handle a problem which has a large search space such the grammatical inference problem. The computational capability, the genetic algorithm poses is not questionable, but it still suffers from premature convergence mainly arising due to lack of population diversity. The proposed GAWMDL incorporates a bit mask oriented data structure that performs the reproduction operations, creating the mask, then Boolean based procedure is applied to create an offspring in a generative manner. The Boolean based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied in the context free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. Two fold performance analysis have been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm which was proposed to introduce diversity and to address premature convergence. GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm not using the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves the performance in three main aspects: maintains regularity of the data, alleviates premature convergence and is capable in grammatical inference from both positive and negative corpora. Elsevier 2016-05-17 Article PeerReviewed Pandey, Hari Mohan, Chaudhary, Ankit, Mehrotra, Deepti and Kendall, Graham (2016) Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference. Swarm and Evolutionary Computation, 31 . pp. 11-23. ISSN 0305-215X Bit-masking oriented data structure Context free grammar Genetic Algorithm Grammar induction Learning algorithm Minimum description length principle https://www.sciencedirect.com/science/article/pii/S2210650216300244?via%3Dihub doi:10.1016/j.swevo.2016.05.002 doi:10.1016/j.swevo.2016.05.002
spellingShingle Bit-masking oriented data structure
Context free grammar
Genetic Algorithm
Grammar induction
Learning algorithm
Minimum description length principle
Pandey, Hari Mohan
Chaudhary, Ankit
Mehrotra, Deepti
Kendall, Graham
Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
title Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
title_full Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
title_fullStr Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
title_full_unstemmed Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
title_short Maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
title_sort maintaining regularity and generalizationin data using the minimum description length principle and genetic algorithm: case of grammatical inference
topic Bit-masking oriented data structure
Context free grammar
Genetic Algorithm
Grammar induction
Learning algorithm
Minimum description length principle
url https://eprints.nottingham.ac.uk/49535/
https://eprints.nottingham.ac.uk/49535/
https://eprints.nottingham.ac.uk/49535/