Automatic idea generation and analysis using NLP and ML techniques

Ideas are the fundamental way in which information is conveyed in written text. This research investigates the discovery and extraction of ideas from corpuses of scientific literature. There are several elements to this work: (1) the functional definition of ideas; (2) the computation of novel ideas...

Bibliographic Details
Main Author: Liu, Haixia
Format: Thesis (University of Nottingham only)
Language: English
Published: 2019
Subjects:
Online Access: https://eprints.nottingham.ac.uk/56448/
_version_ 1848799331563339776
author Liu, Haixia
building Nottingham Research Data Repository
collection Online Access
description Ideas are the fundamental way in which information is conveyed in written text. This research investigates the discovery and extraction of ideas from corpuses of scientific literature. There are several elements to this work: (1) the functional definition of ideas; (2) the computation of novel ideas; (3) the representation of ideas; (4) the construction of a ground truth dataset; and (5) the use of citations as an idea container. Ideas are defined as a <problem, solution> pair, where the problem and solution are represented by noun phrases, or a sequence of words. As a result, the task of idea detection is broken down into problem and solution extraction. The task of idea extraction is similar to Named Entity Recognition (NER), where the problems and solutions may be seen as special entities. These techniques worked well, although the results contained a lot of noise that needs to be removed. Automatic idea generation was conducted using a dataset from the Journal of Science. Old ideas were defined as the existing <problem, solution> pairs in the same abstract, and new ideas were generated by predicting new links between problems and solutions that do not occur together in one abstract. Evaluation was performed using metrics that are widely used in information retrieval. The F1 scores (higher than 0.90) provide good evidence that the proposed method is capable of generating useful ideas. A ground truth dataset that contained <problem, solution> pairs was constructed from the publications of the International Conference on Neural Information Processing Systems and the Journal of Machine Learning Research. This data was annotated by human volunteers, and it was used for training idea detection models using Conditional Random Fields (CRF) and Long Short-Term Memory (LSTM). To evaluate the performance of the models, precision and recall were computed. Idea analysis was studied by analysing citations, which are considered to be containers for ideas. Word vectors were used to represent the citations for the purpose of classifying citation sentiment, and a method was developed to measure the sequence of citation sentiment. This method for analysing internal citation sentiment sequence worked well (with an F1 measure of 0.86).
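The <problem, solution> representation and the information-retrieval metrics mentioned in the description can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the toy pairs, the function names, and the choice to simply enumerate unseen problem/solution combinations as candidates. The record does not state how the thesis predicts new links, so this is not the author's method, only a way to make the definitions of "old" and "new" ideas concrete.

```python
from itertools import product

# Hypothetical toy data: each abstract yields a set of (problem, solution) pairs.
abstracts = [
    {("overfitting", "dropout"), ("overfitting", "weight decay")},
    {("vanishing gradients", "residual connections")},
]

def existing_ideas(abstracts):
    """Old ideas: pairs that already occur together in some abstract."""
    return set().union(*abstracts)

def candidate_ideas(abstracts):
    """Candidate new ideas: problem/solution combinations never seen together."""
    old = existing_ideas(abstracts)
    problems = {p for p, _ in old}
    solutions = {s for _, s in old}
    return set(product(problems, solutions)) - old

def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall and F1, as used in information retrieval."""
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A link-prediction model would rank or filter the candidates returned by `candidate_ideas`; its output could then be scored against a held-out set of pairs with `precision_recall_f1`.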
first_indexed 2025-11-14T20:33:58Z
format Thesis (University of Nottingham only)
id nottingham-56448
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:33:58Z
publishDate 2019
recordtype eprints
repository_type Digital Repository
spelling nottingham-56448 2025-02-28T14:28:21Z https://eprints.nottingham.ac.uk/56448/ Automatic idea generation and analysis using NLP and ML techniques Liu, Haixia 2019-02-23 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/56448/1/2019-01-10-HaixiaPhDThesis.pdf Liu, Haixia (2019) Automatic idea generation and analysis using NLP and ML techniques. PhD thesis, University of Nottingham. ideas generation natural language processing computational linguistics
spellingShingle ideas generation
natural language processing
computational linguistics
Liu, Haixia
Automatic idea generation and analysis using NLP and ML techniques
title Automatic idea generation and analysis using NLP and ML techniques
topic ideas generation
natural language processing
computational linguistics
url https://eprints.nottingham.ac.uk/56448/