Towards computation of novel ideas from corpora of scientific text

Bibliographic Details
Main Authors: Liu, Haixia; Goulding, James; Brailsford, Tim
Format: Book Section
Language: English
Published: Springer Verlag 2015
Online Access: https://eprints.nottingham.ac.uk/55719/
Description
Summary: In this work we present a method for the computation of novel 'ideas' from corpora of scientific text. The system functions by first detecting concept noun-phrases within the titles and abstracts of publications using Part-Of-Speech tagging, before classifying these into sets of problem and solution phrases via a target-word matching approach. By defining an idea as a co-occurring <problem, solution> pair, known-idea triples can be constructed through the additional assignment of a relevance value (computed via either phrase co-occurrence or an 'idea frequency-inverse document frequency' score). The resulting triples are then fed into a collaborative filtering algorithm, where problem-phrases are considered as users and solution-phrases as the items to be recommended. The final output is a ranked list of novel idea candidates, which hold potential for researchers to integrate into their hypothesis generation processes. This approach is evaluated using a subset of publications from the journal Science, with precision, recall and F-Measure results for a variety of model parametrizations indicating that the system is capable of generating useful novel ideas in an automated fashion.
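
The pipeline described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes noun phrases have already been extracted (the Part-Of-Speech tagging step is skipped), uses hypothetical target-word lists (PROBLEM_WORDS, SOLUTION_WORDS), takes the raw co-occurrence count as the relevance value, and substitutes a simple cosine-similarity, user-based collaborative filter for whatever algorithm the paper actually uses. All function names are illustrative.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical target words; the paper's actual lists are not given in this record.
PROBLEM_WORDS = {"problem", "disease", "failure"}
SOLUTION_WORDS = {"method", "algorithm", "therapy"}


def classify(phrase):
    """Label a noun phrase by target-word matching (assumed rule: any shared word)."""
    words = set(phrase.lower().split())
    if words & PROBLEM_WORDS:
        return "problem"
    if words & SOLUTION_WORDS:
        return "solution"
    return None


def known_idea_triples(docs):
    """Map each co-occurring (problem, solution) pair to its co-occurrence count."""
    counts = Counter()
    for phrases in docs:
        problems = {p for p in phrases if classify(p) == "problem"}
        solutions = {s for s in phrases if classify(s) == "solution"}
        for p in problems:
            for s in solutions:
                counts[(p, s)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(a[k] * b[k] for k in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def recommend(counts, problem):
    """Rank solutions not yet paired with `problem`, treating problems as users."""
    profiles = defaultdict(dict)
    for (p, s), relevance in counts.items():
        profiles[p][s] = relevance
    target = profiles.get(problem, {})
    scores = defaultdict(float)
    for other, profile in profiles.items():
        if other == problem:
            continue
        sim = cosine(target, profile)
        if sim == 0.0:
            continue
        for s, relevance in profile.items():
            if s not in target:  # only novel pairings are candidates
                scores[s] += sim * relevance
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus: each document is a list of already-extracted noun phrases.
docs = [
    ["protein folding problem", "sampling algorithm"],
    ["protein folding problem", "annealing method"],
    ["routing problem", "annealing method"],
    ["routing problem", "greedy algorithm"],
]
triples = known_idea_triples(docs)
candidates = recommend(triples, "protein folding problem")
print(candidates)  # the only novel candidate here is "greedy algorithm"
```

In this toy corpus the two problem phrases share one solution ("annealing method"), so the filter proposes the other problem's remaining solution as a novel idea candidate. Swapping the co-occurrence count for a weighted score would only change how the relevance value in each triple is computed.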