Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial

Bibliographic Details
Main Authors: Shao, Weixiang; Adams, Clive E.; Cohen, Aaron M.; Davis, John M.; McDonagh, Marian S.; Thakurta, Sujata; Yu, Philip S.; Smalheiser, Neil R.
Format: Article
Published: Elsevier 2015
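Subjects: Evidence-based medicine; Clinical trials; Systematic reviews; Bias; Information retrieval; Informatics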
Online Access: https://eprints.nottingham.ac.uk/46915/
_version_ 1848797426753732608
author Shao, Weixiang
Adams, Clive E.
Cohen, Aaron M.
Davis, John M.
McDonagh, Marian S.
Thakurta, Sujata
Yu, Philip S.
Smalheiser, Neil R.
author_sort Shao, Weixiang
building Nottingham Research Data Repository
collection Online Access
description Objective: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting them as independent pieces of evidence. Methods: We created positive and negative training sets, composed of pairs of articles reporting on the same condition and intervention, that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. Results: Article pairs from the same trial were identified with high accuracy (F1 score = 0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic and returns article clusters predicted to arise from the same clinical trial. Discussion: Although painstaking examination of the full text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.
first_indexed 2025-11-14T20:03:42Z
format Article
id nottingham-46915
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T20:03:42Z
publishDate 2015
publisher Elsevier
recordtype eprints
repository_type Digital Repository
spelling nottingham-46915 2020-05-04T17:01:44Z https://eprints.nottingham.ac.uk/46915/ Elsevier 2015-03-01 Article PeerReviewed Shao, Weixiang, Adams, Clive E., Cohen, Aaron M., Davis, John M., McDonagh, Marian S., Thakurta, Sujata, Yu, Philip S. and Smalheiser, Neil R. (2015) Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial. Methods, 74, pp. 65-70. ISSN 1095-9130 Evidence-based medicine; Clinical trials; Systematic reviews; Bias; Information retrieval; Informatics http://www.sciencedirect.com/science/article/pii/S1046202314003661 doi:10.1016/j.ymeth.2014.11.006
spellingShingle Evidence-based medicine; Clinical trials; Systematic reviews; Bias; Information retrieval; Informatics
title Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial
topic Evidence-based medicine; Clinical trials; Systematic reviews; Bias; Information retrieval; Informatics
url https://eprints.nottingham.ac.uk/46915/
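
The description above outlines a pipeline: compute similarity features for pairs of articles from their metadata, score each pair with a logistic regression model, then group articles into clusters predicted to come from the same trial. The following is a minimal sketch of that idea in Python, not the authors' implementation. The three features, the weights and bias, the toy records, and the single-linkage grouping via union-find are all assumptions made for illustration; the record does not specify which features or clustering rule Aggregator uses.

```python
import math
from itertools import combinations

# Hypothetical article records; IDs, field names, and values are illustrative only.
ARTICLES = {
    "A1": {"authors": {"smith j", "lee k", "wong p"}, "year": 2010,
           "terms": {"olanzapine", "schizophrenia", "placebo"}},
    "A2": {"authors": {"smith j", "lee k", "patel r"}, "year": 2011,
           "terms": {"olanzapine", "schizophrenia", "relapse"}},
    "A3": {"authors": {"garcia m", "chen l"}, "year": 2010,
           "terms": {"olanzapine", "schizophrenia", "placebo"}},
}

# Assumed coefficients; a real model would learn these from labeled pairs.
WEIGHTS = [6.0, 2.0, 1.0]  # author overlap, term overlap, year proximity
BIAS = -4.0


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(x, y):
    """Similarity features for one article pair (assumed feature set)."""
    return [
        jaccard(x["authors"], y["authors"]),
        jaccard(x["terms"], y["terms"]),
        1.0 / (1.0 + abs(x["year"] - y["year"])),
    ]


def same_trial_probability(x, y):
    """Logistic model: sigmoid of a weighted sum of the pair features."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, pair_features(x, y)))
    return 1.0 / (1.0 + math.exp(-z))


def cluster(articles, threshold=0.5):
    """Group articles by linking every pair scored at or above the threshold."""
    parent = {key: key for key in articles}

    def find(key):
        # Union-find root lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in combinations(sorted(articles), 2):
        if same_trial_probability(articles[a], articles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in articles:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(group) for group in groups.values())


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall over predicted pairs."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(cluster(ARTICLES))
```

With these toy records and assumed weights, A1 and A2 share most authors and are linked, while A3 matches on topic only and stays in its own cluster. This reflects why pairs in the training sets are matched on condition and intervention: topic overlap alone should not be enough to predict a shared trial.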