Machine learning to predict the clinical utility of biomedical research

A machine learning model to predict which scientific advances are likely to eventually translate to the clinic has been developed by Ian Hutchins and colleagues in the Office of Portfolio Analysis (OPA), a team led by George Santangelo at the National Institutes of Health (NIH).

This work, published in the journal PLOS Biology, aims to decrease the interval between scientific discovery and clinical application. The model determines the likelihood that a research article will be cited by a future clinical trial or guideline, an early indicator of translational progress.

Researchers have quantified these predictions as a novel metric called “Approximate Potential to Translate” (APT). APT values can be used by researchers and decision-makers to focus attention on areas of science that show strong signatures of translational potential. Although numbers alone should never substitute for evaluation by human experts, the APT metric has the potential to accelerate biomedical progress as one component of data-driven decision-making.

The model that computes APT values makes predictions based on the content of research articles and citations. A long-standing barrier to the research and development of metrics like APT is that such citation data have remained hidden behind proprietary, restrictive, and often costly licensing agreements. To remove this impediment for the scientific community, increase transparency, and facilitate reproducibility, OPA has aggregated citation data from publicly available resources to create an open citation collection (NIH-OCC).

The NIH-OCC currently comprises over 420 million citation links and will be updated monthly. For publications since 2010, the NIH-OCC is already more comprehensive than leading proprietary sources of citation data. Citation data from the NIH-OCC are used to calculate both APT values and Relative Citation Ratios (RCRs). The RCR is a measure of scientific influence at the article level, normalized for field of study and time since publication.

APT values and the NIH-OCC are publicly available as components of the iCite web tool. This tool will continue as the primary source of RCR data.