Computational model to track flu using Twitter data

An international team led by Alessandro Vespignani from Northeastern University has developed a computational model to predict the spread of the flu in real time. This unique model uses posts on Twitter in combination with key parameters of each season’s epidemic, including the incubation period of the disease, the immunization rate, how many people an individual with the virus can infect, and the viral strains present. When tested against official influenza surveillance systems, the model has been shown to forecast the disease’s evolution up to six weeks in advance with 70 to 90 percent accuracy.
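The interplay of those epidemiological parameters can be illustrated with a minimal compartmental (SEIR-style) sketch. All parameter values below are hypothetical stand-ins for illustration, not the values used in the published model:

```python
# Minimal SEIR-style sketch of a seasonal flu projection.
# All parameter values are hypothetical illustrations.

def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance one day. beta = transmission rate, sigma = 1/incubation
    period, gamma = 1/infectious period; s, e, i, r are population
    fractions (susceptible, exposed, infectious, recovered)."""
    new_exposed = beta * s * i
    new_infectious = sigma * e
    new_recovered = gamma * i
    s -= new_exposed
    e += new_exposed - new_infectious
    i += new_infectious - new_recovered
    r += new_recovered
    return s, e, i, r

def forecast(days, immunized=0.3, i0=0.001, beta=0.5, sigma=0.5, gamma=0.2):
    """Project the infectious fraction forward; the immunization rate
    removes a share of the population from the susceptible pool."""
    s, e, i, r = 1.0 - immunized - i0, 0.0, i0, immunized
    curve = []
    for _ in range(days):
        s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma)
        curve.append(i)
    return curve

curve = forecast(42)  # project six weeks ahead
```

In the real model, signals mined from Twitter would be used to calibrate parameters like `beta` and the initial infectious fraction as the season unfolds.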

The paper describing the novel model received a coveted Best Paper Honorable Mention award following its presentation at the 2017 International World Wide Web Conference last month.

While the paper reports results using Twitter data, the researchers note that the model can also work with data from many other digital sources, as well as with online participatory surveys of individuals such as Influenzanet, which is widely used in Europe.

“Our model is a work in progress,” emphasizes Vespignani. “We plan to add new parameters, for example, school and workplace structure.”

Adapted from press release by Northeastern University.

Collaboration between UCSF, Intel to develop deep learning analytics for healthcare

UC San Francisco’s Center for Digital Health Innovation (CDHI) today announced a collaboration with Intel Corporation to deploy and validate a deep learning analytics platform designed to improve care by helping clinicians make better treatment decisions, predict patient outcomes, and respond more nimbly in acute situations.

The collaboration brings together Intel’s leading-edge computer science and deep learning capabilities with UCSF’s clinical and research expertise to create a scalable, high-performance computational environment to support enhanced frontline clinical decision making for a wide variety of patient care scenarios. Until now, progress toward this goal has been difficult because complex, diverse datasets are managed in multiple, incompatible systems. This next-generation platform will allow UCSF to efficiently manage the huge volume and variety of data collected for clinical care as well as newer “big data” from genomic sequencing, monitors, sensors, and wearables. These data will be integrated into a highly scalable “information commons” that will enable advanced analytics with machine learning and deep learning algorithms. The end result will be algorithms that can rapidly support data-driven clinical decision-making.

“While artificial intelligence and machine learning have been integrated into our everyday lives, our ability to use them in healthcare is a relatively new phenomenon,” said Michael Blum, MD, associate vice chancellor for informatics, director of CDHI and professor of medicine at UCSF. “Now that we have ‘digitized’ healthcare, we can begin utilizing the same technologies that have made the driverless car and virtual assistants possible and bring them to bear on vexing healthcare challenges such as predicting health risks, preventing hospital readmissions, analyzing complex medical images and more. Deep learning environments are capable of rapidly analyzing and predicting patient trajectories utilizing vast amounts of multi-dimensional data. By integrating deep learning capabilities into the care delivered to critically injured patients, providers will have access to real-time decision support that will enable timely decision making in an environment where seconds are the difference between life and death. We expect these technologies, combined with the clinical and scientific knowledge of UCSF, to be made accessible through the cloud to drive the transformation of health and healthcare.”

UCSF and Intel will work together to deploy the high-performance computing environment on industry-standard Intel® Xeon® processor-based platforms that will support the data management and algorithm development lifecycle, including data curation and annotation, algorithm training, and testing against labeled datasets with particular pre-specified outcomes. The collaboration will also allow UCSF and Intel to better understand how deep learning analytics and machine-driven workflows can be employed to optimize the clinical environment and patient outcomes. This work will inform Intel’s development and testing of new platform architectures for the healthcare industry.

“This collaboration between Intel and UCSF will accelerate the development of deep learning algorithms that have great potential to benefit patients,” said Kay Eron, general manager of health and life sciences in Intel’s Data Center Group. “Combining the medical science and computer science expertise across our organizations will enable us to more effectively tackle barriers in directing the latest technologies toward critical needs in healthcare.”

The platform will enable UCSF’s deep learning use cases to run in a distributed fashion on a central processing unit (CPU)-based cluster. The platform will be able to handle large data sets and scale easily for future use case requirements, including supporting larger convolutional neural network models, artificial networks patterned after living organisms, and very large multidimensional datasets. In the future, Intel expects to incorporate the deep learning analytics platform with other Intel analytics frameworks, healthcare data sources, and application program interfaces (APIs) – code that allows different programs to communicate – to create increasingly sophisticated use case algorithms that will continue to raise the bar in health and healthcare.

Adapted from press release by UCSF.

Researchers use multi-task deep neural networks to automatically extract data from cancer pathology reports

Despite steady progress in detection and treatment in recent decades, cancer remains the second leading cause of death in the United States, cutting short the lives of approximately 500,000 people each year. To better understand and combat this disease, medical researchers rely on cancer registry programs, a national network of organizations that systematically collect demographic and clinical information related to the diagnosis, treatment, and history of cancer incidence in the United States. The surveillance effort, coordinated by the National Cancer Institute (NCI) and the Centers for Disease Control and Prevention, enables researchers and clinicians to monitor cancer cases at the national, state, and local levels.

Much of this data is drawn from electronic, text-based clinical reports that must be manually curated, a time-intensive process, before it can be used in research.

A representation of a deep learning neural network designed to intelligently extract text-based information from cancer pathology reports. Credit: Oak Ridge National Laboratory

Since 2014, Georgia Tourassi of Oak Ridge National Laboratory (ORNL) has led a team focused on creating software that can quickly identify valuable information in cancer reports, an ability that would not only save time and worker hours but also potentially reveal overlooked avenues in cancer research. After experimenting with conventional natural-language-processing software, the team’s most recent progress has emerged via deep learning, a machine-learning technique that employs algorithms, big data, and the computing power of GPUs to emulate human learning and intelligence.

Using the Titan supercomputer at the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility located at ORNL, Tourassi’s team applied deep learning to extract useful information from cancer pathology reports, a foundational element of cancer surveillance. Working with modest datasets, the team obtained preliminary findings that demonstrate deep learning’s potential for cancer surveillance.

The continued development and maturation of automated data tools, among the objectives outlined in the White House’s Cancer Moonshot initiative, would give medical researchers and policymakers an unprecedented view of the US cancer population at a level of detail typically obtained only for clinical trial patients, historically less than 5 percent of the overall cancer population.

Creating software that can understand not only the meaning of words but also the contextual relationships between them is no simple task. Humans develop these skills through years of back-and-forth interaction and training. For specific tasks, deep learning compresses this process into a matter of hours.

Typically, this context-building is achieved through the training of a neural network, a web of weighted calculations designed to produce informed guesses on how to correctly carry out tasks, such as identifying an image or processing a verbal command. Data fed to a neural network, called inputs, and select feedback give the software a foundation to make decisions based on new data. This algorithmic decision-making process is largely opaque to the programmer, a dynamic akin to a teacher with little direct knowledge of her students’ perception of a lesson.
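The train-by-feedback loop described above can be illustrated with a toy single-neuron example. The data, learning rate, and iteration count below are invented for illustration; real networks stack many such units and learn from far larger datasets:

```python
# A single artificial neuron adjusts its weights so that its guesses
# move toward the provided labels (here, a simple AND-like pattern).
import math

def sigmoid(z):
    """Squash a weighted sum into a guess between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# inputs and their target labels
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]   # the "weighted calculations" to be tuned
b = 0.0
lr = 1.0         # how strongly each piece of feedback nudges the weights

for _ in range(2000):                    # repeated rounds of feedback
    for (x1, x2), target in data:
        guess = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = guess - target             # the feedback signal
        w[0] -= lr * err * x1            # nudge each weight toward
        w[1] -= lr * err * x2            # a better future guess
        b -= lr * err

# after training, the neuron's rounded guesses match the labels
preds = [round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in data]
```

Note the opacity the passage mentions: the final values of `w` and `b` emerge from thousands of small nudges, not from any rule the programmer wrote down explicitly.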

GPUs, such as those in Titan, can accelerate this training process by quickly executing many deep-learning calculations simultaneously. In two recent studies, Tourassi’s team used accelerators to tune multiple algorithms, comparing results to more traditional methods. Using a dataset composed of 1,976 pathology reports provided by NCI’s Surveillance, Epidemiology, and End Results (SEER) Program, Tourassi’s team trained a deep-learning algorithm to carry out two different but closely related information-extraction tasks. In the first task the algorithm scanned each report to identify the primary location of the cancer. In the second task the algorithm identified the cancer site’s laterality, or which side of the body the cancer was located on.

By setting up a neural network designed to exploit the related information shared by the two tasks, an arrangement known as multitask learning, the team found the algorithm performed substantially better than competing methods.
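As a rough architectural sketch of that multitask arrangement, one shared layer feeds two task-specific output heads. The layer sizes and label counts below are made up, and the weights are random rather than trained:

```python
# Multitask architecture sketch: a shared representation feeds two
# task-specific heads (primary site, laterality). Illustrative only;
# shapes and weights are hypothetical and untrained.
import random

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def relu(v):
    return [max(0.0, x) for x in v]

n_features, n_shared = 8, 6
n_sites, n_lateralities = 4, 2                 # hypothetical label counts

W_shared = rand_matrix(n_shared, n_features)   # learned jointly by both tasks
W_site = rand_matrix(n_sites, n_shared)        # head 1: primary site
W_lat = rand_matrix(n_lateralities, n_shared)  # head 2: laterality

report_vector = [1.0] * n_features             # stand-in for an encoded report
h = relu(matvec(W_shared, report_vector))      # shared representation
site_scores = matvec(W_site, h)
lat_scores = matvec(W_lat, h)
```

Because errors from both heads flow back into `W_shared` during training, each task effectively supplies extra labeled data for the other, which is the advantage multitask learning exploits.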

Another study carried out by Tourassi’s team used 946 SEER reports on breast and lung cancer to tackle an even more complex challenge: using deep learning to match the cancer’s origin to a corresponding topography code, a classification that is even more specific than a cancer’s primary site or laterality, with 12 possible answers.

The team tackled this problem by building a convolutional neural network, a deep-learning approach traditionally used for image recognition, and feeding it language from a variety of sources. Text inputs ranged from general (e.g., Google search results) to domain-specific (e.g., medical literature) to highly specialized (e.g., cancer pathology reports). The algorithm then took these inputs and created a mathematical model that drew connections between words, including words shared between unrelated texts.
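The convolution-over-text idea at the core of that approach might be sketched as follows, with toy word vectors and a hand-picked filter standing in for learned embeddings and learned filters:

```python
# Sketch of a 1-D text convolution: slide a filter over a sequence of
# word vectors so that adjacent words are scored together. The vectors
# and filter values are toy placeholders, not learned parameters.

def conv1d(seq, filt):
    """Dot each window of len(filt) consecutive word vectors with the filter."""
    k = len(filt)
    out = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        score = sum(f * x for fv, wv in zip(filt, window)
                            for f, x in zip(fv, wv))
        out.append(score)
    return out

# four "words", each represented by a 3-dimensional vector
sentence = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0],
            [1.0, 1.0, 0.0]]
bigram_filter = [[0.5, 0.5, 0.0],       # a filter that looks at
                 [0.0, 0.5, 0.5]]       # two words at a time
features = conv1d(sentence, bigram_filter)
# max-pool: keep the strongest match wherever it occurs in the report
pooled = max(features)
```

A real text CNN learns many such filters, each responding to a different word pattern, and the pooled responses become the features a final layer uses to pick one of the candidate codes.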

Comparing this approach to more traditional classifiers, such as a vector space model, the team observed incremental improvement in performance as the network absorbed more cancer-specific text. These preliminary results will help guide Tourassi’s team as they scale up deep-learning algorithms to tackle larger datasets and move toward less supervision, meaning the algorithms will make informed decisions with less human intervention.

In 2016 Tourassi’s team learned that its cancer surveillance project would be developed as part of DOE’s Exascale Computing Project, an initiative to develop a computing ecosystem that can support an exascale supercomputer, a machine that can execute a billion billion calculations per second. Though the team has made considerable progress in leveraging deep learning for cancer research, the biggest gains are still to come.

Citation: Yoon, Hong-Jun, Arvind Ramanathan, and Georgia Tourassi. “Multi-task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports.” In INNS Conference on Big Data, pp. 195-204. Springer International Publishing, 2016.
DOI: http://dx.doi.org/10.1007/978-3-319-47898-2_21
Adapted from press release by US Department of Energy, Oak Ridge National Laboratory.

Researchers identify suicidal behavior using machine learning algorithm on patients’ verbal and non-verbal data

A new study shows that machine learning is up to 93 percent accurate in correctly classifying a suicidal person, and 85 percent accurate in the harder three-way task of identifying whether a person is suicidal, is mentally ill but not suicidal, or is neither. These results provide strong evidence for using advanced technology as a decision-support tool to help clinicians and caregivers identify and prevent suicidal behavior, says John Pestian, PhD, professor in the divisions of Biomedical Informatics and Psychiatry at Cincinnati Children’s Hospital Medical Center and the study’s lead author. The study is published in the journal Suicide and Life-Threatening Behavior.

Dr. Pestian and his colleagues enrolled 379 patients in the study between Oct. 2013 and March 2015 from emergency departments and inpatient and outpatient centers at three sites. Those enrolled included patients who were suicidal, were diagnosed as mentally ill and not suicidal, or neither – serving as a control group.

Each patient completed standardized behavioral rating scales and participated in a semi-structured interview answering five open-ended questions to stimulate conversation, such as “Do you have hope?” “Are you angry?” and “Does it hurt emotionally?”

The researchers extracted and analyzed verbal and non-verbal language from the data. They then used machine learning algorithms to classify the patients into one of the three groups. The results showed that machine learning algorithms can tell the differences between the groups with up to 93 percent accuracy. The scientists also noticed that the control patients tended to laugh more during interviews, sigh less, and express less anger, less emotional pain and more hope.
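Purely as an illustration of the classification step (the study’s actual algorithms and features were far richer), a simple nearest-centroid classifier over made-up interview features might look like this:

```python
# Illustrative sketch only: assign a patient to the group whose average
# feature profile is closest. Features (laugh count, sigh count,
# hope-word count) and centroid values are entirely hypothetical.

centroids = {
    "control":      [5.0, 1.0, 4.0],   # laughs more, sighs less, more hope
    "mentally_ill": [2.0, 3.0, 2.0],
    "suicidal":     [1.0, 4.0, 0.5],
}

def classify(features):
    """Pick the group with the smallest squared distance to its centroid."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda g: dist(centroids[g]))

label = classify([4.5, 1.5, 3.0])  # laughs often, sighs little, hopeful
```

The observation in the passage, that controls laughed more, sighed less, and expressed more hope, is exactly the kind of group-level regularity such a classifier exploits.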

Citation: Pestian, John P., Michael Sorter, Brian Connolly, Kevin Bretonnel Cohen, Cheryl McCullumsmith, Jeffry T. Gee, Louis‐Philippe Morency, Stefan Scherer, and Lesley Rohlfs. “A Machine Learning Approach to Identifying the Thought Markers of Suicidal Subjects: A Prospective Multicenter Trial.” Suicide and Life-Threatening Behavior (2016).
DOI: http://dx.doi.org/10.1111/sltb.12312
Adapted from press release by Cincinnati Children’s Hospital Medical Center.