Mining for biomedical information
Grants to help UD researchers find patterns, study drug interactions
11:38 a.m., Jan. 15, 2015--It’s amazing what a quick glance can tell you, notes Hagit Shatkay, associate professor of computer and information sciences at the University of Delaware. During the Ebola scare in the United States, one could see a newspaper image of medical professionals in HAZMAT suits and guess what the story was about, even without reading the headlines.
Shatkay and Chandra Kambhamettu, professor of computer and information sciences and image analysis expert, have received a grant from the National Library of Medicine (NLM) to apply that principle to biomedical research, using what Shatkay calls “computational glancing.”
Text mining treats the text in papers, on websites, and in other sources as data that can be statistically analyzed to find patterns. Physicians, researchers, and curators of medical databases rely on published text to find relevant information in their areas, but the medical literature is vast and growing.
Applying text-mining tools to biomedical text is an appealing option, but in many cases the results produced by text-mining efforts are not precise enough to be useful. “We hope that we can get better results by using both text and images to find what relevant papers there might be,” says Shatkay.
Although the team’s actual methods will be more complex, a simple example illustrates the concept: A paper about a newly discovered gene-regulating region on the genome will almost always have a picture showing the relevant part of the DNA expressed using the letters A, C, G, and T, which symbolize the bases that form the genetic code. Researchers can analyze each image using optical character recognition (OCR), a technique that picks out text information from pictures.
“If you get tons of A, C, G, T, and very few other characters, you can easily say, ‘Oh, this is a DNA image,’” Shatkay explains.
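The character-counting idea in that example can be sketched in a few lines of Python. This is only an illustration of the simplified concept, not the team's method: it assumes the OCR step has already produced a plain string, and the function names, the 0.9 threshold, and the 20-letter minimum are all hypothetical choices made for this sketch.

```python
DNA_BASES = set("ACGT")


def dna_fraction(ocr_text: str) -> float:
    """Return the share of alphabetic characters that are A, C, G, or T."""
    letters = [ch for ch in ocr_text.upper() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in DNA_BASES for ch in letters) / len(letters)


def looks_like_dna_image(ocr_text: str,
                         threshold: float = 0.9,
                         min_letters: int = 20) -> bool:
    """Guess whether OCR output came from a DNA-sequence figure.

    threshold and min_letters are illustrative assumptions, not values
    taken from the project described in the article.
    """
    letter_count = sum(ch.isalpha() for ch in ocr_text)
    return letter_count >= min_letters and dna_fraction(ocr_text) >= threshold


if __name__ == "__main__":
    print(looks_like_dna_image("ACGTTGCA ACGTACGT GGCCTTAA ACGT"))
    print(looks_like_dna_image("Figure 2. Expression levels across tissue samples"))
```

The minimum letter count is there so that a tiny label such as "TATA" does not get flagged on its own; ordinary captions fail the test because most of their letters fall outside the four bases.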
A two-year, $560,000 grant from the NLM and the National Institute of General Medical Sciences (under the NIH R56 grant program) will allow Shatkay, Kambhamettu, and their associates to investigate methods of integrating text and image information to find relevant material in three specific areas.
Shatkay and her associates will collaborate with three groups: the CYRENE project at Brown University, on identifying information about cis-regulatory regions (segments of DNA that regulate the expression of certain genes); the Jackson Laboratory, on optimal ways to find information on gene expression in mice; and the Protein Information Resource, on uncovering information about protein-protein interactions. The Protein Information Resource is run by Cathy H. Wu, the Unidel Edward G. Jefferson Chair of Bioinformatics and Computational Biology at UD, who also has an appointment at Georgetown University.
Cecilia Arighi, research associate professor at the Center for Bioinformatics and Computational Biology, and several graduate students will also collaborate on the work.
Although the techniques developed will be project-specific, showing that it is possible to combine text and image mining for biomedical information will be a big step.
“Little has been done so far to use this type of idea to find relevant papers,” Shatkay admits, “and one reason is that it is hard. Text mining is typically easier than image mining.”
Knowing how to combine the two, and what weight to give each type of information, has proven very tricky for the few people who have attempted it, so the team is not doing a full analysis of the images.
“We’re trying to obtain from the images the bare-minimum information needed to identify the image types in order to decide whether a paper is relevant for a specific task,” Shatkay says. “Compared to full-fledged image analysis, this task is simpler and we believe we can do it automatically.”
Tracking drug interactions through data
Shatkay has also received another, longer-term grant from the NLM as part of a team looking for information on drug-drug interactions (DDIs).
When patients take more than one drug at a time, the different drugs can interact in dangerous ways, sometimes resulting in adverse reactions and hospitalization.
The information on DDIs is hard to sort out, however, because what happens when two drugs are combined in a test tube may differ from the outcome when they come together in a patient’s body, and medications often contain more than one compound that might be causing the problem.
Even when clinical studies show that two drugs taken together may cause adverse reactions, researchers often don’t know how or why the interaction happens. This is especially true for older drugs, because newer findings from cell biologists, chemistry labs, and hospitals may never be brought together.
“What do you do if a person needs to take two medications, and they interact?” asks Shatkay. “If you can’t understand the mechanism, you can’t propose an alternative treatment. You need to know what happens in the test tube and in the cell in order to understand why the interaction happens, and what may prevent it.”
Shatkay is co-principal investigator on the grant, along with Luis Rocha and Lang Li from Indiana University.
The team will use text mining to identify evidence of and reasons for drug interactions, combining studies from across various fields.
“The idea is to take the literature and try to fill in the gaps, or point out where there are gaps,” Shatkay says. “You take all known drugs that interact clinically, and go back into the literature and find whether there is a report of interaction on the molecular level and if not, point out that this is an area for research.”
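The gap-finding step Shatkay describes amounts to a set difference, sketched below in Python. This is a hedged illustration only: it assumes the text-mining stage has already produced two lists of drug pairs, the function names are hypothetical, and the drug names are placeholders, not real data.

```python
def pair_key(drug_1: str, drug_2: str) -> frozenset:
    """Normalize a drug pair so that order and letter case do not matter."""
    return frozenset((drug_1.strip().lower(), drug_2.strip().lower()))


def find_evidence_gaps(clinical_pairs, molecular_pairs):
    """Return clinically known pairs with no molecular-level report.

    Both arguments are iterables of (drug, drug) tuples. In a real
    pipeline they would come from mined literature; here they are
    assumed inputs.
    """
    clinical = {pair_key(a, b) for a, b in clinical_pairs}
    molecular = {pair_key(a, b) for a, b in molecular_pairs}
    return sorted(tuple(sorted(pair)) for pair in clinical - molecular)


if __name__ == "__main__":
    # Placeholder names only; these are not real drugs or findings.
    clinical = [("drug_a", "drug_b"), ("drug_b", "drug_c")]
    molecular = [("Drug_B", "drug_a")]
    print(find_evidence_gaps(clinical, molecular))
```

Each pair left in the result is a candidate "area for research": an interaction seen clinically for which no molecular explanation was found in the literature.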
The project will be text-based, with no images yet, but it will look for far more detail. “We want to identify the relevant sentences, indicative terms, and exact quantities,” Shatkay says. “We want to find the evidence needed to understand exactly what makes these two drugs interact.”