Interview transcript: George Hripcsak
Accuracy and Speed in Biomedical Informatics
Biomedical informatics is the study and use of information in health care. It draws methods from a number of fields: computer science, applied mathematics, philosophy, physics, statistics, epidemiology, cognitive science, psychology, and a little bit of economics. The domain areas can then be divided by scale: computational biology at the cellular scale; imaging at the human, or organ, scale; medical, or clinical, informatics at the personal level; public health informatics at the population scale; and translational informatics, where we cross among those various scales. The field ranges from people who do a lot of theory and develop new algorithms to people who do very practical things, such as building the clinical information systems that doctors use to deliver health care in the hospital.
Then there's natural language processing. Carol Friedman and Noemie Elhadad are the two natural language processing experts in our department. Carol started working in the field in the 1970s as a lab technician and then got her PhD in the 1980s from the Courant Institute of Mathematical Sciences at NYU, working with Naomi Sager, one of the pioneers of natural language processing in health care. Carol developed the next generation of natural language processing systems. The difference was that her system actually worked. There are a lot of systems that are used for research, to test whether a hypothesis about language would actually work, but with this one you could put text in one end and structured information would come out the other end. We began doing studies with this new tool.
In the 1990s, tuberculosis was more prevalent in New York City. One of our problems was that patients would come in with something, we didn't know what, and they'd be put in a hospital room with someone who might have active tuberculosis, and suddenly that patient would be at risk for tuberculosis as well. We needed to catch those patients with active TB ahead of time and make the doctor aware of it so the disease wouldn't spread. We wrote a rule in the electronic health record that said, "If the patient is at risk for tuberculosis and they're in a shared room, then tell the hospital epidemiologist, who then checks it out and moves the patient." The problem was that we detected tuberculosis through a tuberculosis culture, a tuberculosis smear, or a tuberculosis medication. Those are all cases where the doctor already knew that the person had tuberculosis, so the doctor wouldn't put them in a shared room anyhow. Then we parsed the chest x-rays. The radiologist would read these images within a day, and that reading became a report written in natural language.
Carol's system would then parse that report and look for particular findings like a cavitary lesion or an upper lobe infiltrate. If the patient had either of those, there would be a pretty good chance of active tuberculosis. The results from Carol's system would go to the hospital epidemiologist, and we found that using the system cut in half the number of patients with active TB who were put into shared rooms.
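The alert rule described above can be sketched in a few lines of Python. This is only an illustration, not the rule that actually ran in the electronic health record: the `Patient` record, its field names, and the `should_alert` function are hypothetical, and the findings are assumed to have already been extracted from the radiology report by the language processor.

```python
from dataclasses import dataclass, field

# Findings named in the interview as suggestive of active tuberculosis.
TB_SUGGESTIVE_FINDINGS = {"cavitary lesion", "upper lobe infiltrate"}


@dataclass
class Patient:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str
    in_shared_room: bool
    xray_findings: set = field(default_factory=set)


def should_alert(patient: Patient) -> bool:
    """Return True if the hospital epidemiologist should be notified."""
    # "At risk" here means at least one suggestive finding was parsed
    # out of the chest x-ray report.
    at_risk = bool(patient.xray_findings & TB_SUGGESTIVE_FINDINGS)
    return at_risk and patient.in_shared_room


patients = [
    Patient("A", in_shared_room=True, xray_findings={"cavitary lesion"}),
    Patient("B", in_shared_room=False, xray_findings={"upper lobe infiltrate"}),
    Patient("C", in_shared_room=True, xray_findings={"rib fracture"}),
]
alerts = [p.name for p in patients if should_alert(p)]
print(alerts)
```

Only patient A triggers the alert: B has a suggestive finding but is not in a shared room, and C shares a room but has no suggestive finding.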
We've used natural language processing for a number of studies. In one, we parsed all of the x-rays in our neighborhood over ten years and then checked the results against the crime rate. We found that bullet and stab wounds among emergency room patients, as evidenced on the chest x-ray, dropped 46 percent, just like the rate of violent crime in our neighborhood. So we used external indicators to see whether what we were getting from natural language processing on this database actually matched verifiable statistics that we knew to be true.
In medical language processing, we've swung very far in one direction, with everyone trying to do natural language processing statistically; that is, taking a corpus, annotating it, and then doing machine learning on that corpus to build a system to recognize what is being said. Naomi Sager was an expert in linguistics, and her method was based on syntax. Carol's system was based on a semantic grammar. Carol said, I can't sit there and figure out every "the" and "a" because then I'll be here forever and go down a path that I didn't need to go down. Instead, using sub-language analysis, we can figure out what's important in the sub-language, which is the language of the radiologist, or internist, or pathologist, etc., and convert it into a semantic grammar, which is something like, "The disease is in the body part." So these classes are not noun, verb, and adverb; they're things like "body part" or "sidedness" or "disease" or "finding" or "degree." This is the concept behind a semantic grammar. She hand-coded this grammar over a year or two and has been improving it over the twenty years since, such that we're now able to parse things very accurately. We did a study, published in the Annals of Internal Medicine, comparing different methods of parsing a particular set of medical reports. We had radiologists read the reports, internists read the reports, laypersons read the reports, and Carol's system read the reports, and we also ran a sort of Google-like search. Carol's system was almost indistinguishable from the doctors, both radiologists and internists, in interpreting what was in the reports and coding it correctly. And it was far above the laypersons and far above the Google-like search.
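To make the idea of a semantic grammar concrete, here is a toy sketch in Python. It is far simpler than a real system and is not Carol Friedman's grammar: the lexicon, the single rule, and the function names `tag` and `parse` are all invented for illustration. The point is only that the rule is written over semantic classes such as "finding" and "body part" rather than over nouns and verbs, and that words outside the sub-language (every "the" and "a") are ignored.

```python
import re

# Hypothetical mini-lexicon: phrases of the sub-language mapped to
# semantic classes (not parts of speech).
LEXICON = {
    "infiltrate": "finding",
    "cavitary lesion": "finding",
    "pneumonia": "disease",
    "upper lobe": "body_part",
    "lung": "body_part",
    "left": "sidedness",
    "right": "sidedness",
    "mild": "degree",
    "severe": "degree",
}

# Try longer phrases first so multi-word entries are matched whole.
_PHRASES = sorted(LEXICON, key=len, reverse=True)
_PHRASE_RE = re.compile(r"\b(" + "|".join(re.escape(p) for p in _PHRASES) + r")\b")

# One hypothetical grammar rule over the sequence of semantic classes:
# an optional degree, a finding, an optional sidedness, then a body part.
RULE = re.compile(r"(degree )?finding (sidedness )?body_part")


def tag(sentence):
    """Return (phrase, semantic class) pairs in order; other words are skipped."""
    return [(m.group(1), LEXICON[m.group(1)])
            for m in _PHRASE_RE.finditer(sentence.lower())]


def parse(sentence):
    """Return a structured frame if the sentence fits the rule, else None."""
    tagged = tag(sentence)
    class_sequence = " ".join(cls for _, cls in tagged)
    if not RULE.fullmatch(class_sequence):
        return None
    return {cls: phrase for phrase, cls in tagged}


report = "There is a mild infiltrate in the right upper lobe."
print(parse(report))
```

Text goes in one end and a structured frame comes out the other; a sentence that does not fit the rule, such as "The patient has pneumonia.", yields `None` in this sketch because no rule covers it.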
But in the last ten years, there's been a hope that with more data, we can do these things with statistics. In the early days of speech processing and speech recognition, there was some statistical work and some knowledge-based and symbolic work, but statistics won out. Now people are trying to do the same thing by applying statistics to natural language processing, but it hasn't been working that well. For example, you can do specific tests to find out, "Does this report tell me if the patient smokes or not?" You can train on a training set and come up with a pretty good algorithm that tells you yes or no. But if you're trying to read the report and you have no idea whether they're going to be telling you, "I went on a vacation in the Himalayas and I came back with a rash on my left knee," it's pretty hard to train for that. In cases like that, the semantic grammar approach is still outperforming the statistical approaches. Ultimately, I think what's going to work is a combination of the statistical approach and some kind of symbolic approach, like a semantic grammar, that requires expertise and knowledge in order to create the system.
Another member of our department, Noemie Elhadad, is looking at nuances in patient utterances so as to develop a program that can read between the lines in medical reports. For example, a note from the emergency department could read, "31 year old woman here with pain," or "25 year old female back again for pain meds." If you parse them, both come out as a woman of some age with pain. But one statement tells you that someone has probably hurt herself and is visiting a doctor to fix it, while the other suggests a person possibly running from emergency room to emergency room looking for pain medications. The actual word difference is small, and if you use a modern system that uses synonyms, you'd completely lose those nuances. The choice of "woman" versus "female" is also telling. If you say "female" you are distancing yourself and implying that something is different about this person.
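The loss of nuance can be shown with a small Python sketch. The concept table and the `extract_concepts` function are hypothetical and deliberately crude; they stand in for any system that collapses synonyms into normalized concepts and discards everything else.

```python
import re

# Hypothetical concept table: surface words collapsed into shared concepts.
CONCEPTS = {
    "woman": "FEMALE_PATIENT",
    "female": "FEMALE_PATIENT",
    "pain": "PAIN",
}


def extract_concepts(note):
    """Return the set of normalized concepts found in a note."""
    text = note.lower()
    concepts = set()
    # Any "<number> year old" is reduced to a generic AGE concept.
    if re.search(r"\b\d+ year old\b", text):
        concepts.add("AGE")
    # Words not in the table ("back", "again", "meds") are simply dropped.
    for word in re.findall(r"[a-z]+", text):
        if word in CONCEPTS:
            concepts.add(CONCEPTS[word])
    return concepts


note_a = "31 year old woman here with pain"
note_b = "25 year old female back again for pain meds"
same = extract_concepts(note_a) == extract_concepts(note_b)
print(same)  # True: both notes reduce to the same three concepts
```

Once "woman" and "female" map to the same concept and "back again" is thrown away, the two notes become indistinguishable, which is exactly the information a reader between the lines would want to keep.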