All research-in-progress seminars are off the record. Any information about methodology and/or results is embargoed until publication.
Abstract: In this talk I will describe data mining methods that transform unstructured patient notes taken by doctors, nurses, and other clinicians into a de-identified, temporally ordered patient-feature matrix using standardized medical terminologies. We demonstrate how to use the resulting high-throughput data for uncovering natural experiments (i.e., learning practice-based evidence), for conducting drug safety studies, and for building predictive models.
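As a rough illustration of what a temporally ordered patient-feature matrix could look like, here is a minimal Python sketch. It is not the pipeline described in the talk: the LEXICON dictionary, the concept codes, the build_matrix function, and the sample notes are all hypothetical, and the naive substring matching stands in for real terminology-based annotation. Patient keys are assumed to be already de-identified.

```python
from collections import defaultdict

# Hypothetical term-to-concept lexicon (assumption). A real system would map
# text to standardized medical terminologies rather than this toy dictionary.
LEXICON = {
    "heartburn": "C_REFLUX",
    "reflux": "C_REFLUX",
    "omeprazole": "C_PPI",
    "chest pain": "C_CHEST_PAIN",
}


def build_matrix(notes):
    """Turn (patient_key, day_offset, text) notes into time-ordered features.

    Returns {patient_key: [(day_offset, [concept, ...]), ...]}, with each
    patient's rows sorted by day offset. Notes with no recognized term are
    dropped.
    """
    matrix = defaultdict(list)
    for patient_key, day, text in sorted(notes, key=lambda n: (n[0], n[1])):
        lowered = text.lower()
        # Naive substring matching; no negation or context handling.
        concepts = sorted({code for term, code in LEXICON.items() if term in lowered})
        if concepts:
            matrix[patient_key].append((day, concepts))
    return dict(matrix)


# Made-up notes, keyed by already de-identified patient keys.
notes = [
    ("p1", 30, "Started omeprazole for reflux."),
    ("p1", 0, "Complains of heartburn after meals."),
    ("p2", 5, "No complaints today."),
    ("p1", 400, "Presents with chest pain."),
]
matrix = build_matrix(notes)
```

Storing day offsets instead of calendar dates is one simple way to keep the temporal order while leaving out the actual dates.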
We show that adverse drug event associations can be investigated with high accuracy by applying automated methods to textual notes in a clinical data warehouse. We examine suspected associations for confounding via stratification and propensity score matching. We find that such an analysis of textual clinical notes could detect adverse drug events roughly 2 years before the official alert. Using this approach, we examine sub-populations of patients with gastro-esophageal reflux who are treated with either a proton pump inhibitor (PPI) or an H2 receptor antagonist (H2 blocker, H2B) for an increased risk of myocardial infarction. The association of PPIs with such events was hypothesized from experimental results showing that PPIs elevate plasma levels of asymmetric dimethylarginine, an independent predictor of major adverse cardiovascular events. We will also discuss a proof-of-principle study showing the potential of text analytics to mine clinical data warehouses for ‘natural experiments’ that profile the safety of cilostazol in patients with peripheral artery disease.
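To make the idea of checking an association for confounding via stratification concrete, the following Python sketch computes a Mantel-Haenszel pooled odds ratio across strata. This is a standard textbook estimator, not necessarily the one used in the work described here. The function name mantel_haenszel_or and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator).

    Each stratum is a tuple (a, b, c, d):
      a = exposed with event,   b = exposed without event,
      c = unexposed with event, d = unexposed without event.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these counts")
    return numerator / denominator


# Made-up counts for two strata (for example, two age bands); not real data.
toy_strata = [(10, 90, 5, 95), (20, 80, 10, 90)]
pooled_or = mantel_haenszel_or(toy_strata)
```

If the pooled estimate differs markedly from the crude odds ratio computed on the collapsed table, the stratifying variable is a plausible confounder of the association.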
We envision that such methods will enable the examination of difficult-to-test clinical hypotheses and aid in post-marketing drug safety surveillance.