Novel approach to utilizing electronic health records for dermatologic research: Developing a multi-institutional federated data network for clinical and translational research in psoriasis and psoriatic arthritis
Published Web Location: https://doi.org/10.5070/D385f777mm
April W Armstrong¹ MD MPH, Shalini B Reddy² BA, Amit Garg³ MD
Dermatology Online Journal 18 (5): 2
1. University of California Davis, Department of Dermatology, Sacramento, California
2. Boston University School of Medicine, Boston, Massachusetts
3. Boston University School of Medicine, Department of Dermatology, Boston, Massachusetts
Abstract
The implementation of Electronic Health Records (EHR) in the United States has created new opportunities for research using automated data extraction methods. A large amount of information from the EHR can be used for clinical and translational research. To date, a number of institutions can extract clinical data from their EHRs to create local repositories of de-identified data amenable to research queries through the Informatics for Integrating Biology and the Bedside (i2b2) platform. Collaborations among institutions sharing a common i2b2 platform hold exciting opportunities for research in psoriasis and psoriatic arthritis. With the automated extraction of patient-level data from multiple institutions, such a novel informatics network has the ability to address high-priority research questions. With a commitment to high-quality data through applied algorithms for cohort identification and validation of outcomes, the creation of the Psoriasis and Psoriatic Arthritis Integrated Research Data Network (PIONEER) will make a significant contribution to psoriasis and psoriatic arthritis research.
Introduction
Clinical and translational research discoveries in psoriasis and psoriatic arthritis depend heavily on the size and quality of data sources. The ability to query research information comprehensively from a large pool of patients with psoriasis and psoriatic arthritis across multiple health care facilities can provide valuable information on epidemiology, disease detection, management, prognostication, and therapy comparison among at-risk psoriasis populations. However, the U.S. lacks large disease-specific databases for psoriasis and psoriatic arthritis that accurately reflect real-world practice and capture an expansive array of clinically relevant outcomes.
Gaps in large-database resources for psoriasis and psoriatic arthritis research
The four primary forms of databases currently available for psoriasis and psoriatic arthritis research are (1) single institutional databases, (2) claims databases, (3) publicly available national databases, and (4) registries. First, although single institutional databases allow individual investigators to analyze detailed parameters on their psoriasis cohorts, these databases are small and underpowered to detect small effects. They also lack diversity in patient populations and have limited ability to provide generalizable information. Second, although existing claims databases offer an advantage in sample size, they are largely restricted to administrative information and lack point-of-care data and other clinically relevant parameters. Furthermore, patients who change insurance carriers or temporarily lack health coverage are not adequately captured in these databases. Third, although publicly available databases such as the National Ambulatory Medical Care Survey (NAMCS), the National Hospital Ambulatory Medical Care Survey (NHAMCS), and the National Health and Nutrition Examination Survey (NHANES) provide access to representative national-level data, they lack detailed, disease-specific clinical information relevant to patients with psoriasis and psoriatic arthritis. Studies using the United Kingdom-based General Practice Research Database have made significant contributions to our understanding of psoriasis. However, this database is not publicly available and is costly for individual investigators; furthermore, it reflects only the practice patterns of UK physicians and the health outcomes of UK residents. Finally, although established registries collect valuable clinical and laboratory information, usually with a narrower focus on a select group of patients, their development and sustainability require substantial and ongoing financial support and human resources.
Innovations in utilizing electronic health records for clinical and translational research: opportunities from the i2b2 platform
The implementation of Electronic Health Records (EHR) in the United States has created new opportunities for clinical and translational research using automated data extraction methods. In essence, the EHR represents a considerable preexisting investment in health information technology. The large amount of information generated by the EHR at routine points of care can be made available for outcomes research. Compared to existing databases, the EHR provides more comprehensive information on health history and status across multiple organ systems, from larger patient cohorts. This allows investigators to perform more robust analyses over a broader set of outcomes and to draw more generalizable conclusions.
The ability to extract and network high-quality data from EHRs is not straightforward; failure to recognize its complexity or the need for quality assurance can lead to erroneous analyses and conclusions. The challenge facing outcomes research that uses health informatics is to devise ways to extract, clean, and link data captured through disparate informatics systems. Today, a number of institutions can extract clinical data from their EHRs to create local repositories of de-identified data amenable to research queries through the Informatics for Integrating Biology and the Bedside (i2b2) platform (i2b2, http://www.i2b2.org). The i2b2 platform is a scalable informatics platform whose data structure, ontology, and query tools allow the integration and analysis of large amounts of de-identified clinical, laboratory, claims, and payer data from multiple disparate health systems. Its common architecture enables separate EHRs to share a common messaging protocol. The i2b2 platform has been adopted by over 60 U.S. academic institutions and 10 international medical centers.
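To make the idea of a de-identified, queryable local repository concrete, the sketch below builds a tiny in-memory table of time-stamped, coded observations and counts distinct patients carrying a psoriasis diagnosis code. The table and column names (`observation_fact`, `patient_num`, `concept_cd`, `start_date`) are loosely modeled on i2b2's star-schema fact table but heavily simplified; the `ICD9:` code prefix, the `count_patients` helper, and the sample rows are hypothetical. ICD-9-CM 696.1 is "other psoriasis" and 696.0 is "psoriatic arthropathy".

```python
import sqlite3

# Hypothetical, simplified fact table loosely modeled on i2b2's observation_fact:
# one row per coded observation for a de-identified patient number.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact ("
    " patient_num INTEGER NOT NULL,"  # de-identified patient identifier
    " concept_cd TEXT NOT NULL,"      # coded concept, e.g. a diagnosis code
    " start_date TEXT NOT NULL)"      # ISO-8601 date of the observation
)

# Illustrative rows only (not real data).
rows = [
    (1, "ICD9:696.1", "2010-03-02"),  # other psoriasis
    (1, "ICD9:696.1", "2010-09-15"),  # same patient, later visit
    (2, "ICD9:696.0", "2011-01-20"),  # psoriatic arthropathy
    (3, "ICD9:401.9", "2011-05-05"),  # an unrelated diagnosis code
]
conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", rows)

def count_patients(conn, concept_codes):
    """Count distinct patients with at least one of the given concept codes."""
    placeholders = ",".join("?" for _ in concept_codes)
    query = (
        "SELECT COUNT(DISTINCT patient_num) FROM observation_fact "
        f"WHERE concept_cd IN ({placeholders})"
    )
    return conn.execute(query, list(concept_codes)).fetchone()[0]

print(count_patients(conn, ["ICD9:696.0", "ICD9:696.1"]))  # prints 2
```

Counting distinct patient numbers, rather than rows, keeps repeat visits by the same patient from inflating the cohort size.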
The structure and flow of the i2b2 platform are briefly summarized here; detailed descriptions have been published elsewhere [1, 2, 3, 4]. The i2b2 workbench communicates with what is termed the i2b2 “hive” via XML-based web services. The i2b2 toolkit includes collections of i2b2 “hives.” Each hive consists of a set of “cells,” or software components. Each cell has distinct functionality, and cells can be continually added to the hive. Some cells serve core functions such as authentication, database services, and ontology services, whereas other cells are optional, such as natural language processing. Because i2b2 components communicate through XML messages, adopters have been able to develop related cells that can be shared with the larger community [4]. Importantly, recent innovations in distributed queries (e.g., W3EMRS, the Shared Pathology Informatics Network, and the Shared Health Research Information Network [SHRINE]) have created dynamic interfaces that allow users to query multiple i2b2 databases [5, 6, 7, 8, 9].
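The distributed-query idea can be sketched as a hub that forwards the same query to each participating site and combines only the aggregate counts it receives back, so patient-level rows never leave a site. The `Site` class, its `count` method, `federated_count`, and the site names are hypothetical stand-ins, not the i2b2 or SHRINE interfaces, which exchange XML messages over web services.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """Hypothetical participating institution holding local de-identified data."""
    name: str
    # patient_num -> set of concept codes observed for that patient (toy data)
    patients: dict = field(default_factory=dict)

    def count(self, concept_codes):
        """Answer a query with an aggregate count only; no patient rows are returned."""
        wanted = set(concept_codes)
        return sum(1 for codes in self.patients.values() if codes & wanted)

def federated_count(sites, concept_codes):
    """Send one query to every site and return per-site counts plus the total."""
    per_site = {site.name: site.count(concept_codes) for site in sites}
    return per_site, sum(per_site.values())

sites = [
    Site("site_a", {1: {"ICD9:696.1"}, 2: {"ICD9:401.9"}}),
    Site("site_b", {1: {"ICD9:696.0"}, 2: {"ICD9:696.1", "ICD9:696.0"}}),
]
per_site, total = federated_count(sites, ["ICD9:696.0", "ICD9:696.1"])
print(per_site, total)  # {'site_a': 1, 'site_b': 2} 3
```

Summing per-site counts assumes that no patient appears at more than one site; a real network would need some way to handle patients who receive care at several institutions.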
Examples of multi-institutional federated data networks utilizing electronic health records via the i2b2 platform in non-dermatology specialties
Pharmacovigilance and genomics research are two successful examples of multi-institutional federated data networking using the EHR. In pharmacovigilance, a key advantage of using the EHR for research is the ability to capture and characterize richer clinical data than claims data provide. For example, using EHR data, investigators were able to identify a safety signal for myocardial infarction associated with the cyclooxygenase-2 inhibitor rofecoxib (Vioxx®) [10]. In another example, investigators identified a higher risk of myocardial infarction with the oral hypoglycemic agent rosiglitazone (Avandia®) compared with other agents in the same pharmacologic class [11]. These seminal studies were reviewed by the Food and Drug Administration in its safety assessments of these medications. In the field of electronic health record-driven genomic research (EDGR) [12-21], investigators were able to over-represent minorities from patient cohorts at large academic centers to achieve sufficient sample sizes for these traditionally under-represented populations in genomics research. Furthermore, EDGR investigators reported that the reproducibility of phenotypes from codified data and natural language processed terms exceeded 90 percent across collaborating institutions [3, 14, 22].
Opportunities for developing a multi-institutional federated data network for psoriasis and psoriatic arthritis research
The introduction of i2b2 represents a major advance in the ability to use clinical data for hypothesis-driven population research. Collaboration among multiple institutions sharing a common i2b2 platform holds exciting opportunities for research in psoriasis and psoriatic arthritis. When implemented across committed institutions, i2b2 can extract data on large cohorts of psoriasis patients and standardize them against detailed ontologies (including diagnoses, procedures, medications, laboratory values, vital signs, clinical documentation data, demographic groups, payer information, and visit dates). To increase the power and generalizability of study findings, psoriasis investigators interested in cross-institutional data networking may pre-specify the clinical and laboratory fields and extract this information from multiple institutions in a systematic, standardized fashion.
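Standardizing data across institutions amounts to mapping each site's local codes onto shared concepts before pooling. The sketch below uses a small hand-written mapping table; the local code strings (`DX-PSO`, `DX-PSA`), the shared concept names (`PSORIASIS`, `PSORIATIC_ARTHRITIS`), and the `standardize` function are hypothetical, while 696.1 and 696.0 are the ICD-9-CM codes for other psoriasis and psoriatic arthropathy.

```python
# Hypothetical mapping from (site, local code) to a shared network concept.
CONCEPT_MAP = {
    ("site_a", "ICD9:696.1"): "PSORIASIS",
    ("site_a", "ICD9:696.0"): "PSORIATIC_ARTHRITIS",
    ("site_b", "DX-PSO"): "PSORIASIS",            # hypothetical local code
    ("site_b", "DX-PSA"): "PSORIATIC_ARTHRITIS",  # hypothetical local code
}

def standardize(site, records):
    """Translate (patient, local_code, date) records to shared concepts.

    The site name is kept in each output record so that patient numbers from
    different institutions are never conflated. Records whose local code has
    no mapping are returned separately for review rather than silently dropped.
    """
    mapped, unmapped = [], []
    for patient, local_code, date in records:
        concept = CONCEPT_MAP.get((site, local_code))
        if concept is None:
            unmapped.append((patient, local_code, date))
        else:
            mapped.append((site, patient, concept, date))
    return mapped, unmapped

mapped, unmapped = standardize(
    "site_b", [(7, "DX-PSO", "2011-02-01"), (8, "DX-XYZ", "2011-03-01")]
)
print(mapped)    # [('site_b', 7, 'PSORIASIS', '2011-02-01')]
print(unmapped)  # [(8, 'DX-XYZ', '2011-03-01')]
```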
For example, whereas a single institution may have only 50 psoriasis patients who have experienced myocardial infarction, a network of 10 similarly sized institutions, with about 500 such cases, may allow meaningful analysis of this association, as well as of its modification by treatment, in a much shorter time frame than prospective data collection. When implemented across a large number of institutions and communities that share the i2b2 platform, this research data network can assemble standardized data (including demographics, problem lists, diagnoses, medications, clinical observations, procedures, and laboratory data) on a large cohort of patients with psoriasis and psoriatic arthritis, covering multiple years of observation since the inception of each institution’s EHR.
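The arithmetic behind this example is simple: 10 institutions with about 50 cases each give about 500 cases. Because the standard error of an estimated proportion (for instance, the share of those cases exposed to a particular treatment) is sqrt(p(1 - p)/n), it shrinks with the square root of n, so a tenfold larger sample narrows a confidence interval by a factor of sqrt(10), roughly 3.2. The helper names below are hypothetical, and this is only an illustration of the scaling, not a power analysis.

```python
import math

def pooled_cases(cases_per_site, n_sites):
    """Total cases if every site contributes the same number (illustrative)."""
    return cases_per_site * n_sites

def se_proportion(p, n):
    """Standard error of an estimated proportion p from n observations."""
    return math.sqrt(p * (1 - p) / n)

single = 50
pooled = pooled_cases(single, 10)
# The ratio does not depend on p; 0.5 is used only as a placeholder value.
shrink = se_proportion(0.5, single) / se_proportion(0.5, pooled)
print(pooled, round(shrink, 3))  # 500 3.162
```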
Suitable research questions in psoriasis and psoriatic arthritis for utilizing a multi-institutional federated data network
For addressing clinical and translational research questions in psoriasis, a multi-institutional EHR data network based on the i2b2 platform offers several unique advantages. First, the depth and richness of real-world clinical care data from multiple institutions provide context for specific research questions. Second, the pooled sample size makes studies more likely to be adequately powered to examine rare events, and time-stamped data permit longitudinal analyses that track event development and changes in patient status. Third, the automated retrieval of data specific to psoriasis and psoriatic arthritis allows relevant parameters to be captured efficiently. In addition, i2b2 natural language processing cells can enable investigators to sort through unstructured clinical documentation.
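Time-stamped records are what make longitudinal questions answerable. As a minimal sketch, assuming each observation is a (patient, concept, date) tuple already mapped to shared concepts, the hypothetical `days_to_event` function below finds, for each patient, the number of days from the first psoriasis record to the first myocardial infarction recorded on or after it. The concept labels and records are illustrative only.

```python
from datetime import date

def days_to_event(records, index_concept, event_concept):
    """Days from each patient's first index_concept record to the first
    event_concept record on or after it; patients without such an event
    are omitted."""
    first_index = {}
    for patient, concept, day in records:
        if concept == index_concept:
            if patient not in first_index or day < first_index[patient]:
                first_index[patient] = day
    result = {}
    for patient, concept, day in records:
        start = first_index.get(patient)
        if concept == event_concept and start is not None and day >= start:
            gap = (day - start).days
            if patient not in result or gap < result[patient]:
                result[patient] = gap
    return result

records = [
    (1, "PSORIASIS", date(2008, 1, 10)),
    (1, "MI", date(2010, 1, 10)),
    (2, "MI", date(2007, 6, 1)),         # MI before psoriasis: not counted
    (2, "PSORIASIS", date(2009, 3, 1)),
    (3, "PSORIASIS", date(2011, 5, 1)),  # no MI recorded
]
print(days_to_event(records, "PSORIASIS", "MI"))  # {1: 731}
```

A full time-to-event analysis would also need to account for patients, such as patient 3, whose follow-up ends without the event (censoring).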
With commitment from pilot institutions that share the i2b2 platform, the development of the Psoriasis and Psoriatic Arthritis Integrated Research Data Network (PIONEER) is underway. The goals of PIONEER are to (1) establish a coalition of institutions with a shared i2b2 platform and shared goals of advancing research in psoriasis and psoriatic arthritis, (2) accurately identify a cohort of psoriasis and psoriatic arthritis patients from this coalition, (3) analyze the federated cohort to address high-priority research questions, and (4) establish mechanisms to expand and sustain PIONEER.
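Goal (2), accurately identifying the cohort, is usually approached with rule-based phenotype algorithms that are then validated against chart review. The sketch below applies a simple rule of a kind often used in claims and EHR studies, requiring the diagnosis on at least two distinct dates, and then computes a positive predictive value, assuming every flagged patient was chart-reviewed. The two-date threshold, the function names, and the toy inputs are assumptions for illustration, not PIONEER's actual algorithm.

```python
from collections import defaultdict

def identify_cohort(records, concept, min_distinct_dates=2):
    """Return patients with `concept` recorded on at least `min_distinct_dates`
    different dates (a hypothetical rule; the threshold is an assumption)."""
    dates = defaultdict(set)
    for patient, code, day in records:
        if code == concept:
            dates[patient].add(day)
    return {p for p, days in dates.items() if len(days) >= min_distinct_dates}

def positive_predictive_value(flagged, confirmed):
    """Share of algorithm-flagged patients whose diagnosis chart review confirmed."""
    if not flagged:
        raise ValueError("no flagged patients to validate")
    return len(flagged & confirmed) / len(flagged)

records = [
    (1, "PSORIASIS", "2010-03-02"),
    (1, "PSORIASIS", "2010-09-15"),
    (2, "PSORIASIS", "2011-01-20"),  # only one date: not included
    (3, "PSORIASIS", "2011-04-01"),
    (3, "PSORIASIS", "2011-04-01"),  # same date repeated: still one date
    (4, "PSORIASIS", "2009-07-07"),
    (4, "PSORIASIS", "2012-02-14"),
]
cohort = identify_cohort(records, "PSORIASIS")
print(sorted(cohort))                          # [1, 4]
print(positive_predictive_value(cohort, {1}))  # 0.5
```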
With the automated extraction of patient-level data from multiple institutions, we will be able to leverage this novel informatics network to address high-priority clinical and translational research questions. Examples of suitable research aims include the following: (1) define disease presentation and course, including characterizing early psoriasis and psoriatic arthritis, identifying predictors of disease development and exacerbation, identifying biomarkers of disease progression, and predicting changes in disease course with treatment; (2) conduct comparative effectiveness research that examines the effectiveness and safety of approved and off-label dosing of topical and systemic treatments in the real-world practice setting, including detection of rare adverse effects and regional differences in physician practice behaviors; and (3) identify common and rare comorbid conditions associated with psoriasis and psoriatic arthritis.
In summary, in an era in which significant investment has been made by healthcare organizations to implement EHR, utilizing information gathered through daily clinical practice to address important research questions is a keystone of modern-day clinical and translational research. The value of using collective data from multiple institutions to create a federated data network for psoriasis and psoriatic arthritis research is substantial. With commitment to high-quality data through applied algorithms for cohort identification and validation of outcomes, the investigators and supporters of PIONEER aim to advance psoriasis and psoriatic arthritis research.
References
1. Healthcare P (2012) i2b2: Informatics for Integrating Biology & the Bedside.
2. Murphy SN, Weber G, Mendis M, Gainer V, Chueh HC, et al. (2010) Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). Journal of the American Medical Informatics Association : JAMIA 17: 124-130. [PubMed]
3. Kohane IS, Churchill SE, Murphy SN (2012) A translational engine at the national scale: informatics for integrating biology and the bedside. Journal of the American Medical Informatics Association : JAMIA 19: 181-185. [PubMed]
4. Murphy SN, Mendis M, Hackett K, Kuttan R, Pan W, et al. (2007) Architecture of the open-source clinical research chart from Informatics for Integrating Biology and the Bedside. AMIA Annual Symposium proceedings / AMIA Symposium AMIA Symposium: 548-552. [PubMed]
5. Kohane IS, Greenspun P, Fackler J, Cimino C, Szolovits P (1996) Building national electronic medical record systems via the World Wide Web. Journal of the American Medical Informatics Association : JAMIA 3: 191-207. [PubMed]
6. Namini AH, Berkowicz DA, Kohane IS, Chueh H (2004) A submission model for use in the indexing, searching, and retrieval of distributed pathology case and tissue specimens. Studies in health technology and informatics 107: 1264-1267. [PubMed]
7. Holzbach AM CH PA, Kohane IS, Berkowicz D (2004) A query engine for distributed medical databases. Medinfo 1519.
8. Drake TA, Braun J, Marchevsky A, Kohane IS, Fletcher C, et al. (2007) A system for sharing routine surgical pathology specimens across institutions: the Shared Pathology Informatics Network. Human pathology 38: 1212-1225. [PubMed]
9. Weber GM, Murphy SN, McMurry AJ, Macfadden D, Nigrin DJ, et al. (2009) The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories. Journal of the American Medical Informatics Association : JAMIA 16: 624-630. [PubMed]
10. Brownstein JS, Sordo M, Kohane IS, Mandl KD (2007) The tell-tale heart: population-based surveillance reveals an association of rofecoxib and celecoxib with myocardial infarction. PloS one 2: e840. [PubMed]
11. Brownstein JS, Murphy SN, Goldfine AB, Grant RW, Sordo M, et al. (2010) Rapid identification of myocardial infarction risk associated with diabetes medications using electronic medical records. Diabetes care 33: 526-531. [PubMed]
12. Himes BE, Klanderman B, Kohane IS, Weiss ST (2011) Assessing the reproducibility of asthma genome-wide association studies in a general clinical population. The Journal of allergy and clinical immunology 127: 1067-1069. [PubMed]
13. Murphy S, Churchill S, Bry L, Chueh H, Weiss S, et al. (2009) Instrumenting the health care enterprise for discovery research in the genomic era. Genome research 19: 1675-1681. [PubMed]
14. Liao KP, Cai T, Gainer V, Goryachev S, Zeng-treitler Q, et al. (2010) Electronic medical records for discovery research in rheumatoid arthritis. Arthritis care & research 62: 1120-1127. [PubMed]
15. Kurreeman F, Liao K, Chibnik L, Hickey B, Stahl E, et al. (2011) Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. American journal of human genetics 88: 57-69. [PubMed]
16. Roden DM, Pulley JM, Basford MA, Bernard GR, Clayton EW, et al. (2008) Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clinical pharmacology and therapeutics 84: 362-369. [PubMed]
17. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, et al. (2010) PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26: 1205-1210. [PubMed]
18. Denny JC, Ritchie MD, Crawford DC, Schildcrout JS, Ramirez AH, et al. (2010) Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science. Circulation 122: 2016-2021. [PubMed]
19. Dumitrescu L, Ritchie MD, Brown-Gentry K, Pulley JM, Basford M, et al. (2010) Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records. Genetics in medicine : official journal of the American College of Medical Genetics 12: 648-650. [PubMed]
20. Pulley J, Clayton E, Bernard GR, Roden DM, Masys DR (2010) Principles of human subjects protections applied in an opt-out, de-identified biobank. Clinical and translational science 3: 42-48. [PubMed]
21. Ritchie MD, Denny JC, Crawford DC, Ramirez AH, Weiner JB, et al. (2010) Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record. American journal of human genetics 86: 560-572. [PubMed]
22. Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nature genetics 42: 508-514. [PubMed]
© 2012 Dermatology Online Journal