
Heterogeneity and Accuracy Issues in Federated Patient Data Repositories

Sarah N. Lim Choi Keung¹, Edward Tyler¹, Adel Taweel², Theodoros N. Arvanitis¹, Brendan Delaney², F. D. Richard Hobbs¹
¹ University of Birmingham, United Kingdom; ² King’s College London, United Kingdom

Summary

The federation of patient data repositories is an essential precursor of analysis and reuse in clinical research. In the United Kingdom, primary care data originates from GP systems with syntactic and semantic differences. Identifying and recruiting eligible patients to clinical studies rely on the ability to search these repositories, despite data heterogeneity. In this work, we discuss the heterogeneity issues of data federation from all widely adopted GP systems to create a unified repository.

Introduction

• Patient cohort identification is time-consuming and costly.
• Processes essential for adequate numbers of research subjects.
• In UK primary care, clinical research staff and GP practice staff directly search the patient electronic health records (EHRs).
• Despite 100% of GP practices using EHRs [1], the electronic data is not directly accessible to research staff for anonymous patient search.
• Main Goal: Facilitate patient cohort identification and recruitment by semi-automating these processes using EHR data.
• Research aim presented: Analysis of the consistency of use of standardised medical coding in the federated electronic patient data repository.

Methods

Regional federation of primary care data

• In the UK, Primary Care Trusts (PCTs) are regional organizations, each responsible for a number of GP practices.
• Some PCTs have local initiatives to integrate patient data from GP systems into a unified EHR repository.
• The main aims for the federation of patient data are for commissioning purposes and improved patient care through monitoring and reports.
• New usage: Identification of patient cohorts for clinical research through access to these regional data repositories.

Medical coding in UK primary care

• The PCT under study provides healthcare services for over 300,000 patients in 78 GP practices.
• The unified repository federates data from 63 GP practices.
• The data is coded using the Read Codes [2] version 2, a 5-byte clinical terminology ubiquitously used in UK primary care.

Analysis of medical coding within the federated data repository

• Compared the data coding with the reference Read Codes [3].
• Data does not strictly follow the standardised Read Codes.
• Several proprietary coding approaches are also used, devised by proprietary EHR systems.

Results

Error Category            Codes    Practices   Patients
Widely used GP system     46.1%    51          65.8%
7-byte Read Codes         32.0%    6           7.8%
Other proprietary codes   24.8%    6           12.3%

Table 1: Categories of incorrect Read Codes and corresponding ratios.

4 levels of coding issues based on Table 1 have been identified. These are shown with examples in Figure 1.

• 7-byte Read Codes (5-byte code + 2-byte synonym code). E.g. Pulmonary Tuberculosis has code A11.., specifically A1100 for its preferred term and A1111 for the synonymous term Lung Tuberculosis.
• Proprietary codes with corresponding Read Codes. E.g. Calamine Lotion has Read Code m313., but a proprietary code of CALO454 is used instead.
• Proprietary codes with no corresponding Read Codes. Proprietary codes introduced for non-existing terms in the standard Read Code, such as Non Vegetarian.
• Uninformative data. E.g. codes for free text without any further description (@AAA for Free Text).

Figure 1: Levels of coding issues and examples.

Discussion

• Need for analysis of the accuracy and consistency issues for our use of the federated data for patient cohort identification.

Investigation into ways to optimize data quality

• Include as much data as possible for the queries. E.g. converting 7-byte to 5-byte Read Codes for correct codes. The proportion of incorrect codes decreases from 32.0% to 27.3% after considering the correct 7-byte codes.
• Maintain the accuracy of the data by encouraging the use of standardised coding instead of proprietary ones.
• Preserve the patient information not currently possible to code in the standard Read Codes.
• Additional automated mapping from proprietary codes to standardised codes.

Conclusions

• Proportion of incorrect Read Codes within the federated patient data repository is a significant problem for data quality.
• We are further examining the data quality of the federated patient data repository to identify other potential data quality issues.
• We are also studying the effects of the data quality issue on the accuracy of patient cohort identification for feasibility studies.
• Further research is being carried out into data quality optimization techniques to improve data re-use for clinical research.

Acknowledgements

The project is funded by the UK National Institute for Health Research’s Birmingham and the Black Country Comprehensive Local Research Network.

References

1. Benson, T. Principles of Health Interoperability HL7 and SNOMED. London: Springer; 2010.
2. National Health Service Connecting for Health [Internet]. Leeds: NHS; 2011 [cited 2011 Feb 25]. Available from: http://www.connectingforhealth.nhs.uk/systemsandservices/data/uktc/readcodes/.
3. NHS TRUD Service [Internet]: NHS, 2011 [cited 2011 Feb 25]. Available from: https://www.uktcregistration.nss.cfh.nhs.uk/trud/.

Contact

For further information, please contact Sarah Lim Choi Keung (s.n.limchoikeung@bham.ac.uk) or Theo Arvanitis (t.arvanitis@bham.ac.uk).
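
Illustrative sketch

The poster does not describe an implementation. The Python sketch below is only one possible reading of the four levels of coding issues and of the Discussion's proposals: converting 7-byte to 5-byte Read Codes and mapping proprietary codes to standardised ones. The lookup tables are tiny hypothetical stand-ins for the reference Read Codes and for a proprietary mapping table. The 7-character string layout (5-byte code followed by a 2-byte term code, e.g. "A11..11") and the code "XNONVEG" are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class CodeLevel(Enum):
    STANDARD = "standard 5-byte Read Code"
    SEVEN_BYTE = "7-byte Read Code"
    PROPRIETARY_MAPPED = "proprietary code with corresponding Read Code"
    PROPRIETARY_UNMAPPED = "proprietary code with no corresponding Read Code"
    UNINFORMATIVE = "uninformative data"


# Hypothetical stand-ins: a real check would load the full reference
# Read Codes and a curated proprietary-to-Read mapping table.
REFERENCE_READ_CODES = {"A11..", "m313."}
PROPRIETARY_TO_READ = {"CALO454": "m313."}  # Calamine Lotion example
UNINFORMATIVE_CODES = {"@AAA"}              # free text without description


def classify(code: str) -> CodeLevel:
    """Assign a recorded code to one of the levels of coding issues."""
    if code in REFERENCE_READ_CODES:
        return CodeLevel.STANDARD
    # Assumed layout: 5-byte Read Code followed by a 2-byte term code.
    if len(code) == 7 and code[:5] in REFERENCE_READ_CODES:
        return CodeLevel.SEVEN_BYTE
    if code in UNINFORMATIVE_CODES:
        return CodeLevel.UNINFORMATIVE
    if code in PROPRIETARY_TO_READ:
        return CodeLevel.PROPRIETARY_MAPPED
    return CodeLevel.PROPRIETARY_UNMAPPED


def normalise(code: str) -> Optional[str]:
    """Return a 5-byte Read Code where one can be recovered, else None."""
    level = classify(code)
    if level is CodeLevel.STANDARD:
        return code
    if level is CodeLevel.SEVEN_BYTE:
        return code[:5]  # drop the 2-byte term code
    if level is CodeLevel.PROPRIETARY_MAPPED:
        return PROPRIETARY_TO_READ[code]
    return None  # unmapped proprietary codes and uninformative data


if __name__ == "__main__":
    for recorded in ["A11..", "A11..11", "CALO454", "XNONVEG", "@AAA"]:
        print(recorded, "|", classify(recorded).value, "|", normalise(recorded))
```

Codes for which `normalise` returns None are kept in their recorded form rather than discarded, in line with the aim of preserving patient information that cannot currently be coded in the standard Read Codes.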
