
Scalable Architecture for Federated Translational Inquiries Network (SAFTINet)

ETL Specifications Document


Version 4.0
March 3rd, 2013

SAFTINet ETL Specifications Document


LICENSE
© 2011 Foundation for the National Institutes of Health (FNIH).
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this document except in
compliance with the License. You may obtain a copy of the License at http://omop.fnih.org/publiclicense.
Unless required by applicable law or agreed to in writing, documentation and software distributed under
the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied. Any redistributions of this work or any derivative work or modification based on
this work should be accompanied by the following source attribution: "This work is based on work by the
Observational Medical Outcomes Partnership (OMOP) and used under license from the FNIH at
http://omop.fnih.org/publiclicense."
Any scientific publication that is based on this work should include a reference to
http://omop.fnih.org.

This document was created specifically for the Scalable Architecture for Federated Translational Inquiries
Network (SAFTINet) project, in collaboration with OMOP. It reflects the changes made to OMOP CDMv2 to
create OMOP CDMv3, which were developed in collaboration with FNIH OMOP and the SCANNER (Scalable
National Network for Effectiveness Research) project (http://scanner.ucsd.edu/).
SAFTINet is supported by grant number R01HS019908 from the Agency for Healthcare Research and Quality.


TABLE OF CONTENTS
1.0 Introduction
2.0 Definition of Terms
3.0 Assumptions
4.0 Source Data Mapping Approach
    4.1 Changes to Existing Tables
    4.2 Table Name: ORGANIZATION
    4.3 Table Name: CARE_SITE
    4.4 Table Name: PROVIDER
    4.5 Table Name: X_DEMOGRAPHIC
    4.6 Table Name: VISIT_OCCURRENCE
    4.7 Table Name: DRUG_OCCURRENCE
    4.8 Table Name: CONDITION_OCCURRENCE
    4.9 Table Name: PROCEDURE_OCCURRENCE
    4.10 Table Name: OBSERVATION
5.0 Appendix A: Table Specific Rules
6.0 Appendix B: Row Filters
7.0 Appendix C: Sending data using flatfiles


Document Control

Authors and Contributors
Name | Organization | Title
Patrick Hosokawa | COHO | Statistician/Analyst
Michael Kahn | |
Elias Brandt | |
Lisa Schilling | |

Reviewers
Name | Role | Title | Date Reviewed
Christian Reich | OMOP | Project Manager |
Patrick Ryan | OMOP | Co-investigator |

Document References
Document Title | Type of Reference | Document Location
OMOP CDM V3 Specification | Business Rules | OMOP Download Center
OMOP CDM Core and Dictionary Tables Release Notes | Detailed Technical Information | OMOP Download Center
OMOP OSIM Specification | Detailed Technical Information | OMOP Download Center

Change Record
Date | Author | Version | Change Reference
02-Nov-2009 | | 1.0 | Original OMOP ETL Template Document
04-Oct-2011 | Vicki Fan, Mark Khayter, Patrick Hosokawa | 2.0 | Document adapted to SAFTINet ETL data model, flowcharts added to detail data flow from ETL model to grid model
20-Dec-2011 | Patrick Hosokawa | 2.1 | Document updated to 12/20/11 ETL data model
17-Mar-2012 | Patrick Hosokawa | 2.2 | Document updated to 3/17/12 ETL data model
06-Aug-2012 | Patrick Hosokawa | 4.0 | Change section removed, Appendix B updated, final move to CDMv4. Added data on labs provided to Appendix B.
03-Mar-2013 | Patrick Hosokawa | 4.1 | Additions to Appendix B, Added Appendix C for flatfile instructions

1.0 Introduction
This document reflects the requirements, assumptions, business rules and transformations for the
implementation of OMOP CDM V3, as recommended for SAFTINet.
The purpose of this document is two-fold:
1. Describe ETL mapping of data from SAFTINet partners into the Common Data Model (CDM).
2. Serve as a blueprint for equivalent ETL mapping processes for other data sources into the CDM.
In each section, the tables and their mapping are individually reviewed along with any source-specific
rules and exceptions.
The intended audiences for this document are the SAFTINet team and partner ETL technical personnel.
Sections of the document are targeted specifically towards each audience with appropriate focus and
level of detail.


2.0 Definition of Terms

Activity: A query or query response performed across the grid network as described in the following use cases.

Care Site (entity): The Care Site table refers to the lower level of the provider care hierarchy. Individual provider care locations will be stored in this table.

Cohort: A collection of subjects who meet specific demographic or clinical characteristics.

Common Data Model: The CDM intends to facilitate observational analyses of disparate healthcare databases. The CDM defines table structures for each of the data entities (e.g., Person, Visit Occurrence, Drug Exposure, Condition Occurrence, Observation, Procedure Occurrence, etc.). It includes all observational data elements that are relevant to identifying exposure to various treatments and defining condition occurrence. The CDM includes both the vocabulary of terms and the entity domain tables.

Concept: A concept is the basic unit of information. Concepts may be grouped into a given domain. A concept is a unique term that has a unique and static identifier/name, belongs to a Namespace, and may exist in relation to other concepts. The vertical relationships consist of "is a" statements that form a logical hierarchy. In general, concepts above a given concept are referred to as ancestors and those below as descendants.

Condition: A condition is a disease or other medical condition, such as a heart condition.

Condition Occurrence (entity): Condition Occurrences record individual instances of a Person's conditions (i.e., diagnoses) extracted from source data. Conditions are recorded in various data sources in different forms with varying levels of standardization, and are stored in the CONDITION_OCCURRENCE table.

Current Procedural Terminology (CPT), 4th edition: A terminology that is maintained by the American Medical Association (AMA). It is used by hospitals for Medicare hospital outpatient services and by physicians for outpatient services.

Data Mapping: The data element mappings between two distinct data models, terminologies, or concepts. Data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks.

Demographics: Demographics refer to selected population characteristics. Demographics may include data such as race, age, sex, date of birth, location, etc.

Domain: A data domain refers to all the unique values which a data element may contain. For example, a database table that has information about people, with one record per person, might have a "gender" column. This gender column might be declared as a string data type, and allowed to have one of two known code values: "M" for male, "F" for female, and NULL for records where gender is unknown or not applicable (or arguably "U" for unknown as a sentinel value). The data domain for the gender column is: "M", "F".
In database technology, domain refers to the description of an attribute's allowed values. The physical description is a set of values the attribute can have, and the semantic, or logical, description is the meaning of the attribute.

Drug: In pharmacology, a drug is defined as "a chemical substance used in the treatment, cure, prevention, or diagnosis of disease or used to otherwise enhance physical or mental well-being." Drugs may be prescribed for a limited duration, or on a regular basis for chronic disorders.

Drug Exposure (entity): The Drug Exposure entity contains individual records that suggest drug utilization by the person. Drug Exposure indicators store key information about each of a person's medications and the timing thereof, including the drug (captured as a standard Concept code in the CDM), quantity, beginning date of medication, number of days supply, period of exposure, and prescription refill data. Drug Exposures are stored in the DRUG_EXPOSURE table.

Encrypted Unique Identifiers: Output of a de-identification process used to hash the identity of subjects, providing them with a unique but de-identified identifier.

Electronic Health Record (EHR): Electronic health record refers to an individual person's medical record in digital format. It may be made up of electronic medical records from many locations and/or sources. The EHR is a longitudinal electronic record of person health information generated by one or more encounters in any care delivery setting. Included in this information are person demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and radiology reports. The EHR has the ability to generate a complete record of a clinical person encounter, as well as supporting other care-related activities directly or indirectly via interface, including evidence-based decision support, quality management, and outcomes reporting.

Electronic Medical Record (EMR): An electronic medical record is a computerized legal medical record created in an organization that delivers care, such as a hospital or outpatient setting. Electronic medical records tend to be a part of a local stand-alone health information system that allows storage, retrieval and manipulation of records. This document will reference EHR moving forward even if certain data sources internally use the EMR definition.

Extract, Transform, Load (ETL): Process of getting data out of one data store (Extract), modifying it (Transform), and inserting it into a different data store (Load).

Generic Product Identifier (GPI): A proprietary unique identifier for a drug used by the commercial Medi-Span formulary database.

Grid-enabled network: A collection of grid nodes (virtual organizations) capable of responding to/with grid query/response services.

Grid Node: A grid-enabled database containing data owned by a specific health care entity or virtual organization.

Grid Portal: Contains the set of services that allows queries to be sent, gives access to authorized users, and administers query and response activities.

Healthcare Common Procedure Coding System (HCPCS): HCPCS Level I codes are managed by the AMA (licensing fees apply). The HCPCS Level II codes are managed by CMS (Centers for Medicare & Medicaid Services). The Level II codes include alphanumeric HCPCS procedure and modifier codes, their long and short descriptions, and applicable Medicare administrative, coverage, and pricing data. These codes are used for Medicare outpatient services.

International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM): The official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States.

Investigator: Any authorized clinician or researcher, or person designated to act on their behalf (e.g., research assistant, statistician), who has been authenticated for access to query and response functionality on the grid-enabled network.

Logical Observation Identifiers Names and Codes (LOINC): Universal code names and identifiers for medical terminology related to the Electronic Health Record, which assist in the electronic exchange and gathering of clinical results (such as laboratory tests, clinical observations, outcomes management and research).

Limited Data Set: As defined by HIPAA, limited data sets are data sets stripped of certain direct identifiers that are specified in the Privacy Rule. They are not de-identified information under the Privacy Rule. A limited data set is PHI that excludes the following direct identifiers of the individual or of relatives, employers, or household members of the individual: (1) names; (2) postal address information, other than town or city, state, and ZIP code; (3) telephone numbers; (4) fax numbers; (5) e-mail addresses; (6) social security numbers; (7) medical record numbers; (8) health plan beneficiary numbers; (9) account numbers; (10) certificate/license numbers; (11) vehicle identifiers and serial numbers, including license plate numbers; (12) device identifiers and serial numbers; (13) web URLs; (14) Internet Protocol (IP) address numbers; (15) biometric identifiers, including fingerprints and voiceprints; and (16) full-face photographic images and any comparable images. Importantly, unlike de-identified data, PHI in limited data sets may include the following: city, state and ZIP codes; all elements of dates (such as admission and discharge dates); and unique codes or identifiers not listed as direct identifiers. Recognizing that institutions, IRBs and investigators are frequently faced with applying both the Common Rule and the HIPAA Privacy Rule, OHRP does not consider a Limited Data Set (as defined under the HIPAA Privacy Rule) to constitute individually identifiable information under 45 CFR 46.102(f)(2).

Local Reference Value: The specific value used in the partner's data to refer to any given concept. This value will be mapped to a standardized concept value for translation by ROSITA.

National Drug Codes (NDC): Unique identifiers assigned to individual drugs. NDCs are used primarily as an inventory code and for prescriptions.

Observation (entity): The Observation table contains all general observations that are tracked as attributes, including source Observation code, matching standard Concept Code, date of the Observation, type of Observation, type of result, number/text/Concept code, and reference range for numeric results. Observation entities are recorded in the Observation table.

Observational Medical Outcomes Partnership (OMOP): A public-private partnership designed to protect human health by improving the monitoring of drugs for safety and effectiveness.

Organization (entity): The Organization table is the highest level of the partner care infrastructure hierarchy. Each organization may have multiple care sites. Providers will work at one or more care sites.

Person (entity): A Person entity is one of the basic dimensions of analysis. It presents the framework for active drug surveillance. The Person entity is Concept driven: its attribute values are stored as standard Concept codes rather than original (i.e., raw) source values, and it is stored in the logical X_Demographic table.

Primary Care Physician: A physician designated as responsible to provide specific care to a patient, including evaluation and treatment as well as referral to specialists.

Procedure Occurrence (entity): A Procedure Occurrence records individual instances of medical procedures extracted from source data. Procedures are recorded in various data sources in different forms with varying levels of standardization, such as CPT-4, ICD-9-CM, and HCPCS procedure codes. These are stored in the PROCEDURE_OCCURRENCE table.

Protected Health Information (PHI): Protected health information (PHI) under HIPAA includes any individually identifiable health information. Identifiable refers not only to data that is explicitly linked to a particular individual (that's identified information). It also includes health information with data items which reasonably could be expected to allow individual identification. De-identified information is that from which all potentially identifying information has been removed.

Provider (entity): The Provider table contains information on local care providers including type and specialty. Providers are assigned to an individual care site.

Query: A request for data based on the query specifications sent via a grid services portal to a specified grid network.

ROSITA: A software package designed to transition SAFTINet data from the partner XML download to a grid database compatible form. This package will translate local source codes into OMOP concepts and will remove PHI other than dates of birth, dates of service, and zip codes.

RxNorm: A standardized nomenclature for clinical drugs and drug delivery devices produced by the National Library of Medicine. In RxNorm, the name of a clinical drug combines its ingredients, strengths, and/or form.
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First DataBank, Micromedex, Medi-Span, Gold Standard Alchemy, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary.

Subject: A patient, client or person of interest in the use cases described whose clinical and demographic data are contained within the virtual organization(s).

Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT): SNOMED-CT is one of a suite of designated standards for use in U.S. Federal Government systems for the electronic exchange of clinical health information, and is also a required standard in interoperability specifications of the U.S. Healthcare Information Technology Standards Panel. SNOMED-CT is also being implemented internationally as a standard within other IHTSDO Member countries.

Terminology: Technical or special terms used in a business or special subject area.

Virtual organization (aka Partner): Any entity or group of entities (e.g., clinic, network of clinics, agency or agencies) whose data is represented by a single grid node and available through grid services for query/response activities.

Visit Occurrence (entity): The Visit Occurrence entity contains the information available in the source data about person visits to healthcare providers, including inpatient, outpatient, and ER visits. Visits are recorded in various data sources in different forms with varying levels of standardization. The detail level of the classification and description of the visit differs by data source. Visit Occurrence entities are recorded in the VISIT_OCCURRENCE table.

Vocabulary: A computerized list (as of items of data or words) used for reference (as for information retrieval or word processing).


3.0 Assumptions
The design follows the agreed-upon general project assumptions:

Electronic Medical Data: EMR is a subset of EHR. This document will reference EHR moving forward
even if a specific data source internally uses the Electronic Medical Record (EMR) definition.

Financial Information: The CDM makes use of financial information such as Fees, Payments,
Deductibles, Copayments, etc. from payer source data, such as Medicaid.

Plan Detail Information: The model potentially makes use of fields related to Plan or Coverage details,
such as Benefit Plan, Plan Indicator, etc., drawn from the administrative information in the claims data. The
model makes use of medical coverage period and eligibility for prescription drugs.

Cleansing and Validation: The selected data fields will be handled (whether loaded directly or as part of a
transformation) with a validation plan which is to be determined later.

Data Privacy: ETL from EHR/CDW will contain clear text direct patient identifiers and dates. ROSITA will
encrypt all clear text direct patient identifiers. A random identifier (called a GUID) that is unrelated to
any patient identifier will be associated with each patient record. Birth dates and dates of service will
remain unchanged. Zip codes will also be forwarded to the grid node unchanged, along with a second
variable containing only the 3-digit zip (the leftmost 3 digits). The resulting data exported to the grid node
will therefore be a limited data set containing encrypted direct identifiers with unchanged dates and both
5-digit and 3-digit zip codes. The grid node will have no access to any clear text direct patient identifiers
from the EHR/CDW.
Under the assumption that payer data will be provided with clear text direct identifiers, ROSITA will
perform record linkage to link the clinical record with the financial record using clear text identifiers. If a
match is made, the same GUID assigned to the clinical data will be assigned to the financial data.
Otherwise, a new GUID will be generated that is unrelated to any patient identifier. Dates will remain
unchanged. The resulting data exported to the grid node will be consistent with a Limited Data Set
containing encrypted direct identifiers, unchanged dates, a 5- and 3-digit zip code, and a GUID random
identifier. The grid node will have no access to any clear text direct patient identifiers from payer (e.g.
Medicaid) data.
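The zip-code and GUID handling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not ROSITA's implementation: the linkage key (normalized last name, first name, and birth date) and the exact-match rule are hypothetical, because this document does not say which clear-text identifiers ROSITA compares or how. Encryption of the direct identifiers is not shown.

```python
import uuid
from typing import Dict, Tuple

# Hypothetical linkage key: the specification does not say which clear-text
# identifiers ROSITA compares, so (last name, first name, birth date) is assumed.
LinkKey = Tuple[str, str, str]


def zip3(zip_code: str) -> str:
    """Return the leftmost 3 digits of a 5-digit or ZIP+4 zip code."""
    digits = zip_code.replace("-", "").strip()
    return digits[:3]


def link_key(last: str, first: str, birth_date: str) -> LinkKey:
    """Normalize case and surrounding whitespace before exact matching."""
    return (last.strip().upper(), first.strip().upper(), birth_date.strip())


class GuidRegistry:
    """Hands out random GUIDs that carry no information about the patient."""

    def __init__(self) -> None:
        self._guid_by_key: Dict[LinkKey, str] = {}

    def guid_for(self, key: LinkKey) -> str:
        # A key that was already seen (e.g. a payer record matching a clinical
        # record) reuses that GUID; otherwise a new random UUID4 is issued.
        if key not in self._guid_by_key:
            self._guid_by_key[key] = str(uuid.uuid4())
        return self._guid_by_key[key]
```

In this sketch, clinical records would be registered first so that a matching Medicaid record receives the same GUID, while an unmatched payer record receives a new, unrelated one. Exact equality is the simplest possible match rule; record linkage in practice often uses more tolerant comparisons.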

Concept Identifiers: Data are represented through standard concept identifiers using a standardized
terminology. During ETL, source data representations (raw data codes) will be translated to standard
concept identifiers through a mapping process. If no standard concept identifier is available, the concept
identifier field will contain 0 as a value.
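The translation rule above, including the 0 fallback, amounts to a lookup with a default. The sketch below assumes an in-memory dictionary keyed by (vocabulary, local reference value); the name `to_concept_id` and the key layout are hypothetical, and real mappings would come from the OMOP vocabulary and the Concept ID Table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup: (vocabulary, local reference value) -> standard concept_id.
ConceptMap = Dict[Tuple[str, str], int]

# Value the specification assigns when no standard concept identifier exists.
UNMAPPED_CONCEPT_ID = 0


def to_concept_id(concept_map: ConceptMap, vocabulary: str,
                  source_value: Optional[str]) -> int:
    """Translate a raw source value to a standard concept_id, or 0 if unmapped."""
    if source_value is None or not source_value.strip():
        return UNMAPPED_CONCEPT_ID
    return concept_map.get((vocabulary, source_value.strip()), UNMAPPED_CONCEPT_ID)
```

For example, with the pair `{("place_of_service", "Academic Practice"): 3389}` (values taken from the Organization example in section 4.2.1), the value "Academic Practice" yields 3389 and any unlisted or empty value yields 0.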


4.0 Source Data Mapping Approach


This section covers the high-level assumptions and approach to extraction, transformation and loading
(ETL) of raw source data into the Common Data Model (CDM). The assumptions and approach are
defined with a special focus on claims and EHR data. The section covers each of the major tables in the
CDM separately, elaborating the distinct handling required for each.
Unless a field is marked Required in the field listing, missing attributes will not disqualify data from
being loaded into the Common Data Model. Missing Concept Identifier attributes will be populated with
the value zero (0) in the CDM, while all other missing attributes will be populated with NULL.
The Source Field and Applied Rule columns are left blank for the partners to fill in. The Source Field should
be filled in with the equivalent field in the partner's source data. The Applied Rule field should contain
any specialized rules (e.g., filtering, translation, combination of categories, etc.) that the partner
implements when filling in the field.
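The missing-attribute rule can be sketched as a per-record normalization step. This assumes a record arrives as a dictionary of strings and that Concept Identifier fields can be recognized by a `_concept_id` suffix, a naming assumption that matches fields such as place_of_service_concept_id but is not stated as a rule in this document. Raising an exception for a missing Required field is also an illustrative choice; `normalize_record` and `MissingRequiredField` are hypothetical names.

```python
from typing import Dict, Iterable, Optional


class MissingRequiredField(ValueError):
    """Raised when a field marked Required in the field listing is empty."""


def normalize_record(
    record: Dict[str, Optional[str]],
    fields: Iterable[str],
    required: Iterable[str],
) -> Dict[str, object]:
    """Apply the missing-attribute defaults to one source record.

    - A missing Required field disqualifies the record (raises).
    - A missing concept identifier (assumed: name ends in "_concept_id") becomes 0.
    - Any other missing field becomes None, which loads as NULL.
    """
    required_set = set(required)
    out: Dict[str, object] = {}
    for name in fields:
        value = record.get(name)
        if value is None or not value.strip():
            if name in required_set:
                raise MissingRequiredField(name)
            out[name] = 0 if name.endswith("_concept_id") else None
        else:
            out[name] = value
    return out
```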
In the flowcharts, the colors red, yellow, and green are used in the following manner.
Left Side (ETL View): represents desired source data
    Red: Field is not brought forward into the grid.
    Green: Field is brought forward into the grid.
Right Side (Grid View): represents desired grid-facing data
    Red: Field is generated by the ROSITA application. It is not derived from any ETL data field.
    Yellow: Field is generated from ETL data, but does not exist as a field in the ETL data.
    Green: Field is brought forward from ETL data unchanged.
The arrows indicate that the field on the right (in yellow) is generated from the field on the left (green for
those fields brought forward, red otherwise).
The grid-facing data model (right side of the flowcharts) closely matches the OMOP v3 data model.
However, the SAFTINet grid model has a few extra fields needed specifically for SAFTINet. All fields
present in SAFTINet but not in the OMOP model use the prefix X_ (e.g., X_Organization_Source).


4.1 Changes to Existing Tables

Table | Changed Field | Change
Visit_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_identifier; the new x_ prefix allows the field to pass through to the grid
Drug_Exposure | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_identifier; the new x_ prefix allows the field to pass through to the grid
Condition_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value; the new x_ prefix allows the field to pass through to the grid
Procedure_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value; the new x_ prefix allows the field to pass through to the grid
Observation | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value; the new x_ prefix allows the field to pass through to the grid


4.2 Table Name: ORGANIZATION

The Organization table is the highest level of the partner care infrastructure hierarchy. Each organization may have multiple care sites.
Providers can work at one or more care sites. Address information submitted with the organization will be used to create a new location
record which will be linked to the organization record via the Location_ID field.
The field mapping is performed as follows:

Destination Field | Data Type / Required | Source Field | Applied Rule | Comment
organization_source_value | String (50) / Required | | | Local reference value for organization, used to create the organization_id field on the grid-facing record. This value will also be used in other records to refer to the organization.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
place_of_service_source_value | String (50) / Required | | | The type of organization. If the organization type is not defined in the source data, refer to the place_of_service_type section of the Concept ID Table. Used to create place_of_service_concept_id.
organization_address_1 | String (50) | | | First line of the address
organization_address_2 | String (50) | | | Second line of the address
organization_city | String (50) | | | City portion of the address
organization_state | String (2) | | | State portion of the address
organization_zip | String (9) | | | Zip code of the address
organization_county | String (20) | | | County portion of the address

4.2.1 Example of ORGANIZATION source / destination data

ETL View
Organization Table - XML
organization_source_value (note 1): UC Internal Medicine
x_data_source_type: EHR
place_of_service_source_value: Academic Practice
organization_address_1: 13199 E Montview Blvd
organization_address_2: Suite 300, Mail Stop F443
organization_city: Aurora
organization_state: CO
organization_zip: 80045
organization_county: Arapahoe
Legend: Green = Brought forward into grid model / Red = Removed in processing

Grid View
Organization Table - Grid
organization_id: 22770494
organization_source_value: UC Internal Medicine
x_data_source_type: EHR
place_of_service_concept_id: 3389
place_of_service_source_value: Academic Practice
location_id: 39458
x_grid_node_id: 1

Location Table - Grid
location_id: 39458
location_source_value: UC Internal Medicine
x_data_source_type: EHR
address_1: 13199 E Montview Blvd
address_2: Suite 300, Mail Stop F443
city: Aurora
state: CO
zip: 80045
x_zip_deidentified (note 2): 800
county: Arapahoe
x_location_type (note 3): Organization
x_grid_node_id: 1
Legend: Green = Brought forward from ETL / Yellow = Generated from ETL field / Red = Generated locally or from multiple ETL fields

Notes:
1. The organization_source_value field will be compared to the current set of locations. If the value does not already occur in the table (new location), a row will be added to the table and a new ID (location_id) will be generated. Either the newly generated value or the pre-existing value (if the record is found) of the Location table primary key will be placed into location_id.
2. x_zip_deidentified will be generated from organization_zip. This field was created specifically for person locations to support the creation of Safe Harbor Limited Data Sets.
3. x_location_type will be derived from the XML record type (Organization in this case).
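Notes 1 to 3 describe a lookup-or-create step for the Location table. The sketch below assumes an in-memory table keyed on the source value and sequential integer IDs; the document does not specify how location_id values are generated, so `LocationTable` and its ID scheme are hypothetical. Starting at 39458 merely mirrors the example above.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict


@dataclass
class Location:
    location_id: int
    location_source_value: str
    zip: str
    x_zip_deidentified: str
    x_location_type: str


class LocationTable:
    """Hypothetical in-memory stand-in for the grid Location table."""

    def __init__(self, first_id: int = 1) -> None:
        self.rows: Dict[str, Location] = {}
        self._next_id = count(first_id)  # assumed sequential ID generator

    def get_or_create(self, source_value: str, zip_code: str, record_type: str) -> int:
        """Return the existing location_id for source_value, or add a new row."""
        existing = self.rows.get(source_value)
        if existing is not None:
            return existing.location_id  # pre-existing primary key (note 1)
        row = Location(
            location_id=next(self._next_id),
            location_source_value=source_value,
            zip=zip_code,
            x_zip_deidentified=zip_code[:3],  # leftmost 3 digits (note 2)
            x_location_type=record_type,      # XML record type (note 3)
        )
        self.rows[source_value] = row
        return row.location_id
```

Keying the lookup on the source value alone follows the wording of note 1; a real implementation might also need to distinguish records from different data sources.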


4.3 Table Name: CARE_SITE

The Care Site table refers to the lower level of the provider care hierarchy. Individual provider care locations will be stored in this table.
The field mapping is performed as follows:

Destination Field | Data Type / Required | Source Field | Applied Rule | Comment
care_site_source_value | String (50) / Required | | | Local reference value for care site, used to create the care_site_id field on the grid-facing record. This value will also be used in other records to refer to the care site.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
organization_source_value | String (50) / Required | | | Local reference value for organization. This value will be matched against the organization table to obtain the corresponding organization_id.
place_of_service_source_value | String (50) | | | The type of care site. If the care site type is not defined in the source data, refer to the place_of_service_type section of the Concept ID Table. Used to create place_of_service_concept_id.
x_care_site_name | String (50) | | | Name of the clinic (care site)
care_site_address_1 | String (50) | | | First line of the address
care_site_address_2 | String (50) | | | Second line of the address
care_site_city | String (50) | | | City portion of the address
care_site_state | String (2) | | | State portion of the address
care_site_zip | String (9) | | | Zip code of the address
care_site_county | String (20) | | | County portion of the address

4.3.1 Example of CARE SITE source / destination data

ETL View
Care Site Table - XML
care_site_source_value (note 1): UC Internal Medicine
x_data_source_type: EHR
organization_source_value: University of Colorado
place_of_service_source_value: Internal Medicine
x_care_site_name: Eastside Clinic
care_site_address_1: 13199 E Montview Blvd
care_site_address_2: Suite 300, Mail Stop F443
care_site_city: Aurora
care_site_state: CO
care_site_zip: 80045
care_site_county: Arapahoe
Legend: Green = Brought forward into grid model / Red = Removed in processing

Grid View
Care Site Table - Grid
care_site_id: 22770494
care_site_source_value: UC Internal Medicine
x_data_source_type: EHR
location_id: 49382
organization_id: 382392
place_of_service_concept_id: 39458
place_of_service_source_value: Internal Medicine
x_care_site_name: Eastside Clinic
x_grid_node_id: 1

Location Table - Grid
location_id: 49382
location_source_value: UPI Building
x_data_source_type: EHR
address_1: 13199 E Montview Blvd
address_2: Suite 300, Mail Stop F443
city: Aurora
state: CO
zip: 80045
x_zip_deidentified (note 2): 800
county: Arapahoe
x_location_type (note 3): Care Site
x_grid_node_id: 1
Legend: Green = Brought forward from ETL / Yellow = Generated from ETL field / Red = Generated locally or from multiple ETL fields

Notes:
1. The care_site_source_value field will be compared to the current set of locations. If the value does not already occur in the table (new location), a row will be added to the table and a new ID (location_id) will be generated. Either the newly generated value or the pre-existing value (if the record is found) of the Location table primary key will be placed into location_id.
2. x_zip_deidentified will be generated from care_site_zip. This field was created specifically for person locations to support the creation of Safe Harbor Limited Data Sets.
3. x_location_type will be derived from the XML record type (Care Site in this case).
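The care site transformation can be sketched as a function that resolves organization_source_value against an index of existing Organization rows, maps place_of_service_source_value to a concept ID (0 if unmapped), and keeps only the grid-facing fields; the address fields are assumed to have been routed to the Location table separately. The index dictionaries, the function names, and the choice to raise an error on an unmatched organization are assumptions, not rules stated in this document. The sample values in the test come from the example above.

```python
from typing import Dict, Mapping


class UnresolvedReference(LookupError):
    """Raised when a local reference value has no matching grid row (assumed behavior)."""


def resolve_id(index: Mapping[str, int], source_value: str, table: str) -> int:
    """Return the grid ID whose local reference value equals source_value."""
    if source_value not in index:
        raise UnresolvedReference(f"{table}: no row with source value {source_value!r}")
    return index[source_value]


def care_site_to_grid(
    etl_row: Mapping[str, str],
    care_site_id: int,
    location_id: int,
    organization_index: Mapping[str, int],
    place_of_service_map: Mapping[str, int],
    grid_node_id: int,
) -> Dict[str, object]:
    """Build a grid-facing CARE_SITE row from an ETL-view record."""
    pos_value = etl_row.get("place_of_service_source_value")
    return {
        "care_site_id": care_site_id,
        "care_site_source_value": etl_row["care_site_source_value"],
        "x_data_source_type": etl_row["x_data_source_type"],
        "location_id": location_id,
        "organization_id": resolve_id(
            organization_index, etl_row["organization_source_value"], "ORGANIZATION"
        ),
        # Unmapped or missing place-of-service values fall back to concept 0.
        "place_of_service_concept_id": place_of_service_map.get(pos_value or "", 0),
        "place_of_service_source_value": pos_value,
        "x_care_site_name": etl_row.get("x_care_site_name"),
        "x_grid_node_id": grid_node_id,
    }
```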


4.4 Table Name: PROVIDER


The Provider table contains information on local care providers including type and specialty. Providers are assigned to an individual care site.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
provider_source_value | String (50) / Required | | | Local reference value for provider, used to create the provider_id field on the grid facing record. This value will also be used in other records to refer to the provider.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
npi | String (50) | | | Provider NPI
dea | String (50) | | | Provider DEA Number
specialty_source_value | String (50) | | | Provider type as recorded at the source (e.g. Physician, NP, MA, etc.). If the provider type is not defined in the source data, refer to the Health Care Provider Specialty section of the Concept ID Table. Used to create specialty_concept_id.
x_provider_first | String (75) | | | Provider First Name
x_provider_middle | String (75) | | | Provider Middle Name (or initial)
x_provider_last | String (75) | | | Provider Last Name
care_site_source_value | String (50) | | | Local reference value for Care Site. This value will be matched against the Care Site table to obtain the corresponding care_site_id.
x_organization_source_value | String (50) / Required | | | Local reference value for Organization. This value will be matched against the Organization table to obtain the corresponding organization_id.


4.4.1 Example of PROVIDER source / destination data

ETL View
Provider Table - XML

provider_source_value: 349302
x_data_source_type: EHR
npi: 34930302
dea: 49492
specialty_source_value: General Practitioner
x_provider_first: Marcus
x_provider_middle: W
x_provider_last: Welby
care_site_source_value: UC Internal Medicine
x_organization_source_value: University of Colorado

Green: brought forward into grid model / Red: removed in processing / Blue: item under discussion

Grid View
Provider Table - Grid

provider_id: 2399450
provider_source_value: 349302
x_data_source_type: EHR
npi: 34930302
dea: 49492
specialty_source_value: General Practitioner
specialty_concept_id: 20302
x_provider_first: Marcus
x_provider_middle: W
x_provider_last: Welby
care_site_id: 22770494
x_organization_id: 3939
x_grid_node_id: 1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields / Blue: item under discussion

4.5 Table Name: X_DEMOGRAPHIC


The X_Demographic table stores information about individual patients. The PHI elements of this record will be stripped out in the transformation to the grid model, and address information will be limited and used to create a new location record.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
person_source_value | String (50) / Required | | | Person unique identifier at the source (MRN). Used to create the person_id field on the grid facing record. This value will also be used in other records to refer to the person.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
medicaid_id_number | String (50) | | | Medicaid ID Number
ssn | String (50) | | | Social Security Number
last | String (75) | | | Last Name
middle | String (75) | | | Middle Name or Initial
first | String (75) | | | First Name
address_1 | String (50) | | | The first line of the person's actual address.
address_2 | String (50) | | | The second line of the person's actual address.
city | String (50) | | | The city portion of the person's actual address.
state | String (2) | | | The state portion of the person's actual address.
zip | String (9) | | | Zip code of the person's actual address.
county | String (20) | | | The county portion of the person's address as recorded at source.
year_of_birth | Number (4) / Required | | | Year of birth
month_of_birth | Number (2) | | | Month of birth
day_of_birth | Number (2) | | | Day of birth
gender_source_value | String (50) | | | Local reference value for gender of the person. Used to create gender_concept_id.
race_source_value | String (50) | | | Local reference value for race of the person. Used to create race_concept_id.
ethnicity_source_value | String (50) | | | Local reference value for ethnicity of the person. Used to create ethnicity_concept_id.
provider_source_value | String (50) | | | Local reference value for the patient's primary provider (if any). This value will be matched against the Provider table to obtain the corresponding provider_id.
care_site_source_value | String (50) | | | Local reference value for the patient's primary Care Site (if any). This value will be matched against the Care Site table to obtain the corresponding care_site_id.
x_organization_source_value | String (50) / Required | | | Local reference value for the patient's organization. This value will be matched against the Organization table to obtain the corresponding organization_id.

4.5.1 Example of X_Demographic source / destination data

ETL View
X_Demographic Table - XML

person_source_value: 29201082
x_data_source_type: EHR
medicaid_id_number: 3903432
ssn: 999-99-9999
last: Doe
middle: D
first: John
address_1: 123 Fake St
address_2: Apt 566
city: Aurora
state: CO
zip: 80045
county: Arapahoe
year_of_birth: 1965
month_of_birth: 2
day_of_birth: 9
gender_source_value: Male
race_source_value: White
ethnicity_source_value: Non-Hispanic
provider_source_value: 35346346
care_site_source_value: UC Internal Medicine
x_organization_source_value: University of Colorado

Green: brought forward into grid model / Red: removed in processing

Grid View
Person Table - Grid

person_id: 22770494
person_source_value (see note 2): (blank)
location_id (see note 1): 49382
year_of_birth: 1965
month_of_birth: 2
day_of_birth: 9
gender_concept_id: 675
gender_source_value: Male
race_concept_id: 344
race_source_value: White
ethnicity_concept_id: 202
ethnicity_source_value: Non-Hispanic
provider_id (see note 3): 34235556
care_site_id: 22770494
x_organization_id: 382392
x_grid_node_id: 1

Location Table - Grid

location_id (see note 1): 39458
location_source_value: (blank)
x_data_source_type: EHR
address_1 (see note 4): (blank)
address_2 (see note 4): (blank)
city: Aurora
state: CO
zip: 80045
x_zip_deidentified (see note 5): 800
county: Arapahoe
x_location_type (see note 6): 34344
x_grid_node_id: 1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields

1. The location ID value is not linked to a location_source_value in this case. When the address information is transferred to the location table, the resulting ID value will be placed in the person record for reference.
2. The grid version of the person table contains a blank field for person_source_value to comply with the OMOP standard. The value of person_source_value on the ETL side will not be carried forward due to privacy concerns.
3. The grid facing provider_id will be derived from the ETL field provider_source_value.
4. When creating the location table, the local values for the person's address will not be passed through to the grid. They are labeled green because in other instances, such as Organization and Care Site, they do move forward to the grid facing database.
5. x_zip_deidentified will be generated from zip. This field was created specifically for person locations to support the creation of Safe Harbor Limited Data Sets.
6. x_location_type will be derived from the XML record type (Person in this case).
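The privacy handling in notes 1 to 6 can be expressed as an explicit allow-list: only named fields are copied into the grid Person and Location rows, so direct identifiers cannot leak through by default. This is a sketch under assumptions: split_demographic is a hypothetical helper, person_id and location_id are assumed to be generated elsewhere, and x_location_type is stored here as the literal "Person" rather than a concept ID.

```python
PERSON_PASS_THROUGH = (
    "year_of_birth", "month_of_birth", "day_of_birth",
    "gender_source_value", "race_source_value", "ethnicity_source_value",
)
LOCATION_PASS_THROUGH = ("x_data_source_type", "city", "state", "zip", "county")


def split_demographic(etl_row: dict, person_id: int, location_id: int):
    """Split an X_Demographic row into grid Person and Location rows (allow-list only)."""
    person = {name: etl_row.get(name) for name in PERSON_PASS_THROUGH}
    person["person_id"] = person_id
    person["person_source_value"] = None  # column kept for OMOP layout, never populated
    person["location_id"] = location_id

    location = {name: etl_row.get(name) for name in LOCATION_PASS_THROUGH}
    location["location_id"] = location_id
    location["location_source_value"] = None  # not linked to a source value (note 1)
    location["x_zip_deidentified"] = (etl_row.get("zip") or "")[:3]
    location["x_location_type"] = "Person"
    # medicaid_id_number, ssn, name fields and address lines are never copied.
    return person, location
```

An allow-list was chosen over deleting known PHI fields because a new source column added later stays out of the grid unless someone adds it on purpose.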


4.6 Table Name: VISIT_OCCURRENCE


The Visit Occurrence table contains a record for each patient-provider encounter. The provider, patient and location are all stored as well as
the type of visit.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
x_visit_occurrence_source_identifier | String (50) / Required | | | Local reference value for visit, used to create the visit_occurrence_id field on the grid facing record.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
person_source_value | String (50) / Required | | | Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id.
visit_start_date | DATE / Required | | | The date on which the visit started
visit_end_date | DATE / Required | | | The date on which the visit ended
place_of_service_source_value | String (50) | | | Visit type (office visit, med refill, face-to-face, telephone, etc.). If the visit type is not defined in the source data, refer to the Visit_Type section of the Concept ID Table. Used to create place_of_service_concept_id.
x_provider_source_value | String (50) | | | Local reference value for the provider conducting the visit. This value will be matched against the Provider table to obtain the corresponding provider_id.
care_site_source_value | String (50) | | | Local reference value for the Care Site of the visit. This value will be matched against the Care Site table to obtain the corresponding care_site_id.
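Both visit dates are Required, and the examples in this document write dates as M/D/YYYY (e.g. 5/23/2011). The sketch below parses that form and applies a basic ordering check; the M/D/YYYY input format, the helper names, and the rule that a visit cannot end before it starts are assumptions rather than stated requirements.

```python
from datetime import date, datetime


def parse_etl_date(text: str) -> date:
    """Parse an M/D/YYYY date such as 5/23/2011 (format assumed from the examples)."""
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


def visit_dates(start_text: str, end_text: str) -> tuple:
    """Parse the two required visit dates and reject an end date before the start date."""
    if not start_text or not end_text:
        raise ValueError("visit_start_date and visit_end_date are both required")
    start, end = parse_etl_date(start_text), parse_etl_date(end_text)
    # Assumed sanity check: a visit cannot end before it starts.
    if end < start:
        raise ValueError(f"visit ends ({end}) before it starts ({start})")
    return start, end
```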


4.6.1 Example of VISIT OCCURRENCE source / destination data

ETL View
Visit Occurrence Table - XML

x_visit_occurrence_source_identifier: 349302
x_data_source_type: EHR
person_source_value: 2302202
visit_start_date: 5/23/2011
visit_end_date: 5/25/2011
place_of_service_source_value: Physical
x_provider_source_value: 20302340
care_site_source_value: UC Internal Medicine

Green: brought forward into grid model / Red: removed in processing

Grid View
Visit Occurrence Table - Grid

visit_occurrence_id: 3203402
x_data_source_type: EHR
person_id: 30205202
visit_start_date: 5/23/2011
visit_end_date: 5/25/2011
place_of_service_concept_id: 302023003
place_of_service_source_value: Physical
x_provider_id: 04594020
care_site_id: 202033
x_grid_node_id: 1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields

4.7 Table Name: DRUG_EXPOSURE


The Drug Exposure table contains a record for each prescribed medication. The prescriber, patient, and prescription information are all stored, as well as the associated visit and condition.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
drug_exposure_source_identifier | String (50) / Required | | | Unique Transaction Identifier (could be an Rx Order ID), used to create the drug_exposure_id field on the grid facing record.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
person_source_value | String (50) / Required | | | Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id.
drug_source_value | String (50) | | | Local reference value for drug identifier. The types of identifiers allowed include National Drug Codes (NDCs) and Generic Product Identifier (GPI) codes. Used to create the drug_concept_id field on the grid facing record.
drug_source_value_vocabulary | String (50) / Required | | | Vocabulary from which the source values are derived (used for 2-field match to concept ID)
drug_exposure_start_date | Date / Required | | | The Start Date for the current instance of drug utilization. Valid indicators include the start date of a prescription, the date a prescription was filled, or the date on which a drug administration procedure was recorded.
drug_exposure_end_date | Date | | | The End Date for the current instance of drug utilization. It is not available from all sources.
drug_type_source_value | String (50) / Required | | | Type of drug exposure (prescription, med history, fulfillment) as recorded in source data. If the drug type is not defined in the source data, refer to the Drug Exposure Type section of the Concept ID Table. Used to create drug_type_concept_id.
stop_reason | String (20) | | | The reason the medication was stopped, where available. Reasons include Regimen completed, Changed, Removed, etc.
refills | Number (4) | | | The number of refills for the prescription
quantity | Number (8,2) | | | The quantity of drug recorded in the corresponding Drug Exposure Instance
days_supply | Number (4) | | | The number of days' supply of the medication recorded in the corresponding Drug Exposure Instance
x_drug_name | String (255) / Required | | | Drug name taken verbatim from source field
x_drug_strength | String (50) | | | Strength, taken verbatim (e.g. 20, 1000, 2-4, 1)
sig | String (500) | | | Sig (if available)
provider_source_value | String (50) / Required | | | Local reference value for the prescribing/administering provider (if any). This value will be matched against the Provider table to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier | String (50) | | | Local reference value for the visit where the drug was prescribed/administered. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id.
relevant_condition_source_value | String (50) | | | Associated Diagnosis Source Code. This is the code for the condition for which the drug was given. This value is independent and will not be matched against the Condition Occurrence table.
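The "2-field match to concept ID" pairs drug_source_value with drug_source_value_vocabulary, since the same code string could exist in more than one vocabulary. The sketch below assumes the Concept ID Table has been loaded into a dictionary keyed by (source value, upper-cased vocabulary); match_concept and add_drug_concepts are hypothetical names, and reporting unmatched rows rather than failing is a design assumption.

```python
def match_concept(concept_map: dict, source_value, vocabulary):
    """Two-field match of (source value, vocabulary) to a concept ID; None if unmatched."""
    value = (source_value or "").strip()
    vocab = (vocabulary or "").strip().upper()
    if not value or not vocab:
        return None
    return concept_map.get((value, vocab))


def add_drug_concepts(rows: list, concept_map: dict) -> list:
    """Set drug_concept_id on each ETL row; return identifiers of unmatched rows."""
    unmatched = []
    for row in rows:
        concept_id = match_concept(
            concept_map,
            row.get("drug_source_value"),
            row.get("drug_source_value_vocabulary"),
        )
        row["drug_concept_id"] = concept_id
        if concept_id is None:
            unmatched.append(row.get("drug_exposure_source_identifier"))
    return unmatched
```

The same pattern would apply to the other tables that carry a source value together with a vocabulary field (conditions, procedures, observations).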


4.7.1 Example of DRUG EXPOSURE source / destination data

ETL View
Drug Exposure Table - ETL
drug_exposure_source_identifier
x_data_source_type
person_source_value
drug_source_value
drug_source_value_vocabulary
drug_exposure_start_date
drug_exposure_end_date
drug_type_source_value
stop_reason
refills
quantity
days_supply
x_drug_name
x_drug_strength
sig
provider_source_value
x_visit_occurrence_source_identifier
relevant_condition_source_value

30003400
EHR
2302202
4594930302
NDC
4/19/2011
5/19/2011
Prescription
Regimen Completed
1
60
30
Amoxicillin
500
239292
3499202
393821

Green: brought forward into grid model / Red: removed in processing

Grid View
Drug Exposure Table - Grid
drug_exposure_id
x_data_source_type
person_id
drug_concept_id
drug_source_value
drug_exposure_start_date
drug_exposure_end_date
drug_type_concept_id
stop_reason
refills
quantity
days_supply
x_drug_name
x_drug_strength
sig
prescribing_provider_id
visit_occurrence_id
relevant_condition_concept_id
x_grid_node_id

9947839
EHR
30205202
499506
4594930302
4/19/2011
5/19/2011
983921
Regimen Completed
1
60
30
Amoxicillin
500
3935050
040200
059439333
1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields

4.8 Table Name: CONDITION_OCCURRENCE


The Condition Occurrence table contains a record for each patient condition. The codes associated with the conditions as well as the
associated person, provider, and visits/encounters are also recorded.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
condition_occurrence_source_identifier | String (50) / Required | | | Source Condition Primary Key; could be a unique record identifier. Used to create the condition_occurrence_id field on the grid facing record.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
person_source_value | String (50) / Required | | | Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id.
condition_source_value | String (50) / Required | | | Local diagnosis code (e.g. ICD-9, SNOMED, etc.). Used to create condition_concept_id.
condition_source_value_vocabulary | String (50) / Required | | | Type of code (e.g. ICD-9) used for the condition.
x_condition_source_desc | String (50) | | | Source Diagnosis Text Description
condition_start_date | Date / Required | | | Onset Date
x_condition_update_date | Date | | | Date the condition was updated/reviewed
condition_end_date | Date | | | Resolved Date. Leave blank for unresolved conditions.
condition_type_source_value | String (50) / Required | | | Type of condition as recorded in source data (e.g. chief complaint, problem list, etc.). If the condition type is not defined in the source data, refer to the Condition_Occurrence section of the Concept ID Table. Used to create condition_type_concept_id.
stop_reason | String (20) | | | The reason, if available, that the condition was no longer recorded, as indicated in the source data. Valid values include discharged, resolved, etc.
associated_provider_source_value | String (50) | | | Provider ID from the source (provider of record). This value will be matched against the Provider table to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier | String (50) | | | Local reference value for visit. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id.
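Because every mapping table pairs a field with a maximum length and a Required flag, a partner site could check rows before submission. The sketch below encodes the Condition Occurrence table as a dictionary; CONDITION_SPEC and validate_row are hypothetical names, date fields are only checked for presence (not format), and flagging field names that do not appear in the mapping table is an assumed extra check.

```python
# (max length or None, required) per field, copied from the Condition Occurrence mapping table.
CONDITION_SPEC = {
    "condition_occurrence_source_identifier": (50, True),
    "x_data_source_type": (20, True),
    "person_source_value": (50, True),
    "condition_source_value": (50, True),
    "condition_source_value_vocabulary": (50, True),
    "x_condition_source_desc": (50, False),
    "condition_start_date": (None, True),
    "x_condition_update_date": (None, False),
    "condition_end_date": (None, False),
    "condition_type_source_value": (50, True),
    "stop_reason": (20, False),
    "associated_provider_source_value": (50, False),
    "x_visit_occurrence_source_identifier": (50, False),
}


def validate_row(row: dict, spec: dict) -> list:
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    for name, (max_len, required) in spec.items():
        value = row.get(name)
        if value in (None, ""):
            if required:
                problems.append(f"{name}: required value missing")
            continue
        if max_len is not None and len(str(value)) > max_len:
            problems.append(f"{name}: longer than {max_len} characters")
    unknown = set(row) - set(spec)
    problems.extend(f"{name}: not in mapping table" for name in sorted(unknown))
    return problems
```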


4.8.1 Example of CONDITION OCCURRENCE source / destination data

ETL View
Condition Occurrence Table - ETL
condition_occurrence_source_identifier
x_data_source_type
person_source_value
condition_source_value
condition_source_value_vocabulary
x_condition_source_desc
condition_start_date
x_condition_update_date
condition_end_date
condition_type_source_value
stop_reason
associated_provider_source_value
x_visit_occurrence_source_identifier

30003400
EHR
393030
162.9
ICD9
Malignant Neop
4/19/2011
10/19/2011
Chief Complaint
392904
403030

Green: brought forward into grid model / Red: removed in processing

Grid View
Condition Occurrence Table - Grid


condition_occurrence_id
x_data_source_type
person_id
condition_concept_id
condition_source_value
x_condition_source_desc
condition_start_date
x_condition_update_date
condition_end_date
condition_type_concept_id
stop_reason
associated_provider_id
visit_occurrence_id
x_grid_node_id

8349393
EHR
94849303
884934
162.9
Malignant Neop
4/19/2011
10/19/2011
499404
39304
90493023
1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields

4.9 Table Name: PROCEDURE_OCCURRENCE


The Procedure Occurrence table contains a record for each procedure. The type of procedure as well as the associated person and visit are
recorded.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
procedure_occurrence_source_identifier | String (50) / Required | | | Source Procedure Primary Key. Used to create the procedure_occurrence_id field on the grid facing record.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
person_source_value | String (50) / Required | | | Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id.
procedure_source_value | String (50) / Required | | | The Procedure Code as captured from the source data. Values include CPT-4, ICD-9-CM (Procedure), HCPCS, and other procedure codes. Used to create procedure_concept_id.
procedure_source_value_vocabulary | String (50) / Required | | | Type of code (e.g. CPT) used for the procedure.
procedure_date | DATE / Required | | | The date on which the procedure began (or was performed)
procedure_type_source_value | String (50) | | | The procedure type as stored in source. If the procedure type is not defined in the source data, refer to the Procedure Occurrence section of the Concept ID Table. Used to create procedure_type_concept_id.
provider_record_source_value | String (50) | | | Local reference value for Provider. This value will be matched against the Provider table to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier | String (50) | | | Local reference value for visit. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id.
relevant_condition_source_value | String (50) | | | First Associated Diagnosis Code. Used to create relevant_condition_concept_id.

4.9.1 Example of PROCEDURE OCCURRENCE source / destination data

ETL View
Procedure Occurrence Table - ETL

procedure_occurrence_source_identifier: 9848493
x_data_source_type: EHR
person_source_value: 594928
procedure_source_value: 49750
procedure_source_value_vocabulary: CPT
procedure_date: 4/19/2011
procedure_type_source_value: Inpatient header
provider_record_source_value: 23902023
x_visit_occurrence_source_identifier: 2302320
relevant_condition_source_value: 20230

Green: brought forward into grid model / Red: removed in processing

Grid View
Procedure Occurrence Table - Grid

procedure_occurrence_id: 393948230
x_data_source_type: EHR
person_id: 3493030
procedure_concept_id: 39949023
procedure_source_value: 49750
procedure_date: 4/19/2011
procedure_type_concept_id: 884934
associated_provider_id: 34040222
visit_occurrence_id: 20923042
relevant_condition_concept_id: 23032009
x_grid_node_id: 1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields

4.10 Table Name: OBSERVATION


The Observation table contains records for labs, measurements such as height and weight, etc. It is also where information from Past Medical History, Past Surgical History, Allergy, and Social/Personal History is stored.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
observation_source_identifier | String (50) / Required | | | Source Primary Key for Observation Record. Used to create the observation_id field on the grid facing record.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
person_source_value | String (50) / Required | | | Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id.
observation_source_value | String (50) / Required | | | The Observation Code as it appears in the source data. Used to create observation_concept_id.
observation_source_value_vocabulary | String (50) / Required | | | Vocabulary used for the observation
observation_date | Date / Required | | | The date of the observation
observation_time | Time | | | The time of the observation
value_as_number | NUMBER (14,3) | | | The observation result stored as a numeric value. This is applicable to observations where the result is expressed as a numeric value.
value_as_string | String (60) | | | The observation result stored as a character string. This is applicable to observations where the result is expressed as a character string. Used to create value_as_concept_id.
unit_source_value | String (50) | | | Unit of measure for the observation result when measured as a numeric value. Used to create unit_concept_id.
range_low | NUMBER (14,3) | | | The lower limit of the numeric range of the observation value. It is not applicable if the observation results are non-numeric or categorical, and must be in the same units of measure as the observation value.
range_high | NUMBER (14,3) | | | The upper limit of the numeric range of the observation value. It is not applicable if the observation results are non-numeric or categorical, and must be in the same units of measure as the observation value.
observation_type_source_value | String (50) / Required | | | Type of observation (e.g. PRO, Lab, History of, Social History, Allergies). If the observation type is not defined in the source data, refer to the Observation section of the Concept ID Table. Used to create observation_type_concept_id.
associated_provider_source_value | String (50) | | | Provider ID from the source. This value will be matched against the Provider table to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier | String (50) | | | Local reference value for visit. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id.
relevant_condition_source_value | String (50) | | | First Associated Diagnosis Code. Used to create relevant_condition_concept_id.
x_obs_comment | String (500) | | | Contains result comments. Do not use this field for now.
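A single raw result should end up in either value_as_number or value_as_string, never both. The sketch below treats anything Python's decimal module can parse as numeric and rounds it to the three decimal places allowed by NUMBER(14,3); route_result is a hypothetical name, and the parsing rule, ROUND_HALF_UP rounding, and truncation of long text to 60 characters are assumptions. Unit and reference-range handling is not shown.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

LIMIT = Decimal(10) ** 11   # NUMBER(14,3) leaves 11 digits before the decimal point
STEP = Decimal("0.001")     # ... and 3 after it


def route_result(raw: str) -> dict:
    """Place a raw result in value_as_number or value_as_string (never both)."""
    text = (raw or "").strip()
    if not text:
        return {"value_as_number": None, "value_as_string": None}
    try:
        number = Decimal(text)
    except InvalidOperation:
        number = None
    if number is None or not number.is_finite():
        # Non-numeric results (e.g. "Positive") go to value_as_string, cut to String (60).
        return {"value_as_number": None, "value_as_string": text[:60]}
    # Check the magnitude before and after rounding so quantize never overflows.
    rounded = number.quantize(STEP, rounding=ROUND_HALF_UP) if abs(number) < LIMIT else None
    if rounded is None or abs(rounded) >= LIMIT:
        raise ValueError(f"{text!r} does not fit NUMBER(14,3)")
    return {"value_as_number": rounded, "value_as_string": None}
```

Decimal is used instead of float so that a result such as 5.1235 rounds exactly as written rather than being affected by binary floating-point representation.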


4.10.1 Example of OBSERVATION source / destination data

ETL View
Observation Table - ETL
observation_source_identifier
x_data_source_type
person_source_value
observation_source_value
observation_source_value_vocabulary
observation_date
observation_time
value_as_number
value_as_string
unit_source_value
range_low
range_high
observation_type_source_value
associated_provider_source_value
x_visit_occurrence_source_identifier
relevant_condition_source_value
x_obs_comment

40230320
EHR
20202302
BP_Systolic
University Lab
7/12/2011
4:53:00 PM
148
mmHg
50
200
Lab Value
930392
2020200
401.2

Green: brought forward into grid model / Red: removed in processing

Grid View
Observation Table - Grid
observation_id
x_data_source_type
person_id
observation_concept_id
observation_source_value
observation_date
observation_time
value_as_number
value_as_string
value_as_concept_id
unit_concept_id
unit_source_value
range_low
range_high
observation_type_concept_id
associated_provider_id
visit_occurrence_id
relevant_condition_concept_id
x_obs_comment
x_grid_node_id

23902323
EHR
3903030
102190
8393929
7/12/2011
4:53:00 PM
148

020333
mmHg
50
200
2032002
939393
2002303
302023
1

Green: brought forward from ETL / Yellow: generated from ETL field / Red: generated locally or from multiple ETL fields

Appendix A: Table Specific Rules


Person Table
- Recordset should consist of all information (including inpatient and outpatient visits) about any patient with activity (outpatient visits) at a participating primary care site within the past 5 years (back to 1/1/2007 for the initial SAFTINet load).

For any patient seen within the past 5 years, we request data retrospectively as described below.

Appendix B: Row filters


This section details the types of data that will go into each table. For each table, the rightmost column lists the general data domains (e.g. Lab values) along with the specific concepts (e.g. Blood Pressure) within each domain that should be gathered for the table. When a date is listed with a concept, please gather all records after that date. For most concepts, this will mean gathering the last 5 years of data (2007-2012), though some concepts, such as colonoscopy and pneumovax, go back further.

Organization: One record per grouping of care sites operating under a single health care hierarchy.
Care Site: Include a record for any location where care is provided (examples include clinics, mobile units and "home-health care"). Multiple separate care sites in a single building may be grouped together or not, depending on the partner's preference.
Provider: Include a record for every provider who appears in the "provider" table, OR for the subset of the table that can be linked to a claim, a visit, or a prescription, whichever is easiest. If filtering, include all providers who have been active since 1/1/2007, even if not currently active.
Person: Include a record for each person who has had some sort of contact with the participating clinics since 1/1/2007 (regardless of current activity status). This set of persons can be used to filter the rest of the clinical data: only pull data related to this set of patients.
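The Person row filter defines a cohort, and the minimum dates in the record-type table below then restrict each clinical table. A sketch, assuming dates have already been parsed to datetime.date values and that each site supplies its own list of (person, contact date) pairs; build_cohort, filter_clinical_rows and the min_date=None convention for "All Records / No Date Limit" are hypothetical.

```python
from datetime import date

COHORT_START = date(2007, 1, 1)  # boundary for the initial SAFTINet load


def build_cohort(contacts) -> set:
    """contacts: iterable of (person_source_value, contact_date) pairs."""
    return {person for person, contact_date in contacts if contact_date >= COHORT_START}


def filter_clinical_rows(rows, cohort, date_field, min_date=COHORT_START):
    """Keep rows for cohort members; min_date=None means 'All Records / No Date Limit'."""
    return [
        row for row in rows
        if row["person_source_value"] in cohort
        and (min_date is None or row[date_field] >= min_date)
    ]
```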


For the following four tables, we wish to collect the specified record types. Please check the Collected? column for
any record types that will be included in the source data file. Also, please list the local source value for that type.
Example: If the local tag for Systolic BP that will go into the observation_source_value field is SBP, put that in the
local name column where systolic BP is listed.

Record Type

Drug
Exposure

Condition
Occurrence

Observation

Minimum
Date

Result Type

Collected?

Local Name

Include a record for each prescription / fill / drug administration.

Prescription
Medication List
Administered Drugs
Fulfillment
Include a record for each entry on the problem list as well as a record for each encounter level diagnosis code.
Generally, these will be ICD-9 codes.
Problem list
Visit-level diagnosis codes
ICD-9 codes from claims record
Data that do not fit in another table belong here. Observation table contains data from the following categories: lab
observations (i.e. test results), general clinical findings, signs, and symptoms, along with other domains listed below.
Vital Signs
Height
1/1/2007
Height Percentile (for children)
1/1/2007
Weight
1/1/2007
Weight Percentile (for children) 1/1/2007
Pulse oximetry
1/1/2007
Pulse
1/1/2007
Blood Pressure - Systolic
Blood Pressure - Diastolic
Social History
Smoking Status
(Current/Past/Former/Second
Hand Exposure)
Drinking Status

1/1/2007
1/1/2007
All Records /
No Date Limit

All Records /
No Date Limit
Past Medical History (To be defined)

Past Surgical History (To be defined)


Lab Results
Cholesterol
1/1/2007
LDL
1/1/2007
Alanine transaminase
1/1/2007
Albumin
1/1/2007
Alkaline Phosphatase
Aspartate aminotransferase
Bilirubin (Total, Indirect and
Direct)
Blood Urea Nitrogen_Serum
Calcium-Serum
CBC_% lymphocytes
CBC_% Neutrophils
CBC_White Blood Cell Count
Chlamydia trachomatis DNA
assay (procedure)
Chol HDL
Chol_LDL, calculated
Chol_LDL, measured directly
Chol_Total
Creatinine_Serum
Free T4
Glucose, Fasting_Serum
Glucose, Random_Serum
Glucose_Serum
Hemoglobin A1c
Hemoglobin_Serum
Hepatitis B core antibody
Hepatitis B e antibody
Hepatitis B e antigen
Hepatitis B surface antibody
Hepatitis B surface antigen
Hepatitis C antibody
Hepatitis C antigen
INR
Platelet Count
Potassium

1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007

Prostate specific antigen


measurement (procedure)
Pulmonary Function Test
Sodium
Triglycerides
TSH
Urinary Protein
Urine microalbumin/creatinine
ratio measurement (procedure)
Urine protein/creatinine ratio
measurement (procedure)
Urine_Microalbuminuria
measurement (procedure)
Urine_Protein measurement
(procedure)
Creatinine_phosphokinase
GFR, estimated
influenza assay
influenza rapid assay (poct)
pertussis test
respiratory syncytial test
FEV1, pre, number
FEV1, pre, percent
FEV1, post, number
FEV1, post, percent
FVC, pre, number
FVC, pre, percent
FVC, post, number
FVC, post, percent
PFT: Peak expiratory flow
Allergies
Family History
Family History of CVD
Patient Reported Outcomes
Medication Adherence Survey
MAS 1a
MAS 1b
MAS 1c
MAS 1d

1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007

1/1/2007

1/1/2007
1/1/2007
1/1/2007
1/1/2007

Yes/No
Yes/No
Yes/No
Yes/No


MAS 1e
MAS 1f
MAS 1f Text
MAS 1g
MAS 1g Text
MAS 2

1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007

MAS Q1
MAS Q2
MAS Q2a

1/1/2007
1/1/2007
1/1/2007

Yes/No
Yes/No
Text
Yes/No
Text
Categorical or
Numeric
Yes/No
Categorical
Yes/No

MAS Q2b
Asthma Control Test
ACT Total Score
ACT Category (see note 1)
ACT-1
ACT-2
ACT-3
ACT-4
ACT-5
Childhood Asthma Control Test
C-ACT Total Score
C-ACT Category (see note 1)
C-ACT-1
C-ACT-2
C-ACT-3
C-ACT-4
C-ACT-5
C-ACT-6
C-ACT-7
PHQ-2 Q1 score
PHQ-2 Q2 score
PHQ-2 total score
PHQ-9 FuncQ score
PHQ-9 Q1 score
PHQ-9 Q2 score
PHQ-9 Q3 score
PHQ-9 Q4 score
PHQ-9 Q5 score
PHQ-9 Q6 score

1/1/2007

Yes/No


1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007

PHQ-9 Q7 score
PHQ-9 Q8 score
PHQ-9 Q9 score
PHQ-9 Total score
Demographic Information
Highest Education Level
Achieved
Language Preference

Procedure
Occurrence

1/1/2007
1/1/2007
1/1/2007
1/1/2007

All Records /
No Date Limit
All Records /
No Date Limit
Imputed Race / Ethnicity
All Records /
No Date Limit
Person % Fed Poverty level
1/1/2007
Person family size
1/1/2007
Family income
1/1/2007
Person relationship status
1/1/2007
Person Practice Status (active or Most Recent
moved or gone elsewhere)
/ No Date
Limit
Include a record for each procedure performed on a patient (CPT-4, ICD-9-CM (Procedures), and HCPCS codes). If you want to filter the procedure table, include at least the following procedures:
Procedures
Bone mineral density (DEXA
scan)
Colonoscopy
Diabetic Eye Exam
Diabetic Foot Exam
Double contrast barium enema
Mammogram
Pap Smear
Pulmonary Function Test
Spirometry
Mechanical Ventilation
Continuous nebulized therapy
Endotracheal intubation
Critical Care
Fecal occult blood test
Immunizations


1/1/2007

1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007
1/1/2007


Pneumovax
Other Immunizations
Education
Education Nutrition
Education Weight loss management

1/1/2007
1/1/2007
1/1/2007

1. ACT and C-ACT categories should be one of the following:


1 = ACT in control (Total score > 19)
2 = ACT poorly controlled (Total score 16-19)
3 = ACT very poorly controlled (Total score ≤ 15)
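The category rules in note 1 amount to a threshold function on the total score. The sketch below is a minimal illustration only, not part of the SAFTINet specification: the function name `act_category` is hypothetical, it assumes an integer total score has already been computed, and it places a score of exactly 15 in category 3 so that every score falls into exactly one category. As the note states, the same thresholds are applied to both ACT and C-ACT totals.

```python
def act_category(total_score: int) -> int:
    """Return the note 1 category code (1, 2 or 3) for an ACT or C-ACT total score.

    Hypothetical helper; assumes an integer total score.
    """
    if total_score > 19:
        return 1  # in control: total score of 20 or more
    if total_score >= 16:
        return 2  # poorly controlled: total score 16-19
    return 3  # very poorly controlled: total score of 15 or below


# Usage: derive the category value to store alongside the total score.
for score in (22, 18, 15):
    print(score, act_category(score))
```

Keeping the category as a derived value means sites only need to store the total score reliably; the category can always be recomputed from it.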


Appendix C: Sending data using flatfiles


Some users may wish to send their data as standard flatfiles instead of the current XML. ROSITA is being modified to handle such files. Each file should be a plain .txt text file with columns arranged in the order listed in this document, and individual column values should be separated by the pipe (|) character. A total of 9 files should be loaded in the initial round, one for each table in Sections 4.2-4.10. The files will be processed in the same fashion as the current XML files (see the ROSITA Admin Guide for further details).

Example: one row from a sample Organization file

This record (Organization Table - XML, from Section 4.2):

Column                           Value
organization_source_value        UC Internal Medicine
x_data_source_type               EHR
place_of_service_source_value    Academic Practice
organization_address_1           13199 E Montview Blvd
organization_address_2           Suite 300, Mail Stop F443
organization_city                Aurora
organization_state               CO
organization_zip                 80045
organization_county              Arapahoe

should be represented as follows in the file (all on one line):
UC Internal Medicine|EHR|Academic Practice|13199 E Montview Blvd|Suite 300, Mail Stop F443|Aurora|CO|80045|Arapahoe
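The conversion from the column listing to the pipe-delimited line can be sketched as follows. This is a minimal illustration, not ROSITA code: `ORGANIZATION_COLUMNS` and `to_flatfile_line` are hypothetical names, the column order is taken from the example record above, and writing a missing or empty value as an empty field is an assumption. Escaping is omitted because none of the example values contain quotation marks or backslashes; the escaping rules listed below would still apply to real data.

```python
# Column order for the ORGANIZATION table, taken from the example record above.
ORGANIZATION_COLUMNS = [
    "organization_source_value",
    "x_data_source_type",
    "place_of_service_source_value",
    "organization_address_1",
    "organization_address_2",
    "organization_city",
    "organization_state",
    "organization_zip",
    "organization_county",
]


def to_flatfile_line(record: dict, columns: list) -> str:
    """Join a record's values in table column order, separated by '|'."""
    # Assumption: a missing or None value is written as an empty field.
    values = ("" if record.get(col) is None else str(record[col]) for col in columns)
    return "|".join(values)


record = {
    "organization_source_value": "UC Internal Medicine",
    "x_data_source_type": "EHR",
    "place_of_service_source_value": "Academic Practice",
    "organization_address_1": "13199 E Montview Blvd",
    "organization_address_2": "Suite 300, Mail Stop F443",
    "organization_city": "Aurora",
    "organization_state": "CO",
    "organization_zip": "80045",
    "organization_county": "Arapahoe",
}
line = to_flatfile_line(record, ORGANIZATION_COLUMNS)
print(line)
```

Driving the join from an explicit column list, rather than from the order of keys in each record, keeps every row aligned with the table definition even when source records are assembled in a different order.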

Users should apply the following rules when generating flatfiles:

- Send a separate file for each data table.
- Name each file using the convention [table name].txt.
- Separate column values with the | character as the delimiter.
- Put one record on each row. No header row is needed; the first row should contain actual data.
- Escape quotation marks occurring within column values with the \ character so the processor can locate them; the end result should look like \".
- Escape backslashes occurring within column values with a second backslash; the end result should look like \\.
- Write datetime values in the format YYYY-MM-DDThh:mm:ssZ with a 24-hour clock (for example, 2012-01-09T12:00:00Z; a time of 4:15:00 PM on 2012-01-09 would be 2012-01-09T16:15:00Z), and write dates in the format YYYY-MM-DD.
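The rules above can be combined into a small writer. The sketch below is one possible reading of them, not a ROSITA component: `escape_value`, `format_field`, and `write_table_file` are hypothetical names, and several behaviors are assumptions not stated in this document. Missing values are written as empty fields. Naive datetimes are written unchanged with the Z suffix, while timezone-aware ones are converted to UTC. Files are UTF-8 with "\n" line endings. Values containing a | or a line break are rejected, because the rules do not say how to escape them.

```python
import os
from datetime import date, datetime, timezone


def escape_value(value: str) -> str:
    """Apply the backslash and quotation-mark escaping rules to one column value."""
    if "|" in value or "\n" in value or "\r" in value:
        # Assumption: the rules do not say how to escape an embedded delimiter or
        # line break, so this sketch rejects such values instead of guessing.
        raise ValueError(f"cannot represent value in a flatfile: {value!r}")
    # Double backslashes first; doing it after adding \" would corrupt that escape.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def format_field(value) -> str:
    """Render one column value as flatfile text."""
    if value is None:
        return ""  # assumption: missing values become empty fields
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        # Assumption: naive datetimes are written unchanged with the Z suffix.
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, date):
        return value.isoformat()  # YYYY-MM-DD
    return escape_value(str(value))


def write_table_file(table_name: str, rows, directory: str = ".") -> str:
    """Write one table's rows to [table name].txt with no header row."""
    path = os.path.join(directory, f"{table_name}.txt")
    # Assumption: UTF-8 text with "\n" line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write("|".join(format_field(v) for v in row) + "\n")
    return path
```

Each row passed to `write_table_file` is expected to be a sequence of values already in the table's column order. Because the rules do not specify the letter case of [table name], the sketch uses whatever name the caller passes.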
