
Paper PO12

Pharmaceutical Programming: From CRFs to Tables, Listings and Graphs, a process overview with real world examples
Mark Penniston, Omnicare Clinical Research, King of Prussia, PA
Shia Thomas, Omnicare Clinical Research, King of Prussia, PA

ABSTRACT
SAS® is the de facto standard programming language for statistical analysis in the pharmaceutical industry. Its mainstay
use is the generation of tables, listings and graphs, following the rules and instructions in the statistical analysis
plan, from data stored in SAS datasets that are usually derived from a clinical data management database. This information is
collected on a case report form (CRF) or in an electronic data capture (EDC) system, processed through a database where queries
are resolved against the source documents at the site, and then sent to the statisticians and SAS programmers for analysis.

INTRODUCTION
The purpose of this paper is to provide an overview of the table and listing generation process as it applies in the SAS
pharmaceutical programming arena. It is not presented as the only method for table generation. It is an attempt to show the
fundamental data flow, from data capture to presentation, and the ways SAS is used in that presentation.

Clinical trials have many documents, two of which are:

A protocol, which describes the purpose of the clinical trial. It presents a hypothesis for the action of a
particular drug, biologic agent or device and describes a test of that hypothesis.
A Case Report Form (CRF), which is a series of forms completed at the location of the clinical trial
(typically an investigator’s site) to record information for a particular person in the trial.

For the purpose of this paper, assume the protocol describes a randomized trial in which patients are enrolled equally into
either a compound called Treatment X or Placebo (a sugar pill). Assume as well that the hypothesis to be tested is that
patients can be enrolled equally into the two arms of this trial.

Figure 1 presents one particular page of a CRF. The data it collects is demographic data, or patient
characteristic data. A person enrolled in a clinical trial will have information such as this collected to determine the
homogeneity of the patient or subject population enrolled in the trial. A person at the investigational site will complete
this page of the CRF. The data will then be entered into a database to create an electronic version of the paper
information.

A Statistical Analysis Plan (SAP) is a document describing the planned analysis that will be performed on the
electronic CRF data. The following represents some sample SAP text:

The purpose of this study is to compare study drug X with placebo in demographic information for baseline testing.
Subjects will be enrolled in a 1:1 ratio in this 2 arm open-label trial to see what baseline effects, if any, occur.
Descriptive statistics will be presented for all parameters collected with no inferential analysis being performed.
Statistics for continuous parameters (age) will be presented by N, mean, median, minimum and maximum values.
Age will be calculated from the difference of the study randomization date and the date of birth. Categorical
parameters (gender, ethnicity) will have groupings presented as counts. All information collected will be listed.
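
The age rule in the sample SAP text can be sketched in a short DATA step. This is an illustration only: the variable randdt and its value are assumptions, since the dataset used later in this paper does not carry a randomization date, and the formula shown (completed months divided by 12) is one common way to compute age in whole years, not necessarily the method used to produce the ages in this paper.

```sas
/* Sketch only: randdt is a hypothetical randomization date. */
data age_check;
  dob    = '28AUG1981'd;   /* date of birth of one subject in the sample data */
  randdt = '30SEP2004'd;   /* assumed randomization date, for illustration */

  /* Completed months between the two dates, converted to whole years. */
  /* The comparison (day(randdt) < day(dob)) evaluates to 1 when the   */
  /* last month is not yet complete, and to 0 otherwise.               */
  age = floor((intck('month', dob, randdt) - (day(randdt) < day(dob))) / 12);

  format dob randdt date9.;
  put dob= randdt= age=;
run;
```

For these two dates the step gives an age of 23. Computing age once, in one place, is the kind of calculation that is later stored in a derived dataset rather than repeated in each table and listing program.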

As the SAP text is written it is very common for the statistician to create mock data displays, which are tables and
listings demonstrating how the analysis described in the SAP will be presented. The mock describes the layout of the
data in listings and the statistics performed in the table. Figures 2 and 3 show a mock table and a mock listing
based on the CRF data to be collected and the sample SAP text previously stated. In pharmaceutical SAS
programming, a listing supporting a table is almost always produced. One listing can support many tables.

Figure 2: Sample Mock Table

                              Mock Table 1
                              Demographics
                      (Intent-to-Treat Population)

                  Treatment X      Placebo          Total
                  (N=n)            (N=n)            (N=n)

Age[1] (yrs)
  n               n                n                n
  Mean            x.x              x.x              x.x
  Median          x.x              x.x              x.x
  Min, Max        x.x, x.x         x.x, x.x         x.x, x.x

Sex
  Male            n (%)            n (%)            n (%)
  Female          n (%)            n (%)            n (%)

Race
  African         n (%)            n (%)            n (%)
  Asian           n (%)            n (%)            n (%)
  Caucasian       n (%)            n (%)            n (%)
  Hispanic        n (%)            n (%)            n (%)
  Other           n (%)            n (%)            n (%)

Percentages are based on the total number of subjects in each treatment group.
[1] Based on date of collection.

Figure 3: Sample Listing Mock

                              Mock Listing 1
                              Demographics
                        Intent-to-Treat Subjects

               Site/
               Subject      Date of      Age
Treatment      Number       Birth        (yrs)   Gender   Ethnic Origin

Treatment X    0001/0001    DDMMMYYYY    23      Female   Caucasian

Placebo        0002/0064    DDMMMYYYY    37      Male     Hispanic


We now have the protocol, CRF, SAP and the mocks. The next item to consider is the database into which the information
captured on the CRF is placed. Using a data entry database package, we can get our data into a SAS
dataset. When we run PROC CONTENTS on this dataset we find the following variables:

-----Alphabetic List of Variables and Attributes-----

 #    Variable    Type    Len    Pos    Label
----------------------------------------------------------
 6    dmaged      Num       3    151    Age (Calculated)
 4    dmdob       Char      8     19    Date of Birth
 5    dmdobd      Num       8      0
 8    dmeth       Char     10     33    Ethnicity
 9    dmethsp     Char    100     43    Ethnicity Specify
 7    dmgndr      Char      6     27    Gender
 3    dminit      Char      3     16    Initials
10    dtrt        Char      8    143    Treatment Group
 1    siteno      Char      4      8    Site Number
 2    subjid      Char      4     12    Subject Identifier

Looking at the dataset through the SAS viewer with labels turned off, we see the following information captured:

siteno  subjid  dminit  dmdob     dmdobd  dmaged  dmgndr  dmeth      dmethsp          dtrt
0001    0009    MTW     19530723          51      Male    Caucasian                   placebo
0001    0025    SST     19810828          23      Female  Asian                       x
0003    0023    SIN     19590521          45      Male    Asian
0004    0047    NAP     19650312          39      Male    Caucasian                   x
0004    0057    QAA     19841218          20      Female  Other      Angloindian
0006    0008    TSC     19721001          32      Female  Asian                       x
0008    0040    ECN     19571109          47      Female  African                     placebo
0012    0003    SAV     19580527          46      Male    Other      American Indian  placebo
0021    0065    TTM     19480531          56      Male    Hispanic                    placebo
0033    0005    ADC     19100210          94      Female  Hispanic                    x
Many times the CRF will be annotated with the SAS variable names to aid programming. As a next step, the
programmer can annotate the mock tables and listings with the SAS variables that will be used to present
each part of the data. Mock annotation provides the following benefits:
It tells other people which variables are being presented
It gives the programmer a tool to state which derived (calculated) variables will need to be created
It records a plan of action before any SAS code is written
Figures 4 and 5 represent the annotated mocks for the study.

Figure 4: Annotated Mock Table

                              Mock Table 1
                              Demographics
                      (Intent-to-Treat Population)      DERIVED.itt=1
                                                        DERIVED
                                                        DERIVED.trt_d
                  Treatment X      Placebo          Total
                  (N=n)            (N=n)            (N=n)

Age[1] (yrs)      dmaged
  n               n                n                n
  Mean            x.x              x.x              x.x
  Median          x.x              x.x              x.x
  Min, Max        x.x, x.x         x.x, x.x         x.x, x.x

Sex               sex_d
  Male            n (%)            n (%)            n (%)
  Female          n (%)            n (%)            n (%)

Race              ethn_d
  African         n (%)            n (%)            n (%)
  Asian           n (%)            n (%)            n (%)
  Caucasian       n (%)            n (%)            n (%)
  Hispanic        n (%)            n (%)            n (%)
  Other           n (%)            n (%)            n (%)

Percentages are based on the total number of subjects in each treatment group.
[1] Based on date of collection.

Figure 5: Annotated Mock Listing

                              Mock Listing 1
                              Demographics
                        Intent-to-Treat Subjects        DERIVED.itt=1

               Site/
               Subject      Date of      Age
Treatment      Number       Birth        (yrs)   Gender   Ethnic Origin
trt_d          sitesubj     dob_d        dmaged  dmgndr   dmeth

Treatment X    0001/0001    DDMMMYYYY    23      Female   Caucasian

Placebo        0002/0064    DDMMMYYYY    37      Male     Hispanic


Collectively we now have the following:
A protocol
A CRF
A database with data
A Statistical Analysis Plan (SAP) with mocks
Annotated mocks

With this information, programming can now begin. It is important to try to obtain (or create) as many of these
documents as possible while programming. This gives the programmer all the information needed to generate the tables and
listings correctly the first time. The pharmaceutical industry is a regulated industry. As such, a programmer should
always be able to describe the methodology and documentation for generating summarized information.

One approach is for programmers to store their calculated fields in a dataset prior to table and listing
generation. These datasets are called derived (as in derived from raw) and allow others to see the calculations prior to their
display in the output files (tables and listings). It is easier to store an age calculation in a dataset than to duplicate it
in the programs producing the tables and listings. The following program creates a derived dataset called DERIVED.

*******************************************;
* Title:   Derived Dataset for Presentation
* Program: derived.sas
* Author:  Shia Thomas
* Date:    September 30, 2004
*******************************************;

* Create the derived dataset from the raw dataset.;
data data.derived;
    set data.testdemo;

    * Intent-to-treat population: subjects with a treatment group recorded.;
    if dtrt='x' or dtrt='placebo' then itt=1;
    else itt=0;

    * Concatenate site number and subject number.;
    length sitesubj $10;
    sitesubj = trim(left(siteno))||'/'||trim(left(subjid));

    * Numeric treatment code: 1 is Treatment X and 2 is Placebo.;
    if dtrt='x' then trt_d=1;
    else if dtrt='placebo' then trt_d=2;
    else trt_d=.;

    * Numeric code for sex: 1 is Male and 2 is Female.;
    if dmgndr='Male' then sex_d=1;
    else if dmgndr='Female' then sex_d=2;
    else sex_d=.;

    * Intent-to-treat male population.;
    if sex_d=1 and itt=1 then mitt=1;
    else mitt=0;

    * Numeric code for race. Any value not listed, including blank, is coded 5 (Other).;
    if dmeth='African' then ethn_d=1;
    else if dmeth='Asian' then ethn_d=2;
    else if dmeth='Caucasian' then ethn_d=3;
    else if dmeth='Hispanic' then ethn_d=4;
    else ethn_d=5;

    * Convert the character date of birth (YYYYMMDD) to a SAS date.;
    format dob_d date9.;
    dob_d=input(dmdob, yymmdd8.);

    * Labels for the derived variables, as shown in the PROC CONTENTS output.;
    label itt      = 'Intent to Treat Population'
          sitesubj = 'Site and Subject Number'
          trt_d    = 'Treatment Group'
          sex_d    = 'Gender'
          mitt     = 'Male Intent to Treat Population'
          ethn_d   = 'Ethnicity'
          dob_d    = 'Date of Birth';
run;

The following shows the PROC CONTENTS output and the SAS viewer display of the derived dataset, based on the mock
annotations and the SAS program described above.

-----Alphabetic List of Variables and Attributes-----

 #    Variable    Type    Len    Pos    Format    Label
---------------------------------------------------------------------------------
 6    dmaged      Num       3    209              Age (Calculated)
 4    dmdob       Char      8     67              Date of Birth
 5    dmdobd      Num       8      0
 8    dmeth       Char     10     81              Ethnicity
 9    dmethsp     Char    100     91              Ethnicity Specify
 7    dmgndr      Char      6     75              Gender
 3    dminit      Char      3     64              Initials
17    dob_d       Num       8     48    DATE9.    Date of Birth
10    dtrt        Char      8    191              Treatment Group
16    ethn_d      Num       8     40              Ethnicity
11    itt         Num       8      8              Intent to Treat Population
15    mitt        Num       8     32              Male Intent to Treat Population
14    sex_d       Num       8     24              Gender
 1    siteno      Char      4     56              Site Number
12    sitesubj    Char     10    199              Site and Subject Number
 2    subjid      Char      4     60              Subject Identifier
13    trt_d       Num       8     16              Treatment Group

siteno  subjid  dminit  dmdob     dmdobd  dmaged  dmgndr  dmeth      dmethsp          dtrt     itt  trt_d  sex_d  mitt  ethn_d  dob_d
0001    0009    MTW     19530723          51      Male    Caucasian                   placebo  1    2      1      1     3       7/23/1953
0001    0025    SST     19810828          23      Female  Asian                       x        1    1      2      0     2       8/28/1981
0003    0023    SIN     19590521          45      Male    Asian                                0           1      0     2       5/21/1959
0004    0047    NAP     19650312          39      Male    Caucasian                   x        1    1      1      1     3       3/12/1965
0004    0057    QAA     19841218          20      Female  Other      Angloindian               0           2      0     5       12/18/1984
0006    0008    TSC     19721001          32      Female  Asian                       x        1    1      2      0     2       10/1/1972
0008    0040    ECN     19571109          47      Female  African                     placebo  1    2      2      0     1       11/9/1957
0012    0003    SAV     19580527          46      Male    Other      American Indian  placebo  1    2      1      1     5       5/27/1958
0021    0065    TTM     19480531          56      Male    Hispanic                    placebo  1    2      1      1     4       5/31/1948
0033    0005    ADC     19100210          94      Female  Hispanic                    x        1    1      2      0     4       2/10/1910

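
Before the derived variables are used in a table, it can help to confirm that each numeric code lines up with the raw value it came from. The following is a minimal sketch of one way to do that with PROC FREQ; it is offered as a suggestion and is not part of the authors' program. It assumes data.derived has been created by derived.sas as shown above.

```sas
/* Cross-check each derived code against its raw source value.        */
/* LIST prints one line per combination; MISSING keeps blank or       */
/* missing values in the output so that uncoded records are visible.  */
proc freq data=data.derived;
  tables dtrt*trt_d*itt dmgndr*sex_d dmeth*ethn_d / list missing;
run;
```

With the sample data, the two subjects with a blank dtrt should appear on a single line with trt_d missing and itt=0, which confirms that they are excluded from the intent-to-treat population.
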
From the derived dataset one can now write code to produce the table and listing. The following shows the final
output from these programs. The output can be created through many of SAS’s procedures or through a DATA _NULL_
step.

Figure 6: Table Output as programmed in SAS

                                Table 1
                              Demographics
                      (Intent-to-Treat Population)

                  Treatment X      Placebo          Total
                  (N=4)            (N=4)            (N=8)

Age[1] (yrs)
  n               4                4                8
  Mean            47.0             50.0             48.5
  Median          35.5             49.0             46.5
  Min, Max        23, 94           46, 56           23, 94

Sex
  Male            1 (25%)          3 (75%)          4 (50%)
  Female          3 (75%)          1 (25%)          4 (50%)

Race
  African         -                1 (25%)          1 (12.5%)
  Asian           2 (50%)          -                2 (25.0%)
  Caucasian       1 (25%)          1 (25%)          2 (25.0%)
  Hispanic        1 (25%)          1 (25%)          2 (25.0%)
  Other           -                1 (25%)          1 (12.5%)

Percentages are based on the total number of subjects in each treatment group.
[1] Based on date of collection.
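
The numbers in Figure 6 can be obtained from the derived dataset with standard procedures. The sketch below shows one possible route using PROC MEANS and PROC FREQ; it computes the statistics only and does not reproduce the table layout, and it is not the authors' table program. The format names (trtf, sexf, ethnf) and the output dataset name agestats are assumptions made for this example.

```sas
/* Hypothetical display formats for the numeric codes in data.derived */
proc format;
  value trtf  1 = 'Treatment X'  2 = 'Placebo';
  value sexf  1 = 'Male'         2 = 'Female';
  value ethnf 1 = 'African' 2 = 'Asian' 3 = 'Caucasian'
              4 = 'Hispanic' 5 = 'Other';
run;

/* Age: n, mean, median, minimum and maximum. The output dataset holds */
/* one row per treatment group (_TYPE_=1) and one overall row          */
/* (_TYPE_=0) that supplies the Total column.                          */
proc means data=data.derived noprint;
  where itt = 1;
  class trt_d;
  var dmaged;
  output out=agestats n=n mean=mean median=median min=min max=max;
run;

proc print data=agestats noobs;
  var _type_ trt_d n mean median min max;
  format trt_d trtf. mean median 8.1;
run;

/* Sex and race: counts with column percentages, so the denominator   */
/* is the number of subjects in each treatment group.                 */
proc freq data=data.derived;
  where itt = 1;
  tables (sex_d ethn_d) * trt_d / norow nopercent;
  tables sex_d ethn_d;   /* overall counts for the Total column */
  format trt_d trtf. sex_d sexf. ethn_d ethnf.;
run;
```

The WHERE statement applies the DERIVED.itt=1 annotation from the mock, so the two subjects without a treatment group are left out of every statistic.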

Figure 7: Listing Output as programmed in SAS

                                Listing 1
                              Demographics
                        Intent-to-Treat Subjects

               Site/
               Subject      Date of      Age
Treatment      Number       Birth        (yrs)   Gender   Ethnic Origin

Treatment X    0001/0025    28AUG1981    23      Female   Asian

               0004/0047    12MAR1965    39      Male     Caucasian

               0006/0008    01OCT1972    32      Female   Asian

               0033/0005    10FEB1910    94      Female   Hispanic

Placebo        0001/0009    23JUL1953    51      Male     Caucasian

               0008/0040    09NOV1957    47      Female   African

               0012/0003    27MAY1958    46      Male     Other: American Indian

               0021/0065    31MAY1948    56      Male     Hispanic
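
A listing like Figure 7 can be approximated with PROC REPORT. The following is a sketch, not the authors' listing program: the format name trtf, the work dataset name listing and the variable ethtxt are assumptions introduced for this example, and the exact spacing of the output will differ from the figure.

```sas
/* Hypothetical display format for the numeric treatment code */
proc format;
  value trtf 1 = 'Treatment X' 2 = 'Placebo';
run;

/* Keep intent-to-treat subjects and build the ethnic origin text,   */
/* appending the specify field when it is filled in.                 */
data listing;
  set data.derived;
  where itt = 1;
  length ethtxt $112;
  if dmethsp ne '' then ethtxt = trim(dmeth) || ': ' || left(dmethsp);
  else ethtxt = dmeth;
run;

title1 'Listing 1';
title2 'Demographics';
title3 'Intent-to-Treat Subjects';

/* ORDER variables sort the rows and print each treatment label once. */
/* The asterisk is the split character for multi-line column headers. */
proc report data=listing nowd split='*';
  column trt_d sitesubj dob_d dmaged dmgndr ethtxt;
  define trt_d    / order order=internal format=trtf. 'Treatment';
  define sitesubj / order 'Site/*Subject*Number';
  define dob_d    / display format=date9. 'Date of*Birth';
  define dmaged   / display 'Age*(yrs)';
  define dmgndr   / display 'Gender';
  define ethtxt   / display 'Ethnic Origin';
run;

title;
```

Ordering on the internal value of trt_d places Treatment X (code 1) before Placebo (code 2), as in the figure.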


CONCLUSION

SAS programming of tables and listings in the pharmaceutical industry is a stepwise process, always dependent on
previous documents and descriptions of what is to be produced. Many companies have different processes
and documents in addition to those described in this paper. It is important to understand the processes that are
specific to a given company. In general, the flow of rules and data can be described as in Figure 8, with each step
dependent on the previous one. When the steps are not followed, there is the potential for mistakes.

Figure 8: Flow of rules and data

Protocol and CRF
        |
        v
SAP/Mocks, Database, Annotated CRF
        |
        v
Annotated Mocks, Derived Datasets, Programming Rules
        |
        v
Tables and Listings

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:
Mark Penniston
Shia Thomas
Omnicare Clinical Research
630 Allendale Road
King of Prussia, PA 19406
Work Phone: 484 679 2436
Fax: 484 679 2509
Email: mark.penniston@omnicarecr.com
shia.thomas@omnicarecr.com
Web: www.omnicarecr.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
