Unit 5
SAS for Data Description
Welcome!
purposes, including (1) monitoring and tracking of a study cohort, (2) informing project planning,
and (3) cohort description. Data summaries are also useful because they provide clues to data
Data summarization might take the form of a listing of the data, the reporting of averages (and
The better summaries are those that are self-explanatory. They are well labeled (have titles and
variable values identified) and are straightforward to understand. It’s also helpful to accompany the
summarization with documentation of the data source (name and version of data set) and the name
The SAS procedures discussed in this reading, Unit 5 week 1 (week 11 of course), are PRINT,
MEANS, SUMMARY, UNIVARIATE, TABULATE, FREQ, FORMS, REPORT, CHART, and PLOT.
Week 11 Page 1 of 58
Week 11 SAS Procedures to Summarize Data
In the Unit 5 week 2 (week 12 of course) reading, the procedures discussed are CHART and
PLOT. The Unit 5 week 2 (week 12 of course) reading also includes a brief introduction to using the
SAS ANALYST for producing graphics with the SAS/GRAPH module. These are higher quality
graphics than the printer character charts and plots produced with PROCs CHART and PLOT.
TIP - Most procedures produce results that appear in the output window. Along with directly
producing output, many of the procedures to be discussed can produce new SAS data sets (output
data), which in turn can be used in other procedures, such as PRINT or TABULATE. In this way, it
is possible to have more control over the format in which results are printed. Output data sets can
also be modified in subsequent DATA steps, to add labels or formats before printing.
TIP – Be sure to see descriptions of these procedures in the SAS Procedures Guide. Learn to use
the SAS manuals - there are many options to use with all procedures. These course notes
1. to be competent in using SAS procedures to “write out” data values in a manner that is
easy to read;
2. to appreciate the utility of “writing out” data as a preliminary to data quality assessment;
4. to be (at least a little) competent in using SAS procedures to “write out” data as part of the
production of forms (admittedly MS ACCESS might be a better tool in this regard);
PROC PRINT is a simple way to get a listing of data in a SAS data set.
• We have already used PROC PRINT to get a listing of data (see Unit 4 week 1)
(1) all of the variable values for all of the observations in a SAS data set,
(2) some of the variable values for all of the observations in a SAS data set, or
(3) some of the variable values for some of the observations in a SAS data set.
• Selected options for PROC PRINT are illustrated in the examples that follow.
Examples – The examples below use data on two neurological assessment scales used in a
cardiopulmonary bypass study. Data on pre-op, post-op and follow-up scores are printed in
different ways to illustrate some of the options available for printing in SAS.
Example
• The data in this example are arranged with one record per subject. Included in this record are
pre, post, and follow-up scores for each of two assessment scales.
• Scores are printed for all three periods, for the two scales.
• When the keyword LABEL is included in the PROC PRINT statement, the variable labels are
• TIP - It is also possible to assign new labels in the PROC PRINT procedure.
• TIP - A split character can be used when creating variable labels. This character is used to
split the labels into two or more lines for printing. To do this:
- Write SPLIT=’ ’, where the split character (which can be a space) is enclosed
• In this example, new labels using the split character * were defined for all variables. You can
see the advantage of using a split: the column width for printing would be determined by the
• When the ID statement is used, no observation number is printed. Instead, the variable named
after ID appears in the leftmost column, before the variables in the VAR statement.
• If no VAR statement is written, ALL of the variables in the dataset are printed, and they are
Example -
*__________________________________________________;
** print neurologic summary scores for pre, post, **;
** & follow-up using label and ID options         **;
****************************************************;
* Use '*' to define split character **;
* print only first 10 observations **;
PROC PRINT DATA=MNSCORE(OBS=10) SPLIT='*';
** define the variable to put in first column in place of obs number **;
ID PATID;
RUN;
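The options discussed above (split-character labels, ID, and VAR) can be combined as in the following minimal sketch. It is not the course's original program: the data set MNSCORE_DEMO, the score variable names PREMAT, POSTMAT and FUMAT, and all data values are hypothetical and exist only to make the step runnable.

```sas
/* Hypothetical stand-in for MNSCORE: one record per patient.     */
/* Variable names other than PATID, and all values, are made up.  */
DATA MNSCORE_DEMO;
   INPUT PATID PREMAT POSTMAT FUMAT;
   DATALINES;
24  99  97  97
28  95  87  98
60 100 100 100
;
RUN;

PROC PRINT DATA=MNSCORE_DEMO(OBS=10) SPLIT='*';
   ID PATID;                      /* printed in place of the Obs column */
   VAR PREMAT POSTMAT FUMAT;      /* print only these, in this order    */
   /* each * in a label starts a new line in the column heading */
   LABEL PATID   = 'PATID:*Patient*ID'
         PREMAT  = 'PREMAT:*Pre-op*Mathew'
         POSTMAT = 'POSTMAT:*Post-op*Mathew'
         FUMAT   = 'FUMAT:*Follow-up*Mathew';
   TITLE1 'Neurologic summary scores (illustrative data)';
RUN;
```

Because SPLIT= is given, the labels are used as column headings without also writing the LABEL keyword on the PROC PRINT statement.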
• To print with a BY statement, the data must first be sorted by the BY variable(s). For
example, if you wanted to print data that is sorted by LNAME (e.g. LNAME is the variable
name for “last name”), the print instruction must be preceded by a sort instruction. This
• When a BY variable is used for printing, the data are grouped under a header line that gives
• The second example uses the same data, but in this case it is arranged with multiple records
per subject, for pre, post, and follow-up status. Here the data are printed in two ways, first
• Tip - When the same variable is named in both an ID statement and a BY statement, the
grouping variable is listed in the first column, and not repeated for subsequent observations.
• Each by-group is separated by a blank line; this makes clear the separation of groups.
• Here, labels have been created with the split character as part of the PROC PRINT
• Tip - Do include a split character in variable labels; it makes your output easier to read!
• Tip - And while you're at it, include the variable name within the label. For example: LABEL
PATID = 'PATID:*Patient*ID'; This also enhances readability since it produces the variable
*************************************************************;
** print neurologic summary scores for pre, post, & **;
** follow-up using label and ID options and BY statements **;
*************************************************************;
Output follows. Note that no observation number is listed, due to the option NOOBS on the PROC
PRINT statement.
PRE-OP 99 100
POST-OP 87 85
FOLLOW-UP 98 90
Recall - When the same variable is named in both an ID statement and a BY statement, the
grouping variable is listed in the first column, and not repeated for subsequent observations.
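A minimal sketch of the sort-then-print pattern, with the same variable named on both the BY and ID statements, might look like this. The data set MNS2_DEMO, the format PSTATFMT, the coding of PSTATUS values 2 and 3, and all data values are assumptions made for illustration.

```sas
/* Assumed coding: 1=pre-op (as in these notes); 2 and 3 are guesses */
PROC FORMAT;
   VALUE PSTATFMT 1='PRE-OP' 2='POST-OP' 3='FOLLOW-UP';
RUN;

/* Hypothetical data: three records per patient, made-up scores */
DATA MNS2_DEMO;
   INPUT PATID PSTATUS MATTOTAL NTOTAL;
   FORMAT PSTATUS PSTATFMT.;
   DATALINES;
24 1  99 100
24 2  97  85
24 3  97  85
28 1  95  90
28 2  87  85
28 3  98  90
;
RUN;

/* BY processing requires the data to be sorted by the BY variable */
PROC SORT DATA=MNS2_DEMO;
   BY PSTATUS PATID;
RUN;

PROC PRINT DATA=MNS2_DEMO;
   BY PSTATUS;                    /* one group per patient status       */
   ID PSTATUS;                    /* same variable: printed once, left  */
   VAR PATID MATTOTAL NTOTAL;
RUN;
```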
Example – The following example uses BY statement and ID statement together, and groups the
POST-OP 24 97 85
28 87 85
60 100 100
65 98 90
74 97 95
FOLLOW-UP 24 97 85
28 98 90
60 100 100
65 100 100
74 100 100
Note the variable listed in both the BY and ID statements is listed to the left, and written only once
There are several more options to control printing, including line spacing (e.g., double or single),
• See the PRINT procedure in the SAS Procedures Guide or use the online HELP.
The WHERE statement instructs SAS to perform its task on a selected set of observations.
• WHERE is used to select which observations will be used in the procedure being performed.
• For example, if we are interested in listing only data for the pre-operative assessment, this
can be done by creating a subset data file – using a data step to take a subset of data with
PSTATUS=1, and then using this new data set in PROC PRINT.
• Alternatively, PROC PRINT can be used with a WHERE statement, with the condition:
WHERE PSTATUS=1;
• Note - In the following example, notice the difference in observation numbers that appear on
Note that the observation numbers differ. In this case, the data file MNS2 had been previously
sorted by patient id and patient status, so that the WHERE statement selected every third
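The two approaches described in this section, a subset DATA step versus a WHERE statement, can be sketched as follows. The data set MNS2_DEMO and its values are hypothetical; only the variable PSTATUS and the condition PSTATUS=1 come from the notes.

```sas
/* Hypothetical data, already in patient id / patient status order */
DATA MNS2_DEMO;
   INPUT PATID PSTATUS MATTOTAL NTOTAL;
   DATALINES;
24 1  99 100
24 2  97  85
24 3  97  85
28 1  95  90
28 2  87  85
28 3  98  90
;
RUN;

/* Approach 1: build a subset data set, then print it.        */
/* Observations in PREOP are renumbered starting from 1.      */
DATA PREOP;
   SET MNS2_DEMO;
   IF PSTATUS = 1;                /* subsetting IF keeps pre-op only */
RUN;

PROC PRINT DATA=PREOP;
RUN;

/* Approach 2: select within the procedure.                   */
/* The Obs column shows positions in the full data set.       */
PROC PRINT DATA=MNS2_DEMO;
   WHERE PSTATUS = 1;
RUN;
```

With this illustrative data the second listing reports observations 1 and 4, because the selected records keep their original observation numbers.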
There are four procedures that provide basic descriptive statistics for continuous variables.
• These are MEANS, SUMMARY, UNIVARIATE, and TABULATE. The SAS Procedures Guide
• The procedures differ in the choice of statistics that can be produced. They also differ in the
formatting of results.
• MEANS, SUMMARY and UNIVARIATE can be used to create output datasets containing
summary statistics that can be used in other procedures, such as PROCs PRINT and
REPORT, which give many options for controlling the formatting of the data.
• PROCs MEANS and SUMMARY can be used to compute means, minimums, quantiles,
maximums, standard deviations and standard errors, range, number of missing values,
• CLASS and/or BY statements can be used to compute the statistics separately for subgroups
of observations.
• For example, when computing statistics on subject AGE, using the statement BY SEX; would
provide separate statistics on AGE for males and females, for a data set previously sorted
• CLASS statements produce separate statistics for subgroups, along with overall statistics for
the whole group. For example using the statement CLASS SEX; would produce statistics
on AGE for all subjects, as well as for males and females separately.
• Note - Use of a CLASS statement does not require a prior sort of the data.
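The BY and CLASS approaches above can be compared with a sketch like the one below. The data set COHORT_DEMO and its values are made up; SEX and AGE are the variable names used in the text.

```sas
/* Hypothetical cohort data */
DATA COHORT_DEMO;
   INPUT SEX $ AGE;
   DATALINES;
F 61
M 58
F 70
M 66
F 49
M 73
;
RUN;

/* BY: data must be sorted first; one block of output per sex */
PROC SORT DATA=COHORT_DEMO;
   BY SEX;
RUN;

PROC MEANS DATA=COHORT_DEMO N MEAN STD MAXDEC=1;
   BY SEX;
   VAR AGE;
RUN;

/* CLASS: no sort needed. The output data set AGESTATS holds   */
/* an overall row (_TYPE_=0) plus one row per sex (_TYPE_=1).  */
PROC MEANS DATA=COHORT_DEMO N MEAN STD MAXDEC=1;
   CLASS SEX;
   VAR AGE;
   OUTPUT OUT=AGESTATS N=N_AGE MEAN=MEAN_AGE STD=STD_AGE;
RUN;

PROC PRINT DATA=AGESTATS NOOBS;
RUN;
```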
The primary difference between PROC MEANS and PROC SUMMARY is in the defaults for
printing.
• PROC MEANS, by default, provides results in the output window, although an output data set
• PROC SUMMARY, by default, produces only an output data set, although results can be
The primary difference between CLASS and BY statements is that the format for printing is
different.
Each of the procedures, MEANS and SUMMARY, produces a default set of statistics; however, you
• This gives you control of the order the statistics appear in on the output, when the statistics
are printed.
• “How to” - Statistics are requested on the PROC statement, before the first semi-colon.
• See the SAS Procedures Manual or the online documentation for details of available
statistics.
• This example produces summary statistics on the neurologic assessment scales using
• Creation of an output data set, subsequently printed with PROC PRINT is also illustrated.
• TIP - Use the MAXDEC option to control decimal places printed in the output! This is
illustrated in the example. If you don’t use the MAXDEC option, the default is 8 places after
the decimal point – and no one should have to look at that much nonsense.
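The points above (statistics named on the PROC statement, MAXDEC, and an output data set printed afterwards) can be put together in a sketch such as the following. It is an illustration, not the course's program: MNS2_DEMO, MNSTATS and the data values are hypothetical, while the output variable names follow the column headings shown later in these notes.

```sas
/* Hypothetical data: two patients, three assessments each */
DATA MNS2_DEMO;
   INPUT PATID PSTATUS MATTOTAL NTOTAL;
   DATALINES;
24 1  99 100
24 2  97  85
24 3  97  85
28 1  95  90
28 2  87  85
28 3  98  90
;
RUN;

/* Statistics are printed in the order listed, before the first semicolon */
PROC MEANS DATA=MNS2_DEMO N MIN MAX MEAN STD STDERR MAXDEC=2;
   CLASS PSTATUS;
   VAR MATTOTAL NTOTAL;
   /* names after each keyword pair up with the VAR list, in order */
   OUTPUT OUT=MNSTATS MEAN=MEANMAT MEANN
                      STD=STDMAT STDN
                      STDERR=SE_MAT SE_N
                      MIN=MINMAT MINN
                      MAX=MAXMAT MAXN;
RUN;

/* MAXDEC does not apply to the output data set, so format it here */
PROC PRINT DATA=MNSTATS NOOBS;
   FORMAT MEANMAT MEANN STDMAT STDN SE_MAT SE_N 8.2;
RUN;
```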
*************************************************************;
** get means of neurologic scores by patient status **;
** do this with PROC MEANS and SUMMARY to show options **;
*************************************************************;
RUN;
N Obs Variable Label N Minimum Maximum Mean Std Dev Std Error
------------------------------------------------------------------------------------
32 MATTOTAL MATHEW TOTAL SCORE 32 95.00 100.00 99.03 1.37 0.24
NTOTAL NEUROLOGICAL TOTAL 32 90.00 100.00 98.43 3.22 0.57
------------------------------------------------------------------------------------
N Obs Variable Label N Minimum Maximum Mean Std Dev Std Error
------------------------------------------------------------------------------------
31 MATTOTAL MATHEW TOTAL SCORE 31 73.00 100.00 96.48 5.73 1.03
NTOTAL NEUROLOGICAL TOTAL 31 80.00 100.00 94.67 6.31 1.13
------------------------------------------------------------------------------------
N Obs Variable Label N Minimum Maximum Mean Std Dev Std Error
------------------------------------------------------------------------------------
28 MATTOTAL MATHEW TOTAL SCORE 28 94.00 100.00 98.39 1.39 0.26
NTOTAL NEUROLOGICAL TOTAL 28 85.00 100.00 96.42 4.48 0.84
-----------------------------------------------------------------------------------
************************************************************;
** repeat, using PROC SUMMARY with a class statement **;
************************************************************;
* name input data set, & print results *;
* to output window *;
PROC SUMMARY DATA=MNS2 PRINT MAXDEC=2
N MEAN STD STDERR MIN MAX ; /* name statistics*/
SUMMARY STATISTICS
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
USING SUMMARY WITH A CLASS STATEMENT
PSTATUS _TYPE_ _FREQ_ MEANMAT MEANN STDMAT STDN SE_MAT SE_N MINMAT MINN MAXMAT MAXN
• MEANS produces results to the output window, by default, and this is shown, along with a
• SUMMARY produces overall statistics for all observations, in addition to subgroup statistics,
• In this example, the overall statistics are not particularly meaningful, so the advantages of
• Explanation of the _TYPE_ Variable - The _TYPE_ variable is created by SAS. It
is produced by both MEANS and SUMMARY when an output dataset is requested. _TYPE_
identifies which CLASS variables define the subgroup for each output observation. 0 indicates
overall statistics. With a single CLASS variable, 1 indicates the subgroups of that variable.
When several variables are used in a CLASS statement, such as CLASS SEX PSTATUS; each
CLASS variable contributes one binary digit to _TYPE_, with the last-named variable in the
rightmost position. Statistics for females pre-op / females post-op / females follow-up (and
likewise for males) then have _TYPE_=3, the full two-way breakdown. Statistics for all females
and for all males have _TYPE_=2, and statistics for all pre-op, all post-op, and all follow-up
observations have _TYPE_=1.
• Again, many options are available for controlling the ways in which variable groups and
subgroups are defined, which makes these procedures very powerful for summarizing data.
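To see the _TYPE_ values for a two-variable CLASS statement, a small sketch like this one can be run and its output data set inspected. The data set SCORES_DEMO, the output names, and all values are hypothetical.

```sas
/* Hypothetical data: one score per sex and patient status */
DATA SCORES_DEMO;
   INPUT SEX $ PSTATUS NTOTAL;
   DATALINES;
F 1 100
F 2  85
F 3  90
M 1  98
M 2  80
M 3  95
;
RUN;

PROC SUMMARY DATA=SCORES_DEMO;
   CLASS SEX PSTATUS;
   VAR NTOTAL;
   OUTPUT OUT=BYTYPE MEAN=MEANN;
RUN;

/* Expect _TYPE_=0 (overall), 1 (PSTATUS only), 2 (SEX only), */
/* and 3 (SEX by PSTATUS) in the listing                      */
PROC PRINT DATA=BYTYPE NOOBS;
RUN;

/* NWAY keeps only the highest _TYPE_, here the SEX by PSTATUS rows */
PROC SUMMARY DATA=SCORES_DEMO NWAY;
   CLASS SEX PSTATUS;
   VAR NTOTAL;
   OUTPUT OUT=CELLS MEAN=MEANN;
RUN;
```

When the overall and marginal rows are not meaningful, the NWAY form avoids having to filter on _TYPE_ in a later step.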
PROC UNIVARIATE also produces descriptive statistics for continuous numeric variables, along
with greater detail (including selected graphical descriptions) on the distribution of the variables.
• UNIVARIATE can be used to produce percentiles, such as the 10th and 90th percentiles
(or any other percentile you desire). There are several options for computing percentiles.
• Tests for normality of the distribution of the data are also available, along with a normal
• The five smallest and five largest values can also be identified by an ID variable – which is
useful when identifying cases with outliers. The number and percent missing values for a
• UNIVARIATE can also be used with a BY statement, for previously sorted data, to produce
plots; these allow you to compare visually the distribution of groups on a variable of interest.
• WARNING !!! PROC UNIVARIATE can take a lot of time, and produce tons of pages of
output. This is especially true if you are not careful in defining your variable list, or you use
• WARNING or TIP ??? (you decide) PROC UNIVARIATE will produce statistics for the
group (or groups) defined by missing values for the BY variable. This is because missing
• Along with producing output, PROC UNIVARIATE can be used to produce an output data
set. You can specify any set of percentile values to be included in the output data set can
Example - The example that follows uses data from a study of peri-operative beta blocker use in
surgical patients.
Moments
Quantiles (Definition 5)
Quantile Estimate
100% Max 93
99% 92
95% 85
90% 79
75% Q3 72
50% Median 62
25% Q1 47
10% 40
5% 35
1% 25
0% Min 21
Extreme Observations
----------Lowest--------- ---------Highest---------
Value COUNTER Obs Value COUNTER Obs
21 1 1 87 18 18
25 84 84 87 27 27
25 8 8 88 102 102
27 109 109 92 65 65
31 75 75 93 9 9
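Output of the kind shown above (moments, quantiles, and extreme observations identified by an ID variable) could be requested with a step like the following sketch. The program that produced the course output is not shown in these notes, so everything here is an assumption: the data set BB_DEMO, the analysis variable AGE, the output names, and the data values are hypothetical. Only the ID variable name COUNTER is taken from the output.

```sas
/* Hypothetical data: COUNTER identifies each case */
DATA BB_DEMO;
   INPUT COUNTER AGE;
   DATALINES;
1 21
2 47
3 62
4 72
5 55
6 68
7 93
8 40
9 79
10 35
;
RUN;

PROC UNIVARIATE DATA=BB_DEMO NORMAL PLOT;  /* normality tests and plots */
   VAR AGE;                       /* name variables explicitly to limit output */
   ID COUNTER;                    /* labels the extreme observations           */
   /* 10th and 90th percentiles go to variables P10 and P90 */
   OUTPUT OUT=AGE_PCT N=N_AGE MEDIAN=MED_AGE PCTLPTS=10 90 PCTLPRE=P;
RUN;

PROC PRINT DATA=AGE_PCT NOOBS;
RUN;
```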
When a BY statement is used, the same output is generated separately for each group defined by
the BY variable.
• Note - The separate statistics for the 2 gender groups are not shown in the output that
PROC TABULATE is initially confusing to use, but bear with it; it is a powerful tool for
producing nicely formatted tables of descriptive statistics for groups and subgroups of classification
variables.
• In particular, PROC TABULATE can be used to produce formatted tables that can be
• Moreover, while all of the statistics available in TABULATE (plus more) can be produced in
• Counts and percentages for categorical variables can also be reported using PROC
TABULATE.
• TIP - PROC TABULATE can also be used to print tables of results from other
There is a higher start up learning time for using PROC TABULATE than the other procedures that
• The reason is – once you’ve survived the learning time, you are later spared the task of
copying numbers, or repetitive cut and paste from crudely formatted output into tables for a
report. This is especially true for summary reports that are produced at regular intervals
throughout a study.
PROC TABULATE requires the specification of CLASS (categorical) variables used to form groups
and subgroups, and continuous numeric analysis variables (identified on a VAR statement) for
• A TABLE statement is used to define the rows, columns, and pages of a TABLE, along with
• LABEL and FORMAT statements can be used to provide more descriptive information for
• KEYLABEL can be used to provide more descriptive row and column titles for the statistics
requested. For example in the place of N as a column heading for the number of
observations, a KEYLABEL statement would allow you to use the phrase "NO. OF OBS".
separate manual devoted entirely to PROC TABULATE. Description of the options is also
Following are a few examples of PROC TABULATE. They illustrate some of the different ways of
Example -
• The data for this example come from a study of functional status outcome six months post-
post-operatively.
cardiac catheterization patients with respect to their change over time in functional status.
Functional status was assessed using 2 scores: physical functioning and mental
functioning.
• In the tables that follow, summary statistics are printed for change in physical and mental
function scores. These summary statistics are reported for (1) diabetic patients, (2) non-
diabetic patients, (3) groups defined by age group, and (4) the entire study cohort.
• The examples that follow serve to illustrate the control in table formatting that is available.
TIP - While it is possible to name and create several tables in a single PROC TABULATE
procedure (before the RUN; statement), it is recommended that you request separate TABLE
• Only one set of titles can be specified for a procedure -- to produce new titles for a new table,
• Table rows - statement factors listed before a comma (,) define the table rows.
• Table columns – statement factors listed after the comma (,) define the table columns.
• How to format the printing of a statistic – To accomplish formatting, the statistic name is
(1) For example, MEAN*F=8.2 requests that the mean be printed, using 8 columns
(including 1 for the decimal point) with 2 of the 8 after the decimal point (i.e.,
#####.##).
(2) Alternatively, a single format can be defined for printing all statistics, by using
the phrase FORMAT=8.2 on the PROC TABULATE statement (see final example).
• The keyword ALL is used to get overall statistics in addition to subgroup statistics (e.g., DIAB
+-----------------------+-----+--------+--------+--------+--------+
|                       |# of |        |        |        |        |
|                       | OBS |  Mean  |  Std   |  Min   |  Max   |
+-------+-------+-------+-----+--------+--------+--------+--------+
|MF2_1: |DIAB   |AGEGRO-|     |        |        |        |        |
|Mental |-------|UP     |     |        |        |        |        |
|Functi-|Nondia-|-------|     |        |        |        |        |
|on     |betic  |<65    |  345|    -0.2|   11.48|     -52|      32|
|Change |       |-------+-----+--------+--------+--------+--------|
|       |       |>=65   |  287|     0.5|    9.90|     -30|      28|
|       |       |-------+-----+--------+--------+--------+--------|
|       |       |TOTAL  |  632|     0.1|   10.79|     -52|      32|
|       |-------+-------+-----+--------+--------+--------+--------|
|       |Diabet-|AGEGRO-|     |        |        |        |        |
|       |ic     |UP     |     |        |        |        |        |
|       |       |-------|     |        |        |        |        |
|       |       |<65    |   79|    -0.2|   10.17|     -26|      20|
|       |       |-------+-----+--------+--------+--------+--------|
|       |       |>=65   |   98|     0.6|    8.85|     -18|      21|
|       |       |-------+-----+--------+--------+--------+--------|
|       |       |TOTAL  |  177|     0.2|    9.45|     -26|      21|
|       |-------+-------+-----+--------+--------+--------+--------|
|       |TOTAL  |AGEGRO-|     |        |        |        |        |
|       |       |UP     |     |        |        |        |        |
|       |       |-------|     |        |        |        |        |
|       |       |<65    |  424|    -0.2|   11.23|     -52|      32|
|       |       |-------+-----+--------+--------+--------+--------|
|       |       |>=65   |  385|     0.6|    9.64|     -30|      28|
|       |       |-------+-----+--------+--------+--------+--------|
|       |       |TOTAL  |  809|     0.1|   10.50|     -52|      32|
+-------+-------+-------+-----+--------+--------+--------+--------+
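A table with the layout shown above (analysis variable, DIAB, and AGEGROUP nested in the rows, statistics in the columns) could come from a step like the following sketch. The variable names MF2_1, PF2_1, DIAB and AGEGROUP appear in the course output; the data set FSTAT_DEMO, the formats, the numeric codings, and the data values are hypothetical.

```sas
/* Assumed codings for the two classification variables */
PROC FORMAT;
   VALUE DIABFMT 0='Nondiabetic' 1='Diabetic';
   VALUE AGEFMT  1='<65'         2='>=65';
RUN;

/* Hypothetical change scores, two patients per cell */
DATA FSTAT_DEMO;
   INPUT DIAB AGEGROUP PF2_1 MF2_1;
   DATALINES;
0 1 10 -4
0 1  2  6
0 2  0  3
0 2  5 -2
1 1 -3  0
1 1  4 -5
1 2  1  2
1 2 -2  7
;
RUN;

PROC TABULATE DATA=FSTAT_DEMO;
   CLASS DIAB AGEGROUP;           /* variables that form the groups  */
   VAR MF2_1;                     /* continuous analysis variable    */
   /* before the comma: rows; after the comma: columns */
   TABLE MF2_1*(DIAB ALL)*(AGEGROUP ALL),
         N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.;
   FORMAT DIAB DIABFMT. AGEGROUP AGEFMT.;
   LABEL MF2_1 = 'MF2_1: Mental Function Change';
   KEYLABEL ALL='TOTAL' N='# of OBS';
RUN;
```

Swapping the two sides of the comma in the TABLE statement would put the statistics in rows and the subgroups in columns instead.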
KEYLABEL ALL=TOTAL
N='# of OBS';
SUMMARY STATISTICS FOR CHANGE SCORES
Example 2: statistics in rows and Subgroups in Columns
+-----------------------------------------+-----------------------------------------------------+--------------------------+
|                                         |                        DIAB                         |                          |
|                                         |--------------------------+--------------------------|                          |
|                                         |       Nondiabetic        |         Diabetic         |          TOTAL           |
|                                         |-----------------+--------+-----------------+--------+-----------------+--------|
|                                         |    AGEGROUP     |        |    AGEGROUP     |        |    AGEGROUP     |        |
|                                         |--------+--------|        |--------+--------|        |--------+--------|        |
|                                         |  <65   |  >=65  | TOTAL  |  <65   |  >=65  | TOTAL  |  <65   |  >=65  | TOTAL  |
|--------------------+--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|PF2_1: Physical     |# of OBS            |     345|     287|     632|      79|      98|     177|     424|     385|     809|
|Function Change     |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Mean                |     5.7|     2.2|     4.1|     1.6|     0.7|     1.1|     4.9|     1.8|     3.4|
|                    |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Std                 |   13.03|   11.57|   12.50|   12.16|   11.85|   11.96|   12.95|   11.64|   12.44|
|                    |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Min                 |     -29|     -28|     -29|     -28|     -28|     -28|     -29|     -28|     -29|
|                    |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Max                 |      41|      36|      41|      37|      36|      37|      41|      36|      41|
|--------------------+--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|MF2_1: Mental       |# of OBS            |     345|     287|     632|      79|      98|     177|     424|     385|     809|
|Function Change     |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Mean                |    -0.2|     0.5|     0.1|    -0.2|     0.6|     0.2|    -0.2|     0.6|     0.1|
|                    |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Std                 |   11.48|    9.90|   10.79|   10.17|    8.85|    9.45|   11.23|    9.64|   10.50|
|                    |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Min                 |     -52|     -30|     -52|     -26|     -18|     -26|     -52|     -30|     -52|
|                    |--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------|
|                    |Max                 |      32|      28|      32|      20|      21|      21|      32|      28|      32|
+--------------------+--------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
KEYLABEL ALL=TOTAL
N='# of OBS';
RUN;
+-----------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------+
|                                         |                                     AGEGROUP                                      |                                         |
|                                         |-----------------------------------------+-----------------------------------------|                                         |
|                                         |                   <65                   |                  >=65                   |                  TOTAL                  |
|                                         |-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|                                         |# of |        |        |        |        |# of |        |        |        |        |# of |        |        |        |        |
|                                         | OBS |  Mean  |  Std   |  Min   |  Max   | OBS |  Mean  |  Std   |  Min   |  Max   | OBS |  Mean  |  Std   |  Min   |  Max   |
|--------------------+--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|PF2_1: Physical     |DIAB                |     |        |        |        |        |     |        |        |        |        |     |        |        |        |        |
|Function Change     |--------------------|     |        |        |        |        |     |        |        |        |        |     |        |        |        |        |
|                    |Nondiabetic         |  345|     5.7|   13.03|     -29|      41|  287|     2.2|   11.57|     -28|      36|  632|     4.1|   12.50|     -29|      41|
|                    |--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|                    |Diabetic            |   79|     1.6|   12.16|     -28|      37|   98|     0.7|   11.85|     -28|      36|  177|     1.1|   11.96|     -28|      37|
|                    |--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|                    |TOTAL               |  424|     4.9|   12.95|     -29|      41|  385|     1.8|   11.64|     -28|      36|  809|     3.4|   12.44|     -29|      41|
|--------------------+--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|MF2_1: Mental       |DIAB                |     |        |        |        |        |     |        |        |        |        |     |        |        |        |        |
|Function Change     |--------------------|     |        |        |        |        |     |        |        |        |        |     |        |        |        |        |
|                    |Nondiabetic         |  345|    -0.2|   11.48|     -52|      32|  287|     0.5|    9.90|     -30|      28|  632|     0.1|   10.79|     -52|      32|
|                    |--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|                    |Diabetic            |   79|    -0.2|   10.17|     -26|      20|   98|     0.6|    8.85|     -18|      21|  177|     0.2|    9.45|     -26|      21|
|                    |--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------|
|                    |TOTAL               |  424|    -0.2|   11.23|     -52|      32|  385|     0.6|    9.64|     -30|      28|  809|     0.1|   10.50|     -52|      32|
+--------------------+--------------------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+--------+--------+--------+
Additional options in TABULATE let you define if and where you want lines in the tables.
• For example you may choose not to have vertical or horizontal separators. This is
useful for producing tables for publication – some journals ask that tables be
The next example illustrates reporting counts (N) and percentages (PCTN) using PROC
TABULATE.
• In this example a single format is defined for the whole table, on the PROC line, rather
than a different format for each statistic. This works here, because counts will always
be whole numbers, and I’m content to report percents rounded to a whole percent.
• How to get column percents - Column percents are defined by listing the ROW
• How to get row percents - To get row percents, the Column variable (DIAB in this
case) would be listed (e.g., PCTN<DIAB ALL>) instead of the row variables.
• The KEYLABEL statement is used to replace the word ‘ALL’ on the output with the
word ‘TOTAL’.
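The count-and-percent table described above can be sketched as follows, with AGEGROUP in the rows and DIAB in the columns. The data set FSTAT_DEMO, the formats, the codings, and the data values are hypothetical; the variable names and the PCTN<...> denominator syntax follow the notes.

```sas
/* Assumed codings for the two classification variables */
PROC FORMAT;
   VALUE DIABFMT 0='Nondiabetic' 1='Diabetic';
   VALUE AGEFMT  1='<65'         2='>=65';
RUN;

/* Hypothetical data: one record per patient */
DATA FSTAT_DEMO;
   INPUT DIAB AGEGROUP @@;
   DATALINES;
0 1  0 1  0 1  0 2  0 2  1 1  1 2  1 2
;
RUN;

/* FORMAT=8. on the PROC line applies one format to every cell */
PROC TABULATE DATA=FSTAT_DEMO FORMAT=8.;
   CLASS DIAB AGEGROUP;
   /* Column percents: the denominator lists the ROW variables.  */
   /* For row percents, use PCTN<DIAB ALL> instead.              */
   TABLE AGEGROUP ALL,
         (DIAB ALL)*(N PCTN<AGEGROUP ALL>);
   FORMAT DIAB DIABFMT. AGEGROUP AGEFMT.;
   KEYLABEL ALL='TOTAL' N='# of OBS' PCTN='Percent';
RUN;
```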
PROC FREQ produces frequency table summaries of the distributions of discrete numeric or character
variables.
• Counts and percentages are produced for each group defined by variable values, or by the crossing of
variable values.
• Chi-square tests and other measures of association can also be produced by PROC FREQ (these will
• PROC FREQ can also be used to produce an output data set in addition to, or in place of results in the
output window.
• To produce one-way tables, simply list the variable(s) on the TABLES statement separated by
spaces.
• To get cross-tabulations, name the variables separated by an asterisk (i.e., VAR1 * VAR2).
To define rows - The first named variable (var1) defines the rows.
To define columns - The second named variable (var2) defines the columns.
• You can list any combination of individual variables, cross-tabulations and multi-way tables on a single
TABLES statement. Multiple TABLES statements can be given in the same procedure.
• TIP - Only one set of titles may be given for all the tables requested in a PROC FREQ. Thus, titles
may be the limiting factor in the number of tables requested in a single procedure.
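The TABLES requests described above can be sketched in a few lines. The variable names SEX, AGEGR and HSMOKE are borrowed from examples elsewhere in these notes; the data set FREQ_DEMO and its values are hypothetical.

```sas
/* Hypothetical data: one record per subject */
DATA FREQ_DEMO;
   INPUT SEX $ AGEGR HSMOKE @@;
   DATALINES;
F 1 0  F 1 1  F 2 0  F 2 0
M 1 1  M 1 0  M 2 1  M 2 1
;
RUN;

PROC FREQ DATA=FREQ_DEMO;
   TABLES SEX AGEGR;              /* two one-way tables             */
   TABLES AGEGR*HSMOKE;           /* rows = AGEGR, columns = HSMOKE */
   TITLE1 'Illustrative frequency tables';  /* shared by all tables */
RUN;
```

If each table needs its own title, one option is to split the requests across separate PROC FREQ steps.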
There are shorthand methods for listing variables in the TABLES statement.
• If a set of variables has the same prefix with sequential numbering, name the first and last variable,
TABLES score1-score10;
would produce frequency tables for all ten variables, score1 through score10.
• For variables without a common prefix, list the first and last in order as they appear in a data set
(use PROC CONTENTS POSITION to know correct order), separated by a double hyphen, no
spaces. All the variables listed in this manner must be of the same type, character or numeric. For
TABLES age--insur;
would produce tables for age, insur, and all variables in order between them in the data set, as long
• For example if you want cross-tabulations of sex by a whole set of variables use
• WARNING!! Be careful requesting a group of variables crossed by another group – listing (var1--varn)
* (varlist1--varlistn) – you may end up with a lot more tables than you bargained for!
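The variable-list shorthands above can be tried on a small data set. In this sketch the data set SURVEY_DEMO, the variables AGEGR, EDUC, INSUR and SCORE1 to SCORE3, and all values are hypothetical.

```sas
/* Hypothetical data; variable order is the order on the INPUT statement */
DATA SURVEY_DEMO;
   INPUT SEX $ AGEGR EDUC INSUR SCORE1-SCORE3;
   DATALINES;
F 1 2 1 3 4 5
M 2 1 2 2 2 4
F 2 3 1 5 5 5
M 1 2 2 1 3 2
;
RUN;

/* Confirm the position of each variable before using a -- list */
PROC CONTENTS DATA=SURVEY_DEMO POSITION;
RUN;

PROC FREQ DATA=SURVEY_DEMO;
   TABLES SCORE1-SCORE3;          /* numbered range: SCORE1, SCORE2, SCORE3 */
   TABLES AGEGR--INSUR;           /* positional range: AGEGR, EDUC, INSUR   */
   TABLES SEX*(AGEGR--INSUR);     /* SEX crossed with each of the three     */
RUN;
```

The last request produces three cross-tabulations; crossing two lists multiplies the number of tables in the same way.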
• For multi-way tables, (i.e., A*B*C), the first variable defines the table, the second the rows, and the third
would produce two cross-tabulations, one table for males and one for females, of age group against
• An alternative way to achieve the result would be to request tables of AGEGR * HSMOKE, using a
separate BY SEX statement. This requires that the data set be previously sorted by sex, and would
produce different statistics, should you be doing analyses as well as summary tables (e.g., different
chi-square tests).
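The two ways of getting one AGEGR by HSMOKE table per sex can be placed side by side in a sketch like this. The data set SMOKE_DEMO and the cell counts are made up, and the WEIGHT statement is used only to keep the illustrative data short. The CMH option on the first request is one example of a statistic that pools across the strata of a multi-way table, something BY-group processing does not do.

```sas
/* Hypothetical cell counts: one record per SEX*AGEGR*HSMOKE cell */
DATA SMOKE_DEMO;
   INPUT SEX $ AGEGR HSMOKE COUNT;
   DATALINES;
F 1 0 30
F 1 1 10
F 2 0 25
F 2 1 15
M 1 0 20
M 1 1 20
M 2 0 18
M 2 1 22
;
RUN;

/* Multi-way request: SEX defines the tables (strata) */
PROC FREQ DATA=SMOKE_DEMO;
   WEIGHT COUNT;                  /* each record stands for COUNT subjects */
   TABLES SEX*AGEGR*HSMOKE / CHISQ CMH;
RUN;

/* BY-group request: needs sorted data; each sex analyzed separately */
PROC SORT DATA=SMOKE_DEMO;
   BY SEX;
RUN;

PROC FREQ DATA=SMOKE_DEMO;
   WEIGHT COUNT;
   BY SEX;
   TABLES AGEGR*HSMOKE / CHISQ;
RUN;
```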
Missing values are NOT included in frequency tables unless the options MISSPRINT or MISSING are
used.
• Options are listed in the TABLES statement and appear after a slash (/).
• This example uses data from a study of patients transferred from midwifery care during the course of a
pregnancy. One reason patients left midwifery care was early loss. If a woman suffered an early loss,
more detail about the loss was requested. Thus, for other women this question was not applicable.
• Two questions of interest in this study might be: 1) What proportion of all study women had a
therapeutic abortion? 2) What proportion of early pregnancy losses were therapeutic abortions?
• Recall that options are listed on a TABLES statement following a slash; the slash appears after the
listing of variables.
• The default leaves missing values out of the table altogether. The MISSPRINT option includes
these in the table, so that the difference between those for whom the question is not applicable, and
• Note that the WHERE statement is used in this example; this is done so that the PROC FREQ
***********************************************************;
* example to illustrate different missing options in FREQ *;
* 1st define formats, including formats for missing codes *;
***********************************************************;
PROC FORMAT;
VALUE EARLYFMT 1='TAB'
2='SAB <11 WEEKS'
3='SAB 12-19 WEEKS'
4='SAB 20-24 WEEKS'
.='MISSING'
.N='NOT APPLICABLE';
RUN;
** with and without missprint and missing options **;
PROC FREQ DATA=CNMT;
FORMAT EARLYLOS EARLYFMT.;
TABLES EARLYLOS;
TITLE1 'CNM TRANSFER STUDY';
TITLE2 'DEFAULT OPTION FOR MISSING VALUES';
RUN;
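The remaining variations on this call, which use MISSPRINT, MISSING, and MISSING combined with a WHERE statement, might look like the sketch below. It assumes the CNMT data set and EARLYFMT format defined above; the TITLE2 text and the exact WHERE condition are assumptions chosen to match the description of "patients with applicable data".

```sas
* MISSPRINT: missing values are shown in the table but excluded from percentages *;
PROC FREQ DATA=CNMT;
  FORMAT EARLYLOS EARLYFMT.;
  TABLES EARLYLOS / MISSPRINT;
  TITLE2 'MISSPRINT OPTION';
RUN;

* MISSING: missing values are shown and are included in the percentages *;
PROC FREQ DATA=CNMT;
  FORMAT EARLYLOS EARLYFMT.;
  TABLES EARLYLOS / MISSING;
  TITLE2 'MISSING OPTION';
RUN;

* WHERE plus MISSING: drop the not-applicable patients, keep the ordinary missing values *;
PROC FREQ DATA=CNMT;
  WHERE EARLYLOS NE .N;   /* assumed condition: exclude the special missing value .N */
  FORMAT EARLYLOS EARLYFMT.;
  TABLES EARLYLOS / MISSING;
  TITLE2 'MISSING OPTION, EARLY LOSS PATIENTS ONLY';
RUN;
```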
• In the first two tables the computed percentages are identical, but use of the MISSPRINT option
distinguishes the not applicable (did not have an early loss) from those with missing information.
From these we can see, for example that of the patients with an early loss of known type, 19.6
• The third table uses the MISSING option, which includes missing values in computation of percentages.
From this table we can see that of all the transferred patients, 3.2 percent had therapeutic
abortions.
• The final table uses the WHERE statement as well as the MISSING option, to create a table only on
patients with applicable data – in this case only those with early loss. Since 6 patients had early loss
of unknown type or age, these can now be included in the table to show that among the patients with
an early loss, 9.7% were of unknown type/age, and that among all patients with early loss, 17.7%
• TIP - Which of these options to use depends on what you want to know, since they differ in which
observations are shown in the table and which are counted in the percentages.
• NOCUM suppresses printing of cumulative frequencies and percentages. This is especially appropriate
for nominal data, inasmuch as cumulative frequencies and cumulative percentages don’t make much
sense.
• NOFREQ suppresses printing of cell counts, NOROW, NOCOL, and NOPERCENT suppress row,
• The first example produces two tables, trimester by age group, and trimester by earlylos.
• The second TABLES statement produces separate tables of age group by earlylos for each level of
trimester. Note that there is no table for the third trimester, since by definition no one in the third
.N='NOT APPLICABLE';
RUN;
** cross-tabulations **;
PROC FREQ DATA=CNMT;
FORMAT AGEGR AGEFMT. EARLYLOS EARLYFMT.;
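A complete call consistent with the description of the two TABLES statements might look like the sketch below. TRIMEST is an assumed name for the trimester variable, and AGEFMT. and EARLYFMT. are assumed to have been defined in a preceding PROC FORMAT step.

```sas
PROC FREQ DATA=CNMT;
  FORMAT AGEGR AGEFMT. EARLYLOS EARLYFMT.;
  * two tables: trimester by age group, and trimester by early loss *;
  TABLES TRIMEST*(AGEGR EARLYLOS);   /* TRIMEST is a hypothetical variable name */
  * one table of age group by early loss for each level of trimester *;
  TABLES TRIMEST*AGEGR*EARLYLOS;
RUN;
```

The parenthesized list in the first TABLES statement is shorthand for TRIMEST*AGEGR TRIMEST*EARLYLOS.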
Frequency Missing = 15
Frequency Missing = 77
Proc FORMS can be used to print mailing labels, file cards – any printer forms that have a regular pattern,
• Beware! It may take some trial and error (with settings) to obtain the exact look (e.g. spacing)
that you are after. In some instances, it may be easier to export your data from SAS to ACCESS,
and use the special features that allow you to specify the mailing label format name, so spacing is
• There are options within the procedure to define the form dimensions, spacing, indentation, number of
units per page, and more. An example for printing mailing labels is shown.
• An alternative approach to producing “forms” is to use a PUT statement in a DATA _NULL_ step.
When you are printing combinations of text and variable values, this can sometimes be easier to use.
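As a minimal sketch of the DATA _NULL_ alternative, the step below writes one three-line label per observation. It assumes the MAIL data set and the variables NAME, ADDR1, ADDR2 and ZIP used in the PROC FORMS example in these notes; the layout is illustrative only.

```sas
data _null_;
  set mail;              /* one observation per mailing label */
  file print notitles;   /* write to the procedure output, without titles */
  put name               /* line 1 of the label */
    / addr1              /* the slash moves to a new line */
    / addr2 zip          /* city and state, followed by the zip code */
    //;                  /* leave blank lines before the next label */
run;
```

Literal text can be mixed in simply by adding quoted strings to the PUT statement.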
The PROC FORMS statement names the data file to use, followed by a series of options to control page and
line size, number of forms to print down and across a page, number of lines to skip between forms (in this
• LINE statements give a line number, followed by the variable names to be printed on the line. Options
for printing follow the slash (/). This example uses the LASTNAME feature to reorder a name given
as last, first. The option puts the text after a comma first, followed by the text before the comma, and
the comma is not printed. Alternatively, the first and last names could have been read in as separate
variables.
• The option PACK removes extra spaces that would appear, if a character variable doesn’t use all of the
available character variable length – as shown in the second example, without the options.
• Note that the statement TITLE1; must be used to remove all titles.
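A call that uses the options described above might look like the following sketch. The option values are illustrative only, and the option names (WIDTH=, LINES=, SKIP=, NACROSS=, BETWEEN=) are given as assumptions that should be checked against the SAS Procedures Guide before use; the data set and variables are those of the example without options.

```sas
* with options (values are illustrative only) *;
title1;                        /* remove all titles */
proc forms data=mail
           width=30            /* characters per form unit (assumed value) */
           lines=3             /* lines per form unit */
           skip=2              /* lines to skip between forms */
           nacross=2           /* form units across the page */
           between=4;          /* spaces between adjacent form units */
  line 1 name / pack lastname; /* reorder a name stored as last, first */
  line 2 addr1;
  line 3 addr2 zip / pack;     /* remove extra blanks between values */
run;
```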
* without options *;
proc forms data=mail;
line 1 name ;
line 2 addr1 ;
line 3 addr2 zip;
run;
List of subjects
Brenda K. Abbott
568 Trillion Ct.,
Denver, CO 80237
Juan Rodriquez
619 Powell Dr.,
Charleston, SC 29412
Mary K. Stevenson
22 Meredith Blvd.,
Austin, TX 78702
Patrick E. Hawks
Rt. 1, Box 523,
Taylorsville, NC 28681
Chen Lee
123 Maple St.
Raleigh, NC 27606
Joseph M. Weinstein
Rt. 4, Box 466,
Dixon, IL 61021
Bonnie G. Baskowshi
P.O. Box 42,
Sacramento, CA 85841
PROC REPORT is another PROC that is worth the time and effort required to learn.
• When used well, reports can be generated directly as SAS output that require little or no further editing
before presentation.
• PROC REPORT encompasses many of the features of Procs PRINT, MEANS and TABULATE.
• Its features allow the presentation of detail (individual observations) and summary data, incorporated
• The flexibility in reporting using PROC REPORT is a little better than that for PROC TABULATE.
However, both save later “cut and paste” work to create a project document.
• TIP – Consider using this procedure when you need to generate regular, multiple status reports. PROC
REPORT is handy inasmuch as, once a report has been designed, and the programming statements
• PROC REPORT also offers comprehensive control over fonts and over background and text colors
(for output to text or HTML files), as well as control over spacing, page breaks and other
formatting. PROC REPORT also has a feature akin to a PUT statement (the LINE statement);
this allows you to insert your own text in the report.
When using PROC REPORT, plan in advance the layout of your report.
• A report’s layout is largely determined by the designation of variables into various categories.
• For each variable, a DEFINE statement is used to designate the display category for the report:
DISPLAY – A row appears in the report for every observation. By default, character variables are
treated as DISPLAY variables (numeric variables default to ANALYSIS with the SUM statistic),
unless another designation is given.
ORDER – As with a DISPLAY variable, a row appears for every observation, but the rows are
ordered by the values of the variable.
ACROSS – Variables with this designation determine columns for the report – one for each
distinct value of the variable present in the input data. ACROSS variables are comparable to
CLASS variables in other procedures, but are used to define columns in PROC REPORT.
GROUP – Observations with the same value of the variable are consolidated into a single row
of the report, akin to using a CLASS variable in other procedures.
ANALYSIS – Numeric variables used to compute summary statistics for each cell of the report
defined by the GROUP and ACROSS designations.
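The designations above can be sketched with a summary report on the FD2 data set used in the example that follows. The variable names DIABETIC, AGEGR, SEX and PFCHANGE are hypothetical, chosen only to illustrate GROUP and ANALYSIS usage; this is not necessarily how the course example was written.

```sas
PROC REPORT DATA=FD2 NOWD HEADSKIP;
  /* DIABETIC, AGEGR, SEX and PFCHANGE are hypothetical variable names */
  COLUMN DIABETIC AGEGR SEX
         PFCHANGE=PFN PFCHANGE=PFMEAN PFCHANGE=PFMIN PFCHANGE=PFMAX;
  /* GROUP variables: one report row per combination of their values */
  DEFINE DIABETIC / GROUP 'DIABETIC/STATUS';
  DEFINE AGEGR    / GROUP 'AGE/GROUP';
  DEFINE SEX      / GROUP 'SEX';
  /* ANALYSIS usage: each alias requests one statistic on the same variable */
  DEFINE PFN      / ANALYSIS N    'N';
  DEFINE PFMEAN   / ANALYSIS MEAN 'PF/CHANGE/MEAN' FORMAT=8.2;
  DEFINE PFMIN    / ANALYSIS MIN  'PF/CHANGE/MIN'  FORMAT=8.2;
  DEFINE PFMAX    / ANALYSIS MAX  'PF/CHANGE/MAX'  FORMAT=8.2;
RUN;
```

The slash inside each quoted header is the default split character, so a header such as 'PF/CHANGE/MEAN' prints on three lines. Changing a GROUP variable to ACROSS would move its values from rows to columns.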
Example
• This example uses data from the study of change in functional status, 6 months post-cardiac
catheterization among diabetic and non-diabetic patients. This study was described in the
status.
* example part 1 *;
* use NOWD to suppress windows, headskip to skip line after header row*;
PROC REPORT DATA=FD2 NOWD HEADSKIP;
[Output for example part 1: report columns are DIABETIC STATUS, AGE GROUP, SEX, N, PF CHANGE MEAN,
PF CHANGE MIN, and PF CHANGE MAX; the data rows are not reproduced.]
[Output: report columns are DIABETIC STATUS, AGE GROUP, SEX, N, PF CHANGE MEAN, STD ERROR PF CHANGE,
and P-VALUE: MEAN=0; the data rows are not reproduced.]