
TITLE PAGE CONTENTS of the Course Introduction to STATA Getting Started with STATA

STATISTICAL ANALYSIS WITH STATA


Stat 325 - Statistical Computing

First Semester, SY 2010-2011

DLPolestico STAT 325

CONTENTS of the Course

1. Introduction to STATA
2. Getting Started With STATA
3. Data Management
4. Elementary Data Analysis
5. Multiple Regression
6. Curve Fitting and Logistic Regression
7. Multivariate Methods
8. Time Series Analysis

1.1 Features, Capabilities and Limitations of STATA

STATA is a statistical analysis package for standard and specialized estimation procedures. It is easy and intuitive to use once you get used to its jargon and formats. The syntax of its commands can be learned readily, perhaps more easily than that of other statistical packages. STATA has both a command-driven interface and a graphical user interface (GUI), that is, point-and-click features. With the command-driven interface, actions are set off by commands, which can be issued in either interactive or batch mode.

Interactive Mode

In interactive mode, users issue commands line by line and see the results after each command is issued. Earlier commands can be retrieved with the Page Up and Page Down keys or from the list of previous commands. In batch mode, users run a set of commands and have all the results saved to a file or shown on the screen. Whether commands are run in interactive or batch mode, documenting the data management and analysis is easy, in contrast to working with spreadsheets.
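As a minimal sketch of batch mode, the commands below could be saved in a do-file and run in one go by typing do session1. The file names session1.do and session1.log are assumptions chosen for illustration, and the data come from the auto example dataset installed with STATA.

```stata
* session1.do: hypothetical do-file name, used for illustration only
* Close any log that an earlier run may have left open
capture log close

* Keep a plain-text record of everything that follows
log using session1.log, replace text

* Load an example dataset installed with STATA and summarize two variables
sysuse auto, clear
summarize mpg weight

* Stop recording; session1.log now documents the whole run
log close
```

Typing the same commands one at a time in the Command window would give the same results interactively; the do-file simply makes the session repeatable.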

1.2 STATA Versions

STATA is offered in three versions: Small, Intercooled Stata, and Special Edition (SE).

SMALL Version
SMALL is the student version. It is limited to 99 variables and 1,000 observations (cases) but is otherwise complete in functionality.

Intercooled Version
The Intercooled version is the standard version. It supports up to 2,047 variables in a data set, with the number of observations limited by available RAM (technically, as many as 2.147 billion), because the entire data set is held in memory. Intercooled allows matrices of up to 800 rows or columns.

Special Edition
The Special Edition, or SE, version arose during the life of version 7.0 in response to users' needs to analyze much larger data sets. SE therefore allows significantly more variables in a data set (32,767) and supports larger matrices (up to 11,000 rows or columns).

1.3 Advantages of STATA over Other Software

STATA is designed for research. It fosters a lively exchange of ideas and experiences among its users, who come largely from academic, research and government institutions as well as international organizations, whereas other statistical software such as SPSS increasingly targets the business world.

It can incorporate a sample survey design into the estimation process.

It has a very powerful set of facilities for handling panel/longitudinal data, for survival analysis and for working with time series data.

STATA can be run on any 386 or higher PC with at least 8 MB of RAM and a math co-processor.

STATA has cross-platform compatibility. A dataset, graph or add-on program created using STATA on one operating system can be read without translation by STATA running on a different platform. If you change your platform, all your STATA data, commands and graphs will work on the new platform without the need for a data set translator.

STATA includes comprehensive graphics procedures designed to summarize the results of analyses and to assist in diagnostic checking when standard statistical models are used. Results can be readily copied to other programs, such as word processors, spreadsheets and presentation software.

Data can be comprehensively managed with STATA's versatile commands. Data can be readily imported from or exported to ASCII formats as well as spreadsheet formats.

2.1 Getting Started With STATA

To start a STATA session, do any one of the following:

1. Double-click the STATA icon on your desktop.
2. Find the Windows START button at the bottom left corner of the screen and click on it. Then click on PROGRAMS > STATA > STATASE (or Intercooled or Small).
3. Use Windows Explorer to navigate to the folder of your STATA version and find the executable file, e.g. wsestata. Double-click on it.

2.2 Four Main STATA Windows


Variables Window - displays the list of variables of the data set in active memory; found in the lower left-hand corner

Review Window - provides a list of past commands; found in the upper left-hand corner

Results Window - serves as the log window; it displays your output; found in the upper right-hand area

Command Window - where commands are typed or entered; found in the lower middle area

Some notes to remember!!!


A variable name in the Variables Window can be pasted into the Command Window or an active dialog box field by just clicking on the variable name.

Clicking a command in the Review Window pastes it into the Command Window, where you can edit it and then execute the edited command.

You may also scroll through past commands in the Review Window using the PgUp and PgDn keys.

STATA is case sensitive, so it is important to take note of when lower and upper case characters are used.

The STATA computing environment has a menu and a toolbar at the top, used to perform STATA operations, and a directory status bar at the bottom that shows the current directory.

STATA opens a Viewer window when online help is requested from the menu bar.

2.3 Menu and Toolbars

The menu and toolbar can be used to issue different STATA commands, although most of the time it is more convenient to use the STATA Command window to perform those tasks.

GUI
The Graphical User Interface lets users perform data management directly through the Data menu, run data analyses through the Statistics menu, and generate all sorts of graphs of the data set through the Graphics menu, all with point-and-click features.

2.4 Exiting from a STATA session

To leave a STATA session, you can do any of the following:


1. Click on FILE, then EXIT
2. Press the ALT and F4 keys simultaneously
3. Click on the Close (X) button at the upper right-hand corner of the screen, or
4. Type exit in the Command window

2.5 Obtaining HELP


If you know a STATA command and wish to obtain information on its syntax, type the help command in the Command window. The syntax is
    help command_name
e.g. help regress
Alternatively, using the menu, click on Help > Stata Command.

However, if you are looking for information related to a particular command or topic rather than its exact syntax, the search command is used instead. The syntax is
    search command_name_or_topic
e.g. search regress
Alternatively, using the menu, choose Help > Search.

Options

CONTENTS Option
The Contents option can be used by beginners unfamiliar with STATA commands, while the Search option can be used by users who know the name of the command or topic they wish to look up.

2.6 Reading a Pre-Existing STATA Dataset


STATA Datasets

Data files that can be read into STATA include worksheets, ASCII files and STATA datasets.

STATA datasets have the extension .dta, e.g. cars.dta

By default, STATA reads files from the folder data on drive C. Thus, to open a pre-existing STATA dataset in a different directory, STATA must be told to change directory using the command cd.

The use command tells STATA to read and load the specified file. The .dta extension need not be written. Type use, then the directory where the file is located, followed by the file name. Alternatively, from the menu, click File > Open.
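The steps above can be sketched as follows. This is only an illustration: the folder in the commented-out cd line and the file name cars_copy are hypothetical, and the practice file is created from the auto example dataset so that the commands can be run as they stand.

```stata
* Show the directory STATA currently reads from and writes to
pwd

* To switch folders, give cd your own path. The path below is a
* hypothetical placeholder, so the line is left commented out.
* cd "C:\data\stat325"

* Create a practice file in the current directory (cars_copy is an assumed name)
sysuse auto, clear
save cars_copy, replace

* Empty the memory, then read the file back; the .dta extension is optional
clear
use cars_copy, clear
describe, short
```

A full path in quotes, such as use "C:\data\stat325\cars_copy", would load the file without changing directory first; that path is again only a placeholder.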

STATA datasets continued...


useful commands...

The sysuse command retrieves example datasets installed with STATA. The syntax is:
    sysuse ["]filename["] [, clear]
e.g. sysuse auto, clear

The command sysuse dir [, all] lists the example datasets installed with STATA.

The command set memory allows the user to increase or decrease the amount of memory allocated to STATA by the operating system while STATA is running. The syntax is:
    set memory #[b|k|m|g] [, permanently]
The permanently option specifies that, in addition to making the change right now, the memory setting be remembered and become the default setting whenever STATA is invoked.
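A short sketch of these commands in use is given below. The 50m value is an arbitrary figure picked for illustration. One caveat that is not in the slides: Stata 12 and later manage memory automatically and ignore set memory, so that line is left commented out and applies to older releases only.

```stata
* List the example datasets installed with STATA
sysuse dir

* Load the automobile example data, discarding whatever is in memory
sysuse auto, clear

* Older releases only: request 50 megabytes (an arbitrary, assumed amount).
* Newer releases manage memory automatically and ignore this setting.
* set memory 50m
```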

2.7 Examining a STATA Dataset

There are a number of tasks that can be done when examining a STATA dataset in active memory:
1. determining the number of observations
2. obtaining a brief description of the data
3. performing some simple data analysis, including obtaining summary statistics, listing some data, and obtaining tables and cross-tabulations
4. performing some basic data management, such as sorting, dropping some variables, or sampling from the dataset

Useful Commands

describe
The describe command produces a brief description of the dataset in active memory. The syntax is
    describe [varlist] [, memory_options]
where memory_options could be simple, short, detail, fullnames, and numbers. Alternatively, using the menu bar, choose Data > Describe Data > Describe Variables in Memory.
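A minimal illustration of describe, assuming the auto example dataset installed with STATA is used as the data in memory:

```stata
* Load the example automobile data
sysuse auto, clear

* Full description: one line per variable with its type, format and label
describe

* Header information only: number of observations, variables and sort order
describe, short

* Restrict the description to selected variables
describe mpg weight foreign
```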

count
The count command tallies the number of observations in the dataset. It can be used with if conditions to restrict the count to subsets of the dataset. The syntax is
    count [if] [in]
The by prefix is also permitted; however, before by is used, the data should first be sorted on the by variable. For example, in the auto data:
    count if rep78 > 4
    by foreign: count if rep78 > 4

ds and codebook
Exercise: Explore the use of the commands ds and codebook.
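The count examples above can be tried as follows. One caution, offered as a general fact about STATA rather than something stated in the slides: numeric missing values are treated as larger than any number, so the condition rep78 > 4 also picks up cars whose rep78 is missing. The extra !missing(rep78) condition is an addition made here to show the difference.

```stata
sysuse auto, clear

* Total number of observations
count

* Cars with a repair record above 4; missing rep78 values also satisfy this
count if rep78 > 4

* Same condition, excluding the observations where rep78 is missing
count if rep78 > 4 & !missing(rep78)

* by requires sorted data, so sort on the grouping variable first
sort foreign
by foreign: count if rep78 > 4 & !missing(rep78)
```

Comparing the second and third counts shows how many observations were included only because rep78 is missing.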

Commands...
summarize, inspect
The inspect command provides a quick summary of the data characteristics that helps the user become more familiar with the data, while the summarize command gives a preliminary analysis of the data through summary statistics. The syntax is as follows:
    summarize [varlist] [if] [in] [weight] [, options]
    inspect [varlist] [if] [in]
where options could be detail, meanonly, format, or separator(number). Examples are:
1. summarize
2. summarize mpg weight
3. summarize mpg weight if foreign
4. summarize mpg weight if foreign, detail
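The sketch below runs inspect and summarize on the auto example dataset. The use of the stored results r(mean) and r(N) goes slightly beyond the slides and is included only to show that summarize leaves its results available for later commands.

```stata
sysuse auto, clear

* Quick overview of two variables, including counts of missing values
inspect mpg weight

* Summary statistics for the same variables
summarize mpg weight

* summarize keeps its results in r(); shown here for mpg alone
summarize mpg
display "mean of mpg = " r(mean)
display "observations used = " r(N)

* Percentiles, skewness and kurtosis, for foreign cars only
summarize mpg weight if foreign, detail
```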

Commands...
Tabulate and table commands
The tabulate command is useful for obtaining frequency tables. The plot option can be added to make a plot that visually shows the tabulated values, e.g.
    tabulate foreign
or
    tabulate rep78, plot

The tab1 command can be used as a shortcut to request tables for a series of variables, instead of typing the tabulate command over and over again, e.g.
    tab1 foreign rep78

The command
    tabulate rowvar columnvar [, options]
produces a cross tabulation of the two named variables. The column option requests column percentages. The nofreq option suppresses the frequencies and focuses only on the percentages.
    tabulate rep78 foreign, column nofreq
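The tabulation commands above can be run together on the auto example dataset, as sketched below. The missing option in the last line is not covered in the slides; it is added here as an assumption about what a reader may want, namely a row for the cars whose rep78 is missing.

```stata
sysuse auto, clear

* One-way frequency tables, the second with a simple plot of the counts
tabulate foreign
tabulate rep78, plot

* One-way tables for several variables in a single command
tab1 foreign rep78

* Two-way table: counts first, then column percentages only
tabulate rep78 foreign
tabulate rep78 foreign, column nofreq

* Addition for illustration: include missing values of rep78 as a category
tabulate rep78 foreign, missing
```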

Saving a Dataset
Exercise: Explore the commands
1. by foreign: summarize weight
2. tabulate foreign, summarize(weight)

save, replace
The save command saves the data in memory to a file. The syntax is
    save [filename] [, save_options]
The replace option overwrites the existing dataset. Specifying a different filename after the save command saves the data to a new file, without losing the original version of the data.
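A minimal sketch of the exercise commands and of save follows, again using the auto example dataset. The file name auto_foreign is hypothetical, and bysort is used in place of a separate sort step; both are choices made here for illustration.

```stata
sysuse auto, clear

* The two exercise commands; bysort sorts on foreign before applying by
bysort foreign: summarize weight
tabulate foreign, summarize(weight)

* Keep only the foreign cars and save them under a NEW, hypothetical name,
* so that the original auto data are left untouched
keep if foreign
save auto_foreign, replace

* Read the new file back to confirm what was written
use auto_foreign, clear
count
```

Running save auto_foreign a second time without the replace option would stop with an error because the file already exists, which is the safeguard the replace option overrides.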
