
Introduction Overview of the module Materials and Reading Introduction to Stata

BEE3055 - Applied Economics Topic 1: Introduction


Eva Poen E.Poen@exeter.ac.uk
University of Exeter Business School

September 2013

Eva Poen

BEE3055 - Topic 1


Outline
1. Introduction
2. Overview of the module
   - Aim of the module
   - Why Stata?
   - Format of the course
   - The Project
   - Course outline for term 1 (approx.)
3. Materials and Reading
   - Reading
   - Stata Resources
   - ELE
4. Introduction to Stata
   - What is Stata?
   - Getting started in Stata
   - Empirical research in four easy steps!
   - Stata basics
   - Changing the data


What is applied economics?


In the history of economics, different authors have held different views on what applied economics might mean. In modern mainstream economics, there is a shared understanding that there is a body of abstract core economic theory, and that applying economics involves reducing the level of abstraction in order to illuminate specific problems or questions (R. Backhouse and J. Biddle, 2000). This can take the form of:
- Applying economic theory to a finer level of detail (e.g. the trade of CO2 emission certificates, or the sale of mobile phone network frequencies);


What is applied economics? (cont.)


- Imposing a structure on the general theory to draw specific conclusions from the theory;
- Using empirical methods to test certain aspects of economic theory (e.g. parameters of the theory, or even assumptions behind the theory).

As a result, the economic discipline has split into fields such as labour economics, monetary economics, public economics, game theory, financial economics, development economics, environmental economics, health economics, agricultural economics etc.

We will focus on empirical applied economics, rather than theoretical applied economics.

Aim of the module Why Stata? Format of the course The Project Course outline for term 1 (approx.)

Target audience and aims of the module


This course is aimed at students who are familiar with intermediate statistics and econometrics on a theoretical level, and who want to learn how to conduct applied empirical economic analysis. Applications will include topics from:
- labour economics;
- menu costs and the economics of food pricing;
- behavioural and experimental economics (including experimental macro).

The course aims to introduce students to the use of professional statistics software. Four areas of skills will be addressed:

Aims of the module


Research skills: How to conduct an empirical research project from start to finish.

Good practice of data analysis: Learn how to conduct a well-structured and reproducible analysis, how to avoid mistakes, and how to organize your work such that you can still understand it 6 months later.

Data skills: Learn how to visualize important properties of your data, generate summary statistics, uncover mistakes in your data, and manipulate, import and export data as well as results.


Aims of the module (continued)


Applied econometrics: Lessons are planned on linear regression, models for limited dependent variables, panel data methods and time series data.

Technical skills: Learn how to exploit the power of professional statistics software, including basic programming.

Additional quantitative techniques to be covered:
- Exploratory data analysis.
- Regression diagnostics.
- Missing values: implications and remedies.
- Simulation methods.

Software skills
Data analysis skills:
- Importing data into the software and organizing your work.
- Data types, display formats and precision.
- Working with dates and times / high frequency data.
- Advanced data manipulation.
- Graphics.
- Working with loops/other ways of repeating commands.
- Exporting data, results, graphs and tables to other software.
- Stata programming (writing procedures for Stata).
- ...

Reasons not to use Stata


- Command line driven. Not point-and-click oriented and therefore harder to learn (although there are menus available).
- You have used TSM in the past, and might be required to learn some other software in the future.
- My friend says that her favourite procedure xyz is not implemented in Stata.
- Stata is expensive (although we have a site license here).


Reasons to use Stata


- Command line driven. Not point-and-click oriented.
- Very intuitive and consistent syntax.
- Excellent and well-documented software.
- Good customer support and active user community.
- Good balance between user-friendliness and programmability.
- I like Stata (and so do a lot of other people).


Lectures and assessment


- Lecture dates: every Wednesday from 09:00 to 11:00 (term 1).
- Tutorials are held fortnightly, Fridays from 15:00 to 16:00.
- Practical exercises will feature in lectures and tutorials.
- 40% of your marks will come from a practical exam, to be held in early December.
- 60% of your marks will come from a project (individual, independent piece of work), to be completed by May 2013.


How the project works


- It's all yours! The project is an independent piece of research. You decide which topic you want to work on, and how you want to do it.
- You will be given guidance not only regarding empirical analysis but also regarding choice of topic, research proposal, literature search, data sources etc.
- Your preparation takes place in term 1, when you will learn new skills and see examples of research.
- The project is split into the proposal stage and the research stage. Both will be marked separately.
- The time frame in which to complete your project is the second term, up until the summer examinations period.

Previous examples

- Well-being and the Comparison Income Effect (Economics of Happiness)
- Demographics and democracy: long term trends and their implications (voting behaviour)
- CO2 emissions and economic growth
- Does affinity increase the commitment towards a cause?


Which topics will be covered


Exploratory data analysis: Summary statistics, graphics, distributional graphs, kernel density estimation etc.

Linear regression: Multiple regression (cross-sectional data), outliers, regression diagnostics. Missing data and what to do about it. Also: local macros.

Binary dependent variables: Modelling binary outcomes, estimation, interpretation, marginal effects etc.

Panel data: Description and manipulation, linear panel models, time operators.

Simulation methods: Monte Carlo simulations. Law of large numbers and central limit theorem at work.

Reading Stata Resources ELE

What to read
Reading will be announced for every topic. However, much attention will be paid to practical skills. The two main textbooks that we will use are Microeconometrics Using Stata by Cameron and Trivedi, and Using Stata for Principles of Econometrics by Adkins and Hill. All relevant material for the practical exam will be presented in the lectures and tutorials. 60 % of the credits for this module will come from an individual, independent piece of work (project). Further reading for the project will depend on your chosen topic.

Manuals and related material

Documentation: Stata v13 comes with a complete set of manuals in PDF format.

Books on Stata: Visit the Stata Bookstore at http://www.stata.com/bookstore/ for books on Stata. Several of them are available in the library.

On the web: Many academics use Stata for their teaching and research. A simple web search will come up with lots of information.


Materials on the web


There are several interesting web pages devoted to Stata:
- Official Stata web site: www.stata.com
- Stata FAQs: http://www.stata.com/support/faqs/
- Current issue of the Stata Journal: http://www.stata-journal.com/current.html
- Discussion list server: http://www.stata.com/statalist/

Browse the official Stata web site for more (SSC software archive, UCLA Stata site, etc.).

Materials on ELE
- Lecture notes, handouts and additional exercises.
- Data files. Please use the data files from this course only for the purpose of the course. You may not distribute data unless it is explicitly stated that you are allowed to do so. You may not use the data for commercial purposes, or attempt to sell the data.
- Stata code (in the form of do-files and ado-files). (The copyright of any Stata code distributed in class lies with the authors.)
- Discussion board: feel free to discuss issues like general data analysis questions, Stata problems etc.
- More information regarding the practical exam and project to follow soon.

What is Stata? Getting started in Stata Empirical research in four easy steps! Stata basics Changing the data Bits and pieces

Features
- Fully featured statistical software package.
- Strong data manipulation techniques.
- Many built-in statistics, including time series applications, cross sections, panel models and survey analysis.
- Very good graphics capabilities (publication-quality graphics).
- Stata is extensible: users can add their own routines and publish them on the web. Plugins written in C can be attached to the executable.
- Since version 9, Stata comes with a matrix programming language called Mata, which is comparable to e.g. Matlab.

Basics
Please start the software via the Start menu.

Command window: To execute a command, type it into the command window and press [Return]. By hitting the [Page Up] key, you can recover and edit commands you have previously entered.

Result window: All results or messages will appear in the result window.

Review window: Previously issued commands show up in the review window. A single click brings a command back to the command window. A double click on a command executes it.

Basics (cont.)

Variables window: Lists all variables in the dataset, alongside their labels. Try left click, right click and selection.

Properties window: This window displays information about the dataset (number of observations, number of variables, size), and information about a variable that has been selected in the Variables window. It also allows you to edit certain properties of a variable such as labels.


Is there a college premium in the wages of American women?

Step 1: Formulate research question.

Step 2: Find appropriate data to help answer research question. First type clear, then type sysuse nlsw88

Step 3: Conduct analysis.
    regress wage collgrad

Step 4: Interpret findings and report. The college premium is estimated to be 3.62 US dollars (wage in nlsw88 is the hourly wage).


Results
    . regress wage collgrad

          Source |       SS       df       MS              Number of obs =    2246
    -------------+------------------------------           F(  1,  2244) =  172.44
           Model |  5307.01034     1  5307.01034           Prob > F      =  0.0000
        Residual |  69060.9571  2244  30.7758276           R-squared     =  0.0714
    -------------+------------------------------           Adj R-squared =  0.0709
           Total |  74367.9674  2245  33.1260434           Root MSE      =  5.5476

    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        collgrad |   3.615502   .2753268    13.13   0.000      3.07558    4.155424
           _cons |   6.910561   .1339984    51.57   0.000     6.647788    7.173335
    ------------------------------------------------------------------------------
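The four steps can be collected in a few lines. The sketch below assumes only the nlsw88 system data named on the slide. The last command is an added cross-check, not part of the original example: with a single 0/1 regressor, the slope equals the difference in mean wages between the two groups.

```stata
* Step 2: load the example data shipped with Stata
clear
sysuse nlsw88

* Step 3: regress the wage on the college graduate dummy
regress wage collgrad

* Step 4: pull the estimate out of the stored results
display "Estimated college premium: " %5.2f _b[collgrad]

* Cross-check (an addition for illustration): mean wage by group
tabulate collgrad, summarize(wage) means
```

The difference between the two group means reported by tabulate should match the collgrad coefficient, and the mean for non-graduates should match _cons.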


On notation
Notation: In all lecture notes and exercise sheets, Stata commands are typeset like this: about. To issue the command, you need to type about at the command window and press [Return]. All commands are also accessible via the menus, but we will focus on typing commands and writing simple programs.

File names and paths: Some commands only work with the specified file names or directory names. If you choose different names, you will need to adjust the command accordingly.

Sometimes, only the command name is given, not the complete syntax, e.g. regress.

Know where you are in Stata


- Stata has the concept of a current working directory. This is the place where all files are saved to and read from by default.
- To find out the current working directory, type pwd. It is also shown in the bottom left corner of the screen.
- To change to another directory, use cd newdir. You should change to the appropriate directory for the project you are working on, so you can find your files later.
- To create a new directory in Stata: mkdir u:\bee3055
- To change to that directory: cd u:\bee3055
- Repeat for subdirectories.
- You can create the directories via the Windows explorer, too. But then, don't forget to use cd to change to that directory.
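As a minimal sketch of a session set-up, the commands above can be combined as follows. The drive letter and folder come from the slide; the subdirectory name data is only an assumed example. Prefixing mkdir with capture is an addition that stops Stata from halting with an error if the directory already exists.

```stata
* Where am I?
pwd

* Create the module directory (capture: ignore the error if it already exists)
capture mkdir "u:\bee3055"
cd "u:\bee3055"

* Repeat for a subdirectory ("data" is a hypothetical name)
capture mkdir data
cd data

* Confirm the new working directory
pwd
```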

Reading data into Stata


- Double-clicking on a file in Stata format will open the data set in Stata, but will leave you with an incorrect working directory.
- Better to start the software and then issue the command use mydir\mydata, or use the menus.
- If the data are saved as a text file (tab delimited or comma delimited format): insheet using mydata.txt
- We will cover other ways of getting data into Stata as we go along.
- Stata can only have one data set at a time in memory.
- To save your data, type save mydata. This will write a file mydata.dta in the current directory.
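A round trip through these commands might look like the sketch below. The file names mydata and mydata.txt follow the slide. The auto system data stand in for your own data, and outsheet is used here only to produce a text file to read back in; both are assumptions for illustration.

```stata
* Start from a data set that ships with Stata
sysuse auto, clear

* Save in Stata format (writes mydata.dta in the working directory)
save mydata, replace

* Read it back: only one data set can be in memory, so clear first
clear
use mydata

* Write a tab-delimited text file, then read it with insheet
outsheet using mydata.txt, replace
insheet using mydata.txt, clear
describe
```

In Stata 13 the newer import delimited and export delimited commands cover the same ground; insheet and outsheet continue to work.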

Log your work


Logging the output: In order to capture the output in the viewer, you need to open a log file:
    log using lecture1.log, replace

View your output log: Once you have finished your Stata session, type log close to close the log. To view and print it, type view lecture1.log

Logging your input: When using Stata interactively, it is useful to log the input as well:
    cmdlog using lecture1.do, replace

View your input log: At the end of your session, type cmdlog close to close the log. To view it, clean it up and print it, type doedit lecture1.do
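Put together, a logged session could be sketched as follows. The log file name is the one from the slide; the auto data and the summarize call are placeholders for whatever work you do in between. The leading capture log close is an added safeguard in case a log is still open from an earlier session.

```stata
* Close any log left open earlier (no error if none is open)
capture log close

* Start logging the output as plain text
log using lecture1.log, replace

* ... your analysis goes here (placeholder commands) ...
sysuse auto, clear
summarize price mpg

* Stop logging and inspect the result
log close
view lecture1.log
```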

Basic data reporting


Let's start with the pen01aL3 data: clear and then use pen01aL3. It is a random sample from one wave of an employment panel data set.
- browse lets us view the data in spreadsheet format (write-protected). Note the different colors.
- describe shows basic features of the data: number of observations and number of variables, storage type, labels etc.
- list lists all observations and variables in the result window. (To scroll, press the space bar, and to break, type q.)

To use commands like list properly, we need to know about the basic syntax shared by most Stata commands.

Basic syntax
The syntax of list:
    list [varlist] [if] [in] [,options]

All components in square brackets are optional (as we saw before, list on its own works fine).

varlist: Restricts the execution of the command to one or more variables. Examples for varlist: lnw id pencil city exp-civil

Basic syntax (continued)


    list [varlist] [if] [in] [,options]

if: Restricts the execution of the command to the observations for which a logical expression is true. Example:
    list if lnw < 2

Logical expressions in Stata may include:
    >    greater than
    <    less than
    >=   greater than or equal to
    <=   less than or equal to
    ==   equal to
    !=   not equal to
    &    and
    |    or


Basic syntax (continued)


    list [varlist] [if] [in] [,options]

in: Restricts the execution of the command to the specified range of observations. Examples:
    list in 1/5
    list in 93
    list in -10/l

Note that in refers to the current sort order of the data.

if and in may be specified together:
    list if lnw < 2 in 1/5
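The pen01aL3 file is only available through the course, so the sketch below applies the same syntax elements to the auto system data instead. The variable names make, price, mpg and foreign belong to that data set, not to the course data.

```stata
sysuse auto, clear

* varlist and in: three variables, first five observations
list make price mpg in 1/5

* if with a compound logical expression
list make price if price > 10000 & foreign == 1

* if and in together: the condition is checked within the range only
list make mpg if mpg < 15 in 1/20

* negative numbers count from the end; l stands for the last observation
list make in -3/l
```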


Basic data description


Besides describe and list, these are very useful:
- summarize produces summary statistics.
- tabulate produces one-way or two-way tabulations.
- count counts the number of observations.

Try these descriptive graphs:
    histogram lnw
    graph box lnw
    graph hbox lnw, over(female, total)
    spikeplot ed
    dotplot lnw, over(female)
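For practice without the course data, the same commands can be tried on the auto system data. This is a sketch; the variables price, mpg, rep78 and foreign are those of auto.dta, and the chosen options are only examples.

```stata
sysuse auto, clear

* Summary statistics; detail adds percentiles, skewness and kurtosis
summarize price mpg, detail

* One-way and two-way tabulations (missing shows missing values as a category)
tabulate foreign
tabulate rep78 foreign, missing

* Count observations satisfying a condition
count if mpg > 30

* Two of the descriptive graphs
histogram mpg
graph box price, over(foreign)
```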

How to get help


Online help: Typing help command will display the help file for command. Example: help tabulate

Search: In case you don't know the name of a command, or you need help for a function: search whatever

Links: In online help files, follow the links to see help files for related entries.

Manuals: The manuals contain much more information than the online help files, along with examples illustrating the use of the commands. If you work with data seriously, you should consult the manuals on the commands you use.

Operators in Stata
Logical operators: Stata knows about the logical operators & (AND), | (OR), != (NOT equal). A synonym for ! is ~.

Relational operators: Relational operators (>, <, >=, <=) work as expected, apart from equality, which is written ==.

Arithmetic operators: Arithmetic operators also work as expected: + - * /. The symbol for power is ^, e.g. 5² is typed in as 5^2.

Calculator: The Stata command display can be used as a calculator: display log(137)*_pi^2
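A few lines at the command window illustrate these operators. The sketch uses only display; logical expressions evaluate to 1 (true) or 0 (false).

```stata
* Power operator
display 5^2                      // 25

* The calculator example: natural log of 137 times pi squared
display log(137)*_pi^2

* Logical and relational operators
display (3 > 2) & (2 != 2)       // 0
display (7 == 7) | (1 > 2)       // 1

* ~ is a synonym for !, so ~= also means "not equal"
display 2 ~= 3                   // 1
```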


Changing the data


Knowing about the operators, we can start changing our data, e.g. generating new variables and replacing the content of the existing ones.

generate: This command lets you add a new variable to your data. Example: use the log wage to generate the nominal wage.
    generate wage = exp(lnw)
does this. exp() denotes the exponential function.

replace: If you want to replace the contents of an existing variable, you will need to use replace. Example:
    generate experienced = 1 if exp >= 20
    replace experienced = 0 if exp < 20

Stata has many mathematical and statistical functions. Type help functions or use the expression builder to find out more.
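The same pattern can be tried on the auto system data. This is a sketch: the threshold of 6000 and the variable names lnprice, expensive and expensive2 are made up for illustration. The one-line alternative relies on the fact that a logical expression evaluates to 1 or 0.

```stata
sysuse auto, clear

* New variable from a function of an existing one
generate lnprice = ln(price)

* Two-step dummy, as on the slide
generate expensive = 1 if price >= 6000
replace expensive = 0 if price < 6000

* One-step alternative; the if qualifier keeps missing prices missing
generate expensive2 = price >= 6000 if !missing(price)

* Both versions should agree in every observation
assert expensive == expensive2
```

Stata treats missing numeric values as larger than any number, so a condition such as x >= 20 is also true when x is missing; the !missing() guard above avoids coding those cases as 1.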

Extending generate
The generate command works on a row-by-row basis. To see this, use the example data: clear and use example
    generate sumx = sum(x)
    list

Advantage: You can exploit this feature to create AR processes and the like.

Disadvantage: What if you want to create a constant, equal to the sum of x? Use egen for this:
    drop sumx
    egen sumx = sum(x)
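The difference is easiest to see on a tiny made-up data set. The sketch below does not use the course file example; it creates five observations with x = 1, ..., 5. It also uses total(), which is the documented name of the egen function in current Stata versions (to my knowledge sum() is still accepted by egen as an older name).

```stata
clear
set obs 5
generate x = _n

* generate with sum(): running sum, evaluated row by row
generate runsum = sum(x)

* egen with total(): the overall sum, repeated in every row
egen totx = total(x)

list
```

Here runsum takes the values 1, 3, 6, 10, 15, while totx equals 15 in all five rows.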

Explicit subscripting
Stata allows you to refer to a specific observation (row):
    generate a = x[5]
which will copy the value of x in observation no. 5 into every row of a new variable called a.

The current observation number can be referred to as [_n]. The total number of observations can be referred to as [_N]. This allows you to create lags and leads:
    generate xl1 = x[_n-1]
    generate xf1 = x[_n+1]

Generate a unique ID in your data set:
    generate id = _n
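A small made-up data set shows what happens at the edges. In this sketch x holds the squares 1, 4, 9, 16, 25; the variable last is an extra illustration of _N and is not on the slide.

```stata
clear
set obs 5
generate x = _n^2

generate a   = x[5]        // 25 in every row
generate xl1 = x[_n-1]     // lag: missing in the first row
generate xf1 = x[_n+1]     // lead: missing in the last row
generate id  = _n          // 1, 2, ..., 5
generate last = x[_N]      // value of x in the last observation

list
```

References outside the data, such as x[0] in the first row, simply return a missing value rather than an error.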

Sort order
In many situations it is important that your data are sorted in a certain order, e.g. when using explicit subscripting. Try the following using the pen01aL3 data:
- sort female sorts your data in ascending order by female (so, males first).
- sort female ptime sorts the data by female, and within female, by ptime. We can specify more variables if necessary.

If the sorting criteria do not uniquely identify each observation, the data are sorted randomly within the remaining subgroups. Example:
    sort female
    list id in 1
    sort ptime
Repeat.
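The random ordering within ties can be observed, and avoided, as sketched below on the auto system data (foreign, mpg and make are variables of that data set). The stable option and the fully specified sort key are additions to what the slide shows.

```stata
sysuse auto, clear

* Ties within foreign are broken at random, so this listing may change
sort foreign
list make in 1
sort mpg
sort foreign
list make in 1

* Option 1: keep the previous relative order within ties
sort foreign, stable

* Option 2: add variables until the sort key identifies each observation
sort foreign mpg make
list make foreign mpg in 1/3
```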

The by: prefix


Stata offers an easy way to repeat commands for different subgroups. To find out the mean log wage for males and females, we can type
    by female: summarize lnw

Note that the data have to be sorted by female for this to work. If they are not, we can either issue a sort female prior to the command, or we can modify our command to include the sorting:
    bysort female: summarize lnw

Combining the by: prefix with explicit subscripting is a powerful tool for dealing with panel data. Example of usage in cross section:
    bysort female (lnw): generate wagerank = _n
    browse id female lnw wagerank
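The same idea can be tried on the auto system data, as a sketch with foreign as the grouping variable. The names pricerank and cheapest are made up. Within a by-group, _n and subscripts such as [1] refer to positions inside the group, and variables in parentheses are used for sorting only.

```stata
sysuse auto, clear

* Repeat a command for each group
bysort foreign: summarize price

* Rank cars by price within each group
bysort foreign (price): generate pricerank = _n

* Lowest price in each group, copied to every row of that group
bysort foreign (price): generate cheapest = price[1]

list make foreign price pricerank cheapest in 1/5
```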

Data that are shipped with Stata


Stata comes with several data sets. These are very useful to
- work out the examples in the manuals and online help files;
- quickly demonstrate a Stata command;
- demonstrate a Stata problem you ran into with your own data. If you manage to replicate it with one of the built-in data files, it is easier for someone else to help you.

To see a list of system data files: sysuse dir
To load a system data file into memory: sysuse name

The most commonly used system data file is auto.dta, which contains the measurements, price and mileage of 74 cars. Many posts on Statalist use the auto data to illustrate a problem or a solution.

Advice: Never ever overwrite system data files.
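A short sketch of working with a system data file safely follows. The file name myauto is a made-up example; saving under a different name in your working directory leaves the original system file untouched.

```stata
* List the data files that ship with Stata
sysuse dir

* Load one of them and take a quick look
sysuse auto, clear
describe, short

* Keep your own copy under a different name (never overwrite the system file)
save myauto, replace
```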

Save your data to disk or not?


- As long as you record all steps you undertook on your data (cmdlog), there is no need to save your changed data to disk.
- After a major cleanup, it is advisable to save the data and work with the clean version from then on.
- Avoid having many versions of your data around. Decide on one (clean) version and freeze it.
- The best way to keep track of your work is to organize it in do-files. We will discuss this later.

Memory

The size of the data you can have in Stata is limited by the amount of memory that is allocated to Stata, which is limited by the amount of RAM in your computer. Type help memory to find out about memory usage.
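To see how much space a data set takes up, and to reduce it where possible, something like the following can be used. This is a sketch on the auto system data; compress is not mentioned on the slide and is added here as one common way to shrink a data set.

```stata
sysuse auto, clear

* Number of observations, number of variables and size of the data in memory
describe, short

* Store each variable in the smallest type that loses no information
compress

* Check the size again
describe, short
```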
