14 views

Uploaded by sipho23

- probablity distribution
- DiscreteRandomVariables (1)
- RELTIQUES_V1N2
- 2014spring440midtermtwosols
- Competing Risk
- Microbial Distribution 2010.pdf
- Schmittlein, D. C., & Mahajan, V. (1982). Maximum Likelihood Estimation for an Innovation Diffusion Model of New Product Acceptance. Marketing Science, 1(1), 57-78
- Warranty Cost Paper9_Jun
- 159.pdf
- Pointwise Stationary Fluid Models
- Stat(Ian Castro)5
- Newdistns.pdf
- RMT MODEL
- 3 Constant Failure Rate Models
- Lindsey 1997 Applying Generalized Linear Models.pdf
- Session3__PSQT_dkj
- 11.1 Probabilistic Risk Analysis - Examples_with Notes
- NON-SCIENCE_Outline_Parts_1and2.docx
- Classification,Tabulation,Graphical and Diagrammatic Presentation of Data
- Probabilistic Design - The Future of Rock Engineering

You are on page 1of 6

These notes are not a complete introduction to R. They are designed to give you enough information

that in conjuction with Rs built in help you can complete the exercises on sampling distribtuions,

maximum likelihood and bootstrapping. A good reference for using R for statistics is Modern Applied

Statistics with S by Venables and Ripley.

The hardest part about learning a new program is trying to find functions you know exist but you

don’t know what they are called. The easiest way to find what you need in R is with,

help.search('keyword')

R also has a HTML version of the help pages that you can access using help.start(). If you have

never used R before this is a good place to start - there is a link to an introductory manual.

If you need more information on any of the functions talked about in these notes, try

?topic

or

help('topic')

This will bring up the relevent help page and hopefully answer any questions.

R contains by default many distributions. For example, ?Normal tells us about four functions relating

to the normal distribution,

• pnorm - the distribution function

• rnorm - a function to generate normal random variates

• qnorm - the quantile function

• dnorm - the density function

All of these functions can take arguments to specify the mean and variance of the distribution. In

general, the first letter of the function determines what information about the distribution we want

and then the distribution nickname follows directly. For example, if we want the density function

for the Poisson distribution we use dpois. Some useful distributions are:

?Normal norm the normal distribution

?Possion pois the poisson distribution

?Binomial binom the binomial distribution

?Geometric geom the geometric distribution

?TDist t the Students t distibution

?FDist f the F distribution

?Chisquare chisq the Chi-Squared Distribution

1

Remember to check the functions define the distribution the way you expect.

Often we want to draw random numbers but want our results to replicable. The set.seed

command lets us set the starting point for the random number generation and if we use the same

seed we will always get the same random numbers (note: sometimes this is not what you want).

Example

> rnorm(10)

[1] -0.48625474 -1.04637480 0.35224455 -1.07573884 1.11397047 -2.76540668

[7] 0.09103617 0.07424120 -1.77744938 -0.16139351

> rnorm(10)

[1] -0.25579874 -0.60993009 0.58799858 1.78268665 -0.37946733 1.97735026

[7] -0.76431899 -0.82588271 0.42827234 0.09188668

> set.seed(18749)

> rnorm(10)

[1] -0.54555908 1.22938955 2.58249311 1.57057115 -0.03186376 0.32983807

[7] -0.44899542 0.56599635 -0.23902963 0.79350037

> set.seed(18749)

> rnorm(10)

[1] -0.54555908 1.22938955 2.58249311 1.57057115 -0.03186376 0.32983807

[7] -0.44899542 0.56599635 -0.23902963 0.79350037

It is often useful to be able to randomly draw from a set of numbers we have already (for example

when bootstrapping). This is done using the function sample. We supply sample with a vector of

numbers to draw from. Without any other arguments sample will randomly permute the vector. If

in addition we supply an argument size, sample will randomly draw size numbers from our vector

without replacement. There is also an argument replace that specifies whether drawing should be

with replacement (by defualt without replacement). The argument prob can be used to specify a

probability distribution other than uniform with which to draw the numbers.

Example

> sample(1:10)

[1] 10 7 8 9 3 2 6 1 4 5

> sample(1:10, size = 5)

[1] 5 8 6 1 10

> sample(1:10, size = 15, replace = TRUE)

[1] 9 5 1 4 5 10 1 2 1 8 1 2 7 5 10

Basic functions

Being a statistical package R has plenty of built in functions for preforming basic statistical opera-

tions. Some are given below and you can probably guess many others.

2

Example

> x <- rnorm(20, mean = 10, sd = 20)

> mean(x)

[1] 8.144194

> sd(x)

[1] 17.36403

> sqrt(var(x))

[1] 17.36403

> median(x)

[1] 8.448636

5% 95%

-18.35355 31.73562

> summary(x)

-18.630 -6.728 8.449 8.144 25.220 36.880

There are also some basic mathemathical functions that will be useful.

> length(y)

[1] 20

> max(y)

[1] 20

> min(y)

[1] 1

> log(y)

[8] 2.0794415 2.1972246 2.3025851 2.3978953 2.4849066 2.5649494 2.6390573

[15] 2.7080502 2.7725887 2.8332133 2.8903718 2.9444390 2.9957323

> sum(y)

[1] 210

> prod(y)

[1] 2.432902e+18

3

Plots

There are lots of ways to create plots in R. Functions like hist, boxplot and qqnorm produce

standard statistical plots given some data. plot can be adjusted to plot a lot of different types of

plot (and there are often defaults for different R objects, for example try plot(log)). There are also

a number of functions that will add things to a plot: points, lines, text. The most useful one for

the exercises is abline which will add a straight line to a plot.

> s <- 2 * z + 4 + rnorm(100)

> par(mfrow = c(2, 2))

> hist(z)

> plot(s, z)

> qqnorm(z)

> qqline(z)

> curve(log(x), from = 0, to = 10)

> abline(v = 4, col = "red", lty = "dashed")

Histogram of z

●

● ●● ●

●

20

● ● ● ● ●

1

● ●●●● ● ●●●● ●

●

● ● ●● ● ● ●

●●●●●

Frequency

● ●

15

● ● ● ● ●

● ●●●●● ●

●●●● ● ●

0

● ● ●● ● ●● ●● ● ●

● ●●

z

● ● ● ●● ●

10

●●● ●

●

−1

● ● ●

● ●●

● ●

● ● ●

5

●●● ●

−2

●

0

−2 −1 0 1 2 0 2 4 6 8

z s

●

●

2

●●●●●

Sample Quantiles

●●

●●

●

1

●●

●

●●

●●

●

●

●

●

●●

●

●●

●

●●

●

●●

1

●●

●

●

●

●

●●

●

●

●

●●

●

●

●●

●●

●

log(x)

●●

●

●

0

●

●●

●

●

●●

●

●

●

●●

●

0

●●

●

●

●●

●

●

●●

●

●

●

−1

●

●●

●

●

●

−2 −1

●

●●●●

● ●●●

−2

−2 −1 0 1 2 0 2 4 6 8 10

Theoretical Quantiles x

4

Optimizing functions

Most of the functions in R for optimization are minimizers. You generally have to rewrite your

problem so that it is one of minimization. For example, to maximise a likelihood you minimize

the negative likelihood. Or, to find a root of a function you could minimize its square. There are

many functions that will do this: nlm, optim, optimize, nlminb and constrOptim. My favourite is

nlminb because it handles both one parameter and multiparameter problems and it is easy to place

constraints on the parameters. At the very least you need to give the function a place to start and

a function to minimize.

Imagine we have 20 observations from an exponential distribution with unknown parameter λ (we’ll

simulate this data). We want to find the maximum likelihood estimate for λ We know the density

for an exponential distribution is

f (x|λ) = λe−λx x ≥ 0.

We can write the log likelihood as,

20

X

l(λ) = (log λ − λxi )

i=1

20

X

= n log λ − λ xi

i=1

> x <- rexp(20, rate = 4)

> n <- length(x)

> nllhood = function(lambda) {

+ -1 * (n * log(lambda) - lambda * sum(x))

+ }

> fit <- nlminb(6, nllhood)

> fit

$par

[1] 3.491674

$objective

[1] -5.007625

$convergence

[1] 0

$message

[1] "relative convergence (4)"

$iterations

[1] 6

$evaluations

5

function gradient

8 9

> fit$par

[1] 3.491674

apply

apply is an incredibly useful function when you are making the same calculation repeatedly over

the columns or rows of an object. It takes three arguments. The first the object you wish to apply

the function to, the second either 1 or 2 depending on whether you are working across the rows or

down the columns, and the third the function you wish to apply.

Example

> x <- matrix(rep(1:5, 5), ncol = 5)

> x

[,1] [,2] [,3] [,4] [,5]

[1,] 1 1 1 1 1

[2,] 2 2 2 2 2

[3,] 3 3 3 3 3

[4,] 4 4 4 4 4

[5,] 5 5 5 5 5

> apply(x, 1, mean)

[1] 1 2 3 4 5

> apply(x, 2, mean)

[1] 3 3 3 3 3

> apply(x, 2, function(y) c(mean(y), sd(y)))

[,1] [,2] [,3] [,4] [,5]

[1,] 3.000000 3.000000 3.000000 3.000000 3.000000

[2,] 1.581139 1.581139 1.581139 1.581139 1.581139

Also look at lapply, sapply, tapply and mapply.

Reading in data

> p.data <- read.table("http://www.stat.berkeley.edu/~wickham/poisson.txt")

> head(p.data)

V1

1 4

2 2

3 4

4 4

5 11

6 8

- probablity distributionUploaded byDhrubajyoti Datta
- DiscreteRandomVariables (1)Uploaded byGermanjoe
- RELTIQUES_V1N2Uploaded bygustavo orozco
- 2014spring440midtermtwosolsUploaded byEd Z
- Competing RiskUploaded byhimanshu
- Microbial Distribution 2010.pdfUploaded byGustavo Sotomayor
- Schmittlein, D. C., & Mahajan, V. (1982). Maximum Likelihood Estimation for an Innovation Diffusion Model of New Product Acceptance. Marketing Science, 1(1), 57-78Uploaded byJohan Mark Berends
- Warranty Cost Paper9_JunUploaded byjasonkidd007
- 159.pdfUploaded byIJAR Journal
- Pointwise Stationary Fluid ModelsUploaded byandrenr
- Stat(Ian Castro)5Uploaded byAlvin Romualdo
- Newdistns.pdfUploaded byomid_kharazmi
- RMT MODELUploaded bySubathra Devi Mourougane
- 3 Constant Failure Rate ModelsUploaded byeeit_nizam
- Lindsey 1997 Applying Generalized Linear Models.pdfUploaded byCarlos Andrade
- Session3__PSQT_dkjUploaded bydharmendrakumar_jaiswal
- 11.1 Probabilistic Risk Analysis - Examples_with NotesUploaded byFlora Lin
- NON-SCIENCE_Outline_Parts_1and2.docxUploaded byClint Austin
- Classification,Tabulation,Graphical and Diagrammatic Presentation of DataUploaded bySonali Singh
- Probabilistic Design - The Future of Rock EngineeringUploaded byBrian Conner
- Yanhong Wu-Inference for Change Point and Post Change Means After a CUSUM Test (Lecture Notes in Statistics)-Springer (2005)Uploaded bysuperpol
- stat_TDUploaded byBruno Gomes
- [B8]GIE2001 _with changes_2016_17Uploaded byJacky
- Unveiling the Identity of PIN from the Flash CrashUploaded byChristian Westheide
- ProbabilityUploaded bybenitogaldos19gmail.com
- UT Dallas Syllabus for te3341.501.07s taught by Md Ali (mohammed)Uploaded byUT Dallas Provost's Technology Group
- Discrete Random VariablesUploaded byShannon Burton
- CSE-lpUploaded bysa1bala
- ReviewerUploaded byMarvel Tating
- Computational Field TheoryUploaded byJuvenal Tordocillo

- Henny Penny 500-561-600 TM_FINAL-FM06-009 9-08Uploaded byDiscman2
- Electrical MCQSUploaded byMuhammad Azhar Iqbal
- EPRI-DOE Handbook of Energy Storage for TransmissionUploaded bySAEIDKIAN
- CI001001R100 Ver 1 Indoor Dist Substation Design StdUploaded byahmadgce04
- Design, Development and Testing of Circular Waveguide Taper for Millimeter Wave Transmission LineUploaded byeditor3854
- 700htp Iseng0100 Fluke Hydraulic Pump TesterUploaded byDineros Nieva
- heat n glow gateway fireplaceUploaded byMatt Murdy
- Canadian Journal of Civil Engineering Volume 9 Issue 2 1982 [Doi 10.1139%2Fl82-025] Jaeger, Leslie G.; Bakht, Baidar -- The Grillage Analogy in Bridge AnalysisUploaded byKashiPrakash Dambal
- E-C130 Geophysical MethodsUploaded byRoxana Alb
- _Waterhammer Analysis John Parmakain Charts.pdfUploaded bydaskir
- Load Balancing Techniques Major Challenges in Cloud Computing a Systematic ReviewUploaded byAmr Elmestekawi
- Alimentação hospitalar proposições para a qualificação do serviço de alim e nut aval. pela com cientificaUploaded byEvelyn Kowalczyk Dos Santos
- F449-02(2014) Standard Practice for SubsurfaceUploaded bysharon blushtein
- ERIC F. CLARKE_Ways of Listening.pdfUploaded byYeseli Queen
- Value Stream Mapping at Siemens LtdUploaded bynakulmodani
- Catalogue of Mongol Mss From OrdosUploaded byShankerThapa
- Curriculum Guidelines (2)Uploaded byRamandeep Sachdeva
- 200+ Keywords_Mains 2018_theIAShub_MK Yadav_MUploaded byclnreddy1993
- 3.2 Scada GeneralUploaded byAnjaneyulu Bodhanapu
- Case.pdfUploaded byHamilton Miranda
- Jorge Diaz Cintas The Didactics of Audiovisual Translation Benjamins Translation Library.pdfUploaded byHelena Santiago
- ThesisUploaded byArjun Mullu
- analysis on mangaUploaded byapi-362004314
- Service Manual A SeriesUploaded bytothzoltan1
- EC GATE'14solve Paper 01Uploaded byAdzLinkBalaoang
- Estimate Hec-ras Dam BreachUploaded byDerek Aherne
- SM4125-07Uploaded byNguyen Huu Hanh Lam
- MP 2009 Intro AIUploaded byAfiq Iqmal
- O'Neil & Marsick - Peer Mentoring & Action LearningUploaded byslimceagrrl
- Physical Properties HDPE PE3408Uploaded byDanny Milton Silva Vasquez