Computational Statistics Matlab

Appendix D
MATLAB Code
In this appendix, we provide the MATLAB functions for some of the more
complicated techniques covered in this book. This includes code for the
bootstrap BCa confidence interval, the adaptive mixtures algorithm for
probability density estimation, classification trees, and regression trees.
D.1 Bootstrap BCa Confidence Interval
function[blo,bhi,bvals,z0,ahat]=...
csbootbca(data,fname,B,alpha)
thetahat = feval(fname,data);
[bh,se,bt] = csboot(data,fname,50);
[n,d] = size(data);
bvals = zeros(B,1);
% Loop over each resample and
% calculate the bootstrap replicates.
for i = 1:B
% generate the indices for the B bootstrap
% resamples, sampling with
% replacement using the discrete uniform.
ind = ceil(n.*rand(n,1));
% extract the sample from the data
% each row corresponds to a bootstrap resample
xstar = data(ind,:);
% use feval to evaluate the estimate for
% the i-th resample
bvals(i) = feval(fname, xstar);
end
numless = length(find(bvals<thetahat));
z0 = norminv(numless/B,0,1);
% find the estimate for acceleration using jackknife
jvals = zeros(n,1);
% Copyright 2002 by Chapman & Hall/CRC (Computational Statistics Handbook with MATLAB)
for i = 1:n
% use feval to evaluate the estimate
% with the i-th observation removed
% These are the jackknife replications.
jvals(i) =...
feval(fname, [data(1:(i-1),:);data((i+1):n,:)]);
end
num = (mean(jvals)-jvals).^3;
den = (mean(jvals)-jvals).^2;
ahat = sum(num)/(6*sum(den)^(3/2));
zlo = norminv(alpha/2,0,1); % this is the z^(a/2)
zup = norminv(1-alpha/2,0,1); % this is the z^(1-a/2)
% Equation 14.10, E & T
arg = z0 + (z0 + zlo)/(1-ahat*(z0+zlo));
alpha1 = normcdf(arg,0,1);
arg = z0 + (z0 + zup)/(1-ahat*(z0+zup));
alpha2 = normcdf(arg,0,1);
k1 = floor(((B+1)*alpha1));
k2 = ceil(((B+1)*alpha2));
% keep the order-statistic indices inside 1..B
k1 = max(k1,1);
k2 = min(k2,B);
sbval = sort(bvals);
blo = sbval(k1);
bhi = sbval(k2);
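For reference, the quantities computed in csbootbca can be written out as below. This is a summary read off the code rather than a derivation: z0 corresponds to \hat z_0, ahat to \hat a, zlo and zup to z^{(\alpha/2)} and z^{(1-\alpha/2)}, bvals to the bootstrap replicates \hat\theta^{*b}, and jvals to the jackknife values \hat\theta_{(i)} with mean \hat\theta_{(\cdot)}.

```latex
\begin{align*}
\hat z_0 &= \Phi^{-1}\!\left(\frac{\#\{\hat\theta^{*b} < \hat\theta\}}{B}\right), \\
\hat a &= \frac{\sum_{i=1}^{n}\bigl(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\bigr)^{3}}
               {6\,\Bigl[\sum_{i=1}^{n}\bigl(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\bigr)^{2}\Bigr]^{3/2}}, \\
\alpha_1 &= \Phi\!\left(\hat z_0 + \frac{\hat z_0 + z^{(\alpha/2)}}
               {1 - \hat a\,\bigl(\hat z_0 + z^{(\alpha/2)}\bigr)}\right), \\
\alpha_2 &= \Phi\!\left(\hat z_0 + \frac{\hat z_0 + z^{(1-\alpha/2)}}
               {1 - \hat a\,\bigl(\hat z_0 + z^{(1-\alpha/2)}\bigr)}\right).
\end{align*}
```

The interval endpoints blo and bhi are then the order statistics of the sorted bootstrap replicates at positions floor((B+1)\alpha_1) and ceil((B+1)\alpha_2).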
D.2 Adaptive Mixtures Density Estimation
First we provide some of the helper functions that are used in csadpmix.
This first function calculates the estimated posterior probability, given the
current estimated model and the new observation.
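% function post=csrpostup(x,pies,mus,vars,nterms)
% This function will return the posterior.
function post = csrpostup(x,pies,mus,vars,nterms)
% weighted normal density of each term at x; the common
% factor 1/sqrt(2*pi) cancels when we normalize
f = exp(-.5*(x-mus(1:nterms)).^2./...
vars(1:nterms))./sqrt(vars(1:nterms)).*pies(1:nterms);
f = f/sum(f);
post = f;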
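As a small worked example of the posterior calculation, the sketch below evaluates it for a two-term model. The parameter values and the observation are hypothetical, chosen only so that the result is easy to check by hand, and the calculation is written inline rather than through the toolbox function.

```matlab
% Hypothetical two-term model (values chosen only for illustration).
pies = [0.5 0.5];   % mixing coefficients
mus  = [0 4];       % component means
vars = [1 1];       % component variances
x = 1;              % new observation

% Weighted normal density of each term evaluated at x.
f = pies .* exp(-0.5*(x - mus).^2 ./ vars) ./ sqrt(2*pi*vars);

% Normalize so that the posterior probabilities sum to one.
post = f / sum(f);
disp(post)
```

Because the observation lies much closer to the first mean, nearly all of the posterior weight goes to the first term.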
Next we need a function that will update the mixing coefficients, the means
and the variances using the posteriors and the new data point.
% This function will update all of the parameters for
% the adaptive mixtures density estimation approach
function [piess,muss,varss]=...
csrup(x,pies,mus,vars,posterior,nterms,n)
inertvar = 10;
betan = 1/(n);
piess = pies(1:nterms);
muss = mus(1:nterms);
varss = vars(1:nterms);
post = posterior(1:nterms);
% update the mixing coefficients
piess = piess+(post-piess)*betan;
% update the means
muss = muss+betan*post.*(x-muss)./piess;
% update the variances
denom = (1/betan)*piess+inertvar;
varss = varss+post.*((x-muss).^2-varss)./denom;
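Read off the code, the recursive updates in csrup take the following form for term k when the n-th observation x arrives. The symbols are our own shorthand, not notation taken from the function: \tau_k is the posterior for term k, and c is the constant inertvar (set to 10 in the code). Note that the mean update uses the already updated mixing coefficient, and the variance update uses the already updated mean.

```latex
\begin{align*}
\pi_k^{(n)} &= \pi_k^{(n-1)} + \frac{1}{n}\bigl(\tau_k - \pi_k^{(n-1)}\bigr), \\
\mu_k^{(n)} &= \mu_k^{(n-1)} + \frac{1}{n}\,\frac{\tau_k}{\pi_k^{(n)}}\bigl(x - \mu_k^{(n-1)}\bigr), \\
\sigma_k^{2\,(n)} &= \sigma_k^{2\,(n-1)}
   + \frac{\tau_k}{n\,\pi_k^{(n)} + c}\Bigl[\bigl(x - \mu_k^{(n)}\bigr)^{2} - \sigma_k^{2\,(n-1)}\Bigr].
\end{align*}
```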
Finally, the following function will set the initial variance for newly created
terms.
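% This function will set the initial variance of a
% new term in the AMDE. Call with nterms-1,
% since new term is based only on previous terms
function newvar = cssetvar(mus,pies,vars,x,nterms)
% posterior of each existing term at x (normal density
% up to the common factor 1/sqrt(2*pi))
f=exp(-.5*(x-mus(1:nterms))...
.^2./vars(1:nterms))./sqrt(vars(1:nterms)).*pies(1:nterms);
f = f/sum(f);
% posterior-weighted average of the existing variances
f = f.*vars(1:nterms);
newvar = sum(f);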
Here is the main MATLAB function csadpmix that ties everything together.
For brevity, we show only the part of the function that corresponds to the
univariate case. View the M-file for the multivariate case.
function [pies,mus,vars] = csadpmix(x,maxterms)
n = length(x);
mus = zeros(1,maxterms);
vars = zeros(1,maxterms);
pies = zeros(1,maxterms);
posterior = zeros(1,maxterms);
tc = 1;
% lower bound on new pies
minpie = .00001;
% bound on variance
sievebd = 1000;
% initialize density to first data point
nterms = 1;
mus(1) = x(1);
% rule of thumb for initial variance - univariate
vars(1) = (std(x))^2/2.5;
pies(1) = 1;
% loop through all of the data points
for i = 2:n
md = ((x(i)-mus(1:nterms)).^2)./vars(1:nterms);
if min(md)>tc && nterms<maxterms
create = 1;
else
create = 0;
end
if create == 0 % update terms
posterior(1:nterms)=...
csrpostup(x(i),pies,mus,vars,nterms);
[pies(1:nterms),mus(1:nterms),...
vars(1:nterms)]=csrup(x(i),pies,mus,...
vars,posterior,nterms,i);
else % create a new term
nterms = nterms+1;
mus(nterms) = x(i);
pies(nterms) = max([1/(i),minpie]);
% update pies
pies(1:nterms-1)=...
pies(1:nterms-1)*(1-pies(nterms));
vars(nterms)=...
cssetvar(mus,pies,vars,x(i),nterms-1);
end % end if statement
% to prevent spiking of variances
index = find(vars(1:nterms)<1/(sievebd*nterms));
vars(index) = ones(size(index))/(sievebd*nterms);
end % for i loop
% clean up the model - get rid of the 0 terms
mus((nterms+1):maxterms) = [];
pies((nterms+1):maxterms) = [];
vars((nterms+1):maxterms) = [];
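Once the mixing coefficients, means, and variances are available, the density estimate is the weighted sum of the normal terms. The sketch below evaluates such an estimate on a grid. The parameter values are hypothetical stand-ins for the output of csadpmix, and the grid limits are an assumption chosen to cover essentially all of the probability mass.

```matlab
% Hypothetical fitted model, standing in for the output of csadpmix.
pies = [0.3 0.7];
mus  = [-1 2];
vars = [0.5 1.5];

% Evaluation grid wide enough to cover essentially all of the mass.
xx = linspace(-10, 12, 2001);

% Sum the weighted normal densities of the terms.
fhat = zeros(size(xx));
for k = 1:length(pies)
    fhat = fhat + pies(k) * exp(-0.5*(xx - mus(k)).^2 / vars(k)) ...
        / sqrt(2*pi*vars(k));
end

% A density estimate should integrate to approximately one.
area = trapz(xx, fhat);
```

Checking that the numerical integral is close to one is a quick sanity test on the estimated mixing coefficients and variances.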
D.3 Classification Trees
In the interest of space, the text includes only the MATLAB code for
growing a classification tree. All of the functions for working with trees are
included with the Computational Statistics Toolbox, and the reader can easily
view the source code for more information.
function tree = csgrowc(X,maxn,clas,Nk,pies)
[n,dd] = size(X);
if nargin == 4 % then estimate the pies
pies = Nk/n;
end
% The tree will be implemented as a structure.
% get the initial tree - which is the data set itself
tree.pies = pies;
% need for node impurity calcs:
tree.class = clas;
tree.Nk = Nk;
% maximum number to be allowed in the terminal nodes:
tree.maxn = maxn;
% number of nodes in the tree - total:
tree.numnodes = 1;
% vector of terminal nodes:
tree.termnodes = 1;
% 1=terminal node, 0=not terminal:
tree.node.term = 1;
% total number of points in the node:
tree.node.nt = sum(Nk);
tree.node.impurity = impure(pies);
tree.node.misclass = 1-max(pies);
% prob it is node t:
tree.node.pt = 1;
% root node has no parent
tree.node.parent = 0;
% This will be a 2 element vector of
% node numbers to the children.
tree.node.children = [];
% pointer to sibling node:
tree.node.sibling = [];
% the class membership associated with this node:
tree.node.class = [];
% the splitting value:
tree.node.split = [];
% the variable or dimension that will be split:
tree.node.var = [];
% number of points from each class in this node:
tree.node.nkt = Nk;
% joint prob it is class k and it falls into node t
tree.node.pjoint = pies;
% prob it is class k given node t
tree.node.pclass = pies;
% the root node contains all of the data:
tree.node.data = X;
% Now get started on growing the very large tree.
% first we have to extract the number of terminal nodes
% that qualify for splitting.
% get the data needed to decide to split the node
[term,nt,imp]=getdata(tree);
% find all of the nodes that qualify for splitting
ind = find( (term==1) & (imp>0) & (nt>maxn) );
% now start splitting
while ~isempty(ind)
for i=1:length(ind)% check all of them
% get split
[split,dim]=...
splitnode(tree.node(ind(i)).data,...
tree.node(ind(i)).impurity,...
tree.class,tree.Nk,tree.pies);
% split the node
tree = addnode(tree,ind(i),dim,split);
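end % end for loop
% refresh the node information, since the tree has changed
[term,nt,imp]=getdata(tree);
tree.termnodes = find(term==1);
% find the terminal nodes that still qualify for splitting
ind = find( (term==1) & (imp>0) & (nt>maxn) );
end % end while loop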
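The rule used to pick nodes for splitting can be tried on a small hand-built structure array. This is a sketch only: the three nodes and their field values are hypothetical, and the fields are read directly from the array rather than through getdata, whose source is not listed here.

```matlab
% Hypothetical nodes; only the fields used by the rule are filled in.
node(1).term = 0; node(1).nt = 10; node(1).impurity = 0.50;
node(2).term = 1; node(2).nt = 6;  node(2).impurity = 0.30;
node(3).term = 1; node(3).nt = 4;  node(3).impurity = 0;
maxn = 5;   % largest node size that is left unsplit

% Gather each field into a row vector, one entry per node.
term = [node.term];
nt = [node.nt];
imp = [node.impurity];

% A node qualifies if it is terminal, impure, and larger than maxn.
ind = find((term==1) & (imp>0) & (nt>maxn));
```

Here only node 2 qualifies: node 1 is not terminal, and node 3 is already pure and holds no more than maxn points.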
D.4 Regression Trees
Below is the function for growing a regression tree. The complete set of
functions needed for working with regression trees is included with the
Computational Statistics Toolbox.
function tree = csgrowr(X,y,maxn)
n = length(y);
% The tree will be implemented as a structure
tree.maxn = maxn;
tree.n = n;
tree.numnodes = 1;
tree.termnodes = 1;
tree.node.term = 1;
tree.node.nt = n;
tree.node.impurity = sqrer(y,tree.n);
tree.node.parent = 0;
tree.node.children = [];
tree.node.sibling = [];
tree.node.yhat = mean(y);
tree.node.split = [];
tree.node.var = [];
tree.node.x = X;
tree.node.y = y;
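% Now get started on growing the tree very large
% collect the status of each node from the tree structure
term = [tree.node.term];
nt = [tree.node.nt];
imp = [tree.node.impurity];
% find all of the nodes that qualify for splitting
% (same rule as in csgrowc)
ind = find( (term==1) & (imp>0) & (nt>maxn) );
% now start splitting
while ~isempty(ind)
for i=1:length(ind)
% get split
[split,dim]=splitnoder(...
tree.node(ind(i)).x,...
tree.node(ind(i)).y,...
tree.node(ind(i)).impurity,...
tree.n);
% split the node
tree = addnoder(tree,ind(i),dim,split);
end % end for loop
% refresh the node status, since the tree has changed
term = [tree.node.term];
nt = [tree.node.nt];
imp = [tree.node.impurity];
tree.termnodes = find(term==1);
ind = find( (term==1) & (imp>0) & (nt>maxn) );
end % end while loop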
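As a rough illustration of the impurity that drives these splits, the sketch below computes a squared-error impurity for a node and the decrease obtained from one candidate split. It assumes that the impurity is the sum of squared deviations from the node mean divided by the total sample size n, which is one standard choice for regression trees. The source of sqrer is not listed here, so sqerr below is a hypothetical stand-in rather than the toolbox function, and the data are made up.

```matlab
% Hypothetical stand-in for the node impurity: squared error about the
% node mean, scaled by the total number of observations n.
sqerr = @(y,n) sum((y - mean(y)).^2) / n;

% Toy data: one predictor and a response with an obvious jump.
x = [1 2 3 4 5 6]';
y = [1 2 3 10 11 12]';
n = length(y);

% Candidate split between x = 3 and x = 4.
s = 3.5;
left = y(x <= s);
right = y(x > s);

% Decrease in impurity produced by the split.
delta = sqerr(y,n) - sqerr(left,n) - sqerr(right,n);
```

A split search would repeat this calculation over the candidate split points in each variable and keep the one with the largest decrease.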
