Appendix D

MATLAB Code
In this appendix, we provide the MATLAB functions for some of the more
complicated techniques covered in this book. This includes code for the
bootstrap BCa confidence interval, the adaptive mixtures algorithm for
probability density estimation, classification trees, and regression trees.
D.1 Bootstrap BCa Confidence Interval
function [blo,bhi,bvals,z0,ahat] = ...
    csbootbca(data,fname,B,alpha)
thetahat = feval(fname,data);
[bh,se,bt] = csboot(data,fname,50);
[n,d] = size(data);
bvals = zeros(B,1);
% Loop over each resample and
% calculate the bootstrap replicates.
for i = 1:B
    % generate the indices for the i-th bootstrap
    % resample, sampling with
    % replacement using the discrete uniform.
    ind = ceil(n.*rand(n,1));
    % extract the resample from the data;
    % each row is one resampled observation
    xstar = data(ind,:);
    % use feval to evaluate the estimate for
    % the i-th resample
    bvals(i) = feval(fname, xstar);
end
numless = length(find(bvals<thetahat));
z0 = norminv(numless/B,0,1);
% find the estimate for acceleration using jackknife
jvals = zeros(n,1);
2002 by Chapman & Hall/CRC
540 Computational Statistics Handbook with MATLAB
for i = 1:n
    % use feval to evaluate the estimate
    % with the i-th observation (row) removed.
    % These are the jackknife replications.
    jvals(i) = ...
        feval(fname, data([1:(i-1),(i+1):n],:));
end
num = (mean(jvals)-jvals).^3;
den = (mean(jvals)-jvals).^2;
ahat = sum(num)/(6*sum(den)^(3/2));
zlo = norminv(alpha/2,0,1); % this is the z^(a/2)
zup = norminv(1-alpha/2,0,1); % this is the z^(1-a/2)
% Equation 14.10, E & T
arg = z0 + (z0 + zlo)/(1-ahat*(z0+zlo));
alpha1 = normcdf(arg,0,1);
arg = z0 + (z0 + zup)/(1-ahat*(z0+zup));
alpha2 = normcdf(arg,0,1);
k1 = floor(((B+1)*alpha1));
k2 = ceil(((B+1)*alpha2));
sbval = sort(bvals);
blo = sbval(k1);
bhi = sbval(k2);
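As a usage illustration, the following self-contained sketch repeats the main steps of the listing above for the sample mean of a small made-up data set. It is a sketch under stated assumptions, not the toolbox function: the data vector, the anonymous helpers ncdf and ninv (base-MATLAB stand-ins for normcdf and norminv, built from erfc and erfcinv), and the clamping of k1 and k2 to the range 1 to B are all introduced here for illustration.

```matlab
% Hypothetical right-skewed data set (an assumption for illustration).
data = ((1:30)'.^2)/30;
n = length(data);
B = 2000;        % number of bootstrap replicates
alpha = 0.05;    % gives a 95% interval

% Standard normal cdf and inverse cdf using base functions only.
ncdf = @(z) 0.5*erfc(-z/sqrt(2));
ninv = @(p) -sqrt(2)*erfcinv(2*p);

thetahat = mean(data);

% Bootstrap replicates of the mean.
bvals = zeros(B,1);
for i = 1:B
    ind = ceil(n.*rand(n,1));   % indices sampled with replacement
    bvals(i) = mean(data(ind));
end

% Bias correction from the fraction of replicates below thetahat.
z0 = ninv(sum(bvals < thetahat)/B);

% Acceleration from the jackknife replications.
jvals = zeros(n,1);
for i = 1:n
    jvals(i) = mean(data([1:(i-1),(i+1):n]));
end
num = (mean(jvals)-jvals).^3;
den = (mean(jvals)-jvals).^2;
ahat = sum(num)/(6*sum(den)^(3/2));

% Adjusted percentile levels.
zlo = ninv(alpha/2);
zup = ninv(1-alpha/2);
alpha1 = ncdf(z0 + (z0+zlo)/(1-ahat*(z0+zlo)));
alpha2 = ncdf(z0 + (z0+zup)/(1-ahat*(z0+zup)));

% Order statistics of the replicates; the clamping is an added safeguard.
k1 = max(floor((B+1)*alpha1),1);
k2 = min(ceil((B+1)*alpha2),B);
sbval = sort(bvals);
blo = sbval(k1);
bhi = sbval(k2);
fprintf('BCa interval for the mean: [%g, %g]\n',blo,bhi);
```

The endpoints vary from run to run because the resamples are random, so no particular output is shown.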
D.2 Adaptive Mixtures Density Estimation
First we provide some of the helper functions that are used in csadpmix.
This first function calculates the estimated posterior probability, given the
current estimated model and the new observation.
% function post = csrpostup(x,pies,mus,vars,nterms)
% This function will return the posterior.
function post = csrpostup(x,pies,mus,vars,nterms)
f = exp(-.5*(x-mus(1:nterms)).^2./...
    vars(1:nterms)).*pies(1:nterms);
f = f/sum(f);
post = f;
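To make the posterior calculation concrete, here is a minimal sketch for a two-term mixture. The parameter values and the helper npdf are made up for illustration. Note that the posterior helper listed above weights each term by exp(-(x-mu)^2/(2*var)) only, without the 1/sqrt(var) factor that the full normal density carries. The sketch includes that factor, which gives the same posterior when all variances are equal and a different one otherwise.

```matlab
% Univariate normal density with mean mu and variance v (hypothetical helper).
npdf = @(x,mu,v) exp(-0.5*(x-mu).^2./v)./sqrt(2*pi*v);

% Made-up two-term model and a new observation.
pies = [0.5 0.5];
mus  = [0 2];
vars = [1 1];
x = 1;

% Posterior probability that x came from each term.
f = pies.*npdf(x,mus,vars);
post = f/sum(f);

% Same model with unequal variances: here the 1/sqrt(v) factor matters.
vars2 = [1 4];
f2 = pies.*npdf(x,mus,vars2);
post2 = f2/sum(f2);

% For comparison, the weights without the 1/sqrt(v) factor.
g2 = pies.*exp(-0.5*(x-mus).^2./vars2);
post2nofactor = g2/sum(g2);
```

With equal variances and x midway between the two means, each term receives posterior probability 0.5. In the unequal-variance case the two calculations disagree, so it is worth checking which form a given implementation intends.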
Next we need a function that will update the mixing coefficients, the means
and the variances using the posteriors and the new data point.
% This function will update all of the parameters for
% the adaptive mixtures density estimation approach
function [piess,muss,varss]=...
csrup(x,pies,mus,vars,posterior,nterms,n)
inertvar = 10;
betan = 1/(n);
piess = pies(1:nterms);
muss = mus(1:nterms);
varss = vars(1:nterms);
post = posterior(1:nterms);
% update the mixing coefficients
piess = piess+(post-piess)*betan;
% update the means
muss = muss+betan*post.*(x-muss)./piess;
% update the variances
denom = (1/betan)*piess+inertvar;
varss = varss+post.*((x-muss).^2-varss)./denom;
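For reference, the updates coded above can be written out as follows. Here tau_k is the posterior for term k, x is the new observation, n is the number of observations seen so far, and c = 10 is the constant inertvar. This is a transcription of the listing; the superscript notation is introduced here and is not taken from the text.

```latex
\begin{align*}
\pi_k^{(n)} &= \pi_k^{(n-1)} + \frac{1}{n}\left(\tau_k - \pi_k^{(n-1)}\right),\\
\mu_k^{(n)} &= \mu_k^{(n-1)} + \frac{\tau_k}{n\,\pi_k^{(n)}}\left(x - \mu_k^{(n-1)}\right),\\
\sigma_k^{2\,(n)} &= \sigma_k^{2\,(n-1)} + \frac{\tau_k}{n\,\pi_k^{(n)} + c}
  \left[\left(x - \mu_k^{(n)}\right)^2 - \sigma_k^{2\,(n-1)}\right].
\end{align*}
```

As in the code, the mean update uses the already updated mixing coefficient, and the variance update uses the already updated mean.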
Finally, the following function will set the initial variance for newly created
terms.
% This function sets the initial variance of a
% new term in the AMDE. Call with nterms-1,
% since the new term is based only on previous terms
function newvar = cssetvar(mus,pies,vars,x,nterms)
f=exp(-.5*(x-mus(1:nterms))...
.^2./vars(1:nterms)).*pies(1:nterms);
f = f/sum(f);
f = f.*vars(1:nterms);
newvar = sum(f);
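The calculation above amounts to a weighted average of the existing variances, with weights given by how strongly each existing term claims the new point. A small sketch with made-up values (all numbers here are assumptions for illustration) shows the effect:

```matlab
% Made-up existing model with two terms.
mus  = [0 10];
pies = [0.5 0.5];
vars = [1 4];

% New observation that lies close to the first term.
x = 0.5;

% Weights for the existing terms, computed as in the listing.
f = exp(-0.5*(x-mus).^2./vars).*pies;
w = f/sum(f);

% Initial variance for the new term: weighted average of the variances.
newvar = sum(w.*vars);
fprintf('weights = [%g %g], newvar = %g\n',w(1),w(2),newvar);
```

Because the new point is much closer to the first term, nearly all of the weight goes to that term and the new variance is very close to 1.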
Here is the main MATLAB function csadpmix that ties everything together.
For brevity, we show only the part of the function that corresponds to the
univariate case. View the M-file for the multivariate case.
function [pies,mus,vars] = csadpmix(x,maxterms)
n = length(x);
mus = zeros(1,maxterms);
vars = zeros(1,maxterms);
pies = zeros(1,maxterms);
posterior = zeros(1,maxterms);
tc = 1;
% lower bound on new pies
minpie = .00001;
% bound on variance
sievebd = 1000;
% initialize density to first data point
nterms = 1;
mus(1) = x(1);
% rule of thumb for initial variance - univariate
vars(1) = (std(x))^2/2.5;
pies(1) = 1;
% loop through all of the data points
for i = 2:n
    md = ((x(i)-mus(1:nterms)).^2)./vars(1:nterms);
    if min(md)>tc & nterms<maxterms
        create = 1;
    else
        create = 0;
    end
    if create == 0 % update terms
        posterior(1:nterms)=...
            csrpostup(x(i),pies,mus,vars,nterms);
        [pies(1:nterms),mus(1:nterms),...
            vars(1:nterms)]=csrup(x(i),pies,mus,...
            vars,posterior,nterms,i);
    else % create a new term
        nterms = nterms+1;
        mus(nterms) = x(i);
        pies(nterms) = max([1/(i),minpie]);
        % update pies
        pies(1:nterms-1)=...
            pies(1:nterms-1)*(1-pies(nterms));
        vars(nterms)=...
            cssetvar(mus,pies,vars,x(i),nterms-1);
    end % end if statement
    % to prevent spiking of variances
    index = find(vars(1:nterms)<1/(sievebd*nterms));
    vars(index) = ones(size(index))/(sievebd*nterms);
end % for i loop
% clean up the model - get rid of the 0 terms
mus((nterms+1):maxterms) = [];
pies((nterms+1):maxterms) = [];
vars((nterms+1):maxterms) = [];
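Once the mixing coefficients, means, and variances are available, the density estimate is the corresponding finite mixture of normal densities. The following sketch evaluates such a mixture on a grid. The parameter values are made up rather than produced by the function above, and the grid limits are an arbitrary choice.

```matlab
% Made-up univariate mixture parameters (stand-ins for the function's output).
pies = [0.3 0.7];
mus  = [-1 2];
vars = [0.5 1.5];

% Evaluate the mixture density on a grid.
xx = linspace(-10,12,2001);
fhat = zeros(size(xx));
for k = 1:length(pies)
    fhat = fhat + pies(k)*exp(-0.5*(xx-mus(k)).^2/vars(k))...
        /sqrt(2*pi*vars(k));
end

% Sanity check: the estimate should integrate to approximately one.
area = trapz(xx,fhat);
fprintf('area under the estimate = %g\n',area);
```

Checking that the area is close to one is a quick way to catch mixing coefficients that do not sum to one or a grid that is too narrow.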
D.3 Classification Trees
In the interest of space, we only include (in the text) the MATLAB code for
growing a classification tree. All of the functions for working with trees are
included with the Computational Statistics Toolbox, and the reader can easily
view the source code for more information.
function tree = csgrowc(X,maxn,clas,Nk,pies)
[n,dd] = size(X);
if nargin == 4 % then estimate the pies
    pies = Nk/n;
end
% The tree will be implemented as a structure.
% get the initial tree - which is the data set itself
tree.pies = pies;
% need for node impurity calcs:
tree.class = clas;
tree.Nk = Nk;
% maximum number to be allowed in the terminal nodes:
tree.maxn = maxn;
% number of nodes in the tree - total:
tree.numnodes = 1;
% vector of terminal nodes:
tree.termnodes = 1;
% 1=terminal node, 0=not terminal:
tree.node.term = 1;
% total number of points in the node:
tree.node.nt = sum(Nk);
tree.node.impurity = impure(pies);
tree.node.misclass = 1-max(pies);
% prob it is node t:
tree.node.pt = 1;
% root node has no parent
tree.node.parent = 0;
% This will be a 2 element vector of
% node numbers to the children.
tree.node.children = [];
% pointer to sibling node:
tree.node.sibling = [];
% the class membership associated with this node:
tree.node.class = [];
% the splitting value:
tree.node.split = [];
% the variable or dimension that will be split:
tree.node.var = [];
% number of points from each class in this node:
tree.node.nkt = Nk;
% joint prob it is class k and it falls into node t
tree.node.pjoint = pies;
% prob it is class k given node t
tree.node.pclass = pies;
% the root node contains all of the data:
tree.node.data = X;
% Now get started on growing the very large tree.
% first we have to extract the number of terminal nodes
% that qualify for splitting.
% get the data needed to decide to split the node
[term,nt,imp]=getdata(tree);
% find all of the nodes that qualify for splitting
ind = find( (term==1) & (imp>0) & (nt>maxn) );
% now start splitting
while ~isempty(ind)
    for i=1:length(ind) % check all of them
        % get split
        [split,dim]=...
            splitnode(tree.node(ind(i)).data,...
            tree.node(ind(i)).impurity,...
            tree.class,tree.Nk,tree.pies);
        % split the node
        tree = addnode(tree,ind(i),dim,split);
    end % end for loop
    [term,nt,imp]=getdata(tree);
    tree.termnodes = find(term==1);
    ind = find( (term==1) & (imp>0) & (nt>maxn) );
    length(tree.termnodes);
    itmp = find(term==1);
end % end while loop
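The helper impure is not listed here. As a hedged illustration, the sketch below assumes the Gini index as the impurity measure, which is one common choice and may differ from what the toolbox function computes. It then applies the same rule used above to decide which nodes qualify for splitting. The node summaries are made-up values.

```matlab
% Gini index for a vector of class probabilities (assumed impurity measure).
gini = @(p) 1 - sum(p.^2);

imp_mixed = gini([0.5 0.5]);   % two equally likely classes
imp_pure  = gini([1 0]);       % a pure node

% Made-up node summaries, in the form of the vectors used in the listing.
term = [0 1 1 1];          % 1 = terminal node
imp  = [0.5 0 0.3 0.2];    % node impurities
nt   = [100 40 35 5];      % number of points in each node
maxn = 10;

% A node qualifies if it is terminal, impure, and larger than maxn.
ind = find( (term==1) & (imp>0) & (nt>maxn) );
```

Here node 1 is not terminal, node 2 is pure, and node 4 has too few points, so only node 3 qualifies for splitting.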
D.4 Regression Trees
Below is the function for growing a regression tree. The complete set of
functions needed for working with regression trees is included with the
Computational Statistics Toolbox.
function tree = csgrowr(X,y,maxn)
n = length(y);
% The tree will be implemented as a structure
tree.maxn = maxn;
tree.n = n;
tree.numnodes = 1;
tree.termnodes = 1;
tree.node.term = 1;
tree.node.nt = n;
tree.node.impurity = sqrer(y,tree.n);
tree.node.parent = 0;
tree.node.children = [];
tree.node.sibling = [];
tree.node.yhat = mean(y);
tree.node.split = [];
tree.node.var = [];
tree.node.x = X;
tree.node.y = y;
% Now get started on growing the tree very large
[term,nt,imp]=getdata(tree);
% find all of the nodes that qualify for splitting
ind = find( (term==1) & (imp>0) & (nt>maxn) );
% now start splitting
while ~isempty(ind)
    for i=1:length(ind)
        % get split
        [split,dim]=splitnoder(...
            tree.node(ind(i)).x,...
            tree.node(ind(i)).y,...
            tree.node(ind(i)).impurity,...
            tree.n);
        % split the node
        tree = addnoder(tree,ind(i),dim,split);
    end % end for loop
    [term,nt,imp]=getdata(tree);
    tree.termnodes = find(term==1);
    ind = find( (term==1) & (imp>0) & (nt>maxn) );
end % end while loop
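The helpers sqrer and splitnoder are not listed here. The sketch below shows one plausible form of the underlying calculation for a single predictor, under two assumptions that are ours rather than the toolbox's: the node impurity is the sum of squared deviations from the node mean divided by the total sample size, and candidate splits are the midpoints between adjacent sorted predictor values. The data are made up.

```matlab
% Made-up data with an obvious break between x = 3 and x = 10.
x = [1 2 3 10 11 12]';
y = [1 1 1 5 5 5]';
n = length(y);

% Assumed node impurity: squared error about the node mean, scaled by n.
sqerr = @(v) sum((v-mean(v)).^2)/n;

% Candidate splits: midpoints between adjacent sorted x values.
xs = sort(x);
cands = (xs(1:end-1)+xs(2:end))/2;

% Choose the split that minimizes the total impurity of the two children.
best = Inf;
bestsplit = NaN;
for s = cands'
    left  = y(x<=s);
    right = y(x>s);
    tot = sqerr(left) + sqerr(right);
    if tot < best
        best = tot;
        bestsplit = s;
    end
end
fprintf('best split at x = %g, child impurity = %g\n',bestsplit,best);
```

For these data the search selects the split at 6.5, which separates the two groups of responses exactly and leaves zero impurity in both children.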