
Matlab Corner - http://www.matlabcorner.com
Matlab Tutorial 4: Data Analysis and Statistics with Matlab
This tutorial covers data analysis [1] and statistics [2] using Matlab.
Histogram Charts in Matlab
The elements of a vector can be displayed with bars or histograms. To create a histogram [3] you need to divide the elements into classes (intervals) and count how many elements belong to each class. The classes are then presented as rectangular bars in a diagram, where the height of each rectangle equals the number of elements in that class. Start by entering the vector x in Matlab.

Matlab Histogram Example
hist(x) Plots a histogram with 10 intervals (default) for the elements in vector x.
hist(x,n) Plots a histogram with n intervals for the elements in vector x.
hist(x,y) Plots a histogram with arbitrary intervals, given by the vector y (its elements are used as the centers of the intervals).
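When hist is called with output arguments, it returns the counts and the interval centers instead of drawing the chart. The minimal sketch below uses that to compare hist with a count done by hand, following the recipe above (divide the range into classes, count the elements in each). The data vector is hypothetical, since the tutorial's own vector x is not reproduced in this text.

```matlab
% Hypothetical sample data (not the tutorial's vector x).
data = [1 2 2 3 3 3 4 4 4 4];
n = 4;                                   % number of classes (intervals)

% With output arguments, hist returns counts and bin centers
% instead of plotting.
[counts, centers] = hist(data, n);

% The same counts by hand: split [min, max] into n equally wide
% classes and count how many elements fall into each one.
edges = linspace(min(data), max(data), n + 1);
manual = zeros(1, n);
for k = 1:n
    if k < n
        inClass = data >= edges(k) & data < edges(k + 1);
    else
        % the last class also includes the maximum value
        inClass = data >= edges(k) & data <= edges(k + 1);
    end
    manual(k) = sum(inClass);
end
% counts and manual both contain 1 2 3 4
```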
Create a histogram with 15 intervals for the vector x above (hist(x,15)). With that many intervals it is rather difficult to see how wide each interval is. It may be better to use 6 intervals, since we know the difference between the maximum and minimum value.
>> hist(x,6)
If we don't know the dataset, we can define the intervals that we want in the histogram ourselves. Suppose we want the integer values between 0 and 10:
>> y=0:10;
>> hist(x,y)
The elements of x can also be drawn directly as bars, one bar per element:
>> bar(x),grid
>> title('bar for vector x')

Histogram Bars with Grid in the Background
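For integer data, a histogram with one interval centered on each integer from 0 to 10 simply counts how often each integer occurs. The minimal sketch below checks this with hist(data, 0:10) in its counting form; the data vector is hypothetical.

```matlab
% Hypothetical integer data between 0 and 10 (not the tutorial's vector x).
data = [0 1 1 2 5 5 5 9 10 10];
centers = 0:10;                  % one class centered on each integer 0..10

% When the second argument is a vector, hist uses it as the bin centers.
counts = hist(data, centers);

% For integer data this is just "how many times does each integer occur".
manual = arrayfun(@(c) sum(data == c), centers);
% counts: 1 2 1 0 0 3 0 0 0 1 2
```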

Stem Plot
To make this a bit more complete we show some other plotting possibilities. In Matlab we can
also illustrate a discrete sequence (stem plot). This is done as:
>> stem(x), grid
>> title('Stem diagram for vector x')
See the figure below. Notice that the values in vector x are plotted versus their index.

Matlab Stem Plot Example
Assume you want to produce a curve from sampled data. If you take samples from the sine curve y=sin(x), which has an amplitude of 1, you get a discrete sequence of data:
>> z=0:0.2:10;
>> yy=sin(z);
>> stem(yy),grid,title('Stem plot for a sine curve')
We can regard this as a continuous sine wave that has been sampled. This is how a computer system sees the signal.

Stem Plot for Sine Curve
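Because stem(yy) plots the samples versus their index, the horizontal axis shows sample numbers 1 to 51 rather than the time values in z. The short sketch below makes the link between index and sampling instant explicit; calling stem(z,yy) instead would put the time values on the horizontal axis.

```matlab
z  = 0:0.2:10;        % sampling instants, 0.2 apart
yy = sin(z);          % sampled sine with amplitude 1

numSamples = numel(yy);   % 51 samples from 0 to 10

% stem(yy) puts sample k at position k on the horizontal axis;
% the corresponding time is z(k) = 0.2*(k - 1).
k = 10;
t = 0.2 * (k - 1);
```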
Staircase Plots and Pie Charts
What other presentations can we find? We will also take a look at pie and staircase diagrams.
Below we have an m-file.
% M-file created by MatlabCorner.com
% M-file makes two presentations of vector x.
% File created by MatlabCorner.com
subplot(1,2,1)
pie(x),grid, title('Pie diagram for vector x')
subplot(1,2,2)
stairs(x), grid,title('Staircase plot for vector x')
See the result in the figures below.

Staircase Plot and Pie Diagram Example


Each slice of the pie diagram shows its element as a percentage of the sum of all elements in x. In the staircase plot, the elements of vector x are plotted versus their index.
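The percentages that pie shows can be computed directly: each element divided by the sum of all elements, times 100. The sketch below uses the same values as the labelled pie example further down.

```matlab
x = [2 4 3 5];                    % values from the labelled pie example
percent = 100 * x / sum(x);       % share of each slice in percent
fprintf('%5.1f %%\n', percent)    % %% prints a literal percent sign
% prints 14.3 %, 28.6 %, 21.4 % and 35.7 % on separate lines
```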
pie(x,explode) is used to specify slices that should be pulled out from the pie. The vector
EXPLODE must be the same size as X. The slices where EXPLODE is non-zero will be
pulled out.
pie(x,labels) is used to label each pie slice with cell array LABELS. LABELS must be
the same size as X and can only contain strings.
>> pie([ 2 4 3 5],{'North','South','East','West'})
Several of the commands that we have presented here can also be used in three dimensions. The
commands are changed to: bar3(x), stem3(x) and pie3(x). Try them.
Statistics Commands in Matlab
We will now focus on some commands for statistics, which are needed to evaluate measured data. Some statistics can also be added directly in your figure window. Let's start with some simple commands, but before we use them we need to review a few concepts.
Mean value:
Assume a vector x=[ 1 2 3 4 5]. Sum the elements and divide by the number of elements (here 15/5 = 3). In Matlab: mean(x)
Median value:
The median is the middle element of the sorted vector. For the vector x it is 3: there are equally many elements higher than 3 as lower than 3. If we have y=[ 1 3 4 6], then median(y) produces the answer 3.5. Because y has an even number of elements, the median is the mean value of the two middle elements, 3 and 4. In Matlab: median(x)
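The two definitions above can be written out by hand and compared with mean and median. This is a minimal sketch of the rules as described (sum divided by count; middle element of the sorted vector, or the mean of the two middle elements for an even count).

```matlab
x = [1 2 3 4 5];
y = [1 3 4 6];

% Mean: sum of the elements divided by the number of elements.
meanX = sum(x) / numel(x);          % 3, same as mean(x)

% Median: sort, then take the middle element, or the mean of the
% two middle elements when the number of elements is even.
s = sort(y);
n = numel(s);
if mod(n, 2) == 1
    medianY = s((n + 1) / 2);
else
    medianY = (s(n/2) + s(n/2 + 1)) / 2;   % (3 + 4)/2 = 3.5
end
```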
Variance:
If we have a data vector x=[ 1 3 4 6], the variance is defined as:
$\mathrm{Var}(x) = \frac{(1-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 + (6-3.5)^2}{3} = \frac{13}{3} \approx 4.333$
In general, for a vector with n elements and mean value $\bar{x}$: $\mathrm{Var}(x) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Notice that 3.5 is the mean value of the vector x, and that we divide by the number of elements minus 1, which here is 3. If we didn't square the deviations, they would always sum to zero and nothing could be stated. A large variance means that there is a large spread or deviation in the data around the mean value. If x has the unit [Volt], then var(x) has the unit [Volt²]. In Matlab: var(x)
Standard deviation:
The standard deviation is simply the square root of the variance. This means that if we have a vector of voltage measurements, the standard deviation has the same unit, [Volt], as the measurements. It can be read as a typical deviation of the elements in x from the mean value. In Matlab: std(x) or sqrt(var(x))
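The variance formula above translates directly into Matlab. The sketch below computes it by hand for x=[1 3 4 6], compares the result with var and std, and also shows why the deviations must be squared: without squaring they sum to zero.

```matlab
x = [1 3 4 6];
m = mean(x);                                   % 3.5

v = sum((x - m).^2) / (numel(x) - 1);          % 13/3, same as var(x)
s = sqrt(v);                                   % same as std(x)

unsquared = sum(x - m);                        % 0: deviations cancel out
```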
Correlation:
Suppose you have done several measurements and you want to find out whether there is any dependence in the variation between two or more variables. This is called correlation and is measured by a number between -1 and +1. If the number is zero, there is no (linear) correlation whatsoever. An example of positive correlation: more rain => more umbrellas are sold. We could see this if we measured the rain in [mm] in a certain region together with the number of umbrellas sold in the same region. A negative correlation could exist between the number of umbrellas sold and the amount of sun protection sold. If we measured both these variables in a certain region during the summer, we would probably discover such a covariation.
When the number of umbrellas sold increases, we would most likely find a decrease in the amount of sun protection sold. Even if two variables produce a non-zero correlation coefficient, it doesn't necessarily mean that there is a true dependence between them. We can always find some very bizarre correlations among variables, but that doesn't mean they can be described by a function. It is probably possible to find a covariation between the number of goats in Spain and the number of umbrellas sold in England, but very few of us would claim that there is a function that relates these two variables to each other.
If two variables are correlated and we plot one versus the other, this shows up as a roughly straight-line trend in the graph (rising for a positive and falling for a negative correlation). In Matlab: corrcoef(x,y)
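corrcoef computes the Pearson correlation coefficient: the sum of products of the deviations from the mean, divided by the square root of the product of the sums of squared deviations. The sketch below computes it by hand for the vectors X and Z from Exercise 1 below and compares it with corrcoef.

```matlab
X = 1:5;
Z = [0 1 4 7 12];

dx = X - mean(X);                 % deviations from the mean of X
dz = Z - mean(Z);                 % deviations from the mean of Z
r = sum(dx .* dz) / sqrt(sum(dx.^2) * sum(dz.^2));   % about 0.97435

R = corrcoef(X, Z);               % 2x2 matrix; R(1,2) is the same coefficient
```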
Normal distribution:
Many measurements based on samples follow this kind of distribution. They form a well-known bell-shaped curve called the normal distribution. It has two very important parameters: the center value (mean value) and the width of the curve (standard deviation). In the figure below we illustrate an example with mean value 0 and standard deviation 1. The curve was made with the following commands from 10 000 000 random numbers.
>> u=randn(1,10000000); % don't forget the semicolon.
>> hist(u,100) % 100 intervals.

Mean Value 0 and Standard Deviation 1
Of course the mean value can be any value, and the width does not need to be 1. Suppose that we put two vertical lines at ±1 (one standard deviation from the mean). The area between them is then 68.3 % of the total area under the curve. If we put the vertical lines at ±1.96, the area between them becomes 95 % of the total area.
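The areas quoted above can be checked empirically with randn: the fraction of standard normal random numbers within ±1, ±1.96 and ±2.58 approaches 68.3 %, 95 % and 99 %. This is a simulation, so the results vary slightly from run to run.

```matlab
u = randn(1, 1e6);                   % one million numbers, mean 0, std 1

within1   = mean(abs(u) <= 1);       % close to 0.683
within196 = mean(abs(u) <= 1.96);    % close to 0.95
within258 = mean(abs(u) <= 2.58);    % close to 0.99
```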
Confidence interval:
In terms of the figure above, an interval of ±1.96 standard deviations (of the estimated mean) around an estimated mean value contains the true mean value with 95 % probability; this is a 95 % confidence interval. To get a 99 % probability we need to expand the interval to ±2.58. We could say that the confidence interval is a measure of the uncertainty that chance contributes when we try to find the true mean value.
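One way to see what "95 % probability" means is to repeat an experiment many times and count how often the interval catches the true mean. The sketch below assumes that the standard deviation of the estimated mean is std(data)/sqrt(n) (the standard error, which the text above does not spell out) and uses the normal value 1.96; the true mean, standard deviation, sample size and number of repetitions are hypothetical choices.

```matlab
mu = 10; sigma = 2;          % hypothetical true mean and standard deviation
n = 100;                     % measurements per experiment
reps = 5000;                 % number of repeated experiments

data = mu + sigma * randn(n, reps);        % each column is one experiment
m = mean(data);                            % estimated mean of each experiment
halfWidth = 1.96 * std(data) / sqrt(n);    % 95 % half-width (normal approximation)

covered = abs(m - mu) <= halfWidth;        % does the interval contain mu?
coverage = mean(covered);                  % close to 0.95
```

With small samples the value 1.96 gives slightly too narrow intervals; the t-distribution is then the usual choice.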
Exercise 1: Read two vectors and try some of the statistics commands
>> X=1:5; Z=[0 1 4 7 12];
% Calculate the mean and median value of X and Z.
>> mean(X), mean(Z)
% The result should become 3 and 4.8.
>> median(X), median(Z)
% The result should be: 3 and 4.
% Standard deviation for the vectors.
>> std(X), std(Z)
% the answer is 1.5811 and 4.8683 respectively.
% This seems very logical, due to the large spread
% in the elements of vector Z.
% Now let's plot Z versus X. See the figure below.
>> plot(X,Z), grid

Plot Z versus X


Use the menu of the figure window and choose Tools -> Data Statistics. A small box appears where you can choose statistics such as min, max, mean, median, std and range for both the X and Z values. Mark mean and std for the Z vector. This gives three dotted horizontal lines in the plot: the upper one is the mean value + standard deviation, the middle one is the mean value, and the lower one corresponds to the mean value - standard deviation. See the figure below.

The Relationship Between Standard Deviation and Mean Value
In the figure above it seems plausible that there is a positive correlation between X and Z, that is, when X increases so does Z. Let's use Matlab to calculate the correlation.
>> corrcoef(X,Z)
ans =
1 0.97435
0.97435 1
The result is a 2×2 matrix. Element (1,1) shows that the correlation between X and X is 1, and element (2,2) gives the same information for Z and Z. Elements (2,1) and (1,2) give the correlation between X and Z. This means we have a strong positive correlation (0.97435) between X and Z.
The commands we used for statistics can also take a matrix as an argument. Create the matrix A=[X;Z].
mean(A) Gives a vector, where each element is the mean value of a specific column.
median(A) Gives a vector, where each element is the median value of the column.
std(A) Gives a vector, where each element is the standard deviation of a specific column.
Try the commands. Are there any surprises in the table?
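The surprise is usually that the statistics commands work column by column: for A=[X;Z], mean(A) returns five values, one per column, not the means of X and Z. The sketch below shows this and uses the second argument of mean to work along the rows instead.

```matlab
X = 1:5;
Z = [0 1 4 7 12];
A = [X; Z];              % 2-by-5 matrix

colMeans = mean(A);      % one value per column: 0.5 1.5 3.5 5.5 8.5
rowMeans = mean(A, 2);   % one value per row: 3 and 4.8, the means of X and Z
```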
We will also repeat some previous matrix commands that can be used to calculate sums, differences and products.
prod(A) Gives a vector whose elements are the products of the elements in each column.
sum(A) Gives a vector whose elements are the sums of each column.
diff(A) Gives a matrix whose elements are the differences between neighbouring elements in each column.
sort(A) Gives a matrix where the elements in each column are sorted in ascending order.
Use the matrix A below and the vector X that was stated earlier.
>> A=[1 2 3; 4 5 6; 7 8 9]
The commands in the above list can also be used on a vector. Try them on X.
>> prod(X), sum(X),diff(X)
% or
>> diff(X,2) % the same as diff(diff(X))
% Now try with the matrix A instead.
>> diff(A), prod(A),sum(A)
ans =
     3     3     3
     3     3     3
ans =
    28    80   162
ans =
    12    15    18
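For the vector X=1:5 the same commands give the results in the sketch below, including the check that diff(X,2) equals diff(diff(X)).

```matlab
X = 1:5;
p  = prod(X);        % 1*2*3*4*5 = 120
s  = sum(X);         % 15
d1 = diff(X);        % [1 1 1 1], differences between neighbours
d2 = diff(X, 2);     % [0 0 0], second differences
same = isequal(d2, diff(diff(X)));   % true
```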
Finally create a new matrix F in order to find out how we can put a matrix together.
>> F=[ans; A]
F=
12 15 18
1 2 3
4 5 6
7 8 9
Use the command sort on the resulting matrix F.
>> [A,index]=sort(F) % results in two matrices: one sorted matrix A
% and one matrix containing the original positions in the matrix F.
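The sketch below shows what sort returns for the matrix F above and how the index matrix can be used. It stores the sorted matrix in B instead of A so that A keeps its value; the name B is only for this illustration.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];
F = [sum(A); A];              % same F as above: [12 15 18] on top of A

[B, index] = sort(F);         % sorts each column in ascending order
% B     = [1 2 3; 4 5 6; 7 8 9; 12 15 18]
% index = [2 2 2; 3 3 3; 4 4 4; 1 1 1]

% Because sort works column by column, index(i,j) is a row number in
% column j of F, so F(index(:,j), j) reproduces the sorted column B(:,j).
ok = true;
for j = 1:size(F, 2)
    ok = ok && isequal(F(index(:, j), j), B(:, j));
end
```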
Notice that the sort command only operates within each column, so we only need one index per element to keep track of its original position.
So far in the course we have only produced simple text displays or very rudimentary tables. Next we show some useful commands for creating a better display of the output. Together with format codes, the command fprintf specifies the output: how many positions should be used, how many decimals, and so on. We can use it to write to the command window as well as to a text file.
Let's start by writing to the command window. Example: we would like to create a table containing three columns. The first one has the numbers from 1 to 3, the second contains the square root of the numbers and the third contains the cube of the numbers. See below for a suggestion of an m-file.
% Alt_1.m file created by MatlabCorner.com
% The m-file makes a table consisting of 3 columns.
% We also use format codes in order to control the output.
% \t=horizontal tab, \n=new line, %6.3f=6 positions and 3 decimals
x=1:3;
y1=sqrt(x); y2=x.^3;
Y=[x' y1' y2'];
disp(' x sqrt(x) x^3')
fprintf('%4.0f \t %6.3f \t %6.3f \n', Y')
The output in the command window will be:
x sqrt(x) x^3
1 1.000 1.000
2 1.414 8.000
3 1.732 27.000
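fprintf takes the elements of a matrix argument in column order and reuses the format for each group of values. That is why Alt_1.m passes Y' rather than Y: each column of Y' is one row of the table. The sketch below shows the effect with sprintf, which uses the same format codes but returns the text as a string instead of printing it.

```matlab
Y = [1 2; 3 4];                   % two rows of a small table
s1 = sprintf('%d %d\n', Y');      % rows come out as rows: "1 2" then "3 4"
s2 = sprintf('%d %d\n', Y);       % without the transpose: "1 3" then "2 4"
```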
A slightly changed m-file will more or less accomplish the same thing.
% Alt_2.m file created by MatlabCorner.com
% The m-file makes a table consisting of 3 columns. We also use format
% codes in order to control the output.
% \t=horizontal tab, \n=new line, %6.3f=6 positions and 3 decimals
disp(' x sqrt(x) x^3')
disp('---------------------------------')
for x=1:3
    y1=sqrt(x); y2=x.^3;
    Y=[x' y1' y2'];
    fprintf('%1.0f \t %6.3f \t %6.3f \n', Y')
end
Finally we will use fprintf to write to a text file that we create. We modify the m-file Alt_2.m.
% Alt_2.m file created by MatlabCorner.com
% The m-file makes a table consisting of 3 columns. We also use
% format codes in order to control the output and write to the text
% file Alt_2.txt. \t=horizontal tab, \n=new line,
% %6.3f=6 positions and 3 decimals
% Note: the disp lines print to the command window, not to the file.
disp(' x sqrt(x) x^3')
disp('---------------------------------')
fid=fopen('Alt_2.txt','w') % creates a txt-file and writes to it.
for x=1:3
    y1=sqrt(x); y2=x.^3;
    Y=[x' y1' y2'];
    fprintf(fid, '%1.0f \t %6.3f \t %6.3f \n', Y') % fid = file identifier
end
fclose(fid) % closes the txt-file.
Run the m-file Alt_2 and then check the file Alt_2.txt.
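The file can also be checked without leaving Matlab by reading it back with fileread. The sketch below writes the same three table rows to a temporary file (created with tempname instead of Alt_2.txt, so nothing is left behind), reads the whole file back as one string, and then deletes it.

```matlab
fname = [tempname '.txt'];            % temporary file instead of Alt_2.txt
fid = fopen(fname, 'w');
if fid == -1
    error('Could not open the file for writing.');
end
for x = 1:3
    fprintf(fid, '%1.0f \t %6.3f \t %6.3f \n', x, sqrt(x), x^3);
end
fclose(fid);

txt = fileread(fname);                % whole file as one character vector
disp(txt)
delete(fname);                        % clean up the temporary file
```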
As you can imagine, these format codes are very useful, and there are many others that can be used together with the fprintf command. Some other commands use the same format codes as well: fscanf can be used to read from text files, and textscan also reads text and converts it to a cell array. See the table below for several format codes.
String Formatting Codes
\n    New line
\r    Carriage return
\b    Backspace
\t    Horizontal tab
\f    Form feed
%s    String of characters
%e    Exponential notation
%f    Fixed-point notation
%u    Decimal notation (unsigned)
%g    The more compact of %e or %f
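The codes in the table can be tried out with sprintf, which accepts the same format codes as fprintf but returns the text as a string. The exact exponent formatting of %e and %g may vary between platforms, so those lines are only described approximately in the comments.

```matlab
fixedStr = sprintf('%f', pi);          % '3.141593' (fixed point, 6 decimals)
widthStr = sprintf('%6.2f|', pi);      % '  3.14|'  (6 positions, 2 decimals)
expStr   = sprintf('%e', 12345.678);   % exponential notation, e.g. 1.234568e+04
gSmall   = sprintf('%g', 0.0001);      % '0.0001'   (%g picks the shorter form)
gLarge   = sprintf('%g', 123456789);   % switches to exponential notation
uStr     = sprintf('%u items', 42);    % '42 items'
sStr     = sprintf('%s=%d', 'n', 5);   % 'n=5'
tabStr   = sprintf('a\tb');            % 'a', a horizontal tab, 'b'
```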
On the homepage we have a text file, name.txt. We shall now try to read it with the command textscan.
>> fid=fopen('name.txt','r'); % opens the file for reading.
>> C=textscan(fid,'%u%s%u%u'); % Gives a cell array.
>> fclose(fid) % closes the file for reading.
Take a look at the cell array C:
>> C{1,1},C{1,2},C{1,3},C{1,4}
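Since the contents of name.txt are not reproduced here, the sketch below assumes hypothetical contents that match the format '%u%s%u%u' (an integer, a name and two more integers per line). It writes them to a temporary file, reads them back with textscan, and shows that C holds one cell per column.

```matlab
% Hypothetical file contents; the real name.txt may look different.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
if fid == -1
    error('Could not open the file for writing.');
end
fprintf(fid, '1 Anna 23 170\n2 Bertil 35 182\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%u%s%u%u');   % 1x4 cell array, one cell per column
fclose(fid);
delete(fname);

ids   = C{1};      % integer column: 1 and 2
names = C{2};      % cell array of strings: 'Anna' and 'Bertil'
```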
As I have said earlier there are other commands that can do this equally well. Use the Matlab
help to find out what other possibilities there are to solve this.
Related posts:
1. Matlab Tutorial 2: Matrices in Matlab [4]
2. Matlab Tutorial 1: Hello world, plotting, mathematical functions and file types [5]
3. Matlab Tutorial 3: Strings in Matlab [6]
4. Matlab Tutorial 6: Analysis of Functions, Interpolation, Curve Fitting, Integrals and Differential Equations [7]
5. Matlab Tutorial 7: Common Programming Structures and Conditional Statements [8]
6. Matlab Tutorial 5: Linear Equations [9]

Article printed from Matlab Corner: http://www.matlabcorner.com
URL to article: http://www.matlabcorner.com/matlab-tutorial-4-data-analysis-and-statistics-with-matlab/
URLs in this post:
[1] data analysis: http://www.matlabcorner.com/tag/data-analysis/
[2] statistics: http://www.matlabcorner.com/tag/statistics/
[3] histogram: http://www.matlabcorner.com/tag/histogram/
[4] Matlab Tutorial 2: Matrices in Matlab: http://www.matlabcorner.com/matlab-tutorial-2-matrices-in-matlab/
[5] Matlab Tutorial 1: Hello world, plotting, mathematical functions and file types: http://www.matlabcorner.com/matlab-tutorial-1-hello-world-plotting-mathematical-functions-and-file-types/
[6] Matlab Tutorial 3: Strings in Matlab: http://www.matlabcorner.com/matlab-tutorial-3-strings-in-matlab/
[7] Matlab Tutorial 6: Analysis of Functions, Interpolation, Curve Fitting, Integrals and Differential Equations: http://www.matlabcorner.com/matlab-tutorial-6-analysis-of-functions-interpolation-curve-fitting-integrals-and-differential-equations/
[8] Matlab Tutorial 7: Common Programming Structures and Conditional Statements: http://www.matlabcorner.com/matlab-tutorial-7-common-programming-structures-and-conditional-statements/
[9] Matlab Tutorial 5: Linear Equations: http://www.matlabcorner.com/matlab-tutorial-5-linear-equations/
