1 Recurrence Relations
I assume that you have learnt the chapter Sequences & Series in Maths T
before you arrive at this chapter.
A recursive definition of a sequence specifies one or more initial terms and a
rule for determining subsequent terms from those that precede them.
A recurrence relation for the sequence {a_n} is an equation that
expresses a_n in terms of one or more of the previous terms of the sequence,
namely a_0, a_1, ..., a_{n-1}, for all integers n with n ≥ n_0, where n_0 is a
non-negative integer.
That was the formal definition of a recurrence relation. When you say that
something is recursive, it means that there is a repetition. So a recurrence
relation is basically just an equation which relates a term to the terms before
it. Let's take the arithmetic sequence 1, 2, 3, 4, 5, ... which goes on to infinity. The term 2 is
derived from the term 1 by adding 1 to it. The same relationship holds for
all the terms: each one is the previous term plus 1. We shall denote 1
as a_0, which is the initial term. Then the term a_1 is related
to the initial term by the equation
a_1 = a_0 + 1
Generalizing, we can conclude that the arithmetic
sequence can be represented by the recurrence relation
a_n = a_{n-1} + 1
where n ≥ 1 (n is a positive integer). Using this equation and the initial
condition a_0, you can write down the rest of the terms by adding 1 step by step all the
way up (just imagine if I asked you to find the term a_109!). So now you know that
a recurrence relation is just an equation which contains a_n and at least one other
term a_{n-x}. Examples of recurrence relations are
a_n = 6a_{n-2}
a_n = 5a_{n+4} - 2a_{n+3} + n
In STPM, you will only be dealing with linear, 2nd order recurrence
relations, both homogeneous and non-homogeneous.
Now that you know what a recurrence relation is, I will guide you through some basic
modelling. You need to learn how to use recurrence relations in a given situation,
or question. Let me start with 2 very famous examples, the Fibonacci
Numbers and the Tower of Hanoi.
Now, here is the hard part. To solve this problem, note that there are 2
initial conditions, a_0 and a_1, which are both 1 (a_0 is the start, which I will call
month 0, and a_1 is for the first month). As we step into month 2, the number
of pairs of rabbits will be the number of pairs in the previous month
(month 1) plus the newborn pairs (one for each pair that was already
alive in month 0). This goes on: every time we reach a new
month, we add the number of pairs of rabbits in the previous month to
the number of pairs in the month before the previous month. So in the
end, we come up with the famous Fibonacci sequence, which is represented
by the recurrence relation
f_n = f_{n-1} + f_{n-2}
I bet you got lost somewhere, but this is the best explanation I could come up
with. You can try reading the textbooks, and you might not even understand it at
all. We see that the Fibonacci sequence is a 2nd order homogeneous linear
recurrence relation. This chapter really needs you to think a lot.
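If the rabbit story is still fuzzy, letting a computer apply the rule for you may help. The following is only a small sketch, assuming the two initial conditions f_0 = f_1 = 1 used above; the function name `fibonacci_pairs` is made up for this illustration.

```python
def fibonacci_pairs(months):
    """Return [f_0, f_1, ..., f_months] using f_n = f_(n-1) + f_(n-2)."""
    terms = [1, 1]  # initial conditions: month 0 and month 1
    for n in range(2, months + 1):
        # pairs this month = pairs last month + pairs two months ago
        terms.append(terms[n - 1] + terms[n - 2])
    return terms[: months + 1]


print(fibonacci_pairs(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Each term only needs the two terms before it, which is exactly what "2nd order" means here.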
Do you know that Fibonacci numbers also exist in sunflower patterns, pinecones,
and spiral seashells? Get to know more about Fibonacci Numbers in Nature
2 DISTINCT ROOTS
Given a recurrence relation a_n = 5a_{n-1} - 6a_{n-2}, with initial conditions a_0 = 1, a_1 =
0. To start off, we let a_n = r^n. This is a smart guess which we will eventually
find to be correct. We can then further deduce that a_{n-1} = r^{n-1} and a_{n-2} = r^{n-2}.
Substituting everything back into the equation, we have
r^n = 5r^{n-1} - 6r^{n-2}
Dividing the equation by r^{n-2} (which is the smallest power), we get
r^2 = 5r - 6
r^2 - 5r + 6 = 0
which is a quadratic equation! This equation is called the characteristic
equation, and r is called the characteristic root. Solving the equation, we get r
= 2, 3. Again, using a smart guess, we deduce that the term a_n can be
represented by the equation
a_n = c_1 2^n + c_2 3^n
You will notice that the 2^n and 3^n must have come from the characteristic roots
found earlier. This is the general solution of the recurrence relation. The
terms c_1 and c_2 are just 2 constants, which we will find by using the initial
conditions.
When a_0 = 1,
a_0 = c_1 + c_2 = 1    (1)
When a_1 = 0,
a_1 = 2c_1 + 3c_2 = 0
2c_1 = -3c_2    (2)
Now you have 2 simultaneous equations. Using the calculator, you can easily find
that c_1 = 3, c_2 = -2. Substituting the constants back into the equation, you get
a_n = 3(2^n) - 2(3^n)
which is what we call the particular solution. This is the final answer that
we are looking for. Now if you substitute n = 109, you can get the answer
for a_n straight away! Now that you have found the answer, try finding the first 5 or 6
terms, using both the recurrence relation a_n = 5a_{n-1} - 6a_{n-2} and the
equation a_n = 3(2^n) - 2(3^n). Do they contradict one another? Congratulations,
you just learnt how to solve homogeneous recurrence relations!
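If you want to check your hand-computed terms, here is a minimal sketch that generates the first few terms both ways. The function names are invented for this illustration; the recurrence, initial conditions and closed form are the ones from the example above.

```python
def terms_by_recurrence(count):
    """First `count` terms of a_n = 5a_(n-1) - 6a_(n-2) with a_0 = 1, a_1 = 0."""
    terms = [1, 0]
    while len(terms) < count:
        terms.append(5 * terms[-1] - 6 * terms[-2])
    return terms[:count]


def term_by_formula(n):
    """Particular solution a_n = 3(2^n) - 2(3^n)."""
    return 3 * 2**n - 2 * 3**n


print(terms_by_recurrence(6))                  # [1, 0, -6, -30, -114, -390]
print([term_by_formula(n) for n in range(6)])  # [1, 0, -6, -30, -114, -390]
```

The two lists agree, and the formula gets you to a_109 directly, without the 108 steps in between.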
2 EQUAL ROOTS
However, the above method only works when the characteristic equation has
2 distinct roots. Take another example, a_n = -4a_{n-1} - 4a_{n-2}, a_0 = 0, a_1 =
1. You get the characteristic equation r^2 + 4r + 4 = 0, so r = -2 (a repeated root). If you take the
general solution as a_n = c_1(-2)^n, then you are totally wrong. The correct answer
should be a_n = c_1(-2)^n + n c_2(-2)^n. Notice the extra factor n in the second
term. To summarize:
1. If the characteristic roots r_1 and r_2 are distinct, the general solution is a_n =
c_1 r_1^n + c_2 r_2^n.
2. If the characteristic roots are equal (both r), the general solution is a_n = c_1 r^n + n c_2 r^n.
Distinct roots could be either real or complex. The method for both is the same.
I will not discuss the methods to solve higher order recurrence relations here.
However, the method is actually the same. Just let a_n = r^n, and you will
get a linear, cubic or quartic equation, which you can solve to get
the answer. Simple?
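For the equal-roots example a_n = -4a_{n-1} - 4a_{n-2} with a_0 = 0, a_1 = 1, the initial conditions give c_1 = 0 and -2c_1 - 2c_2 = 1, so c_2 = -1/2. That working is mine rather than from the text, so treat the sketch below as a check of it; the function names are made up.

```python
def repeated_root_terms(count):
    """First `count` terms of a_n = -4a_(n-1) - 4a_(n-2) with a_0 = 0, a_1 = 1."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(-4 * terms[-1] - 4 * terms[-2])
    return terms[:count]


def repeated_root_formula(n):
    """a_n = c_1(-2)^n + n c_2(-2)^n with c_1 = 0, c_2 = -1/2."""
    # n * (-2)^n is always even, so the integer division by 2 is exact
    return -n * (-2) ** n // 2


print(repeated_root_terms(6))                        # [0, 1, -4, 12, -32, 80]
print([repeated_root_formula(n) for n in range(6)])  # [0, 1, -4, 12, -32, 80]
```

Without the extra factor n there would be only one constant to play with, and two initial conditions could not both be satisfied.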
4.3 Non-Homogeneous Linear Recurrence Relations
Consider the following non-homogeneous linear recurrence relation:
a_n = { a_{n-1} + a_{n-2} } + { 3^n + n3^n + n^2 + n + 3 }
where the first bracket is part (1) and the second bracket is part (2).
Part (1) is the homogeneous part of the recurrence relation, which we now call
the associated linear homogeneous recurrence relation. Part (2) is of
interest in this section: it is the non-homogeneous part. Solving this kind of
question is simple. You just need to solve the associated recurrence relation
(just like you did in the previous section), then deal with the non-homogeneous
part to find its particular solution. These two parts are solved separately,
and we combine the results at the end.
Now that you know the details about these 3 inverse trigonometric functions, it'll be
time for formulas and identities. Try to remember as many as you can. In fact, make sure
you know how to derive every single one of them.
Forward-Inverse Identities
Proving this one is not hard either. Let x = cos y, and make use of the
identity cos^2 x + sin^2 x = 1. The rest follows too. The tan(cos^-1
x) one will probably be harder. Give it a try.
Prove the first one by letting x = cos(π/2 - y) = sin y. Try figuring out the rest
yourself.
sin^-1 (-x) = -sin^-1 x
csc^-1 (-x) = -csc^-1 x
cos^-1 (-x) = π - cos^-1 x
sec^-1 (-x) = π - sec^-1 x
tan^-1 (-x) = -tan^-1 x
cot^-1 (-x) = π - cot^-1 x
This one is proven by letting sin y = x, so that sin(-y) = -x. The rest follows.
I don't think this one will come out in exams. However, the proof requires you to
learn the inverse hyperbolic functions in the next section first.
This one is the hardest to prove. Try proving it using the formula
We now relate the hyperbolic functions with the hyperbola. The equation for the
hyperbola is
x^2/a^2 - y^2/b^2 = 1
We let
x = a cosh u
y = b sinh u
We find that cosh^2 u - sinh^2 u = 1, which is true (this can be proven by
substituting the e^x definitions into the equation). Now that we have 2 hyperbolic functions,
we use them to derive a few other functions, following a convention similar
to the one the trigonometric functions use:
All 6 hyperbolic functions have their special pronunciation. sinh is read as
"shine", cosh as "cosh", tanh as "than", sech as "sheck", csch as "co-sheck"
and coth as "cough".
Now we shall see the graphs of the 6 hyperbolic functions. Note that they are all
derived from the exponential function:
[Figure: graphs of cosh x, sinh x, tanh x, sech x, csch x and coth x]
Now that you know the basic information about these functions, it's time to
memorize formulas. But before you start, I need to introduce a special rule which
makes the memorizing easier.
Osborne's Rule states that to change a standard trigonometric
identity into the equivalent hyperbolic identity, you change the sign of any
term which is the product of two sines, and substitute the corresponding
hyperbolic functions. This means that if you remember all the trigonometric
identities, you can remember the hyperbolic identities. Please note that
trigonometric formulas which rely on periodicity (for example,
the R formula and the phase shifts) do not apply to hyperbolic functions, as they
are not periodic.
For each case, you should be able to derive them. Proving them is simple: just
plug in the e^x definitions and you are sure to get it.
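To see Osborne's Rule in action before you memorize anything, here is a small numerical sketch. It assumes the usual exponential definitions sinh x = (e^x - e^-x)/2 and cosh x = (e^x + e^-x)/2; the helper names `sinh_exp` and `cosh_exp` are made up for this illustration.

```python
import math


def sinh_exp(x):
    """sinh x written with exponentials."""
    return (math.exp(x) - math.exp(-x)) / 2


def cosh_exp(x):
    """cosh x written with exponentials."""
    return (math.exp(x) + math.exp(-x)) / 2


for x in (-1.5, 0.0, 0.7, 2.0):
    s, c = sinh_exp(x), cosh_exp(x)
    # cos^2 x + sin^2 x = 1 becomes cosh^2 x - sinh^2 x = 1 (the sin*sin term flips sign)
    assert math.isclose(c * c - s * s, 1.0)
    # cos 2x = cos^2 x - sin^2 x becomes cosh 2x = cosh^2 x + sinh^2 x
    assert math.isclose(cosh_exp(2 * x), c * c + s * s)
    # sin 2x = 2 sin x cos x has no product of two sines, so nothing changes
    assert math.isclose(sinh_exp(2 * x), 2 * s * c, abs_tol=1e-12)

print("all three identities hold at the sample points")
```

A numerical check like this is not a proof, but it is a quick way to catch a wrong sign before you commit an identity to memory.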
The formulas and identities are as follows:
Double-Angle Formula
Besides all these formulas, you should also know the relations between
hyperbolic functions and trigonometric functions. Use the following to derive
those for tanh x, sech x, csch x and coth x too. Bear in mind that i × i = -1.
[Figure: graphs of cosh^-1 x, sinh^-1 x, tanh^-1 x, sech^-1 x, coth^-1 x and csch^-1 x]
Note that due to the definition of functions, we only take the positive y values of
the functions
cosh^-1 x and sech^-1 x. The domain and ranges are as follows:
There are not many formulas and identities for this section. But there is one very
important thing that you are supposed to learn how to prove, which is
the logarithmic form of inverse hyperbolic functions.
Please promise me that you will learn how to prove the rest, this
is super important.
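As a sanity check while you practise those proofs, the sketch below compares three of the logarithmic forms against Python's built-in inverse hyperbolic functions. The three formulas in the comments are the standard ones and are my assumption of what the list contains; the function names are invented here.

```python
import math


def asinh_log(x):
    # sinh^-1 x = ln(x + sqrt(x^2 + 1)), valid for every real x
    return math.log(x + math.sqrt(x * x + 1))


def acosh_log(x):
    # cosh^-1 x = ln(x + sqrt(x^2 - 1)), valid for x >= 1 (positive branch only)
    return math.log(x + math.sqrt(x * x - 1))


def atanh_log(x):
    # tanh^-1 x = (1/2) ln((1 + x) / (1 - x)), valid for -1 < x < 1
    return 0.5 * math.log((1 + x) / (1 - x))


print(asinh_log(2.0), math.asinh(2.0))
print(acosh_log(3.0), math.acosh(3.0))
print(atanh_log(0.5), math.atanh(0.5))
```

Note that `acosh_log` returns only the non-negative value, matching the remark above about taking the positive y values of cosh^-1 x.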
Here are some identities to remember. Note that they are quite similar to the
inverse trigonometric ones:
For all the above identities, please try to prove all of them. Refer to the
section inverse trigonometric functions for some hints on the proofs.
That's all for this chapter. Just remember how to prove them, sketch their graphs,
and manipulate these functions. You will need to master this chapter before you
can proceed to the next one.
6.1 Differentiability of a Function
2. a vertical
3. a discontinuity
4. at end points
or
exists.
You can go on to prove that both formulas are actually the same thing. Of course,
differentiability is not restricted to single points. We could also say that a function
is differentiable on an interval (a, b) or differentiable everywhere, (-∞, +∞). I'll
give you one example:
Prove that f(x) = |x| is not differentiable at x=0.
These 2 formulas can be used in different situations, so if one doesn't work, use
the other. Differentiability is not a common question in STPM, but you should still
be able to make use of this important information.
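For the f(x) = |x| example, you can see the problem numerically. This sketch assumes the limit definition f'(a) = lim (f(a + h) - f(a))/h as h → 0, which I take to be one of the two formulas referred to above; `difference_quotient` is a made-up helper name.

```python
def difference_quotient(f, a, h):
    """Slope of the chord from (a, f(a)) to (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


steps = (0.1, 0.01, 0.001)
from_right = [difference_quotient(abs, 0.0, h) for h in steps]
from_left = [difference_quotient(abs, 0.0, -h) for h in steps]

print(from_right)  # [1.0, 1.0, 1.0]
print(from_left)   # [-1.0, -1.0, -1.0]
```

The slopes from the right stay at 1 and those from the left stay at -1, so the two one-sided limits disagree and the limit (hence the derivative at x = 0) does not exist.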
6.2 Derivatives of a Function Defined Implicitly or Parametrically
You probably have learnt how to differentiate and integrate functions implicitly
and parametrically, but only up to the first order. Here, we will be learning how to
continue on to the 2nd order. It is actually very easy and straight-forward, so
there is nothing too important in this section.
IMPLICITLY
I think I don't need to tell you how to do it. Differentiating a function implicitly to
2nd order is just the same as 1st order. I'll show you an example:
Find the 2nd order derivative of the function x^2 + y^2 = 2.
Note the use of the product rule in this question. Just do more exercises, then
you will get used to these kinds of questions.
PARAMETRICALLY
Probably there's something new in this section. Again, I'll show you an example:
Consider the parametric equations x = t + 1 and y = t^3.
Differentiating each with respect to t gives
To find d^2y/dx^2,
But we cannot differentiate 3t^2 with respect to x directly. Therefore, using the chain rule,
So as a result, we get
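Written out in full, the parametric example would go roughly as follows. This is my own working from x = t + 1 and y = t^3, following the steps described above, so check it against your own.

```latex
\begin{aligned}
\frac{dx}{dt} &= 1, \qquad \frac{dy}{dt} = 3t^2, \\
\frac{dy}{dx} &= \frac{dy/dt}{dx/dt} = 3t^2, \\
\frac{d^2y}{dx^2} &= \frac{d}{dx}\left(3t^2\right)
  = \frac{d}{dt}\left(3t^2\right)\cdot\frac{dt}{dx}
  = 6t \cdot 1 = 6t.
\end{aligned}
```

As a cross-check, eliminating t gives y = (x - 1)^3, whose second derivative is 6(x - 1) = 6t.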
From here, you can further deduce that the derivatives of the
inverse trigonometric functions are obtained in the same way, i.e., by differentiating
the functions implicitly, then making use of the trigonometric identities. The list
of derivatives of all the inverse trigonometric functions is as follows:
where a is a constant. You should try to prove each and every one of them as an
exercise.
You should further try to differentiate these functions with complicated variables
using all the differentiation rules you learnt. For example,
while
To do this, you need to make use of integration by parts. If you followed the
formula in the Maths T formula sheet, it would be
However, I suggest that you use this formula, which is easier to remember:
Before I continue, let me explain this formula. Normally, you only use integration
by parts when you are trying to integrate a product of 2 functions, which are
most likely logarithmic, exponential, polynomial and trigonometric
functions. So in any case, you let one function be u, and the other function
be v. Notice that v has to be a function that is easy to integrate, while u has to
be the other one, which is hard to integrate / easy to differentiate. In words, this
formula can be read as
Integral of u v = [ u × integral of v ] - integral of (derivative of u ×
integral of v)
Never mind if you don't get it, as long as you have your own version of I by P. So
continuing on integrating sin^-1 x, we let u = sin^-1 x, and v = 1. We have
Get it? So the important tip for this question is to put v = 1 (you might recall
that this is the method you use to integrate ln x). So the rest of the functions,
after integration, give
Try to derive all of them as an exercise. Note that the term ln [x + √(x^2 - 1)] is
actually a cosh^-1 x function.
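For sin^-1 x, carrying out the integration by parts with u = sin^-1 x and v = 1 gives x sin^-1 x + √(1 - x^2) + C. That result is my own working of the steps described above, so here is a hedged numerical check; the helper names and the interval [0, 0.5] are arbitrary choices for the illustration.

```python
import math


def arcsin_antiderivative(x):
    """Candidate antiderivative of sin^-1 x: x sin^-1 x + sqrt(1 - x^2)."""
    return x * math.asin(x) + math.sqrt(1 - x * x)


def midpoint_integral(f, a, b, steps=20000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


numeric = midpoint_integral(math.asin, 0.0, 0.5)
exact = arcsin_antiderivative(0.5) - arcsin_antiderivative(0.0)
print(numeric, exact)  # the two values should agree to many decimal places
```

You can use the same trick to test your answers for the other inverse trigonometric integrals: swap in the function and your candidate antiderivative.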
6.4 Derivatives & Integrals of Hyperbolic & Inverse Hyperbolic Functions
The derivatives and integrals of hyperbolic functions and inverse hyperbolic
functions are very similar to those of trigonometric and inverse
trigonometric functions, just with a difference of a negative sign somewhere
within the formulas. There is no simple rule that tells us where the minus sign has
changed, so this section requires a lot of memory work.
HYPERBOLIC FUNCTIONS
The derivatives of hyperbolic functions can be derived easily by converting the
functions into their exponential form. I'll leave it to you as an exercise to derive
all of them. The list of derivatives is as follows:
As you can see, the derivative of sinh x is cosh x, and vice versa, which is
different from trigonometric ones by a minus sign. The functions whose
derivatives have minus signs are the secondary hyperbolic functions, csch x,
sech x and coth x.
The integrals, again, are very similar to trigonometric integration.
The integrals for sech x and csch x may look a little weird. You should try to
differentiate the right hand side and see whether you get the expression on the
left. Again, you should do some homework to derive all of them.
INVERSE HYPERBOLIC
Again, the inverse hyperbolic functions have similar derivatives to what the
inverse trigonometric functions have, and it is just a matter of a minus sign, outside or
within the square roots. Deriving is similar: differentiate them implicitly and make use
of the hyperbolic identities (do not confuse them with the trigonometric ones.
Remember Osborne's rule). Here you go
The integrals, as usual, are harder to do. You need to use integration by parts, as
I said in the previous section. Try doing them the way you did in the previous
section. As a matter of fact, the huge ln terms in the integrals of csch^-1
x and sech^-1 x are just logarithmic forms of cosh^-1 x and sinh^-1 x.
so many kinds of derivatives and anti-derivatives. But that's not the end yet, as I
haven't combined some results that can be obtained from both these sections;
you will see them only in the next section. Beware, the next section is not as
easy
6.5 Reduction Formulae
WARNING: You need to fully master integration and differentiation before you
continue on this section.
As you can see, there is a pattern that you can easily memorize. It's either of the
form a^2 - x^2, x^2 - a^2 or a^2 + x^2, with or without the square root. You also see
that they are all quadratic expressions, for which you could use the method
of completing the square to solve similar cases. For example,
Notice that if you didn't, you would have got a different answer.
can't be solved by normal ways. You might have learnt one trigonometric
substitution to solve this kind of question in Maths T. But now that you have
learnt hyperbolic functions, your vocabulary of substitutions increases to 3 of
them. Whenever you face integrals of this kind, you will:
This kind of integration makes use of the half angle formula. This applies to
hyperbolics as well.
b.
From here, you do integration by parts, with t2 as u and the term in the bracket
as v.
c.
Notice that it must be e^{2x}. Here you use the substitution e^x = sinh u. Similarly, if
the term in the square root was e^{2x} - 1 or -e^{2x} + 1, you substitute e^x as cosh
u or sin u respectively. Try and see whether it works.
d.
You might want to try proving this before you use it. This will be useful for the
next section.
e.
I actually learnt this in University. You should commit this to memory, as it might
come in useful.
Alright, let's get into the topic:
REDUCTION FORMULAE
A reduction formula is an expression of a definite integral in terms of n,
relating the integral to a similar form of itself. For example,
Notice that firstly, it is a definite integral, which means that it has upper and
lower limits. Then, it relates to itself, with a decrease of power or so. These
formulae can be very helpful, especially when you calculate high powers of these
functions. So if you want to find
we have
Moving the sin^n x term from the right to the left, we get
and more
Hope you haven't started to freak out yet. I seriously haven't tried proving all these
Reduction Formulae, so if you have done so, I salute you. I can give you some
tips here though:
1. Break down cos^n x = cos x cos^{n-1} x and tan^n x = tan^2 x tan^{n-2} x.
2. Try checking out the expressions on the right. When there's an n - 1, you know
that the term with the power of n needs to be differentiated once, and for n - 2, it will
be differentiated twice. m + 1 means that term will be integrated.
3. For those which are related to polynomials and roots, you will find the formula
d. above very useful.
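To get a feel for how a reduction formula is actually used, here is a sketch based on one standard example, which I am assuming rather than quoting from the list above: for I_n = ∫ sin^n x dx from 0 to π/2, we have I_n = ((n - 1)/n) I_{n-2}, with I_0 = π/2 and I_1 = 1. The function names are made up.

```python
import math


def sine_power_integral(n):
    """I_n = integral of sin^n x over [0, pi/2], using I_n = (n-1)/n * I_(n-2)."""
    if n == 0:
        return math.pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * sine_power_integral(n - 2)


def midpoint_integral(f, a, b, steps=20000):
    """Brute-force numerical integral, used only to double-check the formula."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


for n in (2, 5, 8):
    brute = midpoint_integral(lambda x: math.sin(x) ** n, 0.0, math.pi / 2)
    print(n, sine_power_integral(n), brute)
```

Each call knocks the power down by 2 until it hits a case you can integrate directly, which is the "relates to itself, with a decrease of power" idea described above.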
6.6 Applications of Integration
You probably have learned how to find the area enclosed between the
function f(x) and the axes, or between 2 functions. You have also learned
the volume of revolution for a function f(x) with the x or y-axis as the axis of
rotation. In this section, you'll be learning 2 new applications, which are the arc
length and the surface area of revolution.
ARC LENGTH
Consider 2 points, P and Q, on a curve. P is the point (x, y) and Q is the point (x
+ δx, y + δy). Let s be the length of the arc from a point on the y-axis to P,
and δs the length of the arc PQ. Since δs is very small, we can approximate the
arc PQ by a straight line. Hence, using Pythagoras' theorem, we have
(δs)^2 = (δy)^2 + (δx)^2
Dividing by (δx)^2, we obtain
As δx → 0, this gives
Let A be the area of the surface formed by rotating the curve y = f(x), between
the lines x = a and x = b, about the x-axis. Let the curved surface area of
the blue ring shown be δA. Treating the strip as being bounded by 2 cylinders, we
have
2πy δs ≤ δA ≤ 2π(y + δy) δs
As δx → 0, δs → 0, so we have
Again, differentiate the function, and substitute it into the formula to find the
surface area of revolution.
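Here is a numerical sketch of both applications. It assumes the standard results of the two derivations above, s = ∫ √(1 + (dy/dx)^2) dx and A = ∫ 2πy √(1 + (dy/dx)^2) dx, taken from x = a to x = b. The curve y = cosh x on [0, 1] is my own choice of example, and the function names are invented.

```python
import math


def integrate(f, a, b, steps=10000):
    """Midpoint rule for the integral of f from a to b."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def arc_length(dydx, a, b):
    # s = integral of sqrt(1 + (dy/dx)^2) dx
    return integrate(lambda x: math.sqrt(1 + dydx(x) ** 2), a, b)


def surface_area(y, dydx, a, b):
    # A = integral of 2*pi*y*sqrt(1 + (dy/dx)^2) dx, rotating about the x-axis
    return integrate(
        lambda x: 2 * math.pi * y(x) * math.sqrt(1 + dydx(x) ** 2), a, b
    )


# y = cosh x has dy/dx = sinh x, so sqrt(1 + sinh^2 x) = cosh x
s = arc_length(math.sinh, 0.0, 1.0)
A = surface_area(math.cosh, math.sinh, 0.0, 1.0)
print(s, math.sinh(1.0))                          # exact arc length is sinh 1
print(A, math.pi * (1 + math.sinh(2.0) / 2))      # exact area is pi(1 + sinh(2)/2)
```

The exact values in the comments come from integrating cosh x and cosh^2 x by hand, which is a nice bit of practice for the hyperbolic identities.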
7.1 Taylor Polynomial
A power series is an expression of a function as an infinite sum of
polynomial terms. Many smooth functions f(x) can be approximated
by such a series, so that f(x) = a + b(x - x_0) + c(x - x_0)^2 + d(x - x_0)^3 +
e(x - x_0)^4 + ...
when x is close to x_0, where a, b, c, ... are constants. If you remember
the Binomial Expansion for real numbers, the function (1 + x)^r can be
represented by the series
Compare the Binomial Series above with the formula for f(x). You see that it is
just a special case of the above function, such that x_0 is zero, and the constants
are defined in a special relation.
Our question is this: since we could represent the above bracketed
function as an infinite series of polynomial terms, is it possible to represent
other functions, like sin x, ln x, e^x or anything else, in the same way? If it is doable, how do we
determine the constants a, b, c and so on as in the function f(x) above?
Let me explain this a little. The term a is the point near which we want to estimate f(x).
For example, when a = 0, we substitute it into the series, and the new
expression will be quite accurate for estimating values of x which are
close to a (of course, for certain functions, the series is accurate for whatever
value of x. We'll discuss this in a later section). This means we vary a to
approximate different values of the same function.
Then, the terms f'(a), f''(a) are the 1st and 2nd derivatives of the function f(x), evaluated at a.
Note that in the term f^(n)(x), the n has a bracket, to tell us that it is not the nth
power of f, but the nth derivative of f. The entire series is what we call
the Taylor series. All the terms between the equal sign and R_n are called
the Taylor polynomial, and sometimes we denote this whole chunk of
polynomial as p_n(x). Writing the whole equation in another form, we have
Now, the term R_n(x) is what we call the remainder term. Since the Taylor
series is an infinite series, we can't possibly write down all the terms of the
series. So sometimes we just set a limit; for example, we want the series
correct up to the 6th order. In this case, R_n(x) is the difference
between f(x) and the Taylor polynomial that we kept.
The remainder term could also be written as
I'll try to give you an illustration to help you understand how this Taylor
series thingy works. By the way, we are not required to prove the formula for
Notice that the blue line sketches the exact graph of the function f(x). As I said
earlier, a truncated Taylor series is only an estimate. This means that the more Taylor
polynomial terms we keep, the more accurately it estimates the
function f(x). Look and see that the graphs of degree 1 and degree 2 are
actually quite far off from representing f(x) overall, but are quite accurate for values
of x near 0. As the degree of the polynomial increases, the graph of the Taylor polynomial
gets closer and closer to the actual function f(x).
So now, we want to learn how to find the series for some functions that we know
of. Let's try e^x. Since there is a different Taylor series for every expansion
point a, we shall focus on deriving the Maclaurin series of functions, where a = 0.
Recalling the formula,
We find that e^x will still be itself after any number of derivatives, and e^0 = 1. So plugging
in what we have, we get the Maclaurin series
Try finding the Maclaurin expansion for other functions, ln (1 - x), sinh x, and
any other functions you can think of. Note that not all Maclaurin series of
functions have such beautiful forms. Some might end up with coefficients that
follow no obvious pattern.
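To watch the Maclaurin polynomial for e^x close in on the real thing, here is a minimal sketch. It assumes the standard result e^x = 1 + x + x^2/2! + x^3/3! + ..., and `maclaurin_exp` is a made-up name.

```python
import math


def maclaurin_exp(x, degree):
    """Sum of x^k / k! for k = 0, 1, ..., degree."""
    total = 0.0
    term = 1.0  # the k = 0 term, x^0 / 0!
    for k in range(degree + 1):
        total += term
        term *= x / (k + 1)  # turn x^k / k! into x^(k+1) / (k+1)!
    return total


for degree in (1, 2, 4, 8):
    approx = maclaurin_exp(1.0, degree)
    print(degree, approx, abs(math.e - approx))
```

The error column shrinks quickly as the degree goes up, which is the numerical version of what the graphs of degree 1, 2 and higher show.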
Ignore the alien language first. Continuing from the previous part,
the remainder of the series is actually quite significant. When you use a Taylor
series to estimate something, you are interested in knowing the error of your
estimate, that is, the difference between your estimate and the actual value. If you
remember from the previous section, the remainder is given by the formula
The formula gives the exact error when f(x) is approximated by the nth Taylor
sum. The problem is that it is too difficult to evaluate it this way, so we are going
to find an overestimate of the remainder instead. We look at the magnitude of
the (n + 1)th derivative of f(t) as t varies between a and x, and overestimate
that by a single number M (known as the upper bound, as stated above). So here,
we are saying that the remainder is definitely smaller than or equal to the upper
bound, and thus the formula above,
and then
To go on, we need to use the formula above. To find M, we need to first find f^(n+1)(x),
which is 6x^-4. Remember the part above which says | f^(n+1)(x) | ≤ M. We find
that the maximum value of 6x^-4 is 6 if we use values 1 ≤ x
≤ 2 (the interval I containing a), so we have
(Note that if you are finding f^(n+1)(x) = cos^n x or sin^n x, then you can take M = 1 instead.
Useful information.) Now, the different thing here compared to the previous
example is that we don't know n, so we can't substitute n for any value (in fact,
we are looking for n!). But we do have another piece of information, which is "to
5 decimal places". We take half a unit in that decimal place, and now we
know that the remainder must be smaller than 0.000005. So we have
By trial and error, we find that when n = 9, the inequality holds. Therefore,
Which is
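The trial and error can be handed to a computer. The sketch below assumes the usual bound |R_n(x)| ≤ M |x - a|^(n+1) / (n + 1)!, which I take to be the formula referred to above. The sample numbers (estimating e from the Maclaurin series of e^x at x = 1, with M = 3 since e < 3) are my own illustration and not necessarily the example in this section; the function names are made up.

```python
import math


def remainder_bound(M, distance, n):
    """Upper bound M * |x - a|^(n+1) / (n+1)! for the remainder R_n(x)."""
    return M * distance ** (n + 1) / math.factorial(n + 1)


def smallest_degree(M, distance, tolerance):
    """Smallest n whose remainder bound is below the tolerance."""
    n = 0
    while remainder_bound(M, distance, n) >= tolerance:
        n += 1
    return n


# accuracy to 5 decimal places means the remainder must be below 0.000005
print(smallest_degree(3, 1.0, 0.000005))  # 9
```

Because M is an overestimate, the n you get is guaranteed to be enough, although a slightly smaller n might also work in practice.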
Note that the integration in the integrating factor doesn't need a constant,
because it will eventually cancel out later. So multiplying both sides by it,
HOMOGENEOUS CASE
A second order homogeneous linear differential equation has the form
where a, b and c are constants. We first give a smart guess (ansatz) that the
solution has the form y = Ae^{nx}, where A and n are constants.
Differentiating it yields
and once we substitute everything into the differential equation and
eliminate Ae^{nx}, we get a quadratic equation of the form
which we call the auxiliary equation. From here we can see that y = Ae^{nx} is
indeed a solution of the 2nd order differential equation, provided that the value
of n satisfies this equation. Once we find the values of n, we can thus write down
the general solution of the differential equation.
However, the equation will give you one of 3 outcomes: it has either 2 distinct
roots, 2 equal roots or 2 complex roots.
Case 1: 2 Distinct Roots
In this case, suppose the auxiliary equation gives you 2 roots n_1 and n_2. Your
answer for y will be in the form of
Remember that your initial guessed solution for the differential equation was y =
Ae^{nx}? Notice that if y = Ae^{nx} and y = Be^{mx} are both solutions of the
differential equation, then the sum of both solutions, y = Ae^{nx} + Be^{mx}, is also
a solution of the differential equation. That is why our solution for y is the sum of
both solutions. You may want to prove it. Given the differential equation
you find the auxiliary equation to have the roots n = -1, -2. Do try
substituting y = Ae^{-x}, y = Ae^{-2x} and y = Ae^{-x} + Be^{-2x} into the equation. All of
them are consistent, aren't they?
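If you would rather let a computer do the substituting, here is a rough sketch. It assumes the differential equation is y'' + 3y' + 2y = 0, which is the equation whose auxiliary roots are -1 and -2 (the equation itself is my reconstruction), and it uses finite differences to estimate the derivatives. The constants A = 2, B = -3 and the function names are arbitrary.

```python
import math


def candidate(x, A=2.0, B=-3.0):
    """Trial solution y = A e^(-x) + B e^(-2x)."""
    return A * math.exp(-x) + B * math.exp(-2 * x)


def residual(y, x, h=1e-3):
    """Estimate y'' + 3y' + 2y at x using central differences."""
    first = (y(x + h) - y(x - h)) / (2 * h)
    second = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return second + 3 * first + 2 * y(x)


for x in (0.0, 0.5, 1.0, 2.0):
    print(x, residual(candidate, x))  # close to 0, up to finite-difference error
```

Try replacing `candidate` with something that is not a solution, such as `math.exp`, and the residual will be nowhere near zero.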
Case 2: 2 Equal Roots
Suppose your auxiliary equation gives you only one value of n. Your answer will
be in the form of
When there is a repeated root, you multiply the second term by x. Try recalling the connection
of this chapter with what you learnt in the chapter Recurrence Relations.
Case 3: Complex Roots
Suppose you get 2 complex roots, m + in and m - in. Your answer will then be in
the form of
Notice the second line of the equation. Remember the fact that
e^{(m+in)x} = e^{mx}(cos nx + i sin nx), and you get y = e^{mx}[ (A + B)cos nx + i(A
- B)sin nx ], in which you represent the terms (A + B) and i(A
- B) as C and D respectively. You will be surprised that D is actually a real
constant, so somewhere on the way, A and B must have been complex.
As I said, these are the forms of general solutions that you can get. To get
a particular solution, you need to have initial conditions, something like
y = 1 when x = 0. The particular solution replaces the constants A, B, C, D
with actual numbers.
NON-HOMOGENEOUS CASE
A second order non-homogeneous linear differential equation has the form
out of your syllabus, as solving these kinds of differential equations
requires the Method of Variation of Parameters. Try googling it if you
want to know more.
The solving method is easy. First you separate the differential equation into 2
parts. You let the first part = 0,
and this is solved just as above, by finding the auxiliary equation and then
representing the answer in the form of y = g(x) = Ae^{nx} + Be^{mx}. This solution is
called the complementary function (CF). The other part, f(x), will have the
solution y = h(x), which is called the particular integral (PI). Remember
that the sum of solutions is also a solution, so our final answer will be
y = g(x) + h(x)
Since you already know what to do with the CF, we will introduce methods to
find the PI below, which depend on what f(x) is.
Case 1: f(x) is a Polynomial Function
You should just take the PI to be a polynomial function. For example,
You already know the CF from above, which is y = Ae^{-x} + Be^{-2x}. Then to find the
PI, you let
y = Ax^2 + Bx + C, according to the degree of the polynomial. Differentiating,
you get
and the general solution, being the sum of the CF and the PI, will be
Try not to get confused between the constants of the CF and the PI, since here I
have two A's and two B's. I would suggest that you name the constants for
the PI as C, D and E instead. This rule applies for any polynomial of degree n.
However, there is an exception, when your auxiliary equation has a root n = 0.
Since Ae^0 = A, you already have a constant term in the CF. So for your PI, you
need to multiply your solution by an extra x. So if your
f(x) is 4x + 3, your PI should be Bx^2 + Cx instead of Bx + C. Similarly, you can
guess that if the CF has a double root n = 0, you will then multiply your PI
by x^2. Try relating this information to the chapter on Recurrence Relations.
Case 2: f(x) is an Exponential Function
This is easy. If f(x) = 5e^{2x}, our PI will be just y = Ce^{2x}. Just differentiate y to
get dy/dx and d^2y/dx^2, substitute them into the equation, and find C. Again in this
case, there are exceptions. If your CF already has a term Ae^{2x}, then like the
above, you multiply the PI by x to give you y = Cxe^{2x}. If your CF is y =
Ae^{2x} + Bxe^{2x}, then your PI will be y = Cx^2e^{2x}, multiplying by x^2 this time. Not hard I
think. If you are given
your CF is the same, y = Ae^{-x} + Be^{-2x}. Your PI will be y = Ce^x + Dxe^{-2x}, and you
should further solve the equation yourself.
Case 3: f(x) is a Cosine or Sine Function
If f(x) = 5sin 2x, or f(x) = 4cos 2x, or f(x) = 6sin 2x + 7cos 2x, your PI will
be the same, which is y = C cos 2x + D sin 2x. Notice that whether you have
only sines or only cosines, you still have to include both cosines and sines
in your PI. The reason is simple: if you only include one of them, you generally
cannot match both sides of the equation. Again, there is an exception, which is when your auxiliary
equation has purely imaginary roots, which happen to give your CF a
sine or cosine function of the same form. As usual, just multiply your PI by x.
For example,
SUBSTITUTION
If you recall what you learned in Maths T, you have already learned how to
use the substitutions v = ax + by and y = vx to transform a complicated-looking differential equation into one that is solvable. You can apply those skills
to 2nd order differential equations too. Other kinds of substitution include x =
u^0.5, u = xy, but I want your attention on solving differential equations of the
form
From here, find dy/dx and d^2y/dx^2 by using the chain rule.
which is solvable.
PROBLEM MODELLING
Seriously, I have looked through many books, but none of them really teach us
about modelling for 2nd order differential equations. You should be familiar with
modelling of 1st order differential equations though. So here, I have no choice
but to introduce to you some university level stuff.
1. LRC Circuits
The potential differences across an inductor, a resistor and a capacitor are
given by
So this means that the total voltage across the 3 elements put in series is equal
to
where m is the mass, and k is the spring constant. Notice that this is a 2nd order
differential equation! Solving this lets you find x in terms of t. A damped
oscillator has an extra term in it,
mx'' + bx' + kx = 0
where b is the drag constant, and x' and x'' denote dx/dt and d^2x/dt^2. A forced oscillator, in turn, would be
mx'' + kx = F(t)
where the force F is a function of time, probably a sine or cosine function. You
could have guessed that a forced damped oscillator would be
mx'' + bx' + kx = F(t)
With this information, you are able to model a second order differential
equation once you know all the factors m, b, k and F.
There are a whole lot more physics equations which require differential
equations, like the famous Schrödinger's Equation and other higher level
stuff, which requires higher level physics. I better stop here before I turn this into
a physics lecture instead.
9.1 Divisibility
Number Theory is considered one of the hardest sections in Mathematics. It is
the study of the very fundamentals of numbers, yet can be very complicated.
Information on this chapter for such a level of study is very rare, so I hope you
will appreciate everything that I have for you over here.
We have been learning division since Standard 2. But today, we will look at it in
a different manner. If a and b are integers with a ≠ 0, then we say
that a divides b if there is an integer c such that b = ac. When a divides b we
say that a is a factor of b and that b is a multiple of a. The notation a |
b denotes that a divides b (which means there is no remainder). We write a ∤
b when a doesn't divide b. For example, 2 | 4, but 4 ∤ 2. Take note that the
notations 2 | 4 and 2/4 are 2 different things. The former is the notation for
divisibility, while the latter is simply a fraction.
There are certain rules of divisibility that you should know. These are:
1. If a | b, b | c, then a | c.
You should know how to prove this. As above, a | b and b | c can be written as ak
= b and bl = c, and therefore akl = bl = c, so a | c. Here, k and l are integers.
2. If a | b, a | c, then a | (b + c) and a | (mb + nc) for any integers m and n.
3. If a | b, then a | bc for any integer c.
The above 2 can also be proven with similar notation to rule 1.
Not every pair of numbers divides evenly. For example, 2 does not divide 7, as
it leaves a remainder of 1. We can write this as the equation
7 = 2 × 3 + 1
Here, 3 is the quotient. We denote the quotient of b divided by a as b div a, so in
this case 7 div 2 = 3. The remainder is 1, which we denote as b mod a, and here we
have 7 mod 2 = 1. Note that a remainder cannot be negative. For example, 7 =
2 × 4 − 1 is wrong, because it would give us 7 div 2 = 4 and 7 mod 2 = −1.
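The quotient and remainder when dividing 7 by 2 can be checked with a few lines of Python. This is only a small sketch: the function name div_mod is made up for illustration, and it leans on Python's built-in divmod, which already returns a non-negative remainder when the divisor is positive.

```python
def div_mod(b, a):
    """Return (q, r) with b == a*q + r and 0 <= r < a, for a positive divisor a."""
    if a <= 0:
        raise ValueError("this sketch only handles a positive divisor")
    q, r = divmod(b, a)  # Python floors the quotient, so r is never negative here
    return q, r

print(div_mod(7, 2))    # (3, 1), i.e. 7 = 2*3 + 1
print(div_mod(-7, 2))   # (-4, 1), i.e. -7 = 2*(-4) + 1
```

Notice that even for −7 the remainder stays non-negative, which matches the rule on remainders.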
A prime number is an integer greater than 1 that is only divisible by 1 and by itself.
An integer greater than 1 which is not prime is called a composite number. The smallest
prime number is 2, and it goes on as 3, 5, 7, 11, 13, 17, 19 and so on. The
interesting thing about prime numbers is that there is no simple formula
that generates the sequence of prime numbers. So if we want
to check whether a very large number is prime, we may need to divide it by
a great many candidates before we can say that it is prime. One very famous
method from the past is the sieve of Eratosthenes, which can be used to find all
the primes below 100. It is done by first listing down all the numbers from 2 to
100. Then cross out the multiples of 2 (other than 2 itself), then the multiples
of 3, 5, 7 and so on, until you have nothing left to cross out. The numbers that
remain are the primes! Another famous result is the
Prime Number Theorem. You might want to look it up.
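The sieve of Eratosthenes is easy to try on a computer. Below is a minimal sketch in Python; the function name primes_up_to is my own choice, and the only shortcut taken is that crossing out starts from p × p, since smaller multiples of p were already crossed out by smaller primes.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return a list of all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False      # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # cross out every multiple of p, starting from p*p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(limit + 1) if is_prime[n]]

print(primes_up_to(20))   # [2, 3, 5, 7, 11, 13, 17, 19]
```

Running it with limit = 100 gives the 25 primes below 100.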
So how do you know whether a relatively small number is prime?
There is a way to find out that is at least a little faster than trying to divide the number
by every number smaller than itself. If a number greater than 1 is not
divisible by any prime less than or equal to its square root, then it is a prime number. This
can be proven. Suppose we have a composite number n with ab = n. If a >
√n and b > √n, then ab > √n × √n = n, which is a contradiction. So one of the
factors must be at most √n.
Although this does speed up the process of finding primes, it is still quite a slow
method.
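The square root test can be sketched as follows. For simplicity this version tries every integer d with d × d ≤ n rather than only the primes, which is slightly wasteful but gives the same answer; the name is_prime is just an illustrative choice.

```python
def is_prime(n):
    """Trial division: n is prime if no d with d*d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False   # found a factor not exceeding the square root
        d += 1
    return True

print(is_prime(641))   # True
print(is_prime(91))    # False, since 91 = 7 * 13
```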
Prime numbers are the building blocks of all numbers. The Fundamental
Theorem of Arithmetic states that:
Every positive integer > 1 can be written uniquely as a prime or as the product
of 2 or more primes where the prime factors are written in order of non-decreasing size.
This is what we call prime factorisation. For example, 4 = 2^2, 100 =
2^2 × 5^2, 641 = 641 and so on. We can write down any positive integer as a product
of prime powers, a = 2^x × 3^y × 5^z × 7^w and so on.
There's a lot to say about prime numbers. One famous argument proves
that there are infinitely many primes. Suppose there were only finitely many, and label them
p1, p2, p3 and so on, up to the greatest prime number in the world,
pn. Now consider the number a = p1 p2 p3 ... pn + 1. Dividing a by any of the
primes p1, ..., pn leaves a remainder of 1, so none of them divides a. But a > 1
must have some prime factor (possibly a itself), and that prime is not in our
list. This contradicts what we said earlier about having found the
greatest prime number, and therefore proves that there are indeed infinitely
many primes.
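One point worth checking by hand or by machine: the number a = p1 p2 ... pn + 1 is guaranteed to have a prime factor outside the list, but it is not always prime itself. The sketch below uses the first six primes as an illustration; the helper name smallest_prime_factor is made up for this example.

```python
from math import prod

def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n itself if n is prime), for n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]
a = prod(primes) + 1                  # 30031
print(a, [a % p for p in primes])     # every remainder is 1
print(smallest_prime_factor(a))       # 59, a prime that is not in the list
```

Here 30031 = 59 × 509, so a is composite, yet both of its prime factors are larger than 13.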
Another 2 interesting topics on prime numbers are Goldbach's
Conjecture and the Twin Prime Conjecture. Go look them up if you are free.
Now, let's move on to the gcd and lcm. Try recalling whether this sounds
familiar from your Form 1 Mathematics. gcd is the greatest common
divisor (you are probably more familiar with the name highest common factor,
or HCF), while lcm is the lowest common multiple. Here we write k = gcd
(a, b) to mean that k is the greatest common divisor of the
integers a and b. Similarly, k = lcm (a, b) means k is the lowest common
multiple of the integers a and b. For example, gcd (4, 6) = 2 and lcm (5, 6) =
30.
Relating this back to prime numbers, for any 2 integers a and b, if gcd (a, b) =
1, we say that they are relatively prime. For example, 5 and 6 are relatively
prime.
Do you still remember the method to find your lcm and gcd in Form 1? You had to
draw out something like a ladder or so. But here, we will use another method,
which has something to do with the prime factorization. For example,
Find gcd (120, 500) and lcm (120, 500).
We first start by representing the numbers 120 and 500 in terms of primes.
120 = 2^3 × 3 × 5
500 = 2^2 × 5^3
Now, the formulas to find the gcd and lcm are easy. If a = p1^a1 × p2^a2 × ... × pn^an
and b = p1^b1 × p2^b2 × ... × pn^bn, then
gcd (a, b) = p1^min(a1,b1) × p2^min(a2,b2) × p3^min(a3,b3) × ... × pn^min(an,bn)
lcm (a, b) = p1^max(a1,b1) × p2^max(a2,b2) × p3^max(a3,b3) × ... × pn^max(an,bn)
You first list the primes present in either of the 2 numbers 120 and
500. p1^max(a1,b1) means p1 raised to the larger of the powers of that particular prime p1 in
the 2 numbers a and b, while p1^min(a1,b1) uses the smaller. A prime that is missing
from one of the numbers has power 0 there. So plugging in the
numbers, we have
gcd (120, 500) = 2^min(3,2) × 3^min(1,0) × 5^min(1,3) = 2^2 × 3^0 × 5^1 = 20
lcm (120, 500) = 2^max(3,2) × 3^max(1,0) × 5^max(1,3) = 2^3 × 3^1 × 5^3 = 3000
From here, we obtain a new formula, as we can see that
ab = gcd (a, b) × lcm (a, b)
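The min/max rule on prime powers can be written out as a short program. This is a sketch only: prime_factors and gcd_lcm are names I made up, and the factorisation is done by plain trial division, which is fine for small numbers like 120 and 500.

```python
from collections import Counter

def prime_factors(n):
    """Return a Counter mapping each prime factor of n to its power."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1    # whatever is left over is itself prime
    return factors

def gcd_lcm(a, b):
    """Compute (gcd, lcm) from the min and max of the prime powers."""
    fa, fb = prime_factors(a), prime_factors(b)
    g = l = 1
    for p in set(fa) | set(fb):
        g *= p ** min(fa[p], fb[p])   # a missing prime counts as power 0
        l *= p ** max(fa[p], fb[p])
    return g, l

print(gcd_lcm(120, 500))   # (20, 3000)
```

Multiplying the two results gives 20 × 3000 = 60000 = 120 × 500, as the formula ab = gcd × lcm says.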
The method described for computing the greatest common divisor of 2 integers,
using the prime factorizations of these integers, is inefficient. The reason is that
it is time consuming to find prime factorizations. Now I will teach you a more
efficient method of finding the gcd, called the Euclidean
Algorithm (also Euclid's Algorithm). It is named after the ancient Greek
mathematician Euclid, who included a description of this algorithm in his
book The Elements. Let's start with an example.
Find gcd (91, 287).
First, we divide the bigger number by the smaller one. Then, we take the
divisor and the remainder, and repeat the process until there is no
remainder left. The last non-zero remainder is the gcd that we are looking for. So we have
287 = 91 × 3 + 14
91 = 14 × 6 + 7
14 = 7 × 2 + 0
so gcd (91, 287) = 7.
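The repeated division in this example translates almost word for word into code. A minimal sketch (the function name gcd is my own; Python also ships math.gcd, which does the same job):

```python
def gcd(a, b):
    """Euclidean algorithm: keep replacing (a, b) by (b, a mod b) until b is 0."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(287, 91))   # 7
```

Each pass of the loop is one line of the working: (287, 91) becomes (91, 14), then (14, 7), then (7, 0), and the last non-zero value, 7, is returned.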
Consider how you read the time on a clock. Every time the short hand goes
one round, 12 hours have passed. So when the short hand goes past another hour, it
will be 13 hours, and the time might be called 13 o'clock. We know, however, that 13
o'clock is actually 1 o'clock. The same goes for 25 o'clock; it still means the same thing. We
say that the clock follows a modular system.
Modular arithmetic is the calculation of numbers in a modular system. In the
clock's system, it is modulo 12. When two numbers a and b are congruent to
each other in the same modulo, we denote it by
a ≡ b (mod m)
This equation is read as a is congruent to b modulo m. For example, 13 ≡ 1
(mod 12) means that 13 is the same as 1 in a modulo 12 system. Note that
the main statement is the part on the left hand side, 13 ≡ 1, while the right hand
side, (mod 12), tells you that this statement is valid only in modulo 12. This
modulo system also has another explanation for it. a ≡ b (mod m) means
that a and b give the same remainder when divided by m. Notice that 13 divided
by 12 gives remainder 1, while 1 divided by 12 also gives the remainder 1. Or
using the mod terminology, we say that
a mod m = b mod m
Take note that a ≡ b (mod m) and a = b mod m have different meanings.
The latter says that a is the remainder when b is divided by m.
Now, bringing divisibility in, we say that
a ≡ b (mod m) if and only if m | (a − b)
Can you see that m divides the difference a − b (not a and b themselves)? In other
words, a and b differ by a multiple of m. So this means that 49 ≡ 37 ≡ 25 ≡ 13 ≡
1 (mod 12). Just add 12 to a number and you get another number which is
congruent to it modulo 12.
If I convert the notation a ≡ b (mod m) into algebra, it can be written as a = b
+ km, where k is an integer (try verifying this with the divisibility notation
above). So to summarize:
When a ≡ b (mod m), then
a mod m = b mod m
m | (a − b)
a = b + km
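The three statements in this summary can be checked against each other numerically. The sketch below uses two made-up helper names, congruent (the divisibility form) and same_remainder (the remainder form), and tests them on the clock example.

```python
def congruent(a, b, m):
    """True when m divides (a - b), i.e. a is congruent to b modulo m."""
    return (a - b) % m == 0

def same_remainder(a, b, m):
    """True when a and b leave the same remainder on division by m."""
    return a % m == b % m

for a in (49, 37, 25, 13):
    k = (a - 1) // 12                     # the integer k in a = b + km, with b = 1
    print(a, congruent(a, 1, 12), same_remainder(a, 1, 12), a == 1 + k * 12)
```

Every row prints True three times, while a number such as 14 fails all three checks against 1 modulo 12.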
To summarize this rule, it means that a constant c can only be divided out
from a, b and m if it divides all of them. This is provable too.
Here's another one, not to be confused with the former: the cancellation law.
If gcd (c, m) = 1, then
9. ac ≡ bc (mod m) ⇒ a ≡ b (mod m)
You can prove this too. Suppose ac − bc = (a − b)c = km for some integer k. Since gcd (c, m) =
1, c and m have no common divisors other than 1, and therefore c | k. Since c divides
k, c can be cancelled out, and thus a − b = nm for some integer n.
Here we see that a ≡ b (mod m), which was to be shown.
We have now expressed gcd (123, 2347) as a linear combination of the two
numbers. This is what we call the extended Euclidean Algorithm. Here,
we find that an inverse of 123 modulo 2347 is −706. We see that
−706 × 123 = −86838 ≡ 1 (mod 2347)
Note that every integer congruent to −706 modulo 2347 is also an inverse of
123, so it is best to represent the inverse of 123 as 1641 (= 2347 − 706), a positive
integer less than 2347.
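For those who want to check the inverse of 123 modulo 2347 by machine, here is a sketch of the extended Euclidean Algorithm in its usual iterative form. The names extended_gcd and mod_inverse are my own, and the bookkeeping may be laid out differently from a hand calculation, but the result is the same.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that g = gcd(a, b) = a*x + b*y."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r     # the ordinary Euclidean step
        old_x, x = x, old_x - q * x     # carry the coefficients of a along
        old_y, y = y, old_y - q * y     # carry the coefficients of b along
    return old_r, old_x, old_y

def mod_inverse(a, m):
    """Return the inverse of a modulo m as an integer between 0 and m - 1."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("no inverse exists because gcd(a, m) is not 1")
    return x % m

print(mod_inverse(123, 2347))   # 1641
```

As a check, 123 × 1641 = 201843 = 86 × 2347 + 1.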
I haven't told you why this works. Since gcd (a, m) = 1, we know that 1 can
be represented as a linear combination 1 = m × n + a × b for some integers n and b, so
m × n + a × b ≡ 1 (mod m)
You should understand this equation. If 1 = 3 − 2, then 1 ≡ 3 − 2 in whatever
modulo, and that makes sense. Now, m × n ≡ 0 (mod m), which is
obvious, since m divides any multiple of itself, for whatever n. So in the end, we
have a × b ≡ 1 (mod m), which was what we used just now. Note that not all
integers have inverses in a particular modulo. An inverse of a modulo m exists
only when gcd (a, m) = 1.
By the way, for small moduli the inverse can also be found by trial and error.
For example, take 2 modulo 3. Try multiplying 2 by each of the numbers from 1 to 3,
and you find that 2 × 2 = 4 ≡ 1 (mod 3). Thus, 2 is the
inverse of 2 modulo 3.
solution. Let's try to solve the linear congruence 2x ≡ 6 (mod 8). You can solve
it as follows:
Using the simplification law, you see that 2 divides 2, 6 and 8 and therefore
x ≡ 3 (mod 4)
which is in another modulo system. If you want the solution to be in the same
modulo system, then you need to do some modification. By looking at the
equation, you know that
x ≡ 3 (mod 8)
is one solution. The other solution is found by adding the new modulus, which is 4,
to 3. You get another solution,
x ≡ 7 (mod 8)
So your solution for the linear congruence 2x ≡ 6 (mod 8) is x ≡ 3 (mod 8), x ≡
7 (mod 8).
The same method applies in general: when there are 10 solutions, you keep on adding the
new modulus to the previous answer until you get 10
solutions.
Let's try another one, 2x ≡ 6 (mod 9). Using rule number 9 above, you can
quickly see that gcd (2, 9) = 1, and therefore x ≡ 3 (mod 9). Try not to confuse
this one with the one above.
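Both examples can be handled by one small routine: divide through by g = gcd (a, m), solve in the smaller modulus, then keep adding the new modulus until there are g solutions. The sketch below assumes Python 3.8 or later, because it uses pow(x, -1, m) to get a modular inverse; the function name solve_linear_congruence is made up for illustration.

```python
from math import gcd

def solve_linear_congruence(a, b, m):
    """Return every x in 0..m-1 with a*x congruent to b (mod m), in increasing order."""
    g = gcd(a, m)
    if b % g != 0:
        return []                         # no solution when g does not divide b
    a1, b1, m1 = a // g, b // g, m // g   # the simplified congruence a1*x = b1 (mod m1)
    x0 = (pow(a1, -1, m1) * b1) % m1      # multiply by the inverse of a1 modulo m1
    return [x0 + k * m1 for k in range(g)]

print(solve_linear_congruence(2, 6, 8))   # [3, 7]
print(solve_linear_congruence(2, 6, 9))   # [3]
```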
For this system of congruences to have a solution, there must be an inverse for
the matrix. This means that ad − bc must not be zero, and its inverse modulo m
must exist. Let's multiply the left and right hand side by the adjoint matrix:
and for such linear congruences to have a solution, again we must make sure that
gcd (ad − bc, m) = 1 holds. With that you can solve the above 2
linear congruences for x and y. This kind of question came out in STPM 2009, my
year. Try solving it with the method I just showed you.
MODULAR EXPONENTIATION
I don't think this is in the syllabus, but it is good for you to know. Modular
exponentiations are of the form a^n mod m. You are normally asked to compute
it with a very big value of n. For example,
Find 3^101 mod 100.
First, do you still remember what binary numbers are? Express the exponent n in
binary form by repeatedly dividing the number by 2, writing the remainder by the
side. Recall your Form 4 Maths:
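The binary idea can be sketched in code as square-and-multiply: read the bits of the exponent from the right, square the base at every step, and multiply it into the answer whenever the bit is 1. The function name mod_pow is my own; Python's built-in pow(a, n, m) computes the same thing.

```python
def mod_pow(base, exponent, modulus):
    """Compute base**exponent mod modulus using the binary digits of the exponent."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                       # current binary digit is 1
            result = (result * base) % modulus
        base = (base * base) % modulus         # square for the next binary digit
        exponent >>= 1                         # move on to the next digit
    return result

print(bin(101))               # 0b1100101, i.e. 101 = 64 + 32 + 4 + 1
print(mod_pow(3, 101, 100))   # 3
```

Only about seven squarings are needed for n = 101, instead of 100 separate multiplications.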
This theorem is here to help you identify whether a congruence can be solved easily.
Similarly, I won't prove it, so just keep this theorem in mind and use it if needed.
10.1 Graphs
In mathematics and computer science, graph theory is the study of graphs,
mathematical structures used to model pairwise relations between objects from
a certain collection. A graph G = (V, E) consists of V, a nonempty set of vertices
(or nodes), and E, a set of edges. In other words, a graph is a discrete structure
consisting of vertices, and edges that connect these vertices. Each edge has
either one or two vertices associated with it (its endpoints). An edge is said
to connect its endpoints. A graph looks something like this:
As you can see, a and b are vertices, while e and f are edges. The edge g is
called a loop. The vertex set is V = {a, b}.
In this section, there will be many terminologies which you should remember,
and should be able to write down their definition in your exam. Here we will be
learning the different kinds of graphs and their names:
An infinite graph is a graph with an infinite vertex set (or rather, an infinite
number of vertices). A finite graph is just the opposite: its vertex set is finite.
Throughout this section, we will only be learning about graphs with a finite number
of edges and vertices.
A simple graph is a graph in which each edge connects two different
vertices and where no two edges connect the same pair of vertices.
A multigraph is a graph that may have multiple edges connecting
the same pair of vertices, while a pseudograph is a graph that may
include loops as well as multiple edges connecting the same pair of vertices. The 3
pictures below illustrate a simple graph, a multigraph and a pseudograph:
Notice that in the multigraph, there are 2 edges connecting both a to b and a to c,
and 3 edges connecting e to f. As for the pseudograph, there are loops at the
vertices e and f.
The complement of a graph G, written GM, has the same vertices as
G, but wherever there is an edge between vertices a and b in G, there is no
such edge in the complement, and wherever there isn't an edge between a and b
in G, an edge is added. This only applies to simple graphs. For example, below is a graph
and its complement:
All the above graphs are undirected, which means that one can traverse an edge
in both directions. A directed graph (or digraph) consists of a nonempty set of
vertices and a set of directed edges (or arcs). Each directed edge is associated
with an ordered pair of vertices. The directed edge associated with the ordered
pair (u, v) is said to start at u and end at v. In other words, we say
that u is adjacent to v, while v is adjacent from u. Notice the different uses
of { } and ( ) brackets for undirected and directed graphs. Below is a directed
graph:
For the ordered pair of vertices (u, v), we say that u and v are adjacent, and we
say that the edge is incident with (or connects) u and v. u is known as the initial
vertex and v as the terminal vertex. Using a similar naming convention,
we can describe a simple directed graph as a directed graph in which each
edge connects two different vertices and where no two edges connect the same
ordered pair of vertices. Similarly, a directed multigraph can be defined.
An underlying undirected graph is the undirected graph that results from
ignoring the directions of the edges. It is just the same graph without the arrows.
A mixed graph is a graph with both directed and undirected edges.
The converse of a directed graph is the graph in which all its arrows are reversed.
For every graph, we could come up with subgraphs, which are graphs that are
subsets of the initial graph. For example, the graph
An exercise for you here is that you can try to figure out whether you can
determine the total amount of subgraphs, given the values of V and E.
A bipartite graph is a simple graph whose vertex set V can be partitioned
into 2 disjoint sets V1 and V2 such that every edge in the graph connects a
vertex in V1 to a vertex in V2. Consider the bipartite graph below:
Notice that I coloured the vertices with 2 colours, red and blue. The blue vertices
do not connect to any other blue vertex, and likewise the red vertices don't
connect to any other red vertex. The graph is partitioned such that there are two
sets or parties of vertices which can be grouped together. To identify a bipartite
graph is simple: as long as you can colour the vertices with only 2 colours so
that adjacent vertices differ, it is a bipartite graph. For example, you colour the first vertex blue. The
vertices adjacent to the first vertex must be coloured red, and if you can fit all
the vertices with 2 colours such that no two adjacent vertices have the same
colour, then it is a bipartite graph. Notice also that a graph is bipartite if and
only if it has no odd cycles. We will learn about cycles in the next section.
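The 2-colouring test can be sketched as a breadth-first search: colour a starting vertex, give every neighbour the opposite colour, and stop as soon as two adjacent vertices end up with the same colour. The graph format (a dictionary of adjacency lists) and the name two_colouring are assumptions made for this example.

```python
from collections import deque

def two_colouring(adjacency):
    """Return a dict vertex -> 0 or 1 if the graph is bipartite, otherwise None."""
    colour = {}
    for start in adjacency:                 # loop so disconnected graphs are covered too
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]   # neighbours get the other colour
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None                 # two adjacent vertices clash
    return colour

square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(two_colouring(square) is not None)    # True: a cycle of length 4
print(two_colouring(triangle) is not None)  # False: a cycle of length 3
```

The square (an even cycle) can be 2-coloured, while the triangle (an odd cycle) cannot, in line with the remark on odd cycles.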
For the K4 graph, it has 4 vertices, and every vertex is connected to the other 3
vertices. By a simple calculation, a Kn graph has n vertices and n(n−1)/2 edges.
2. Cycle Graph Cn
This graph, where n ≥ 3, consists of n vertices and n edges.
Strictly speaking, C2 is not a cycle graph, as n < 3. Notice that every vertex is
connected to exactly two other vertices. It looks like a regular polygon with n sides.
3. Wheel Graph Wn
This graph looks like a wheel with n sides. We obtain the wheel when we add an
additional vertex to the cycle Cn, for n ≥ 3, and connect this new vertex to each
of the n vertices in Cn by new edges.
This graph, the n-cube Qn, has 2^n vertices and n × 2^(n−1) edges. Try proving this if you are free.
5. Complete Bipartite Graph, Km,n
This graph is just a bipartite graph in which there is exactly 1 edge between each
pair of vertices taken one from V1 and one from V2. Note that the number of edges, |
Now that we know everything about the structure of graphs, we shall now get
into a little calculation. The degree of a vertex is the number of edges
incident with it, except that a loop at a vertex contributes 2 to the degree
of that vertex. The degree of a vertex v is denoted by deg (v). When deg (v) = 0, we
say that the vertex is isolated, and when deg (v) = 1, we say that the vertex
is pendant.
We now want to find the relationship between the sum of the degrees of the vertices and the
number of edges. The Handshaking Theorem states that the sum of the degrees of the
vertices is double the number of edges. In equation form, we have
Σ deg (v) = 2|E|, where the sum runs over every vertex v in V.
This theorem has many implications. One of them is that a graph
cannot exist if the sum of the degrees of its vertices is odd.
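The Handshaking Theorem is easy to verify on a small example. In the sketch below the graph is stored as a list of edges, each written as a pair of endpoints, and a loop is written as a pair with the same vertex twice. The function name degrees and the sample graph (2 parallel edges between a and b plus a loop at b) are made up for illustration.

```python
def degrees(vertices, edges):
    """Return a dict vertex -> degree, where a loop adds 2 to its vertex."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1      # for a loop u == v, so that vertex gains 2 in total
    return deg

vertices = ["a", "b"]
edges = [("a", "b"), ("a", "b"), ("b", "b")]
deg = degrees(vertices, edges)
print(deg)                                   # {'a': 2, 'b': 4}
print(sum(deg.values()), 2 * len(edges))     # 6 6
```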
In the case of directed graphs, we denote by deg+ (v) the out-degree,
meaning the number of arcs pointing away from the vertex, while the in-degree
is denoted by deg- (v), which is the number of arcs pointing towards the
vertex. Modifying the handshaking theorem, we have
11.1 Transformation
A transformation is a correspondence between 2 sets of points in a plane. A
transformation T is described as a linear transformation of n-dimensional
space when it has the properties
T(λx) = λT(x), and
T(x + y) = T(x) + T(y)
where λ is an arbitrary constant. Recalling your Form 4 Mathematics, you
learned how to find the image of points on the Cartesian plane under a certain
transformation. Here you will further learn how to use matrices and some simple
linear algebra to represent transformations in 2 dimensions only.
will determine how the point (x, y) transforms into its image (x', y'). The
matrix M is easy to compose. Basically,
where (1, 0) and (0, 1) are the unit vectors in the x and y directions respectively (or
rather, you can treat these 2 vectors as points on the xy-plane). For
example, if I want to transform the point (1, 0) to (2, 0), and the point (0,
1) to (0, 2), then my matrix of transformation will be
So if you want to find the image of a unit box, (0, 0), (1, 0), (0, 1) and (1,
1), just pre-multiply the points by this matrix, and you will get the
image under the transformation. An example will be given in the next section.
Translation is just the moving of an object from one point
to another, without altering its size, shape and orientation. The matrix below
represents a translation
where a and b are the amounts by which the object is shifted. For example, (1, 2) will
translate the point (x, y) one step to the right and 2 steps upward, while
negative values shift it the other way.
2. Rotation
Note that this rotation is restricted to rotation about the origin only. We will discuss
later what to do if the centre of rotation is not the origin. The area and the shape of the
object are unchanged, and once rotated through 360°, the object gets back to its
initial position.
3. Reflection
For a reflection, you need a line which acts like a mirror, such that the whole
image reflects to the other side of the line, equidistant from and perpendicular to
that line. This line, in this case, must pass through the origin. Again, the shape of
the object doesn't change, and neither does the area. A few common reflection matrices
are as follows:
along x-axis
along y-axis
It is actually a little tedious to find the matrix of reflection when given only a line
in the form y = mx. First, you find the normal line, y = −(1/m)x + c. Substitute
the points (1, 0) and (0, 1) to find two parallel normal lines, which pass
through these 2 points. Next, you find the intersection points of these 2 lines with
the line of reflection. Taking each intersection point as the midpoint, you
can figure out where the reflected points of (1,
0) and (0, 1) are, thus completing your matrix.
But there is a faster way. Let the line of reflection y = mx be written in the form
y = (tan θ)x. We see that the gradient m = tan θ. With this information, we
find θ, and the reflection matrix is just represented by
You can try figuring out why this is true. This has something to do with the angle
subtended from the point to the origin, the angle of the line, the use of
cosine and sine, and so on. To find cos 2θ and sin 2θ, you could either calculate θ,
or you might want to make use of some trigonometric identities.
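As a sketch of the faster way, the code below builds the standard matrix for reflection in the line y = (tan θ)x, whose rows are (cos 2θ, sin 2θ) and (sin 2θ, −cos 2θ). It avoids computing θ by using the identities cos 2θ = (1 − m²)/(1 + m²) and sin 2θ = 2m/(1 + m²). The names reflection_matrix and apply are my own.

```python
def reflection_matrix(m):
    """Matrix of reflection in the line y = m*x, with rows (cos2t, sin2t) and (sin2t, -cos2t)."""
    denom = 1 + m * m
    cos2t = (1 - m * m) / denom    # cos 2θ written in terms of m = tan θ
    sin2t = 2 * m / denom          # sin 2θ written in terms of m = tan θ
    return [[cos2t, sin2t], [sin2t, -cos2t]]

def apply(matrix, point):
    """Pre-multiply the point (x, y) by a 2x2 matrix."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

print(apply(reflection_matrix(1), (1, 0)))   # (0.0, 1.0): reflecting in y = x swaps the coordinates
print(apply(reflection_matrix(0), (2, 3)))   # (2.0, -3.0): reflecting in the x-axis
```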
4. Scaling
Scaling does not preserve the size, but it preserves the ratio of the object. This
scaling starts from the origin. Scaling can be represented by the matrix
5. Stretch
A stretch looks similar to an enlargement, but this time, the ratio of the sides and
the shape are not preserved. It can be a stretch along the x-axis, along the y-axis, or a
stretch along both axes, with different proportions. A stretch is represented as
below:
along x-axis
along y-axis
You probably could have guessed that values of |a| < 1 turn the stretch into
a compression, while a negative value of a stretches the object the other way.
For a stretch, it really doesn't matter whether it stretches from the origin or some
other point, as they are the same anyway.
6. Shear
parallel to x-axis
angles
parallel to y-axis
The angle is calculated from the opposite axis. For example, the box above
undergoes a shear parallel to the x-axis, and the angle is calculated clockwise
from the positive y-axis. If the angle was 45°, we say that it is a shear
of 45° parallel to the x-axis. Conversely, it can be a shear of x° parallel to the
y-axis, which looks like the one below:
In the case of a reflection, as I said earlier, the reflection matrix above applies
only for lines passing through the origin, y = mx. Now that we want to find the
reflection of an object across the line y = mx + c, we take (0, c) as the point of
reference to be subtracted and added in this case. The transformation will
become
You can try it out and see whether this is true. You will find that translating by any
point (a, b) will be correct, as long as the line is translated such that it passes
through the origin.
SIMILARITY TRANSFORMATION
Two square matrices A and B that are related by A = P^(−1)BP, where P is a square
non-singular matrix, are said to be similar. A transformation of the form P^(−1)BP is
called a similarity transformation, or conjugation by P. Try recalling what you
learnt about similar triangles in Maths T. Similarity transformation simply means
that the 2 transformations A and B are similar to each other, just that they
probably changed their basis or coordinates, or are multiplied by a different factor. I
don't have much information on this, so I won't elaborate much here (please
share with me if you have good information on this, I will add it in here some
day). However, if you are asked to find whether 2 matrices A and B are similar,
just make use of the formula above: if the resulting equations are consistent
with a non-singular P, then the matrices are similar; if not, they are not.
11.2 Matrix Representations
Knowing all the different types of transformation, we shall now get to do the
algebra of transformations. Let's begin with a simple example:
Find and describe the image of the triangle ABC where A(1, 0), B(2, 0)
and C(2, 3) under the transformation matrix
.
Plotting the new coordinates OA, OB and OC, we find that the transformation
is a reflection in the x-axis (or reflection in Ox).
The first one maps all shapes to the line y = x. The second matrix maps all
points to the x-axis, while the last one maps everything to the origin. You will
know that a matrix M is a singular matrix when | M | = 0. There is a way to tell
whether a matrix maps to a line or to a point. Consider a singular matrix
If the column vector (a, b) = (c, d), then the matrix maps all shapes to a point.
If the column vector (a, b) ≠ (c, d) but (a, b) // (c, d), then the matrix maps all
shapes to a line.
Invariant points are points which map to themselves after the transformation.
Note that the variable x maps to another variable X, but not to itself. I'll show
you an example:
Find the invariant lines of the transformation
TRANSFORMING LINES
Knowing how to transform points, we shall now learn how to transform lines. As
in the part on invariant lines, we substitute the parametric equations of x and y,
then we solve the equation in terms of X and Y, as in the equation below
Example,
Find the image of the line y = 2 − 2x under the transformation
2x + 2 − 2x = X = 2
4x + 4 − 4x = Y = 4
The line transforms into the point (X, Y) = (2, 4).
Notice that in this case, the line is transformed into a point. In other cases, if it
transforms into another line, remember to find an equation that relates X with Y.
You should be aware that this is the very same method you will use if you were to
find the transformation of circles, parabolas, hyperbolas, ellipses or other curves.
Make use of their parametric equations and substitute them into the equation.
Recall the parametric forms of these curves.
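The worked example can be checked numerically. The matrix itself did not survive in the text, so the sketch below assumes it has rows (2, 1) and (4, 2), which is what the two lines of working (X = 2x + y and Y = 4x + 2y) imply. The helper name transform is made up.

```python
def transform(matrix, point):
    """Pre-multiply the point (x, y) by a 2x2 matrix and return the image (X, Y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

M = [[2, 1], [4, 2]]     # assumed from the working: X = 2x + y, Y = 4x + 2y

# take several points on the line y = 2 - 2x and collect their images
images = {transform(M, (x, 2 - 2 * x)) for x in range(-5, 6)}
print(images)            # {(2, 4)}
```

Every point of the line lands on the single point (2, 4). The determinant 2 × 2 − 1 × 4 is 0, so the assumed matrix is singular.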
INVERSE TRANSFORMATION
I think I don't need to elaborate too much on this. An inverse
transformation helps us to find the object if the image is given. You find the
inverse of the matrix of transformation, and the equation will become
From here you should recall that a singular transformation has no inverse. In
other words, you can't find a matrix that transforms a single point into 4 other
points, or transforms a line into a pentagon.
I think you should be familiar with this rule from Physics. When your fingers point
in the direction of the x-axis and you curl them towards the y-axis, your
thumb will be pointing along the z-axis. Try to get used to this setting: with the z-axis
pointing upwards, x on the left, y on the right.
A point P in space can be represented by an ordered triple (a, b,
c) where a, b and c are the projections of the point P onto the x-, y- and z-axes
respectively. The three dimensional space is also called the xyz-space.
You probably should know how we represent a vector in 3D. Using the same
conventions of unit vectors i and j, we just add one more, k, to represent the unit
vector in the z direction (e.g., 2i + 3j − 5k). Everything about a vector in 2D
works about the same in 3D. The length of a vector P(a, b, c) follows the
Pythagorean relation
And similarly, the distance between 2 position vectors A and B can be found by
the equation
a + b = b + a
a + (b + c) = (a + b) + c
a + 0 = a
a + (−a) = 0
k(a + b) = ka + kb
(k + h)a = ka + ha
(kh)a = k(ha)
1a = a
SCALAR PRODUCT
Scalar product, also known as the dot product, is a multiplication of 2 vectors
(a, b, c) and (d, e, f) such that
(a, b, c) · (d, e, f) = ad + be + cf
The scalar product yields an answer in the form of a scalar, which is a value
instead of a vector. In trigonometry, it can be represented by the equation
a · b = |a||b| cos θ
where θ is the angle between a and b.
I believe all these are not new to you, as you have studied it in Maths T.
However, in this section, we will be going into quite some detail on the algebra of vectors,
unlike in Maths T where you focused more on the applications, namely
the resultant force / velocity and relative velocity. Let us look at the
properties of scalar products. Given that a, b and c are vectors and d is a constant,
we have
(i) a · b = b · a (commutativity)
(ii) a · (b + c) = a · b + a · c (distributive law)
(iii) (da) · b = d(a · b) = a · (db)
(iv) 0 · a = 0
(v) a · a = |a|^2
We say that two vectors are orthogonal to each other when they are
perpendicular to each other. Two vectors a and b are orthogonal if and only if a ·
b = 0. In 3D, we say that a vector a is orthogonal to vectors b and c if a is
perpendicular to both b and c.
The component of b onto a (or scalar projection) is the resolved part of b in
the direction of a. This means that when we have 2 vectors a and b pointing in 2
different directions, with the tails of their arrows joined together, the
respectively.
The angle between the vector and the z-axis can be found using the equation
and therefore you can deduce the angle between the vector and the x-axis and
y-axis respectively.
Recalling that the dot product of 2 vectors is a · b = |a||b| cos θ, we can easily
find the angle between 2 vectors,
cos θ = (a · b) / (|a||b|)
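Here is a small sketch of the dot product and the angle formula cos θ = (a · b) / (|a||b|), with vectors stored as tuples. The function names dot, norm and angle_between are illustrative choices, and the angle is returned in radians.

```python
import math

def dot(a, b):
    """Dot product: multiply matching components and add them up."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a vector, using a . a = |a|^2."""
    return math.sqrt(dot(a, a))

def angle_between(a, b):
    """Angle in radians between two non-zero vectors."""
    cos_theta = dot(a, b) / (norm(a) * norm(b))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding just outside [-1, 1]
    return math.acos(cos_theta)

print(dot((1, 2, 3), (4, -5, 6)))                          # 12
print(math.degrees(angle_between((1, 0, 0), (0, 1, 0))))   # 90.0
```

A dot product of zero, as for i and j here, means the 2 vectors are orthogonal.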
VECTOR PRODUCT
Also known as the cross product, the vector product is something new for you, as it
cannot exist in a 2D plane. We define the vector product of 2 vectors (a, b,
c) and (d, e, f) to be
(a, b, c) × (d, e, f) = (bf − ce, cd − af, ae − bd)
The cross product yields a vector (it has a magnitude and a direction), which is
orthogonal to both the original vectors. In trigonometry, the magnitude of the
cross product is |a × b| = |a||b| sin θ.
You can use the right hand rule to determine the direction of the cross product.
Point your fingers in the direction of a, curl them towards the direction of b, then
your thumb points in the direction of a × b. This information is very important
when we come to the section on planes.
Unlike the dot product, any vector crossed with itself yields zero.
i × i = 0, j × j = 0, k × k = 0
Or in other words, the cross product of 2 parallel vectors is zero. You can use
your right hand rule to verify this. For the unit vectors, you could also get the
following results:
We shall now see the properties of the cross product. If a, b and c are vectors
and d is a scalar, then
(i) a b = b a
(ii) (da) b = d(a b) = a (db)
(iii) a (b + c) = a b + a c
(iv) (a + b) c = a c + b c
(v) a (b c) = (a b) c
(vi) a (b c) = (a c)b (a b)c
(vii) (a b) a = 0
Probably (vi) is hard to remember. (vii) is just the definition of the dot product,
where the dot product of 2 orthogonal vectors equals zero. Also take note that
the cross product is not commutative. Swapping a and b will result in an
extra minus sign.
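To make the definition and the properties above concrete, here is a minimal sketch in Python. The helper names (dot, cross, scale, sub) and the three sample vectors are assumptions chosen only for illustration; the component formula is the standard one for (a, b, c) × (d, e, f).

```python
def dot(u, v):
    """Scalar (dot) product of two 3D vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Vector (cross) product of (a, b, c) and (d, e, f)."""
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)

def scale(k, u):
    """Multiply the vector u by the scalar k."""
    return tuple(k * ui for ui in u)

def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))

# Hypothetical sample vectors, not taken from the notes.
a, b, c = (1, 2, 3), (4, -1, 2), (0, 5, -2)

# Property (i): reversing the order flips the sign.
anti = cross(a, b) == scale(-1, cross(b, a))
# Property (vi): a x (b x c) = (a . c)b - (a . b)c.
triple = cross(a, cross(b, c)) == sub(scale(dot(a, c), b), scale(dot(a, b), c))
# Property (vii): a x b is orthogonal to a.
orth = dot(cross(a, b), a) == 0

print(anti, triple, orth)
print(cross((1, 0, 0), (0, 1, 0)))  # i x j
```

Integer components are used so that the equality checks are exact.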
The cross product has many applications, especially in physics. You use the cross
product to find torque, magnetic force and so on. In geometry, we see that
the area of a triangle made up by 3 vectors a, b and c is
Where a = (a1, a2, a3), b = (b1, b2, b3), and c = (c1, c2, c3) respectively. We use
the scalar triple product to find the volumes of various solids. Since |b × c| is the
base area of a solid, dotting b × c with another vector a multiplies that area
by the height of the solid. So the formulas for different solids are as below:
abc
2. volume of tetrahedron:
4. volume of pyramid
EQUATION OF A LINE
Let r be a line in xyz space; we let a and b be 2 vectors and t be an arbitrary
constant. The vector equation of a line can be represented by the equation
r = a + tb
The vector a (x0, y0, z0) is a position vector. It is a point in space in which the
line passes through. Then the vector b is a direction vector. This vector
determines the direction of the line. The constant t is there because any
scalar multiple of the direction vector points along the same line.
Summarizing it up, you actually get this:
You need some visualization here. Look at the diagram below. The green line L
first needs a point a in space. Then you need a direction vector b to tell you
where the line extends to. So if you analyse carefully, an equation of a line is
not unique. You can put in infinitely many different position vectors, or use
infinitely many direction vectors of the same ratio to construct different
line equations, which actually refer to the same line. This is unlike lines in 2D,
where a line only has one representation.
You might have also noticed that the vector equation of a line is actually
a parametric equation of a line. If you break it down,
x = x0 + pt, y = y0 + qt, z = z0 + rt
This is where (x0, y0, z0) is the position vector a, and (p, q, r) is the direction
vector b. Probably now you figure out why the line is not unique, since
parametric equations are not unique. By the way, we can also write the vector
equation as r = ai + bj + ck + t(pi + qj + rk). I don't like this method as we
waste too much time writing the ijks and +/- signs.
Now if we rearrange the 3 parametric equations so that each one gives t in terms of
something else, we get the cartesian equation, as below:
(x − x0)/p = (y − y0)/q = (z − z0)/r = t
We normally write this whole chunk of equalities without the =t, I only show it
here for clarity. A line in 3D space has 2 equal signs. So what if p, or q, or both
are 0? Examples of such lines are
You might want to substitute it back into the vector equation to check this out.
You probably could have guessed why we prefer to use the vector equation
instead of the cartesian equation. With all this information, you should be able
to construct a line equation, given only 2 points it passes through.
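As a small illustration of building a line from 2 points, the sketch below takes one point as the position vector and the difference of the two points as the direction vector. The function names and the two sample points are hypothetical, not taken from the notes.

```python
def line_through(p, q):
    """Return (a, b) so that r = a + t*b is the line through points p and q."""
    a = p                                        # any point on the line can serve as position vector
    b = tuple(qi - pi for pi, qi in zip(p, q))   # direction vector pointing from p to q
    return a, b

def point_on_line(a, b, t):
    """Evaluate r = a + t*b for a given parameter t."""
    return tuple(ai + t * bi for ai, bi in zip(a, b))

# Hypothetical points, chosen only for illustration.
a, b = line_through((1, 2, 3), (4, 0, 5))
print(a, b)
print(point_on_line(a, b, 0), point_on_line(a, b, 1))
```

With this choice, t = 0 gives the first point and t = 1 gives the second. Starting from the second point, or scaling b, would give a different equation for the same line.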
are parallel, because they have the same direction vector. You can further check
whether the lines coincide (or, whether they are just both the same line). To do
this, we take the point (1, 2, 3) and substitute into (x, y, z) in the second
equation. Doing some algebra, we find that the values of s from the 3 parametric
equations are not consistent. Therefore, the lines do not coincide, and are two
distinct parallel lines. This method also tells us whether a particular point lies on
the line. So here we see that the point (1, 2, 3) does not lie on the second line.
To show that 2 lines intersect, we let line 1 equal line 2. We get 3 equations.
Consider the two lines below:
We have
-3 + 4t = s
-5 + 3t = -9 + 2s
-4 + t = 13 − 3s
If we can find values of s and t that satisfy all 3 equations, the
lines intersect. If the values of s and t contradict one another, then the lines
are skew. We can further find the point of intersection. By using the values
of s and t, substituting them back into the initial equations, we get the
intersection point. In this case, t = 2 and s = 5, and the point of intersection is
(5, 1, −2).
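The procedure can be sketched in Python as follows. The two line equations themselves are missing from this copy, so the position and direction vectors below are inferred from the three component equations above (an assumption): r1 = (−3, −5, −4) + t(4, 3, 1) and r2 = (0, −9, 13) + s(1, 2, −3). The function name is hypothetical, and the sketch only handles the case where the x and y equations determine t and s.

```python
from fractions import Fraction

def intersect(a, b, c, d):
    """Solve a + t*b = c + s*d using the x and y equations, then check z.

    Returns the intersection point, or None if the lines do not meet
    (or if the x-y system is degenerate, which this sketch does not handle).
    Expects integer components so that Fraction arithmetic stays exact.
    """
    # System: t*b[0] - s*d[0] = c[0] - a[0] and t*b[1] - s*d[1] = c[1] - a[1]
    det = b[0] * (-d[1]) - (-d[0]) * b[1]
    if det == 0:
        return None
    rx, ry = c[0] - a[0], c[1] - a[1]
    t = Fraction(rx * (-d[1]) - (-d[0]) * ry, det)   # Cramer's rule for t
    s = Fraction(b[0] * ry - b[1] * rx, det)         # Cramer's rule for s
    if a[2] + t * b[2] != c[2] + s * d[2]:
        return None                                  # third equation contradicts: no intersection
    return tuple(ai + t * bi for ai, bi in zip(a, b))

point = intersect((-3, -5, -4), (4, 3, 1), (0, -9, 13), (1, 2, -3))
print(point)
```

If the z equation had failed while the direction vectors were not parallel, the lines would be skew.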
To find the distance from the point to the line, we want to make use of the sine of
the angle between the line r1 and the line (r2 − a). Look at the diagram below.
Recalling that |a × b| = |a||b| sin θ, the distance between the line and the
point r2 is
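The formula itself is missing from this copy. A sketch under the standard result, distance = |(p − a) × b| / |b| for a point p and a line r = a + tb, is given below. The function names and the sample point and line are assumptions for illustration only.

```python
import math

def cross(u, v):
    """Vector product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    """Magnitude of a vector."""
    return math.sqrt(sum(x * x for x in u))

def distance_point_line(p, a, b):
    """Perpendicular distance from point p to the line r = a + t*b."""
    w = tuple(pi - ai for pi, ai in zip(p, a))   # vector from the line's point to p
    return norm(cross(w, b)) / norm(b)           # |w||b| sin(theta) / |b| = |w| sin(theta)

# Hypothetical example: the point (0, 3, 0) and the x-axis.
dist = distance_point_line((0, 3, 0), (0, 0, 0), (1, 0, 0))
print(dist)
```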
Given the two lines, we can make use of what we learnt from the part above, and
find that the distance between these 2 lines is just
Given 2 lines, the shortest distance between 2 skew lines can be found
through the equation
where k is a constant. Let me explain this a little. The vector joining the closest
points of the two lines is r2 − r1. It is parallel to the normal vector (b × d), and
that is why we multiply it with k. So after setting up the equation, we get the equation c +
sd − a − tb = k(b × d), which is actually 3 parametric equations in terms of 3
variables t, s and k. From here, we solve for s, t and k, and we multiply |k| by the
magnitude of b × d,
and thus you get the shortest distance between 2 skew lines.
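A shortcut for finding k, sketched below: taking the dot product of both sides of c + sd − a − tb = k(b × d) with n = b × d removes s and t (because b · n = d · n = 0), leaving k = (c − a) · n / |n|². This is a swapped-in technique rather than the full solve for s, t and k described above, and the function names and sample lines are assumptions.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def skew_distance(a, b, c, d):
    """Shortest distance between r1 = a + t*b and r2 = c + s*d (non-parallel lines)."""
    n = cross(b, d)                     # common normal to both lines
    n_sq = dot(n, n)
    if n_sq == 0:
        raise ValueError("direction vectors are parallel; use the parallel-line method")
    w = tuple(ci - ai for ai, ci in zip(a, c))
    k = dot(w, n) / n_sq                # from (c - a) . n = k |n|^2
    return abs(k) * math.sqrt(n_sq)     # |k| times the magnitude of b x d

# Hypothetical example: the x-axis, and a line parallel to the y-axis at height z = 2.
gap = skew_distance((0, 0, 0), (1, 0, 0), (0, 0, 2), (0, 1, 0))
print(gap)
```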
You use this formula to find the angle between two lines, by
substituting a and b as the direction vectors of both lines. Shouldn't be a
problem for you, I think.
12.3 Planes
A plane is simply just a flat surface in space. We first start by introducing
the vector equation of a plane,
We need to have at least 2 direction vectors to show the direction of the plane,
and then a point to know where exactly the plane lies. We multiply the 2
direction vectors with different constants, to show that any direction vector
proportional to that ratio is also a direction vector. Similarly, this form of the plane
equation is not unique. Again, this form can be written in the ijk form, which
looks ugly and long.
There is another vector equation of the plane. Though not named properly, I call
it the normal form. We first find the normal vector of a plane, i.e., a vector
which is normal to both the direction vectors. You obtain the normal vector by
getting the cross product of b and c. Suppose that the normal vector is (a, b, c),
the normal form of the equation will be
This cartesian form is unique, unlike the other forms. This is the most common
form of the equation of planes used. You can see that this equation is linear, and
that equations such as
y = mx + c, or x = a are all equations of planes in 3 dimensional space.
So to sum up, to construct a plane equation, you need one of these pieces of information:
1. 3 points lying on the plane.
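The first option, 3 points lying on the plane, can be sketched as follows: two direction vectors come from differences of the points, their cross product gives a normal vector n, and the plane is n · r = n · P. The function names and the sample points are hypothetical, and the form ax + by + cz = d is assumed for the cartesian equation.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_from_points(p, q, r):
    """Return (n, d) for the plane n . (x, y, z) = d through the points p, q and r."""
    b = tuple(qi - pi for pi, qi in zip(p, q))   # first direction vector
    c = tuple(ri - pi for pi, ri in zip(p, r))   # second direction vector
    n = cross(b, c)                              # normal to both direction vectors
    if n == (0, 0, 0):
        raise ValueError("the three points are collinear")
    return n, dot(n, p)

# Hypothetical points on the three coordinate axes.
n, d = plane_from_points((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(n, d)
```

For these sample points the result corresponds to the plane x + y + z = 1.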
We first find whether the direction vector of the line is parallel to the plane. In
other words, we want to know whether the direction vector of the line
is perpendicular to the normal vector of the plane. By taking b · n, if the
answer is zero, then the line is parallel to the plane. We might want to know
whether the parallel line actually lies in the plane. We can do this by
substituting the position vector of the line into r2, and if LHS = RHS, then indeed
the line lies in the plane; if the equality doesn't hold, it does not.
So if b · n ≠ 0, this means that the line definitely intersects the plane.
The point of intersection can be found by letting r1 = r2, that is,
You should be able to solve for t, which satisfies all the 3 parametric equations.
Then finally, to find the point of intersection, we substitute t back into the line
equation to find (x, y, z).
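Here is a minimal sketch of these steps, assuming the line is r = a + tb and the plane is written in the form n · r = d. Substituting the line into the plane gives t = (d − n · a) / (b · n). The function names and the sample line and plane are assumptions for illustration.

```python
from fractions import Fraction

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def line_plane_intersection(a, b, n, d):
    """Intersect the line r = a + t*b with the plane n . r = d.

    Returns the point of intersection, or None when b . n = 0
    (the line is parallel to the plane, or lies in it).
    Expects integer components so that Fraction arithmetic stays exact.
    """
    bn = dot(b, n)
    if bn == 0:
        return None
    t = Fraction(d - dot(n, a), bn)                    # solve n . (a + t*b) = d for t
    return tuple(ai + t * bi for ai, bi in zip(a, b))  # substitute t back into the line

# Hypothetical example: a vertical line through (1, 1, 0) and the plane z = 2.
point = line_plane_intersection((1, 1, 0), (0, 0, 1), (0, 0, 1), 2)
parallel = line_plane_intersection((1, 1, 0), (1, 0, 0), (0, 0, 1), 2)
print(point, parallel)
```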
this will be the direction vector of the intersecting line. To find a position
vector of the line, we make use of the cartesian equation of both planes,
We need to solve this system of linear equations to find x, y and z. Recall from the
chapter on Matrices that this system of equations has infinitely many solutions. As
usual, let one of them be t, solve for x, y and z in terms of t, and then just
substitute a value for t to get a random position vector. The line equation is thus
found.
I will explain why this makes sense. Firstly, you should recall that the values
d/|n| and e/|n| are the perpendicular distances from the planes to the origin. Also
remember that the distance really depends on whether both the planes lie on
the same side of the origin, or on opposite sides (same sign or different sign). You
subtract them, then take the modulus because distance is never negative.
Consider a line with direction vector a and a plane with normal vector n. The
angle between the line and the plane can be found by using the equation
Note that if you used cos θ, you would have gotten the angle between the line
and the normal vector instead.
The angle between 2 planes is actually the same angle between the 2 normal
vectors. So given 2 planes with normal vectors m and n respectively, we can find
Recall that this is the same formula to find the angle between 2 lines.
Now that you know how to construct planes, you might be curious about how 3D
shapes are constructed. Again, you could make use of the applet I shared with
you in the previous post: from the drop down menu of new graph, choose z =
f(x, y) surfaces. Fiddle around with it and have fun creating awkward shapes. This is
obviously out of your syllabus, but let me just give you some equations for some
very common shapes in 3D:
cylinder:               x² + y² = r²
elliptic paraboloid:    z = x² + y²
hyperbolic paraboloid:  z = x² − y²
ellipsoid:              ax² + by² + cz² = 1
cone:                   x² + y² − z² = 0
hyperboloid:            x² + y² − z² = 1
Before you start sampling, you need to do a few things. First, you need to identify
the target population, as in where and who you want to interview. Next, you
determine the sampling units, the people / items to be sampled. If your
population is all the primary schools in Malaysia, is your sampling unit the
student, the teacher, or the canteen waiter? You have to make it clear. Then, you
need a sampling frame. You need a list in which the sampling units within a
population are individually named or numbered. Of course the list may not be
complete, or sometimes it just can't be generated, as the units will change,
move in and out, or maybe if they are fish in a pond, they couldn't be listed
down!
Once you are done, you can start your survey.
Knowing that we can start surveying, we need to know the possible sampling
methods. We shall not focus on census in this chapter (the title says it). Now we
shall look into a few types of sampling methods:
1. Random Sampling
I believe you are familiar with the term random. It means that you do not
choose a sample on purpose, you just simply pick one. There are 3 kinds of
random sampling:
Simple Random Sample
As its name suggests, it is simple: you don't need to do any homework to get
that sample. You could draw lots, or use a random number to choose which unit you
want to take the survey. You can make use of a random number table to
choose your units. It acts as a large dice, and looks something like the one below:
You can use numbers from left to right, following the numbers given. Or you
could also close your eyes, and use a pencil to point on a number on the table.
For example, in a group of students numbered 1 to 100, you want to choose 5
random students. You can take 2 digit numbers starting from the left of the table,
namely 82, 03, 14, 58 and 21 to be the students you want.
You could actually use your calculator as a random number generator. On
your CASIO fx-570MS, press shift - Ran#, then you will get a random number,
to 3 decimal places, between 0.000 and 1.000. You can use multiplication or division
to manipulate the random number to the range you want.
Note that there exist 2 kinds of simple random samples, one with replacement,
one without replacement.
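Both kinds can be sketched with Python's standard random module, using the group of students numbered 1 to 100 from the example above. The fixed seed and the variable names are assumptions, and the seed is there only so that reruns give the same draw. The last line imitates scaling a calculator-style random number in [0, 1) to the range 1 to 100.

```python
import random

rng = random.Random(2024)          # fixed seed (an arbitrary choice) for repeatability
students = list(range(1, 101))     # students numbered 1 to 100

# Without replacement: no student can be chosen twice.
without_replacement = rng.sample(students, 5)

# With replacement: each draw is independent, so repeats are possible.
with_replacement = [rng.choice(students) for _ in range(5)]

# Scaling a random number in [0, 1) to a whole number from 1 to 100.
scaled = int(rng.random() * 100) + 1

print(without_replacement, with_replacement, scaled)
```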
Systematic Random Sample
In systematic sampling, you make use of a certain pattern, a certain sequence
to find your samples. For example, in a list of 1000 people, you take every kth
person to take the survey, depending on your sample size.
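A minimal sketch of systematic sampling follows. Choosing the interval as k = N // n and picking a random starting point within the first interval are common conventions, but they are assumptions here since the notes do not spell them out. The function name and the sample size of 50 are also hypothetical.

```python
import random

def systematic_sample(units, n, rng):
    """Take every k-th unit, starting from a random position in the first interval."""
    k = len(units) // n            # sampling interval
    start = rng.randrange(k)       # random start among the first k units
    return [units[start + i * k] for i in range(n)]

rng = random.Random(7)             # fixed seed only for repeatability
people = list(range(1, 1001))      # a list of 1000 people
chosen = systematic_sample(people, 50, rng)
print(chosen[:5])
```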
Stratified Random Sample
In a stratified sample, there are many distinguishable layers. For example, in a
population of people, they have different age groups, they have different
occupations and etc. We take a few units from different age groups, and combine
them in one sample in the end.
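The idea can be sketched as below, taking the same number of units at random from each stratum and combining them. The strata names, their sizes and the per-stratum count are hypothetical. Real surveys often take a number proportional to the size of each stratum, which this sketch does not attempt.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw per_stratum units at random from every stratum and combine them."""
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, per_stratum))
    return sample

# Hypothetical population split into age groups (unit labels are made up).
strata = {
    "under 20": ["U%d" % i for i in range(40)],
    "20 to 50": ["M%d" % i for i in range(100)],
    "over 50": ["S%d" % i for i in range(60)],
}
rng = random.Random(3)             # fixed seed only for repeatability
combined = stratified_sample(strata, 4, rng)
print(combined)
```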
2. Non-Random Sampling
I think I don't need to elaborate much on this. It is not random, and therefore you
choose a unit with a solid and particular reason. There are 2 kinds over here:
Clusters
Clusters are like natural sub-groups of a population. For example, in a primary
school, there are 6 classes in standard 1, with all the kids having the same
status. Note that this differs from a stratified random sample, since strata are
different, and classes are alike. You choose to study one cluster, which means
that you didn't randomly pick students from any class in the school. You save a
lot of effort, time and money, as you don't need to collect the survey forms from
every class or so.
Quotas
Quota sampling is widely used in market research where the population is
divided into groups in terms of age, sex, income level and so on. Then when you
are about to survey, you already have your plans in mind: "I want to survey one
person who has high income, has a big family, and another one with low income,
with a small family" and so on. You already set specific requirements for the
members of the population that you are about to interview or collect data from.
All these sampling methods have their pros and cons. I summarize them in the
table below:
In every survey, there will surely be some sources of bias. Obviously, when you
are collecting data from a population, you want it to be as accurate as possible,
and thus should eliminate any bias in the process of sampling. These biases will
cause the survey or data collection to be very inaccurate, and give a wrong
picture of what the population really is. Examples of sources of bias are:
1. lack of good sampling frame
It's like using a list of friends generated from your Twitter account. You will miss
out those friends who don't use Twitter. You need a good sampling frame so
that everyone has an equal chance of being sampled.
2. wrong choice of sampling unit
In surveying who has a car at home, you chose the wrong sampling unit
"people"; a better sampling unit would be "household", since children don't
drive.
3. no response by some chosen units
Some people just choose not to answer your survey questions for God-knows-what
reason. Then, your questionnaire might have some questions in which they don't
have much choice to answer with. For example, they don't respond to the question
"Do you like Subway Sandwiches? Yes / No" when they don't even know that such
an outlet exists.
4. introduced by the person conducting the survey
The person conducting the survey might already have a conclusion in mind, and
tries to make his survey results to suit his mindset. For example, on the question
Which party will do a better job in the next General Elections? If the surveyor is
a Pakatan Rakyat supporter, he might influence the person taking the survey to
agree with his stand.
It has a value x and a frequency. Let's say I would like to generate a sample of
size 6 from this population. For data like this, we could not just simply use a
calculator to randomly get the numbers 1 to 4 as our sample. It has a frequency,
or rather a weightage of how we should randomly choose the numbers. So what
we can do is draw up a table, making use of its cumulative frequency.
Using this table, we can finally generate the random sample. For example, suppose
we have the random digits 04938581365399. Reading them in pairs, we get the numbers
4, 93, 85, 81, 36, 53, which correspond to the values of x being 1, 4, 3, 3, 2,
3 respectively. We have finally got our random sample from the frequency
distribution.
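The lookup can be sketched as below. The frequency table itself is missing from this copy, so the frequencies 10, 30, 50, 10 (total 100) are hypothetical values, chosen only because they are consistent with the worked example above. Treating the pair 00 as 100 is also an assumption.

```python
from bisect import bisect_left
from itertools import accumulate

values = [1, 2, 3, 4]
freq = [10, 30, 50, 10]                  # hypothetical frequencies, total 100
cumulative = list(accumulate(freq))      # [10, 40, 90, 100]

def lookup(two_digit):
    """Map a 2-digit random number to x via the cumulative frequency table."""
    r = 100 if two_digit == 0 else two_digit   # assumption: 00 stands for 100
    return values[bisect_left(cumulative, r)]  # first class whose cumulative total reaches r

digits = "04938581365399"
pairs = [int(digits[i:i + 2]) for i in range(0, 12, 2)]   # first six 2-digit numbers
sample = [lookup(p) for p in pairs]
print(pairs, sample)
```

With these assumed frequencies, 01 to 10 map to x = 1, 11 to 40 to x = 2, 41 to 90 to x = 3 and 91 to 100 to x = 4.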
2. Probability Distribution
The method is the same as the above, we create a cumulative frequency, and
change the base to be over 1, then use the generated random numbers to find
the random samples. There are a few kinds of probability distributions:
probability distribution
Poisson distribution X ~ Po(λ)
The formula is
which is actually the same value as the population mean. What this means is
that the sample mean estimated should have the same value of the population
mean. We will then find that the sample variance has a different value from the
population variance. Using the fact that
which we call the standard error of the mean. However, remember that
this standard error is for samples with replacement. For samples without
replacement, the variance would be
Where N is the size of the finite population, and n is the sample size. I do not
know how to derive this, and I don't think it will appear in exams. I put it here for
your reference.
So now, every time we have a normal distribution X ~ N(μ, σ²), we
have a sampling distribution of
Notice that the sample size affects the sampling distribution. So now to answer
questions, unlike Maths T, you have to be very particular as in whether it is
talking about a population or a sample. Let me give you an example:
The volume of wine in bottles is normally distributed with a mean of
758ml and a standard deviation of 12ml. A random sample of 10 bottles
is taken and the mean volume found. Calculate the probability that the
sample mean is less than 750ml.
Let X be the volume of wine in bottles.
X ~ N(758, 12²)
Since X is normally distributed, then the sampling distribution with n =
10,
X̄ ~ N(758, 12² / 10)
X̄ ~ N(758, 14.4)
P(X̄ < 750) = P(Z < −2.108)
= 0.0175
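As a check on this working, the sketch below standardises the sample mean with the standard error 12/√10 and evaluates the normal cumulative probability through math.erf instead of reading the table. The helper name phi is an assumption.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 758, 12, 10
se = sigma / math.sqrt(n)      # standard error of the mean, sqrt(14.4)
z = (750 - mu) / se            # standardised value of the sample mean 750
p = phi(z)
print(round(z, 3), round(p, 4))
```

Using the population standard deviation 12 instead of the standard error would give a very different (and wrong) probability, which is the point of the example.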
I assume that you have fully studied the chapters Discrete Probability
Distributions & Continuous Probability Distributions in Maths T. So now you know
the difference between samples and populations, the final answer will be
different if you used the wrong distribution.
We were assuming that the sample was taken from a population which follows
the normal distribution. So what if it isn't? Maybe, the sample was taken from a
Binomial, Poisson or even a Uniform distribution?
Let's do a little experiment. Suppose you have an unfair coin, such that every
time you toss it, it has 25% chance of getting a head. So if you toss it 10 times,
you get a binomial distribution, X ~ B(10, 0.25). We plot the probability graph
below. The red bars are the Binomial plots, while the blue line is the normal
approximation.
Again, we get into serious investigation to see how many monkeys appear
every day, and we get the means for 30 times, and we find the sampling
distribution of X̄ to be as follows:
Once again it is close to the normal blue curve. Remember that the y-axis stands
for probability. So this sampling distribution simply tells us the probability of the
mean monkeys seen on the road daily, with a sample size of n.
We try now for a uniform distribution. A uniform distribution X ~ R(a, b) means
that X is uniformly distributed with a range of a ≤ x ≤ b. It has the following
expectation and variance:
E(X) = (a + b)/2, Var(X) = (b − a)²/12
Then again, we find the sampling distribution of X̄. We take 30 samples, and we find
that actually, it looks like a normal distribution!
All these graphs are done with this applet. So after doing all these, we find that
when samples are taken from populations that are not normally distributed,
the sampling distribution takes the normal shape as the sample size increases. In other
words, for large sample size n, it is approximately normal. And here, we
introduce the central limit theorem:
When samples are taken from a non-normal population with known
variance σ², then for large values of n, the distribution of X̄ is approximately normal
such that
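A small simulation can stand in for the applet. The sketch below repeats the unfair-coin experiment, X ~ B(10, 0.25), takes samples of size 30, and looks at the centre and spread of the sample means. The number of repetitions (2000) and the seed are arbitrary assumptions. Theory suggests a centre near np = 2.5 and a variance near npq/30 = 1.875/30 = 0.0625.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed only for repeatability

def heads_in_10_tosses(rng):
    """One observation of X ~ B(10, 0.25): number of heads in 10 tosses."""
    return sum(rng.random() < 0.25 for _ in range(10))

def sample_mean(n, rng):
    """Mean of n independent observations of X."""
    return statistics.mean(heads_in_10_tosses(rng) for _ in range(n))

means = [sample_mean(30, rng) for _ in range(2000)]
centre = statistics.mean(means)       # expected to be near 2.5
spread = statistics.variance(means)   # expected to be near 0.0625
print(round(centre, 3), round(spread, 4))
```

Plotting a histogram of means would show the bell shape. Only the centre and spread are checked here.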
500 × 5% = 25
P(X ≥ 25) = P(X > 24.5) = P(Z > 2.491) = 0.0064
If I were you, I would choose to do the second method. However, in exam
questions, if you were asked to find the proportion, then you better do the first
method to avoid deduction of marks. Note that in either case, this sample of
proportion can only be used for large sample size n.
13.3 Point Estimates
To define a certain distribution, be it Binomial, Poisson or Normal, you need to
know their population parameters. And of course, if you don't know the
parameters beforehand, you would want to use sampling to estimate them. This
estimate is unbiased if the average (or expectation) of a large number of values
taken in the same way is the true value of the parameter. The best way to
estimate these parameters is by using the unbiased estimator with the smallest variance.
So here in this section, we are focusing on point estimates. We estimate that
the parameters are those points or data that we collected through the samples.
Look at the 3 equations below.
That is all you need to know about this section. Let me give you a short example:
The concentrations, in milligrams per litre, of a trace element in 7
randomly chosen samples of water from a spring were
240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5
Determine the unbiased estimates of the mean and the variance of the
concentration of the trace element per litre of water from the spring.
To answer this question, we need to make use of our calculator. Set your CASIO
570MS to SD mode, and input all the data into it, by pressing the individual
numbers in, every time followed by the DT button, until you have finished inputting
everything. Next, you press
shift+S-VAR. It gives you the option of x̄, xσn and xσn−1. The first one gives
you the unbiased estimate of the mean, while the last one will give you the
unbiased estimate of the standard deviation. Just show them a little working
even though you know the answers straight away:
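If you want to check the calculator output, the same estimates can be sketched with Python's statistics module. statistics.variance and statistics.stdev divide by n − 1, which matches the unbiased estimate (the xσn−1 key). The variable names are assumptions.

```python
import statistics

data = [240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5]

mean_est = statistics.mean(data)      # unbiased estimate of the population mean
var_est = statistics.variance(data)   # unbiased estimate of the variance (divides by n - 1)
sd_est = statistics.stdev(data)       # square root of the unbiased variance estimate

print(round(mean_est, 2), round(var_est, 2), round(sd_est, 3))
```

For this data the sum is 1652.0 and the sum of squared deviations from the mean is 45.48, so the estimates are 1652.0 / 7 = 236.0 and 45.48 / 6 = 7.58.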
Okay, I need to explain this a little. If you would have observed closely, the
confidence interval is constructed by the unbiased estimate of population
proportion, the standard error. The term a determines the percentage of
interval you wanted. This value a, can be obtained from the normal tables (or the
Buku Sifir given in STPM). It looks something like this:
I'll teach you how to read this table, in the example below:
In order to assess the probability of a successful outcome, an
experiment was performed 200 times. The number of successful
outcomes was 72. Find a 95% confidence interval for p, the population
proportion of success.
We start by listing down the important values: ps, qs and n, and the distribution.
ps = 72 ÷ 200 = 0.36, qs = 0.64, n = 200
Ps ~ N(0.36, 0.001152)
To find a, we refer to the table. Note that the table was written for lower tail
probability
P(Z ≤ a), but we are looking for P(−a ≤ Z ≤ a). So a central 95% of the
distribution should have an upper and lower tail of 2.5% each. This table might help
to explain a little:
The diagram on the left shows the lower tail probability, which is what the table
in your Buku Sifir gives. We want to find the one on the right, and by looking
at the position of the red lines, you can see that the two are definitely different. So here,
the value of a comes from the column 0.975, which is 1.960. So your
confidence interval shall be
( ps − 1.960√0.001152, ps + 1.960√0.001152 ) = (0.293, 0.427)
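The arithmetic for this interval can be sketched as below, with ps = 72/200, variance ps × qs / n, and a = 1.960 read from the normal table. The variable names are assumptions.

```python
import math

successes, n = 72, 200
ps = successes / n                    # sample proportion, 0.36
qs = 1 - ps                           # 0.64
variance = ps * qs / n                # 0.001152
a = 1.960                             # table value for a central 95% interval

half_width = a * math.sqrt(variance)
lower, upper = ps - half_width, ps + half_width
print(round(lower, 3), round(upper, 3))
```

The interval is centred on ps, so the sample proportion 0.36 should always lie in the middle of whatever interval you obtain. That is a quick sanity check in exams.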
You might have noticed that the continuity correction is omitted. Yes,
this is indeed the case. You need to get used to reading the table to prevent
yourself from using the wrong value of a. A 90% interval means that it has a
lower tail probability of 95%, an 80% interval means that it has a lower tail
probability of 90% and so on. To make things faster, I suggest you memorize the 4
most common percentage intervals:
confidence level    90%      95%      98%      99%
value of a          1.645    1.960    2.326    2.576
same as above,
Notice that we only use the t-distribution when the sample size is small;
when n tends to infinity, the t-distribution will look like a normal curve. In other words,
nothing much has changed, we are just using a new distribution for small sample
size. After knowing that our sample size is small, we use the t-distribution using
(n − 1) degrees of freedom, use the unbiased estimates for both the mean and
the variance, and our new formula will be
1. interval width
The width of a confidence interval can be obtained from the expression
To summarize this section, I made a chart for you to remember things easier.
Take note that this is the most important section of this chapter. Be sure you are
clear with all the distributions, don't confuse the sample size n with the number
of trials n in a Binomial distribution, and practise more on population
proportions.
14.1 Hypotheses
Let's imagine this story.
One day in town, you met this awkward looking Mathematics tuition teacher. He
brags that 95% of his pupils get A's for their Mathematics T in STPM every year.
Since you love Mathematics so much, you thought that maybe you might want to
take his tuition class. But being sceptical in nature, you were wondering whether
95% of his students getting A's is a little too much. So you decided that you
want to put this teacher to a test. You managed to get some information from 15
of his ex-students, and find out that 11 of them got A for Maths T in the previous
year.
Now your question is: is the Maths tuition teacher's claim a little bit overboard?
Is 11 out of 15, 95%? Obviously it isn't, but since you are only taking a sample,
you can't be sure that you are right. What if there were 13 or 14 students who got
A's? You know that if 2 or 3 students got A's, he is definitely lying. Then how
about 10 students? 8 or 9 students? There must be a cut off point, such that you
are VERY SURE that he is lying, or not. Isn't it?
Or let's think of another story. Suppose you are an athlete, participating in the
MSSM 400m race. You find that every time, your running speed follows a normal
distribution with a mean of 40km/h. Bored of running every day, you decided to
test whether drinking 2 cups of milk every morning helps improve your
running. So after drinking milk for 5 days, you find your mean speed turned out
to be 40.9km/h.
Again you question yourself: did you really improve? Well, it might so happen
that you ran a little faster this time, and it has nothing to do with the milk. You
might also be wondering, how much increase in speed is considered an
improvement? You need a cut off point, again.
respectively. Try not to confuse this with what you learned in the previous
chapter, which was confidence intervals, in terms of 90+%.
Let's say, in a test of 10 true-false questions which were written in Hindi, your
friend got 6 questions correct, and you want to know whether he was guessing,
or he really studied Hindi. You formulate the hypotheses as below:
H0: Your friend is guessing. He makes use of the 50% luck.
H1: Your friend seriously studied Hindi before. He scores more than
50%.
Mathematically, this is a binomial problem again, X ~ B(10, 0.50).
H0: p = 0.50
H1: p > 0.50
Notice that the expression for H0 always has an '=' sign, while H1 should have
either <, > or ≠ signs. To start our test, we need to define our significance level.
We can say, for example, that we want to test at the 5% level, that he could have
obtained this score by guessing all the answers. We can also choose to test at
1% level or 10% level, and obviously, you might get different results.
So from here you can see that in the last section, you can't get any answer if you
don't set a significance level. You can't say how much you have improved in your
running, unless you state that "an increase of 5% is significant", or "if I run faster
by 10%, then there is significant improvement". Only with this significance level
can our hypothesis test be done. For the example above, say, we want to test
it at 5% level. We first need to find out the probability of how many questions he
gets correct. We plot a cumulative binomial distribution
X ~ B(10, 0.50).
This curve tells us the probability that he gets at least n questions correct. So we see that
there is 99.9% probability that he gets at least one question correct, and 62.3%
probability that he gets at least 5 questions correct etc. Even if your friend gets 8
questions correct, there is still a 5.5% probability of scoring at least that well by pure
guessing, which is above our required significance level. So here, if he gets 9 questions
correct, it must be really a rare event, as he has only 1.1% probability of getting at least
this score if he was guessing. We say that the numbers 9 and 10 lie in the critical
region, which is the group of observations that are considered to be unusual or
unlikely (rare) events. We also say that number 9 is the critical value, or cut-off
point, since any score from 9 upwards is considered a rare event.
So what can we conclude from here? We can see that if your friend got 0 to 8
questions correct, we have no evidence to say that he studied Hindi, as
these are not rare events (they are > 5% probability). We say that the null
hypothesis H0 is not rejected, which is the case here. But if he gets 9 or 10
questions correct, we say that there is evidence, at 5% significance level, that
your friend did study Hindi. In other words, the null hypothesis H0 is rejected
in favour of the alternative hypothesis H1.
Notice that if we did a 10% significance level test, number 8 now lies in the
critical region! So this is actually very subjective, and it really depends on you (or
the question in your test paper) to determine what is considered significant and
what is not.
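The upper tail probabilities and the cut-off point quoted above can be reproduced with a short sketch. The function names are assumptions. The critical value is taken to be the smallest score x for which P(X ≥ x) falls below the significance level.

```python
from math import comb

def upper_tail(n, p, x):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def critical_value(n, p, alpha):
    """Smallest x whose upper tail probability is below the significance level alpha."""
    for x in range(n + 1):
        if upper_tail(n, p, x) < alpha:
            return x
    return None   # no score is rare enough at this level

tail_8 = upper_tail(10, 0.5, 8)          # about 5.5%
tail_9 = upper_tail(10, 0.5, 9)          # about 1.1%
cut_5 = critical_value(10, 0.5, 0.05)    # cut-off at the 5% level
cut_10 = critical_value(10, 0.5, 0.10)   # cut-off at the 10% level
print(round(tail_8, 4), round(tail_9, 4), cut_5, cut_10)
```

Changing alpha from 0.05 to 0.10 moves the cut-off from 9 down to 8, which is the effect described in the paragraph above.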
rejected. If not,
H0 is accepted.
This is only a rough idea of what a hypothesis test is about. You might still be a little
confused about what is happening; I have to apologize for that, because I broke
down this chapter in quite a weird way.
14.3 Tests Of Significance
A Hypothesis Test is a Test of Significance. In this section, we will be looking
at all the possible types of hypothesis tests that can be made in STPM. Before we
start, every hypothesis test follows a general rule. You need to state these 7 steps
(or workings) in your answer sheet:
1. Define the variable X.
Let X be …,
X ~ B(n, p) / X ~ N(μ, σ²) / X ~ Po(λ)
2. Define H0 and H1.
H0: p / μ / λ / μ1 − μ2 = ?
H1: p / μ / λ / μ1 − μ2 <, >, ≠ ?
3. Write down the case if H0 is true.
If H0 is true, then p / μ / λ / μ1 − μ2 = ?
and X ~ B(n, ?) / X ~ N(?, σ²) / X ~ Po(?)
4. Define your type of test and significance level.
Use an upper / lower / two tailed test, at ?% level.
5. Set the criteria to reject H0.
Reject H0 if P(X ≥ x) < ? / P(X ≤ x) < ? / z < ? / z > ? / |z| > ? / T < ?
6. Do the calculations.
P(X ≥ ?) = ? / P(X ≤ ?) = ? / z = ? / T = ?
7. Conclude your results.
Since P(X ≥ x) = ? / P(X ≤ x) = ? / Z = ? / T = ?, x lies / doesn't lie in the
critical region.
H0 is rejected in favour of H1 / not rejected. We conclude that … at
?% level.
If you have all these 7 steps on your answer sheets, then you will probably get
90% of the marks. Don't make calculation mistakes though.
[85.5, continuity correction, such that it lies in the critical region, that means you
correct it such that the value is nearer to the critical region.]
Since z = −2.396, z lies in the critical region.
H0 is rejected in favour of H1. There is evidence that the proportion is lesser, and
therefore the manufacturer's claim is not accepted, at 5% level.
3. Poisson Mean
The number of white corpuscles on a slide has a Poisson distribution with mean
3.5. After treatment, a sample was taken and the number of white corpuscles was
found to be 8. Test at the 5% level of significance, whether the number of white
corpuscles has increased.
Let X be the number of white corpuscles on a slide, X ~ Po(λ).
H0: λ = 3.5
H1: λ > 3.5
If H0 is true, then λ = 3.5, and X ~ Po(3.5).
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05.
P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − 0.9733 = 0.0267 = 2.7% [I hope you remember the
Poisson formula. In some formula booklets, there are Poisson cumulative
probability tables, they help too.]
Since P(X ≥ 8) = 2.7% < 5%, x lies in the critical region.
H0 is rejected in favour of H1. There is evidence, at 5% level, that the number of
white corpuscles increased.
Not a hard one, I suppose. Remember that if λ > 5, you can actually make an
approximation to the Normal distribution, X ~ N(λ, λ).
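If you don't have a Poisson table at hand, the tail probability in this example can be checked with a few lines of Python. The function name poisson_cdf is an assumption. It simply sums the Poisson formula from 0 up to k.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), by summing the Poisson formula."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

lam0 = 3.5                         # mean under H0
observed = 8
p_value = 1 - poisson_cdf(observed - 1, lam0)   # P(X >= 8) = 1 - P(X <= 7)
reject = p_value < 0.05            # upper tail test at the 5% level
print(round(p_value, 4), reject)
```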
4. Population Mean (Normal, 2 known)
A machine fills cans with soft drinks so that the volume of liquid in the cans
follow a normal distribution with mean 335ml and standard deviation of 3ml. A
setting on the machine is altered, following which the operator suspects that the
mean volume of liquid discharged by the machine into the cans has decreased.
He takes a random sample of 50 cans and finds that the mean volume of liquid
in these cans is 334.6ml. Does this confirm his suspicion? Perform a significance
test at the 5% level and assume that the standard deviation remains unchanged.
Let X be the volume of liquid in the cans, X ~ N(μ, 3²)
H0: μ = 335
H1: μ < 335
The sample size is 50, so X̄ ~ N(μ, 3²/50) [recall what you learned in the previous
chapter]
If H0 is true, then μ = 335, and X̄ ~ N(335, 9/50)
Use a lower tail test, at the 5% level.
Reject H0 if z < -1.645
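As a rough check of the remaining steps, the test statistic can be worked out from the figures given in the question, using z = (x̄ - μ) / (σ/√n). The sketch below assumes only those figures and the -1.645 critical value; the variable names are made up for this illustration.

```python
from math import sqrt

mu0 = 335.0          # mean under H0 (ml)
sigma = 3.0          # known standard deviation (ml)
n = 50               # sample size
sample_mean = 334.6  # observed sample mean (ml)
z_crit = -1.645      # lower tail critical value at the 5% level

# Standardise the sample mean under H0.
z = (sample_mean - mu0) / (sigma / sqrt(n))
reject_h0 = z < z_crit

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```

With these numbers z comes out at about -0.94, which is not less than -1.645, so z does not lie in the critical region and H0 would not be rejected at the 5% level.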
Family packs of bacon slices are sold in 1.5kg packs. A sample of 12 packs was
selected at random and their masses, measured in kilograms, noted. The
following results were obtained: Σx = 17.81, Σx² = 26.4357
Assuming that the masses of the packs, in kg, follow a normal distribution
with variance σ² unknown, test at the 1% level whether the packs are
underweight.
Let X be the mass of packs of bacon slices, X ~ N(μ, σ²)
H0: μ = 1.5
H1: μ < 1.5
Since σ² is unknown, and n < 30, a t-distribution is used, T ~ t(n - 1)
If H0 is true, then μ = 1.5, T ~ t(11), where
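The statistic meant here is presumably the usual one-sample t statistic, T = (X̄ - μ) / (s/√n), where s² is the unbiased estimate of σ² (divisor n - 1). Under that assumption, the sketch below works it out from Σx and Σx². The critical value -2.718 is the tabulated 1% lower tail value for t(11), and the variable names are made up for this illustration.

```python
from math import sqrt

n = 12
sum_x = 17.81       # Σx from the question
sum_x2 = 26.4357    # Σx² from the question
mu0 = 1.5           # mean under H0 (kg)
t_crit = -2.718     # assumed: 1% lower tail critical value of t(11), from tables

sample_mean = sum_x / n
# Unbiased estimate of the population variance (divisor n - 1).
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)

# One-sample t statistic under H0.
t = (sample_mean - mu0) / sqrt(s2 / n)
reject_h0 = t < t_crit

print(f"mean = {sample_mean:.4f}, s^2 = {s2:.6f}, t = {t:.2f}, reject H0: {reject_h0}")
```

With these figures t is roughly -3.5, which is below -2.718, so on this calculation H0 would be rejected at the 1% level.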
By the way, you can also create confidence intervals for situations like
this. Try it out yourself. There can be two-tailed, upper tail and lower tail tests as well.
where n1 and n2 are the sample sizes and s1² and s2² are the variances of the 2
samples respectively. The distribution will be
This is, however, not always the case. When both the samples are small, we
should use the t-distribution instead. The test statistic will now be
whether the results provide significant evidence at the 5% level that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers.
Let X1 be the height of sunflowers on the sunny side, X1 ~ N(μ1, σ²)
Let X2 be the height of sunflowers on the shady side, X2 ~ N(μ2, σ²)
where σ² is unknown.
H0: μ1 - μ2 = 0
H1: μ1 - μ2 > 0
Consider the distribution of the difference between the means X̄1 - X̄2, with n1 =
n2 = 36.
If H0 is true, then μ1 - μ2 = 0
and therefore
When you perform a significance test, you can make errors. If H0 is correct
and you accept it, or if H0 is false and you reject it, then you've made a correct
decision. However, there are 2 kinds of errors that you can make:
1. A Type I Error, which is made when you reject H0 when it is true
2. A Type II Error, which is made when you accept H0 when it is false.
Questions usually ask for the probability of making these errors.
The first one is easy, P(Type I error) = level of significance. For the Type II
error, things are not so straightforward. A specific value for H1 is stated in order
to find the probability of this error. I'll show you an example below:
A random observation is taken from a binomial distribution X ~ B(20, p) and
used to test the null hypothesis p = 0.8 against the alternative hypothesis p >
0.8. The significance level of the test is 7%. Find the probability of making a Type
I error. Find also the probability of making a Type II error if in fact p = 0.85.
The probability of making a Type I error is 7%. [same as the level of significance]
You make a Type II error if you accept H0 when p is the value specified in H1.
For Type II error,
H0: p = 0.80
H1: p = 0.85
P(X = 20) = 0.012 = 1.2%
P(X ≥ 19) = 0.069 = 6.9%
P(X ≥ 18) = 0.206 = 20.6%
So the critical region is X ≥ 19.
So P(Type II error) = P(accept H0 when H1 is true)
= P(X < 19 when p = 0.85)
= P(X < 19 when X ~ B(20, 0.85))
P(X ≤ 18) = 1 - P(X = 20) - P(X = 19) = 0.824 = 82.4% [Note that in this part of
the calculations, you are using p = 0.85, not 0.80 as when you were finding
the critical region above.]
The probability of making a Type II error is 82.4%.
Let me summarize how you find the probability of a Type II error:
1. Define your new H1
2. Find the critical region, using the value in H0
3. Using the new value in H1, find the probability that X lies outside the critical
region you found.
By the way, the expression 1 - P(Type II error) is known as the Power of the
Test.
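The three steps can be checked numerically for the binomial example above. This is only a sketch using the Python standard library; the helper names binom_pmf and upper_tail are made up for this illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(c, n, p):
    """P(X >= c) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))

n = 20
p0 = 0.80      # value of p under H0
p1 = 0.85      # specific value of p under the new H1
alpha = 0.07   # 7% significance level

# Step 2: the critical region X >= c, with c the smallest value whose
# upper tail probability under H0 does not exceed the significance level.
critical_value = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= alpha)

# Step 3: probability of falling outside the critical region when p = p1.
type_2_error = 1.0 - upper_tail(critical_value, n, p1)
power = 1.0 - type_2_error

print(f"critical region: X >= {critical_value}")
print(f"P(Type II error) = {type_2_error:.3f}, power = {power:.3f}")
```

This should reproduce the critical region X ≥ 19 and the 82.4% found above, giving a power of about 17.6%.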
Study hard, and make sure you don't make mistakes in this section, which is meant
to be a scoring one.
We will be going ALL INTO it in the next section, where the calculations come
in.