
QUANTITATIVE TECHNIQUES

Contents

Section I
UNIT 1 Role of Mathematics and Statistics in Business Decisions: Basics of Decision Making; Decision Contaminants; Managerial Decision Making System; Managerial Decision Making Environment; Managerial Decision Models and Algorithms; Quantitative Models; The Decision; Summary; Keywords; Review Questions; Further Readings
UNIT 2 Theory of Sets: Introduction; The Concept of a Set; Notations; Representation of a Set; Some Basic Definitions; Theorem on Subsets; Venn Diagram; Set Operations; Laws of Union of Sets; Laws of Intersection of Sets; Law of Complement of a Set; Theorem (On Symmetric Difference); De-Morgan's Laws; Applications of Venn Diagrams; Summary; Keywords; Review Questions; Further Readings
UNIT 3 Logarithms & Progressions: Introduction; Logarithms; Laws of Operations; Compound Interest; Arithmetic Progressions; Geometric Progressions; Annuities, Loans and Mortgages; Methods of Investment Evaluation; Perpetual Annuities and Infinite Series; Depreciation; Summary; Keywords; Review Questions; Further Readings
UNIT 4 Equations: Introduction; Equations; Applications of Linear Equations in Business; Supply and Demand Functions; Irregular, Unequal and Discontinuous Functions; Quadratic Equations; Fitting a Quadratic Cost Curve; Summary; Keywords; Review Questions; Further Readings
UNIT 5 Matrix Algebra: Introduction; Vectors; Multiplication of Vectors; Matrices; Use of Matrices for Production Planning; Solving Linear Equations; Determinants; Cramer's Rule; Applications in Management; Summary; Keywords; Review Questions; Further Readings
UNIT 6 Mathematical Induction: Introduction; Induction and Deduction; Principle of Mathematical Induction; Summary; Keywords; Review Questions; Further Readings

Section II
UNIT 7 Data Analysis: Introduction; Data Collection and Presentation; Frequency Distribution; Measure of Central Tendency; Mathematical Averages; Positional Averages; Commercial Averages; Measure of Dispersion; Skewness; Kurtosis; Summary; Keywords; Review Questions; Further Readings
UNIT 8 Correlation and Regression: Introduction; Correlation Analysis; Scatter Diagram; Covariance; Karl Pearson Coefficient of Linear Correlation; Spearman's Rank Correlation; Regression Analysis; Fitting Regression Lines; Summary; Keywords; Review Questions; Further Readings

Section III
UNIT 9 Time Series Analysis and Index Numbers: Introduction; Time Series Analysis; Graphical Method; Method of Averages; Nonlinear Analysis; Measuring Periodic Variations; Index Numbers; Construction of Index Numbers; Price Index Numbers; Nature of Weights; Laspeyres Index; Paasche Index; Fisher Index; Dorbish and Bowley Index; Marshall and Edgeworth Index; Walsh Index; Summary; Keywords; Review Questions; Further Readings
UNIT 10 Probability Theory: Introduction; Probability Concepts; Permutations; Combinations; Objective and Subjective Probabilities; Revised Probabilities; Random Variables and Probability Distribution; Discrete Random Variables; Continuous Random Variables; Binomial Distribution; Poisson Distribution; Normal Distribution; Summary; Keywords; Review Questions; Further Readings
UNIT 11 Theory of Estimation and Test of Hypothesis: Introduction; Theory of Estimation; Point Estimation (Properties of Good Estimators); Methods of Point Estimation; Interval Estimation; Sampling Distributions; Sampling Theory; The Quantitative Models of Inferential Decisions; Statistical Approaches to Inferential Decision-making; The General Inferential Decision Algorithm; Specific Decision Areas; Chi-square Distribution; The Z and t Distributions; The One-Sample Mean Problem; The F-Distribution; Concluding Comments; Summary; Keywords; Review Questions; Further Readings

SECTION-I

Unit 1 Role of Mathematics and Statistics in Business Decisions
Unit 2 Theory of Sets
Unit 3 Logarithms & Progressions
Unit 4 Equations
Unit 5 Matrix Algebra
Unit 6 Mathematical Induction


Unit 1 Role of Mathematics and Statistics in Business Decisions


Unit Structure
Basics of Decision Making
Decision Contaminants
Managerial Decision Making System
Managerial Decision Making Environment
Managerial Decision Models and Algorithms
Quantitative Models
The Decision
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Define decision contaminants
Describe the managerial decision making system
Explain the managerial decision making environment
Apply and interpret managerial decision models and algorithms
Apply and interpret quantitative models

Basics of Decision Making
Decision-making comes into play, sometimes voluntarily and other times involuntarily, when one needs to take an action or a stand in a situation and does not know what that should be. It implies, therefore, that decision-making is primarily a reasoning process. Reasoning is subjective by nature and can be rational or irrational. Moreover, it is almost always based on assumptions, explicit or tacit.

All types of human decision-making are essentially intellectual processes. This process has its roots in both the conscious and the subconscious mind and always involves the three stages described below:

1. Cognition stage: This is the starting point for the mind, which searches for facts in the environment in order to make a decision. Cognition means the discovery or recognition of data that are assembled into an information system.

2. Assembly stage: The assembly of the recognized facts obtained in the first stage into usable information systems represents the second stage. The mind may employ convergent or divergent thinking in the assembly process. Convergent means the conventional grouping of data into a system, whereas divergent signifies unusual or new ways of relating the data.

3. Testing stage: At this point the decision maker evaluates the cognites in terms of their relevancy to a given problem. Either a decision is made or not, and any number of managerial action programs are the outcomes of this intellectual process.

There might be one or more subconscious intellectual components that affect the decision-making process negatively. While discussing managerial decision controls, especially when they are of a quantitative nature, we usually do not address ourselves to this problem. The subtle yet powerful ability of these components to redirect the decision-making process should be borne in mind by the decision-maker and decision-analyst alike. The two most common flaws in decision making are inertia and impatience. This is a paradoxical situation. Inertia is often due to a fear of change. Impatience, if regarded superficially, may appear to be somewhat the opposite of inertia, but it has the same roots.

Decision Contaminants
In the preceding section we discussed decision making in terms of three major intellectual stages, that is, cognition, assembly and testing. We also discussed the effect of certain subconscious contaminants. A checklist of subconscious contaminants is given below to help you avoid them. The list is not exhaustive, and you may experience different or additional symptoms in different situations.

Dishonesty: trying to obtain someone else's decision; trying to anticipate the outcome without actually going through the three intellectual stages, ...

Inertia: skipping the study of facts because it is "not important"; promising oneself to come back to it later; being "confused" because certain things may not be clear, ...

Impatience: skipping the process of analysis and reaching a conclusion without any backup reasoning, ...

Acquiescence: doing everything as asked without questioning the logic, ...

Gambling: trying to fit decision variable configurations in the process of decision making by trial and error; assuming that the problem cannot be solved, ...

Semantics: calling the decision making situation totally absurd, very tough, or too time-consuming, ...

Falsification: not being able to solve the problem; illegible recording of calculations and final outcomes, ...

Student Activity
1. Cite one example of decision-making where falsification makes the decision unfit for action.
2. What are the effects of acquiescence and semantics on a decision-making situation?

Analogy, tabloid thinking, over-generalization, etc. can also be included in the above list. Guarding against these contaminants is one of the major tasks of a good decision maker.

Managerial Decision Making System
Reasoning, consisting of logic and contaminants, is part of any human decision-making process. Having become aware of the contaminants, let us now concentrate on the quantitative aspects of managerial decision making and the peculiar environment in which the managerial decision maker operates. The decision making task may be conceptualised as an input-output system, as shown in Figure 1.1.

Figure 1.1: Decision Making System

Every decision making task results in an output which is the evidence of the decision taken. In industry it is ultimately some kind of product, that is, a good, a service or an idea. The reasoning takes place in the Decision Making rectangle, which is sometimes referred to, quite appropriately, as the black box. Here a transformation of the inputs takes place that results in the output. The transformation process has both physical and mental properties.

On the input side a large number of variables may be listed. These variables can be classified in terms of the traditional factors of production, i.e., land, labor and capital, as well as the more recently emerged complex variables related to systems, technology and entrepreneurship. Underlying this input-output system is a feedback loop identified as managerial control systems. Its function is to optimize the transformation of inputs into the desired output. In a nutshell, in industry optimization means the minimization of costs and the maximization of profits subject to legal, social and ideological constraints.

The computer has forced the decision maker to very carefully delineate and quantify the variables that make up the building blocks of the decision task. What is needed and how much is needed for decision optimization have become the important questions. In addition, the proper time sequencing of the decision variables within the decision process had to be understood. And all answers had to be unequivocally quantified. It soon became apparent to every decision maker that quantified variables had different properties and that specific quantitative control mechanisms had to be designed. Not only was the decision maker confronted with variable-inherent properties; the decision tasks themselves have peculiar quantitative properties.

A variable, the building block of the decision task, may be seen as a small piece of a complex behavior. Buying a house, manufacturing a product, spending money on a show are examples of variables. Each variable represents a distinct dimension of the decision making task. So the decision space is always multidimensional, and it is a major task for the decision maker to find out which variables make up that space. If an important variable is overlooked, the decision will obviously be less than optimal. Furthermore, the quantitative impact of the variable must be ascertained, and here the special variable-inherent properties come into play. Decision variables may be deterministic, stochastic or heuristic; the following illustrations show the differences among the three types.

Deterministic variables can be measured with certainty. Thus, equal measures have equal cumulative impact, or, to use a simple illustration, a + a = 2a. Stochastic variables are characterized by uncertainty. Thus, a + a = 2a + X, where X is a value that comes about because of the uncertainty associated with the variable. Heuristic variables are those that exist in highly complex, unstructured, perhaps unknown decision making situations. The impact of each variable may be explained contingent upon the existence of a certain environment; for example, a + a = 3a, but only if certain conditions hold. Actual industrial decision making situations in each case may involve the number of gallons of aviation fuel obtained by cracking a barrel of crude oil (deterministic), projected product sales given the amount spent on advertising the product (stochastic) and the construction of a platform in outer space (heuristic).
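The three illustrations above (a + a = 2a, a + a = 2a + X, and the conditional a + a = 3a) can be sketched in a few lines of Python. The functions and figures here are hypothetical, chosen only to mirror the textbook's notation; the random term X is drawn from a normal distribution as one possible assumption about the uncertainty.

```python
import random

def deterministic(a):
    # Equal measures have equal cumulative impact: a + a = 2a
    return a + a

def stochastic(a, sigma=0.5):
    # Uncertainty adds a random term X: a + a = 2a + X
    return a + a + random.gauss(0, sigma)

def heuristic(a, conditions_hold):
    # Impact is contingent on the environment: a + a = 3a only if
    # certain conditions hold; otherwise assume the ordinary 2a
    return 3 * a if conditions_hold else 2 * a

print(deterministic(10))        # always 20
print(stochastic(10))           # 20 plus a random deviation X
print(heuristic(10, True))      # 30 under the stated conditions
```

Running the deterministic function twice always gives the same answer; running the stochastic one twice almost never does, which is precisely the property that forces different quantitative control mechanisms for each type.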



Student Activity
1. Differentiate clearly between deterministic, stochastic and heuristic decision variables using appropriate examples.
2. Give two examples of decision-making situations where uncertainty is inherent.

Managerial Decision Making Environment

The reason for the existence of a managerial hierarchy, that is, lower, middle and top management, lies in the different parameters within which an organization operates. There are industry-wide and market-wide decisions that have to be made. Often these decisions must transcend domestic considerations to incorporate international aspects. Such decisions, usually made by top management, occur in a broad-based, complex, ill-defined and non-repetitive problem situation. Middle management usually addresses itself to company-wide problems. It sees to it that the objectives and policies of the organization are properly implemented and that operations are conducted in such a way that optimization may occur. Lower management is responsible for the conduct of operations, the firing line so to speak, be this in production, marketing, finance or any of the staff functions like personnel or research. This decision environment is usually well-defined and repetitive.

Obviously, with reference to a given decision making situation, the distinction between top, middle and lower management may become blurred. In other words, in any on-going business there is always a certain overlapping of the managerial decision making parameters. The study and analysis of the existence and interaction of these parameters is of great importance to the management systems designer or communication expert. From the quantitative managerial decision making point of view, their importance lies in recognizing their peculiar constraints, building the appropriate decision models and selecting the best suited quantitative decision tools. A brief discussion of each environment in this light may enhance the understanding of the tools that are discussed later on.

Figure 1.2: Top Management Decision Environment

The top management decision environment is shown in Figure 1.2. The company's approach to the domestic or international market is filtered through industry-wide considerations. What does the market want? What does the competition already supply? Where is our field of attack? Do we have the know-how, do we have the resources? What is the impact of our actions upon the market, our own industry and other industries? These are some of the questions that have to be asked, defined and answered. The problems are unstructured and complex. Thus, often a heuristic decision making process can be utilized to good advantage. Forecasting is of major importance, and hence stochastic decision making is widely employed in this uncertain decision environment. But even a deterministic tool (usually intended for decision making situations that assume certainty), input-output analysis, can be effectively used in this environment.


Middle management decisions are primarily company-wide in nature. As mentioned before and shown in Figure 1.3, these decisions steer the organization through its life cycle.

Figure 1.3: Middle Management Decision Environment

Major features of a firm's life are objectives, planning, operation and the ultimate dissolution. The objectives are general and specific in nature. Obviously top management establishes the objectives, but middle management functions as their guardian. Indeed, as Figure 1.3 shows, every decision at this level must provide feed-back control for each of the other components. Planning refers to both policy execution as well as policy development. Scale of production, pricing of the product, product mix, in short the orderly and efficient arrangement of the input factors shown in Figure 1.2, is to be decided at this point. Making these factors into a product is the job of operations.

It may appear somewhat odd that the decision environment includes attention being paid to the dissolution of the firm. The life cycle concept has been mentioned, and it will be encountered again as one of the major underlying conceptual aids in forecasting. It is well known that business organizations are born, live and die like natural organisms. Therefore decision making should always be cognizant of the possibility of dissolution.

The lower management decision making environment represents a specialized, narrowly defined area within a company's total decision or operational field. Supervisory personnel of all types are operating in this environment. The decision tasks are normally well defined and repetitive.

Student Activity
1. Compare and contrast the different levels of the decision making environment.
2. Is it possible to develop a single algorithm to suit all decision making situations? Give reasons.

Managerial Decision Models and Algorithms


It is highly important that every decision maker has a firm understanding of the philosophy upon which quantitative decision making is based. Under no circumstances is it sufficient to just know how to perform a certain quantitative analysis and to obtain a solution in order to be able to make a decision.


To turn to the specific aspects of the quantitative decision making process, it is possible to recognize three distinct phases in every decision situation. Given a carefully defined problem, a conceptual model is generated first. This is followed by the selection of the appropriate quantitative model that may lead to a solution. Lastly, a specific algorithm is selected. Algorithms are the orderly delineated sequences of mathematical operations that lead to a solution, given the quantitative model that is to be used. The algorithms generate the decision, which is subsequently implemented by managerial action programs. The entire process is shown in Figure 1.4.

Figure 1.4: Phases of Decision Making (Defined Problem → Conceptual Model → Quantitative Model → Algorithms → Decision → Action Programs)

Problem Definition
Problem definition is a cultural artifact which is especially visible in a society's economic and industrial decision making process. Obviously, if such cultural determinants are operative in the first phase of managerial decision making, their effect can be noticed at various stages in the process, irrespective of the quantitative, and thus hopefully objective, methods that are used in the design of the models and algorithms as well as the decision itself. In the private sectors of free enterprise economies, a manager's ability to recognize problems, and even to anticipate problems that may emerge at some future time, is vital to the survival of the firm. Those managers who make effective decisions concerning a known problem are good administrators; those who in addition can recognize and anticipate problems are creative. It is known that creativity is partially inborn and partially acquired. Thus, the quantitative decision maker will not only try to master the methodology but also attempt to sharpen his or her problem identification skills, that is, his or her creativity.

The Design of Conceptual Models


The conceptual model represents the logic that underlies a decision. Based on this logic the quantitative model and specific algorithms are constructed. The logic may be a priori or empirical in nature. When shooting craps in a casino, a gambler has pre-established a conceptual model concerning the odds of the game. On a priori grounds, using only his or her intellect in determining the odds of every roll of the dice, the concept dictates that the win of a seven or an eleven on the first roll has likelihoods of 6/36 and 2/36, respectively. (There are 6 possible combinations of spots showing on 2 dice that yield a seven and 2 combinations that yield an eleven, out of 36 combinations for all spots from two through twelve.) Given this conceptual model, quantitative models and algorithms can be designed that facilitate the betting decision.

Now suppose that our gambler stumbles across a floating craps game in some dark alley. After observing the action on the pavement for a while, he notices that sevens and elevens do not occur on the first roll with the likelihood dictated by his conceptual model. Rather, there seems to be a preponderance of twos, threes and twelves, which he knows are losses. Crooked dice, he may very quietly think to himself. For crooked dice, an a priori logic based on the ideal situation, in which every spot on a die has an equal probability of occurring (1/6) and so does any combination of spots on two dice (1/6 × 1/6 = 1/36, according to the multiplication theorem), is unsuitable. Rather, he will now ascertain by observation (by experiment) the empirical probabilities, which are determined by the weights that have been cleverly or crudely (it is a dark alley) concealed in or on the dice. Once this empirical conceptual model has been generated, our gambler may continue the betting decision process in terms of the amount of the bets at each roll, etc. He may also redefine the problem and leave.
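The gambler's switch from a priori to empirical reasoning can be mimicked with a short simulation. This is only a sketch: the fair-die assumption and the function names are ours, but the a priori benchmarks 6/36 and 2/36 are the ones derived above.

```python
import random

def observed_frequencies(n_rolls, die):
    """Estimate empirical probabilities of two-dice totals by observation."""
    counts = {}
    for _ in range(n_rolls):
        total = die() + die()
        counts[total] = counts.get(total, 0) + 1
    return {total: count / n_rolls for total, count in counts.items()}

random.seed(7)
fair_die = lambda: random.randint(1, 6)   # honest casino die
freq = observed_frequencies(100_000, fair_die)

# A priori conceptual model for fair dice: P(7) = 6/36, P(11) = 2/36
print(f"P(7):  empirical {freq[7]:.3f}, a priori {6/36:.3f}")
print(f"P(11): empirical {freq[11]:.3f}, a priori {2/36:.3f}")
```

With fair dice the empirical frequencies settle near 6/36 and 2/36; substituting a loaded `die` function would make them diverge, which is exactly the signal that forces the gambler to abandon the a priori model for an empirical one.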


In the design of the conceptual model, it is important that the decision maker clearly delineates the interrelationships that make up the reality, or the systems, in which the problem occurs. But in the model building process it is virtually impossible to include all variables that have a bearing on the decision. The model includes only the major variables (endogenous variables) as seen from the decision maker's vantage point. Once the conceptual model has been designed and its logic expressed in terms of some systems configuration, such as a graph or matrix or perhaps a network or flow diagram, the quantitative models are simply superimposed by quantifying the logic. Once that has been accomplished, a relatively minor task remains in the selection of the algorithms and the computerization of the process. This is shown in detail for every type of quantitative decision that is discussed in the chapters that follow. It may be surmised that, of the phases shown in Figure 1.4, the Defined Problem and Conceptual Model components are the most important ones. Indeed, many a decision process has been needlessly commenced because of faulty problem definition and, most of the time, poor conceptual model building. Then there is no optimal or even satisfactory outcome.

Student Activity
1. Given that 3 out of 10 electric bulbs fused within the last month, would you buy 3 new electric bulbs for the current month in advance? Explain and justify your decision.
2. Describe a decision-making situation and identify various endogenous and exogenous variables for decision making.

Quantitative Models

Once the conceptual model has been properly designed, the quantitative model and its algorithms should almost "flow" out of it. The transition is natural, smooth, almost automatic. The quantitative model is selected from the many such models that have been designed by mathematicians. So while the decision maker will always build a conceptual model, the quantitative model is typically selected from an available pool of such decision making tools. The selection is made on the basis of the predominantly stochastic, deterministic or heuristic nature of the variables. There are quantitative models available for each kind, as discussed in the following chapters, and the decision maker's task is to select the appropriate one for a given decision situation. "Know thy tools" should be inscribed on every decision maker's desk. As it is possible to build a wall with a spade when the trowel would be the more appropriate tool, decision makers may sometimes misuse quantitative tools.

The scope of quantitative managerial decision making is vast indeed. Any industrialized society's economic and industrial decisions are among the most complex and important ones made by that society. The reader already has a good understanding of what these decisions entail. In order to provide a very brief overview of decision tasks, some examples may be cited of specific criteria for selecting quantitative models and algorithms. With respect to the former, the decision maker needs to establish a time frame. There are static and dynamic models. Static models are used when the decision process focuses on a single time period t. When there are several time periods (t, t+1, t+2, ...) over a planning horizon, dynamic models should be selected. Secondly, the decision maker must analyze the degree of certainty vs uncertainty in the decision environment. Remember that uncertainty calls for stochastic models, while certainty calls for deterministic models. Any type of inferential decision, forecasting study, quality control problem, waiting line or network analysis, and simulation involves a high degree of uncertainty; therefore, stochastic models are used. On the other hand, automated production processes, allocation, inventory or transportation problems, and return-on-investment or input-output analysis involve lesser degrees of uncertainty or sometimes, although rarely, complete certainty. In these decision situations, deterministic models may be employed. In some situations the decision maker does not know the variables or does not understand their characteristics; deep sea mining or outer-space flight may come to mind. Here, heuristic models can be used.

The criteria for algorithm selection, once the quantitative model has been decided upon, rest with the methodological efficiency of the algorithm but also, at times, with the decision maker's training and preference for one method over another. In general, it may be said that the algorithm must lend itself to computerization, for the time is quickly disappearing when the managerial decision process is not based on the computer in some form. Furthermore, the algorithm must result in optimization. As an illustration of these three criteria of algorithm selection, the familiar, widely used linear programming quantitative model may be mentioned. This deterministic decision tool is used in manufacturing operations where the input factors (see Figure 1.1) are transformed into a product, subject to well-known and well-quantified constraints.

Student Activity
1. List the various tools available for quantitative decision making.
2. What are the bases on which you will select a particular model for decision making?

The Decision

The decision is the end product of a sequence of mental activities as illustrated in the preceding pages. To make a decision does not necessarily mean that it gets carried out. In order to accomplish that, numerous managerial action programs are necessary. They represent the physical extension of the decision making process. This book stops at the point when the decision is rendered. The action programs, the physical component, cannot be discussed because they must be specifically designed for each situation. A good decision maker, however, will try to place the seeds for proper implementation into the decision.


Summary
Decision making occurs in all fields of human endeavor. Human decision making is an intellectual process involving both conscious and subconscious efforts, comprising three stages: cognition, assembly and testing. Inertia and impatience are the two most common flaws in decision making. Dishonesty, inertia, impatience, acquiescence, gambling, semantics and falsification are some of the most prevalent contaminants of good decision making. The conceptual model represents the logic that underlies a decision. A decision involves choice among several alternatives. In the most basic sense a decision always involves the answer to the question "to do or not to do?" Not to do (inaction) is itself a decision. To make a decision does not necessarily mean that it gets carried out.

Keywords
Cognition stage: The starting point for the mind, which searches for facts in the environment in order to make a decision.
Assembly stage: The assembly of the recognized facts obtained in the first stage into usable information systems.
Convergent: The conventional grouping of data into a system.
Divergent: Unusual or new ways of relating the data.
Testing stage: The point at which the decision maker evaluates the cognites in terms of their relevancy to a given problem.
Impatience: Skipping the process of analysis and reaching a conclusion without any backup reasoning.
Acquiescence: Doing everything as asked without questioning the logic.
Gambling: Trying to fit decision variable configurations in the process of decision making by trial and error; assuming that the problem cannot be solved.
Semantics: Calling the decision making situation totally absurd, very tough, or too time-consuming.


Falsification: Not being able to solve the problem; illegible recording of calculations and final outcomes.


Review Questions
1. How does a manager use mathematics and statistics in decision making?
2. Explain, with the help of an illustration, the decision making system.
3. How do environmental factors play a role in decision making?
4. Draw a step-by-step decision making algorithm, taking up an illustration of any business decision.

Further Readings
E.R. Tufte, The Visual Display of Quantitative Information, Graphics Press
Anderson, D.R., D.J. Sweeney, and T.A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company
R.S. Bhardwaj, Business Mathematics, Excel Books

Unit 2 Theory of Sets

Unit Structure
Introduction
The Concept of a Set
Notations
Representation of a Set
Some Basic Definitions
Theorem on Subsets
Venn Diagram
Set Operations
Laws of Union of Sets
Laws of Intersection of Sets
Law of Complement of a Set
Theorem (On Symmetric Difference)
De-Morgan's Laws
Applications of Venn Diagrams
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Define a set
Represent a set in different notations
Identify different types of sets
Use Venn diagrams to represent and manipulate sets
Apply various set operations
Use set theory to solve practical problems

Introduction
The theory of sets is a fundamental notion. It has contributed greatly to the study of different branches of mathematics and has great importance in modern mathematics. The beginnings of set theory were laid down by the German mathematician Georg Cantor (1845-1918).

Student Activity
How many elements are there in each of the following sets?
a. The set of the presidents of free India.
b. The set of all numbers which are both odd and even.
c. The set of all numbers which are prime and even.
d. The set of all numbers greater than 1.
e. The set of all MBAs who are under-graduates.
The Concept of a Set
A collection of well-defined objects is called a set. The 'objects' are called elements. The elements are definite and distinct. By the term 'well-defined', we mean that it must be possible to tell beyond doubt whether or not a given object belongs to the collection (set) under consideration. The term 'distinct' means that no element should be repeated. Following are some examples:
(i) The set of vowels in the English alphabet = {a, e, i, o, u}

Punjab Technical University

Quantitative Techniques

(ii) The set of all straight lines in a plane.
(iii) The set of odd numbers between 3 and 19, i.e., {5, 7, 9, 11, 13, 15, 17}


Notations
Sets are usually denoted by capital letters A, B, C, D, ... and their elements are denoted by the corresponding small letters a, b, c, d, .... Note that it is not necessary that the elements of a set A be denoted by a. If a is an element of set A, then this fact is denoted by the symbol a ∈ A and read as "a belongs to A". If a is not an element of A, then we write a ∉ A and read it as "a does not belong to A".

Representation of a Set
A set can be represented in the following two ways:
(i) Tabular form (or Roster method): The tabular form is also called the listing method. If all the elements of a set are kept within { } and the elements are separated from one another by commas (,), then this form of the set is called tabular form. E.g., {1, 2, 3, ...} = the set of natural numbers.
(ii) Set builder form (rule method): Set builder form is also called the rule method. In this form, we specify the defining property of the elements of the set. E.g., if A is the set of all prime numbers, we use a letter, usually x, to represent the elements and we write A = {x : x is a prime number}.
Note: When the number of elements in a set is small, we use the listing method, but when the number of elements in the set is large or infinite, we use the set builder form.
Example 1
Write the set of the letters in the word CALCUTTA.
Solution
Since no element enters a set more than once, A, C and T will occur only once. Hence the required set is {C, A, L, U, T}. Answer
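Both forms of representation have a direct analogue in programming languages with a built-in set type. As an illustrative sketch (Python is used here purely as an assumed example language, not part of the original text), roster form corresponds to a set literal and set-builder form to a set comprehension; note how duplicates collapse automatically, exactly as in Example 1:

```python
# Roster (tabular) form: list the elements explicitly between braces.
vowels = {'a', 'e', 'i', 'o', 'u'}

# Set-builder (rule) form: state the defining property of the elements.
# Mathematically: {x : 0 < x < 20 and x is odd}
odds = {x for x in range(1, 20) if x % 2 == 1}

# A set never repeats an element, so the letters of CALCUTTA give
# the same set as in Example 1.
letters = set("CALCUTTA")
print(letters == {'C', 'A', 'L', 'U', 'T'})  # True
```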

Example 2
Write the following sets in tabular form:
(i) A = {x | 0 < x < 5, x is an integer}
(ii) A = {x | 3x² - 12x = 0, x is a natural number}
Solution
(i) The integers satisfying the inequality 0 < x < 5 are 1, 2, 3, 4.
∴ A = {1, 2, 3, 4}
(ii) 3x² - 12x = 0
⇒ 3x(x - 4) = 0 ⇒ x = 0 or x = 4
Since x is a natural number, x = 4.
Hence A = {4}

Student Activity
Write the following sets in tabular form:
i. X = {t | 0 < t < 5, t is an integer}
ii. X = {t | 3t² - 12t < 18, t is a natural number}

Some Basic Definitions
1. Singleton set: A set containing only one element is called a singleton set. E.g., the sets {0}, {x}, {φ} each consist of only one element.
2. Empty set: The set having no element is called the empty set or the null set or the void set. It is denoted by φ or { }. E.g.: A = {x : x² + 1 = 0, x is a real number}. A is a null set, since there is no real number satisfying the equation x² + 1 = 0.
3. Subset: A set A is called a subset of B if every element of the set A is also an element of the set B. We write this as A ⊆ B, read as 'A is a subset of B' or 'A is contained in B'. In symbols, if A and B are two sets such that x ∈ A ⇒ x ∈ B, then A is a subset of B.
Note: For every set A, A ⊆ A, i.e., A is itself a subset of A. Also note that the empty set φ is always a subset of every set. If A is not a subset of B, we write A ⊄ B.
E.g.: If A = {a, b, c} and B = {a, b, c, e}, then clearly A ⊆ B.
4. Proper subset: If A is a subset of B and there is at least one element in B which is not in A, then A is called a proper subset of B, denoted by A ⊂ B and read as 'A is a proper subset of B'. E.g.: If A = {1, 2, 3} and B = {1, 2, 3, 4}, then A is a proper subset of B.
We can also say B is a superset of A, and write it as B ⊃ A.

5. Comparability: Two sets A and B are said to be comparable if either of the following conditions is satisfied: A ⊆ B or B ⊆ A.
E.g.:
I. If A = {1, 2, 3} and B = {1, 2, 3, 4, 5}, then A ⊆ B, so A and B are comparable.
II. If A = {a, b, c} and B = {a}, then B ⊆ A, and hence A and B are comparable.
6. Equal sets: Two sets are said to be equal if they contain the same elements, i.e., if every element of A is an element of B, and every element of B is also an element of A. It is denoted by A = B. Two sets are equal if and only if A ⊆ B and B ⊆ A.
E.g.: If A = {1, 2}, B = {1, 2} and C = {x : x² - 3x + 2 = 0}, then A = B = C.

7. Equivalent sets: Two sets are called equivalent sets if and only if there is a one-to-one correspondence between their elements. If A = {a, b, c} and B = {1, 2, 3}, then the correspondence between the elements of A and of B is one-to-one, so A is equivalent to B; we write this as A ~ B.
Note: If two sets are equal, they are equivalent, but two equivalent sets are not necessarily equal.
8. Finite set: A set is said to be a finite set if, in counting its different elements, the counting process comes to an end. Thus a set with a finite number of elements is a finite set.
E.g.:
I. The set of vowels = {a, e, i, o, u}.
II. The set of people living in Delhi.
9. Infinite set: A set which is neither a null set nor a finite set is called an infinite set. The counting process can never come to an end in counting the elements of such a set. E.g.: The set of natural numbers = {1, 2, 3, 4, ...}.
10. Set of sets: If the elements of a set are themselves sets, then the set is called a set of sets. E.g.: {{a}, {a, b}, {a, b, e}}.
11. Power set: The set of all possible subsets of a given set A is called the power set of A, denoted by P(A). If the number of elements of a set is n, then the number of elements in its power set is 2ⁿ.
E.g.: If A = {a, b, c}, then its subsets are φ, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}.
∴ P(A) = {φ, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}
12. Universal set: A set which contains all sets under consideration as subsets is called the universal set. It is denoted by U.
Note: Different universal sets are used in different contexts; the choice of the universal set is not unique. E.g.: In the study of different sets of letters of the English alphabet, the universal set is the set of all letters of the English alphabet.
13. Index set and Indexed set: Let Aᵣ be a non-empty set for each r in a set Λ. In this case the sets A₁, A₂, A₃, ..., Aₙ are called indexed sets and the set Λ = {1, 2, 3, ..., n} is called the index set. Here the suffix r ∈ Λ of Aᵣ is called the index. Such a family of sets is denoted by {Aᵣ}, r ∈ Λ.
E.g.: Let A₁ = {1, 2, 3}, A₂ = {3, 4, 5, 6}, A₃ = {6, 7, 8}, A₄ = {1, 4, 5, 12}, A₅ = {a, b, c, d, e}. Here, for each α in Λ = {1, 2, 3, 4, 5}, there is a non-empty set Aα. Hence Λ is called the index set and the sets A₁, A₂, A₃, A₄, A₅ are called indexed sets.

Student Activity
1. If n(A) = m, then what is n(P(P(A)))?
2. If A, B and C are three sets such that:
a. A ⊂ B,
b. B ⊂ C, and
c. A ∩ B = φ,
then what can you say about n(C)?
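An indexed family of sets is, in programming terms, simply a mapping from the index set Λ to sets. A small sketch of the example above (Python; the dictionary name is a hypothetical choice):

```python
# One non-empty set A_r for each index r in the index set Lambda = {1, 2, 3, 4, 5}.
family = {
    1: {1, 2, 3},
    2: {3, 4, 5, 6},
    3: {6, 7, 8},
    4: {1, 4, 5, 12},
    5: {'a', 'b', 'c', 'd', 'e'},
}

index_set = set(family)   # the keys form the index set
print(index_set)          # {1, 2, 3, 4, 5}
print(all(len(A_r) > 0 for A_r in family.values()))  # every A_r is non-empty: True
```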

Theorem on Subsets
Theorem 1: The empty set is a subset of every set.
Proof
Let A be any set and φ the empty set. Since φ has no elements, there is no element of φ that fails to belong to A. Thus φ ⊆ A. Proved.
Theorem 2: Every set is a subset of itself.
Proof
Let A be any set. Every element of A is an element of the set A. By the definition of a subset, the set A is therefore a subset of the set A. Hence A ⊆ A. Proved.
Theorem 3: If A ⊆ B and B ⊆ A, then A = B.
Proof
A ⊆ B ∴ x ∈ A ⇒ x ∈ B ... (1)
Again B ⊆ A ∴ x ∈ B ⇒ x ∈ A ... (2)
From (1) and (2), x ∈ A ⇔ x ∈ B ⇒ A = B. Proved.
Theorem 4: If A ⊆ B and B ⊆ C, then A ⊆ C.
Proof
A ⊆ B ∴ x ∈ A ⇒ x ∈ B ... (1)
Again B ⊆ C ∴ x ∈ B ⇒ x ∈ C ... (2)
From (1) and (2), it is clear that x ∈ A ⇒ x ∈ C ⇒ A ⊆ C. Proved.
Theorem 5: If a set A contains n elements, then P(A) contains 2ⁿ elements.
Proof
Let a₁, a₂, a₃, ..., aₙ be the n elements of the set A.
The number of subsets of A having one element each, of the type {a₁}, {a₂}, ..., {aₙ}, is ⁿC₁.
The number of subsets of A having two elements each, of the type {a₁, a₂}, {a₁, a₃}, ..., is ⁿC₂.
The number of subsets of A having three elements each is ⁿC₃, and so on.
The number of subsets of A having all n elements is ⁿCₙ.
Also, the null set φ is a subset of A; it is counted as ⁿC₀.
Hence the total number of subsets of A is
n(P(A)) = ⁿC₀ + ⁿC₁ + ⁿC₂ + ... + ⁿCₙ = (1 + 1)ⁿ = 2ⁿ [by the Binomial theorem]
Thus, the number of subsets of a set with n elements is 2ⁿ.
Example 3
Are the sets φ, {φ}, {0} different?
Solution
φ is an empty set, i.e., it contains no element. The set {φ} is a singleton set, for it contains the element φ. The set {0} is a singleton set, for it contains the element 0. Hence all three sets are different.
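Theorem 5 can be spot-checked by enumeration. The sketch below (Python, an assumed illustration) lists the subsets of each size with itertools.combinations, mirroring the ⁿC₀ + ⁿC₁ + ... + ⁿCₙ count in the proof:

```python
from itertools import combinations

def power_set(s):
    """Return all subsets of s: nC0 of size 0, nC1 of size 1, ..., nCn of size n."""
    elems = list(s)
    return [set(c) for r in range(len(elems) + 1)
                   for c in combinations(elems, r)]

A = {'a', 'b', 'c'}
P_A = power_set(A)
print(len(P_A))      # 2**3 = 8, as Theorem 5 predicts
print(set() in P_A)  # True: the empty set is a subset of every set (Theorem 1)
print(A in P_A)      # True: every set is a subset of itself (Theorem 2)
```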

Venn Diagram
Sometimes, the relationship between sets is easily expressed by means of diagrams called Venn diagrams. Euler was the first mathematician to use circles to represent sets. The universal set U is always shown by a rectangle. Venn diagrams represent sets under the following different conditions (figures omitted):
1. Universal set U = {1, 2, 3, ..., 25}; subset A = {1, 2, 3, 10, 14}.
2. Universal set U = {1, 2, 3, ..., 25}; A = {1, 2, 3, 10, 14}, B = {2, 3}. Clearly B ⊂ A.
3. Universal set U = {1, 2, 3, ..., 25}; A = {1, 2, 3}, B = {4, 5, 6}. The sets have no common elements.
4. Universal set U = {1, 2, 3, ..., 25}; A = {3, 4, 6, 7}, B = {4, 6, 10, 12}. The elements 4, 6 of the sets A and B are common.

Student Activity
1. If X is the set of all the letters in ABRACADABRA, what is the total number of proper subsets of X?
2. Enumerate the power set of the set A, where A = {1, 2, 3, 10, 14}.

Set Operations
1. Union of Sets: The union or join of two sets A and B, written as A ∪ B (read as "A cup B"), is the set of all elements which are either in A, or in B, or in both.
Thus A ∪ B = {x : x ∈ A or x ∈ B}. Here 'or' means that x is in at least one of A or B, and may lie in both. The common elements of A and B are taken only once in A ∪ B. In a Venn diagram, A ∪ B is the shaded area covering both circles. Clearly A and B are each a subset of A ∪ B.
E.g.: If A = {1, 2, 3, 4, 5} and B = {2, 4, 5, 6, 11, 12}, then A ∪ B = {1, 2, 3, 4, 5, 6, 11, 12}.
Remarks:
i. When B ⊆ A, then A ∪ B = A.
ii. A and B are both subsets of A ∪ B, i.e., A ⊆ (A ∪ B) and B ⊆ (A ∪ B).
iii. A ∪ A = A.
iv. If n₁ is the number of elements in A and n₂ the number of elements in B, then the number of elements of A ∪ B cannot exceed n₁ + n₂, for the elements common to A and B are counted only once in A ∪ B.

2. Intersection of Sets: The intersection or meet of two sets A and B, written as A ∩ B (read as "A cap B"), is the set of all elements that belong to both A and B. Thus A ∩ B = {x : x ∈ A and x ∈ B}. In a Venn diagram, A ∩ B is the shaded area common to both circles. Clearly A ∩ B is a subset of both A and B. Also A ∩ A = A.
E.g.: If A = {1, 2, 3, 4, 5} and B = {2, 4, 5, 6, 11, 12}, then A ∩ B = {2, 4, 5}.
3. Disjoint Sets: Two sets A and B are said to be disjoint if they have no element in common, i.e., A ∩ B = φ.
E.g.: The set of all even integers and the set of all odd integers are disjoint sets.
Remarks:
i. When A and B are two disjoint sets, then A ∩ B = φ.
ii. Each of the sets A and B contains A ∩ B as a subset, i.e., (A ∩ B) ⊆ A and (A ∩ B) ⊆ B.

4. Difference of two sets: The difference of two sets A and B, denoted by A - B (read as "A minus B"), is the set of all elements of A which are not in B.
Thus A - B = {x : x ∈ A, x ∉ B}. Similarly B - A = {x : x ∈ B, x ∉ A}. In a Venn diagram, the sets A - B and B - A are the two non-overlapping shaded regions.
E.g.: If A = {a, b, c, d, e} and B = {d, e, p, q}, then A - B = {a, b, c}.

5. Complement of a set: The complement of a set is defined as another set consisting of all elements of the universal set which are not elements of the original set.
The complement of the set A for the universal set U is generally denoted by A'. Thus
A' = {x : x ∉ A}
or A' = {x : x ∈ U but x ∉ A}, i.e., A' = U - A.
E.g.:
i. If U = {1, 2, 3, 4, 5} and A = {1, 2, 4}, then A' = {3, 5}.
ii. If U is the set of all letters of the English alphabet and A the set of vowels, then A' is the set of letters of the English alphabet other than the vowels.
6. Symmetric difference: If A and B are two sets, then the set (A - B) ∪ (B - A) is called the symmetric difference of A and B, and is denoted by A Δ B.
Thus, the symmetric difference of the sets A and B is the set of all elements of A and B which are not common to both A and B:
A Δ B = {x : (x ∈ A and x ∉ B) or (x ∈ B and x ∉ A)}
E.g.: If A = {1, 2, 3} and B = {2, 3, 4, 5}, then
A - B = {1, 2, 3} - {2, 3, 4, 5} = {1}
B - A = {2, 3, 4, 5} - {1, 2, 3} = {4, 5}
∴ A Δ B = (A - B) ∪ (B - A) = {1} ∪ {4, 5} = {1, 4, 5}
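Every operation in this section has a direct counterpart in Python's built-in set type, which makes it easy to check the worked examples (the sets below are the ones used in the text; the universal set U is an assumption made for the complement):

```python
A = {1, 2, 3, 4, 5}
B = {2, 4, 5, 6, 11, 12}
U = set(range(1, 26))   # assumed universal set, as in the Venn diagram section

print(A | B)   # union: {1, 2, 3, 4, 5, 6, 11, 12}
print(A & B)   # intersection: {2, 4, 5}
print(A - B)   # difference: {1, 3}
print(A ^ B)   # symmetric difference: {1, 3, 6, 11, 12}
print(U - A)   # complement A' = U - A
```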

Student Activity
1. Write a set expression for each of the Venn diagrams (i)-(v) shown in the figure (diagrams omitted).
2. If A = {1, 2, 3}, A ∪ B = {1, 5, 2, 3, 6, 7, 8} and B ∩ C = {1, 6, 8}, can you determine the set C?
3. As a human resource manager of your company you deal with marketing executives and accountants. What inferences can you deduce from scenarios (i)-(iv) shown in the figure (diagrams omitted), where A is the set of accountants and M that of the marketing executives?

Laws of Union of Sets
If A, B, C are any three sets, φ the empty set and U the universal set, then their union obeys the following laws:
1. Idempotent law: A ∪ A = A
2. Commutative law: A ∪ B = B ∪ A
3. Associative law: (A ∪ B) ∪ C = A ∪ (B ∪ C)
4. Identity law: (a) A ∪ φ = A, (b) A ∪ U = U

Proof
1. Let x ∈ A ∪ A ⇒ x ∈ A or x ∈ A ⇒ x ∈ A, i.e., A ∪ A ⊆ A ... (1)
Let y ∈ A. Then y ∈ A ⇒ y ∈ A or y ∈ A ⇒ y ∈ A ∪ A, i.e., A ⊆ A ∪ A ... (2)
From (1) and (2), we have A ∪ A = A.
2. To prove A ∪ B = B ∪ A:
Let x ∈ A ∪ B ⇒ x ∈ A or x ∈ B ⇒ x ∈ B or x ∈ A ⇒ x ∈ B ∪ A, i.e., A ∪ B ⊆ B ∪ A ... (1)
Let y ∈ B ∪ A ⇒ y ∈ B or y ∈ A ⇒ y ∈ A or y ∈ B ⇒ y ∈ A ∪ B, i.e., B ∪ A ⊆ A ∪ B ... (2)
From (1) and (2), we have A ∪ B = B ∪ A.
3. To prove (A ∪ B) ∪ C = A ∪ (B ∪ C):
Let x ∈ (A ∪ B) ∪ C
⇒ x ∈ (A ∪ B) or x ∈ C
⇒ (x ∈ A or x ∈ B) or x ∈ C
⇒ x ∈ A or (x ∈ B or x ∈ C)
⇒ x ∈ A or (x ∈ B ∪ C)
i.e., x ∈ A ∪ (B ∪ C), so (A ∪ B) ∪ C ⊆ A ∪ (B ∪ C) ... (1)
Let y ∈ A ∪ (B ∪ C)
⇒ y ∈ A or (y ∈ B ∪ C)
⇒ y ∈ A or (y ∈ B or y ∈ C)
⇒ (y ∈ A or y ∈ B) or y ∈ C
⇒ (y ∈ A ∪ B) or y ∈ C
⇒ y ∈ (A ∪ B) ∪ C
i.e., A ∪ (B ∪ C) ⊆ (A ∪ B) ∪ C ... (2)
From (1) and (2), we have (A ∪ B) ∪ C = A ∪ (B ∪ C).
4. To prove A ∪ φ = A and A ∪ U = U:
Let x ∈ A ∪ φ ⇒ x ∈ A or x ∈ φ ⇒ x ∈ A, i.e., A ∪ φ ⊆ A ... (1)
But A ⊆ A ∪ φ ... (2)
From (1) and (2), we have A ∪ φ = A.
Again let y ∈ A ∪ U ⇒ y ∈ A or y ∈ U ⇒ y ∈ U [because A ⊆ U], i.e., A ∪ U ⊆ U ... (1)
But U ⊆ A ∪ U ... (2)
From (1) and (2), we have A ∪ U = U.

Laws of Intersection of Sets
If A, B, C are any three sets, then their intersection obeys the following laws:
1. Idempotent law: A ∩ A = A
2. Commutative law: A ∩ B = B ∩ A
3. Associative law: (A ∩ B) ∩ C = A ∩ (B ∩ C)
4. Identity law: A ∩ U = A, A ∩ φ = φ
5. Distributive law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
   A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

Proof
1. To prove A ∩ A = A:
Let x ∈ A ∩ A ⇒ x ∈ A and x ∈ A ⇒ x ∈ A, i.e., A ∩ A ⊆ A ... (1)
Let y ∈ A ⇒ y ∈ A and y ∈ A ⇒ y ∈ A ∩ A, i.e., A ⊆ A ∩ A ... (2)
From (1) and (2), we have A ∩ A = A.
2. To prove A ∩ B = B ∩ A:
Let x ∈ A ∩ B ⇒ x ∈ A and x ∈ B ⇒ x ∈ B and x ∈ A ⇒ x ∈ B ∩ A, i.e., A ∩ B ⊆ B ∩ A ... (1)
Let y ∈ B ∩ A ⇒ y ∈ B and y ∈ A ⇒ y ∈ A and y ∈ B ⇒ y ∈ A ∩ B, i.e., B ∩ A ⊆ A ∩ B ... (2)
From (1) and (2), we have A ∩ B = B ∩ A.
3. To prove (A ∩ B) ∩ C = A ∩ (B ∩ C):
Let x ∈ (A ∩ B) ∩ C
⇒ x ∈ (A ∩ B) and x ∈ C
⇒ (x ∈ A and x ∈ B) and x ∈ C
⇒ x ∈ A and (x ∈ B and x ∈ C)
⇒ x ∈ A and (x ∈ B ∩ C)
⇒ x ∈ A ∩ (B ∩ C)
i.e., (A ∩ B) ∩ C ⊆ A ∩ (B ∩ C) ... (1)
Let y ∈ A ∩ (B ∩ C)
⇒ y ∈ A and y ∈ (B ∩ C)
⇒ y ∈ A and (y ∈ B and y ∈ C)
⇒ (y ∈ A and y ∈ B) and y ∈ C
⇒ y ∈ (A ∩ B) ∩ C
i.e., A ∩ (B ∩ C) ⊆ (A ∩ B) ∩ C ... (2)
From (1) and (2), we have (A ∩ B) ∩ C = A ∩ (B ∩ C).
4. To prove A ∩ U = A and A ∩ φ = φ:
Let x ∈ A ∩ U ⇒ x ∈ A and x ∈ U ⇒ x ∈ A, i.e., A ∩ U ⊆ A ... (1)
Again, let y ∈ A ⇒ y ∈ A and y ∈ U [because A ⊆ U] ⇒ y ∈ A ∩ U, i.e., A ⊆ A ∩ U ... (2)
From (1) and (2), we have A ∩ U = A.
Again, to prove A ∩ φ = φ: the set A ∩ φ is the set of all those elements which are common to both A and the empty set φ. But φ contains no elements. Therefore the set A ∩ φ contains no elements. Thus A ∩ φ = φ.
5. (i) To prove A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C):
Let x ∈ A ∪ (B ∩ C)
⇒ x ∈ A or (x ∈ B ∩ C)
⇒ x ∈ A or (x ∈ B and x ∈ C)
⇒ (x ∈ A or x ∈ B) and (x ∈ A or x ∈ C)
⇒ (x ∈ A ∪ B) and (x ∈ A ∪ C)
⇒ x ∈ (A ∪ B) ∩ (A ∪ C)
i.e., A ∪ (B ∩ C) ⊆ (A ∪ B) ∩ (A ∪ C) ... (1)
Again, let y ∈ (A ∪ B) ∩ (A ∪ C)
⇒ (y ∈ A ∪ B) and (y ∈ A ∪ C)
⇒ (y ∈ A or y ∈ B) and (y ∈ A or y ∈ C)
⇒ y ∈ A or (y ∈ B and y ∈ C)
⇒ y ∈ A or (y ∈ B ∩ C)
⇒ y ∈ A ∪ (B ∩ C)
i.e., (A ∪ B) ∩ (A ∪ C) ⊆ A ∪ (B ∩ C) ... (2)
From (1) and (2), we have A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
(ii) To prove A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C):
Let x ∈ A ∩ (B ∪ C)
⇒ x ∈ A and (x ∈ B ∪ C)
⇒ x ∈ A and (x ∈ B or x ∈ C)
⇒ (x ∈ A and x ∈ B) or (x ∈ A and x ∈ C)
⇒ (x ∈ A ∩ B) or (x ∈ A ∩ C)
⇒ x ∈ (A ∩ B) ∪ (A ∩ C)
i.e., A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C) ... (1)
Similarly, let y ∈ (A ∩ B) ∪ (A ∩ C)
⇒ y ∈ (A ∩ B) or y ∈ (A ∩ C)
⇒ (y ∈ A and y ∈ B) or (y ∈ A and y ∈ C)
⇒ y ∈ A and (y ∈ B or y ∈ C)
⇒ y ∈ A and (y ∈ B ∪ C)
⇒ y ∈ A ∩ (B ∪ C)
i.e., (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C) ... (2)
From (1) and (2), we have A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

Student Activity
1. Prove that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
2. Simplify the following expressions:
(i) A ∪ (A ∩ B)
(ii) A ∪ A ∩ A
(iii) U ∪ φ
(iv) U ∩ φ
(v) U ∪ U
(vi) U ∩ U
(vii) A ∪ φ
(viii) A ∩ φ
(ix) φ ∩ φ
(x) φ ∪ φ
3. Obtain the following sets, where A = {1, 2, 3, 4, 5, 6} and B = {1, 3, 5}:
(xi) A ∪ B
(xii) A' ∪ B'
(xiii) A ∩ B
(xiv) A' ∩ B'
(xv) A Δ B
Law of Complement of a Set
If φ is the empty set, U the universal set and A any one of its subsets, then the complement of A, i.e., A', obeys the following laws:
1. A ∪ A' = U
2. A ∩ A' = φ
3. (A')' = A
Proof 1
To prove A ∪ A' = U:
Since every set is a subset of the universal set, A ∪ A' ⊆ U ... (1)
Again, let x ∈ U. Then x ∈ A or x ∉ A, i.e., x ∈ A or x ∈ A' ⇒ x ∈ (A ∪ A'), i.e., U ⊆ A ∪ A' ... (2)
From (1) and (2), we have A ∪ A' = U.
Proof 2
To prove A ∩ A' = φ:
Since the null set φ is a subset of every set, φ ⊆ A ∩ A' ... (1)
Again, let x ∈ A ∩ A' ⇒ x ∈ A and x ∈ A' ⇒ x ∈ A and x ∉ A ⇒ x ∈ φ, i.e., A ∩ A' ⊆ φ ... (2)
From (1) and (2), we have A ∩ A' = φ.
Proof 3
To prove (A')' = A:
Let x ∈ (A')'. Then x ∉ A' ⇒ x ∈ A, i.e., (A')' ⊆ A ... (1)
Again, let x ∈ A. Then x ∉ A' ⇒ x ∈ (A')', i.e., A ⊆ (A')' ... (2)
From (1) and (2), we have (A')' = A.

Student Activity
1. Prove that ((A')')' = A'.
2. Prove that A ∩ A' = φ.

Theorem (On Symmetric Difference)
If A and B are any two sets, then A Δ B = (A ∪ B) - (A ∩ B).
Proof
Let x ∈ A Δ B. Then x ∈ (A - B) ∪ (B - A)
⇔ (x ∈ A and x ∉ B) or (x ∈ B and x ∉ A)
⇔ [x ∈ A or (x ∈ B and x ∉ A)] and [x ∉ B or (x ∈ B and x ∉ A)]
⇔ [(x ∈ A or x ∈ B) and (x ∈ A or x ∉ A)] and [(x ∉ B or x ∈ B) and (x ∉ B or x ∉ A)]
⇔ [x ∈ A ∪ B and x ∈ U] and [x ∈ U and x ∉ A ∩ B]
⇔ x ∈ (A ∪ B) and x ∉ (A ∩ B)
⇔ x ∈ [(A ∪ B) - (A ∩ B)]
i.e., A Δ B = (A ∪ B) - (A ∩ B).
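The identity can be verified mechanically on a small example. In the sketch below (Python, an illustrative aside), ^ is the built-in symmetric-difference operator, so the definition and the theorem can be compared directly:

```python
A = {1, 2, 3}
B = {2, 3, 4, 5}

lhs = (A - B) | (B - A)   # definition: (A - B) union (B - A)
rhs = (A | B) - (A & B)   # theorem:    (A union B) minus (A intersection B)

print(lhs == rhs == (A ^ B))  # True: both agree with the built-in operator
print(lhs)                    # {1, 4, 5}
```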

De-Morgan's Laws
If A, B and C are three sets, then:
1. (A ∪ B)' = A' ∩ B'
2. (A ∩ B)' = A' ∪ B'
3. A - (B ∪ C) = (A - B) ∩ (A - C)
4. A - (B ∩ C) = (A - B) ∪ (A - C)
Proof
1. To prove (A ∪ B)' = A' ∩ B':
Let x ∈ (A ∪ B)'
⇒ x ∉ A ∪ B
⇒ x ∉ A and x ∉ B
⇒ x ∈ A' and x ∈ B'
⇒ x ∈ A' ∩ B'
i.e., (A ∪ B)' ⊆ A' ∩ B' ... (1)
Again, let y ∈ A' ∩ B'
⇒ y ∈ A' and y ∈ B'
⇒ y ∉ A and y ∉ B
⇒ y ∉ (A ∪ B)
⇒ y ∈ (A ∪ B)'
i.e., A' ∩ B' ⊆ (A ∪ B)' ... (2)
From (1) and (2), we have (A ∪ B)' = A' ∩ B'.
2. To prove (A ∩ B)' = A' ∪ B':
Let x ∈ (A ∩ B)'
⇒ x ∉ A ∩ B
⇒ x ∉ A or x ∉ B
⇒ x ∈ A' or x ∈ B'
⇒ x ∈ A' ∪ B'
i.e., (A ∩ B)' ⊆ A' ∪ B' ... (1)
Similarly, we can show A' ∪ B' ⊆ (A ∩ B)' ... (2)
From (1) and (2), we have (A ∩ B)' = A' ∪ B'.
3. To prove A - (B ∪ C) = (A - B) ∩ (A - C):
Let x ∈ A - (B ∪ C)
⇒ x ∈ A and x ∉ (B ∪ C)
⇒ x ∈ A and (x ∉ B and x ∉ C)
⇒ (x ∈ A and x ∉ B) and (x ∈ A and x ∉ C)
⇒ x ∈ (A - B) and x ∈ (A - C)
⇒ x ∈ (A - B) ∩ (A - C)
i.e., A - (B ∪ C) ⊆ (A - B) ∩ (A - C) ... (1)
Similarly, we can show (A - B) ∩ (A - C) ⊆ A - (B ∪ C) ... (2)
From (1) and (2), we have A - (B ∪ C) = (A - B) ∩ (A - C).

Student Activity
1. Prove that A - (B ∩ C) = (A - B) ∪ (A - C).
2. Use a Venn diagram to prove the above identity.
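All four laws can be checked on sample sets, with complements taken relative to an assumed universal set U; a minimal sketch (Python, sets chosen arbitrarily):

```python
U = set(range(1, 11))
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}

def comp(S):
    """Complement S' = U - S relative to the universal set U."""
    return U - S

assert comp(A | B) == comp(A) & comp(B)   # law 1: (A union B)' = A' intersect B'
assert comp(A & B) == comp(A) | comp(B)   # law 2: (A intersect B)' = A' union B'
assert A - (B | C) == (A - B) & (A - C)   # law 3
assert A - (B & C) == (A - B) | (A - C)   # law 4
print("De-Morgan's laws verified for the sample sets")
```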

Applications of Venn Diagrams
Some important results from Venn diagrams: if A and B are two non-empty intersecting sets, then
(1) n(A ∪ B) = n(A) + n(B) - n(A ∩ B)
(2) n(A ∪ B) = n(A - B) + n(A ∩ B) + n(B - A)
(3) n(A - B) + n(A ∩ B) = n(A)
(4) n(B - A) + n(A ∩ B) = n(B)
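With len standing in for n(·), the four results can be confirmed on any pair of intersecting sets; a quick numeric sketch (the sets are arbitrary illustrations):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}

assert len(A | B) == len(A) + len(B) - len(A & B)           # result (1)
assert len(A | B) == len(A - B) + len(A & B) + len(B - A)   # result (2)
assert len(A - B) + len(A & B) == len(A)                    # result (3)
assert len(B - A) + len(A & B) == len(B)                    # result (4)
print(len(A | B))   # 7 = 4 + 5 - 2
```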


Example 4
If A = {1, 2, 3, 4}, B = {2, 3, 5, 6} and C = {4, 5, 6, 7}, verify that:
i) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
ii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Solution
(i) B ∩ C = {2, 3, 5, 6} ∩ {4, 5, 6, 7} = {5, 6}
A ∪ (B ∩ C) = {1, 2, 3, 4} ∪ {5, 6} = {1, 2, 3, 4, 5, 6} ... (1)
A ∪ B = {1, 2, 3, 4} ∪ {2, 3, 5, 6} = {1, 2, 3, 4, 5, 6}
A ∪ C = {1, 2, 3, 4} ∪ {4, 5, 6, 7} = {1, 2, 3, 4, 5, 6, 7}
(A ∪ B) ∩ (A ∪ C) = {1, 2, 3, 4, 5, 6} ∩ {1, 2, 3, 4, 5, 6, 7} = {1, 2, 3, 4, 5, 6} ... (2)
From (1) and (2), we have A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
(ii) B ∪ C = {2, 3, 5, 6} ∪ {4, 5, 6, 7} = {2, 3, 4, 5, 6, 7}
A ∩ (B ∪ C) = {1, 2, 3, 4} ∩ {2, 3, 4, 5, 6, 7} = {2, 3, 4} ... (1)
Again A ∩ B = {1, 2, 3, 4} ∩ {2, 3, 5, 6} = {2, 3}
A ∩ C = {1, 2, 3, 4} ∩ {4, 5, 6, 7} = {4}
(A ∩ B) ∪ (A ∩ C) = {2, 3} ∪ {4} = {2, 3, 4} ... (2)
From (1) and (2), we observe A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Example 5
If C is the set of complex numbers and A = {x ∈ C : x⁴ - 1 = 0}, B = {x ∈ C : x³ - 1 = 0}, find A - B and A ∪ B.
Solution
We have x⁴ - 1 = 0 ⇒ (x² + 1)(x² - 1) = 0 ⇒ x = ±1, ±i
Since x ∈ C, take x = ±i.
∴ A = {-i, i}
Again x³ - 1 = 0 ⇒ (x - 1)(x² + x + 1) = 0
⇒ x = 1 or x = (-1 ± √(1 - 4))/2 = (-1 ± i√3)/2
Since x ∈ C, take x = (-1 ± i√3)/2.
∴ B = {(-1 + i√3)/2, (-1 - i√3)/2}
A - B = {-i, i} - {(-1 + i√3)/2, (-1 - i√3)/2} = {-i, i} Answer
A ∪ B = {-i, i} ∪ {(-1 + i√3)/2, (-1 - i√3)/2} = {-i, i, (-1 + i√3)/2, (-1 - i√3)/2} Answer

Example 6
Sets A and B have 3 and 6 elements respectively. What is the least number of elements in A ∪ B?
Solution
The number of elements in the set A ∪ B is least when A ⊆ B, for then all the 3 elements of A are already in B. Since B has 6 elements, the least number of elements is 6. This is clear from the fact that A ⊆ B ⇒ A ∪ B = B,
∴ n(A ∪ B) = n(B) = 6. Answer

Student Activity
1. Shade the area represented by the following pairs of expressions and verify that they are equal:
(i) (A ∪ B) ∩ C and (A ∩ C) ∪ (B ∩ C)
(ii) A ∪ A' and U
2. In a group of athletic teams in a certain institute, 21 are in the basketball team, 26 in the hockey team and 29 in the football team; 14 play hockey and basketball, 12 play football and basketball, 15 play hockey and football, and 8 play all the three games.

Example 7
In a group of athletic teams in a certain institute, 21 are in the basketball team, 26 in the hockey team and 29 in the football team. 14 play hockey and basketball, 12 play football and basketball, 15 play hockey and football, and 8 play all the three games.
(i) How many players are there in all?
(ii) How many play only football?
Solution
Given: number of basketball players n(B) = 21, number of hockey players n(H) = 26, number of football players n(F) = 29, n(H ∩ B) = 14, n(F ∩ B) = 12, n(F ∩ H) = 15 and n(F ∩ B ∩ H) = 8.
(i) The total number of players
= n(B ∪ H ∪ F)
= n(B) + n(H) + n(F) - n(B ∩ H) - n(H ∩ F) - n(B ∩ F) + n(F ∩ B ∩ H)
= 21 + 26 + 29 - 14 - 12 - 15 + 8 = 43
(ii) Number of players playing football and basketball but not hockey = 12 - 8 = 4.
Number of players playing football and hockey but not basketball = 15 - 8 = 7.
Number of players playing football only = n(F) - (7 + 4 + 8) = 29 - 19 = 10. Answer
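Example 7 uses the three-set extension of result (1): n(B ∪ H ∪ F) = n(B) + n(H) + n(F) - n(B ∩ H) - n(H ∩ F) - n(B ∩ F) + n(B ∩ H ∩ F). The arithmetic can be reproduced directly (the variable names below are illustrative choices):

```python
nB, nH, nF = 21, 26, 29      # basketball, hockey, football
nHB, nFB, nHF = 14, 12, 15   # pairwise overlaps
nAll = 8                     # play all three games

total = nB + nH + nF - nHB - nFB - nHF + nAll
print(total)   # 43 players in all

# Football only: subtract those who also play another game.
only_football = nF - ((nHF - nAll) + (nFB - nAll) + nAll)
print(only_football)   # 10
```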

Summary
The foundations of set theory were laid by the German mathematician Georg Cantor (1845-1918).
A collection of well-defined objects is called a set. The 'objects' are called elements. The elements are definite and distinct.
A set can be represented in two ways: tabular form (or Roster method) and set builder form.
A set containing only one element is called a singleton set. The set having no element is called the empty set or the null set or the void set.
A set A is called a subset of B if every element of the set A is also an element of the set B. If A is a subset of B and there is at least one element in B which is not in A, then A is called a proper subset of B.
Two sets are said to be equal if they contain the same elements. Two sets are called equivalent sets if and only if there is a one-to-one correspondence between their elements.
A set is said to be a finite set if, in counting its different elements, the counting process comes to an end.
The set of all possible subsets of a given set A is called the power set of A.
The difference of two sets A and B, denoted by A - B (read as "A minus B"), is the set of all elements of A which are not in B.
The complement of a set is defined as another set consisting of all elements of the universal set which are not elements of the original set.

Keywords
Tabular form (or Roster method): If all the elements of a set are kept within { } and the elements are separated from one another by commas (,), then this form of the set is called tabular form.
Set builder form (rule method): Set builder form is also called the rule method. In this form, we specify the defining property of the elements of the set.
Singleton set: A set containing only one element is called a singleton set. E.g., the sets {0}, {x}, {φ} each consist of only one element.
Empty set: The set having no element is called the empty set or the null set or the void set. It is denoted by φ or { }.
Subset: A set A is called a subset of B if every element of the set A is also an element of the set B. We write it as A ⊆ B.
Equal sets: Two sets are said to be equal if they contain the same elements.
Equivalent sets: Two sets are called equivalent sets if and only if there is a one-to-one correspondence between their elements.
Finite set: A set with a finite number of elements is a finite set.
Infinite set: A set which is neither a null set nor a finite set is called an infinite set.
Set of sets: If the elements of a set are themselves sets, then the set is called a set of sets.
Power set: The set of all possible subsets of a given set A is called the power set of A.
Universal set: A set which contains all sets under consideration as subsets is called the universal set. It is denoted by U.
Intersection of Sets: The intersection or meet of two sets A and B, written as A ∩ B (read as "A cap B"), is the set of all elements that belong to both A and B.
Complement of a set: The complement of a set is defined as another set consisting of all elements of the universal set which are not elements of the original set.

Review Questions
1. Is the set A = {x : x + 5 = 5} null?
2. Write down all the subsets of the set {1, 2, 3}.


3. How many subsets of the set of letters of the word ALLAHABAD will be formed?
4. Are the following sets equal?
(i) A = {x : x is a letter in the word WOLF}
(ii) B = {x : x is a letter in the word FOLLOW}
5. If A ⊆ B, B ⊆ C and C ⊆ A, show that B = A.
6. If A = {1, 2, 3, 4}, B = {2, 3, 4, 5} and C = {4, 5, 6, 7}, find A - (B - C).
7. If A = {1, 3, 6, 10, 15, 21} and B = {15, 3, 6}, find (A - B) ∩ (B - A).
8. If X = {1, 2, 3, 4, 5} and Y = {1, 3, 5, 7, 9}, find the values of X ∩ Y and (X - Y) ∪ (Y - X).
9. If A = {1, 2, 3, 4, 5} and B = {1, 3, 5, 7, 9}, find the symmetric difference of A and B.
10. If A = {a, b, c, d} and B = {e, f, c, d}, find A Δ B.
11. If A = A ∪ B, show that B = A ∩ B.
12. If A and B are two sets, find the value of A ∩ (A ∪ B).
13. If A, B are subsets of a set S, and A', B' are the complements of A and B respectively, prove that A ⊆ B ⇔ B' ⊆ A'.
14. Prove that for any two sets A and B, (A - B) ∪ (B - A) = (A ∪ B) - (A ∩ B).

Further Readings
P. N. Mishra, Quantitative Techniques for Managers, Excel Books
Anderson, D.R., D.J. Sweeney, and T.A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company
E. R. Tufte, The Visual Display of Quantitative Information, Graphics Press

Unit 3 Logarithms & Progressions

Unit Structure
Introduction
Logarithms
Laws of Operations
Compound Interest
Arithmetic Progressions
Geometric Progressions
Annuities, Loans and Mortgages
Methods of Investment Evaluation
Perpetual Annuities and Infinite Series
Depreciation
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Define logarithms
Prove and use logarithmic operations
Compute compound interest
Define and use arithmetic progressions
Define and use geometric progressions
Calculate annuities, loans and mortgages
Evaluate investments by different methods
Compute perpetual annuities using infinite series

Introduction
Mathematical tools have been developed from time to time to simplify mathematical computations. Logarithms are mathematical tools that convert multiplication, division, exponentiation and root operations into addition, subtraction, multiplication and division operations respectively.

Logarithms
When we have two numbers such as 4 and 16, which can be related to each other by the equation 4² = 16, we define the exponent 2 to be the logarithm of 16 to the base of 4, and write
log₄ 16 = 2

It is clear from this example that the logarithm is nothing but the power to which a base (4) must be raised to attain a particular number (16). In general, we may state that
y = bᵗ ⇔ t = log_b y
which indicates that the log of y to the base b (denoted by log_b y) is the power to which the base b must be raised in order to attain the value y.
This implies that any positive number y must possess a unique logarithm t to a base b > 1, such that the larger the y, the larger its logarithm. As y is necessarily positive in the exponential function y = bᵗ, a negative number or zero cannot have a logarithm.

The base of the logarithm, b > 1, does not have to be restricted to any particular number, but in actual log applications two numbers are widely chosen as bases: the number 10 and the number e. When 10 is the base, the logarithm is known as a common logarithm, symbolized by log_10 (or, if the context is clear, simply by log). With e as the base, on the other hand, the logarithm is referred to as a natural logarithm and is denoted either by log_e or by ln (for natural log). We may also use the symbol log (without the subscript e) if it is not ambiguous in the particular context.

Common logarithms, used frequently in computational work, are exemplified by the following:

log_10 1000 = 3    [because 10^3 = 1000]
log_10 100 = 2     [because 10^2 = 100]
log_10 10 = 1      [because 10^1 = 10]
log_10 1 = 0       [because 10^0 = 1]
log_10 0.1 = -1    [because 10^-1 = 0.1]
log_10 0.01 = -2   [because 10^-2 = 0.01]

There is a close relation between the set of numbers immediately to the left of the equals signs and the set of numbers immediately to the right. From these, it should be apparent that the common logarithm of a number between 10 and 100 must be between 1 and 2, and that the common logarithm of a number between 1 and 10 must be a positive fraction, etc. The exact logarithms can easily be obtained from a table of common logarithms or from electronic calculators with log capabilities.

In analytical work, however, natural logarithms prove vastly more convenient to use than common logarithms. Since, by the definition of a logarithm, we have the relationship

y = e^t  if and only if  t = log_e y (or t = ln y)

it is easy to see that the analytical convenience of e in exponential functions will automatically extend into the realm of logarithms with e as the base. The following examples will serve to illustrate natural logarithms:

ln e^3 = log_e e^3 = 3
ln e^2 = log_e e^2 = 2
ln e^1 = log_e e^1 = 1
ln 1 = log_e 1 = 0
ln (1/e) = log_e e^-1 = -1
The general principle emerging from these examples is that, given an expression e^n, where n is any real number, we can automatically read the exponent n as the natural log of e^n. In general, therefore, we have the result that ln e^n = n. Common log and natural log are convertible into each other; i.e., the base of a logarithm can be changed, just as the base of an exponential expression can. A pair of conversion formulas will be developed after we have studied the basic rules of logarithms.

Student Activity
1. If log_x 64 = 3, what is the value of x?
2. If log_3 x = 2 log_9 x, what is the value of x?

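The definitions above, and the base-conversion constants 0.4343 and 2.3026 derived later in this unit, can be checked numerically. A minimal sketch in Python (the helper name `log_base` is our own, not the text's):

```python
import math

# log_b(y) is the power to which b must be raised to give y
def log_base(y, b):
    return math.log(y) / math.log(b)

# The worked example: log_4 16 = 2, because 4^2 = 16
print(round(log_base(16, 4), 6))       # 2.0

# Common logs: log_10 1000 = 3, log_10 0.01 = -2
print(math.log10(1000))                # 3.0
print(math.log10(0.01))                # -2.0

# Conversion constants: log_10 N = 0.4343 ln N, and ln N = 2.3026 log_10 N
print(round(math.log10(math.e), 4))    # 0.4343
print(round(math.log(10), 4))          # 2.3026
```

Note that the two constants are reciprocals of each other, which is exactly what Rule V (inversion of log base) asserts.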
Laws of Operations

Logarithms are in the nature of exponents; therefore, they obey certain rules closely related to the rules of exponents. These can be of great help in simplifying mathematical operations. The first three rules are stated only in terms of natural log, but they are also valid when the symbol ln is replaced by log_b.

Rule I (log of a product): ln(uv) = ln u + ln v   (u, v > 0)

Example 1: ln(e^6 e^4) = ln e^6 + ln e^4 = 6 + 4 = 10

Example 2: ln(Ae^7) = ln A + ln e^7 = ln A + 7

Rule II (log of a quotient): ln(u/v) = ln u - ln v   (u, v > 0)

Example 3: ln(e^2/c) = ln e^2 - ln c = 2 - ln c

Example 4: ln(e^2/e^5) = ln e^2 - ln e^5 = 2 - 5 = -3


Rule III (log of a power): ln u^a = a ln u   (u > 0)

Example 5: ln A^3 = 3 ln A

These three rules are useful devices for simplifying the mathematical operations in certain types of problems. Rule I serves to convert, via logarithms, a multiplicative operation (uv) into an additive one (ln u + ln v); Rule II turns a division (u/v) into a subtraction (ln u - ln v); and Rule III enables us to reduce a power to a multiplicative constant. Moreover, these rules can be used in combination.

Example 6: ln(uv^a) = ln u + ln v^a = ln u + a ln v

You are warned, however, that when we have additive expressions to begin with, logarithms may be of no help at all. In particular, it should be remembered that

ln(u + v) is not equal to ln u + ln v

Let us now introduce two additional rules concerned with changes in the base of a logarithm.

Rule IV (conversion of log base): log_b u = (log_b e)(log_e u)

This rule, which resembles the chain rule in spirit (witness the "chain" b to e to u), enables us to derive the logarithm log_e u (to base e) from the logarithm log_b u (to base b), or vice versa. Rule IV can readily be generalized to

log_b u = (log_b c)(log_c u)

where c is some base other than b.

Rule V (inversion of log base): log_b e = 1/(log_e b)

This rule, which resembles the inverse-function rule of differentiation, enables us to obtain the log of b to the base e immediately upon being given the log of e to the base b, and vice versa. (This rule can also be generalized to the form log_b c = 1/(log_c b).)

Student Activity
1. What is the value of ln e^15?
2. Show that log_y x . log_z y . log_x z = 1.
3. What are the values of the following logarithms?
   (a) log_10 1000  (b) log_10 0.0001  (c) log_3 81  (d) log_5 3125
4. Evaluate the following:
   (a) ln e^2  (b) log_e e^-4  (c) ln(1/e^3)  (d) log_e(1/e^2)  (e) e^(ln 3)  (f) ln e^x - e^(ln x)
5. Evaluate the following by application of the rules of logarithms:
   (a) log_10 (100)^14  (b) log_10 (1/100)  (c) ln(3/B)  (d) ln Ae^2  (e) ln ABe^-4  (f) (log_4 e)(log_e 64)
6. Which of the following are valid?
   (a) ln u - 2 = ln(u/e^2)  (b) 3 + ln v = ln(e^3 v)  (c) ln u + ln v - ln w = ln(uv/w)  (d) ln 3 + ln 5 = ln 8
7. Prove that ln(u/v) = ln u - ln v.

From the last two rules, it is easy to derive the following pair of conversion formulas between common log and natural log:

log_10 N = (log_10 e)(log_e N) = 0.4343 log_e N
log_e N = (log_e 10)(log_10 N) = 2.3026 log_10 N

for N a positive real number. The first equals sign in each formula is easily justified by Rule IV. In the first formula, the value 0.4343 (the common log of 2.71828) can be found from a table of common logarithms or an electronic calculator; in the second, the value 2.3026 (the natural log of 10) is merely the reciprocal of 0.4343, so related because of Rule V.

Example 7

log_e 100 = 2.3026 (log_10 100) = 2.3026(2) = 4.6052

Conversely, we have

log_10 100 = 0.4343 (log_e 100) = 0.4343(4.6052) = 2

Student Activity
1. At what rate must Rs 5,000 be invested, compounded annually, so that the amount receivable after 20 years is Rs 20,000?
2. Solve the following equation: 20.5^x - 13.8 = 0

Compound Interest

If we are getting a return of 10% in one year, what is the return we are going to get in two years? 20%, right? But what about the return on the 10% that we receive at the end of the first year? If we also take into consideration the interest on this 10%, then we get a return of 10 + 1 = 11% in the second year, making for a total return of 21%. This is the same as the compound value calculation that you must have learned earlier:

Future Value = (Investment or Present Value) x (1 + Interest)^n

The compound values can be calculated on a yearly basis, a half-yearly basis, a monthly basis, a continuous basis, or any other basis you may so desire. This is because the formula takes into consideration a specific time period and the interest rate for that time period only. Calculating these values by hand would be very tedious and would require scientific calculators. To ease our job, tables have been developed which take care of the interest factor calculations, so that our formula can be written as:

Future Value = (Investment or Present Value) x (Future Value Interest Factor)

where the factor depends on n = number of time periods and i = interest rate.
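The future-value formula can be sketched in code (the function name is ours, not the text's); the example also shows how the same nominal rate compounds over half-yearly periods:

```python
def future_value(present_value, rate, periods):
    """Compound value: PV * (1 + i)^n."""
    return present_value * (1 + rate) ** periods

# Rs 100 at 10% per annum for 2 years -> Rs 121, i.e. a 21% total return
print(round(future_value(100, 0.10, 2), 2))     # 121.0

# Half-yearly compounding: 2 years at a 5% half-yearly rate is 4 periods
print(round(future_value(100, 0.05, 4), 2))     # 121.55
```

The second figure is slightly higher than the first because interest is credited, and itself earns interest, twice a year instead of once.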

Student Activity
1. ABC Transformers has produced 780 transformers in 1998 and is decreasing its annual production by 40 transformers per year because of competition from Class Transformers. Class Transformers has produced 100 transformers in 1998 and is increasing its annual production by 30 transformers per year. In which year will Class Transformers become the larger producer?
2. A plant measures 3 inches at present. What will be its height after 10 years if the height increases at a steady rate of 1.5 inches a year?

Arithmetic Progressions

A series of quantities forms an arithmetic progression if each subsequent term is obtained by adding to the previous term a constant amount, which is called the common difference. An arithmetic progression always has the form:

a, a + d, a + 2d, ..., a + (n - 1)d

where a is the first term, d is the common difference and n is the number of terms.

The first application would be a regular increase in salary, say by Rs X per year. Another application of arithmetic progressions is in the depreciation of machinery and other fixed assets. In the financial accounts of manufacturing firms, it is necessary to make a deduction from gross profits to allow for the decrease in value of the machinery due to wear and tear, and perhaps also


obsolescence. This reduces the claim of the owners on the profit, so that cash is made available for the eventual replacement of the machinery. The increase in cash is matched in the balance sheet by a decrease in the 'book value' of the machinery. If the depreciation is assessed as a fixed amount each year, then the model for the changing value of the machinery is an arithmetic progression. The common difference in this case is always negative.

Illustration 3.1

A machine is bought for Rs 90,000 and the depreciation on it is assessed at Rs 7,200 per year. What will be its book value at the end of eight years?

Solution

This method of depreciation is called the straight line method, and the annual depreciation is assessed by dividing the machine cost, less the scrap value, by the number of useful years it is in operation. It is important to note that in this type of problem the number of terms in the arithmetic progression is 1 more than the number of years. This is because the first term is the book value at the beginning of the first year and the final term is the book value at the end of the final year. Putting a = 90,000, d = -7,200 and n = 9, the ninth term works out as:

90,000 + 8 x (-7,200) = Rs 32,400

Sum of an Arithmetic Progression

Let S represent the sum of the n terms of an arithmetic progression:

S = a + (a + d) + (a + 2d) + ... + [a + (n - 2)d] + [a + (n - 1)d]

The easiest way to find a formula for S is to write the series again in reverse order:

S = [a + (n - 1)d] + [a + (n - 2)d] + ... + (a + 2d) + (a + d) + a

The two equations are then added together to form a third equation. On the left-hand side this gives 2S; on the right-hand side the sum of the first terms of the two equations is [2a + (n - 1)d], and the sum of each succeeding pair of terms is exactly the same. Since there are n pairs in all:

2S = n[2a + (n - 1)d]

hence, S = (n/2)[2a + (n - 1)d]

Student Activity
1. Find the annual salary increment of a man who retires after his 37th year of service, having earned Rs 61,700 in his fifth year and an average of Rs 1,00,900 over his whole career.
2. What is the sum of the first 100 even numbers?

Illustration 3.2

The salary of a company secretary is increased by a fixed increment each year. If his total earnings over nine years are Rs 23,40,000 and his salary in the final year is Rs 2,95,000, what was his salary in the sixth year?


Solution

In this example, n = 9, S = 23,40,000, [a + (n - 1)d] = a + 8d = 2,95,000, and it is required to find the sixth term of the series, which is a + 5d. Using the formula to express S in terms of the unknowns a and d:

23,40,000 = (9/2)(2a + 8d) = 9(a + 4d)
a + 4d = 2,60,000
a + 8d = 2,95,000
d = 8,750
a + 5d = 2,68,750

And so the salary in the sixth year was Rs 2,68,750 per annum.
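Both illustrations can be checked with a short sketch of the AP formulas. The helper names are ours, and the first term a = 2,25,000 used below is the value implied by a + 4d = 2,60,000 with d = 8,750:

```python
def ap_term(a, d, k):
    """k-th term of an AP (k = 1 gives the first term a)."""
    return a + (k - 1) * d

def ap_sum(a, d, n):
    """Sum of the first n terms: (n/2) * [2a + (n - 1)d]."""
    return n * (2 * a + (n - 1) * d) / 2

# Illustration 3.1: book value at the end of year 8 is the 9th term
print(ap_term(90000, -7200, 9))      # 32400

# Illustration 3.2: a = 2,25,000 and d = 8,750 reproduce all the given data
a, d = 225000, 8750
print(ap_sum(a, d, 9))               # 2340000.0  (total nine-year earnings)
print(ap_term(a, d, 9))              # 295000     (final year's salary)
print(ap_term(a, d, 6))              # 268750     (sixth year's salary)
```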

Geometric Progressions

A series of quantities forms a geometric progression if each term is obtained by multiplying the previous term by a constant, which is called the common ratio. A geometric progression always has the form:

a, ar, ar^2, ar^3, ..., ar^(n-1)

where a is the first term, r is the common ratio and n is the number of terms. Note that in the last term r has the power (n - 1) and not n. This is because in the first term r has the power 0, and therefore the total number of terms is n.

Illustration 3.3

A small water pump costs Rs 6,200 and is expected to last for 14 years and then have a scrap value of Rs 740. If depreciation is to be calculated as a fixed percentage of the current book value at the end of each year, what should the percentage be?

Solution

This illustration deals with the same situation as Illustration 3.1, but the depreciation is now to be calculated as a fixed proportion of the current book value instead of a fixed amount each year. This method of depreciation is called the Written Down Value (WDV) method. The model for the changing value of the water pump is in this case a geometric progression; the book value is reduced each year to r times its previous figure, where r is less than 1. The number of terms is 1 more than the number of years, and so n = 15, a = 6,200 and ar^14 = 740.

Student Activity
1. Rs 7,000 are invested at 5% per annum compound interest. What will be the amount after 20 years?
2. The number of rabbits becomes three times in 3 days. In how many days will it become 200 times the original number?

6200 r^14 = 740

r^14 = 740/6200 = 0.11935

This equation can easily be solved using a scientific calculator, or logarithmic tables can be used. As the log tables permit only four decimal places, there will be a small difference between the value calculated through them and the value calculated through the calculator; both values are acceptable for normal purposes. To solve the equation through log tables, take logarithms of both sides:

log r = (log 740 - log 6200)/14 = -0.9231599/14 = -0.0659400

This gives r = 0.8592, which means that depreciation is reducing the value. The depreciation percentage is found by subtracting the value found from 1 (which stands for 100%): the result is 0.1408, which converted back into percentage form is 14.08%, and which may be rounded to 14.1% or 14% depending on the accounting practice and the requirements. Note that no negative sign appears in this case, as depreciation itself means a reduction, so the negative sign is understood; in other cases we have to use it explicitly.

Sum of a Geometric Progression

If S represents the sum of the n terms of a geometric progression, the easiest way to find a formula for S is to write out the series and then multiply throughout by r:

S = a + ar + ar^2 + ar^3 + ... + ar^(n-2) + ar^(n-1)
Sr = ar + ar^2 + ar^3 + ... + ar^(n-1) + ar^n

The first equation is then subtracted from the second, and on the right-hand side almost all the terms cancel out to leave:

S(r - 1) = ar^n - a

hence, S = a(r^n - 1)/(r - 1)
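Illustration 3.3 and the GP sum formula can both be checked numerically. The helper name `gp_sum` is ours; note that direct computation gives r = 0.8591, a hair below the four-figure log-table value 0.8592 quoted in the text:

```python
def gp_sum(a, r, n):
    """Sum of n terms of a GP: a(r^n - 1)/(r - 1), for r != 1."""
    return a * (r ** n - 1) / (r - 1)

# Illustration 3.3: solve 6200 * r^14 = 740 for the WDV ratio r
r = (740 / 6200) ** (1 / 14)
print(round(r, 4))                 # 0.8591 (the text's log tables give 0.8592)
print(round((1 - r) * 100, 1))     # 14.1, the annual depreciation percentage

# GP sum sanity check: 1 + 2 + 4 + ... + 2^9 = 1023
print(gp_sum(1, 2, 10))            # 1023.0
```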

Student Activity
1. The sum of the first 3 terms of a GP is 8 times the sum of the next 3 terms. Find the 7th term of the GP.
2. A man invests Rs 5,000 at the beginning of each year, and compound interest is added at 5.5% per annum. How much money will he have accumulated after 15 years?

Illustration 3.5

A company sets aside Rs 5,00,000 each year out of its profits to form a reserve fund, which is invested at 4% per annum compound interest. What will be the value of the fund after ten years?

Solution

It is assumed that the first Rs 5,00,000 earns interest only from the end of the year in which it is set aside, which is for nine years. Successive annual instalments earn fewer years of interest, and the final sum for the tenth year earns no interest at all. So this is the sum of a geometric progression with the terms written in reverse order, so that the last term is written first. With a = Rs 5,00,000, r = 1.04 and n = 10, the formula for S may be applied to these values. It will be noted


that n is the number of years, since the model represents the number of instalments and not the beginnings and ends of years.

S = 5,00,000(1.04^10 - 1)/(1.04 - 1) = 1,25,00,000(1.04^10 - 1)

By logarithms, S works out to about 59,87,500; the exact value is Rs 60,03,052. The possible error due to the use of logarithms is thus around 15,500, which is not a small sum, and so the answer is best quoted as about Rs 60,00,000.

Annuities, Loans and Mortgages

The examples in the sections above involved either a single investment or a series of regular payments. We will now consider problems that involve both a single initial payment and a series of regular equal payments.

A payment of Rs a, which is to be made n years in the future, has a present value of Rs a/r^n, where r is the compound interest ratio. Since an annuity consists of a payment of Rs a at the end of each year for n years, the present value Rs P of the annuity is the sum of the present values of the individual payments:

P = a/r + a/r^2 + ... + a/r^n

This is a geometric progression with first term a/r, common ratio 1/r and n terms, and the sum is:

P = a(r^n - 1) / [r^n (r - 1)]

The only difference between this formula and the one which gives the sum of a GP is the extra factor r^n in the denominator. You can derive this formula yourself by substituting the first term and common ratio given above into the formula for the sum of a GP.

Illustration 3.9

Find the cost of an annuity of Rs 25,000 for 15 years, if compound interest is allowed at 3% per annum.

Solution

Using the above formula with the values a = 25,000, n = 15 and r = 1.03, four-figure logarithms give P = 2,97,800. The cost is the present value, and so is approximately Rs 2,98,000.

Finding the Annuity

When we need to find the annuity and are given the present value P, it is only necessary to rearrange the formula for the present value:

a = P r^n (r - 1) / (r^n - 1)
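The present-value formula can be checked against Illustration 3.9. The helper name is ours; direct computation gives Rs 2,98,448, consistent with the text's log-table figure of roughly Rs 2,98,000:

```python
def annuity_pv(a, r, n):
    """Present value of an annuity of Rs a per year for n years; r = 1 + i."""
    return a * (r ** n - 1) / (r ** n * (r - 1))

# Illustration 3.9: Rs 25,000 a year for 15 years at 3% per annum
pv = annuity_pv(25000, 1.03, 15)
print(round(pv))   # ~298448, i.e. approximately Rs 2,98,000
```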


Illustration 3.10

A firm borrows Rs 10,00,000 and repays it by three equal sums at the ends of the following three years. What is the amount of each repayment, if compound interest at 4.5% is allowed?

Solution

On a loan, equal repayments are nothing but a form of annuity, the amount of the loan being the present value of the annuity. So it is only necessary to apply the formula already obtained:

a = P r^n (r - 1)/(r^n - 1) = 10,00,000 x 1.045^3 x (1.045 - 1)/(1.045^3 - 1) = 3,63,800

The company needs to pay Rs 3,63,800 every year to repay the loan and the interest over the three years. Some people find it surprising that this method really does give the correct annual repayment within the terms of the question. While the proof by algebra ought to be sufficient, the number of years here is small enough to permit the luxury of a numerical check:

Amount borrowed: 10,00,000
Add interest at 4.5%: +45,000 gives 10,45,000
Repay 3,63,800: amount outstanding after 1 year is 6,81,200
Add interest at 4.5%: +30,700 gives 7,11,900
Repay 3,63,800: amount outstanding after 2 years is 3,48,100
Add interest at 4.5%: +15,700 gives 3,63,800
Repay 3,63,800: no amount outstanding after 3 years

Student Activity
1. What sum should be paid for an annuity of Rs 25,000 per annum, to be paid half-yearly commencing in six months' time, if compound interest is allowed at 14% per annum compounded half-yearly and the annuity is to last (i) 10 years (ii) 20 years (iii) 30 years?
2. What is the amount payable in the above case if the annuity is perpetual?
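Illustration 3.10 and its numerical check can be sketched as follows (the helper name is ours; direct computation gives about Rs 3,63,773 against the text's four-figure value of Rs 3,63,800):

```python
def annual_repayment(P, r, n):
    """Equal annual repayment on a loan P: a = P * r^n * (r - 1) / (r^n - 1)."""
    return P * r ** n * (r - 1) / (r ** n - 1)

# Illustration 3.10: Rs 10,00,000 over 3 years at 4.5% (r = 1.045)
a = annual_repayment(1000000, 1.045, 3)
print(round(a))   # ~363773; four-figure log tables give the text's Rs 3,63,800

# Numerical check, as in the text: add a year's interest, then repay
balance = 1000000
for _ in range(3):
    balance = balance * 1.045 - a
print(abs(round(balance, 6)))   # 0.0, the loan is exactly paid off
```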

Methods of Investment Evaluation

Any commercial or industrial investment can be regarded as the purchase of an annuity. The amount of the annuity is the income or the saving in costs directly attributable to the investment, such as the saving in running costs when replacing an old machine by a new one.

Illustration 3.7

A new machine is expected to last for eight years and to produce annual savings of Rs 23,000. What is its present value, allowing interest at 7% per annum?

Solution

The present value is obtained in the same way as earlier in this unit:

P = a(r^n - 1)/[r^n (r - 1)] = 23,000(1.07^8 - 1)/[1.07^8 (1.07 - 1)] = 1,37,340

The same result can be obtained using Appendix 2, the figure 5.9713 then being multiplied by Rs 23,000. The assumptions made here are that the savings are effectively obtained at the end of each year, and that the rate of interest quoted is the rate at which money is available to buy the machine. The answer indicates that the machine is worth buying if it costs less than Rs 1,37,340 and not worth buying if it costs more than that.

A refinement on this method is to subtract the cost of the machine from its present value to give the net present value. If the machine just considered has a cost price of Rs 1,25,000, the net present value is Rs 1,37,340 - Rs 1,25,000 = Rs 12,340. An investment is worth making only when it has a positive net present value. This net present value, or NPV as it is called in short form, is used extensively in appraising investment proposals, about which you will read more in your finance textbooks.

If the new machine replaces an old machine, the selling price of the old machine (which will usually be its scrap value) must be added to the net present value of the investment. The book value of the old machine may be different from its selling price, but this is irrelevant. Some accountants argue that the 'loss on book value' of the machine to be scrapped must be included as part of the cost of the proposed investment, but this is completely erroneous.

This raises the question as to the best method of calculating depreciation. It is usually simpler to depreciate by a fixed amount each year, but more realistic to depreciate by a fixed percentage, so that the decreasing amounts written off in successive years help to balance the increasing costs of repairs and maintenance. Maximum simplicity is achieved by writing off a fixed percentage of the total value of all machines, irrespective of their expected life.
However, no efficient company will be satisfied with either fixed amount or fixed percentage depreciation, since they do not take account of unexpected deterioration or obsolescence. The only sound policy is to have each machine revalued annually by experts who are fully aware of new inventions and technology change. This would be an expensive and irrelevant procedure if it were intended solely for reporting purposes, but it also ought to form part of


the routine for technological appraisal and investment planning. Many companies have gone bankrupt because they found themselves with obsolete machinery which was overvalued in the books, and were then persuaded by incompetent accountants that the amount of the overvaluation would be lost if they invested in new machinery but not otherwise!

Since tax authorities have their own rules for assessing depreciation, a separate computation will be necessary in any case for tax purposes.

Another concept which is sometimes useful in investment evaluation is that of years of return, also known as the payback period. This is the number of years the machine must last in order to be just worth buying; that is, the value of n which makes the present value equal to the cost C. Writing C for the present value and rearranging the formula:

n = [log a - log{a - C(r - 1)}] / log r

Illustration 3.8

A new lathe machine costs Rs 58,000 and will replace an existing lathe whose scrap value is Rs 7,000. The saving in running costs using the new machine will be Rs 6,700 per year. Allowing interest at 6.5% per annum, what is the payback period?

Solution

Putting C as the net cost Rs 51,000, a = Rs 6,700 and r = 1.065 gives n = 10.8. A common way of expressing this result is to say that the investment 'will pay for itself in 10.8 years'.

This method of appraisal is useful when it is difficult to get an accurate estimate of the expected life of the machine, but easy to get opinions from technical staff as to whether or not the machine will last for a named period. Technical staff would not understand the term 'net present value', but if they indicate that the actual life will exceed the years of return, this means that the net present value is positive and so the machine is worth buying.

The main defect in this method of appraising an investment appears when two or more alternative investments are under consideration. If the machine in the above illustration is expected to last for 12 years it is worth buying; but should it be preferred to another new machine, also costing Rs 58,000, which is expected to last for eight years and yield annual savings of Rs 9,000? The years of return on the latter machine work out at 7.3, so the former machine seems better irrespective of whether one subtracts the years of return from the expected life or divides one into the other. Yet since the net present value is Rs 3,660 for the former machine and Rs 3,800 for the latter machine, the latter is in fact the better investment.

One can depend on the net present value in this case only because the costs of the two machines are equal. It is not immediately obvious whether buying the latter machine would be preferable to a third alternative, which costs Rs 38,000 and has a net present value of Rs 3,500. The simple policy of maximizing the total net present values of all investments is valid only when unlimited capital is available at the stipulated rate of interest. This is rarely even an approximation to the true situation.

The soundest method of investment evaluation is to use the net present values of the investments under consideration. More can be said about this method, and about minimizing the effects of error in estimating future cash flows and the assumed rate of interest, but such a detailed study of investment evaluation is outside the scope of the present text and can be read in the 'Accounting and Finance for Managers' text.

Perpetual Annuities and Infinite Series

The formula for the present value of a fixed-term annuity can be applied to find the present value of a perpetual annuity. For example, let us say that we want to find the present value of a perpetual annuity of Rs 560, if compound interest is allowed at 4% per annum.

The present value P of an annuity is given by the formula:

P = a(r^n - 1) / [r^n (r - 1)]

Since r is greater than 1, r^n increases indefinitely as n increases. To evaluate this limit we can either go through the 'limits' route, or simply say that as r^n becomes very big there is practically no difference between r^n and r^n - 1. Mathematicians would scoff at this, but we say it because it makes the result easier to understand; we will also use the mathematicians' method of going through the 'limits' route, so that you get a taste of both.

First, using the simple method, we can say that r^n and r^n - 1 effectively cancel out in the above formula, which then reduces to:

P = a / (r - 1)

Now let us use the 'limits' method. Whenever a limit has to be found, the expression must first be rearranged so that the variable which is to increase indefinitely appears in as few places as possible:

P = lim(n tends to infinity) a(r^n - 1)/[r^n (r - 1)] = lim(n tends to infinity) a(1 - 1/r^n)/(r - 1)

The term 'limit' here means 'the limit, as n tends to infinity, of the expression which follows'. In finding the limit of the rearranged expression, it is only necessary to observe that as n tends to infinity, r^n also tends to infinity and so its reciprocal tends to zero. Since this is true only when r is greater than 1 or less than -1, this restriction must be placed on the result; the expression has no limit when r is between -1 and 1. Hence, again:

P = a / (r - 1)

When compound interest is allowed at 4% per annum, r is equal to 1.04, and so the present value of a perpetual annuity of 1 unit is 1/0.04 = 25 units. The answer to the question is Rs 560/0.04, which is Rs 14,000.
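The NPV, payback and perpetuity calculations above can be verified numerically; a sketch follows (the helper names are ours, not the book's):

```python
import math

def present_value(a, r, n):
    """PV of n annual savings of a; r = 1 + interest rate."""
    return a * (r ** n - 1) / (r ** n * (r - 1))

def payback_years(C, a, r):
    """Years of return: the n at which the PV of the savings equals cost C."""
    return (math.log(a) - math.log(a - C * (r - 1))) / math.log(r)

def perpetuity_pv(a, r):
    """Present value of a perpetual annuity: a / (r - 1)."""
    return a / (r - 1)

# Illustration 3.7's machine: NPV at a cost of Rs 1,25,000
print(round(present_value(23000, 1.07, 8) - 125000))   # ~12340, worth buying

# Illustration 3.8: net cost 51,000, savings 6,700/yr, 6.5% interest
print(round(payback_years(51000, 6700, 1.065), 1))     # 10.8 years

# The perpetual annuity of Rs 560 at 4% per annum
print(round(perpetuity_pv(560, 1.04)))                 # 14000
```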

Student Activity

A company is considering the purchase of two different new machines. Machine A costs Rs 5,00,000 and is expected to yield profits of Rs 1,10,000 per year for six years and then can be sold for Rs 1,50,000. Machine B costs Rs 9,00,000 and is expected to yield profits of Rs 1,50,000 per year for seven years and then can be disposed of for Rs 3,00,000. Find the total cash flows from the two investments, assuming a cost of capital (expected return) of 10 per cent. What will the net present values of the two machines then be?
Depreciation
The term depreciation is a specialized subset of amortization. Amortization simply means the spreading out of a cost over a period of time; it is a generic term used for any type of item that is being prorated over time. Depreciation, technically, should be used with respect to the using up of assets that wear down with time and usage, such as plant and equipment; the word indicates a breaking down or physical wearing out. Some items don't wear out or break down per se, and we don't refer to them as depreciating over time. For example, natural resources such as oil, gas and coal are said to deplete. Deplete means to empty out, and this is essentially what happens to a coal mine or oil well. Finally, some items neither deplete nor depreciate. For example, a patent loses its value over time. It doesn't break down, wear out or empty out; it simply expires with the passage of time. When neither of the terms depreciation nor depletion is applicable, we refer to the item as amortizing. Therefore, for a patent, instead of depreciation expense or depletion expense, the annual reduction in value is referred to as amortization expense.

From an accounting point of view there is no substantial difference whether we refer to an item as being depreciated, depleted or amortized. In each case, the key question is, "by how much?" After all, it is the amount that we record as an expense that impacts the firm's income, not the name by which we call it. We will discuss only depreciation, even though the principles generally apply in a similar fashion to assets to be depleted or amortized. We will initially focus primarily on depreciation for financial statement purposes; later we will discuss important tax laws concerning depreciation.

Methods for Charging Depreciation

There are three major factors in computing depreciation.

1. Cost

We have seen earlier that the cost of an asset includes all necessary and reasonable expenses to acquire it and to prepare it for its intended use. Only fixed assets are depreciated. The current assets may also lose value, but they are not depreciated; the methods of valuing current assets will be discussed later. Asset valuation for depreciation basically follows the rules of historical cost. We stated that the historical or acquisition cost of an asset is simply what we paid for the item when we acquired it. However, for depreciation purposes the determination of asset cost is somewhat more complex.

2. Salvage Value

Suppose we bought a machine with a ten-year expected life at a cost of Rs 20,000, including all of the costs to put it into service. Suppose further that after ten years we expect to be able to sell the machine for Rs 2,000. Then we really have not used up Rs 20,000 of resources over the ten years; we have used up only Rs 18,000, and we still have a Rs 2,000 asset left over. This Rs 2,000 is referred to as the machine's salvage value. Salvage value, also called residual value or scrap value, is an estimate of the asset's value at the end of its benefit period. This is often viewed as the amount we expect to receive when we sell the asset. Therefore, from an accounting perspective, we depreciate a machine by an amount equal to its cost less its anticipated salvage value. That difference is referred to as its depreciable base. The salvage value has to be estimated; at best, it will be an educated guess. Your accountant reviews the reasonableness of your salvage value estimates for financial statement preparation.

3. Useful Life

The useful life of a plant asset is the length of time it is used productively in the company's operations. The useful life of the asset for depreciation purposes may not have anything to do with the total useful life of that asset, but represents the period over which the asset is going to be depreciated. Now, there could be significant differences between the useful life as assessed by the company and as accepted by the Income Tax Act or the Companies Act. If the company is making a good amount of profit, it would like to charge more depreciation than it would have charged if the profits were low or there were losses. For this reason, there are rates prescribed by the Companies Act which specify the maximum amount of depreciation that can be charged by a company for different types of assets. This schedule is attached to this chapter as Annexure II.
These rates depends upon the useful life of the asset as expected but may not correspond to the actual useful life of the asset. What happens if we are still using the asset after its estimated useful life is over? We stop taking further depreciation. The role of depreciation is to allocate some of the cost of the asset into a number of periods. Once we have allocated all of the cost (less the salvage value), we simply continue to use the asset with no further depreciation. That means we will have revenues without depreciation expense matched against them. That is simply a result of a matching based on estimates instead of perfect foreknowledge. What if we sell the asset for more than its salvage value? That presents no problem-we can record a gain for the difference between the selling price and the asset's book value. The book value of an asset is the amount paid for it, less the amount of depreciation already taken. Thus, if we bought our machine for Rs 20,000, and sold it after ten years during which we had taken Rs 18,000 of depreciation, the book value would be Rs 2,000. If we sold it for Rs 5,000, there would be a gain of Rs 3,000. What if the asset becomes obsolete after three years due to technological change and it is sold at that time for Rs 500? Assuming we were depreciating


it at a rate of Rs 1,800 a year (to arrive at Rs 18,000 of depreciation over ten years), then we would have taken Rs 5,400 of depreciation (3 years at Rs 1,800 per year) during those first three years. The book value (Rs 20,000 cost less Rs 5,400 of accumulated depreciation) is Rs 14,600, and at a sale price of Rs 500, we would record a loss of Rs 14,100.

Depreciation Methods

Internationally, three depreciation methods are used:
1. Straight line method
2. Written down value method, and
3. Units of production method

There are many variants of these methods but they are outside the scope of this book. Let us now look at these methods in a little more detail.

1. Straight line method

To illustrate, let us take up a printer that the company has bought for Rs 10,000. This machine is expected to be utilized for the next five years, and the salvage value at the end of five years is expected to be Rs 1,000 only. This printer is expected to print 36,000 documents over its useful life; this becomes its total units. The straight line depreciation for this asset is calculated with the formula below:

Annual Depreciation = (Total Cost - Salvage Value) / No. of years

In this case, Annual Depreciation = (10,000 - 1,000) / 5 = Rs 1,800 per year

Let us also assume that this asset was bought on 1st April '99 and used throughout its predicted useful life of 5 years. This would mean that the straight-line method would allocate an equal amount of depreciation to each of the financial years 1999-2000 to 2003-2004. As the value of the asset would go down by Rs 1,800 each year, the book value (the net value of the asset) would also go down by Rs 1,800. This Rs 1,800 would be charged as depreciation in the Income Statement and the accumulated depreciation account would go up by Rs 1,800 every year. The values and the book value are shown in the table below:

Period   Depreciable Cost   Depreciation Rate   Depreciation Expense   Accumulated Depreciation   Book Value
1999     9,000              20%                 1,800                  1,800                      8,200
2000     9,000              20%                 1,800                  3,600                      6,400
2001     9,000              20%                 1,800                  5,400                      4,600
2002     9,000              20%                 1,800                  7,200                      2,800
2003     9,000              20%                 1,800                  9,000                      1,000
There are four main points to be noted here:
1) Depreciation expense is the same in each period.
2) The number of documents printed has no role to play in the calculation of depreciation. This means that whether the printer is used or not, the depreciation can still be charged.
3) The accumulated depreciation figure is the sum of the depreciation figures of the current and previous periods.
4) The book value declines in each period till the time it equals the salvage value at the end of the asset's useful life.

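The straight-line schedule above can be generated mechanically. Below is a minimal Python sketch (the function name and structure are ours, not the text's), using the printer's figures of Rs 10,000 cost, Rs 1,000 salvage value and a five-year life:

```python
# Straight line depreciation: the same expense is charged every year.
def straight_line_schedule(cost, salvage, years):
    expense = (cost - salvage) / years       # Rs 1,800 per year here
    schedule = []
    accumulated = 0.0
    for year in range(1, years + 1):
        accumulated += expense
        schedule.append((year, expense, accumulated, cost - accumulated))
    return schedule

for row in straight_line_schedule(10_000, 1_000, 5):
    print(row)   # (year, expense, accumulated depreciation, book value)
```

Each year carries the same Rs 1,800 charge, and the book value column reproduces the table above, ending at the Rs 1,000 salvage value.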
2. Written down value method (also known as Reducing Balance Method)

It is the most widely used depreciation method both in India and abroad. Internationally, it is also known as the declining balance method or fixed percentage of declining balance depreciation. It is more often used in Income Tax statements than when reporting earnings to the shareholders, but a lot of companies use the written down value method for reporting purposes also. This method uses a depreciation rate which is up to twice the straight line depreciation rate and applies it to the asset's book value at the beginning of the period. Because the book value for the next period will be lower than this year's and the depreciation percentage is fixed, it follows that the depreciation next year will be lower than the depreciation this year for the same asset. If you use the written down value method with a depreciation rate of 40%, the schedule that comes out is shown below:
Period   Depreciable Cost   Depreciation Rate   Depreciation Expense   Accumulated Depreciation   Book Value
1999     10,000             40%                 4,000                  4,000                      6,000
2000     6,000              40%                 2,400                  6,400                      3,600
2001     3,600              40%                 1,440                  7,840                      2,160
2002     2,160              40%                 864                    8,704                      1,296
2003     1,296              40%                 296                    9,000                      1,000
Now there are two points that you should be careful about while calculating the depreciation using this method. As you can see from the table, the starting book value of Rs 10,000 was depreciated at 40%, resulting in a depreciation expense of Rs 4,000 and a book value at the end of the first year of Rs 6,000. The next year's depreciation is charged at the same rate of 40%, but on Rs 6,000 and not on Rs 10,000. This means that the depreciation expense for the second year is only Rs 2,400. Similarly, the book value keeps going down, and in the same manner the depreciation expense also keeps going down.
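The written down value calculation can be sketched the same way. In this illustrative snippet (our own, not from the text), the final year's expense is trimmed so the book value lands exactly on the salvage value, matching the Rs 296 final charge shown in the table:

```python
# Written down value (reducing balance): a fixed rate is applied to the
# opening book value each year, so the expense keeps shrinking.
def wdv_schedule(cost, salvage, rate, years):
    book_value = cost
    schedule = []
    for year in range(1, years + 1):
        expense = book_value * rate
        if year == years:
            # Final year: depreciate only down to the salvage value,
            # as the table does (Rs 296 rather than 40% of Rs 1,296).
            expense = book_value - salvage
        book_value -= expense
        schedule.append((year, expense, book_value))
    return schedule

for year, expense, book in wdv_schedule(10_000, 1_000, 0.40, 5):
    print(year, round(expense), round(book))
```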


3. Units of Production Method

The main purpose of recording depreciation is to provide relevant information about the cost of consuming an asset's usefulness. Now, there could be certain situations where machinery is used for a very small duration in a particular year and used heavily in the next period. A builder, for example, may use an excavator for extracting earth when he is putting up a building but may not use that excavator for many months after that because it is not required. In these kinds of situations the units of production depreciation method provides a better matching of expenses with revenue than the other two methods. In this method we charge a varying amount to expense for each period of an asset's useful life depending on its usage. There is a two-step process for calculating the units of production depreciation. The first step is to calculate the per-unit depreciation cost, and the second step is to measure the number of units consumed in a particular period and charge depreciation expense accordingly for that period. In the printer illustration that we have been using in the above two methods, we said that the expected output of the printer is 36,000 pages over a period of five years, information we have not used in the above two methods. So, the depreciation cost by this method works out to be Rs 0.25 per page. If in a particular year we have printed only 3,000 pages, then the depreciation cost works out to be Rs 750 for that particular period. The point to note here is that the depreciation amount may vary from year to year based on the actual usage of the machine in that particular year.

Step 1: Depreciation per unit = (Cost - Salvage value) / Total Units of Production = (10,000 - 1,000) / 36,000 units = Rs 0.25 per page

Step 2: Depreciation expense = Depreciation per unit x Units used in period = Rs 0.25 per page x 7,000 pages = Rs 1,750

Period   Units Used   Rate per Unit (Rs)   Depreciation Expense   Accumulated Depreciation   Book Value
1999     7,000        0.25                 1,750                  1,750                      8,250
2000     8,000        0.25                 2,000                  3,750                      6,250
2001     9,000        0.25                 2,250                  6,000                      4,000
2002     7,000        0.25                 1,750                  7,750                      2,250
2003     5,000        0.25                 1,250                  9,000                      1,000
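A hedged sketch of the two-step units-of-production calculation (the helper name is ours), using the printer's yearly page counts:

```python
# Units of production: depreciation follows actual usage rather than time.
def units_of_production(cost, salvage, total_units, usage_by_period):
    per_unit = (cost - salvage) / total_units    # Rs 0.25 per page here
    schedule = []
    accumulated = 0.0
    for units in usage_by_period:
        expense = per_unit * units
        accumulated += expense
        schedule.append((units, expense, cost - accumulated))
    return schedule

pages = [7_000, 8_000, 9_000, 7_000, 5_000]      # pages printed each year
for units, expense, book in units_of_production(10_000, 1_000, 36_000, pages):
    print(units, expense, book)   # book value ends at the Rs 1,000 salvage value
```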

The percentages of depreciation permitted for different types of assets have been given in Annexure 3.3 attached to this chapter. Note that WDV gives higher depreciation in the initial years; therefore, most companies adopt it. Cosco can be no different: the notes on account in Schedule T, which is a part of the balance sheet, say that "Depreciation has been provided on pro-rata basis by Written Down Value Method at the rates and manner prescribed in Schedule XIV of the Companies Act, 1956."


Summary
Logarithms are mathematical tools that convert multiplication, division, exponentiation and root operations into addition, subtraction, multiplication and division operations respectively. A series of quantities forms an arithmetic progression if each subsequent term is obtained by adding to the previous term a constant amount, which is called the common difference. A series of quantities forms a geometric progression if each term is obtained by multiplying the previous term by a constant, which is called the common ratio. The term depreciation is a specialized subset of amortization; amortization simply means the spreading out of a cost over a period of time.

Student Activity

1. A stamping machine bought for Rs 50,000 is expected to be utilized for the next 10 years. The salvage value at the end of 6 years is expected to be Rs 20,000. Calculate the straight line depreciation for this asset.
2. Verify the calculation in the above using the reducing balance method.

Keywords
Arithmetic Progressions: A series of quantities forms an arithmetic progression if each subsequent term is obtained by adding to the previous term a constant amount.
Geometric Progressions: A series of quantities forms a geometric progression if each term is obtained by multiplying the previous term by a constant.
Depreciation: A specialized subset of amortization; amortization simply means the spreading out of a cost over a period of time.
Salvage Value: Salvage value, also called residual value or scrap value, is an estimate of the asset's value at the end of its benefit period.

Review Questions

1. Mr Anand has invested Rs 5,000 in Unit Trust of India at a rate of interest of 12% per annum. How much will he get after 6 years?

2. Mr Ramesh Chandra has joined the recurring deposit scheme of the post office for 10 years. In the beginning of each year, he invests Rs 1,000 in the scheme. If the compound rate of interest is 10%, what is the terminal value of the deposit?

3. Find the 24th term of an A.P. whose first term is 24 and common difference is 4.

4. Matsushita Electronics Ltd is a reputed manufacturer of TV sets sold under the brand name 'Panasonic'. It produced 1,50,000 TV sets during its first year in India, five years back. If the total production of the company today, at the end of the 5th year of operations in India, is 8,00,000 TV sets, then:
   a. Estimate by how many units production has increased each year if the increase in terms of number of units is the same every year.
   b. Estimate by how many units production has increased each year if the increase in terms of percentage of last year's production is the same every year.
   c. Based on the annual increment in production in both fixed units and percentage terms, forecast the amount of production for the 10th year.
   d. If the total market size is 30,00,000 TV units today and is growing by 5 per cent per annum, in how many years would Matsushita garner 50 per cent of the total market if it keeps growing at the same percentage rate calculated earlier? (Part d is not easy; you will have to apply your mind and work hard for it!)

5. How much should be paid for the freehold of a property worth Rs 84,000 per year in rents and expected increase in value, if interest is allowed at (i) 5% per annum, (ii) 10% per annum?

Further Readings

R.S. Bhardwaj, Business Mathematics, Excel Books
D.R. Anderson, D.J. Sweeney, and T.A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company
E.R. Tufte, The Visual Display of Quantitative Information, Graphics Press


Unit 4 Equations

Unit Structure
Introduction
Equations
Applications of Linear Equations in Business
Supply and Demand Functions
Irregular, Unequal and Discontinuous Functions
Quadratic Equations
Fitting a Quadratic Cost Curve
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:

Define equations
Apply linear equations to solve business problems
Define and form supply and demand functions
Define irregular, unequal and discontinuous functions
Define, form and solve quadratic equations
Fit a quadratic cost curve to given data

Introduction
One of the most basic concepts in mathematics is the notion of association. We find associations between two or more sets all around us. Formally, a relation defines an association between any two sets in mathematics. A particular type of relation is called a function, which forms the basis of a large number of mathematical operations. While functions tell us that a relationship exists, equations give us the exact relationship between the variables. These equations take many forms and have one or more variables. For example, we can say that Sales Revenue Y = Number of items sold N x Price per item P. This is an equation with three variables Y, N and P, and it defines an exact relationship amongst them. Here sales revenue Y is a dependent variable and the other two are independent variables. Here we will discuss only linear and quadratic forms of equations.

Linear Equations

A linear equation may be defined as an equation where the power of the variable(s) is one, and no cross or product terms are present. The general expression of a linear equation looks like the following:

AX + B = 0

Here X is the independent variable, and A and B are numeric coefficients. This definition is a working definition.

Punjab Technical University

57


Note that it is an accepted convention in mathematics that letters from the beginning of the alphabet are used to typify known quantities and letters from the end of the alphabet are used to represent unknown quantities. In a linear equation, A and B are real numbers which can be either positive or negative and may involve fractions or decimals; B can be zero, but A cannot be zero, for if A were zero there would be no equation in X left to solve.

Illustration 4.1

A manufacturer of printed fabrics has three machines that prepare raw fabric and five machines that print on it. Two types of printed fabrics are produced; type A requires 3 minutes per metre to prepare and 6 minutes per metre to print, while type B requires 11 and 17 minutes per metre respectively. How much of each type of fabric should be produced per hour in order to keep all the machines fully occupied?

Solution

The quantities to be produced per hour can be represented by X metres of type A and Y metres of type B. Then the situation above can be summarized in two simultaneous linear equations, one for each group of machines:

3X + 11Y = 180   (1)
6X + 17Y = 300   (2)

The right-hand sides of these equations are obtained from the fact that there are 180 machine-minutes available per hour for preparing fabric (60 minutes x 3 machines) and 300 machine-minutes for printing (60 minutes x 5 machines). There are two ways of solving any pair of simultaneous linear equations: the first method is elimination and the second is substitution.

Elimination Method: It will be observed that 6X is exactly twice 3X, and so the first equation can be doubled to give:

6X + 22Y = 360

This is then subtracted from equation (2) to eliminate the terms involving X:

-5Y = -60
Y = 12

Substituting this value of Y in equation (1):

3X + 132 = 180
X = 16

Substitution Method: The second method of solution is by substitution. Equation (1) is rearranged so that one of the unknowns is expressed in terms of the other:

3X = 180 - 11Y

X = (180 - 11Y) / 3

This formula for X is then substituted in equation (2):

6(180 - 11Y)/3 + 17Y = 300
360 - 22Y + 17Y = 300
360 - 5Y = 300
Y = 12

The value of X is then found using equation (1), and both values can be checked by substituting them in equation (2). The stepwise general procedure for solving these linear equations is given below for your reference:
Step-wise Procedure for solving 2 x 2 simultaneous equations

1. Eliminate one of the variables using any or both of the properties specified below:
   (i) Any linear equation can be multiplied or divided on both sides by any number without altering its truth or meaning.
   (ii) Any two linear equations can be added or subtracted (one from the other) to give a third, equally valid, equation.
2. Solve the resulting simple equation (to yield the value of the other variable).
3. Substitute this value back into one of the original equations, say equation (1) (to yield the value of the first variable).
4. Check the solutions (by substituting both values into original equation (2)).

Of course, the graphical method can also be used to solve 2 x 2 simultaneous equations. The first step is to graph the two lines represented by the two given equations on the same graph, and the second step is to identify the X and Y values at the intersection of the lines. These X and Y values are the required solution for the pair of simultaneous equations.

Graphical Solution

The situation above was summarized in two simultaneous linear equations:

3X + 11Y = 180   (1)
6X + 17Y = 300   (2)

Plotting the two equations simultaneously on the graph, we find that they intersect at the values Y = 12 and X = 16. This is the solution to the problem.
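The elimination procedure can be expressed as a small routine. The sketch below is a generic 2 x 2 solver of our own (it assumes the coefficient of X in the first equation is non-zero), and it reproduces the fabric-machine solution:

```python
# Solve aX + bY = e and cX + dY = f by elimination: multiply (1) by c
# and (2) by a, subtract to remove X, then back-substitute into (1).
def solve_2x2(a, b, e, c, d, f):
    det = a * d - b * c              # zero means the lines are parallel or identical
    if det == 0:
        raise ValueError("no unique solution")
    y = (a * f - c * e) / det
    x = (e - b * y) / a              # assumes a != 0, true for the example here
    return x, y

print(solve_2x2(3, 11, 180, 6, 17, 300))   # (16.0, 12.0): X = 16, Y = 12
```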

Figure 4.1: Graph of 3X + 11Y = 180 and 6X + 17Y = 300

Similarly, 3 x 3 simultaneous equations can also be solved using an extension of the same technique that we have used above. There are other methods to solve these simultaneous equations, which we will discuss in subsequent chapters on matrices and determinants and linear programming. The basic method for solving these 3 x 3 equations mathematically is given below:
Procedure for solving 3 x 3 simultaneous equations

1. Using any two of the given equations, eliminate one of the variables (using the equation-manipulating techniques previously described) to obtain an equation in two variables.
2. Using another pair of equations, eliminate the same variable as in (1), which will give a second equation in two variables.
3. Solve this 2 x 2 system of equations in the normal way.
4. Substitute into one of the three original equations to find the value of the third variable.
5. Check the solutions by substituting the values of these variables in the three equations.
Illustration 4.2

A furniture manufacturer sends Company A a bill for Rs 10,700 to cover 3 tables, 4 chairs and 3 stools. Company B is charged Rs 14,800 for 2 tables, 5 chairs and 7 stools. Company C is charged Rs 15,100 for 5 tables, 9 chairs and 2 stools. What are the respective prices for each of these items?

Solution

Representing the prices of one table, one chair and one stool by Rs x, Rs y and Rs z respectively (in units of Rs 100), the problem gives rise to three simultaneous linear equations:

3x + 4y + 3z = 107
2x + 5y + 7z = 148
5x + 9y + 2z = 151

These equations are still called 'linear' even though each could only be represented by a plane in a three-dimensional model and not by a straight line on a two-dimensional graph. The first step in their solution would be to multiply the first equation by 2 and the second equation by 3 in order to eliminate x, and then subtract the first equation from the second one:

6x + 8y + 6z = 214
6x + 15y + 21z = 444
7y + 15z = 230

The second and third equations are then multiplied by 5 and 2 respectively in order to obtain a second equation in which x has been eliminated. The two equations involving only y and z are then solved as in Illustration 4.1, to give y = 5, z = 13. Substituting these values in the first of the original equations gives x = 16. Substituting them in the other two original equations can check all three values. Such equations can be solved much more easily using the matrix concept, which is discussed later.

Student Activity

Solve the following systems of linear equations by the graphical method:
(a) 3x + 2y = 8
    2x - 3y = 3
(b) x - 4y = 10
    4x - y = 2
Applications of Linear Equations in Business


Let us start with an illustration.

Illustration 4.3

The total production cost of a packaging machinery manufacturer is found to average Rs 60,000 per day. The cost accountant finds that the fixed costs are Rs 32,000 per day and the direct costs average Rs 7,000 per machine. Calculate the average number of machines produced per day.

Solution

This employs the accountants' terms 'fixed costs' and 'direct costs' and uses the accountants' model:

Total costs = fixed costs + (direct costs x quantity produced), i.e. T = F + Dx


Let x represent the number of machines produced; the above model would then look like:

Rs 60,000 = Rs 32,000 + (Rs 7,000 x x)

Dividing the equation by Rs 1,000, it reduces to:

60 = 32 + 7x

so that 7x = 28 and x = 4 machines per day.
In the above example, the model is very useful though approximate, since the direct costs per machine will probably vary quite widely. These types of models are used a great deal and are regarded as absolute truth by top management, but they have their limitations, as we shall see. Now, if all the machines are sold at the same price, then the revenue is a linear function of the quantity produced. Putting R for revenue and p for price, the function becomes R = px



In the case of the machine manufacturer, let us assume that the selling price is Rs 18,000. By putting p = 18 (again taking Rs 1,000 as the unit), a graph (Figure 4.2) can be drawn of the revenue function. It is, however, much more informative to draw the lines representing the cost function and the revenue function on the same graph, as shown below.

Figure 4.2: Machine Manufacturer's Cost and Revenue Functions

Extending the lines beyond the range in which their practical usefulness is proved is called extrapolation; it is a bad practice to extrapolate too far. The unreliable parts of the lines on the graph are shown by broken lines and the meaningless parts by dotted lines. The difference between revenue and total production costs can be described as gross profit G:

G = R - T = px - (F + Dx) = 18x - (32 + 7x) = 11x - 32

The break-even point is where profit = 0, that is, your revenue is equal to your costs. Putting this in the above equation we get:

11x - 32 = 0
11x = 32
x = 32/11 = 2.91 (approx.)

So your average production should be 2.91 machines per day for you to cover all your costs but make no profit. This break-even level can also be found from the graph, where your revenue and cost lines cross each other. Alternatively, you can plot the line y = 11x - 32 and find the value of x where it meets the x-axis, since at that point the value of the function is zero. There are two possible ways in which you could have obtained the information related to fixed and direct costs that the cost accountant found.
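The break-even arithmetic is easy to check programmatically; a small sketch using the figures above (all in units of Rs 1,000):

```python
# Break-even for the machine manufacturer: revenue 18x, total cost
# 32 + 7x, so gross profit G = 11x - 32 (all figures in Rs 1,000).
price, fixed, direct = 18, 32, 7

def profit(x):
    return (price - direct) * x - fixed

breakeven = fixed / (price - direct)   # G = 0 at x = 32/11
print(round(breakeven, 2))             # 2.91 machines per day
```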


Either you could take all the accounting records and classify each cost under the two headings, a tedious and time-consuming process which is prone to error because of limited accounting knowledge and problems of classification. A quicker and better method would be to record the actual total costs at two different levels of production and then find the linear cost function which 'fits' these actual costs. For instance, the records might show that the average total cost per day was Rs 49,500 when production averaged 2.5 machines per day, and that it rose to Rs 63,500 when production rose to an average of 4.5 machines per day. All that is necessary is to insert these two values of T and the corresponding values of x into the linear cost function defined above:

T = F + Dx

This gives two equations, involving two unknowns F and D:

49.5 = F + 2.5D
63.5 = F + 4.5D

Solving these equations using the techniques already described, we get D = 7 and F = 32. Substituting these values in the above function we get T = 32 + 7x, i.e., the same equation the cost accountant had with all his information.
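Fitting the linear cost function to two observations is just a two-point line fit; a sketch (the function name is ours) with the figures just quoted:

```python
# Fit T = F + Dx through two cost observations (figures in Rs 1,000):
# the slope D is the direct cost per machine, the intercept F the fixed cost.
def fit_cost_line(x1, t1, x2, t2):
    d = (t2 - t1) / (x2 - x1)
    f = t1 - d * x1
    return f, d

f, d = fit_cost_line(2.5, 49.5, 4.5, 63.5)
print(f, d)   # 32.0 7.0, recovering T = 32 + 7x
```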

Supply and Demand Functions


Supply and demand functions are an important field of study for economists. The amount of a particular product which a firm is willing to supply at a specified price will depend on the firm's cost function and also on its marketing policy. The firm may be concerned with maximizing profit, increasing market share, or just keeping the factory going in times of economic slowdown. Once the firm decides what its policy is, the amount of product which can be supplied to the market is clearly a function of the price at which it can be sold in the market. This forms the supply function of the firm. If the quantities of product that can be supplied by all the firms in the industry are totalled up for each price level, this gives us the total supply function for the market as a whole. As an illustration, let us assume that the total supply of a particular type of phone in the market is 29,000 pieces per month when the price is Rs 500 per piece. The same manufacturers are prepared to supply a total of 52,000 pieces per month if the price is raised to Rs 600 per piece. A further rise in the price per piece would justify working overtime in the factory and also bring in foreign suppliers who were earlier not interested in selling at low prices in the market. It is found that a total of 75,000 pieces per month can be supplied when the price is Rs 700 per piece. These three individual points can be plotted on a graph (Figure 4.3), letting Rs P represent the price per phone (in Rs hundred) and X the total quantity (in thousands of pieces per month) which would be supplied at that price.



Although here P is the independent variable and X is the dependent variable, it is customary for economists to plot prices on the vertical axis and quantities on the horizontal axis, and this practice will be followed here. In the simplified example being considered, the three points are found to lie on a straight line, and so it can be assumed that the supply function for the market is approximately linear. The function is then found to be:

X = 23P - 86

Figure 4.3: Demand & Supply Curves

Student Activity

1. The total production cost of an item is Rs 20,000 per day on an average. If the fixed cost and direct cost are Rs 10,000 and Rs 2,000 per day respectively, what is the average number of items produced per day?
2. If the cost function is C(x) = 1000 - 3x and the profit function is P(x) = 2.5x, what is the break-even level of production?

For example, by substituting Rs 500 as the price (P = 5) we get X = 29 (i.e., 29,000 phones). This line represents the quantities which will be produced at different prices, provided all the quantity produced can be sold. But to find out what can be sold in the market we need to look at the demand function of the market. The demand function indicates the total quantity that will be purchased at a particular price and, therefore, represents the total of the individual demand functions of all the individual buyers. Normally, large quantities are bought when the price is lower, and as the price goes up the quantities purchased come down. In this particular case it was found that only 24,000 telephone pieces can be sold at Rs 700 per piece, but that the sales would increase to 35,000 and 46,000 pieces per month at prices of Rs 600 and Rs 500 respectively. These three points can be plotted on the same graph as the supply curve so as to get the demand curve. The demand function is then found to be:

X = 101 - 11P

It would be wrong to assume that we can extend these lines far on either side for the supply and demand functions. It would be absurd to assume that the demand is 2,000 pieces when the price is Rs 900, and equally wrong to assume that demand is approximately 90,000 pieces when the price is Rs 100. The reason for plotting supply and demand on the same graph is to find the point of market equilibrium, which is the point of intersection of these two lines. It can also be found using the simple equation-solving techniques mentioned earlier, by finding the value of P which makes the value of X the same for both the demand and supply functions. The price and quantity at the point of market equilibrium are known as the equilibrium price and the equilibrium quantity. Under conditions of free competition, the equilibrium quantity will be the quantity actually produced and the equilibrium price will be the price in the market.
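The equilibrium can be computed directly by equating the two functions. In the sketch below, the demand line X = 101 - 11P is the one fitted to the three data points quoted above (24, 35 and 46 thousand pieces at P = 7, 6 and 5):

```python
# Supply X = 23P - 86 meets demand X = 101 - 11P
# (P in Rs hundred, X in thousands of pieces per month).
def supply(p):
    return 23 * p - 86

def demand(p):
    return 101 - 11 * p

# 23P - 86 = 101 - 11P  =>  34P = 187  =>  P = 5.5
p_eq = 187 / 34
print(p_eq, supply(p_eq))   # equilibrium price Rs 550, quantity 40,500 pieces/month
```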


Fitting demand and supply curves is much more tedious than solving other business situations. It is much more difficult to assess how supply will respond to a change in price than to assess how the total production cost within a firm will vary with the quantities produced. Demand is also complicated because of the presence of substitute products in the market. These difficulties explain why mathematical economists need a lot of training and experience, and why forecast situations sometimes vastly differ from the actual situations in the market.

Irregular, Unequal and Discontinuous Functions


Sometimes it is assumed that 'y is a function of x' means that there is a single formula connecting y with x. While it is easy to discuss functions which are described by a single formula, the only correct interpretation is that y is a function of x if the value of x determines the value of y, irrespective of whether the relationship proceeds in steps, requires multiple formulas or has constraints attached. For example, since the price of a commodity determines the quantity supplied and the quantity demanded, these quantities are functions of the price, even in cases where the relationship is so irregular that it can be described only by a list of prices with the corresponding quantities.

Quadratic Equations

We saw that it is usually possible to sell larger quantities of a commodity when the price is lower. For a monopolist, the demand curve for the market is the price curve to be used in calculating the revenue of the firm. Where there is no monopoly, the amount a manufacturer can sell is still a function of the price at which he offers his goods, although in this case the price curve will not be the same as the demand curve for the market as a whole. Let us consider an example.

Illustration 4.4 (extending Illustration 4.3)

The same machine manufacturer finds that he could sell an average of four machines per day at a price of Rs 18,000 per machine. Stepping up his production to an average of 4½ machines per day, he finds that he has to reduce the price to Rs 17,500 per machine in order to sell all that he produces. Find the profit function.

Solution

This problem abandons the unrealistic assumption of traditional cost accounting that the price is a constant and the revenue function therefore linear. Putting the machines sold per day, x, as a linear function of the price (in units of Rs 1,000), p:

x = ap + b

The method learned earlier makes it easy to find a and b. Substituting their values we find that the equation reduces to:

x = 22 - p

Quantitative Techniques

The revenue R is the price multiplied by the number of machines sold: Reveue R = Price P x Quantity X R = px


In order to find the breakeven point, it is simplest to express p as a function of x; R then becomes a quadratic function of x:

p = 22 − x (from the equation x = 22 − p above)

Substituting this value of p in R = px we get:

R = px = (22 − x)x = 22x − x²

Assuming the linear cost function to be 32 + 7x, as found earlier, the gross profit G becomes a quadratic function of x:

G = R − T = (22x − x²) − (32 + 7x) = −x² + 15x − 32

This is the profit function for this manufacturer. This quadratic function more closely approximates the real life situation. Now the question comes: how do we solve these quadratic functions/equations? There are three basic methods:
1. Factorization
2. Using Graphs
3. Using Formula


Solution of Quadratic Equations by Factorization

If the quadratic equation can be expressed as a product of two linear expressions (known as factors), it can be solved using factorization. This is possible only if the solution to the equation is an ordinary number or a fraction. For example:

2x² − 17x + 21 ≡ (x − 7)(2x − 3) = 0

The identity sign (≡) is used as a reminder that the two sides are equal for all values of x, which can be confirmed by multiplying out the right-hand side. If the product of any two expressions is zero, then at least one of these expressions must be zero. So recognizing the factors immediately leads to the solution of the equation:

2x² − 17x + 21 = 0
(x − 7)(2x − 3) = 0
Either x − 7 = 0, i.e., x = 7, or 2x − 3 = 0, i.e., x = 1½
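The factorization above can be checked mechanically by multiplying the factors back out and testing the roots. The following is an illustrative sketch of ours, not part of the text; the function name expand is our own:

```python
def expand(p, q, r, s):
    # Multiply (px + q)(rx + s) into the coefficients (a, b, c) of ax^2 + bx + c
    return (p * r, p * s + q * r, q * s)

# 2x^2 - 17x + 21 = (x - 7)(2x - 3)
print(expand(1, -7, 2, -3))          # -> (2, -17, 21)

# Setting each factor to zero gives the roots x = 7 and x = 3/2
for x in (7, 1.5):
    print(2 * x * x - 17 * x + 21)   # both values are zero
```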


The main difficulty in finding the solution of a quadratic equation by factorization lies in finding the factors. It is possible to do this by a routine procedure which will either find the factors systematically or prove that none exists.


Procedure for Finding Factors

Let the factors be (px + q) and (rx + s), where p, q, r and s are positive or negative integers and the product of the two factors is ax² + bx + c. Multiplying out the factors we get:



prx² + psx + qrx + qs = ax² + bx + c

The coefficients must be the same on both sides, as this must be true for all values of x. This gives equations relating the unknown quantities to the coefficients in the expression to be factorized: pr = a; ps + qr = b; qs = c. It implies that the product of ps and qr is ac, and so the first task is to find these two numbers whose sum and product are known.

Illustration 4.5

Find the factors of 10.8x² + 93x + 140

Solution

The first stage is to take out the fractional factor, resulting in the expression 0.2(54x² + 465x + 700). It is then necessary to look for two numbers whose sum is 465 and whose product is 54 × 700 = 37,800.

The simplest method is to start with any two numbers whose sum is 465, such as 80 and 385. If the product is too small, a suitable amount is added to the smaller of the two and an equal amount is subtracted from the larger. A little exercise will show that the numbers are 105 and 360, since 105 × 360 = 37,800. Putting ps = 105 and using the fact that pr = 54, the highest common factor 3 is then equated to p. Calculations would show that s = 35, r = 18 and q = 20, and the factors of the above expression are:

0.2(3x + 20)(18x + 35)

Illustration 4.5a

Find the factors of 12x² + 56x + 21.

Solution

Two numbers are to be found whose sum is 56 and whose product is 12 × 21 = 252. Following the above procedure, it will be found that 4 × 52 = 208 is too small and 5 × 51 = 255 is too large. It can immediately be concluded that there are no integral factors.

Solution of Quadratic Equations using Graphs

One way of finding the solution of any equation which is in the form, or can be rearranged in the form, f(x) = 0 is to draw the graph of the function:

y = f(x)

If this graph cuts the X-axis at any point, the value of x at that point is a solution of the original equation, since it is the value of x at which y = 0. This

Student Activity
1. Factorize the following quadratic functions:
   (a) −x² + 7x − 12
   (b) 6x² − 13x + 6
2. Solve the following quadratic equations:
   (a) 3x² − 18x + 15 = 0
   (b) x² − 6x = 0
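The sum-and-product search used in illustrations 4.5 and 4.5a can be sketched as a short program. This is an illustrative sketch of ours, assuming a positive sum and product as in those illustrations; the function name is our own:

```python
def find_sum_product_pair(s, p):
    # Scan pairs (n, m) with n + m = s, as in the worked procedure;
    # return the pair whose product is p, or None if no integer pair exists.
    for m in range(s // 2, s + 1):
        n = s - m
        if n * m == p:
            return (n, m)
    return None

# Illustration 4.5: 54x^2 + 465x + 700, sum 465, product 54 * 700 = 37,800
print(find_sum_product_pair(465, 37800))   # -> (105, 360)

# Illustration 4.5a: sum 56, product 252 -- no integral factors exist
print(find_sum_product_pair(56, 252))      # -> None
```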

Punjab Technical University



was the method discussed earlier for linear equations, and it can be applied to the quadratic equation ax² + bx + c as well. Here a, b and c can be positive or negative and may involve fractions or decimals. It is also possible for b or c to be zero, but if a were zero, the function would become a linear function.

Illustration 4.6 (illustration 4.4 extended)

Plot the function G = −x² + 15x − 32 on a graph.

Solution

To draw the graph of the function:

G = −x² + 15x − 32

Student Activity
1. Draw the graph of the following equation. Mark the minimum point.
   f(x) = 3x² − 2x + 5
2. Solve the equation by plotting its graph.
   −2x² + 3x − 7 = 0

it is necessary to choose a range of values of x and calculate the corresponding values of G. In this illustration it is enough to consider values of x between 0 and 14:

x :   0    2    4    6    8   10   12   14
G : −32   −6   12   22   24   18    4  −18

It can be seen from the graph below that G is zero when x is about 2.6 or about 12.4. These two values are said to be the roots of the equation −x² + 15x − 32 = 0.


Figure 4.4: Graph of −x² + 15x − 32 = 0
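The tabulated values of G can be reproduced mechanically. The following is a minimal sketch of ours, evaluating the profit function of illustration 4.6 at the tabulated points:

```python
def G(x):
    # Gross profit function from illustration 4.6
    return -x**2 + 15*x - 32

for x in range(0, 15, 2):
    print(x, G(x))
# x:   0   2   4   6   8  10  12  14
# G: -32  -6  12  22  24  18   4 -18
```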

This means that the manufacturer will make a profit between 2.6 and 12.4 machines and would make the maximum profit when he makes 8 machines. Every quadratic function in which the coefficient of x² is negative gives a graph of the shape shown above, which is termed a parabola. If the highest point is above the x-axis, the corresponding equation has two roots, but if the highest point of the function is below the x-axis there is no solution for the corresponding quadratic equation.

Solution of Quadratic Equation by Formula

It is necessary to have a method of solving an equation in which the quadratic expression is difficult to factorize, such as illustrations 4.6 and 4.8. The method used is equally applicable to equations where factorization is simple. The derivation of the formula is of interest only to mathematicians; the solution is given below for a quadratic equation ax² + bx + c = 0.

x = (−b ± √(b² − 4ac)) / 2a


This formula can be applied to any quadratic equation irrespective of whether the coefficients are positive or negative.

Illustration 4.6 (Cont....)

Taking the equation given in illustration 4.6, here a = −1, b = 15 and c = −32. The solution by formula is:

x = [−15 ± √(15² − 4(−1)(−32))] / 2(−1)
  = [−15 ± √(225 − 128)] / (−2)
  = (−15 ± 9.85) / (−2)
  = 12.425 or 2.575

This is the exact solution, whereas from the graph we got an approximate solution. The sum of the roots is always equal to −b/a and their product is always equal to c/a. In this illustration the sum (15.00) and the product (approximately 32) can be checked against the original equation. This check makes it unnecessary for you to check the roots separately by substitution in the equation.
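The formula can be turned directly into a small routine. This sketch is ours, not the book's; it simply applies x = (−b ± √(b² − 4ac)) / 2a and reports both roots:

```python
import math

def solve_quadratic(a, b, c):
    # Roots of ax^2 + bx + c = 0 by the quadratic formula
    disc = b * b - 4 * a * c
    if disc < 0:
        return None          # no real roots: the graph never cuts the x-axis
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

# Illustration 4.6: -x^2 + 15x - 32 = 0, roots near 2.58 and 12.42
x1, x2 = solve_quadratic(-1, 15, -32)
print(x1, x2)
# Check: sum of roots = -b/a = 15, product of roots = c/a = 32
print(x1 + x2, x1 * x2)
```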

Student Activity
1. Solve the following quadratic equations:
   (a) −x² + ttx + 3/5 = 0
   (b) x² − 2x + 1 = 0, where x > 0
2. Find the solution of the following simultaneous equations using the graphical method:
   y = x² − 2x + 3
   y = 5x + 9

Fitting a Quadratic Cost Curve


A linear function is never an ideal model of production costs. It is usually possible to obtain a much better model by fitting a quadratic curve, as we did in illustration 4.6 above:

T = ax² + bx + c

The values of a, b and c will be positive. The term x² implies that costs increase more steeply as production goes up. If there were no such effect, it would never pay to enlarge a factory. It must not be assumed that a quadratic curve will be a perfect model of production costs. There is nothing magical about the x² term. A quadratic function will always be at least as good a model as a linear function and will nearly always be better.

Illustration 4.7 (Extension of illustration 4.3)

The machinery manufacturer, discussed earlier, found that the total production costs averaged Rs 60,000 per day when an average of 4 machines per day are produced. An accurate assessment of costs when the average production is 3½ machines per day and again at 4½ machines per day gives figures of Rs 56,600 and Rs 63,600 respectively. Fit a quadratic cost curve.

Solution

Just as fitting a linear curve to two known points was shown earlier to give two simultaneous linear equations, so fitting a quadratic curve to three known points gives three simultaneous linear equations. From the given information, again using units of Rs 1,000, the equations are:

a(3½)² + b(3½) + c = 56.6



a(4)² + b(4) + c = 60.0
a(4½)² + b(4½) + c = 63.6

This set of equations can be solved very easily by elimination, eliminating first c and then b to give a = 0.4, b = 3.8 and c = 38.4. The cost function is therefore:

T = 0.4x² + 3.8x + 38.4

When the quadratic revenue curve found earlier is applied, the gross profit function is found to be:

G = −1.4x² + 18.2x − 38.4

The breakeven point is then approximately 2.65, compared with 2.58 when a linear cost curve is assumed. More advanced techniques based on more complicated models are available for managerial problems, but the practical benefits of increased accuracy will be negligible in most cases.
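Fitting the quadratic cost curve amounts to solving three simultaneous linear equations, exactly as in the solution above. The following is a sketch of ours using exact fractions; the function name fit_quadratic is our own:

```python
from fractions import Fraction

def fit_quadratic(points):
    # Fit T = a x^2 + b x + c through three (x, T) points by elimination,
    # first eliminating c and then b, as in illustration 4.7.
    (x1, t1), (x2, t2), (x3, t3) = [(Fraction(x), Fraction(t)) for x, t in points]
    A1, B1, C1 = x2 * x2 - x1 * x1, x2 - x1, t2 - t1   # equation 2 minus equation 1
    A2, B2, C2 = x3 * x3 - x2 * x2, x3 - x2, t3 - t2   # equation 3 minus equation 2
    a = (C2 * B1 - C1 * B2) / (A2 * B1 - A1 * B2)      # eliminate b
    b = (C1 - A1 * a) / B1
    c = t1 - a * x1 * x1 - b * x1
    return a, b, c

# Costs 56.6, 60.0 and 63.6 (Rs '000) at 3.5, 4 and 4.5 machines per day
a, b, c = fit_quadratic([(Fraction(7, 2), Fraction(283, 5)),
                         (4, 60),
                         (Fraction(9, 2), Fraction(318, 5))])
print(float(a), float(b), float(c))   # -> 0.4 3.8 38.4
```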

Summary
A relation defines an association between any two sets. A particular type of relation is called a function. If an expression containing known and unknown quantities related by a sign of equality is true for all possible values of the unknowns, the relation is called an identity; whereas if the relation is true for only a few values of the unknowns, it is called an equation. Variables are the terms used for mathematical quantities that can assume any values within a given set. The set of values of the variable is known as the domain of the variable, which could be limited or unlimited. Variables can be independent or dependent. Independent variables are those whose values are not governed by the values of another variable. Variables whose values depend on the values taken by another variable are called dependent variables. The set of values of the dependent variable is called the range. Writing the relationship in function and equation form makes it easy for us to understand how the values of the dependent variable change because of a change in any of the independent variables on which it is based. A linear equation may be defined as an equation where the power of the variable(s) is one, and no cross or product terms are present.

Keywords
Variables: Variables are the terms used for mathematical quantities that can assume any values within a given set.


Domain of the variable: The set of values of the variable is known as the domain of the variable.

Constant Function: Let A denote a fixed number. A function that takes this value A for every value of X is called a constant function.

Identity Function: The function that associates to each number X the same number X is called the identity function.

Exponential Function: The function that associates the number eˣ to each real number x is called the exponential function.

Equations: Equations give us the exact relationship between the variables.

Other keywords: Logarithmic function, Modulus function, Linear equation, Quadratic equation.

Review Questions
1. Gwalior Leather Ltd is a medium scale company engaged in the manufacture of shoes of different qualities and sizes. It has a fixed cost of Rs 10,000,000. Manufacturing a pair of shoes costs the company Rs 60 on average, and the company sells each pair at Rs 100. Assuming that every shoe pair produced is sold off, find a formula for the profit of the company. Find the minimum number of shoe pairs the company should produce and sell to exactly meet its cost.

2. M/s Kalyani Forge pays its workers Rs 70 for an 8-hour shift. In addition each worker is paid Rs 10 for every one hour of overtime. However, overtime cannot exceed 4 hours per day.
   (a) Give the total wage paid to a worker as a function of overtime.
   (b) Draw the graph of this function.

3. The monthly supply of sugar in Delhi is estimated to be 95,000 tons when the price is Rs 13,000 per ton and 1,10,000 tons when the price is Rs 16,000 per ton. The monthly demand is estimated to be 1,09,000 tons at Rs 13,000 per ton and 99,000 tons at Rs 16,000 per ton. Assuming that the supply and demand functions are both linear, find these functions and hence determine the equilibrium price and quantity.

4. A manufacturer of steel strips finds that his total production cost is Rs 1,20,66,000 per week when he is producing 1240 tons per week. The fixed costs are Rs 67,34,000 per week, and the selling price is Rs 11,700 per ton. Find (a) the weekly revenue, (b) the weekly gross profit, and (c) the weekly production and total production cost at the break-even point.

5. A textile manufacturer finds that he can sell 1,38,000 metres of cotton pads per week in 400 metre rolls at Rs 190 per roll. He increases the price to Rs 200 per roll and finds that he can sell only 1,28,000 metres per week. Assuming that the price curve is linear, find (a) the price as a function of the number of rolls sold per week, (b) the weekly revenue as a function of the number of rolls sold per week, and (c) the prices and quantities for which the weekly revenue will be Rs 60,990 per week.

Further Readings

P. N. Mishra, Quantitative Techniques for Managers, Excel Books
E. R. Tufte, The Visual Display of Quantitative Information, Graphics Press
D. R. Anderson, D. J. Sweeney, and T. A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company



Unit 5 Matrix Algebra


Unit Structure

Introduction
Vectors
Multiplication of Vectors
Matrices
Use of Matrices for Production Planning
Solving Linear Equations
Determinants
Cramer's Rule
Applications in Management
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:

Define a vector
Perform vector operations
Define a matrix
Perform matrix operations
Use matrices for production planning
Use matrices to solve systems of linear equations
Define and compute the determinant of a matrix
Use determinants in solving systems of linear equations
Work with higher order determinants
Apply determinants and matrices in solving practical problems

Introduction
Matrices form one of the most powerful tools of modern management and of modern mathematics. They have innumerable applications in the analysis of material and machine requirements and in the solution of problems in planning and organization. An understanding of matrices is also essential for most branches of advanced mathematics and statistics. Matrices can be better understood using another mathematical structure called vectors. As vectors lie at the base of matrices, let us start by understanding them first.

Vectors
The use of vectors can be illustrated by a very simple example. A small firm uses sheeting fabric to manufacture white sheets and pillowcases for hospitals and hotels, which are sold by the dozen. Orders received in the office are passed by telephone to the packing department, which is interested only in the quantity to be packed in each parcel.

Typical orders would be '4 dozen sheets and 2 dozen pillowcases', '18 dozen sheets and 6 dozen pillowcases', '12 dozen sheets', '6 dozen pillowcases' and so on. It would not be long before speaker and hearer agree to save a lot of time and breath by giving simply a pair of numbers for each order:

[4 2]  [18 6]  [12 0]  [0 6]

Here the first number stands for 'dozen sheets' and the second number stands for 'dozen pillowcases'. The four brackets denote four different orders. As long as the zero is inserted when necessary, there can be no confusion as to the meanings of these figures. As the orders are packed, the quantities can be added up. These pairs of numbers are examples of vectors. A vector is any row or column of figures in a specified sequence. The fact that [12 0] is an order for 12 dozen sheets while [0 12] would be an order for 12 dozen pillowcases indicates that the numbers acquire meaning from their positions in the sequence. A vector is normally printed between square or curved brackets or between a pair of double vertical lines. The sum of the four orders is an example of vector addition. Two vectors are added together by adding the first number in the first vector to the first number in the second vector, the second number in the first vector to the second number in the second vector, and so on. Each number is called an element of the vector. Vectors can have more than two elements, but two vectors can only be added together if both have the same number of elements. Clearly the sum of the above four orders is [34 14], i.e., 34 dozen sheets and 14 dozen pillowcases. If the firm started to sell blankets also, a new convention would be needed by which [4 2 3] means 4 dozen sheets, 2 dozen pillowcases and 3 dozen blankets. The convention would have to be adopted completely for all orders, inserting 0 whenever an order did not include any blankets. The total quantities ordered would be given by the sum of these three-element vectors, which would itself be a three-element vector. Vectors are, thus, an ordered arrangement of numbers, in a row or a column.
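The element-by-element rule for vector addition can be sketched in a few lines. This is our own illustration, not the book's:

```python
def vec_add(u, v):
    # Add corresponding elements; vectors must have the same number of elements
    assert len(u) == len(v)
    return [a + b for a, b in zip(u, v)]

orders = [[4, 2], [18, 6], [12, 0], [0, 6]]
total = [0, 0]
for order in orders:
    total = vec_add(total, order)
print(total)   # -> [34, 14], i.e. 34 dozen sheets and 14 dozen pillowcases
```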

Multiplication of Vectors
If the customer responsible for the order [4 2] asked for it to be doubled, this would be interpreted as [8 4]. If he asked for it to be tripled, it would become [12 6]. This is the rule for multiplying a vector by an ordinary number, which is called a scalar to distinguish it from a vector. Hence, the result of multiplying a vector [a b c] by a scalar k is the vector [ka kb kc]. A vector may also multiply a vector. But it would be meaningless to multiply together two vectors both of which represent orders for goods. The definition of vector multiplication will be seen to make sense only when it is applied in a sensible situation. When sheets and pillowcases have been ordered and packed, the next stage is to invoice them. If the prices are Rs 1,800 for a dozen sheets and Rs 700 for a dozen pillowcases, then the amount due for the order [4 2] will be:

(4 × 1,800) + (2 × 700) = 8,600

This suggests a use for the multiplication of vectors. The prices can be represented by a new vector, which, because it is a different kind of vector, will be written as a column:

[1800]
[ 700]

Then multiplying an order vector by this price vector can be defined as multiplying the first element of the order vector by 1,800 and the second element of the order vector by 700 and adding the results together:

[4 2] × [1800; 700] = 8,600
[18 6] × [1800; 700] = 36,600
[12 0] × [1800; 700] = 21,600
[0 6] × [1800; 700] = 4,200
[34 14] × [1800; 700] = 71,000

It is obvious that (34 × 1,800) is the total value of all the sheets in the preceding four order vectors and (14 × 700) is the total value of all the pillowcases, so that the sum of these products, 71,000, must be the sum of all the separate orders:

8,600 + 36,600 + 21,600 + 4,200 = 71,000

Two vectors can be multiplied together only if both have the same number of elements. Multiplication of a row vector by a column vector, which always results in a scalar, is called the scalar multiplication of vectors. Now let us turn our attention to matrices.
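The scalar multiplication of a row vector by a column vector is just a sum of element-wise products. A minimal sketch of ours, invoicing the orders above at Rs 1,800 and Rs 700 per dozen:

```python
def dot(row, col):
    # Scalar product: multiply corresponding elements and add the results
    assert len(row) == len(col)
    return sum(a * b for a, b in zip(row, col))

prices = [1800, 700]   # per dozen sheets, per dozen pillowcases
for order in ([4, 2], [18, 6], [12, 0], [0, 6], [34, 14]):
    print(order, dot(order, prices))
# -> 8600, 36600, 21600, 4200 and the grand total 71000
```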

Matrices
A matrix is a rectangular array of numbers arranged into rows and columns, where the numbers acquire meaning from their position in the array. This means that the vectors we discussed earlier are just simple examples of matrices. Let us take up an example.

Illustration 5.1

Let us assume that the manufacturer of sheets and pillowcases discussed earlier has three types of machines. There is one machine for cutting the fabric, three machines for sewing and one for folding. The manufacturing times in minutes per dozen are:

              Cutting   Sewing   Folding
Sheets           8        38       14
Pillowcases      6        32        4

To form these facts into a matrix, it is only necessary to arrange the numbers between brackets or double vertical lines:

[8 38 14]
[6 32  4]

Single vertical lines will not do, as these are used to represent a determinant. Even when it has the same number of rows as columns, a matrix is not at all the same thing as a determinant. There is no way of expanding or evaluating a matrix,



since each element has its own distinctive meaning. The production time for an order [4 2] can be calculated by multiplying the order by the manufacturing time matrix. For this purpose, each column of the matrix will be treated as a column vector, and the scalar multiplication of the order vector by these column vectors gives the three results:

(4 × 8) + (2 × 6) = 44 minutes cutting
(4 × 38) + (2 × 32) = 216 minutes sewing
(4 × 14) + (2 × 4) = 64 minutes folding

However, it is not necessary to separate out the column vectors; a convention of matrix multiplication is adopted which gives the same result:

[4 2] × [8 38 14; 6 32 4] = [44 216 64]

The production times now appear as the elements in a new row vector. Vectors are really simple examples of matrices. In general, a matrix has m rows and n columns. If m = 1, then the matrix is a row vector with n elements. If n = 1, then the matrix is a column vector with m elements. All general statements that we made or will make about matrices will apply to vectors as well. Formally speaking, a matrix is a rectangular arrangement of objects in which the elements can be referred to by their respective position values. Mathematically, a matrix is represented by a capital letter. Thus A, a matrix having 2 rows and 3 columns, is represented by:

A = [8 38 14]
    [6 32  4]
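The convention of matrix multiplication used above, each row of the first matrix against each column of the second, can be sketched as follows. This is an illustrative sketch of ours:

```python
def mat_mul(A, B):
    # The (i, j) element is the scalar product of row i of A and column j of B
    n = len(B)
    assert all(len(row) == n for row in A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

B = [[8, 38, 14],   # sheets: cutting, sewing, folding minutes per dozen
     [6, 32, 4]]    # pillowcases
print(mat_mul([[4, 2]], B))   # -> [[44, 216, 64]]
```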

The number of rows and columns of a matrix is referred to as its order. Thus, the order of the matrix A shown above is 2×3. Using this notation, a matrix A having m rows and n columns may be written as follows:

A = [aij]m×n

In plain English it reads as: A is a matrix of m×n elements as shown below.

A = [a11 a12 ... a1n]
    [a21 a22 ... a2n]
    [ ...           ]
    [am1 am2 ... amn]

Note that while describing the individual elements, the 1st subscript represents the row number and the 2nd subscript the column number of the element.

Types of Matrices

Matrices appear in various different forms. Some of the important forms are introduced below in brief.


Null or Zero Matrix

A matrix whose all the elements are 0 (zero) is called a null matrix or zero matrix. For example, the matrix A given below is a null matrix of order 2×3.


A = [0 0 0]
    [0 0 0]

As another example, matrix B given below is a null matrix of order 1×4.

B = [0 0 0 0]

Similarly, the following matrix C is a null matrix of order 3×3.

C = [0 0 0]
    [0 0 0]
    [0 0 0]

A null matrix is represented by O. For example, a null matrix of order m×n is represented by Om×n.

Row Matrix

A matrix that has a single row is called a row matrix. It is the same as a row vector. The order of a row matrix is of the form 1×m. For example, the matrix A shown below is a row matrix.

A = [0 5 7 −3 1.8]

Column Matrix

A matrix that has a single column is called a column matrix. It is the same as a column vector. The order of a column matrix is of the form n×1. For example, the matrix A shown below is a column matrix.

A = [  5]
    [−23]
    [9.0]

Square Matrix

A matrix that has an equal number of rows and columns is called a square matrix. Thus, every matrix of the order n×n is a square matrix. For example, the matrix A shown below is a square matrix of order 3×3.

A = [−10  −2.5    6]
    [  1   4.1   34]
    [  6   0    3.9]



The elements lying on the diagonal of a square matrix are called diagonal elements. Thus, −10, 4.1 and 3.9 are the diagonal elements of the matrix A shown above.

Unit Matrix

A square matrix having all its diagonal elements equal to 1 and the rest of the elements zero is called a unit matrix. In other words, if A = [aij] is a unit matrix of order n×n, then

aij = 1 if i = j, and aij = 0 if i ≠ j

For example, the matrix A given below is a unit matrix of order 4×4.

A = [1 0 0 0]
    [0 1 0 0]
    [0 0 1 0]
    [0 0 0 1]

Diagonal Matrix

A square matrix whose all the non-diagonal elements are zero is called a diagonal matrix. In other words, if A = [aij] is a diagonal matrix of order n×n, then

aij = 0 if i ≠ j

For example, the matrix A given below is a diagonal matrix of order 4×4.

A = [1 0 0  0]
    [0 0 0  0]
    [0 0 6  0]
    [0 0 0 −2]

Note that a null matrix is also a diagonal matrix.

Scalar Matrix

A diagonal square matrix whose all the diagonal elements are equal is called a scalar matrix. In other words, if A = [aij] is a scalar matrix of order n×n, then

aij = c if i = j, and aij = 0 if i ≠ j

where c is a constant quantity. For example, the matrix A given below is a scalar matrix of order 4×4.

A = [3 0 0 0]
    [0 3 0 0]
    [0 0 3 0]
    [0 0 0 3]

Clearly, a unit matrix is also a scalar matrix. The name reflects the fact that a scalar matrix is simply a scalar multiple of a unit matrix, as shown below.

A = [3 0 0 0]         [1 0 0 0]
    [0 3 0 0]  = 3 ×  [0 1 0 0]  = 3I4×4
    [0 0 3 0]         [0 0 1 0]
    [0 0 0 3]         [0 0 0 1]
Triangular Matrix

A square matrix whose all the elements below its diagonal are zero is called an upper triangular matrix. If all the elements above the diagonal are zero, it is called a lower triangular matrix. In either case it is simply called a triangular matrix. In other words, if A = [aij] is a lower triangular matrix of order n×n, then

aij = 0 for all i < j

Similarly, if A = [aij] is an upper triangular matrix of order n×n, then

aij = 0 for all i > j

For example, the following matrices are triangular matrices.

A4×4 = [1 3  7  0]        B3×3 = [2  0 0]
       [0 1  0  6]               [3 −4 0]
       [0 0 −4  1]               [0  6 1]
       [0 0  0 −2]

Further, while A is an upper triangular matrix, B is a lower triangular matrix.
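The defining conditions of the special matrices above translate directly into small tests. A sketch of ours; the function names are our own:

```python
def is_diagonal(A):
    # Every element off the diagonal (i != j) must be zero
    return all(A[i][j] == 0
               for i in range(len(A)) for j in range(len(A[0])) if i != j)

def is_upper_triangular(A):
    # Every element below the diagonal (i > j) must be zero
    return all(A[i][j] == 0
               for i in range(len(A)) for j in range(len(A[0])) if i > j)

print(is_diagonal([[3, 0], [0, 3]]))            # True: a scalar matrix
print(is_upper_triangular([[1, 7], [0, -2]]))   # True
print(is_upper_triangular([[1, 0], [5, 2]]))    # False
```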




Sub-matrix

A matrix obtained from a given matrix by deleting one or more rows and/or columns is called a sub-matrix of the given matrix. Thus, if the matrix A is given by:

A = [ 1 6 2  9]
    [−5 0 8 −1]

then all the following matrices are sub-matrices of A.

[ 1 6]        ; by removing the last two columns
[−5 0]

[ 1 6  9]     ; by removing the third column
[−5 0 −1]

[1 2 9]       ; by removing the second column and second row

Addition of Matrices

Though matrices do not represent a single numerical value, they can be added together provided both matrices are of the same order and the elements of both matrices are addable to each other. The elements are added individually to get the sum matrix. Thus, if

A = [aij] and B = [bij]

then

A + B = [cij], where cij = aij + bij

A null matrix behaves as the additive identity for matrices of its order, because if a null matrix of appropriate order is added to any matrix, the matrix remains the same. That is,

Am×n + Om×n = Am×n

For example, if A and B are matrices as shown below,

A = [3  0]    B = [1 −5]
    [5 −2]        [3  0]

then

A + B = [4 −5]
        [8 −2]


Subtraction of Matrices

In a manner similar to addition, matrices can also be subtracted from each other provided they are of the same order and the elements are subtractable. The elements are subtracted individually to get the difference matrix. Thus, if

A = [aij] and B = [bij]

then

A − B = [cij], where cij = aij − bij

For example, if A and B are matrices as shown below,

A = [3  0]    B = [1 −5]
    [5 −2]        [3  0]

then

A − B = [2  5]
        [2 −2]

Multiplication of a Matrix with a Number

A matrix can be multiplied with a number provided its elements are multipliable with the given number. The elements are multiplied individually with the number to get the product matrix. Thus, if A = [aij], then

kA = [k aij]

For example, if A is a matrix as shown below,

A = [3  0]
    [5 −2]

then

5A = [5×3  5×0   ]   [15   0]
     [5×5  5×(−2)] = [25 −10]

Multiplication of Matrices

Two matrices can be multiplied together only if the number of columns in the first matrix is equal to the number of rows in the second matrix. The first row of the first matrix is then multiplied by the first column of the second matrix, following the rules for the scalar multiplication of vectors, and the result becomes the first element in the first row of the matrix forming the answer. In general, the product of the i-th row of the first matrix and the j-th column of the second matrix becomes the element in the i-th row and the j-th column of the matrix forming the answer.

Therefore, it follows that a matrix with m rows and n columns can be multiplied by a matrix with p rows and q columns only when n is equal to p. The product is then a matrix with m rows and q columns. In the above multiplication of a row vector by the manufacturing time matrix, m = 1, n = 2, p = 2, and q = 3. Let us now apply the rules of matrix multiplication. The original four order vectors can be formed into a matrix in which each of the four rows represents a different order. This can then multiply the manufacturing time matrix, and the answer is a matrix in which the columns represent the three types of machine and the rows represent the time taken to process each of the four orders:

[ 4 2]                [ 44 216  64]
[18 6]   [8 38 14]    [180 876 276]
[12 0] × [6 32  4]  = [ 96 456 168]
[ 0 6]                [ 36 192  24]

Properties of Matrix Operations

It is often useful to represent a matrix by a single symbol. A capital letter is usually printed in bold type to emphasize that it is a matrix. Putting the matrix of the four orders as A and the manufacturing time matrix as B, the product can be written as:

AB = C

Therefore, C is a matrix giving the total production time on each type of machine for each of the four orders. The equation would no longer be true if B were written before A. In fact, it is impossible to multiply B by A, since B has three columns while A has four rows. If the fourth row of A were disregarded, there would then be two matrices which could be multiplied in either order, but the results would be different:

[ 4 2]   [8 38 14]   [ 44 216  64]      [8 38 14]   [ 4 2]   [884 244]
[18 6] × [6 32  4] = [180 876 276]  OR  [6 32  4] × [18 6] = [648 204]
[12 0]               [ 96 456 168]                  [12 0]

The products are entirely different in the two cases. The second product is, in fact, completely meaningless, since it includes terms such as (8 × 2) where 8 is the cutting time for sheets and 2 is an order for pillowcases! We showed earlier how a row vector could be multiplied by a column vector to obtain a scalar product. From the rules of matrix multiplication, we can now see that it is impossible to multiply a row vector by another row vector, or to multiply a column vector by another column vector, except in the trivial case of vectors with only a single element. Multiplying a column vector by a row vector gives a matrix instead of a scalar product:

[1800] × [4 2] = [7,200  3,600]
[ 700]           [2,800  1,400]
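The non-commutativity discussed here is easy to demonstrate with the order and manufacturing-time matrices. A sketch of ours, not from the text:

```python
def mat_mul(A, B):
    # Requires: number of columns of A equals number of rows of B
    assert all(len(row) == len(B) for row in A)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[4, 2], [18, 6], [12, 0]]   # first three orders (3x2)
B = [[8, 38, 14], [6, 32, 4]]    # manufacturing times (2x3)
print(mat_mul(A, B))   # 3x3 matrix of production times
print(mat_mul(B, A))   # -> [[884, 244], [648, 204]], an entirely different result
```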


In this matrix, the elements 3,600 and 2,800 are completely meaningless. For instance, 3,600 is obtained by multiplying the price for sheets by the order for pillowcases. Saying that the multiplication of ordinary numbers is commutative (a × b = b × a) while the multiplication of matrices is in general non-commutative (A × B ≠ B × A) summarizes the above result. This means that changing the order of multiplication of two matrices will generally change the answer. It will be seen later that there are a few cases where changing the order does not change the answer. A little thought will show that this can only be true for square matrices, that is, matrices with equal numbers of rows and columns; but matrix multiplication is not in general commutative even for square matrices. Since multiplying a vector by k means multiplying every element by k, irrespective of whether it is a row vector or a column vector, it is to be expected that the same rule would apply to matrices. This is the case:

    [ 4 2]   [ 4k 2k]
k × [18 6] = [18k 6k]
    [12 0]   [12k  0]
    [ 0 6]   [  0 6k]

If k = 0, one obtains a matrix in which all the elements are zero, termed a zero matrix. A zero matrix is also obtained if two matrices are multiplied together of which one is a zero matrix. But it is also possible to obtain a zero matrix as the product of two matrices neither of which is a zero matrix:

[ 1   2 ] [ 6 -2 ]   [ 0 0 ]
[ -4 -8 ] [ -3 1 ] = [ 0 0 ]
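A quick pure-Python check, reusing a basic row-by-column product, confirms that this pair of non-zero matrices really does multiply to the zero matrix:

```python
def matmul(A, B):
    """Row-by-column product of two matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [-4, -8]]    # not a zero matrix
B = [[6, -2], [-3, 1]]    # not a zero matrix

print(matmul(A, B))   # [[0, 0], [0, 0]] -- yet the product is zero
print(matmul(B, A))   # [[14, 28], [-7, -14]] -- the reverse order is not
```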

Matrix addition has not been defined so far. It is possible to add together two matrices only when both have the same number of rows and the same number of columns. The sum is then obtained simply by adding together the corresponding elements:

[ 3 8 5 ] + [ 5 30 9 ] = [ 3+5 8+30 5+9 ] = [ 8 38 14 ]
[ 2 7 2 ]   [ 4 25 2 ]   [ 2+4 7+25 2+2 ]   [ 6 32  4 ]


Here, the first matrix could represent the machine loading and unloading times and the second matrix the machine running times for the sheets and pillowcases in the above illustration, so that the sum would be the manufacturing time matrix already employed. It will be seen that the rule for vector addition conforms to this rule for matrix addition. Unlike matrix multiplication, matrix addition is commutative; changing the order of the matrices which are added together does not change the result. A few words must be added on the equality, subtraction and division of matrices. Two matrices are said to be equal only if they are identical; they must have the same number of rows and the same number of columns, and every element in the second matrix must be equal to the corresponding element in the first matrix. A matrix can be subtracted from another matrix only when both have the same number of rows and the same number of columns. Subtraction is then simply the reverse of addition:
[ 8 38 14 ] - [ 5 30 9 ] = [ 3 8 5 ]
[ 6 32  4 ]   [ 4 25 2 ]   [ 2 7 2 ]

Matrix subtraction is non-commutative, but this is to be expected since the subtraction of ordinary numbers is also non-commutative.

Matrix division is quickly dealt with, as it is impossible to divide a matrix by another matrix directly. There is a roundabout method, which you will learn later in the chapter.

Student Activity
1. Find the value of k, where:

   [ 3 5 ]   [ 0 1 ]
   [ 8 2 ] + [ 9 5 ]
   [ k 6 ]   [ 6 3 ]

2. If A(m×n) B(n×p) = O(m×p), is it always true that either A = O(m×n) or B = O(n×p)? Give an example of a pair of matrices whose product is a zero matrix but neither of which is zero itself.
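Elementwise addition and subtraction can be sketched in a few lines of pure Python, using the loading-time and running-time matrices from the text:

```python
def madd(A, B):
    """Elementwise sum of two equal-sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def msub(A, B):
    """Elementwise difference of two equal-sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

loading = [[3, 8, 5], [2, 7, 2]]     # loading/unloading minutes
running = [[5, 30, 9], [4, 25, 2]]   # running minutes

total = madd(loading, running)
print(total)                            # [[8, 38, 14], [6, 32, 4]]
print(madd(running, loading) == total)  # True: addition is commutative
print(msub(total, running))             # [[3, 8, 5], [2, 7, 2]] recovered
```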

Use of Matrices for Production Planning


If the manufacturer of sheets and pillowcases in the illustration above finds that the machine costs are Re 0.2 per minute for cutting, Re 0.1 for sewing and Re 0.3 for folding, then the total machine costs per product are given by:

Ac = [ 8 38 14 ] [ 0.2 ]   [ 9.6 ]
     [ 6 32  4 ] [ 0.1 ] = [ 5.6 ]
                 [ 0.3 ]

For most manufacturers there would, of course, be much larger numbers of machines and products. In this section only a very simple example is followed through, as this makes it easier to understand the process. Computers can handle matrices with dozens of rows and columns, each element having three or four digits, and they still use the same process.

A matrix B with m rows and q columns may represent the materials content of the different products. With four ingredients, such as sheeting fabric, thread, labels and packing material, there would be a four-column ingredients matrix for sheets and pillowcases, such as:

B = [ 46 7 12 28 ]
    [ 16 3 12 13 ]

The units of measurement may be different for each column, being chosen to suit the nature of the ingredient. The same units will be used when preparing the ingredient-cost vector. This will be a column vector with q elements and may be represented by d. Then Bd will be a column vector with m elements, giving the total ingredient cost per unit for each product:

Bd = [ 46 7 12 28 ] [ 2.5 ]   [ 123.9 ]
     [ 16 3 12 13 ] [ 0.3 ] = [ 44.7  ]
                    [ 0.1 ]
                    [ 0.2 ]

Any labour or other costs not already included in the machine-cost and ingredient-cost vectors will be computed for each product to form an additional cost vector e with m elements. The total cost per unit for each product is then obtained by adding together the three vectors, each with m elements:

total cost vector = Ac + Bd + e = [ 9.6 ] + [ 123.9 ] + [ 2.0 ] = [ 135.5 ]
                                  [ 5.6 ]   [ 44.7  ]   [ 1.5 ]   [ 51.8  ]

All the information about machine time requirements, material contents and costs is collected by work study and costing staff, and the matrices A and B and the vectors c, d and e are stored in the computer. When new techniques or changed prices or wage rates make it necessary, the matrices and vectors are brought up to date. The products Ac and Bd and the total cost vector Ac + Bd + e are also computed and kept up to date. Depending on the pricing policy of the firm, there may also be a selling price vector p, which is computed from the total cost vector by adding a suitable percentage for fixed costs and profit. When an enquiry is received, the prices and delivery dates are quoted by reference to the computer. If this results in an order, it is recorded as a row vector with m elements. A row vector is distinguished from a column vector by a distinctive mark:

x' = [4 2]

The order vector x' may be multiplied by A to obtain a row vector giving the production times for the order, as calculated earlier. At the same time, x' may be multiplied by Ac to give the total machine cost for the order:

[4 2] [ 9.6 ] = 49.6
      [ 5.6 ]

One would expect to obtain the same result if the production-time vector x'A is multiplied by the machine-cost vector c, and this is in fact the case:

[44 216 64] [ 0.2 ]
            [ 0.1 ] = 49.6
            [ 0.3 ]
The fact that x'A multiplied by c always gives the same result as x' multiplied by Ac is called the associative property of matrix multiplication. The product can be written simply as x'Ac. The computer stores a vector of running totals of machine-time commitments, to which the vector x'A is added. As each order is completed, its production times are deducted from the running totals. The commitment for each type of machine may then be divided by the number of machines of that type, obtaining the number of minutes and hence the number of weeks it will take to produce all outstanding orders. This information forms the basis for quoting delivery dates for new orders and perhaps also for planning overtime work or the purchase of additional machines.
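The costing pipeline of this section can be sketched in a few lines of pure Python, with the same numbers used above:

```python
def matvec(A, v):
    """Multiply matrix A by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

A = [[8, 38, 14], [6, 32, 4]]    # machine minutes per unit of each product
c = [0.2, 0.1, 0.3]              # machine cost per minute
x = [4, 2]                       # order: 4 sheets, 2 pillowcases

Ac = matvec(A, c)                # machine cost per unit of each product
xA = [dot(x, [row[j] for row in A]) for j in range(len(A[0]))]

print(Ac)                        # about [9.6, 5.6]
print(xA)                        # production times for the order: [44, 216, 64]
print(dot(xA, c), dot(x, Ac))    # both about 49.6: (x'A)c = x'(Ac)
```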

The vector x'B is computed and similarly incorporated in the running totals of ingredient requirements. The scalar quantity x'Bd gives the total ingredient cost of the order and may also be added to a continuous running total if it is necessary to keep a check on the amount of capital needed to finance work in progress. When the order is delivered, the amount of money due is given by x'p. This serves as a check on the invoice total. This presentation is far from complete, but it is enough to show how computers using matrices can keep a check on production commitments and stocks of materials. The most important extension necessary in most firms will take into account production dates. The final delivery date of each order will determine the dates by which various stages of manufacture must be completed. Running totals of commitments will be kept by dates so that no type of machine can be overcommitted at any stage. Let us now turn our attention to other applications of matrices.

Student Activity
1. Compute the following products:

   (a) [ 1 2 3 ] [ 71 ]
                 [ 15 ]
                 [ 32 ]

   (b) [ 2 ]
       [ 3 ] [ 1 2 3 ]
       [ 1 ]

2. What is the value of k so that the following expression holds?

   [ k 3 ] [ 5 ]   [ 12 ]
   [ 1 2 ] [ 9 ] = [ 23 ]

Solving Linear Equations

A system of linear equations can be solved by the following two approaches involving matrices:
(a) Using Elementary Row Operations
(b) Using Determinants

A few hints have already been given that matrices can be used in the solution of sets of simultaneous linear equations. But before considering the role of matrices, it is useful to consider a technique known as row operations.

Elementary Row Operations

Solving linear equations by elementary row operations is in principle the same as solving them by elimination. The difference is that elementary row operations aim systematically, in turn, at a coefficient of 1 for each unknown. For instance, consider the following equations:

3x + 11y = 180
6x + 17y = 300

The first stage in row operations is to divide the first equation by 3 in order to convert the coefficient of x to 1. This equation is then multiplied by 6 and the result subtracted from the second equation to eliminate x from that equation. So the equations become:

x + 11/3 y = 60
0x − 5y = −60

The next stage is to divide the second equation by −5, in order to make the coefficient of y equal to 1. This equation is then multiplied by 11/3 and the result subtracted from the first equation in order to eliminate y from that equation:

x + 0y = 16
0x + y = 12

This gives the solution, x = 16, y = 12. If one is able to multiply and subtract mentally, the whole procedure is very rapid. The other special feature of row operations is that it is unnecessary to keep writing the letters and the addition signs. All that is needed is to write down the figures, keeping the zeros as in the above equations, and to insert a vertical line to separate the two sides of each equation:

[ 3 11 | 180 ]    [ 1 11/3 | 60  ]    [ 1 0 | 16 ]
[ 6 17 | 300 ]    [ 0 -5   | -60 ]    [ 0 1 | 12 ]
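The routine can be sketched as a small Gauss-Jordan procedure over exact fractions. This is a minimal version with no row interchanges, which the examples in this section do not need:

```python
from fractions import Fraction

def row_operations_solve(A, b):
    """Reduce [A | b] until A becomes the identity; return the solution."""
    n = len(A)
    rows = [[Fraction(v) for v in A[i]] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = rows[col][col]                      # make the coefficient 1
        rows[col] = [v / pivot for v in rows[col]]
        for r in range(n):                          # eliminate above and below
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [v - f * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

print(row_operations_solve([[3, 11], [6, 17]], [180, 300]))   # x = 16, y = 12
print(row_operations_solve([[3, 4, 3], [2, 5, 7], [5, 9, 2]],
                           [107, 148, 151]))                  # 16, 5, 13
```

Using `Fraction` keeps the intermediate fractions exact, which mirrors working by hand.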

A set of three equations in three unknowns requires three stages. It is necessary at each stage to write first the row which has to be divided through to give the coefficient 1. This is the first row in the first stage, the second row in the second stage and so on. This row is then used to eliminate the corresponding coefficients in all the other rows, both above and below. Hence, after the first stage the first column of figures reads 1 0 0; after the second stage the second column of figures reads 0 1 0, while the first column remains as 1 0 0; and so on. Take another example:

[ 3 4 3 | 107 ]   [ 1 4/3 1  | 107/3 ]   [ 1 0 -13/7 | -57/7 ]   [ 1 0 0 | 16 ]
[ 2 5 7 | 148 ]   [ 0 7/3 5  | 230/3 ]   [ 0 1 15/7  | 230/7 ]   [ 0 1 0 | 5  ]
[ 5 9 2 | 151 ]   [ 0 7/3 -3 | -82/3 ]   [ 0 0 -8    | -104  ]   [ 0 0 1 | 13 ]

The obvious disadvantage of the method is that it introduces fractions even when the final solution does not include any fractions. Its advantage is that it follows a strict routine, which is always necessary if computers are to be used. Since each row represents an equation, it is permissible to rearrange the rows in order to get out of a difficulty. For instance, the equations:
2x − 4y + 3z = 14
3x − 6y + 2z = 11
6x − 3y + z = 37

after the first stage of row operations become:

[ 1 -2 3/2  | 7   ]
[ 0 0  -5/2 | -10 ]
[ 0 9  -8   | -5  ]

To obtain 1 in the second position of the second row, the simplest procedure is first to interchange the second and third rows. It would alternatively be permissible to add the third row to the second row. The reader should follow through both methods, obtaining the solution x = 7, y = 3, z = 4.

Student Activity
1. Reduce the following matrix to diagonal form using elementary row operations:

   [ 3  6 8 ]
   [ -2 5 8 ]
   [ 3  2 9 ]

   (A diagonal matrix is a matrix in which all non-diagonal elements are zero, i.e., a_ij = 0 for all i ≠ j.)

2. Reduce the following matrix into upper-triangular and lower-triangular form using elementary row operations:

   [ 3 5 -2 4 ]
   [ 2 3 -1 6 ]
   [ 6 2 1  0 ]
   [ 1 5 3  1 ]

Row operations can deal with a set of equations that are not all independent. Consider a set of four equations in the four unknowns b, f, h and w. After three stages of row operations the rows become:

[ 1 0 0 -1.575 | 0 ]
[ 0 1 0 -0.75  | 0 ]
[ 0 0 1 -1.05  | 0 ]
[ 0 0 0 0      | 0 ]

It is now clear that the fourth row contributes no information other than that contained in the first three rows; in other words, the equations were not all independent. There has been no attempt to discard an equation arbitrarily, as was done when solving this illustration earlier. If the solution of a set of equations using determinants gives zero divided by zero for each of the unknowns, row operations will always lead to discarding the correct row, since it will eventually produce a row of zeros. If the equations had been contradictory, row operations would have produced a row of zeros to the left of the vertical line with a non-zero value on the right. Since this leaves three independent equations in four unknowns, it is only possible to express each unknown in terms of one of the others. The first row in the final set of rows can be interpreted as b = 1.575w, and the other rows give the corresponding solutions for f and h.

Using Inverse of Matrices to Solve Linear Equations

Before we understand this method it is necessary for us to understand the concept of the 'inverse' of a matrix.

Inverse of a Square Matrix

The inverse of a square matrix A is another matrix, represented by A⁻¹, such that:

A A⁻¹ = A⁻¹ A = I

where the orders of the above square matrices are the same. Consider the following sets of simultaneous equations:

3x + 11y = 180
6x + 17y = 300

and

3x + 4y + 3z = 107
2x + 5y + 7z = 148
5x + 9y + 2z = 151

These equations can respectively be written as:

[ 3 11 ] [ x ]   [ 180 ]
[ 6 17 ] [ y ] = [ 300 ]

and

[ 3 4 3 ] [ x ]   [ 107 ]
[ 2 5 7 ] [ y ] = [ 148 ]
[ 5 9 2 ] [ z ]   [ 151 ]

It is only necessary to follow out the rules for matrix multiplication to see that these matrix products are identical to the sets of equations. If the symbol A is used to represent the matrix of coefficients, x the vector of unknowns and b the vector on the right-hand side, any set of simultaneous linear equations can be expressed in the form:

Ax = b

Both A and b are known, but x is unknown. If there were a simple rule for dividing b by A, the result x would be obtained. However, it is impossible to divide a matrix directly by another matrix. The operations of division are best understood as the reverse of multiplication. 'Divide b by A' is a way of saying, 'find the matrix or vector which when premultiplied by A will give b'. Since matrix multiplication is in general non-commutative, it is necessary to say 'premultiplied by A' to indicate that A precedes x in the above equation. If the required matrix or vector exists, it can be found. But the procedure is not very straightforward. It involves two new concepts, termed the unit matrix and the inverse of a matrix.

The unit matrix of order n, written I_n, is the square matrix with n rows and columns which has the figure 1 for each element in the principal diagonal (the diagonal from the top left-hand corner to the bottom right-hand corner) and 0 for all other elements. Thus:

      [ 1 0 0 ]
I_3 = [ 0 1 0 ]
      [ 0 0 1 ]

In a set of n simultaneous linear equations in n unknowns, A will always be a square matrix. If I is the unit matrix with the same number of rows and columns as A (which is also the number of elements in x), it is easy to see that:

AI = IA = A and Ix = x

These equations can be confirmed by writing any elements one chooses in A and x and then following the rules of matrix multiplication. Since the product of I with any matrix or vector is identical to that matrix or vector itself, I is sometimes called the identity matrix. For a given matrix A, it is usually possible to find a matrix which when multiplied by A gives the answer I. This matrix is a sort of reciprocal of A. The matrix which has this property is represented by A⁻¹ and is called the reciprocal or, more commonly, the inverse of A.

The inverse is accordingly defined by the equation:

A⁻¹A = I

It can be shown that the multiplication of a square matrix and its inverse always commutes, that is:

AA⁻¹ = A⁻¹A = I

In this discussion, it has been assumed that a matrix can have only one inverse. Although academic proofs are not the main purpose of this book, it is of interest to see how this can be proved. Let B be any matrix which satisfies the equation BA = I. Then:

B = BI = B(AA⁻¹) = (BA)A⁻¹ = IA⁻¹ = A⁻¹

Each step in this proof uses one of the results previously obtained, including the associative property of matrix multiplication mentioned earlier. Since two matrices are equal only if they are identical in all respects, this proves that B is identical to A⁻¹ and therefore that there is only one inverse of A.

This proof gives a glimpse at the foothills of what mathematicians term matrix algebra. The usefulness of matrices lies in the fact that one can employ them in all kinds of proofs and manipulations, simply representing each matrix by a bold letter without specifying what its elements are or even how many rows and columns it has, provided that one always conforms to the rules of matrix multiplication and addition. If the inverse of A can be found, it is easy to solve the equation Ax = b. It follows that:

A⁻¹Ax = A⁻¹b  ⇒  Ix = A⁻¹b  ⇒  x = A⁻¹b

So the method of solving the equations is to find the inverse of A and then multiply it by b. This is not the same thing as dividing b by A, but it is the nearest one can get to division of matrices.

Obtaining Inverse using Elementary Operations

The simplest method of finding the inverse of a square matrix is to use elementary row operations. Using the matrix interpretation of a set of equations, elementary row operations involved converting the matrix A on the left of the vertical line into the matrix I. The effect of these operations was to convert the vector b on the right of the vertical line into the solution vector, which is now seen to be equal to A⁻¹b. It is reasonable to deduce that the same operations will convert the matrix I into A⁻¹I, which is the same thing as A⁻¹.

Consider the following matrix:

[ 3 11 ]
[ 6 17 ]

Write an identity matrix of the same order alongside it, as shown below:

[ 3 11 | 1 0 ]
[ 6 17 | 0 1 ]

Perform elementary row operations on the given matrix and apply the same transformations to the adjoining identity matrix:

[ 1 11/3 | 1/3 0 ]
[ 0 -5   | -2  1 ]

[ 1 0 | -17/15 11/15 ]
[ 0 1 | 2/5    -1/5  ]

As a check, the final matrix obtained on the right of the vertical line may be multiplied by the original matrix on the left to show that it produces the unit matrix. It is then multiplied by the vector b to obtain the solution vector:

[ -17/15 11/15 ] [ 180 ]   [ 16 ]
[ 2/5    -1/5  ] [ 300 ] = [ 12 ]

In using the inverse, it is often convenient to keep fractions outside the brackets:

- 1/15 [ 17 -11 ]
       [ -6  3  ]

To multiply this by the vector with elements 180 and 300, it is then obviously easiest to start by dividing the latter numbers by 15.

For example, inverting the following matrix:

[ 3 4 3 ]
[ 2 5 7 ]
[ 5 9 2 ]

using row operations gives the inverse:

1/56 [ 53  -19 -13 ]
     [ -31  9   15 ]
     [ 7    7  -7  ]

Rather large numbers become involved when this matrix multiplies the vector formed by the right-hand sides of the original equations, but eventually the number 56 cancels out:

1/56 [ 53  -19 -13 ] [ 107 ]   [ 16 ]
     [ -31  9   15 ] [ 148 ] = [ 5  ]
     [ 7    7  -7  ] [ 151 ]   [ 13 ]
Student Activity
1. Solve the following system of linear equations using matrices:
   x + 2y + 3z = 18
   x − y + 4z = 16
   −x + 2y + z = 10
2. Solve the same system of linear equations using elementary row operations.

It is unnecessary to check that the inverse was correct, provided that the solution vector is now checked in the original equations:

[ 3 4 3 ] [ 16 ]   [ 107 ]
[ 2 5 7 ] [ 5  ] = [ 148 ]
[ 5 9 2 ] [ 13 ]   [ 151 ]

Though following this example through involves a lot of arithmetic, it must be remembered that all modern mathematical methods assume that computers are available, so these large amounts of routine calculation are no drawback.

Obtaining Inverse using Adjoint of a Matrix

Associated with every non-singular square matrix there exists a matrix of the same order, called the adjoint of the given matrix, such that:

A⁻¹ = Adj(A) / |A|, provided |A| ≠ 0

where |A| represents the determinant of the square matrix A, explained in the next section. This relation can be used to compute the inverse of a given non-singular square matrix.
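The augmented-identity procedure can be sketched directly in pure Python; the code below mirrors the worked 2×2 example, using exact fractions and no pivot interchanges:

```python
from fractions import Fraction

def invert(A):
    """Invert A by reducing [A | I] until the left half is the identity."""
    n = len(A)
    rows = [[Fraction(v) for v in A[i]] +
            [Fraction(1 if j == i else 0) for j in range(n)]
            for i in range(n)]
    for col in range(n):
        pivot = rows[col][col]
        rows[col] = [v / pivot for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [v - f * w for v, w in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]

inv = invert([[3, 11], [6, 17]])
print(inv)   # [[-17/15, 11/15], [2/5, -1/5]], as found above

# multiplying the inverse by the right-hand sides gives the solution
b = [180, 300]
print([sum(r * v for r, v in zip(row, b)) for row in inv])   # 16 and 12
```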

Determinants
A determinant is a scalar number that represents some characteristic of a square matrix. Determinants play a very useful role in the algebraic manipulation of square matrices. The determinant of a square matrix A is represented by enclosing the name of the matrix within two vertical lines, i.e., |A|. Note carefully that this does not represent absolute value or modulus, because A is not a number but a matrix.

Thus, if A is a 1×1 matrix, A = [x], then its determinant is given by |A| = x. For example, if A = [4], then its determinant is given by |A| = 4.

If A is a 2×2 matrix as shown below:

A = [ a11 a12 ]
    [ a21 a22 ]

then its determinant is given by:

|A| = | a11 a12 | = a11 a22 − a21 a12
      | a21 a22 |

For example, if

A = [ 4 -4 ]
    [ 2  3 ]

then its determinant is given by:

|A| = | 4 -4 | = 4 × 3 − 2 × (−4) = 12 + 8 = 20
      | 2  3 |

In a similar way, if A is a 3×3 matrix as shown below:

A = [ a11 a12 a13 ]
    [ a21 a22 a23 ]
    [ a31 a32 a33 ]

then its determinant is given in terms of 2×2 determinants as shown below:

|A| = a11 | a22 a23 | − a21 | a12 a13 | + a31 | a12 a13 |
          | a32 a33 |       | a32 a33 |       | a22 a23 |

    = a11 (a22 a33 − a32 a23) − a21 (a12 a33 − a32 a13) + a31 (a12 a23 − a22 a13)

Here, we have expanded the 3×3 determinant by the first column, each element a_i1 carrying the sign (−1)^(i+1). We can do the same using any row or column of our choice. In all the cases the determinant will remain the same.
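A small recursive routine makes the expansion concrete. This is a sketch expanding along the first row; as noted above, any row or column gives the same value:

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n))

print(det([[4, -4], [2, 3]]))                   # 20, matching the text
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0: the rows are dependent
```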

Minor of an element a_ij

The minor of an element a_ij of a determinant of order n×n is the determinant of order (n−1)×(n−1) obtained by deleting the ith row and the jth column from the original determinant. Thus, if A is a 4×4 matrix as given below:

A = [ a11 a12 a13 a14 ]
    [ a21 a22 a23 a24 ]
    [ a31 a32 a33 a34 ]
    [ a41 a42 a43 a44 ]

then some of the minors of the elements of A are given by:

Minor of a11 = M11 = | a22 a23 a24 |   (removing the 1st row and 1st column)
                     | a32 a33 a34 |
                     | a42 a43 a44 |

Minor of a23 = M23 = | a11 a12 a14 |   (removing the 2nd row and 3rd column)
                     | a31 a32 a34 |
                     | a41 a42 a44 |
Other minors can be obtained in a similar way.

Any pair of simultaneous equations in two unknowns can be written in the form:

a1 x + b1 y = h1
a2 x + b2 y = h2

The symbols other than x and y represent known quantities, and the solution by elimination can be carried out in the same way as done earlier. The first step is to multiply the first equation throughout by a2 and the second equation throughout by a1:

a1 a2 x + a2 b1 y = a2 h1
a1 a2 x + a1 b2 y = a1 h2

Then, subtract the first equation from the second in order to eliminate x:

a1 b2 y − a2 b1 y = a1 h2 − a2 h1

y = (a1 h2 − a2 h1) / (a1 b2 − a2 b1)

Substituting the value of y in the first equation and rearranging, we find that:

x = (h1 b2 − h2 b1) / (a1 b2 − a2 b1)

These two formulae are called the general formulae for the solution of any pair of simultaneous linear equations.
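The general formulae can be coded directly. This is a sketch; it assumes the denominator a1 b2 − a2 b1 is not zero:

```python
def solve_pair(a1, b1, h1, a2, b2, h2):
    """Solve a1 x + b1 y = h1 and a2 x + b2 y = h2 by the general formulae."""
    d = a1 * b2 - a2 * b1
    x = (h1 * b2 - h2 * b1) / d
    y = (a1 * h2 - a2 * h1) / d
    return x, y

# the pair of equations used earlier: 3x + 11y = 180, 6x + 17y = 300
print(solve_pair(3, 11, 180, 6, 17, 300))   # (16.0, 12.0)
```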


The next important step is to introduce a particular way of representing this solution so that it is easy to remember. The method adopted is to write:

a1 b2 − a2 b1 = | a1 b1 |
                | a2 b2 |

The left-hand side of this equation is known as a determinant and the symbols between the two vertical lines are termed the elements of the determinant. The rule for evaluating a third order determinant is:

| a1 b1 c1 |
| a2 b2 c2 | = a1 | b2 c2 | − b1 | a2 c2 | + c1 | a2 b2 |
| a3 b3 c3 |      | b3 c3 |      | a3 c3 |      | a3 b3 |
The values of y and z are obtained by expressing the information in the original equations in the form of determinants by the same routine as for a pair of equations; that is, the denominator is the same as in the formula for x, and the column of values from the right-hand sides of the equations replaces the coefficients of y or of z in the determinants forming the respective numerators:

    | h1 b1 c1 |   | a1 b1 c1 |
x = | h2 b2 c2 | ÷ | a2 b2 c2 |
    | h3 b3 c3 |   | a3 b3 c3 |

    | a1 h1 c1 |   | a1 b1 c1 |        | a1 b1 h1 |   | a1 b1 c1 |
y = | a2 h2 c2 | ÷ | a2 b2 c2 |    z = | a2 b2 h2 | ÷ | a2 b2 c2 |
    | a3 h3 c3 |   | a3 b3 c3 |        | a3 b3 h3 |   | a3 b3 c3 |

The same principles can be applied to solve larger sets of equations, the determinants being set up by exactly the same routine. A determinant always has the same number of rows as columns, corresponding to the number of equations and unknowns that have to be solved. The rule for evaluating a determinant of n rows and columns is to take the n elements in the first row and multiply each by the smaller determinant obtained by missing out the first row and the column which contains the element in question. Thus, the first element is multiplied by the original determinant reduced by the first row and the first column; the second element is multiplied by the original determinant reduced by the first row and the second column; and so on. Finally, the products are collected together by adding all the odd ones and subtracting all the even ones. So a fourth order determinant is expanded as:

| a1 b1 c1 d1 |
| a2 b2 c2 d2 |      | b2 c2 d2 |      | a2 c2 d2 |      | a2 b2 d2 |      | a2 b2 c2 |
| a3 b3 c3 d3 | = a1 | b3 c3 d3 | − b1 | a3 c3 d3 | + c1 | a3 b3 d3 | − d1 | a3 b3 c3 |
| a4 b4 c4 d4 |      | b4 c4 d4 |      | a4 c4 d4 |      | a4 b4 d4 |      | a4 b4 c4 |

The third order determinants which occur in this expansion are termed the minors of the original determinant. If the minor is prefaced by the appropriate sign, it is termed the cofactor of the element by which it is multiplied. Thus, the cofactor of b1 is:

− | a2 c2 d2 |
  | a3 c3 d3 |
  | a4 c4 d4 |
Main Properties of Determinants

Unless a computer is being used, a larger determinant is hardly ever multiplied out as it stands. It is much easier to start off by simplifying it, using the principal properties of determinants to reduce large numbers to small numbers, or to zero, wherever possible. Three of the principal properties have already been mentioned when discussing second order determinants, and it can be proved that they remain true for determinants of higher order. The main properties of determinants can be summarized as under:

1. The value of the determinant is unchanged if the rows and columns are interchanged with each other.

2. The value of the determinant is multiplied by −1 if any row is interchanged with any other row. It follows from these first two properties that the value of the determinant is multiplied by −1 if any two columns are interchanged.

3. If any multiple of any row or column is added (or subtracted) to any other row or any other column respectively, the value of the determinant is unchanged. However, one cannot add a multiple of a row to a column, or vice versa.

4. If any row or column has a factor common to all its elements, then this factor may be divided out. For instance, it can be seen by expanding both sides that:

   | pa1 pb1 |     | a1 b1 |
   | a2  b2  | = p | a2 b2 |

5. It follows from (4) that if any determinant has a row of zeros or a column of zeros, the value of the determinant is zero, because then the common factor 0 can be taken out, and multiplying the whole determinant by it results in zero.

6. It follows from (3) and (5) that if any row is identical to any other row, or is a multiple of any other row (or if any column is identical to any other column or a multiple of any other column), then the value of the determinant is zero.
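Properties (2) and (3) are easy to spot-check numerically. This sketch uses a small cofactor-expansion determinant routine and an arbitrary 3×3 matrix chosen for illustration:

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

A = [[12, 17, 9], [3, 5, 1], [13, -5, -2]]

swapped = [A[1], A[0], A[2]]          # interchange rows 1 and 2
combined = [A[0],
            [x - 2 * y for x, y in zip(A[1], A[0])],   # row2 - 2 * row1
            A[2]]

print(det(A))             # the original value
print(det(swapped))       # property (2): the sign is reversed
print(det(combined))      # property (3): the value is unchanged
```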

We would like to illustrate the use of these six properties before we introduce the final three properties.

Illustration 5.2

| 1 a |
| 0 0 |

is obviously zero, using property (5) or (6) discussed above.

Illustration 5.3

| 12 8 |
| 3  2 |

is zero, using (6): the first row is 4 times the second row.

Illustration 5.4

| 47 7  |               | 47 7  |
| 93 13 |  is equal to  | -1 -1 |

using (3). By subtracting twice the first row from the second row, the numbers are made smaller and so easier to manipulate.

In evaluating a large determinant, property (4) is first applied wherever possible. Then property (3) is used where possible to make the numbers small and, in particular, to obtain as many zeros as possible.

Illustration 5.5

| 12  17  36  19 |
| -11 13  28  14 |
| 9   15  12  6  |
| 13  -5  -8  11 |

Taking the factor 3 out of the third row and the factor 4 out of the third column, using property (4):

     | 12  17  9  19 |
= 12 | -11 13  7  14 |
     | 3   5   1  2  |
     | 13  -5  -2 11 |

Using property (3) several times, multiples of rows and columns are added or subtracted to obtain zeros:

     | 12  -28  9  1  |
= 12 | -10 -22  7  0  |
     | 0   0    1  2  |
     | -1  5   -2  15 |

The numbers have now been reduced quite a lot and there is a row which includes two zeros. The next step is to make a third zero in this row (subtracting twice the third column from the fourth):

     | 12  -28  9  1  |
= 12 | -10 -22  7  0  |
     | 0   0    1  0  |
     | -1  5   -2  15 |

The third row is then interchanged with the second row and again interchanged with the first row; two interchanges leave the value unchanged. The fourth order determinant then reduces to a single third order determinant:

     | 12  -28  1  |
= 12 | -10 -22  0  |
     | -1  5    15 |

The same procedure can now be applied to this smaller determinant. A second zero is obtained in the final column by subtracting 15 times the first row from the third row:

     | 12   -28  1 |
= 12 | -10  -22  0 |
     | -181 425  0 |

This column is moved to the first column by applying property (2) twice, and property (1) is then applied before the final evaluation:

     | 1   0    0    |
= 12 | 12  -10  -181 |      = 12 | -10 -181 |
     | -28 -22  425  |           | -22 425  |

Taking the factor 2 out of the first column (property 4):

= 24 | -5  -181 |
     | -11 425  |

= 24 [(−5 × 425) − (−11 × −181)]
= 24 (−2125 − 1991)
= 24 × (−4116)
= −98,784
This illustration is a long one, since the original determinant was quite a large one. It should also be remembered that it is much easier to apply the rules for simplifying and evaluating determinants oneself than to follow the steps which someone else has chosen. Often there is more than one route to the answer and the choice may be a matter of taste. The last three of the principal properties of determinants combine together some of the earlier properties, enabling one to amalgamate two or three steps into one.

7. If any row is moved up or down an even number of rows, the value of the determinant remains unchanged. If it is moved an odd number of rows, the value of the determinant is multiplied by −1. Similarly, if any column is moved an even number of columns, the value is unchanged; if it is moved an odd number of columns, the value is multiplied by −1. For instance, moving the third row of the determinant below up two rows (an even number) leaves the value unchanged:

   | 12  -28  9  1  |   | 0    0   1  0  |
   | -10 -22  7  0  | = | 12  -28  9  1  |
   | 0   0    1  0  |   | -10 -22  7  0  |
   | -1  5   -2  15 |   | -1  5   -2  15 |

8. A determinant can be expanded using the elements in the first column instead of the elements in the first row. Each element is multiplied by its minor, the smaller determinant obtained by deleting the first column and the row which contains the element. The products are then alternately added and subtracted in the same way as when expanding by the first row:

   | 12 -28 1  |      | -11 0  |        | -28 1  |        | -28 1  |
   | -5 -11 0  | = 12 | 5   15 | − (−5) | 5   15 | + (−1) | -11 0  |
   | -1 5   15 |

9. If there is any row or column in which all the elements, except one, are zero, the determinant is equal to the product of that one non-zero element and its cofactor. For an element in the jth row and the kth column, the cofactor is obtained by deleting the jth row and kth column to obtain the minor, and then multiplying by −1 if (j + k) is odd but leaving it unchanged if (j + k) is even.

   | 12  -28  9  1  |      | 12  -28  1  |
   | -10 -22  7  0  | = 1  | -10 -22  0  |   (positive, since j = 3 and k = 3)
   | 0   0    1  0  |      | -1  5    15 |
   | -1  5   -2  15 |

   | 12   -28  1 |   | -5   -11 |
   | -5   -11  0 | = | -181 425 |   (positive, since j = 1 and k = 3)
   | -181 425  0 |

   | 2 -1 -4  4 |       | 2 -1  4 |
   | 7  3  0  2 | = -7  | 7  3  2 |   (negative, since j = 4 and k = 3)
   | 5  4  1 -2 |       | 5  4 -2 |
   | 0  0  7  0 |

Student Activity
1. Without expanding the following determinants, prove that:

   (a) | 6 3 9 |
       | 4 1 0 | = 0
       | 2 1 3 |

   (b) | x 2 3 |
       | 0 y 3 | = xyz
       | 0 0 z |

2. Compute the following determinant using only the properties:

   | 4 6 1  0  |
   | 2 0 8  10 |
   | 0 2 0  0  |
   | 3 9 27 3  |

All these properties of determinants can be proved, but the proofs are of interest only to mathematicians. They are all intuitively reasonable, and the manager who needs to use determinants should be content to accept that they are valid for determinants of any order.

Cramer's Rule
Cramer's rule enables one to obtain the solutions of a system of linear equations using determinants alone. Consider the following system of linear equations in three variables x, y and z:

a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2
a3 x + b3 y + c3 z = d3

Let us consider the determinant of the coefficient matrix, D, as given below:

    | a1 b1 c1 |
D = | a2 b2 c2 |
    | a3 b3 c3 |

Multiplying D by x, we get:

     | a1x b1 c1 |
xD = | a2x b2 c2 |
     | a3x b3 c3 |

From the properties of determinants, adding y times column 2 and z times column 3 to column 1 will not change the value of the determinant. Thus:

     | a1x + b1y + c1z  b1 c1 |   | d1 b1 c1 |
xD = | a2x + b2y + c2z  b2 c2 | = | d2 b2 c2 |
     | a3x + b3y + c3z  b3 c3 |   | d3 b3 c3 |

Now, if D ≠ 0, then we get:

        | d1 b1 c1 |
x = 1/D | d2 b2 c2 |
        | d3 b3 c3 |

Similarly, we can obtain the values of y and z as given below:

        | a1 d1 c1 |                 | a1 b1 d1 |
y = 1/D | a2 d2 c2 |    and  z = 1/D | a2 b2 d2 |
        | a3 d3 c3 |                 | a3 b3 d3 |
This is Crammer's Rule. The idea can be easily generalized to obtain solutions to higher order equations. The Best Method of Solving Linear Equations Three different systematic techniques have been presented for the solution of sets of simultaneous linear equations: determinants, row operations and matrices. It is reasonable to ask which is the best method.
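The rule derived above translates directly into code. The sketch below is illustrative only: the function names and the example system are invented, not taken from the text.

```python
# A minimal sketch of Cramer's rule for a 3x3 system.
def det3(m):
    # Expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cramer3(a, d):
    """Solve the 3x3 system a.x = d by Cramer's rule; a is a list of rows."""
    D = det3(a)
    if D == 0:
        raise ValueError("no unique solution: determinant is zero")
    xs = []
    for col in range(3):
        # Replace column `col` of the coefficient matrix by the constants d.
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = d[r]
        xs.append(det3(m) / D)
    return xs

# Invented example: x + y + z = 6, 2x - y + z = 3, x + 2y - z = 2.
print(cramer3([[1, 1, 1], [2, -1, 1], [1, 2, -1]], [6, 3, 2]))
# → [1.0, 2.0, 3.0]
```

The `ValueError` branch corresponds to the D ≠ 0 condition above; when D = 0 the system has no unique solution and row operations must be used instead.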


The method of determinants is usually the quickest if one requires the values of only a few of the unknowns. This often applies when the equations arise from problems in probability. The determinant of the coefficients on the left-hand side of the equations should always be evaluated if there is any doubt whether a set of equations has a unique solution.

The quickest way to obtain a complete solution for all the unknowns is usually by row operations. If the method of determinants gives zero divided by zero for each of the unknowns, only row operations will provide the solution, which expresses the unknowns in terms of one another, there being no unique solution. Row operations are obviously quicker than matrix methods, since they obtain the whole solution with no more work than is involved in inverting the matrix.

Matrix methods are extremely valuable if there are a large number of different sets of equations, all of which have the same matrix of coefficients. For instance, it might be desired to consider all the possible product mixes obtainable by adding or subtracting one or two machines of each of the four kinds. There would then be several hundred possible combinations of machines, each giving rise to a set of equations with the same matrix of coefficients. The inverse of this matrix can be found and then each set of equations is solved simply by multiplying this inverse matrix by the appropriate vector.

All three methods can be applied using a computer, but in practice computer programs tend to prefer matrix methods because of the wider applications of matrices.
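The row-operations method mentioned above can be sketched as follows; this is a minimal illustration (Gaussian elimination with partial pivoting) and the example system is invented:

```python
# Solve a linear system by row operations on the augmented matrix.
def solve_by_row_ops(a, b):
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Pick the largest pivot in this column to keep the arithmetic stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("no unique solution")
        m[col] = [v / m[col][col] for v in m[col]]      # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = m[r][col]                       # clear the column
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

result = solve_by_row_ops([[1, 1, 1], [2, -1, 1], [1, 2, -1]], [6, 3, 2])
print([round(v, 6) for v in result])   # → [1.0, 2.0, 3.0]
```

Unlike the determinant method, this produces all the unknowns in one pass, which matches the comparison made in the paragraph above.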

Applications in Management
Managerial Problems Involving Determinants

An obvious managerial application of determinants is in the type of problem illustrated below, where it is required to find the correct product mix in order to make full use of machine capacities.

Illustration 5.6

In an engineering workshop there are seven machines for drilling, two for turning, three for milling and one for grinding. Four types of brackets are made. Type A is found by work study to require 7 minutes drilling, 3 minutes turning, 2½ minutes milling and 1½ minutes grinding, and the corresponding times in minutes for the other types are: B: 5, 0, 1½, ½; C: 14, 6, 9, 3½; D: 26, 9, 11, 1½. How many of each type of bracket should be produced per hour in order to keep all the machines fully occupied?

Solution

The four equations could be set up in the same way as done earlier, each equation representing the total minutes of work per hour on a particular type of machine. The numbers on the right-hand sides would therefore be 420, 120, 180 and 60 respectively. The coefficients on the left-hand sides form the determinant constituting the denominator in the solution. This can be written out directly from the information given in the question and then evaluated:

| 7    5    14   26 |                 | 7  5  14  26 |
| 3    0    6    9  |  =  3 × ½ × ½ × | 1  0  2   3  |
| 2½   1½   9    11 |                 | 5  3  18  22 |
| 1½   ½    3½   1½ |                 | 3  1  7   3  |

The operations have been as follows: 3 has been taken out as a common factor from the second row, and ½ from each of the third and fourth rows. Next, the second row multiplied by 7, 5 and 3 respectively is subtracted from the first, third and fourth rows respectively, leaving the 1 in the second row as the only non-zero element of the first column, about which the determinant is then expanded (the sign is negative since j = 2, k = 1 and j + k is odd):

      | 0  5  0  5  |
= ¾ × | 1  0  2  3  |  =  ¾ × (−1) × | 5  0  5  |
      | 0  3  8  7  |                | 3  8  7  |
      | 0  1  1  −6 |                | 1  1  −6 |

Finally, subtracting the first column from the third column and expanding along the first row:

= −¾ × | 5  0  0  |  =  −¾ × 5 × (8 × (−7) − 4 × 1)  =  −¾ × 5 × (−60)  =  225
       | 3  8  4  |
       | 1  1  −7 |

The numerator for type A appears more formidable at first, but can be quickly simplified. Taking the common factor 60 out of its first column and proceeding with similar row and column operations reduces it step by step:

| 420  5    14   26 |
| 120  0    6    9  |  =  225 × 11  =  2475
| 180  1½   9    11 |
| 60   ½    3½   1½ |
Since the denominator is 225, the number of brackets per hour of type A in the solution is 2475/225 = 11. There are no new problems involved in finding the solution for the other three types of bracket, and so this task is left for you. You should note that it has not been necessary either to introduce symbols for the four unknowns or to write out the four equations. Most managerial applications of determinants are as straightforward as this example. Determinants are useful when you have to solve a number of equations, but are difficult to use when you have to look at computerization of the firm's operations. We then turn to matrices, to see how these can be used to solve managerial problems.

Brand Switching and Markov Chains

Let us take up a managerial application of matrices which is of theoretical as well as practical interest. It is mostly used by advertising agencies and big companies in brand management.

Illustration 5.7

Three brands of detergent share the market: 40% of customers buy brand A, 50% brand B, and 10% brand C. Each week there are changes in the customers' choices. Of those who bought brand A the previous week, 50% buy it again, but 15% change to brand B and 35% to brand C. Of those who bought brand B, 60% buy it again, 10% buy brand A and 30% buy brand C. Of those who bought brand C, 85% buy it again, 5% buy brand A and 10% buy brand B. What proportion of the market will each of the three brands eventually hold?

Solution

It is simplest to express the brand-switching percentages as decimals, keeping percentage figures for the market shares. The change in market shares in the first week can be obtained as the product of a matrix representing the brand switching and a vector representing the initial market shares:

| 0.50  0.10  0.05 |   | 40 |   | 25.5 |
| 0.15  0.60  0.10 | × | 50 | = | 37.0 |
| 0.35  0.30  0.85 |   | 10 |   | 37.5 |

It will easily be seen that the terms involved in this product, (0.50 x 40), (0.10 x 50), etc., are the correct calculations from the information given in the example


in order to obtain the new market shares. In this type of model, each of the columns of the matrix adds up to 1 and the elements in each vector total 100. For the following week, the new market-share vector must multiply the same brand-switching matrix:

| 0.50  0.10  0.05 |   | 25.5 |   | 18.325 |
| 0.15  0.60  0.10 | × | 37.0 | = | 29.775 |
| 0.35  0.30  0.85 |   | 37.5 |   | 51.900 |

As could have been guessed from the original information, brand C is getting a steadily larger share of the market. It cannot, however, obtain a monopoly. A little arithmetic will show that if brand C has 80% of the market one week, it cannot have more than 75% the next week. Clearly its eventual share will be somewhere between 51.9% and 75%, but obviously a direct method of finding the eventual share is desirable.

Let the brand-switching matrix be M and the successive market-share vectors a, b, c, ..., so that the above two equations can be written as:

Ma = b
Mb = c

A vector x will represent the final market shares such that pre-multiplying it by the brand-switching matrix produces an unchanged result. That is:

Mx = x

Using I for the unit matrix and 0 for a zero matrix or vector, this can be rearranged:

Mx = Ix
Mx − Ix = 0
(M − I)x = 0

The fact that the last equation is equivalent to the preceding one depends on the property of matrix multiplication termed distributive. The full summary of the properties of matrix multiplication is that it is associative and distributive but not, in general, commutative. The latter property makes it essential to place M and I before x in both equations to be sure that they are equivalent. Since M must be a square matrix and I is chosen to have the same number of rows and columns as M, there is no difficulty in finding the matrix (M − I):

| 0.50  0.10  0.05 |   | 1  0  0 |   | −0.50   0.10   0.05 |
| 0.15  0.60  0.10 | − | 0  1  0 | = |  0.15  −0.40   0.10 |
| 0.35  0.30  0.85 |   | 0  0  1 |   |  0.35   0.30  −0.15 |

One cannot find x by obtaining the inverse of (M − I), because it is a singular matrix. Since each column of M totals 1, each column of (M − I) must total zero. The determinant of this matrix must therefore be equal to zero, applying properties (3) and (5) of the determinants section in adding the second and subsequent rows to the first row. It follows that (M − I) will always be a singular matrix in this type of problem. The equations represented by the matrix equation are clearly not all independent: when it has been stated what proportions of previous customers are retained or gained by all brands except the last, the equation giving the market share of the last brand is made up of what is left.

There is an additional fact not included in the matrix equation: expressing the market shares as percentages, all the elements in the vector x must total 100. This gives an additional row to permit a unique solution to be found by row operations:

| −0.50   0.10   0.05 |       |   0 |
|  0.15  −0.40   0.10 |  x  = |   0 |
|  0.35   0.30  −0.15 |       |   0 |
|  1      1      1    |       | 100 |

In the process of solution, one of the rows will become a row of zeros and the remaining rows will give a unique solution. It is left to you to confirm that the final shares are respectively 1200/109, 2300/109 and 7400/109, or approximately 11.0%, 21.1% and 67.9%.

The solution can be checked by forming it into a vector and pre-multiplying this vector by the brand-switching matrix. It will be noted that the information about the initial market shares is not used in finding the solution: the final market shares will be exactly the same whatever shares the brands started with.

As a practical application, the brand-switching example has two drawbacks. First, repeated matrix multiplications will eventually involve fractions of customers, which is impossible. Second, it is highly improbable that there will be a persistent pattern of brand switching; either customers will become less inclined to switch brands, or the pattern of switching will be disrupted by special sales campaigns or dynamic market forces. However, the model has many other applications more realistic than brand switching, particularly when the discussion is transferred from proportions to probabilities. Where an operator attends several machines which are subject to random stoppages at differing average frequencies, so that two or more may be stopped at the same time causing machine interference, the technique discussed here enables the average productivity of each machine to be accurately calculated. It can also be used to calculate average stock levels and the probability of running out of stock when demand and supply are random, and so assist in finding the optimum stock-holding policy. The model has the impressive title of ergodic Markov chains.
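The iteration described in Illustration 5.7 is easy to sketch in code: repeated pre-multiplication of the market-share vector by the switching matrix converges to the equilibrium shares, whatever the starting shares. The helper below is illustrative only.

```python
# Brand-switching (Markov chain) iteration from Illustration 5.7.
def step(m, shares):
    # One week: pre-multiply the share vector by the switching matrix.
    return [sum(m[i][j] * shares[j] for j in range(3)) for i in range(3)]

M = [[0.50, 0.10, 0.05],
     [0.15, 0.60, 0.10],
     [0.35, 0.30, 0.85]]

shares = [40.0, 50.0, 10.0]   # initial shares of brands A, B, C (per cent)
for _ in range(200):          # iterate until the vector stops changing
    shares = step(M, shares)

print([round(s, 1) for s in shares])   # → [11.0, 21.1, 67.9]
```

The limit agrees with the row-operations solution (1200/109, 2300/109 and 7400/109), and starting from a different share vector gives the same result.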

Student Activity
1. Solve by Cramer's Rule:
   2x + y + z = 7
   3x − y + z = −2
   x + 2y − 3z = −4

2. Solve by the Matrix Inverse Method:
   x1 + 2x2 − x3 = 3
   3x1 − x2 + 2x3 = 1
   2x1 − 2x2 + 3x3 = 2

Summary
Matrices form one of the most powerful tools of modern management and of modern mathematics. Two vectors can be multiplied together only if both have the same number of elements. Multiplication of a row vector by a column vector, which always results in a scalar, is called the scalar multiplication of vectors. A matrix is a rectangular or square array of numbers arranged into rows and columns, where the numbers acquire meaning from their position in the array. Matrix subtraction is non-commutative. Only square matrices have determinants. The value of a determinant is unchanged if the rows and columns are interchanged with each other. The value of a determinant is multiplied by −1 if any row is interchanged with any other row. If any multiple of any row or column is added to (or subtracted from) any other row or column respectively, the value of the determinant is unchanged. If any row or column has a factor common to all its elements, then this factor may be divided out.

Keywords
Vector: A vector is any row or column of figures in a specified sequence.
Null or Zero Matrix: A matrix all of whose elements are 0 (zero) is called a null matrix or zero matrix.
Row matrix: A matrix that has a single row is called a row matrix.
Column matrix: A matrix that has a single column is called a column matrix.
Square matrix: A matrix that has an equal number of rows and columns is called a square matrix.
Unit Matrix: A square matrix having all its diagonal elements equal to 1 and the rest of its elements zero is called a unit matrix.
Diagonal Matrix: A square matrix all of whose non-diagonal elements are zero is called a diagonal matrix.
Scalar Matrix: A diagonal matrix all of whose diagonal elements are equal is called a scalar matrix.


Triangular matrix: A square matrix all of whose elements below the diagonal are zero is called an upper triangular matrix. If all the elements above the diagonal are zero it is called a lower triangular matrix. In either case it is simply called a triangular matrix.
Sub-matrix: A matrix obtained from a given matrix by deleting one or more rows and/or columns is called a sub-matrix of the given matrix.
Determinant: A determinant is a scalar number that represents some characteristic of a square matrix.


Cramer's Rule: Cramer's rule enables one to obtain the solutions of a system of linear equations using determinants alone.

Review Questions
1. Four boys order in a fish-and-chips restaurant. A orders fish, chips and coke. B orders two fish with chips. C orders fish and coke. D orders chips and coke. The prices are Rs 50 for fish, Rs 18 for chips, and Rs 15 for coke.
   (a) Express each boy's order as a row vector.
   (b) Add together these four vectors to obtain a fifth row vector representing the total quantities ordered.
   (c) Express the prices as a column vector.
   (d) Multiply each of the five row vectors by the price vector, to obtain the amount owed by each boy and the total amount owed.
   (e) Check that the fifth result in (d) is equal to the sum of the other four results.

2. A manufacturer produces three products A, B, C which he sells in the market. Annual sale volumes are indicated as follows:

                      Products
   Market        A         B         C
   I            8,000    10,000    15,000
   II          10,000     2,000    20,000

   (i)  If the unit sale prices of A, B and C are Rs 2.25, Rs 1.50 and Rs 1.25 respectively, find the total revenue in each market with the help of matrices.
   (ii) If the unit costs of the above three products are Rs 1.60, Rs 1.20 and Re 0.90 respectively, find the gross profit with the help of matrices.


3. In a development plan of a city, a contractor has taken a contract to construct certain houses for which he needs building materials like stone, sand, etc. There are three firms A, B, C that can supply him these materials. At one time these firms A, B, C supplied him 40, 35 and 25 truck loads of stones and 10, 5 and 8 truck loads of sand respectively. If the costs of one truck load of stone and of sand are Rs 1,200 and Rs 500 respectively, find the total amount paid by the contractor to each of these firms A, B and C separately.

4. Robin Singh & Company Ltd stocks toothpastes of the Cibaca brand and the Colgate brand. The matrix of transition probabilities of the toothpastes is shown below:

                           Cibaca Toothpaste   Colgate Toothpaste
   Cibaca Toothpaste             0.9                  0.1
   Colgate Toothpaste            0.3                  0.7

   Determine the market share of each of the brands in the equilibrium position.

5. Evaluate the following determinants, first using property (4) three times, then property (3) once or twice to make the numbers smaller, and finally property (9), or other properties as appropriate.

   (a) 2 4 12 (d) 1 6 1 4 6. 3 6 24 3 12 6 9 1 1 2 1 4 3 2 2 12 8 2 (b) -3 2 7 -44 30 5 35 -18 -63 (e) 2 3 5 12 (c) 32 28 34 3 -1 -3 -6 35 10 40 6 3 3 9 3 4 3 4 8 4 12

6. Nahar Spinning Mills produces three varieties of thread: superfine grade (A grade), fine grade (B grade) and coarse grade (C grade). The total annual sales (in Rs lacs) of these products for the years 1999 and 2000 in four cities are given below. Find the total sales of the three varieties of thread for the two years.

   For the year 1999:

   Product \ City              Calcutta   Mumbai   Chennai   Delhi
   Superfine thread (A grade)     30        10        16       48
   Fine thread (B grade)          12        14        24       16
   Coarse thread (C grade)        10        16        62       12

   For the year 2000:

   Product \ City              Calcutta   Mumbai   Chennai   Delhi
   Superfine thread (A grade)     34        10        26       20
   Fine thread (B grade)          44        12        10       22
   Coarse thread (C grade)        78        10         8       10

Further Readings
P. N. Mishra, Quantitative Techniques for Managers, Excel Books
R. S. Bhardwaj, Business Mathematics, Excel Books
D. R. Anderson, D. J. Sweeney and T. A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company

Unit 6 Mathematical Induction

Unit Structure

Introduction
Induction and Deduction
Principle of Mathematical Induction
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Differentiate between induction and deduction
Define mathematical induction
Describe the two general steps of mathematical induction
Prove identities using mathematical induction

Introduction
One of the major uses of mathematics is to simplify a rather complex mathematical expression encountered in managerial practices related to decision-making. A complex expression can be reduced to an equivalent but simpler expression, provided both expressions yield identical values. Very often one needs a mechanism to prove or disprove a mathematical identity. Induction is a very powerful tool for proving mathematical identities. This unit explains and explores the principles of mathematical induction.

Induction and Deduction


Induction and deduction are two very powerful tools used in logical reasoning. These methods are employed to establish the validity of a logical statement. Deduction proceeds from general rules to the validity of a particular case, while induction approaches in just the opposite direction, i.e., from particular instances to general rules. For instance, if you examine 100 cases of fresh recruits and observe that they all tend to look for a better initial placement, you may conclude that Ramesh, a newly recruited management trainee, would be looking for a better opening somewhere. This is deduction - from the general tendency to the particular case. Now, assume that from past experience you have observed that a new recruit who satisfies a certain set of conditions is sure to look for another employment opening. Extending this observation to all new recruits, you may conclude that the ones satisfying the conditions will look for another job. This reasoning process is induction - from particular to general. The two approaches are depicted in the following diagram:

                        ---- Induction --->
Particular Instances                          General Rule(s)
                        <--- Deduction ----

Though both the approaches suffer from inaccuracies and fallacies, sometimes these are the only methods to arrive at a timely decision. This is inevitable, given the fact that the assumptions in these cases are too general. The principles of induction can be applied in a variety of mathematical contexts where one needs to prove or disprove a particular mathematical identity. Unlike the logical scenario, mathematical induction is accurate, more so because it deals with natural numbers.

Student Activity
With the help of suitable examples differentiate between inductive and deductive logic.

Principle of Mathematical Induction


In situations where one is interested to prove whether a given statement is true for all natural numbers or not, the principle of mathematical induction provides an easy solution. The premise on which the principle of mathematical induction is based is the well-ordered property of the set of natural numbers, which says: "...corresponding to each natural number n there exists a natural number n+1".

The principle of mathematical induction was probably used first of all by Francesco Maurolico, to prove that the sum of the first n odd positive integers is n^2.

Mathematical induction attempts to prove that a given statement is true for all the values of an infinite sequence through proving two sub-statements:
1. Prove that the given statement is true for the first element of the infinite sequence, and then
2. Prove that if the statement is true for any element then it is also true for the next element of the sequence.

Punjab Technical University 113


The sequence in question may be any sequence. However, the majority of induction problems pertain to the sequence of natural numbers. Induction on natural numbers involves the following two steps:
1. The basis step: In this step it is proved that the statement is true for n = 1, i.e., the first element of the sequence.
2. The induction step: In this step the statement is shown to be true for n = m+1 whenever it is true for n = m.

The method is elucidated in the following illustrations.

Illustration 6.1

Apply the principle of mathematical induction to prove that for all natural numbers n the following identity is true:

1^2 + 2^2 + 3^2 + 4^2 + ... + n^2 = n(n + 1)(2n + 1)/6

Solution: Let the above proposition be represented by P. Thus,

P(n): 1^2 + 2^2 + 3^2 + 4^2 + ... + n^2 = n(n + 1)(2n + 1)/6

1. The basis step: Prove that P(1) is true. Putting n = 1 in the statement, we get

   P(1): 1^2 = 1(1 + 1)(2 × 1 + 1)/6
   LHS = 1^2 = 1
   RHS = (1 × 2 × 3)/6 = 1
   LHS = RHS, so P(1) is true.

2. The induction step: Prove that P(k + 1) is true assuming that P(k) is true for some natural number k. Let

   P(k): 1^2 + 2^2 + 3^2 + 4^2 + ... + k^2 = k(k + 1)(2k + 1)/6

   To prove that P(k + 1) is true, i.e.,

   P(k + 1): 1^2 + 2^2 + 3^2 + 4^2 + ... + k^2 + (k + 1)^2 = (k + 1)(k + 2)(2k + 3)/6

   LHS = [1^2 + 2^2 + 3^2 + 4^2 + ... + k^2] + (k + 1)^2
       = k(k + 1)(2k + 1)/6 + (k + 1)^2          [since P(k) is true]
       = (k + 1)[k(2k + 1) + 6(k + 1)]/6
       = (k + 1)[2k^2 + 4k + 3k + 6]/6
       = (k + 1)[2k(k + 2) + 3(k + 2)]/6
       = (k + 1)(k + 2)(2k + 3)/6
       = RHS

Therefore, P(k + 1) is true whenever P(k) is true. Hence, by mathematical induction, P(n) is true for all natural numbers.

Illustration 6.2

Prove that for all natural numbers n, n^3 + 2n is divisible by 3.

Solution: Let P(n): n^3 + 2n is divisible by 3.

1. Basis step: To prove that P(1) is true, put n = 1 in P(n).

   P(1) = 1^3 + 2 × 1 = 1 + 2 = 3

   P(1) is true because 3 is divisible by 3.

2. Induction step: Let P(k) be true, i.e., let k^3 + 2k be divisible by 3, so that k^3 + 2k = 3t for some natural number t. To prove that P(k + 1) is also true:

   (k + 1)^3 + 2(k + 1) = (k^3 + 3k^2 + 3k + 1) + (2k + 2)
                        = k^3 + 3k^2 + 5k + 3
                        = (k^3 + 2k) + (3k^2 + 3k + 3)
                        = 3t + 3(k^2 + k + 1)
                        = 3(t + k^2 + k + 1), which is divisible by 3.

Hence, P(k + 1) is true whenever P(k) is true. Therefore, by the principle of mathematical induction, P(n) is true for all natural numbers.

Illustration 6.3

Prove that for every n ≥ 4, n! > 2^n.

Solution: Let P(n): n! > 2^n for all n ≥ 4.

1. Basis step: To prove that P(4) is true, put n = 4 in P(n).

   P(4): 4! > 2^4
   LHS = 4! = 4 × 3 × 2 × 1 = 24
   RHS = 2^4 = 16
   Since LHS > RHS, P(4) is true.

2. Induction step: Let P(k) be true for some k ≥ 4, i.e., k! > 2^k. To prove that P(k + 1), i.e., (k + 1)! > 2^(k + 1), is also true:

   LHS = (k + 1)! = (k + 1) k! > (k + 1) 2^k
   Since k + 1 > 2, we have (k + 1) 2^k > 2 × 2^k = 2^(k + 1)
   Therefore, (k + 1)! > 2^(k + 1).

Student Activity

Prove the following by mathematical induction:

1 + 11 + 111 + ... + 111...1 (n times) = (10^(n + 1) − 9n − 10)/81
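Induction establishes these identities for every natural number at once; even so, a numerical spot-check of the first few cases, in the spirit of the basis step, can be reassuring. The code below is illustrative only.

```python
import math

# Spot-check the three illustrations for a range of n.
def sum_of_squares(n):
    return sum(k * k for k in range(1, n + 1))

for n in range(1, 50):
    # Illustration 6.1: 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6
    assert sum_of_squares(n) == n * (n + 1) * (2 * n + 1) // 6
    # Illustration 6.2: n^3 + 2n is divisible by 3
    assert (n ** 3 + 2 * n) % 3 == 0

for n in range(4, 20):
    # Illustration 6.3: n! > 2^n for every n >= 4
    assert math.factorial(n) > 2 ** n

print("all cases checked")
```

Of course, no finite check replaces the induction step: only the proof guarantees the identities for all n.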

Summary
Induction and deduction are two very powerful tools used in logical reasoning. These methods are employed to establish the validity of a logical statement. Deduction proceeds from general rules to the validity of a particular case, while induction approaches in just the opposite direction, i.e., from particular to general cases. The principles of induction can be applied in a variety of mathematical contexts where one needs to prove or disprove a particular mathematical identity. The principle of mathematical induction is based on the premise that corresponding to every natural number n there exists a natural number n+1. Mathematical induction attempts to prove that a given statement is true for all the values of an infinite sequence through proving two sub-statements: first, prove that the given statement is true for the first element of the infinite sequence; then, prove that if the statement is true for any element, it is also true for the next element of the sequence.

Keywords
Induction and Deduction: These methods are employed to establish the validity of a logical statement.
The basis step: In this step it is proved that the statement is true for n = 1, i.e., the first element of the sequence.
The induction step: In this step the statement is shown to be true for n = m+1 whenever it is true for n = m.

Review Questions
1. Can the principle of mathematical induction be applied in cases other than natural numbers? Give some examples other than those of natural numbers where the principle of mathematical induction is applicable.
2. Prove that 1 + 2 + 3 + ... + n = n(n + 1)/2.
3. Prove that the number of elements in the power set of a set having n elements is 2^n.

Further Readings
E. R. Tufte, The Visual Display of Quantitative Information, Graphics Press
P. N. Mishra, Quantitative Techniques for Managers, Excel Books
D. R. Anderson, D. J. Sweeney and T. A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company

SECTION-II
Unit 7 Data Analysis Unit 8 Correlation and Regression


Unit 7 Data Analysis

Unit Structure
Introduction
Data Collection and Presentation
Frequency Distribution
Measure of Central Tendency
Mathematical Averages
Positional Averages
Commercial Averages
Measure of Dispersion
Skewness
Kurtosis
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Define frequency distributions of a set of data
Interpret frequency distributions
Compute and interpret averages
Compare the 3 M's of statistics
Measure and interpret dispersion
Compute and interpret skewness
Explain kurtosis of a frequency distribution
Obtain and interpret frequency distributions
Apply distributions in practical problems

Introduction
Data is the substrate of the decision-making process. Data is a measure of some observable characteristic of a set of objects of interest. Statistics is a vast area of applied mathematics wherein data are collected, classified, presented and analyzed for a specific purpose.

Data Collection and Presentation


The data collected could exist in terms of either:
Qualitative variables, or
Quantitative variables.

Frequency Distribution
Data can be collected either through sampling or otherwise. Data in this unprocessed form is called raw data. The data at this level can be arranged in an array. For example, if you collected data on electricity consumption for one day from 1000 households, you would get an array with 1000 rows and two


columns, 1000 rows for the households and the two columns for house numbers and electricity consumption respectively. Raw data is not very useful in drawing inferences. In our example, we can process the data so as to show the number of houses which are using electricity within a particular range, the number of households which are consuming electricity beyond a certain value, etc. The table of electricity consumption ranges and number of houses is shown below:
Table 7.1: Electricity Consumption

Electricity Consumption (kilowatts)    Number of houses
0-9                                            1
10-19                                          3
20-29                                          5
30-39                                         10
40-49                                         20
50-59                                         35
60-69                                         50
70-79                                         70
80-89                                        100
90-99                                        130
100-109                                      130
110-119                                      100
120-129                                       70
130-139                                       50
140-149                                       35
150-159                                       20
160-169                                       10
170-179                                        5
180-189                                        3
190-199                                        1

Note that we are not using the house numbers here, as that information is irrelevant for the purpose of determining how many houses use how much electricity per day. Also note that this is only a single day's utilization and is not indicative of the average utilization of electricity by a household; if the day happens to be a Sunday, for instance, the overall average would be lower than what you can infer from this data. This kind of overgeneralization is very common and is a frequent source of errors in real-life situations.

Frequency is the number of occurrences of a data item. A table such as the one shown above, which summarizes the number of cases against a column of interest, is called a frequency table. Frequency can also be plotted as a graph, when it is called a frequency distribution graph or, more appropriately, a frequency polygon, as shown below:

Figure 7.1: Frequency distribution graph of Table 7.1 (a frequency polygon; X-axis: electricity consumption in kilowatts, Y-axis: number of houses)

The data items or classes are plotted on the X-axis while the corresponding frequencies are plotted on the Y-axis. It is interesting to note that for a large number of data values the graph tends to assume a special shape called the normal shape. You will learn about it later. A more general form of a frequency distribution graph is shown below. The same will be used throughout the text.

Figure 7.2: General shape of a frequency distribution (X-axis: electricity consumption in kilowatts; Y-axis: frequency)

It is important that a frequency distribution should have a suitable number of class intervals. By class intervals we mean the ranges into which we classify the number of items; in the above case 10 is the class interval used for electricity consumption. If too few classes are used, the original data will be so compressed that little information is available. If too many classes are used, there will be too few items in each class, and the frequency polygon will be irregular in appearance. There are basically three precautions that must be kept in mind when determining the class intervals:
1. We must select the class interval so that the mid-values of the classes coincide, as far as possible, with any concentration of items that may be present.
2. We should avoid open-ended classes like below 10, above 100, more than 10 or less than 200.
3. The class intervals should usually be uniform.
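Building a frequency table like Table 7.1 from raw data can be sketched as follows; the consumption figures below are invented, and the class width of 10 matches the table above:

```python
# Bin raw consumption data into uniform class intervals of width 10.
consumption = [12, 7, 55, 63, 41, 48, 33, 58, 29, 51, 44, 66, 38, 57, 49]

width = 10
counts = {}
for value in consumption:
    lower = (value // width) * width          # lower bound of the class
    label = f"{lower}-{lower + width - 1}"    # e.g. "40-49"
    counts[label] = counts.get(label, 0) + 1

# Print the frequency table in ascending order of class interval.
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(label, counts[label])
```

The uniform width and closed intervals follow the precautions listed above; open-ended classes would not fit this simple binning rule.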



Measure of Central Tendency


One of the important activities involving data sets is the comparison of values belonging to the data sets. This is not as simple as comparing two values; for this purpose we need a single value that may represent the whole set of data. In a series of statistical data, the parameter which reflects a central value of the series is called the central tendency. Central tendency refers to a single value that represents the whole set of data. We will discuss some of the most important measures of central tendency.

Average

An average can be defined as a central value around which the other values of a series tend to cluster. An average is computed to give a concise picture of a large group. By the use of an average, complex groups of large numbers are presented in a few significant words or figures. Averages help in obtaining a picture of the universe with the help of a sample; although the sample and the universe differ in size, their averages may be very much identical. Averages give a mathematical concept to the relationship between different groups. For example, the trees in one forest are taller than in another forest, but in order to find any definite ratio of heights it is essential to resort to averages. An average is representative of the entire data set essentially because of three reasons:
i)   Ordinarily most of the values of a series cluster in the middle,
ii)  At the extreme ends the number of items is usually very little, and
iii) Ordinarily items with values less than the average cancel out the items whose values are greater than the average.

An average should be affected as little as possible by sampling fluctuations, i.e., for different samples of the same population the variation in the average should be very little. An average should also be capable of algebraic treatment so that it can be used for further mathematical manipulation. Averages may be classified into three broad types:

i)   Mathematical Averages:
     a) Arithmetic mean
     b) Geometric mean
     c) Harmonic mean

ii)  Positional Averages:
     a) Mode
     b) Median

iii) Commercial Averages:
     a) Moving average
     b) Progressive average
     c) Quadratic average

Mathematical averages are those which use mathematical formulas for the calculation of their values. Positional averages do not use mathematical calculations but give an indication of the positional characteristics of certain items. Commercial averages are applications of averages in commercial situations. If so many varieties of averages exist, the question arises: which one should be used? As we go ahead we will see that each type has a specific application and should be used only in that case.

Mathematical Averages
Arithmetic Mean

Most of the time when we refer to the average we are talking about the arithmetic mean. This is true in cases like the average winter temperature in Delhi, the average life of a flashlight battery, or the average working hours of an executive. The arithmetic mean, or simply the mean (represented by putting a bar above the variable name), is the quantity obtained by dividing the sum of the values of the items (ΣX) in a variable by their number (n), i.e., the number of items:

X̄ = ΣX / n

For instance, the arithmetic mean of 3, 4, 5, 6, 7 is:

X̄ = (3 + 4 + 5 + 6 + 7)/5 = 5

Looking at the formula above, we can say that the algebraic sum of the differences (called deviations) of the individual items from the arithmetic mean is zero. Moreover, the sum of the squares of the deviations of the individual items is minimum when taken from the arithmetic mean rather than from any other value.

This means that if any one or more items in the group are replaced by new items, the new arithmetic mean is the old mean changed by the net change divided by the number of items. For example, if the values 3 and 4 in the above example change to 8 and 9 (a total change of 10), the mean can be calculated in either of the two ways mentioned below:

New X̄ = (8 + 9 + 5 + 6 + 7)/5 = 7

OR

New X̄ = old X̄ + (Change in total value)/n = 5 + 10/5 = 7

Although in this example it would have been faster to do it the original way, the alternate method assumes more and more significance as the number of items goes up.
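Both routes can be sketched in Python; the update form adds the net change over n to the old mean:

```python
# Update property of the mean: new mean = old mean + (net change)/n.
data = [3, 4, 5, 6, 7]
n = len(data)
old_mean = sum(data) / n                      # 5.0

new_data = [8, 9, 5, 6, 7]                    # 3 and 4 replaced by 8 and 9
net_change = sum(new_data) - sum(data)        # 10
new_mean_direct = sum(new_data) / n           # 7.0
new_mean_update = old_mean + net_change / n   # 5 + 10/5 = 7.0
print(new_mean_direct, new_mean_update)
```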

The above expression can be used to obtain the mean when the data is available in raw form. When the data is given in the form of a frequency distribution, as in Table 7.1, the mean is calculated using a variation of the above formula:

X̄ = Σfx / n

Here, f stands for the frequency of the class, x stands for the mid-value of the class and n stands for the total of all frequencies in all classes. Table 7.1, revised, is reproduced below as Table 7.2.
Table 7.2: Frequency distribution of Electricity Consumption

Electricity Consumption (kilowatts)   Mid-value (x)   Number of houses (f)   fx
0-9                                          5                  1                5
10-19                                       15                  3               45
20-29                                       25                  5              125
30-39                                       35                 10              350
40-49                                       45                 20              900
50-59                                       55                 35             1925
60-69                                       65                 50             3250
70-79                                       75                 70             5250
80-89                                       85                100             8500
90-99                                       95                130            12350
100-109                                    105                130            13650
110-119                                    115                100            11500
120-129                                    125                 70             8750
130-139                                    135                 50             6750
140-149                                    145                 35             5075
150-159                                    155                 20             3100
160-169                                    165                 10             1650
170-179                                    175                  5              875
180-189                                    185                  3              555
190-199                                    195                  1              195
Total                                                         848            84800

Applying this formula to the table we get:

X̄ = Σfx / n = 84800 / 848 = 100 kilowatts
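The grouped-data mean can be sketched in Python, with the mid-values and frequencies taken from Table 7.2:

```python
# Mean of grouped data: mean = (sum of f*x) / (sum of f).
mid_values  = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95,
               105, 115, 125, 135, 145, 155, 165, 175, 185, 195]
frequencies = [1, 3, 5, 10, 20, 35, 50, 70, 100, 130,
               130, 100, 70, 50, 35, 20, 10, 5, 3, 1]

n = sum(frequencies)                                          # total houses: 848
total = sum(f * x for f, x in zip(frequencies, mid_values))   # sum of f*x: 84800
mean = total / n
print(n, total, mean)   # 848 84800 100.0
```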

In the case where cumulative percentage distribution is given, grouped frequency distribution is derived from the cumulative percentage distribution and then the usual procedure is applied for computing the mean.

Open-ended Frequency Tables


At times frequency tables are expressed with open-ended class intervals. In almost all cases such classes can be approximated by corresponding close-ended classes. Let us consider some examples.

Punjab Technical University

125

Quantitative Techniques

Class        fi
Below 100     7
Below 200    27
Below 300    45
Below 400    66
Below 500    75

This table can be transformed into the following table with fixed-ended classes.

Class        fi
0 - 100      7
100 - 200    27 - 7 = 20
200 - 300    45 - 27 = 18
300 - 400    66 - 45 = 21
400 - 500    75 - 66 = 9

Consider yet another form of frequency table, as given below.

Class    fi
0-9       7
10-19    20
20-29    18
30-39    21
40-49     9

Such classes can be transformed into the ones given below:

Class          fi
0 - 9.5         7
9.5 - 19.5     20
19.5 - 29.5    18
29.5 - 39.5    21
39.5 - 49.5     9

Linear Transformation Method

One interesting property of the arithmetic mean is that if a constant quantity is added to, subtracted from, multiplied into or divided into each of the items of a set of data, the arithmetic mean undergoes the same operation with that quantity. Thus, for the following set of n data items:

AM(x) = (x1 + x2 + ... + xn)/n

Transforming each data item in the following manner:

yi = xi/b + c − d,   for all i = 1, 2, 3, ..., n

Now,

AM(y) = (1/n) Σ yi = (1/n) Σ (xi/b + c − d) = (1/b)·(1/n) Σ xi + (1/n)(nc − nd) = AM(x)/b + c − d

This property can be used to simplify the calculation of the arithmetic mean where the data values are large in magnitude. The idea is illustrated in the following example. Let the data values be:

Data items (Xi):   1000  1100  1200  1300  1400  1500
Frequency (fi):      60   110    10   100   120   100

Transforming the Xi's as Yi = (Xi − 1300)/100 we get:

Data items (Xi)   Frequency (fi)   Yi = (Xi − 1300)/100   Yi·fi
1000                60             −3                     −180
1100               110             −2                     −220
1200                10             −1                      −10
1300               100              0                        0
1400               120              1                      120
1500               100              2                      200
Total              500                                     −90

This yields:

AM(Y) = (1/500) Σ Yi·fi = −90/500

Applying the reverse transformation, X = 100 × Y + 1300, we get

Student Activity

1. Compute the arithmetic mean for the following data.

   Class:      0-10  11-20  21-30  31-40  41-50
   Frequency:   45    67     78     34     12

AM(X) = 100 × (−90/500) + 1300 = −18 + 1300 = 1282
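The coding (linear transformation) route and the direct route can both be sketched in Python; here A = 1300 and h = 100, as in the example:

```python
# Coding method: with Y = (X - A)/h, AM(X) = h*AM(Y) + A.
xs = [1000, 1100, 1200, 1300, 1400, 1500]
fs = [60, 110, 10, 100, 120, 100]
A, h = 1300, 100

n = sum(fs)                                                # 500
am_y = sum(f * (x - A) / h for x, f in zip(xs, fs)) / n    # -90/500 = -0.18
am_coding = h * am_y + A                                   # 1282.0
am_direct = sum(f * x for x, f in zip(xs, fs)) / n         # 1282.0
print(am_y, am_coding, am_direct)
```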

The same can be crosschecked using the direct method, as shown below:

Data items (Xi)   Frequency (fi)   Xi·fi
1000                60              60000
1100               110             121000
1200                10              12000
1300               100             130000
1400               120             168000
1500               100             150000
Total              500             641000

2. What should be the value of x in the following data if the arithmetic mean is 25?

   Class:      0-10  11-20  21-30  31-40  41-50
   Frequency:   15    67     x      34     12

Here also, AM(X) = 641000/500 = 1282.

If two or more groups contain respectively N1, N2, N3, ... observations with means X̄1, X̄2, X̄3, ... respectively, then the combined mean (X̄) of the composite group is given by the relation:

X̄ = (N1·X̄1 + N2·X̄2 + N3·X̄3 + ...) / (N1 + N2 + N3 + ...)

Here N stands for the sum in the denominator (N1 + N2 + N3 + ...). Arithmetic means are very frequently used in statistical analyses, though they also suffer from certain limitations.

Weighted Average

In calculating the simple arithmetic mean it is assumed that all items are equal in importance. This may not always be the case. When items vary in importance they should be assigned weights in order of their relative importance. For calculating the weighted arithmetic mean, the value of each item is multiplied by its weight, the products are summed and the total is divided by the sum of the weights, not by the number of items. The result is the weighted arithmetic average. Symbolically:

X̄w = (X1·w1 + X2·w2 + X3·w3 + ...) / (w1 + w2 + w3 + ...)

Here w1, w2, w3, ... stand for the respective weights of each of the items.

Weighted averages have important applications in trend analysis and forecasting. But a weighted average should be used only when any of the following conditions holds true:

i)   When the importance of all the items in a series is not equal.
ii)  When the items falling into different grades or classes of the same group show considerable variation, and it is desired to obtain an average which is representative of the whole group; in other words, when the classes of the same group contain widely varying frequencies.
iii) When percentages, rates or ratios are being averaged.
iv)  When there is a change either in the proportion of frequencies of items or in the proportion of their values.
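The weighted mean can be sketched as follows; the values and weights below are hypothetical:

```python
# Weighted arithmetic mean: sum(x*w) / sum(w). Values and weights are hypothetical.
values  = [70, 80, 90]     # e.g. scores
weights = [1, 2, 3]        # relative importance

weighted_mean = sum(x * w for x, w in zip(values, weights)) / sum(weights)
simple_mean = sum(values) / len(values)
print(weighted_mean, simple_mean)   # the weighted mean leans toward the heavier items
```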

It would not be improper to remind you that the simple mean and the weighted mean are the two most used means.

Geometric Mean

The geometric mean of a set of data is defined as the positive nth root of the product of all the values, n being the number of data items. Thus, if the given data items are x1, x2, x3, ..., xn, then:

G.M. = (x1 · x2 · x3 ··· xn)^(1/n)

For example, the geometric mean of 2 and 8 is given by:

G.M. = (2 × 8)^(1/2) = √16 = 4

The most important application of the geometric mean is in the construction of index numbers, i.e., averaging rates of change. For example, if you are investing in the stock markets and your money grows from Rs 1,00,000 to Rs 2,50,000 in three years, and you want to know the average percentage gain you are making per year over the three years, you can use this mean. The average growth factor is:

G = (xn / x0)^(1/n)

This leads us to the general formula for the geometric mean:

G.M. = (x1 · x2 ··· xn)^(1/n)

Mathematically speaking, the geometric mean is the nth root of the product of the n items of a series. The only problem with this formula is that it cannot be evaluated on a simple calculator, and this is the biggest drawback of the geometric mean. The geometric mean is also useful in skewed distributions and when averaging ratios.

Merits
- It gives comparatively little weight to extreme values.
- It is suitable for arithmetic and algebraic manipulation.
- It is reversible both ways and therefore suitable for ratios and percentages.

Demerits
- It cannot be used when any of the quantities is zero or negative.
- It is difficult to compute and requires more time in computation.
- It is difficult to understand.
- It gives less weight to large items, which sometimes may be a limitation, e.g., in computing average cost per unit.
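Both the simple geometric mean and the average-growth-rate use can be sketched in Python, reusing the 2-and-8 example and the Rs 1,00,000 → Rs 2,50,000 investment example:

```python
import math

# Geometric mean of 2 and 8: the square root of their product.
gm = math.prod([2, 8]) ** 0.5            # 4.0

# Average yearly growth factor over 3 years: G = (end/start)**(1/n).
start, end, years = 100_000, 250_000, 3
g = (end / start) ** (1 / years)         # growth factor per year
avg_gain_pct = (g - 1) * 100             # average percentage gain per year
print(gm, round(g, 4), round(avg_gain_pct, 2))
```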

Student Activity
1. If the arithmetic mean of two positive numbers is 15 and their geometric mean is 9, find their harmonic mean.
2. The weighted geometric mean of the 5 numbers 10, 15, 25, 12 and 20 is 17.15. If the weights of the first four numbers are 2, 3, 5 and 2 respectively, find the weight of the fifth number.

Harmonic Mean

The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the data items. Thus, for a simple harmonic mean of x1, x2, ..., xn:

H.M. = n / (1/x1 + 1/x2 + ... + 1/xn)

For a weighted harmonic mean, the above equation is rewritten as:

H.M. = (w1 + w2 + ... + wn) / (w1/x1 + w2/x2 + ... + wn/xn)

For example, the harmonic mean of 8 and 10 is given by:

H.M. = 2 / (1/8 + 1/10) = 2 / (9/40) = (2 × 40)/9 = 80/9 ≈ 8.89
Although harmonic mean is of limited use, it is less affected by extremely large observations than any other average. It is properly used to average rates where the weights are the numerators of the fractions used to compute the rates.
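The 8-and-10 example can be sketched and cross-checked against the standard library:

```python
import statistics

# Harmonic mean of 8 and 10: n / sum(1/x) = 80/9, roughly 8.89.
data = [8, 10]
hm = len(data) / sum(1 / x for x in data)
print(round(hm, 2))

# Cross-check against the standard library implementation.
assert abs(hm - statistics.harmonic_mean(data)) < 1e-9
```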

Positional Averages

Mode

The mode is that value of the data which has the maximum frequency (i.e., occurs most often). Thus the mode of a set of data is simply the value that is repeated most often. It is the most typical value and, therefore, the clearest example of a measure of central tendency. For example, suppose you leave for your office every morning and you recorded the following times for two weeks:

8.30  8.25  8.35  8.29  8.31  8.30  8.32  8.31  8.31  8.31
Notes

Data Analysis

One obvious observation is that you are quite punctual! If we obtain the frequency table for the data, as shown below:

Time    Frequency
8.25    1
8.29    1
8.30    2
8.31    4
8.32    1
8.35    1

the value 8.31 occurs most frequently (4 times) and is therefore the mode of the given data. For a continuous variable there will usually not be two items of exactly the same size (if measurements are made with sufficient precision), so our definition of the mode is somewhat vague. For this we group the data and then use this simple equation:

Mode = L + d1/(d1 + d2) × C

where

L  = lower boundary of the class containing the largest frequency
d1 = difference of the largest frequency and the frequency of the preceding class
d2 = difference of the largest frequency and the frequency of the succeeding class
C  = class interval

Application of this formula is illustrated in the following illustration.

Illustration

Calculate the mode of the following data.

xi:  20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60
fi:   4      6      20     12     33     17     8      2



To obtain the modal class, take the following steps:

1. Extend the table by 5 more columns.
2. Insert into the 1st column the sum of each group of two frequencies, starting from the top row.
3. Insert into the 2nd column the sum of each group of two frequencies, starting from the second row.
4. Insert into the 3rd column the sum of each group of three frequencies, starting from the top row.
5. Insert into the 4th column the sum of each group of three frequencies, starting from the second row.
6. Insert into the 5th column the sum of each group of three frequencies, starting from the third row.
7. Identify the highest frequency in each column.
8. The row that appears the maximum number of times among these highest-frequency groups is taken to be the modal class.

Applying the above steps to our data we get:

xi        fi    (1)   (2)   (3)   (4)   (5)
20-25      4    10          30
25-30      6          26          38
30-35     20    32                      65
35-40     12          45    62
40-45     33    50                58
45-50     17          25                27
50-55      8    10
55-60      2

Now analyse the table to identify the row that appears most often in the highest-frequency groups. The highest value in each of the columns fi, (1), (2), (3), (4) and (5) is 33, 50, 45, 62, 58 and 65 respectively, and every one of these groups involves the class 40-45.

xi       Count
20-25     0
25-30     0
30-35     1
35-40     3
40-45     6
45-50     3
50-55     1
55-60     0

Since the row 40-45 appears the maximum number of times (6) among the highest-frequency groups, the modal class is 40-45.

Therefore,

L  = 40
d1 = 33 − 12 = 21
d2 = 33 − 17 = 16
C  = 5

Mode = 40 + 21/(21 + 16) × 5 = 40 + 2.838 = 42.838
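The grouped-mode formula can be sketched in Python. For brevity this sketch picks the modal class directly as the one with the largest frequency, which for this data agrees with the grouping method:

```python
# Grouped mode: Mode = L + d1/(d1 + d2) * C.
classes = [(20, 25), (25, 30), (30, 35), (35, 40),
           (40, 45), (45, 50), (50, 55), (55, 60)]
freqs = [4, 6, 20, 12, 33, 17, 8, 2]

i = freqs.index(max(freqs))              # modal class index -> class 40-45
L = classes[i][0]                        # 40
C = classes[i][1] - classes[i][0]        # 5
d1 = freqs[i] - freqs[i - 1]             # 33 - 12 = 21
d2 = freqs[i] - freqs[i + 1]             # 33 - 17 = 16
mode = L + d1 / (d1 + d2) * C
print(round(mode, 3))                    # 42.838
```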

Median

The median is the value of that item in a set of data which divides the data into two equal parts, one part consisting of all values less than it and the other of all values greater than it. Defined another way, the median is that value of central tendency which divides the total frequency into two halves. For example, consider the following data (n = 9):

8  7  12  11  9  5  4  11  10

Arranging the data in increasing order, we get:

Data value:  4    5    7    8    9    10   11   11   12
Position:    1st  2nd  3rd  4th  5th  6th  7th  8th  9th

The middle position number = (Number of data items (n) + 1)/2 = (9 + 1)/2 = 5
Hence Median = 5th value = 9. Note that the same result is obtained if the data is arranged in decreasing order. Also note that here the number of data values is odd and hence there is a single middle position. However, when the number of data items is even there will be two middle positions. The median in such cases is obtained by taking the arithmetic mean of the values corresponding to the two middle positions. For example, consider the following data (n = 10):

8  7  12  11  9  5  4  11  10  3

Arranging the data in increasing order, we get:

Data value:  3    4    5    7    8    9    10   11   11   12
Position:    1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th

Since the number of data values is even, there are two middle positions, given by:

(Number of data items)/2   and   (Number of data items)/2 + 1

The middle positions in this case are 10/2 = 5th and 10/2 + 1 = 6th.

Hence Median = mean of the 5th and 6th values = (8 + 9)/2 = 8.5

Calculation of the median from a simple series is very easy. If the data set contains an odd number of items, the middle item of the array after arrangement is the median. If there is an even number of items, the median is the average of the two middle items after arrangement.

Calculation of the median from a simple (ungrouped) frequency distribution is also easy. The cumulative frequency (less-than type) corresponding to each distinct value of the variable is calculated. If the total frequency is N, the value of the variable corresponding to cumulative frequency N/2 gives the median.

Calculation of the median from a grouped (continuous) frequency distribution is slightly more complex. The following formula is used:

Median = L + (N/2 − F)/f_m × C

where

L   = lower boundary of the median class
N   = total frequency
F   = cumulative frequency of the class immediately preceding the median class
f_m = frequency of the median class
C   = class interval or width of the median class

For example, consider the following illustration.

Illustration

Calculate the median of the following data.

xi:  20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60
fi:   12     18     35     42     50     45     20     8

Form the less-than cumulative frequency distribution table as shown below.

xi       fi    Less-than Cumulative Frequency
20-25    12     12
25-30    18     30
30-35    35     65
35-40    42    107
40-45    50    157
45-50    45    202
50-55    20    222
55-60     8    230

Total frequency (N) = 12 + 18 + 35 + 42 + 50 + 45 + 20 + 8 = 230, so N/2 = 230/2 = 115.

Since the cumulative frequency first reaches 115 in the class 40-45, this is the median class.

L   = 40
N   = 230
F   = 107
f_m = 50
C   = 5

Therefore, Median = 40 + (115 − 107)/50 × 5 = 40 + 0.8 = 40.8

Quartiles

Quartiles are another set of measures of positional central tendency. Like the median, the quartiles divide the entire set of data into four equal parts; each point of division is known as a quartile. Therefore, three quartiles are possible in a data set, as shown below. The general idea remains the same: the data values are arranged either in ascending or descending order.

First value ... First quartile (Q1) ... Second quartile (Q2) ... Third quartile (Q3) ... Last value


The first quartile (Q1) represents a value below which one fourth of the total data items fall. Similarly, half of the data items fall below Q2; clearly Q2 is the same as the median. Likewise, three fourths of the data items fall below Q3.

Deciles

In a manner similar to the median and quartiles, the data set can be divided into 10 equal parts when arranged either in ascending or descending order. Each point of division is called a decile. Thus, there are nine deciles, represented as D1, D2, ..., D9.

The interpretation of a decile is similar to that of the median and quartiles.

Percentiles

The data set can also be divided into 100 equal parts, whence each point of division is called a percentile. The 99 percentiles are represented by P1, P2, ..., P99.
A general formula for all the positional measures of central tendency for a grouped frequency distribution is given by:

Ti = Li + (i × N/n − C)/fi × h

where

Ti = the ith partition value
Li = lower limit of the ith partition class
N  = total frequency
n  = number of partitions or divisions
C  = cumulative frequency of the class immediately preceding the ith partition class
fi = frequency of the ith partition class
h  = class width

For the median, n = 2 and i = 1. Thus,

Median = L + (N/2 − C)/f_m × h

For the quartiles, n = 4 and i = 1, 2, 3; the Ti's are Q1, Q2 and Q3. Thus,

Qi = L + (i × N/4 − C)/f_Qi × h

and likewise, for example,

Q2 = L + (2 × N/4 − C)/f_Q2 × h   and   Q3 = L + (3 × N/4 − C)/f_Q3 × h

For the deciles, n = 10 and i = 1, 2, 3, ..., 9. Thus,

Di = L + (i × N/10 − C)/f_Di × h

and likewise, for example,

D7 = L + (7 × N/10 − C)/f_D7 × h

Student Activity
The following table gives the distribution of daily wages of 900 workers. However, the frequencies of the classes 40 - 50 and 60 - 70 are missing. If the median of the distribution is Rs 59.25, find the missing frequencies.

For the percentiles, n = 100 and i = 1, 2, 3, ..., 99. Thus,

Pi = L + (i × N/100 − C)/f_Pi × h

and likewise, for example,

P53 = L + (53 × N/100 − C)/f_P53 × h
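The general partition formula can be sketched in one small Python function (using the less-than cumulative convention), applied here to the data of the median illustration:

```python
# General positional measure: T_i = L + (i*N/n - C)/f * h.
def partition_value(bounds, freqs, i, n):
    N = sum(freqs)
    target = i * N / n
    cum = 0
    for (low, high), f in zip(bounds, freqs):
        if cum + f >= target:
            return low + (target - cum) / f * (high - low)
        cum += f
    raise ValueError("target beyond total frequency")

bounds = [(20, 25), (25, 30), (30, 35), (35, 40),
          (40, 45), (45, 50), (50, 55), (55, 60)]
freqs = [12, 18, 35, 42, 50, 45, 20, 8]

median = partition_value(bounds, freqs, 1, 2)   # n=2, i=1 -> median
q1 = partition_value(bounds, freqs, 1, 4)       # n=4, i=1 -> first quartile
d7 = partition_value(bounds, freqs, 7, 10)      # n=10, i=7 -> seventh decile
print(round(median, 2), round(q1, 2), round(d7, 2))
```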

Empirical Relation between Mean, Median and Mode

For a perfectly normal set of data the following relation holds between the mean, median and mode; for a data set which is not normal, the relation is only an approximation.

Mode = 3 Median − 2 Mean

Commercial Averages
Moving Average The moving average is an arithmetic average of data over a period and is updated regularly by replacing the first item in the average by the new item as it comes in. It is useful in eliminating the irregularity of time series and is generally computed to study the trend.


Suppose the prices for 12 months are given and a three-monthly average is to be computed. Then the first item in the 3-month moving average would be the average (a1 + a2 + a3)/3, the second item would be the average of the next three months, (a2 + a3 + a4)/3, and so on. The last item would be the average (a10 + a11 + a12)/3. As the next month comes in, a10 would be dropped and a13 would be added, giving (a11 + a12 + a13)/3, and so on.
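The sliding window can be sketched as follows; the 12 monthly prices are hypothetical:

```python
# 3-month moving average over 12 hypothetical monthly prices a1..a12.
prices = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 20, 22]
w = 3

# Each entry averages one window of w consecutive months.
moving_avg = [sum(prices[i:i + w]) / w for i in range(len(prices) - w + 1)]
print(moving_avg[0])    # (a1 + a2 + a3)/3 = 11.0
print(len(moving_avg))  # 10 window positions for 12 months
```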

Progressive Average

The progressive average is also calculated with the help of the simple arithmetic mean. It is a cumulative average: in the computation of a progressive average, the figures of all previous years are added and divided by the number of items. As the number of items goes up and reaches a desired number, we switch to the moving average.

Quadratic Average

The quadratic mean or average is estimated by taking the square root of the average of the squares of the items of a series. Symbolically,

Student Activity

Compute the quadratic average for the following values:

2  2  4  6  4  8  6  3  9  6  7  2

Q_m = √((a² + b² + c² + ...) / N)

where Q_m is the quadratic mean and a², b², c², ... are the squares of the different values.

The quadratic average is useful when some items have negative values and others positive values, because in such cases the mean is not very representative. It is also used in averaging deviations, rather than original values, when the standard deviation is computed.
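The point about negative values can be sketched with a small hypothetical set: the arithmetic mean is small because the signs cancel, while the quadratic mean reflects the magnitudes.

```python
import math

# Quadratic mean (root mean square) of values, some negative (hypothetical data).
data = [3, -4, 5]
am = sum(data) / len(data)                               # 4/3, not representative
qm = math.sqrt(sum(x * x for x in data) / len(data))     # sqrt(50/3), about 4.08
print(round(am, 2), round(qm, 2))
```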

Measure of Dispersion
We will consider several measures of dispersion, but there are only two that are used much: the range and the standard deviation.

Range

The range of a data set is the difference between the largest value and the smallest value:

R = Max(xi) − Min(xi)

For example, consider the following sets of data.

TEAM-A: 500  300  200  100  400
TEAM-B: 350  250  350  280  320

Here, AM(Team-A) = 300 and AM(Team-B) = 310.

Range(Team-A) = 500 − 100 = 400
Range(Team-B) = 350 − 250 = 100

A lower value of the range indicates that the data values in the set are more concentrated about the mean value.

Mean Deviation
Also known as the average deviation, the mean deviation is the mean of the absolute amounts by which the individual items deviate from the mean. The following procedure is usually applied:

1. Calculate the absolute deviations from the mean, removing any negative signs.
2. Add all the deviations.
3. Divide the sum of the deviations by the total number of items.

Symbolically, these steps may be summarized as follows:

MD = (1/n) Σ |xi − x̄|

where x̄ is the arithmetic mean of the variable x. For example, consider the following data.

X: 10  20  30  50  35  25  45  25

Here, x̄ = 240/8 = 30.

Item number   X     Deviation from mean   Absolute deviation from mean
1             10    −20                   20
2             20    −10                   10
3             30      0                    0
4             50     20                   20
5             35      5                    5
6             25     −5                    5
7             45     15                   15
8             25     −5                    5
Total        240      0                   80

Therefore, MD = (1/8) Σ |xi − x̄| = 80/8 = 10.

Mean deviation is simple and easy to understand and, unlike the range, it is affected by the value of each item. But it is unreliable because it varies from sample to sample taken from the same universe. It is also a biased estimator of the population dispersion. Therefore the standard deviation, discussed below, is the most often used measure of population dispersion.
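The three steps can be sketched in one line of Python over the same example data:

```python
# Mean deviation: MD = (1/n) * sum(|x - mean|) = 80/8 = 10.
X = [10, 20, 30, 50, 35, 25, 45, 25]
mean = sum(X) / len(X)                           # 240/8 = 30.0
md = sum(abs(x - mean) for x in X) / len(X)      # 10.0
print(mean, md)
```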


Standard Deviation and Variance

The standard deviation of a sample (SD) is similar to the mean deviation in that it considers the deviation of each X value from the mean. However, instead of using the absolute values of the deviations, it uses the squares of the deviations. These are added, divided by n, and the square root extracted. The formula for the standard deviation (usually represented by σ) is:

SD = σ = √( Σ (xi − x̄)² / n )

The variance is the square of the SD and is represented by:

Variance = V = σ² = Σ (xi − x̄)² / n

Student Activity

1. Compute the standard deviation of the first n natural numbers.
2. Compute the standard deviations and coefficients of variation for the following two sets of data.

   TEAM-A: 500  300  200  100  400
   TEAM-B: 350  250  350  280  320

The detailed calculation is shown below:

Item number   X     Deviation from mean   Square of deviation
1             10    −20                   400
2             20    −10                   100
3             30      0                     0
4             50     20                   400
5             35      5                    25
6             25     −5                    25
7             45     15                   225
8             25     −5                    25
Total        240      0                  1200

SD = σ = √(1200/8) = √150 = 12.25 (approx.)

Variance = σ² = (12.25)² ≈ 150.0

The concept of using sums of squares of deviations about the arithmetic mean of a distribution is very important and we will use it extensively in the chapters that follow.

Coefficient of Variation

To get an indication of the variation relative to the mean, we divide the standard deviation by the mean to get the coefficient of variation. This enables


us to compare more easily two groups which have different standard deviations and means.

Coefficient of variation = σ / x̄

It is independent of the unit of measurement of the data values. Hence it is more useful for comparison purposes. Sometimes the coefficient of variation is expressed in percentage form, as shown below.

Percent coefficient of variation = (σ / x̄) × 100
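The standard deviation, variance and coefficient of variation can be sketched together on the example data used above:

```python
import math

# SD, variance and coefficient of variation for X = 10, 20, 30, 50, 35, 25, 45, 25.
X = [10, 20, 30, 50, 35, 25, 45, 25]
n = len(X)
mean = sum(X) / n                                   # 30.0
variance = sum((x - mean) ** 2 for x in X) / n      # 1200/8 = 150.0
sd = math.sqrt(variance)                            # about 12.247
cv = sd / mean                                      # about 0.408 (unit-free)
print(variance, round(sd, 3), round(cv, 3))
```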
Interquartile Range

The interquartile range is an absolute measure of dispersion given by the difference between the third quartile (Q3) and the first quartile (Q1). Symbolically:

Interquartile range = Q3 − Q1

The range of a distribution is not a satisfactory measure, particularly when it contains extreme values. The presence of a very high and a very low observation unduly increases the range of the distribution. To avoid this difficulty, the interquartile range is used as a measure of dispersion. The interquartile range is the range of the middle 50% of observations. If the observations of a distribution are more densely concentrated around the median, the interquartile range will be less than half of the range. Further, if they do not have any concentration around the median, the corresponding interquartile range will be wide and will tend to be equal to half of the range.

Interpercentile Range

The difficulty of extreme observations can also be tackled by the use of the interpercentile range, or simply the percentile range. Symbolically:

Percentile range = P(100−i) − Pi   (i < 50)

This measure excludes i% of the observations at each end of the distribution and is the range of the middle (100 − 2i)% of the observations. Normally, a percentile range corresponding to i = 10, i.e., P90 − P10, is used. Since Q1 = P25 and Q3 = P75, the interquartile range is also a percentile range.

Quartile Deviation or Semi-Interquartile Range

Half of the interquartile range is called the quartile deviation or semi-interquartile range. Symbolically:

Q.D. = (Q3 − Q1)/2

The value of Q.D. gives the average magnitude by which the two quartiles deviate from the median. If the distribution is approximately symmetrical, then M ± Q.D. will include about 50% of the observations and, thus, we can write Q1 = M − Q.D. and Q3 = M + Q.D.

Further, a low value of Q.D. indicates a high concentration of the central 50% of observations, and vice versa. Quartile deviation is an absolute measure of dispersion. The corresponding relative measure is known as the coefficient of quartile deviation, defined as:

Coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1)

Analogous to the quartile deviation and the coefficient of quartile deviation, we can also define a percentile deviation, (P(100−i) − Pi)/2, and a coefficient of percentile deviation, (P(100−i) − Pi)/(P(100−i) + Pi), respectively.
Skewness

A normal frequency distribution is symmetrical about the mean, and a frequency distribution may be compared with a normal distribution for symmetry. Skewness is a measure of the lack of symmetry, or the degree of distortion from the symmetry exhibited by a normal distribution. Any measure of skewness indicates the difference between the manner in which the items are distributed in a particular distribution compared to a normal distribution. For a normal frequency distribution, mean = mode. If mean > mode the distribution is said to be positively skewed; otherwise it is called negatively skewed.

(Figures: discrete symmetric, negatively skewed and positively skewed frequency distributions, marking the positions of the mean M and mode Mo.)

Karl Pearson's Coefficient of Skewness

The most useful measure of skewness is Karl Pearson's coefficient of skewness, given by:

Sk = (Mean − Mode)/(Standard Deviation) = (x̄ − Mo)/σ

When the mode is not clear, or where there are two or three modes, the following formula is used:

Sk = 3(Mean − Median)/(Standard Deviation)
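Pearson's second coefficient can be sketched on a small hypothetical sample with a long right tail, where a positive value is expected:

```python
import math

# Karl Pearson's second coefficient of skewness: 3*(mean - median)/sd.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]   # hypothetical, long right tail
n = len(data)
mean = sum(data) / n                                      # 5.0
s = sorted(data)
median = (s[n // 2 - 1] + s[n // 2]) / 2                  # 4.0 (n is even)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)    # about 3.194
sk = 3 * (mean - median) / sd                             # positive, about 0.94
print(round(sk, 3))
```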

Kurtosis
Another characteristic of a frequency curve is its peakedness or flatness. Consider two continuous frequency distributions superimposed on each other for convenience of comparison. (Figure: two superimposed frequency curves, A flatter than B, with the same mean and mode.)


Student Activity

1. Compute Karl Pearson's coefficient of skewness for the following data:
   6  5  7  5  8  4  6  4  6  4  1
2. Which expression will you use to compute Karl Pearson's coefficient of skewness for the following data set? Why?
   1 3 4 2 2 4 1 2 1 2 4 2 2 4 4 2 1 1 1 4 4 1

In both distributions the mean, mode and skewness are the same; however, distribution A is flatter than distribution B. Kurtosis is a measure of the flatness or peakedness of a frequency distribution as compared to a normal distribution; the literal meaning of this Greek word is bulginess. A distribution more peaked than the normal distribution is called leptokurtic, and one flatter than the normal is called platykurtic. The peakedness of the normal distribution itself is called mesokurtic. To measure kurtosis, another property of a frequency distribution, called the moment, is required.

Summary
The data collected could be in terms of qualitative variables or in terms of quantitative variables. Quantitative variables may be discrete, continuous or a combination of the two. Discrete variables take on only whole-number values. Continuous variables can be measured to any arbitrary degree of accuracy; the result may or may not be a whole number. Frequency distributions may differ in average value, dispersion, shape or any combination of the three. Skewness is the measure of the lack of symmetry or degree of distortion from the symmetry exhibited by a normal frequency distribution. A positively skewed distribution has more extremely large values but fewer extremely small values than does a normal distribution. Distributions can also be skewed to the left, and then they are known as negatively skewed.


Keywords
Data: Data is a measure of some observable characteristic of a set of objects of interest.

Statistics: Statistics is a vast area of applied mathematics wherein data are collected, classified, presented and analyzed for a specific purpose.

Frequency Distribution: A tabular arrangement of data showing the number of items (the frequency) falling into each class.

Central Tendency: Central tendency refers to a single value that represents the whole set of data.

Average: An average can be defined as a central value around which the other values of a series tend to cluster.

Arithmetic Mean: The arithmetic mean, or simply the mean, is the quantity obtained by dividing the sum of the values of the items (ΣX) in a variable by their number (n), i.e., the number of items.

Geometric Mean: The geometric mean of a set of data is defined as the positive nth root of the product of all the values, n being the number of data items.

Mode: The mode is that value of the data which has the maximum frequency (i.e., occurs most often).

Median: The median is the value of that item in a set of data which divides the data into two equal parts, one part consisting of all values less than it and the other of all values greater than it.

Quartiles: Quartiles are another set of measures of positional central tendency. Like the median, the quartiles divide the entire set of data into four equal parts.

Deciles: In a manner similar to the median and quartiles, the data set can be divided into 10 equal parts when arranged either in ascending or descending order. Each point of division is called a decile.

Percentiles: The data set can also be divided into 100 equal parts, whence each point of division is called a percentile.

Moving Average: The moving average is an arithmetic average of data over a period and is updated regularly by replacing the first item in the average by the new item as it comes in.

Range: The range of a data set is the difference between the largest value and the smallest value.
Skewness: Skewness is a measure of the lack of symmetry or degree of distortion from symmetry exhibited by a normal distribution.

Review Questions
1. What do you mean by 'Central Tendency'?
2. Compute the arithmetic mean of the following series:
   Marks : 0-10 10-20 20-30 30-40 40-50 50-60
   No. of Students : 12 18 27 20 17 6
3. Find out the missing frequency in the following distribution with mean equal to 30.
   Class : 0-10 10-20 20-30 30-40 40-50
   Frequency : 5 6 10 ? 13
4. A firm of readymade garments makes both men's and women's shirts. Its profits average 6% of sales; its profits in men's shirts average 8% of sales. If the share of women's shirts in total sales is 60%, find the average profit as a percentage of the sales of women's shirts.
5. The following data denote the weights of 9 students of a certain class. Calculate the mean deviation from the median and its coefficient.
   S.No. : 1 2 3 4 5 6 7 8 9
   Weight : 40 42 45 47 50 51 54 55 57
6. Calculate the standard deviation from the following data:
   Age less than (in years) : 10 20 30 40 50 60 70 80
   No. of Persons : 15 30 53 75 100 110 115 125

Further Readings
R. S. Bhardwaj, Business Statistics, Excel Books
B. H. Erickson and T. A. Nosanchuk, Understanding Data, McGraw Hill
Abraham, B., and J. Ledolter, Statistical Methods for Forecasting, John Wiley & Sons


Quantitative Techniques

Unit 8 Correlation and Regression
Unit Structure
Introduction
Correlation Analysis
Scatter Diagram
Covariance
Karl Pearson Coefficient of Linear Correlation
Spearman's Rank Correlation
Regression Analysis
Fitting Regression Lines
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Understand the basics of bivariate frequency distribution
Explain the concept of correlation
Define correlation
Compute and interpret correlation coefficients for a bivariate frequency distribution
Explain the concept of regression
Compute and interpret regression equations
Describe and use regression coefficients
Employ correlation and regression in managerial decision making

Introduction
We have been concerned with data sets describing a single characteristic of a population, such as income of employees of a company, amount of electricity bills for residents of a colony, and the like. A data set containing measures of a single variable characteristic is known as a univariate data set. However, in practice two or more variables are often measured in a data set. For instance, the income and savings of employees of a company, or the income and amount of electricity bills of residents of a colony, are data sets having measures of two variables. A data set having measures of two characteristics is called a bivariate data set. Correlation and regression are useful tools employed to predict the value of one variable in terms of other variables. This unit explains correlation and regression in detail.


Self-Instructional Material

Correlation Analysis
Correlation is a measure of the degree of association between two (or more) variables in a data set. Thus, if it is known that two variables are highly correlated, then one can predict the value of one variable on the basis of the value of the other variable. Two variables, say X and Y, are said to be correlated if:
a. Both increase and decrease together. In this case the variables are said to be positively correlated.
b. One increases when the other decreases. In this case the variables are said to be negatively correlated.

Correlation between two variables may exist due to the following reasons: (i) Causal effect: Movement in one variable causes the movement in the other. For instance, have a look at the following set of data having two variables - number of passengers and airfare for an airline during different months in a year.
Month : Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Airfare (Rs.) : 1000 1300 1100 1500 1700 900 1100 1200 1300 900 1000 1200
Number of Passengers : 470 350 420 300 250 500 410 380 345 520 460 400

Here, we observe that the two variables bear some definite relation. As the airfare increases the number of passengers goes on decreasing, though not in the same proportion. Since the number of passengers is affected by the amount of airfare, which is also reflected in the data, these two variables are correlated to each other in a causal manner. The number of passengers can be taken as the dependent variable and the airfare as the independent variable, because the airfare determines the number of passengers who commute.
(ii) Interdependence: It may so happen that both the variables affect each other. In the previous example, we can also argue that as the number of passengers decreases the airline has to hike the airfare to maintain profitability. In such cases either of the two variables may take up the dependent or the independent role.
(iii) Outside interference: The observed correlation between two variables may also be caused by a third variable. For instance, suppose variable X increases as variable Z increases, so that X is the dependent variable and Z the independent one. Also suppose that variable Y increases with variable Z. Now, though the two variables X and Y exhibit a correlation, it is not due to one causing an effect directly on the other but because of the interference of the third variable Z.
(iv) Coincidental: Two variables may be correlated, purely by chance, without any meaningful relation existing between them. For example, the number of employees recruited in my company during a year and the number of train accidents in Bihar during the same year: neither variable has any plausible relationship with the other, and yet their data may exhibit a correlation. Such correlations are totally coincidental.

Having understood the basic concepts of correlation, it is pertinent to obtain a method of measuring this aspect of a bivariate distribution. Before we take up numerical measures of correlation, let us get acquainted with a visual representation of the same.

Scatter Diagram
A scatter diagram or scatter plot is a graphical representation of a bivariate distribution in which the two variables are plotted on the two axes of a co-ordinate graph region. Thus, if (Xi, Yi) for i = 1, 2, ..., n is the bivariate data, then a scatter diagram may be obtained by plotting the individual points on a graph paper. While doing so it is customary, though not mandatory, to represent the first value of the pair on the X-axis and the second value on the Y-axis. Note that the unit and scaling on the two axes may be the same or different. For example, let us plot a scatter diagram for the bivariate data shown below.
Month : Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Airfare (Rs.) : 1000 1300 1100 1500 1700 900 1100 1200 1300 900 1000 1200
Number of Passengers : 470 350 420 300 250 500 410 380 345 520 460 400

Representing Airfare on the X-axis and Number of Passengers on Y-axis and calibrating the axes suitably, we get the following scatter diagram. Different calibration would yield different scatter diagrams.
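The printed scatter diagram itself cannot be reproduced here, but a rough character-grid plot of the same data makes the negative pattern visible. This is only an illustrative sketch; the `ascii_scatter` helper is ours, not part of the text:

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Render paired data as a crude character-grid scatter plot."""
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = int((x - x0) / (x1 - x0) * (width - 1))
        row = int((y - y0) / (y1 - y0) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger Y sits higher
    return ["".join(row) for row in grid]

# Airfare vs number of passengers (data from the table above)
fares = [1000, 1300, 1100, 1500, 1700, 900, 1100, 1200, 1300, 900, 1000, 1200]
passengers = [470, 350, 420, 300, 250, 500, 410, 380, 345, 520, 460, 400]
for line in ascii_scatter(fares, passengers):
    print(line)
```

The stars run from the upper left to the lower right of the grid, which is exactly the negative-correlation pattern described in the text.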


Clearly we see in this scatter diagram that the two variables are negatively correlated, i.e., as one increases the other decreases and vice-versa. There are many possibilities in the pattern of a scatter diagram. The variables may be linearly distributed and positively correlated as shown below.
Positive Linear

The variables may be linearly distributed but negatively correlated as shown below.
Negative Linear


The variables may be non-linearly distributed in a variety of ways, such as the one shown below.
Non-linear Relation


The variables may be perfectly linearly correlated as shown below.


Perfectly Linear Correlation

Only linear (perfect or near-perfect) and some non-linear relationships are meaningful for prediction purposes. Two variables may also possess no apparent relationship between them, as shown below: increase or decrease in one does not affect the value of the other.
No Correlation


Note, however, that two variables may have a linear relationship and yet may not be correlated at all, as shown below.
No Correlation


Once the qualitative relationship between the two variables is determined we would like to measure the strength of association between them. The strength of association between the variables is measured in terms of coefficients of correlation discussed below.


Covariance
Let us consider the following scatter diagram, which depicts two variables which are positively linearly correlated.

Draw two straight lines parallel to the axes and passing through (X̄, Ȳ), where X̄ and Ȳ are the arithmetic means of variables X and Y respectively, as shown above. Observe the following:
(i) For every point to the right of the line X = X̄, the value of X - X̄ is positive.
(ii) For every point above the line Y = Ȳ, the value of Y - Ȳ is positive.
(iii) For every point to the left of the line X = X̄, the value of X - X̄ is negative.
(iv) For every point below the line Y = Ȳ, the value of Y - Ȳ is negative.
Therefore, if there is a positive linear correlation between the two variables, the points will be concentrated mostly in the right-and-above and left-and-below regions. In both cases the product of the two deviations will be positive. Some data points may give a negative product; however, the sum of the products of deviations over all the data points will definitely yield a positive value. The sign of the following expression is thus a measure of the correlation between the two variables.

Σ (Xi - X̄)(Yi - Ȳ), the sum running over i = 1, 2, ..., n

Note, however, that this expression depends on the number of observations and also on the units of measurement of the variables. The average of the products of deviations is called covariance; it does not depend on the number of observations. Thus, the covariance of two variables X and Y is defined as:

Cov(X, Y) = (1/n) Σ (Xi - X̄)(Yi - Ȳ)
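The definition above translates directly into a few lines of code. A minimal sketch (the function name `covariance` is ours), applied to the airfare data given earlier in this unit:

```python
def covariance(xs, ys):
    """Cov(X, Y): average of the products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Airfare vs number of passengers, as tabulated in this unit
fares = [1000, 1300, 1100, 1500, 1700, 900, 1100, 1200, 1300, 900, 1000, 1200]
passengers = [470, 350, 420, 300, 250, 500, 410, 380, 345, 520, 460, 400]
print(covariance(fares, passengers))  # negative: as fares rise, passengers fall
```

The negative sign of the result matches the causal relationship observed earlier: higher fares go with fewer passengers.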

Karl Pearson Coefficient of Linear Correlation


We observed that the greater the covariance, the greater the correlation between the two variables. Therefore, covariance can be treated as a measure of correlation between two variables. However, the magnitude of the covariance depends on the units of measurement. The following expression derived from covariance does not suffer from the effects of units of measurement and hence is called the Karl Pearson Coefficient of Linear Correlation, or simply the coefficient of correlation, and is denoted by r.

r = [ (1/n) Σ (Xi - X̄)(Yi - Ȳ) ] / (σx σy)

Or, more concisely,

r = Cov(X, Y) / (σx σy)

where σx and σy are the standard deviations of the variables X and Y respectively. The same can be simplified to give the following equivalent expression:

r = Σ xi yi / (n σx σy)

where xi = Xi - X̄ and yi = Yi - Ȳ are the deviations of the variables X and Y from their respective means.

Example 1
Compute the Karl Pearson coefficient of correlation for the following set of data.
X : 10 7 12 8 11 9 15
Y : 7 3 8 4 7 5 10
Solution


In order to apply the equivalent computational formula

r = [n ΣXiYi - (ΣXi)(ΣYi)] / [ √(n ΣXi² - (ΣXi)²) × √(n ΣYi² - (ΣYi)²) ]

we will have to compute the following from the given data.

Xi : 10 7 12 8 11 9 15 ; ΣXi = 72
Yi : 7 3 8 4 7 5 10 ; ΣYi = 44
XiYi : 70 21 96 32 77 45 150 ; ΣXiYi = 491
Xi² : 100 49 144 64 121 81 225 ; ΣXi² = 784
Yi² : 49 9 64 16 49 25 100 ; ΣYi² = 312

Here n = 7. Putting the values in the expression, we get:

r = (7 × 491 - 72 × 44) / (√(7 × 784 - 72 × 72) × √(7 × 312 - 44 × 44))

Or,

r = (3437 - 3168) / (√(5488 - 5184) × √(2184 - 1936)) = 269 / (√304 × √248) = 269 / (17.43 × 15.74)

Or,

r = 0.98
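The computation of Example 1 can be verified with the same computational formula in a few lines of code; a minimal sketch (the helper name `pearson_r` is ours):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson coefficient via the computational formula
    r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy))

X = [10, 7, 12, 8, 11, 9, 15]
Y = [7, 3, 8, 4, 7, 5, 10]
print(round(pearson_r(X, Y), 2))  # 0.98
```

The rounded result agrees with the value of r obtained by hand above.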
Properties of Coefficient of Correlation
The Karl Pearson Coefficient of Linear Correlation possesses a number of very interesting properties, as described below.
1. The coefficient of linear correlation always lies between -1 and 1 inclusive:
   -1 ≤ r ≤ 1
   The value 1 suggests perfect positive linear correlation while -1 implies perfect negative linear correlation. A value 0 indicates that no linear correlation exists between the variables.
2. The coefficient of correlation is not affected by a linear transformation of the variables. Thus if rXY is the correlation between variables X and Y, and rAB is the correlation between A and B, then
   rAB = rXY, where A = aX + b and B = cY + d.
3. If two variables are not related then they are also not correlated. However, if they are uncorrelated they may still be related. This directly follows from the fact that the coefficient of correlation measures the strength of a linear relationship: if the variables are related, but not linearly, the coefficient of correlation may turn out to be 0 even though they are related otherwise.
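Property 2 can be checked numerically: transforming the variables of Example 1 as A = 3X + 5 and B = 2Y - 1 leaves r unchanged. A sketch (the helper name and the particular transformation constants are ours):

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = sum of deviation products / sqrt(sum x-dev^2 * sum y-dev^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

X = [10, 7, 12, 8, 11, 9, 15]
Y = [7, 3, 8, 4, 7, 5, 10]
A = [3 * x + 5 for x in X]   # A = aX + b with a = 3, b = 5
B = [2 * y - 1 for y in Y]   # B = cY + d with c = 2, d = -1
print(pearson_r(X, Y), pearson_r(A, B))  # the two values agree
```

Any other positive choice of a and c would give the same agreement, since linear transformation rescales deviations and standard deviations by the same factors.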
Student Activity
1. Compute and interpret the coefficient of correlation for the two variables in the following data.
   X : 20 21 18 23 12 10 9 11 14 17
   Y : 128 140 80 142 50 50 45 48 70 90
2. Show that the variables X and Y in the following data are related but not linearly correlated.
   X : 5 2 7 4 8 9 10 12 11 15
   Y : 25 4 49 16 64 81 100 144 121 225

Spearman's Rank Correlation
Some data sets describe qualitative characteristics of a population, such as the intelligence of students or the beauty of contestants in a beauty pageant. To investigate correlation between such data sets, the Karl Pearson Coefficient of Correlation, which assumes numerical data, is not suitable for direct application. For such cases a cruder though effective method has been devised that computes a coefficient of correlation between the variables based not on the numerical values of the variables but on the ranks assigned to the data. The Spearman's Rank Correlation Coefficient is represented by ρ. The process is simple, as described below.
1. The values of each variable are arranged in either both increasing or both decreasing order. The numerical ordinal values assigned to the data values are called ranks.
2. The Spearman's Coefficient of Correlation, or simply Rank Correlation, is given by:
   ρ = 1 - 6 Σ di² / [n(n² - 1)]
   where n is the number of pairs of observations and di is the difference between the ranks of the i-th pair of values.
Example 2
Obtain the Spearman's rank correlation coefficient for the variables A and B in the following set of data.
A : 10 7 12 8 11 9 15
B : 7 3 8 4 7 5 10
Solution
Assigning ranks Xi and Yi to the variables A and B respectively, we get the following table.

A : 10 7 12 8 11 9 15
B : 7 3 8 4 7 5 10
Rank Xi : 4 1 6 2 5 3 7
Rank Yi : 5 1 6 2 4 3 7
di = Xi - Yi : -1 0 0 0 1 0 0
di² : 1 0 0 0 1 0 0 ; Σdi² = 2

Therefore, the Spearman's coefficient of correlation between the variables A and B is given by:
ρ = 1 - (6 × 2) / [7(49 - 1)] = 1 - 12 / (7 × 48) = 1 - 1/28 = 0.964
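The rank-correlation formula can be evaluated directly from the ranks assigned in Example 2 (the tie between the two values 7 of B is resolved exactly as in the table above). The helper name is ours:

```python
def spearman_from_ranks(rank_x, rank_y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank_x[i] - rank_y[i]."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Ranks of A = 10, 7, 12, 8, 11, 9, 15 and B = 7, 3, 8, 4, 7, 5, 10,
# exactly as assigned in the example's table
rank_A = [4, 1, 6, 2, 5, 3, 7]
rank_B = [5, 1, 6, 2, 4, 3, 7]
print(round(spearman_from_ranks(rank_A, rank_B), 3))  # 0.964
```

Note that statistical libraries usually average the ranks of tied values, which can give a slightly different ρ than the tie assignment used in the example.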

Student Activity
1. Obtain Spearman's coefficient of correlation for the following grades given to performance and intelligence of employees.
   Performance : A C B+ A- B B-
   Intelligence : 129 121 134 110 100 102
2. Compute Karl Pearson and Spearman's coefficients of correlation for the two variables in the following data. Compare the results. Draw a scatter plot to visualize the distribution.
   X : 123 142 111 132 121 120 121 118 116
   Y : 22 31 21 24 25 23 22 28 34

Regression Analysis
The future holds great fascination for mankind. Broadly, there are two methodologies to anticipate the future: qualitative and quantitative. However, both start with the same premise, that an understanding of the future is predicated on an understanding of the past and present environment.


The quantitative decision-maker always considers himself or herself accountable for a forecast, within reason. Let us look at the conceptual model first and then at the mathematical models and algorithms which are used for making forecasts.

The Conceptual Model
The qualitative school has generated many philosophical, religious or political conceptual models according to which ideology and dogma are structured and forecasts prepared. Quantitative decision making, defined here as anything that is not based on underlying belief, offers three conceptual models, ranging from quasi-quantitative to highly technical. They are the guesstimate, fundamental and technical models.

In the guesstimate conceptual model the forecast is based on expert opinion. It is almost like qualitative decision making except that the bias of many is pooled. This method of forecasting basically revolves around the Delphi Method, and is also known as opinion methodology. This conceptual model should not be used when ample databases are available. The Delphi Method consists of a panel of experts and a series of rounds during which forecasts are made via questionnaire. Whether expertise or ignorance is pooled in each round, the result is the same: a forecast is born. But in the absence of sufficient data, it may be preferable to develop heuristics first rather than to rely initially on guesstimates.

The second conceptual model stresses the fundamentals that impinge upon the environment at any given time. In this case the forecaster tries to ascertain the functional relationships among the variables defining the environment. In addition, attention is paid to changes in the magnitude of the variables that make up the environment. This conceptual model is superior because it is based on logical considerations and not merely on expert opinion.
The reason why not all forecasters wholeheartedly embrace the fundamental conceptual model is that it takes a pretty good mind to understand the variables, and their interrelationships, that represent the environment. It takes constant study, constant learning, constant testing and then the intellectual ability to synthesize it all. To cite an example, it does not take much to come up with a fundamental conceptual model to forecast a nation's economic activity. We know that gross national product (GNP) is a function of consumption (C), investment (I), government spending (G) and net exports (E). In equation form it appears as:
GNP = C + I + G + E
Now each variable, i.e., consumption, investment, etc., can be carefully quantified; for example, "do we consume more goods or services?", "more hard or soft goods?" and so on. Beautiful econometric models have been generated on the basis of this conceptual model. Beautiful forecasts have also been presented.

Student Activity
Compare and contrast the various models of forecasting. What is the reason behind forecasters not wholeheartedly embracing the fundamental conceptual model? Construct a conceptual model to forecast the manpower requirements in your company for the next 10 years.


The third conceptual model is called technical. It is used by forecasters who are themselves technocrats. Whenever a pre-determined parameter that the technocrats follow reaches a certain magnitude, they forecast a change in the environment irrespective of the behavior of other variables. Sometimes this model gives accurate results and sometimes not.

Fitting Regression Lines


In a bivariate data set either of the two variables can be treated as independent. If X is taken to be independent then the regression is called the regression of Y on X; in the other case it is called the regression of X on Y. Consequently, there can be two lines of regression.

Regression Line of Y on X
Let the equation of the line of regression of Y on X be:
y = a + bx
where a is the average value of Y when X is zero, and the constant b is the rate of change in Y per unit change in X. The constant b is called the regression coefficient of Y on X. One method of obtaining the values a and b is the method of least squares, which chooses that line for which the sum of squares of the differences between the observed and estimated values is minimum.

Let us consider a set of n pairs of data (X1, Y1), (X2, Y2), ..., (Xn, Yn), and let the regression line of Y on X be y = a + bx. For each Xi the estimated value of Y is a + bXi, while the observed value is Yi. The error in the i-th pair, between the observed and estimated values, is therefore:
ei = Yi - (a + bXi)

Let the sum of squares of the errors be S:

S = Σ ei² = Σ [Yi - (a + bXi)]²

For the least-squares fit, a and b should be chosen in such a way that S becomes minimum. Applying the condition for a minimum, the partial derivatives of S with respect to a and b should individually be zero:

∂S/∂a = 0,  ∂S/∂b = 0

In general, for more constants, the normal equations are given by ∂S/∂a = 0, ∂S/∂b = 0, ∂S/∂c = 0, and so on. Thus,

∂S/∂a = -2 Σ (Yi - a - bXi) = 0

Or,

Σ Yi - na - b Σ Xi = 0    ...(1)

Similarly,

∂S/∂b = -2 Σ (Yi - a - bXi) Xi = 0

Or,

Σ XiYi - a Σ Xi - b Σ Xi² = 0    ...(2)

Equations (1) and (2) are the required normal equations, from which the value of a and the coefficient of regression b can be easily computed. Now, consider equation (1) once again:

Σ Yi - na - b Σ Xi = 0

Dividing both sides by n, we get:

Ȳ = a + bX̄

This proves that the regression line passes through the means (X̄, Ȳ) of the variables. The expression for b comes out to be:

b = Σ (Xi - X̄)(Yi - Ȳ) / Σ (Xi - X̄)² = Cov(X, Y) / σx²    ...(5)

Also, we know that

r = Cov(X, Y) / (σx σy),  or  Cov(X, Y) = r σx σy

Putting this in (5), we get:

b = r σx σy / σx² = r σy / σx
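The normal equations (1) and (2) reduce to closed-form expressions for a and b; the sketch below (the helper name is ours) solves them for a perfectly linear data set, where the fit must be exact:

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations
         sum(Y)  = n*a + b*sum(X)
         sum(XY) = a*sum(X) + b*sum(X^2)
    for the intercept a and the regression coefficient b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n  # the line passes through (x-bar, y-bar)
    return a, b

a, b = fit_y_on_x([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear data: y = 2x
print(a, b)  # 0.0 2.0
```

Note how the expression for a is just Ȳ - bX̄, which is the "line passes through the means" result derived above.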
Regression Line of X on Y

In a manner similar to the above, we can find the regression line of X on Y. Let the equation of the line of regression of X on Y be:

x = c + dy

where c is the average value of X when Y is zero, and the constant d is the rate of change in X per unit change in Y. The constant d is the regression coefficient of X on Y. For each Yi the estimated value of X is c + dYi, so the error in the i-th pair, between the observed and estimated values, is:

ei = Xi - (c + dYi)

Let the sum of squares of the errors be S:

S = Σ ei² = Σ [Xi - (c + dYi)]²

Applying the least-squares method, we get the following normal equations:

Σ Xi - nc - d Σ Yi = 0    ...(3)

Σ XiYi - c Σ Yi - d Σ Yi² = 0    ...(4)

From equations (3) and (4), the value of c and the coefficient of regression d can be easily computed. Proceeding in a similar way, we get:

d = Cov(X, Y) / σy² = r σx σy / σy² = r σx / σy
Relation between Regression and Correlation
The two coefficients of regression are related to the coefficient of correlation in the following way:

bd = (r σy / σx) × (r σx / σy) = r²

Or,

r = √(bd)
Hence, the coefficient of correlation is the geometric mean of the two coefficients of regression.

Example 3
Obtain the regression equations and find the correlation coefficient between X and Y from the following data. Also find the estimated value of Y when X is 15.
X : 12 13 10 11 9
Y : 3 4 2 4 2
Solution
Tabulating the necessary columns, we get:

X : 12 13 10 11 9 ; ΣX = 55
Y : 3 4 2 4 2 ; ΣY = 15
XY : 36 52 20 44 18 ; ΣXY = 170
X² : 144 169 100 121 81 ; ΣX² = 615
Y² : 9 16 4 16 4 ; ΣY² = 49

Therefore, with n = 5:

b = [n ΣXY - (ΣX)(ΣY)] / [n ΣX² - (ΣX)²] = (5 × 170 - 55 × 15) / (5 × 615 - 55²) = 25/50 = 0.5

Also,

a = Ȳ - bX̄ = 15/5 - 0.5 × 55/5 = 3 - 5.5 = -2.5

Therefore, the regression line of Y on X is:

Y = -2.5 + 0.5X

Similarly,

d = [n ΣXY - (ΣX)(ΣY)] / [n ΣY² - (ΣY)²] = (5 × 170 - 55 × 15) / (5 × 49 - 15²) = 25/20 = 1.25

c = X̄ - dȲ = 55/5 - 1.25 × 15/5 = 11 - 3.75 = 7.25

Therefore, the regression line of X on Y is:

X = 7.25 + 1.25Y

And the correlation coefficient is given by:

r = √(bd) = √(0.5 × 1.25) = √0.625 = 0.79 (approx.)

Since Y = -2.5 + 0.5X,

Y(X = 15) = -2.5 + 0.5 × 15 = -2.5 + 7.5 = 5
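The computations of Example 3 can be verified in code; a minimal sketch re-using the formulas derived above:

```python
from math import sqrt

# Data of Example 3
X = [12, 13, 10, 11, 9]
Y = [3, 4, 2, 4, 2]
n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)

b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # regression coefficient of Y on X
a = sy / n - b * sx / n
d = (n * sxy - sx * sy) / (n * syy - sy * sy)  # regression coefficient of X on Y
c = sx / n - d * sy / n
r = sqrt(b * d)  # correlation: geometric mean of the two coefficients

print(b, a)         # 0.5 -2.5
print(d, c)         # 1.25 7.25
print(round(r, 2))  # 0.79
print(a + b * 15)   # estimated Y when X = 15 -> 5.0
```

Substituting X = 15 into the regression line of Y on X confirms the estimate of 5 obtained in the worked solution.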
Student Activity
1. From the following table of data relating sales and purchase figures, obtain the two regression lines and estimate the likely sales when purchase is 50 units. Also find the coefficient of correlation between sales and purchase.
   Sales : 100 200 300 400 500
   Purchase : 15 56 104 110 90
2. Find out the coefficient of correlation from the following data:
   X : 300 350 400 450 500 550 600 650 700
   Y : 1600 1500 1400 1300 1200 1100 1000 900 800
3. The coefficient of rank correlation and the sum of squares of differences in corresponding ranks are 0.9021 and 28 respectively. Determine the number of pairs of observations.

Summary
Correlation and regression are useful tools employed to predict the value of one variable in terms of other variables. Correlation is a measure of the degree of association between two (or more) variables in a data set. A scatter diagram or scatter plot is a graphical representation of a bivariate distribution in which the two variables are plotted on the two axes of a co-ordinate graph region. The average of the products of deviations is called covariance. There are two methodologies to anticipate the future: qualitative and quantitative. The Delphi Method consists of a panel of experts and a series of rounds during which forecasts are made via questionnaire. Linear analyses fit straight lines to data sets; curvilinear or nonlinear analyses do the same with curves. Each point on the regression line represents a mean. Correlation analysis provides a measure of the mutuality of two variables. The degree of correlation is expressed as the correlation coefficient, whose value lies between -1 and +1.

Keywords
Correlation Analysis: Correlation is a measure of the degree of association between two (or more) variables in a data set.
Scatter Diagram: A scatter diagram or scatter plot is a graphical representation of a bivariate distribution in which the two variables are plotted on the two axes of a co-ordinate graph region.
Covariance: The average of the products of deviations of two variables from their respective means; it can be treated as a measure of correlation between the two variables.

Review Questions
1. Describe the method of obtaining Karl Pearson's formula for the coefficient of linear correlation. What do positive and negative values of this coefficient indicate?
2. Define the product moment coefficient of correlation. What are the advantages of the study of correlation?
3. Show that the coefficient of correlation, r, is independent of change of origin and scale.
4. Prove that the coefficient of correlation lies between -1 and +1.
5. What is Spearman's rank correlation? What are the advantages of the coefficient of rank correlation over Karl Pearson's coefficient of correlation?
6. Calculate Karl Pearson's coefficient of correlation between the marks obtained by 10 students in economics and statistics. Also interpret your result.
   Roll No. : 1 2 3 4 5 6 7 8 9 10
   Marks in Economics : 23 27 28 29 30 31 33 35 36 39
   Marks in Statistics : 18 22 23 24 25 26 28 29 30 32

Further Readings
Freund, John E., and Ronald E. Walpole, Mathematical Statistics, 5th edition, Englewood Cliffs, NJ: Prentice Hall
B. H. Erickson and T. A. Nosanchuk, Understanding Data, McGraw Hill
Abraham, B., and J. Ledolter, Statistical Methods for Forecasting, John Wiley & Sons


Section III
Unit 9 Time Series Analysis and Index Numbers
Unit 10 Probability Theory
Unit 11 Theory of Estimation and Test of Hypothesis

Unit 9 Time Series Analysis and Index Numbers

Unit Structure
Introduction
Time Series Analysis
Graphical Method
Method of Averages
Nonlinear Analysis
Measuring Periodic Variations
Index Numbers
Construction of Index Numbers
Price Index Numbers
Nature of Weights
Laspeyres Index
Paasche Index
Fisher Index
Dorbish and Bowley Index
Marshall and Edgeworth Index
Walsh Index
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Define and perform time series analysis
Apply the smoothing algorithm
Define, compute and interpret index numbers

Introduction
Time has strange, fascinating and little understood properties. Virtually every process on earth is determined by a time variable. Very often a large amount of data arranged according to time are available to the decision-maker. Time series analysis is a useful tool for crunching historical data collected over a period of time. It aims at characterizing the pattern of the data variation so that a plausible forecast may be made while making decisions.


Time Series Analysis
A time series is a series of data collected over a period of time at separated successive time intervals. Technically, a time series is a bivariate data set in which one variable is time, and hence it can be expressed as:

TS = {(t, Yt) | t is the successive time at which Yt was recorded}

Time series analysis is employed for two basic purposes:
1. To study and characterize the past behavior of data
2. To make a plausible forecast for the future

For illustration let us look at a fundamental conceptual model of a product life cycle, which goes through four stages: introduction, growth, maturity and decline, as depicted in Figure 9.1. The sales performance of a product goes through these four stages.

Figure 9.1: Product Life-Cycle

Data have been plotted and regression lines fitted to each of the four environments. Thus, when a sales forecast is made and the target horizon falls within the same stage, the linear fit yields valid results. If, however, the target horizon falls into a future stage, a linear forecast may be erroneous. In this case a curve should be fitted as shown. It is usually highly speculative to select a forecasting horizon that spans more than two stages.

Another point of interest is the behavior of the sales variable over the short run. It fluctuates between a succession of peaks and troughs. How do these come about? In order to answer this question, the time series must be decomposed. Then three independent components of this behavior become visible. They are:
1. Secular trend
2. Periodic variations, comprising seasonal and cyclical variations
3. Random variations

Graphical Method
One of the easiest, albeit crude, methods of obtaining the secular trend is the graphical method. It uses a graph paper for the analysis. The steps involved are:
1. Represent the time variable on the X-axis of the graph paper using a proper unit.
2. Represent the other variable on the Y-axis using a proper unit.
3. Plot the graph.
4. Draw a free-hand curve through the points thus plotted. This curve gives the secular trend.


Example 1
Obtain a graphical secular trend for the following time series data.
Year : 1990 1992 1994 1996 1998 2000
Sales (Million Rupees) : 300 400 350 500 500 600
Solution
[Figure: sales plotted against year, with a dotted free-hand trend curve drawn through the points]

The dotted line shows the trend line. The same line can be extended beyond the year 2000 to forecast the sales figures. Note, however, that since this method is subjective, one may arrive at a different trend for the same time series and hence predictions based on this method may not agree with each other always.

Method of Averages
Method of average uses a-few selected points to obtain the secular trend line. The points may be selected on different bases. Accordingly, we have different forms of method of averages. . Method of Subjective Selection Two points are selected which the analysts deems important. A straight line is men drawn passing through these two points to obtain the secular trend line. Example 2 Obtain a secular trend for the following time series data using subjective selection. Year Sales (Million Rupees) 1990 300 1992 1994 1996 400 350 500 1998 2000 500 600

Punjab Technical University 169

Quantitative Techniques

Solution Assuming that the points (1994, 350) and (2000, 600) are the two most important points in the data, we get the trend line shown dotted below.

(Figure: the data plotted against time, with the straight line through (1994, 350) and (2000, 600) shown dotted.)

Needless to say, this method, being subjective, does not yield justified predictions for the future.

Method of Semi-Averages

As the name suggests, the method relies on two points obtained by computing averages of the time series parted into two equal halves. The method follows the steps given below:

1. Divide the time interval into two equal parts. If the number of observations is odd, then the middle value is dropped for the purpose.
2. Compute the averages of the two parts (the semi-averages), A1 and A2.
3. Take the middle points of the two parts on the time axis, T1 and T2.
4. Draw the line passing through (T1, A1) and (T2, A2). This is the required semi-average trend line.
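The four steps can be sketched in Python; the data used here are those of the worked example that follows, and the variable names are ours:

```python
# Semi-average trend: a sketch of the steps above (data as in Example 3).
years = [1990, 1992, 1994, 1996, 1998, 2000]
sales = [300, 400, 350, 500, 500, 600]

half = len(sales) // 2                      # even series: no middle value dropped
a1 = sum(sales[:half]) / half               # semi-average of the first half
a2 = sum(sales[half:]) / half               # semi-average of the second half
t1 = sum(years[:half]) / half               # midpoint of the first half
t2 = sum(years[half:]) / half               # midpoint of the second half

# The trend line passes through (t1, a1) and (t2, a2).
slope = (a2 - a1) / (t2 - t1)
intercept = a1 - slope * t1
print(t1, a1, t2, round(a2, 2), round(slope, 3))
```

The fitted line can then be evaluated at any future year to obtain a forecast.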

Example 3 Obtain a secular trend for the following time series data using the method of semi-averages.

Year                     1990   1992   1994   1996   1998   2000
Sales (Million Rupees)    300    400    350    500    500    600

Solution Since the number of observations is even, we have two equal parts on the time axis, (1990-1994) and (1996-2000). The semi-averages are computed as given below:

1. Semi-average in (1990-1994) = (300 + 400 + 350)/3 = 350
2. Semi-average in (1996-2000) = (500 + 500 + 600)/3 = 533.33


Also, the midpoints of the two parts on the time axis are:

1. Midpoint of (1990-1994) = (1990 + 1994)/2 = 1992
2. Midpoint of (1996-2000) = (1996 + 2000)/2 = 1998

Thus the trend line should pass through (1992, 350) and (1998, 533.33) as shown below.
(Figure: the semi-average trend line drawn through the two computed points, 1990-2000.)
This method defines a definite line for the secular trend and hence is free from subjectivity.

Method of Moving-Average

In this method a sub-period (say, three or five consecutive time units) is selected, and the averages of successive, overlapping sub-periods are computed. Plotted against the midpoints of their sub-periods, these averages trace out the secular trend.

Method of Least-Squares

This method employs fitting regression curves by the method of least squares, which has already been discussed in the previous unit. Here, we will see the application of the same.

Linear Analysis

In order to illustrate the procedure, let us use the data plotted below (Figure 9.2). It involves the dividend payments per share of Smart, a well-known discount store chain, for the years 1991 through 1999. Suppose that a potential investor would like to know the dividend payment for 2001.
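The moving-average computation just described can be sketched as follows; the period of 3 and the data are illustrative choices of ours:

```python
# A minimal sketch of a centred moving average (the standard form of the method).
def moving_average(series, period=3):
    half = period // 2
    return [sum(series[i - half:i + half + 1]) / period
            for i in range(half, len(series) - half)]

sales = [300, 400, 350, 500, 500, 600]
trend = moving_average(sales)               # one smoothed value per interior point
print([round(t, 2) for t in trend])
```

Note that the smoothed series is shorter than the original: the first and last values are lost for a period of 3.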


Figure 9.2: Plot of Dividend Values

Think for a moment about the qualitative nature of the time variable. It is expressed in years in this case but could be quarters, months, days, hours, minutes or any other time measurement unit. How does it differ from advertising expenditures, the independent variable that we examined in the preceding section? Is there a difference in the effect that a unit of each has on the dependent variable, i.e., Rs 1 million in one case and 1 year in the other? Time, as you can readily see, is constant: one year has the same effect as any other. This is not true for advertising expenditures, especially when you leave the linear environment and enter the nonlinear environments as shown in Figure 9.6. Then there may be a qualitative difference in the sales impact as advertising expenditures are increased or decreased by a unit. Since time is constant in its effect, we may code the variable rather than use the actual years or other time units as x values. This code assigns a 1 to the first time period in the series and continues in unit distances to the nth period. Do not start with a zero, as this may cause some computer programs to reject the input. The code is based on the fact that the unit periods are constant, and therefore their sum may be set equal to zero. See what effect this has on the normal equations for the straight line.

Σy = na + bΣx
Σxy = aΣx + bΣx²

If Σx = 0, the equations reduce to

Σy = na
Σxy = bΣx²

which allow the direct solution for a and b as follows:

a = Σy/n  and  b = Σxy/Σx²

This form simplifies the calculations substantially compared to the previous formulas. The code that allows us to set Σx = 0 must, however, preserve the integrity of a unit-distance series. Thus, if the series is odd-numbered, the midpoint is set equal to zero and the code completed by negative and positive unit distances of x = 1, where each x unit stands for one year or other time period. If the series is even-numbered, let us say it ran from 1990 to 1999, the two midpoints (1994/1995) are set equal to -1 and +1, respectively. Since there is now a distance of x = 2 between -1 and +1, the code continues by negative and positive unit distances of x = 2, where each x unit stands for one-half year or other time period. The worksheet is in Figure 9.3 and the calculations are as follows.
Year    Code for an even series x   Code for an odd series x   Dividend payments (Rs) y      xy     x²
1990            -9                         -                          -                       -      -
1991            -7                        -4                         2.2                    -8.8     16
1992            -5                        -3                         2.4                    -7.2      9
1993            -3                        -2                         3.0                    -6.0      4
1994            -1                        -1                         5.0                    -5.0      1
1995             1                         0                         6.8                     0        0
1996             3                         1                         8.1                     8.1      1
1997             5                         2                         9.0                    18.0      4
1998             7                         3                         9.5                    28.5      9
1999             9                         4                         9.9                    39.6     16
Total            0                         0                        55.9                    67.2     60

(The xy and x² columns use the odd-series code, since the dividend data run from 1991 to 1999.)

Figure 9.3: Worksheet

a = Σy/n = 55.9/9 = 6.211

Then

b = Σxy/Σx² = 67.2/60 = 1.12

and

yc = 6.211 + 1.12x   (origin 1995; x in 1-year units)

The regression equation is plotted in Figure 9.2. Note that in the case of time series analysis, the origin of the code and the x units must be defined as part of the regression equation. In our problem the investor would like to obtain a dividend forecast for 2001. Since the origin is 1995 (x = 0) and x is in 1-year units, the code for 2001 is x = 6. Therefore the forecast is yc = 6.211 + 1.12(6) = Rs 12.93. If the time series had been even-numbered, let us say that dividend payments for 1990 had been included in the forecasting study, the definition under the regression equation would have read: origin 1994/95, x in 6-month units. Then, for 1995, x = 1; and since x advances 2 units per year, the code value for 2001 would be x = 13. Once the yc value has been obtained, b is tested for significance and the 95% confidence interval constructed as previously shown.
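The coded computation above can be checked with a short Python sketch (variable names are ours):

```python
# Coded least-squares trend for the dividend series: a = Σy/n, b = Σxy/Σx².
y = [2.2, 2.4, 3.0, 5.0, 6.8, 8.1, 9.0, 9.5, 9.9]   # dividends, 1991-1999
x = [-4, -3, -2, -1, 0, 1, 2, 3, 4]                  # odd-series code, origin 1995

a = sum(y) / len(y)                                  # 55.9 / 9
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)  # 67.2 / 60

forecast_2001 = a + b * 6                            # 2001 carries code x = 6
print(round(a, 3), round(b, 2), round(forecast_2001, 2))
```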


To stay with the investment environment of this chapter section, let us calculate a seasonal index for shares traded on the Stock Exchange from July 2 through July 7, 1999. This period includes the July 4 weekend. Volume of shares (DATA) for each trading day (SEASON) is given in thousands of shares per hour. The individual steps of the analysis (OPERATIONS) are discussed in detail for each column of the worksheet below.

(1) Hour   (2) Total Variation (TS)   (3) Trend Variation (T)   (4) Seasonal Variation (TS-T)   (5) Seasonal Index
10-11            0.965                       0                         0.965                         110.6
11-12            0.245                       0.159                     0.086                         103.7
12-13           -0.885                       0.318                    -1.117                          94.2
13-14           -1.555                       0.477                    -2.032                          87.1
14-15            0.395                       0.636                    -0.241                         101.1
15-16            0.835                       0.795                     0.040                         103.3
Average                                                               -0.383
Total                                                                                                600

Columns (6)-(10): Trading Volume ('000)
Hour      7/2     7/3     7/6     7/7    Avg. for four days
10-11    12.00   12.25   15.44   16.72        14.10
11-12    10.40   11.75   15.04   16.32        13.38
12-13    10.55   10.06   12.95   15.44        12.25
13-14     9.55    9.46   12.05   15.24        11.58
14-15    11.02   11.55   14.82   16.73        13.53
15-16    11.58   12.25   15.38   16.69        13.97
Average  10.85   11.22   14.28   16.19        13.135

Figure 9.4: Worksheet

As you inspect the data columns, you notice the V-shaped season for each trading day. You also notice in the total daily volume that there is an increase in shares traded; hence you can expect a positive slope of the regression line. The hourly mean number of shares is also indicated. This is the more important value, because we are interested in quantifying a season by the hour for each trading day. Now turn to the operations. In the last column the hourly trading activity for the four days has been averaged, and in this figure all time series factors are assumed to be incorporated. You will recall that the positive or negative cyclical and irregular component effects are assumed to cancel out over time. Hence averaging the trading volume over a long-term data set eliminates both components, yielding TS = T + S. You may ask, are four days a sufficiently long time span? The answer is no. In a real study you would probably use 15 to 25 yearly averages for each trading hour. In an on-the-job application of this tool, you will have to know the specific time horizon in order to effectively eliminate cyclical and irregular variations. By and large, what is a long or short time span depends upon the situation. In order to isolate the trend component (T) so that it may be subtracted from column (2) in the table in Figure 9.4, yielding the seasonal variation, the slope (b) of the regression line must be calculated. (Remember: b is T.) The necessary calculations are performed below using the mean hourly trading volume for each day. But since we are interested in an index by the hour, the calculated

daily b-value must be apportioned to each hour. This is accomplished by a further division by six, the number of trading hours. The result is entered in column (3). The origin of the seasonal series is always its first period, in our case the 10-11 trading hour; therefore the first entry in column (3) is zero, followed by equal (since this is a linear analysis) summed increments of the apportioned b-value.
Day    Code x   Average hourly trading volume per day y      xy      x²
7/2      -3                 10.85                          -32.55     9
7/3      -1                 11.22                          -11.22     1
7/6       1                 14.28                           14.28     1
7/7       3                 16.19                           48.57     9
Total     0                 52.54                           19.08    20

Figure 9.5: Worksheet for Trend Calculation

b = Σxy/Σx² = 19.08/20 = 0.954

and the apportioned b-value is

0.954/6 = 0.159

It is not necessary to calculate the y-intercept (a) in this analysis unless, of course, you wish to combine it with a long-term forecast of daily trading volume. Then, just to review the calculations, you would find:

a = Σy/n = 52.54/4 = 13.135  and  yc = 13.135 + 0.954x   (origin midway between 7/3 and 7/6; x in half-trading-day units)

In column (4), TS - T = S is performed. Column (4) is already a measure of seasonal variation, but in order to standardize the answer so that it may be compared with other stock exchanges, for example, it is customary to convert the values in column (4) to a seasonal index. Every index has a base of 100, and values above or below the base indicate percentages above or below




"normal" activity, hence the season. Since the base of column (5) is 100, the mean of the column should be 100 and the total 600, since there are 6 trading hours. In order to convert the obtained values of column (4) to index numbers, each of its entries is added to the total mean and then divided by the column mean added to the total mean, and multiplied by 100, yielding the corresponding entry in column (5). It is customary to show index numbers with one decimal place. Column (5) shows the seasonal effect of this decision variable, share trading, on the Stock Exchange. Regardless of heavy or light daily volume, the first-hour volume is the heaviest by far. It is 10.6% above what may be considered average trading volume for any given day. Keep in mind that a very limited data set was used in this analysis, and while the season, reaching its low point between 1 and 2 p.m., is generally correctly depicted, individual index numbers may be exaggerated. What managerial action programs would result from an analysis such as this? Would traders go out for tea and samosas between 10-11? How about lunch between 1-2? When would brokers call clients with hot or lukewarm tips? Assuming that a decrease in volume means a decrease in prices in general during the trading day, when would a savvy trader buy? When would he sell? Think of some other intervening variables and you have yourself a nice little bull session in one of Dalal Street's watering holes. If, in addition, you make money for yourself or your firm, then you have got it.
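The worksheet arithmetic can be sketched in Python; note that the text rounds the hourly averages before differencing, so its column (2) entries differ in the third decimal:

```python
# Hourly averages, grand mean and the TS deviations of column (2).
volumes = {  # hour -> volume ('000 shares) on 7/2, 7/3, 7/6, 7/7
    "10-11": [12.00, 12.25, 15.44, 16.72],
    "11-12": [10.40, 11.75, 15.04, 16.32],
    "12-13": [10.55, 10.06, 12.95, 15.44],
    "13-14": [9.55, 9.46, 12.05, 15.24],
    "14-15": [11.02, 11.55, 14.82, 16.73],
    "15-16": [11.58, 12.25, 15.38, 16.69],
}
hourly_avg = {h: sum(v) / len(v) for h, v in volumes.items()}
grand_mean = sum(hourly_avg.values()) / len(hourly_avg)      # 13.135
ts = {h: avg - grand_mean for h, avg in hourly_avg.items()}  # column (2)
print({h: round(d, 3) for h, d in ts.items()})
```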

Student Activity

Fit a least-squares straight line to the following data set.

X:  7  3  9  1  6  4  5  2
Y:  4  3  5  3  4  4  4  3

Nonlinear Analysis
Any number of different curves may be fitted to a data set. The most widely used program in computer libraries, known as CURFIT, offers a minimum of 5 curves plus the straight line. The curves may differ from program to program. So which ones are the "best" ones? There is no answer; every forecaster has to decide individually about his pet forecasting tools. We will discuss and apply three curves in this section. They appear to be promising decision tools, especially in problem situations that in some way incorporate the life cycle concept, and the range of such problems is vast indeed. If you take a look again at the figure, you see that three curves have been plotted. As we know from many empirical studies, achievement is usually normally distributed. Growth, on the other hand, seems to be exponentially distributed; the same holds true for decline. As the life cycle moves from growth to maturity, a parabolic trend may often be used as the forecasting tool. These are two of the curves that will be considered. The third one is related to the exponential curve. As you look at the growth stage and mentally extrapolate the trend, your eyes will run off the page. Now, we know, again from all sorts of empirical evidence, that trees don't grow into the high heavens: even the most spectacular growth must come to an end. Therefore, when using the exponential forecast, care must be taken that the eventual ceiling or floor (in the case of a decline) is not overlooked. The modified exponential trend has the ceiling or floor built in. It is the third curve to be discussed.

The Parabola Fit

The parabola is defined by

yc = a + bx + cx²

where a, b and c are constants; a and b have been dealt with already, and c can be treated as acceleration. The normal equations are (method of least squares):

Σy = na + bΣx + cΣx²
Σxy = aΣx + bΣx² + cΣx³
Σx²y = aΣx² + bΣx³ + cΣx⁴

Setting Σx = 0 as previously explained, Σx³ will also be zero. Thus

Σy = na + cΣx²
Σxy = bΣx²
Σx²y = aΣx² + cΣx⁴

There are direct formulas for a and c as well, but because of the possible compounding of arithmetic error in manual calculations, it is safer to solve for a and c algebraically in this case. To illustrate the parabolic trend, let us forecast earnings per share for Storage Technology Corporation for the years 2000 and 2001. Storage Technology manufactures computer data storage equipment, printers, DVD-ROMs and telecommunication products. The company was founded in 1969 and, after going through a period of explosive growth, seems to be moving into the maturity stage. Data, code and calculations are shown below in the usual worksheet format.
Year   Code x   Earnings per share y      xy     x²     x²y     x⁴
1993     -3           0.39               -1.17    9     3.51    81
1994     -2           0.54               -1.08    4     2.16    16
1995     -1           1.13               -1.13    1     1.13     1
1996      0           1.58                0       0     0        0
1997      1           1.72                1.72    1     1.72     1
1998      2           2.50                5.00    4    10.00    16
1999      3           1.84                5.52    9    16.56    81
Total     0           9.70                8.86   28    35.08   196

Then

b = Σxy/Σx² = 8.86/28 = 0.3164

and, solving simultaneously,

9.70 = 7a + 28c, which multiplied by 4 gives 38.80 = 28a + 112c
35.08 = 28a + 196c

Subtracting the second equation from the first, 3.72 = -84c, so that

c = -0.0443
a = (9.70 + 1.24)/7 = 1.5629

Therefore

yc = 1.5629 + 0.3164x - 0.0443x²   (origin 1996; x in 1-year units)

and specifically

y2000 = 1.5629 + 0.3164(4) - 0.0443(4)² = Rs 2.12
y2001 = 1.5629 + 0.3164(5) - 0.0443(5)² = Rs 2.04
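The reduced normal equations can be solved mechanically; a short sketch using the worksheet totals (note that c comes out negative, as the maturing earnings series requires):

```python
# Solving the reduced parabolic normal equations for the worksheet above.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [0.39, 0.54, 1.13, 1.58, 1.72, 2.50, 1.84]

n = len(y)
Sy = sum(y)                                   # 9.70
Sx2 = sum(v ** 2 for v in x)                  # 28
Sx4 = sum(v ** 4 for v in x)                  # 196
Sxy = sum(u * v for u, v in zip(x, y))        # 8.86
Sx2y = sum(u * u * v for u, v in zip(x, y))   # 35.08

b = Sxy / Sx2
# From  Sy = n*a + c*Sx2  and  Sx2y = a*Sx2 + c*Sx4:
c = (n * Sx2y - Sx2 * Sy) / (n * Sx4 - Sx2 ** 2)
a = (Sy - c * Sx2) / n
print(round(a, 4), round(b, 4), round(c, 4))
```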

The Exponential Fit

This illustrative forecasting study is performed for the Acme Company Ltd, which manufactures toy rubber ducks to be used in bathtubs. Over the past few quarters, the company has become a major defence contractor. The Navy is buying an ever increasing number of the ducks as part of its rearmament program. Shipment figures are kept secret to confuse the enemy and the media. Therefore, the data in the accompanying table are hypothetical. We may fit an exponential trend, which takes the form

yc = ab^x

As previously mentioned, exponential trends are difficult to plot, because you run very quickly off the top of the page. However, when using semi-log paper (the y-axis is in logarithmic scale), the trend appears as a straight line. This phenomenon may be used to good advantage when calculating a and b. Thus, using the logarithmic form of the exponential trend,

log yc = log a + x log b

The straight-line equations may be used, or

log a = Σ(log y)/n   and   log b = Σ(x log y)/Σx²

when Σx = 0. The data set and calculations appear in the worksheet below. Logarithms are obtained from a pocket calculator or any standard table.


Quarter since initial Navy contract   Code x   Shipments ('000 units) y     log y      x log y      x²
1                                       -5            2                    0.301030   -1.505150     25
2                                       -3            4                    0.602060   -1.806180      9
3                                       -1            9                    0.954243   -0.954243      1
4                                        1           20                    1.301030    1.301030      1
5                                        3           55                    1.740363    5.221089      9
6                                        5          110                    2.041393   10.206965     25
Total                                    0                                 6.940119   12.463511     70

Then

log a = Σ(log y)/n = 6.940119/6 = 1.1567

and

log b = Σ(x log y)/Σx² = 12.463511/70 = 0.1781

when expressed in logarithmic form. The regression equation is

log yc = 1.1567 + 0.1781x   (origin midway between the third and fourth quarters; x in half-quarter units)

The equation may be used in this form for forecasting purposes. Suppose that Acme would like to have a forecast for the next two quarters. The forecasts are

log y7 = 1.1567 + 0.1781(7) = 2.4034, and finding the antilog, y7 = 253 thousand more rubber ducks

and

log y8 = 1.1567 + 0.1781(9) = 2.7596, and finding the antilog, y8 = 575 thousand still more rubber ducks

If you transform the logarithmic form of the regression equation, there is something interesting to be seen if you remember the compound interest formula. Finding the antilogs,

yc = (14.3)(1.51)^x

which may be rewritten

yc = (14.3)(1 + 0.51)^x

and you recognize that it takes the form of the compound interest formula, where the rate is 0.51 or 51% per x unit (here a half quarter). This is Acme's average half-quarterly increase in its defence business.
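The log-linear fit can be verified in Python; math.log10 stands in for the standard log table:

```python
import math

# Log-linear fit of y = a*b^x for the shipment series above (base-10 logs).
x = [-5, -3, -1, 1, 3, 5]                   # half-quarter code
y = [2, 4, 9, 20, 55, 110]                  # shipments ('000 units)

logs = [math.log10(v) for v in y]
log_a = sum(logs) / len(logs)
log_b = sum(xi * lv for xi, lv in zip(x, logs)) / sum(xi * xi for xi in x)

forecast_q7 = 10 ** (log_a + log_b * 7)     # quarter 7 carries code x = 7
print(round(log_a, 4), round(log_b, 4), round(forecast_q7))
```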
The Modified Exponential Fit

In any case other than military procurement (except in those countries that have bled themselves dry because of it and now have neither the money for military extravaganza nor civilian necessities/amenities), trees don't grow into the high heavens. Given this profound observation, there must be a decision tool that places a cap or ceiling on overly exorbitant growth forecasts. But forecasters may also want to consider the other option: exponential economic declines do not always result in a merciful state of down and out but gradually approach a floor, which may be called subsistence, making do, squeezing by or other nice and flowery allegories. At any rate, the asymptote of the modified exponential curve

yc = k + ab^x,

where k is the asymptote, provides us with such a tool. There are four cases, as shown in Figure 9.6. A least-squares fit is not efficient in this case. Rather, a solution method is discussed that is based on the theorem that the ratio of successive first differences between points on the exponential curve is constant and equal to the slope b.

Figure 9.6: The four cases of the modified exponential curve (Case 1 to Case 4)

The decision tool is known as the method of semi-averages. It is based on the calculation of three sums of successive points of the time series. Therein lies the limitation of this technique, because the number of data points must be divisible by three. Thus a minimum of six points is necessary, and if the time series consists of, say, n = 20 data points, the two earliest ones (to preserve the most relevant environment) must be eliminated. The formulas for a, b and k are generated as follows from six general y values, starting with the origin of the series.


Vo

Time Series Analysis and Index Numbers

=k + a
Notes

y, =k + ab
y = k + ab
2

then Ej = 2k + a(b+l) Z I and


3 2

=2k + ab (b+l) = 2k + ab (b+l)


4

^2 _*?ll**2
2k = z , V
1

b -l b -1

a = (Z -I )
2 1

or, in the general case involving a time series of n data points and n is divisible by three

nk = Z,
2

b -l
1 n

b-1

(b -l)(b -1)

Suppose a set consists of the following data points.

Y e a r
1995 1996 1997 1998 1999 2000

C o d e
0 1 2 3 4 5

S a e ls U n t i s
100 160 200 230 245 250 } Z, =

n J

260

430

2, = 495

Punjab Technical University

1 7 9


Then

b² = (495 - 430)/(430 - 260) = 65/170 = 0.38, so b = 0.62

a = (430 - 260)(0.62 - 1)/(0.38 - 1)² = (170)(-0.99) = -168.3

2k = 260 - (-274.19) = 534.19, so k = 267.10

and

yc = 267.10 + (-168.3)(0.62)^x

which makes it a Case 4 curve with a sales ceiling of 267.10. The forecast is made in the usual manner. For 2001 (code x = 6) it is

y2001 = 267.10 + (-168.3)(0.62)^6 = 267.10 + (-9.56) = 257.54 units.

Student Activity

Fit an exponential curve to the following set of data.

X:   10   20   30   40   50   60   70   80
Y:   89  120  150  200  240  310  400  480
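The three-sums method can be sketched as follows. Exact arithmetic is used here, whereas the text rounds b to 0.62 first, so its printed figures (k = 267.10, forecast 257.54) differ slightly:

```python
# Three-sums fit of y = k + a*b^x for the sales series above.
sales = [100, 160, 200, 230, 245, 250]                         # x = 0, 1, ..., 5
s1, s2, s3 = sum(sales[:2]), sum(sales[2:4]), sum(sales[4:])   # 260, 430, 495

b = ((s3 - s2) / (s2 - s1)) ** 0.5            # from b^2 = (S3 - S2)/(S2 - S1)
a = (s2 - s1) * (b - 1) / (b * b - 1) ** 2
k = (s1 - a * (b * b - 1) / (b - 1)) / 2

forecast_2001 = k + a * b ** 6                # 2001 carries code x = 6
print(round(b, 3), round(a, 1), round(k, 1), round(forecast_2001, 1))
```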

Measuring Periodic Variations


Periodic variations occur over smaller intervals of time. Seasonal and cyclic variations are therefore local in nature and, unlike the secular trend, repeat many times during the entire period of observation. Analysis of periodic variations involves primarily two stages:

1. Characterizing the seasonal or cyclic pattern
2. Deseasonalising the data

Characterizing the seasonal or cyclic pattern

Method of simple averages

This method is applicable to time-series data having only seasonal and random variations. Moreover, the data must be collected for smaller periods (seasons). Averages are computed so that the effect of randomness is eliminated; therefore, only seasonal variations are left in the data. Seasonal variation is expressed in terms of a seasonal index. Following are the steps involved in the method of simple averages:

1. Averages are computed for each period of the year.
2. A grand seasonal average is obtained.
3. The seasonal index for a particular month is then obtained.
4. The values are then adjusted accordingly.



The method is described in the following example.

Example 4 Compute the seasonal index for the following time series data. Assume that the cyclic components of variation are absent.

Year    Jan   Feb   Mar   Apr   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1990     15    17    18    14    14    18    15    14    15    18    14
1991     17    16    15    15    16    18    17    13    17    17    15
1992     13    15    14    16    14    16    16    16    14    16    13

Solution Compute the averages as shown below.

Year     Jan   Feb   Mar   Apr   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1990      15    17    18    14    14    18    15    14    15    18    14
1991      17    16    15    15    16    18    17    13    17    17    15
1992      13    15    14    16    14    16    16    16    14    16    13
Total     45    38    47    45    44    52    48    43    46    51    42
Aj      15.0  12.6  15.6  15.0  14.3  17.3  16.0  14.3  15.3  17.0  14.0
S.I.     113    95   118   113   108   131   121   108   116   129   106
S.I.(A) 107.7  90.6 112.5 107.7 103.0 124.9 115.5 103.0 110.6 123.0 101.1

Grand average G = ΣAj/12 = 13.2

Seasonal index for month j:  S.I. = (Aj/G) × 100

The sum of the indices should be ΣS.I. = 1200. In our case, since ΣS.I. = 1258 ≠ 1200, it is required to adjust the values so that the sum becomes 1200. This can be achieved by multiplying each of the indices by 1200/1258. The adjusted indices are shown in the last row of the calculation table as S.I.(A).
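The adjustment step can be sketched with the printed indices (a minimal check of the 1200/1258 scaling):

```python
# Scale the raw seasonal indices so that they total 1200.
si = [113, 95, 118, 113, 108, 131, 121, 108, 116, 129, 106]
total = sum(si)                                # 1258
factor = 1200 / total
adjusted = [round(v * factor, 1) for v in si]
print(adjusted)
```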

Index Numbers

An index number is a statistical measure used to compare the average level or magnitude of a group of distinct but related variables in two or more situations.

Uses of Index Numbers

The main uses of index numbers are:

1. To measure and compare changes: The basic purpose of the construction of an index number is to measure the level of activity of phenomena like the price level, cost of living, level of agricultural production, level of business activity, etc. It is because of this reason that index numbers are sometimes termed barometers of economic activity. (A barometer is an instrument used in physics to measure atmospheric pressure.) The level of an activity can be expressed in terms of index numbers at different points of time or for different places at a particular point of time. These index numbers can be easily compared to determine the trend of the level of an activity over a period of time or with reference to different places.

2. To help in providing guidelines for framing suitable policies: Index numbers are indispensable tools for the management of any government or non-government organization. For example, the increase in the cost of living index is helpful in deciding the amount of additional dearness allowance that should be paid to workers to compensate them for the rise in prices. In addition, index numbers can be used in the planning and formulation of various government and business policies.

3. Price index numbers are used in deflating: This is a very important use of price index numbers. These index numbers can be used to adjust monetary figures of various periods for changes in prices. For example, the figure of national income of a country is computed on the basis of the prices of the year in question. Such figures for various years, often known as national income at current prices, do not reveal the real change in the level of production of goods and services. In order to know the real change in national income, these figures must be adjusted for price changes in various years. Such adjustments are possible only by the use of price index numbers, and the process of adjustment, in a situation of rising prices, is known as deflating.

4. To measure the purchasing power of money: There is an inverse relation between the purchasing power of money and the general price level measured in terms of a price index number. Thus, the reciprocal of the relevant price index can be taken as a measure of the purchasing power of money.

Construction of Index Numbers


To illustrate the construction of an index number, we reconsider the various items of food mentioned earlier. Let the prices of the different items in the two years, 1990 and 1992, be as given below:

Item        Price in 1990 (in Rs/unit)   Price in 1992 (in Rs/unit)
1. Wheat       300/quintal                  360/quintal
2. Rice        Ml kg.                       15/kg.
3. Milk        7/litre                      8/litre
4. Eggs        11/dozen                     12/dozen
5. Ghee        80/kg.                       88/kg.
6. Sugar       9/kg.                        10/kg.
7. Pulses      U/kg.                        16/kg.

The comparison of the price of an item, say wheat, in 1992 with its price in 1990 can be done in two ways, explained below:

(a) By taking the difference of the prices in the two years, i.e., 360 - 300 = 60, one can say that the price of wheat has gone up by Rs 60/quintal in 1992 as compared with its price in 1990.

(b) By taking the ratio of the two prices, i.e., 360/300 = 1.20, one can say that if the price of wheat in 1990 is taken to be 1, then it has become 1.20 in 1992. A more convenient way of comparing the two prices is to express the price ratio in terms of percentage, i.e., (360/300) × 100 = 120, known as the Price Relative of the item. In our example, the price relative of wheat is 120, which can be interpreted as the price of wheat in 1992 when its price in 1990 is taken as 100. Further, the figure 120 indicates that the price of wheat has gone up by 120 - 100 = 20% in 1992 as compared with its price in 1990.

The first way of expressing the price change is inconvenient because the change in price depends upon the units in which it is quoted. This problem is taken care of in the second method, where the price change is expressed in terms of percentage. An additional advantage of this method is that various price changes, expressed in percentage, are comparable. Further, it is very easy to grasp a 20% increase in price rather than the increase expressed as Rs 60/quintal. For the construction of an index number, we have to obtain the average price change for the group in 1992, usually termed the Current Year, as compared with the prices of 1990, usually called the Base Year. This comparison can be done in two ways:


(i) By taking a suitable average of the price relatives of different items. The methods of index number construction based on this procedure are termed Average of Price Relatives Methods.

(ii) By taking the ratio of the averages of the prices of different items in each year. These methods are popularly known as Aggregative Methods.

Since the average in each of the above methods can be simple or weighted, these can further be divided as simple or weighted. Various methods of index number construction can be classified as shown below:

Index Number Construction
1. Average of Price Relatives Methods
   (a) Simple Average of Price Relatives Methods
   (b) Weighted Average of Price Relatives Methods
2. Aggregative Methods
   (a) Simple Aggregative Methods
   (b) Weighted Aggregative Methods

In addition to this, a particular method would depend upon the type of average used. Although the geometric mean is more suitable for averaging ratios, the arithmetic mean is often preferred because of its simplicity with regard to computations and interpretation.

Notations and Terminology

Before writing various formulae of index numbers, it is necessary to introduce certain notations and terminology for convenience.

Base Year: The year from which comparisons are made is called the base year. It is commonly denoted by writing '0' as a subscript of the variable.

Current Year: The year under consideration, for which the comparisons are to be computed, is called the current year. It is commonly denoted by writing '1' as a subscript of the variable.

Let there be n items in a group, numbered from 1 to n. Let p0i denote the price of the ith item in the base year and p1i its price in the current year, where i = 1, 2, ..., n. In a similar way, q0i and q1i will denote the quantities of the ith item in the base and current years respectively. Using these notations, we can write an expression for the price relative of the ith item as

Pi = (p1i/p0i) × 100

and the quantity relative of the ith item as

Qi = (q1i/q0i) × 100.

Further, P01 will be used to denote the price index number of period '1' as compared with the prices of period '0'. Similarly, Q01 and V01 will denote the quantity and the value index numbers respectively of period '1' as compared with period '0'.

Price Index Numbers

1. Simple Average of Price Relatives

(a) Using the arithmetic mean of price relatives, the index number formula is given by

P01 = (1/n) Σ Pi = (1/n) Σ (p1i/p0i) × 100

Omitting the subscript i, the above formula can also be written as

P01 = (1/n) Σ (p1/p0) × 100

(b) Using the geometric mean of price relatives, the index number formula is given by

P01 = (Π Pi)^(1/n) = Antilog[(1/n) Σ log Pi]

(Π is used to denote the product of terms.)

Example 5 Given below are the prices of 5 items in 1985 and 1990. Compute the simple price index number of 1990 taking 1985 as the base year. Use (a) arithmetic mean and (b) geometric mean.

Item   Price in 1985 (Rs/unit)   Price in 1990 (Rs/unit)
1              15                        20
2               8                         7
3             200                       300
4              60                       110
5             100                       130


Solution

Calculation Table

Item   Price in 1985 (p_0)   Price in 1990 (p_1)   Price Relative P = (p_1/p_0) × 100   log P
1              15                    20                       133.33                    2.1249
2               8                     7                        87.50                    1.9420
3             200                   300                       150.00                    2.1761
4              60                   110                       183.33                    2.2632
5             100                   130                       130.00                    2.1139
Total                                                         684.16                   10.6201

Index number, using A.M., is P_01 = 684.16/5 = 136.83

and index number, using G.M., is P_01 = Antilog [10.6201/5] = Antilog 2.1240 = 133.06
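The computation in Example 5 can be checked with a few lines of Python (illustrative only; code is not part of the original text, and the data are hard-coded from the table above):

```python
# Simple price index numbers of Example 5 (base 1985, current 1990),
# computed from price relatives with both averages.
from math import log10

p0 = [15, 8, 200, 60, 100]   # prices in 1985
p1 = [20, 7, 300, 110, 130]  # prices in 1990

relatives = [100 * a / b for a, b in zip(p1, p0)]

# Arithmetic mean of price relatives
index_am = sum(relatives) / len(relatives)

# Geometric mean of price relatives (antilog of the mean log relative)
index_gm = 10 ** (sum(log10(r) for r in relatives) / len(relatives))

print(round(index_am, 2))  # 136.83
print(round(index_gm, 2))  # 133.06
```

As expected, the geometric mean gives a somewhat lower index than the arithmetic mean, since the G.M. of positive numbers never exceeds their A.M.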

2. Weighted Average of Price Relatives

In the method of simple average of price relatives, all the items are assumed to be of equal importance in the group. However, in most real life situations, different items of a group have different degrees of importance. In order to take this into account, weighing of different items, in proportion to their degree of importance, becomes necessary.

Let w_i be the weight assigned to the ith item (i = 1, 2, ..., n). Thus, the index number, given by the weighted arithmetic mean of price relatives, is

P_01 = Σ w_i P_i / Σ w_i

Similarly, the index number, given by the weighted geometric mean of price relatives, can be written as follows:

P_01 = Antilog [ Σ w_i log P_i / Σ w_i ]

Example 6

Construct an index number for 1989 taking 1981 as base for the following data, by using
(a) weighted arithmetic mean of price relatives and
(b) weighted geometric mean of price relatives.

Commodity   Prices in 1981   Prices in 1989   Weights
A                 60               100           30
B                 20                20           20
C                 40                60           24
D                100               120           30
E                120                80           10

186

Self-Instructional Material

Solution

Calculation Table

Commodity   Prices in 1981 (p_0)   Prices in 1989 (p_1)   P.R. (P)   Wts (w)     Pw      log P    w log P
A                   60                    100              166.67       30      5000.1   2.2219    66.657
B                   20                     20              100.00       20      2000.0   2.0000    40.000
C                   40                     60              150.00       24      3600.0   2.1761    52.226
D                  100                    120              120.00       30      3600.0   2.0792    62.376
E                  120                     80               66.67       10       666.7   1.8239    18.239
Total                                                                  114     14866.8            239.498

Index number using A.M. is P_01 = 14866.8/114 = 130.41

and index number using G.M. is P_01 = Antilog [239.498/114] = Antilog 2.1009 = 126.15
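The weighted averages of Example 6 can be verified the same way (an illustrative sketch, not part of the original text; the data are hard-coded from the problem):

```python
# Weighted average of price relatives for Example 6 (base 1981, current 1989).
from math import log10

p0 = [60, 20, 40, 100, 120]   # prices in 1981
p1 = [100, 20, 60, 120, 80]   # prices in 1989
w  = [30, 20, 24, 30, 10]     # weights

rel = [100 * a / b for a, b in zip(p1, p0)]

# Weighted arithmetic mean of price relatives
index_am = sum(wi * r for wi, r in zip(w, rel)) / sum(w)

# Weighted geometric mean of price relatives
index_gm = 10 ** (sum(wi * log10(r) for wi, r in zip(w, rel)) / sum(w))

print(round(index_am, 2), round(index_gm, 2))
# 130.41 126.14 (the text gets 126.15 using four-figure log tables)
```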

Example 7

Taking 1983 as base year, calculate an index number of prices for 1990, for the following data given in appropriate units, using
(a) weighted arithmetic mean of price relatives by taking weights as the values of current year quantities at base year prices and
(b) weighted geometric mean of price relatives by taking weights as the values of base year quantities at base year prices.

                  1983                1990
Commodity   Price   Quantity    Price   Quantity
A             82       63        160       56
B             80       75        182       53
C            105       92        185       64
D            102       25        177       13
E            102       63        175       54
F            190       61        140       60

Solution

Let p_0, p_1 denote the prices and q_0, q_1 denote the quantities of the various commodities in 1983 and 1990 respectively.

Calculation Table*

Commodity   P.R. (P)   p_0 q_1   p_0 q_0   P × p_0 q_1   log P   p_0 q_0 × log P
A            195.12      4592      5166       895991     2.2903       11832
B            227.50      4240      6000       964600     2.3570       14142
C            176.19      6720      9660      1183997     2.2460       21696
D            173.53      1326      2550       230101     2.2394        5710
E            171.57      5508      6426       945008     2.2344       14358
F             73.68     11400     11590       839952     1.8673       21642
Total                   33786     41392      5059649                  89380

* Approximated to the nearest whole number.

(a) P_01 (using A.M.) = 5059649/33786 = 149.76

(b) P_01 (using G.M.) = Antilog [89380/41392] = Antilog 2.1594 = 144.33
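The two sets of value weights in Example 7 can be checked with a short script (illustrative only, not part of the original text; data hard-coded from the table):

```python
# Example 7: weighted average of price relatives with value weights
# (p0*q1 for the arithmetic mean, p0*q0 for the geometric mean).
from math import log10

p0 = [82, 80, 105, 102, 102, 190]    # 1983 prices
q0 = [63, 75, 92, 25, 63, 61]        # 1983 quantities
p1 = [160, 182, 185, 177, 175, 140]  # 1990 prices
q1 = [56, 53, 64, 13, 54, 60]        # 1990 quantities

rel = [100 * a / b for a, b in zip(p1, p0)]   # price relatives
w_am = [a * b for a, b in zip(p0, q1)]        # p0*q1 weights
w_gm = [a * b for a, b in zip(p0, q0)]        # p0*q0 weights

index_am = sum(w * r for w, r in zip(w_am, rel)) / sum(w_am)
index_gm = 10 ** (sum(w * log10(r) for w, r in zip(w_gm, rel)) / sum(w_gm))

print(round(index_am, 2), round(index_gm, 2))
# 149.76 144.34 (the text gets 144.33 using four-figure log tables)
```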

3. Simple Aggregative Method

In this method, the simple arithmetic means of the prices of all the items of the group for the current as well as for the base year are computed separately. The ratio of the current year average to the base year average, multiplied by 100, gives the required index number.

Using notations, the arithmetic mean of prices of n items in the current year is (1/n) Σ p_1i and the arithmetic mean of prices in the base year is (1/n) Σ p_0i.

Simple aggregative price index P_01 = [ (1/n) Σ p_1i / (1/n) Σ p_0i ] × 100 = (Σ p_1i / Σ p_0i) × 100

Omitting the subscript i, the above index number can also be written as

P_01 = (Σ p_1 / Σ p_0) × 100

Example 4

The following table gives the prices of six items in the years 1980 and 1981. Use the simple aggregative method to find the index of 1981 with 1980 as base.

Item   Price in 1980 (Rs)   Price in 1981 (Rs)
A             40                   50
B             60                   60
C             20                   30
D             50                   70
E             80                   90
F            100                  100

Solution

Let p_0 be the price in 1980 and p_1 be the price in 1981. Thus, we have

Σ p_0 = 350 and Σ p_1 = 400

∴ P_01 = (400/350) × 100 = 114.29
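The simple aggregative index is just a ratio of totals; a minimal check in Python (illustrative only, data hard-coded from Example 4):

```python
# Simple aggregative index for Example 4: ratio of price totals, times 100.
p0 = [40, 60, 20, 50, 80, 100]   # 1980 prices
p1 = [50, 60, 30, 70, 90, 100]   # 1981 prices

index = 100 * sum(p1) / sum(p0)
print(sum(p0), sum(p1), round(index, 2))  # 350 400 114.29
```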

4. Weighted Aggregative Method

This index number is defined as the ratio of the weighted arithmetic mean of current year prices to that of base year prices, multiplied by 100.

Using the notations defined earlier, the weighted arithmetic mean of current year prices can be written as Σ p_1i w_i / Σ w_i. Similarly, the weighted arithmetic mean of base year prices is Σ p_0i w_i / Σ w_i.

∴ Price Index Number, P_01 = [ (Σ p_1i w_i / Σ w_i) / (Σ p_0i w_i / Σ w_i) ] × 100 = (Σ p_1i w_i / Σ p_0i w_i) × 100

Omitting the subscript, we can also write

P_01 = (Σ p_1 w / Σ p_0 w) × 100

Nature of Weights

While taking a weighted average of price relatives, the values are often taken as weights. These weights can be the values of base year quantities valued at base year prices, i.e., p_0i q_0i, or the values of current year quantities valued at current year prices, i.e., p_1i q_1i, or the values of current year quantities valued at base year prices, i.e., p_0i q_1i, etc., or any other value.

Depending upon the choice of weights, different formulae have been derived. We will discuss some of them here.

Laspeyres Index

This price index number uses base year quantities for the weights. Thus, the corresponding formula takes the following form:

P_01^La = (Σ p_1i q_0i / Σ p_0i q_0i) × 100

or, omitting the subscript i,

P_01^La = (Σ p_1 q_0 / Σ p_0 q_0) × 100

Paasche Index

This price index number uses current year quantities for the weights. Thus, the corresponding formula takes the following form:

P_01^Pa = (Σ p_1i q_1i / Σ p_0i q_1i) × 100

or

P_01^Pa = (Σ p_1 q_1 / Σ p_0 q_1) × 100

Fisher Index

This price index number uses the geometric mean of the Laspeyres and Paasche indices to derive the following formula:

P_01^F = √(P_01^La × P_01^Pa)

Dorbish and Bowley Index

This price index number uses the arithmetic mean of the Laspeyres and Paasche indices to derive the following formula:

P_01^DB = (P_01^La + P_01^Pa) / 2

Marshall and Edgeworth Index

This price index number uses the arithmetic mean of base and current year quantities for weights to derive the following formula:

P_01^ME = [ Σ p_1 (q_0 + q_1) / Σ p_0 (q_0 + q_1) ] × 100

Walsh Index

This price index number uses the geometric mean of base and current year quantities for weights to derive the following formula:

P_01^W = [ Σ p_1 √(q_0 q_1) / Σ p_0 √(q_0 q_1) ] × 100

Example 9

For the following data compute the various index numbers.

Product   p_0   q_0   p_1   q_1
A          10    30    12    50
B           8    15    10    25
C           6    20     6    30
D           4    10     6    20

Solution

Calculation Table

Product   p_0 q_0   p_1 q_0   p_0 q_1   p_1 q_1   q_0 q_1   √(q_0 q_1)   p_0 √(q_0 q_1)   p_1 √(q_0 q_1)
A           300       360       500       600      1500       38.73          387.3            464.8
B           120       150       200       250       375       19.36          154.9            193.6
C           120       120       180       180       600       24.49          146.9            146.9
D            40        60        80       120       200       14.14           56.6             84.8
Total       580       690       960      1150                                745.7            890.1

Hence,

P_01^La = (Σ p_1 q_0 / Σ p_0 q_0) × 100 = (690/580) × 100 = 118.97

P_01^Pa = (Σ p_1 q_1 / Σ p_0 q_1) × 100 = (1150/960) × 100 = 119.79

P_01^F = √(118.97 × 119.79) = 119.38

P_01^DB = (118.97 + 119.79)/2 = 119.38

P_01^ME = [ Σ p_1 (q_0 + q_1) / Σ p_0 (q_0 + q_1) ] × 100 = [(690 + 1150)/(580 + 960)] × 100 = 119.48

P_01^W = (890.1/745.7) × 100 = 119.36
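All six weighted aggregative indices of Example 9 can be computed directly from their formulas; a sketch in Python (illustrative only, not part of the original text):

```python
# The six weighted aggregative indices of Example 9, from their formulas.
from math import sqrt

p0 = [10, 8, 6, 4]
q0 = [30, 15, 20, 10]
p1 = [12, 10, 6, 6]
q1 = [50, 25, 30, 20]

def s(xs, ys):
    # Sum of elementwise products, e.g. s(p1, q0) = sum of p1*q0
    return sum(x * y for x, y in zip(xs, ys))

laspeyres = 100 * s(p1, q0) / s(p0, q0)
paasche   = 100 * s(p1, q1) / s(p0, q1)
fisher    = sqrt(laspeyres * paasche)
dorbish   = (laspeyres + paasche) / 2
marshall  = 100 * (s(p1, q0) + s(p1, q1)) / (s(p0, q0) + s(p0, q1))
w = [sqrt(a * b) for a, b in zip(q0, q1)]   # geometric-mean quantities
walsh     = 100 * s(p1, w) / s(p0, w)

for name, v in [("La", laspeyres), ("Pa", paasche), ("F", fisher),
                ("DB", dorbish), ("ME", marshall), ("W", walsh)]:
    print(name, round(v, 2))
```

The Walsh index prints 119.37 here; the text gets 119.36 because its table rounds the √(q_0 q_1) values to two decimals before summing.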

Summary
Time series analysis is a long-term forecasting tool. A time series is a series of data collected over a period of time, separated by successive time intervals. Time series analysis is employed for two basic purposes: to study and characterize the past behavior of the data, and to make a plausible forecast for the future. Secular trend is the overall behavior of the time series over the entire period of observation. Periodic variations occur over smaller time intervals. Seasonal and cyclic variations are, therefore, local in nature and, unlike the secular trend, repeat many times during the entire period of observation. An index number is a statistical measure used to compare the average level of magnitude of a group of distinct but related variables in two or more situations.

Keywords
Secular Trend: Secular trend is the overall behavior of the time series over the entire period of the observation.


Index Numbers: An index number is a statistical measure used to compare the average level of magnitude of a group of distinct but related variables in two or more situations.

Current Year: The year under consideration, for which the comparisons are to be computed, is called the current year.

Review Questions
1. The consumer price index for the working class of a town in 1965 was 222, the index of food being 250, the index of clothing 290 and the index of all other items including fuel and light 120. In 1970, the cost of living index stood at 331, the index of food being 325, of clothing 445 and of all other items 310. Find out the relative importance of (i) food, (ii) clothing and (iii) all other items, in the budget of working class families in the base period.

Hint: Let the weights be X, Y and (100 - X - Y) for the three groups respectively.

2. The consumer price index over a certain period increased from 120 to 215 and the wages of a worker increased from Rs 840 to Rs 1,550. What is the gain or loss to the worker in real terms?

Hint: Inflate Rs 840 by using the price indices.

3. The data given below show the percentage increase in prices of a few selected food items and the weights attached to each of them. Calculate the index number for the food group.

Food item      Weight   % increase in price
Rice             33            180
Wheat            11            202
Dal               8            115
Ghee              5            212
Oil               5            175
Spices            3            517
Milk              7            260
Fish              9            426
Vegetables        9            332
Refreshments     10            279

Use the above food index and the information given below to calculate the cost of living index number.

Group    Food   Clothing   Fuel & Light   Rent & Rates   Miscellaneous
Index      ?      310          220            150             300
Weight    60        5            8              9              18

Hint: Obtain the price index by adding 100 to the percentage increase in price.

Further Readings
Anderson, D.R., D.J. Sweeney, and T.A. Williams, Quantitative Methods for Business, 5th edition, West Publishing Company
Freund, John E., and Ronald E. Walpole, Mathematical Statistics, 5th edition, Englewood Cliffs, NJ: Prentice Hall
Abraham, B., and J. Ledolter, Statistical Methods for Forecasting, John Wiley & Sons

Unit 10: Probability Theory

Unit Structure

Introduction
Probability Concepts
Permutations
Combinations
Objective and Subjective Probabilities
Revised Probabilities
Random Variables and Probability Distribution
Discrete Random Variables
Continuous Random Variables
Binomial Distribution
Poisson Distribution
Normal Distribution
Summary
Keywords
Review Questions
Further Readings

Learning Objectives

After reading this unit you should be able to:

Define probability
Differentiate between objective and subjective probabilities
Use permutations in computing probability
Use combinations in computing probability
Define random variables and their probability distributions
Define and use binomial probability distribution
Define and use Poisson distribution
Define and use normal distribution

Introduction
If all business decisions could be made under conditions of certainty, the only valid justification for a poor decision would be failure to consider all the pertinent facts. With certainty, one can make a perfect forecast of the future. Unfortunately, however, the manager rarely, if ever, operates in a world of certainty. Usually, the manager is forced to make decisions when there is uncertainty as to what will happen after the decisions are made. In this latter situation, the mathematical theory of probability furnishes a tool that can be of great help to the decision maker. Probability theory is the topic of concern in this unit.


Probability Concepts

Since we operate in a world full of uncertainty, mathematicians have always been interested in quantifying the uncertainty associated with an event so that one may take a better decision when a situation arises. Probability is a mathematical measure of uncertainty.

Concepts

The theory of probability employs certain concepts, which will be dealt with first.

Experiment

An activity that produces some result and which can be repeated in an identical environment is called an experiment. Thus, throwing a die, circulating an advertisement, tossing a coin, etc. are some examples of experiments because these activities can be repeated as many times as one wants, each time producing some result. However, those activities that are not repeatable cannot be termed experiments in the present context.

Deterministic experiment

The result of an experiment may or may not depend on chance. Those experiments whose outcomes can be predicted are called deterministic experiments.

Random or stochastic experiment

Consider a phenomenon whose outcome is not predictable in advance, though all the possible outcomes are known, such as tossing a coin. In this case we know that either a head or a tail appears. That is, we know the two possible outcomes but we cannot say with certainty whether a head or a tail will turn up. So we cannot predict the actual outcome of this experiment in advance. Such an experiment is called a random or stochastic experiment.

Sample space

The set of all possible outcomes of a random experiment is called its sample space. Each outcome itself is called a sample point. Let us consider the experiment of tossing a coin; the possible outcomes are either (H)ead or (T)ail. Thus the sample space contains two points in this case, i.e. S = {H, T}.

In case of an experiment of simultaneous throw of two coins, the sample space consists of the sample points HH, HT, TH, TT. Thus, the sample space is:

S = {HH, HT, TH, TT}

Similarly, in case of the experiment of throwing first a coin and then a die, we get the following sample space:

S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}

A sample space which consists of a finite number of sample points is called a finite sample space, otherwise an infinite sample space. All the experiments described above have finite sample spaces. The sample space associated with the experiment of tossing a coin until a head appears is an infinite sample space, given by:

S = {H, TH, TTH, TTTH, TTTTH, TTTTTH, ...}

Event

An event is a subset of the sample space of a random experiment. Thus, if we consider the experiment of throwing a die first and then a coin, whose sample space is:

S = {1H, 2H, 3H, 4H, 5H, 6H, 1T, 2T, 3T, 4T, 5T, 6T}

then the following sets are some events of this experiment:

E1 = {1H, 1T} : Event of getting 1 on the die.
E2 = {1H, 2H, 3H, 4H, 5H, 6H} : Event of getting heads on the coin.
E3 = {2H, 4H, 6H, 2T, 4T, 6T} : Event of getting an even number on the die.
E4 = {1H, 2H, 3H, 4H, 5H, 6H, 1T, 2T, 3T, 4T, 5T, 6T} : Event of getting any outcome.
E5 = { } : Event of getting no outcome.

Equally likely outcomes

The outcomes of a random experiment are said to be equally likely if each of the outcomes stands an equal chance of occurrence. In such cases we cannot find a reason for one outcome to occur in preference to the other outcomes. For example, if we roll a die, all the faces 1, 2, 3, 4, 5, 6 stand an equal chance of turning up, provided the die is fair. These outcomes are, thus, equally likely. However, if we load the die on one side so that it becomes heavier on that side, then the outcomes are no longer equally likely. Note that in case a random experiment with equally likely outcomes is repeated N number of times, we can fairly assume that all the outcomes will happen an equal number of times.

Mutually exclusive events

Some events can happen together while others cannot. For instance, in drawing a card from a well-shuffled pack of 52 cards, the events (1) that the card is an ace and (2) that it is of the club suit are events that can happen together. In contrast, in a flip of a fair coin, the events (1) turning up of a head and (2) turning up of a tail cannot happen together. A collection of events in which the happening of one rules out the happening of the others are called mutually exclusive events.

Exhaustive events

The combination of all the possible events of a random experiment is called the exhaustive events. Practically, it is the same as the sample space of the experiment concerned.


Random variable

A variable whose value is determined by the outcome of a random experiment is called a random variable. A random variable is also known as a chance variable or stochastic variable. A random variable may be discrete or continuous. Thus, "the number of heads turning up" in an experiment of tossing 3 fair coins is a random variable. It can take the values 0, 1, 2 and 3. The value it takes in a toss depends on chance and hence it is a random variable. Similarly, the rainfall registered in a region in a month is also a random variable because its value can be anything depending on chance.

Discrete random variable

A random variable that can assume one of a finite number of values is called a discrete random variable. An example of a discrete random variable defined on the experiment of tossing 3 coins is "the number of heads turning up", as cited in the example above. It can take one among the four values 0, 1, 2 and 3.

Continuous random variable

A random variable that can take one among infinite values depending on chance is called a continuous random variable. For example, the random variable "rainfall recorded in a region in a particular month" can take, say, a value between 25 cm and 28 cm. Since the interval 25 to 28 contains an infinite number of values, this random variable is a continuous random variable.

Definitions of probability

Classical definition

The classical definition of probability was coined by Bernoulli. This definition takes the conditions of the random experiment into consideration while assigning probabilities to events. For this reason it is also referred to as the a priori definition of probability.

If a random experiment results in equally likely, mutually exclusive and exhaustive outcomes, with S as its sample space, then the probability of an event E is given by:

P(E) = n(E)/n(S) = (Number of outcomes in favor of the event E) / (Total number of outcomes)

Example 1

Find the probability of getting a head on a toss of a fair coin.

Solution

Let the event E be defined as "getting a head". In this random experiment, sample space S = {H, T} and the event E = {H}.

Student Activity
Three cards are drawn from a well-shuffled deck of 52 cards. Enumerate the outcomes favourable to the following events:
(a) All the cards are aces
(b) Two cards are aces and the third is a king
(c) None of the cards are either red or black
(d) All the cards are face cards


Thus,

n(S) = 2 and n(E) = 1

Therefore, P("getting a head") = P(E) = n(E)/n(S) = 1/2
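The classical definition can be checked mechanically by enumerating a sample space and counting favourable outcomes; a sketch in Python (illustrative only, not from the text), using the coin-then-die experiment described earlier:

```python
# Classical probability by brute-force enumeration.
from itertools import product

coins = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Sample space of "toss a coin, then throw a die": H1, H2, ..., T6
S = [f"{c}{d}" for c, d in product(coins, die)]

# Event E1: getting 1 on the die
E1 = [s for s in S if s.endswith("1")]

p = len(E1) / len(S)
print(len(S), len(E1), p)  # 12 2 0.16666666666666666
```

The count of favourable outcomes over the total count gives P(E1) = 2/12 = 1/6, as the classical definition requires.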

Example 2

What is the probability of getting at least two heads in a toss of three fair coins?

Solution

Let the event E be "getting at least two heads" in this random experiment. Here,

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
E = {HHH, HHT, HTH, THH}

Therefore,

P(E) = n(E)/n(S) = 4/8 = 0.5

Permutations


If there are four horses in a race, there are four possible outcomes provided that one is concerned only to name the winner. If the second horse also has to be named, there are 4 × 3 possible outcomes. Representing the horses by A, B, C and D, the possible outcomes are:

AB AC AD BA BC BD CA CB CD DA DB DC

In general, if there are n horses in a race then there are n(n - 1) ways in which the first two places can be filled. The (n - 1) horses which may fill the second place are the original n horses minus the horse that has been named for the first place. For each different winner the list of (n - 1) candidates for the second place is therefore different, but there are always exactly (n - 1) candidates for each of the n winners. Extending this argument, when the objective is to name the first three horses, the number of possible outcomes must obviously be n(n - 1)(n - 2). To name r individuals in sequence out of a total of n individuals, the number of possible outcomes becomes n(n - 1)(n - 2)...(n - r + 1), that is, r integers all multiplied together, of which each is 1 less than the preceding integer.

Each of these possible outcomes is called a permutation. The number of permutations of r objects from a total of n objects is the number of different ways in which r objects can be selected in sequence from a total of n objects. It is convenient to employ the expression 'factorial n', which is represented by n! and is defined as:

n! = 1 × 2 × 3 × ... × (n - 1) × n

The expression nPr is used as a shorthand form for 'the number of permutations of r objects taken from n distinct objects'. It follows that:

nPr = n(n - 1)(n - 2)...(n - r + 1) = n!/(n - r)!

In the above discussion about horses:

4P1 = 4!/(4 - 1)! = 4!/3! = 4
4P2 = 4!/(4 - 2)! = 4!/2! = 4 × 3 = 12
4P3 = 4!/(4 - 3)! = 4!/1! = 4 × 3 × 2 = 24
4P4 = 4!/(4 - 4)! = 4!/0! = 24

It follows that, when using the formula, 0! must be taken as equal to 1. This may seem surprising, but it is the answer one arrives at if 1! is divided by 1, which is a reasonable way of finding the factorial of the next lower integer. If we want to find the factorial of -1, then we have to divide 0! by 0. Dividing 0! by 0 suggests that the factorial of -1 is infinity. This works correctly for cases where r is greater than n. The number of different ways of selecting 5 objects in sequence, from a total of 4 objects, is:

4P5 = 4!/(4 - 5)! = 4!/(-1)!

which involves the 'infinite' factorial (-1)!. It is, however, unwise to treat infinity as if it were a number. The correct method is to remember that nPr is equal to n(n - 1)(n - 2)...(n - r + 1), and so when r is an integer greater than n, one of these terms must be zero, making the product zero. Some mathematical problems can give rise to ratios such as (-2)!/(-5)!. Using the correct interpretation, this is not infinity divided by infinity but (-2)(-3)(-4), which is -24.
Example 3

In a fashion competition, six models have to be listed in order of preference out of a total of ten models. In how many different ways can the selection be made?

Solution

Here, n = 10 and r = 6. Therefore, the answer is:

10P6 = 10!/(10 - 6)! = (10 × 9 × 8 × 7 × 6 × 5 × 4!)/4! = 1,51,200

Four-figure mathematical tables often include a table of n! or log n! or both. Log 10! is 6.5598 and log 4! is 1.3802, and so the answer is the antilogarithm of 5.1796.
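The nPr formula, including the "zero when r exceeds n" convention discussed above, can be sketched in Python (illustrative only; `math.perm` in the standard library computes the same quantity):

```python
# Number of permutations nPr = n!/(n-r)!, checked against the examples above.
from math import perm, factorial

def npr(n: int, r: int) -> int:
    # Zero when r exceeds n, matching the convention in the text
    if r > n:
        return 0
    return factorial(n) // factorial(n - r)

print(npr(4, 2))    # 12
print(npr(4, 5))    # 0
print(npr(10, 6))   # 151200
print(perm(10, 6))  # 151200 (math.perm agrees)
```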

Combinations

It is sometimes necessary to know the number of different ways in which r objects can be selected from n objects without regard to sequence. For instance, the number of different permutations of five cards from a pack of 52 cards is 52P5. The same five cards dealt in different sequences are different permutations. In practice, the interest is usually in the number of different possible hands irrespective of the sequence in which they were dealt. To obtain this, one must divide by the number of ways in which five cards can be arranged among themselves, which is 5P5.

Each different way of selecting r objects from n without regard to the sequence in which they were selected is termed a combination, and each such combination consists of rPr permutations. The number of combinations is represented by nCr, and so nCr is nPr divided by rPr. Since rPr is r!:

nCr = n!/((n - r)! r!)

Student Activity

1. Five cards are drawn from a well-shuffled deck of cards. What is the probability that two cards of the same suit are a king and an ace?
2. Prove that: nPr = r! nCr

It will be seen that nCr works out the same as nC(n-r). This is reasonable, since the number of ways of selecting five cards from a pack of 52 is obviously the same as the number of ways of selecting 47 cards and leaving a hand of five cards behind.

Example 4

In how many different ways can three bolts be selected from a box containing eight bolts?

Solution

The answer is 8C3. It is convenient to adopt the practice of using dots in place of multiplication signs when several numbers are all multiplied together:

8C3 = 8!/(5! 3!) = (8.7.6)/(1.2.3) = 56

There is no need to write out the factorials in full. The number of integers to be multiplied in both numerator and denominator is the smaller of r and (n - r). The denominator always cancels completely, since nCr must be an integer.
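The nCr formula and the nCr = nC(n-r) symmetry noted above can be checked in Python (illustrative only; `math.comb` in the standard library computes the same quantity):

```python
# Number of combinations nCr = n!/((n-r)! r!).
from math import comb, factorial

def ncr(n: int, r: int) -> int:
    if r < 0 or r > n:
        return 0
    return factorial(n) // (factorial(n - r) * factorial(r))

print(ncr(8, 3))    # 56 (the bolts example)
print(ncr(52, 5))   # 2598960 possible five-card hands
print(ncr(52, 47))  # 2598960 (same, by symmetry)
print(comb(8, 3))   # 56 (math.comb agrees)
```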

Objective and Subjective Probabilities

Most of us are familiar with the laws of chance regarding coin flipping. If someone asks about the probability of a head on one toss of a coin, the answer will be one-half, or 0.50. This answer is based on common experience with coins, and assumes that the coin is a fair coin and that it is "fairly" tossed. This is an example of objective probability.

A subjective interpretation of probabilities is often useful for business decision-making. In the case of objective probability, definitive historical information, common experience (objective evidence), or rigorous analysis lies behind the probability assignment. In the case of the subjective interpretation, quantitative historical information may not be available; and instead of objective evidence, personal experience becomes the basis of the probability assignment.

For managerial decision-making purposes, the subjective interpretation is frequently required, since reliable objective evidence may not be available. Assume that a manager is trying to decide whether or not to build a new factory, and the success of the factory depends largely on whether or not there is a recession in the next five years. A probability assigned to the occurrence of a recession would be a subjective weight. There would certainly be less agreement on this probability than there would be on the probabilities of drawing a red ball, or of a fair coin coming up heads.

Basic Statements of Subjective Probability

Two fundamental statements about probabilities are:

1. Probabilities of all the various possible outcomes of a trial must sum to one.
2. Probabilities are always greater than or equal to zero (i.e., probabilities are never negative) and are less than or equal to one. The smaller the probability, the less likely is the chance of the event happening.

The first statement indicates that if A and B are the only candidates for an office, the probability that A will win plus the probability that B will win must sum to one (assuming a tie is not possible). The second statement results in the following interpretations. If an event has a positive probability, it may possibly occur; the event may be impossible, in which case it has a zero probability; or the event may be certain to occur, in which case the probability is equal to one.

Mutually Exclusive Events

Two or more events are mutually exclusive if only one of the events can occur on any one trial. The probabilities of mutually exclusive events can be added to obtain the probability that one of a given collection of the events will occur.

Example 5

The probabilities shown in Table 10.1 reflect the subjective estimate of a newspaper editor regarding the relative chances of four candidates for a public office (assume a tie is not possible).

Table 10.1: Election Probabilities

Event: Elect      Probability
Candidate A          0.18
Candidate B          0.42
Candidate C          0.26
Candidate D          0.14
                     1.00

These events are mutually exclusive, since in one election (or in one trial) only one event may occur; therefore the probabilities are additive. The probability of a Democratic victory is 0.60; of a Republican victory, 0.40; and of either B or C winning, 0.68. The probability of both B and C winning is zero, since only one of the mutually exclusive events can occur on any one trial.

Dependent and Independent Events

Events may be either independent or dependent. If two events are (statistically) independent, the occurrence of one event will not affect the probability of the occurrence of the second event. When two (or more) events are independent, the probability of both events (or more than two events) occurring is equal to the product of the probabilities of the individual events. That is:

P(A and B) = P(A).P(B)   if A, B independent

where

P(A and B) = probability of events A and B both occurring together
P(A) = probability of event A
P(B) = probability of event B

The above equation indicates that the probability of A and B both occurring is equal to the probability of A multiplied by the probability of B, if A and B are independent. If A is the event of a head on the first toss of a coin, and B is the event of a head on the second toss of the coin, then:

P(A) = 1/2, P(B) = 1/2
P(A and B) = 1/2 × 1/2 = 1/4

The probability of A and B occurring (two heads) is one-fourth. P(A and B) is the joint probability of events A and B. Where appropriate, the word "and" can be omitted to simplify the notation, and the joint probability can be written simply as P(AB).

Conditional Probability

To define independence mathematically, we need the symbol P(B|A). The symbol P(B|A) is read "the probability of event B, given that event A has occurred." P(B|A) is the conditional probability of event B, given that event A has taken place. Note that P(B|A) does not mean the probability of event B divided by A; the vertical line followed by A means "given that event A has occurred." It follows that,

P(B|A) = P(B)   if A, B independent

That is, the probability of event B, given that event A has occurred, is equal to the probability of event B if the two events are independent. With two independent events, the occurrence of the one event does not affect the probability of the occurrence of the second [in like manner, P(A|B) = P(A)].

Two events are dependent if the occurrence of one of the events affects the probability of the occurrence of the second event. Let's take an example. Flip a fair coin and determine whether the result is heads or tails. If heads, flip the same coin again. If tails, flip an unfair coin that has a three-fourths probability of heads and a one-fourth probability of tails. Is the probability of heads on the second toss in any way affected by the result of the first toss? The answer here is yes, since the result of the first toss affects which coin (fair or unfair) is to be tossed the second time.

Another example of dependent events involves mutually exclusive events. If events A and B are mutually exclusive, they are dependent. Given that event A has occurred, the conditional probability of B occurring must be zero, since the two events are mutually exclusive.

Example 6

Assume we have three boxes, which contain red and black balls as follows:

Box 1 : 3 red and 7 black
Box 2 : 6 red and 4 black
Box 3 : 8 red and 2 black

Suppose we draw a ball from box 1; if it is red, we draw a ball from box 2. If the ball drawn from box 1 is black, we draw a ball from box 3. The diagram in Figure 10.1 illustrates the game. Consider the following probability questions about this game:

1. What is the probability of drawing a red ball from box 1? This probability is an unconditional or marginal probability; it is 0.30. (The marginal probability of getting a black is 0.70.)

2. Suppose we draw a ball from box 1, and it is red; what is the probability of a red ball when we draw from box 2 on the second draw? The answer is 0.60. This is an example of a conditional probability. That is, the probability of a red ball on the second draw, given that the draw from box 1 is red, is a conditional probability.

3. Suppose our first draw from box 1 was black; what is the probability of a red ball on the second draw? The conditional probability is 0.80. The draw from box 1 (the conditioning event) is very important in determining the probabilities of red (or black) on the second draw.

4. Suppose, before we draw any balls, we ask the question: What is the probability of drawing two red balls? This would be a joint probability; the event would be a red ball on both draws. The computation of this joint probability is a little more complicated than the above questions, and some analysis will be of value. The computation is as follows:

P(A and B) = P(B|A) × P(A)

Probability Theory


[Figure 10.1: The two-stage drawing game — the first draw is from Box 1 (3 red, 7 black); a red leads to a second draw from Box 2, a black to a second draw from Box 3.]

Table 10.2 and Figure 10.2 show the joint probability of two red balls as 0.18 [i.e., P(R and R), or more simply P(RR), the top branch of the tree]. The joint probabilities may be summarized as follows:

Two red balls                                      P(RR) = 0.18
A red ball on first draw, a black ball on second   P(RB) = 0.12
A black ball on first draw, a red ball on second   P(BR) = 0.56
Two black balls                                    P(BB) = 0.14
                                                   Total   1.00

Table 10.2: Probability Calculations

Event   Marginal P(A)   Conditional P(B|A)   Joint P(A and B)
RR      P(R) = 0.30     P(R|R) = 0.60        P(RR) = 0.18
RB      P(R) = 0.30     P(B|R) = 0.40        P(RB) = 0.12
BR      P(B) = 0.70     P(R|B) = 0.80        P(BR) = 0.56
BB      P(B) = 0.70     P(B|B) = 0.20        P(BB) = 0.14

Punjab Technical University 2 0 3

Quantitative Techniques

Figure 10.2: Tree Diagram

First draw (box 1)      Second draw                     Joint event probabilities
Red,   P(R) = 0.3       box 2: Red,   P(R|R) = 0.60     RR, P(RR) = 0.18
                        box 2: Black, P(B|R) = 0.40     RB, P(RB) = 0.12
Black, P(B) = 0.7       box 3: Red,   P(R|B) = 0.80     BR, P(BR) = 0.56
                        box 3: Black, P(B|B) = 0.20     BB, P(BB) = 0.14

Figure 10.2 is called a tree diagram. This is a very useful device for illustrating uncertain situations. The first fork shows that either a red or a black may be drawn, and the probabilities of these events are given. If a red is drawn, we go to box 2, where again a red or black may be drawn, but with probabilities determined by the fact that the draw will take place in box 2. For the second forks, we have conditional probabilities (the probabilities depend on whether a red or a black ball was chosen on the first draw). At the end of each path are the joint probabilities of following that path. The joint probabilities are obtained by multiplying the marginal (unconditional) probabilities of the first branch by the conditional probabilities of the second branch. Table 10.3 presents these results in a joint probability table; the intersections of the rows and columns are joint probabilities. The column on the right gives the unconditional probabilities (marginals) of the outcome of the first draw; the bottom row gives the unconditional or marginal probabilities of the outcomes of the second draw. Table 10.3 effectively summarizes the tree diagram. Now, let us compute some additional probabilities:

1. Probability of one red and one black ball, regardless of order:

   P(RB) + P(BR) = 0.12 + 0.56 = 0.68

2. Probability of a black ball on draw 2. Explanatory calculation:

   Probability of red-black       = 0.12
   Probability of black-black     = 0.14
   Probability of black on draw 2 = 0.26

3. Probability of the second draw being red if the first draw is red: 0.60. If the first draw is red, we are in the R row of Table 10.3, which totals 0.30. The question is, what proportion of 0.30 is 0.18? The answer is 0.60; or, in terms of the appropriate formula:

   P(R2 | R1) = P(R1 and R2) / P(R1) = 0.18 / 0.30 = 0.60

Table 10.3: Joint Probability Table

                     Second Draw
First Draw        R        B       Marginal
R               0.18     0.12       0.30
B               0.56     0.14       0.70
Marginal        0.74     0.26       1.00

Student Activity

A box contains 30 bulbs, out of which 5 are defective. A customer draws a sample of 3 bulbs at random, one after the other. If the sample contains even one defective bulb, the box is rejected. What is the probability that the box is not rejected?
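The arithmetic behind Tables 10.2 and 10.3 can be reproduced with a short Python sketch (Python and the variable names here are our illustrative choices, not part of the text):

```python
# Joint probabilities for the two-box game: P(A and B) = P(B|A) x P(A)
marginal = {"R": 0.30, "B": 0.70}            # first draw, from box 1
conditional = {                               # second draw, given the first
    ("R", "R"): 0.60, ("R", "B"): 0.40,       # box 2 holds 6 red, 4 black
    ("B", "R"): 0.80, ("B", "B"): 0.20,       # box 3 holds 8 red, 2 black
}

joint = {(a, b): marginal[a] * p for (a, b), p in conditional.items()}
for pair, prob in joint.items():
    print(pair, round(prob, 2))               # 0.18, 0.12, 0.56, 0.14

# Marginals of the second draw (the bottom row of Table 10.3)
second_red = joint[("R", "R")] + joint[("B", "R")]
second_black = joint[("R", "B")] + joint[("B", "B")]
print(round(second_red, 2), round(second_black, 2))   # 0.74 0.26
```

Each joint probability is simply the first-branch marginal multiplied by the second-branch conditional, exactly as on the tree diagram.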

Revised Probabilities
Having discussed joint and conditional probabilities, let us investigate how probabilities are revised to take account of new information. Suppose we do not know whether a particular coin is fair or unfair. If the coin is fair, the probability of a tail is 0.50; but if the coin is unfair, the probability of a tail is 0.10. Assume we assign a prior probability of 0.80 to the coin being fair and a probability of 0.20 to the coin being unfair. The event "fair coin" will be designated A1 and the event "unfair coin" will be designated A2. We toss the coin once; say, a tail is the result. What is the probability that the coin is fair?

The conditional probability of a tail, given that the coin is fair, is 0.50; that is, P(tail | A1) = 0.50. If the coin is unfair, the probability of a tail is 0.10; P(tail | A2) = 0.10.

Let us compute the joint probability P(tail and A1). There is an initial 0.80 probability that A1 is the true state; and if A1 is the true state, there is a 0.50 conditional probability that a tail will result. The joint probability of state A1 being true and obtaining a tail is therefore (0.80 × 0.50) = 0.40. Thus:

P(tail and A1) = P(A1) × P(tail | A1) = 0.80 × 0.50 = 0.40

The joint probability of a tail and A2 is equal to:

P(tail and A2) = P(A2) × P(tail | A2) = 0.20 × 0.10 = 0.02

A tail can occur in combination with the state "fair coin" or in combination with the state "unfair coin". The probability of the former combination is 0.40; of the latter, 0.02. The sum of the probabilities gives the unconditional probability of a tail on the first toss; that is, P(tail) = 0.40 + 0.02 = 0.42:

P(tail and A1) = 0.40
P(tail and A2) = 0.02
P(tail)        = 0.42

If a tail occurs, and if we do not know the true state, the conditional probability of state A1 being the true state is:

P(A1 | tail) = P(tail and A1) / P(tail) = 0.40 / 0.42 = 0.95

Thus, 0.95 is the revised or posterior probability of A1, given that a tail has occurred on the first toss. Similarly:

P(A2 | tail) = P(tail and A2) / P(tail) = 0.02 / 0.42 = 0.05

In more general symbols:

P(Ai | B) = P(Ai and B) / P(B)

Conditional probability expressed in this form is known as Bayes' theorem. It has many important applications in evaluating the worth of additional information in decision problems. In this example, the revised probabilities for the coin are 0.95 that it is fair and 0.05 that it is unfair (the probabilities were initially 0.80 and 0.20). These revised probabilities exist after one toss, when the toss results in a tail. It is reasonable that the probability that the coin is unfair has decreased, since a tail appeared on the first toss, and the unfair coin has only a 0.10 probability of a tail.
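As an illustrative sketch, the same revision of probabilities can be carried out in Python (the state labels are our own):

```python
# Bayesian revision of the coin probabilities after observing one tail.
# Prior: P(fair) = 0.80, P(unfair) = 0.20.
# Likelihoods: P(tail | fair) = 0.50, P(tail | unfair) = 0.10.
prior = {"fair": 0.80, "unfair": 0.20}
p_tail = {"fair": 0.50, "unfair": 0.10}

# Joint probabilities P(tail and state) = P(state) * P(tail | state)
joint = {s: prior[s] * p_tail[s] for s in prior}     # 0.40 and 0.02
evidence = sum(joint.values())                       # P(tail) = 0.42

# Posterior P(state | tail) = P(tail and state) / P(tail) -- Bayes' theorem
posterior = {s: joint[s] / evidence for s in joint}
print(round(posterior["fair"], 2), round(posterior["unfair"], 2))  # 0.95 0.05
```

The denominator (the "evidence") is just the total probability of a tail, summed over both possible states of the coin.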

Random Variables and Probability Distribution


A probability function is a rule that assigns probabilities to each element of a set of events that may occur. If, in turn, we can assign a specific numerical value to each element of the set of events, a function that assigns these numerical values is termed a random variable. The value of a random variable is the general outcome of a random (or probability) experiment. It is useful to distinguish between the random variable itself and the values that it can take on. The value of a random variable is unknown until the event occurs (i.e., until the random experiment has been performed). However, the probability that the random variable will be any specific value is known in advance. The probability of each value of the random variable is equal to the sum of the probabilities of the events assigned to that value of the random variable. For example, suppose we define the random variable Z to be the number of heads in two tosses of a fair coin. Then the possible values of Z, and the corresponding probabilities, are:

Possible Values of Z       :   0     1     2
Probability of Each Value  :  1/4   1/2   1/4

Random variables can be grouped into probability distributions, which can be either discrete or continuous. Discrete probability distributions are those in which the random variable can take on only specific values. The table above is an example of such a distribution, since the random variable Z can be only 0, 1, or 2. A continuous probability distribution is one in which the value of the random variable can be any number within some given range of values - say, between zero and infinity. For example, if the random variable were the height of members of a population, a person could be 5.3 feet, 5.324 feet, 5.32431 feet, and so on, depending on the ability of instruments to measure. Some additional examples of random variables are shown in Table 10.4.

A discrete probability distribution is sometimes called a probability mass function (p.m.f.) and a continuous one is called a probability density function (p.d.f.). Graphs of the two types of distributions are shown in Figures 10.3 and 10.4. For a discrete distribution, the height of each line represents the probability for that value of the random variable. For example, 0.30 is the probability that tomorrow's demand will be 0.2 tons in Figure 10.3. For a continuous random variable, the height of the probability density function is not the probability for an event. Rather, the area under the curve over any interval on the horizontal axis represents the probability of taking on a value in that interval. For example, the shaded area on the left in Figure 10.4 represents the probability that tomorrow's demand will be in the interval between 0.1 and 0.2 tons.
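A minimal Python sketch (our own illustration) that derives the p.m.f. of Z by enumerating the sample space of two tosses:

```python
from collections import Counter
from itertools import product

# Random variable Z = number of heads in two tosses of a fair coin.
outcomes = list(product("HT", repeat=2))              # HH, HT, TH, TT
z_counts = Counter(outcome.count("H") for outcome in outcomes)

# Each of the 4 outcomes is equally likely, so divide counts by 4
pmf = {z: count / len(outcomes) for z, count in sorted(z_counts.items())}
print(pmf)   # {0: 0.25, 1: 0.5, 2: 0.25}
```

The value Z = 1 gets probability 1/2 because two sample points (HT and TH) map to it.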


Table 10.4: Examples of Random Variables

Random Variable   Description of the Values of the Random Variable               Discrete or Continuous   Possible Values
U                 Possible outcomes from throwing a pair of dice                 Discrete                 2, 3, ..., 12
X                 Possible number of heads, tossing a coin five times            Discrete                 0, 1, 2, ..., 5
Y                 Possible daily sales of a newspaper, where S represents        Discrete                 0, 1, 2, ..., S
                  the inventory available
T                 Time between arrivals of calls at 911 Emergency Call Center    Continuous               0 to infinity
L                 Life of an electronic component of a computer                  Continuous               0 to infinity

[Figure 10.3: Discrete Probability Distribution (probability mass function or p.m.f.) — vertical lines over tomorrow's demand in tons (0.1 to 0.9), with heights equal to the probabilities (from 0.10 up to 0.30).]

[Figure 10.4: Continuous Probability Distribution — a density curve over tomorrow's demand in tons. The shaded area on the left is P(tomorrow's demand is between 0.1 and 0.2 tons); the shaded right tail is P(tomorrow's demand greater than 0.5 tons); the area under the entire curve sums to 1.]

Now let us take a look at the first discrete probability distribution, the binomial probability distribution, and how it is derived. The binomial distribution is the base on which several other probability distributions are built.

Thus, a random variable is a real-valued function defined over the sample space S, i.e., X : S -> R.

Discrete Random Variables


A random variable X is said to be discrete if it takes a finite or countable number of values in the sample space (denoted Ω). A countable set is one that can be placed into one-to-one correspondence with the set of integers. In other words, a real-valued function defined on a discrete sample space is called a discrete random variable. For example, two balls are drawn in succession without replacement from an urn containing four red balls and three black balls. The possible outcomes and the values y of the random variable Y, where Y is the number of red balls, are:

Simple Event :  RR   RB   BR   BB
y            :   2    1    1    0

In this experiment, the sample space contains a finite number of elements.

Distribution Function

Let X be a random variable defined over a sample space S; then a point function F(.) defined on the real line and given by:

F(x) = P(X ≤ x), for all x ∈ R

is called the distribution function of the random variable X. In other words, the distribution function of the random variable X is the probability of {X ≤ x}. Thus:

F(x) = P(X ≤ x) = P(-∞ < X ≤ x)

The distribution function is sometimes called the cumulative distribution function. The following results follow from the properties of the distribution function:

1. F(x) is a monotonic non-decreasing function.
2. F(-∞) = 0, where F(-∞) = lim (x → -∞) F(x).
3. F(+∞) = 1, where F(+∞) = lim (x → +∞) F(x).
4. F(x) is right continuous.


Probability Mass Function (P.M.F.)

Let X be a discrete random variable and R be the range space of X, consisting of a countable (possibly infinite) number of values, say x1, x2, x3, ..., xn, .... With each possible outcome xi we associate a number p(xi) = P(X = xi), which must satisfy the following properties:

(i)  p(xi) ≥ 0 for all i
(ii) Σi p(xi) = 1

Such a function p(x), if it exists, is called the probability mass function (p.m.f.), or simply the probability function, of the r.v. X; and the collection of pairs {xi, p(xi)} for all i is called the probability distribution of the random variable X.
For example, consider a simultaneous throw of two fair dice. Let X denote the sum of the two numbers that turn up; then the probability distribution of X can be given as:

X = x          :   2      3      4      5      6      7      8      9     10     11     12
p(x) = P(X = x): 1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
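This distribution can be checked by enumerating all 36 outcomes; the following Python sketch (illustrative only) uses exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# P.m.f. of X = sum of the faces in a throw of two fair dice,
# built by enumerating all 36 equally likely outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

print(pmf[7])             # 1/6, i.e. 6/36
print(sum(pmf.values()))  # 1
```

The counts rise from 1 way (sum 2) to 6 ways (sum 7) and fall back to 1 way (sum 12), giving the triangular shape of the table.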

Example 7

A random variable X has the following probability distribution:

X = x           :  0    1    2    3    4    5     6      7
P(X = x) = p(x) :  0    k   2k   2k   3k   k²    2k²   7k² + k

Determine the value of k. Also evaluate P(X < 6), P(X ≥ 6) and P(0 < X < 5). Find the distribution function.

Solution

(i) To find the value of k. By definition, the probabilities must sum to 1:

0 + k + 2k + 2k + 3k + k² + 2k² + 7k² + k = 1
10k² + 9k = 1
10k² + 9k - 1 = 0
(10k - 1)(k + 1) = 0
k = 1/10 or k = -1

k = -1 is rejected because a probability cannot be negative; hence k = 1/10.

(ii) P(X < 6) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)
             = 0 + 1/10 + 2/10 + 2/10 + 3/10 + 1/100
             = (10 + 20 + 20 + 30 + 1)/100 = 81/100

P(X ≥ 6) = 1 - P(X < 6) = 1 - 81/100 = 19/100

P(0 < X < 5) = P(X=1) + P(X=2) + P(X=3) + P(X=4) = 1/10 + 2/10 + 2/10 + 3/10 = 8/10

(iii) The distribution function F(x) = P(X ≤ x) is obtained by cumulating the probabilities:

x    :  0     1      2      3      4       5        6       7
F(x) :  0    1/10   3/10   5/10   8/10   81/100   83/100    1

Example 8

Suppose that the random variable X assumes the values 0, 1, 2 with probabilities 1/3, 1/6, 1/2 respectively. Obtain the distribution function of X.

Solution

X = x           :   0     1     2
P(X = x) = p(x) :  1/3   1/6   1/2

The distribution function F(x) = P(X ≤ x) is obtained by cumulating these probabilities:

F(0) = 1/3
F(1) = 1/3 + 1/6 = 1/2
F(2) = 1/3 + 1/6 + 1/2 = 1

Hence the required distribution function is F(x) = 1/3 for 0 ≤ x < 1, F(x) = 1/2 for 1 ≤ x < 2, and F(x) = 1 for x ≥ 2.
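The results of Example 7 can be verified with a short Python sketch (an illustration we add here, using exact fractions):

```python
from fractions import Fraction

# Example 7 with k = 1/10, the admissible root of 10k^2 + 9k - 1 = 0
k = Fraction(1, 10)
pmf = {0: Fraction(0), 1: k, 2: 2*k, 3: 2*k, 4: 3*k,
       5: k**2, 6: 2*k**2, 7: 7*k**2 + k}

p_less_6 = sum(pmf[x] for x in range(6))
print(p_less_6)                            # 81/100
print(1 - p_less_6)                        # P(X >= 6) = 19/100
print(sum(pmf[x] for x in range(1, 5)))    # P(0 < X < 5) = 4/5

# Distribution function F(x) = P(X <= x), obtained by cumulating the p.m.f.
cdf, total = {}, Fraction(0)
for x in range(8):
    total += pmf[x]
    cdf[x] = total
print(cdf[7])    # 1, so the probabilities sum correctly
```

Using Fraction avoids floating-point rounding, so the answers come out exactly as in the worked solution.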

Continuous Random Variables


If a random variable assumes each and every possible value within a given interval with certain probability, then it is called a continuous random variable. In other words, a random variable defined on a continuous sample space is said to be a continuous random variable.

Probability Density Function (P.D.F.)

If X is a continuous random variable defined over the set of real numbers R, then a function denoted by f(x) is said to be the p.d.f. of X if:

(i)   f(x) ≥ 0 for all x ∈ R

(ii)  ∫ from -∞ to +∞ of f(x) dx = 1

(iii) ∫ from a to b of f(x) dx = P(a ≤ X ≤ b) = F(b) - F(a)

If the random variable is continuous, then the probability that it takes a particular value, say X = a, is zero, i.e., P(X = a) = 0. We define a p.d.f. as:

f(x) = lim (Δx → 0) [F(x + Δx) - F(x)] / Δx

and it satisfies the three properties indicated above.

Cumulative Distribution Function

Let X be a continuous r.v. defined over R and having p.d.f. f(x); then the cumulative distribution function, or simply the distribution function, of X is given by:

F(x) = P(X ≤ x) = ∫ from -∞ to x of f(t) dt

Relation between c.d.f. F(x) and p.d.f. f(x)

Let X be a continuous random variable with c.d.f. F(x); then, by definition:

F(x) = P(X ≤ x) = ∫ from -∞ to x of f(t) dt

F'(x) = lim (h → 0) P(x < X ≤ x + h) / h = f(x), i.e., (d/dx) F(x) = f(x)

This relation provides us with the fact that if we are given the c.d.f. F(x), then by differentiating it we can easily find the p.d.f. f(x) of the continuous r.v. X. It follows from this that:

(i)   0 ≤ F(x) ≤ 1, for -∞ < x < ∞

(ii)  F'(x) = f(x) ≥ 0, so F(x) is a non-decreasing function of x

(iii) F(-∞) = lim (x → -∞) F(x) = lim (x → -∞) ∫ from -∞ to x of f(t) dt = 0

(iv)  F(+∞) = lim (x → +∞) F(x) = ∫ from -∞ to +∞ of f(t) dt = 1

(v)   P(a < X < b) = P(a ≤ X < b) = P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx = P(X ≤ b) - P(X ≤ a) = F(b) - F(a)
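These properties can be illustrated numerically. The following Python sketch (our own example, using the exponential density f(x) = e^(-x) for x ≥ 0 as a stand-in) checks property (v) by quadrature and recovers f from F by differencing:

```python
import math

# Stand-in continuous distribution: exponential with rate 1.
# p.d.f. f(x) = e^(-x) for x >= 0, with c.d.f. F(x) = 1 - e^(-x).
f = lambda x: math.exp(-x)
F = lambda x: 1 - math.exp(-x)

def integrate(g, a, b, n=100000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Property (v): P(a < X < b) = integral of f over [a, b] = F(b) - F(a)
a, b = 0.5, 2.0
print(round(integrate(f, a, b), 6), round(F(b) - F(a), 6))

# Recovering the p.d.f. from the c.d.f. by differencing: f(x) = dF/dx
x, h = 1.0, 1e-6
print(round((F(x + h) - F(x)) / h, 4), round(f(x), 4))
```

The two printed pairs agree, which is exactly the c.d.f./p.d.f. relationship stated above.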

Binomial Distribution
The binomial distribution is one of the simplest and yet extremely useful theoretical distributions of discrete random variables. It was discovered by J. Bernoulli. The binomial distribution is a probability distribution expressing the probability of experiments that result in only two outcomes - success or failure. This distribution has great uses in business processes and the social sciences, as well as other areas. The type of process which gives rise to this distribution is usually referred to as a Bernoulli trial or a Bernoulli process. The mathematical model for a Bernoulli process is developed under a very specific set of assumptions involving the concept of a series of experimental trials. These assumptions are:

1. An experiment is performed under the same conditions for a fixed number of trials, say n.

2. In each trial, there are only two possible outcomes of the experiment, called "success" and "failure". The sample space of possible outcomes on each experimental trial is S = {failure, success}.

3. The probability of success, denoted by p, remains constant from trial to trial. The probability of a failure, denoted by q, is equal to (1 - p). If the probability of success is not the same in each trial, we will not have a binomial distribution. For example, suppose 3 balls are drawn at random from an urn containing 8 white and 15 red balls. This is a binomial experiment if each ball is replaced before another is drawn. If the balls are drawn without replacement, the probability of drawing a white ball changes each time a ball is taken from the urn, and we no longer have a binomial experiment.

4. The trials are statistically independent, i.e., the outcomes of any trial or sequence of trials do not affect the outcomes of subsequent trials.

This provides answers to questions such as: suppose 10 coins are tossed together, what is the probability of obtaining exactly two heads? For instance, if a coin is tossed once there are two outcomes, namely tails or heads. The probability of obtaining a head is p = 1/2 and the probability of obtaining a tail is q = 1/2; thus (q + p) = 1. In the same way, if two coins A and B are tossed simultaneously there are four possible outcomes:

AB:  TT   TH   HT   HH

The probabilities corresponding to these results are:

TT: q²   TH: qp   HT: pq   HH: p²

These are the terms of the binomial (q + p)², because:

(q + p)² = q² + 2qp + p²

If p = q = 1/2, we have:

(1/2 + 1/2)² = 1/4 + 1/2 + 1/4 = 1

Similarly, if three coins A, B and C are tossed, the possible outcomes and the corresponding probabilities are:

ABC:  TTT   TTH   THT   HTT   THH   HTH   HHT   HHH
       q³   q²p   q²p   q²p   qp²   qp²   qp²    p³

These are the terms of the binomial (q + p)³:

(q + p)³ = q³ + 3q²p + 3qp² + p³

With p = q = 1/2, we get:

1/8 + 3/8 + 3/8 + 1/8 = 1


These probabilities can also be calculated by direct count; e.g., the chance of getting 3 tails in a single toss of 3 coins is 1/8, the chance of getting 2 tails is 3/8, the chance of getting 1 tail (combined with 2 heads) is 3/8, and the chance of getting no tail is 1/8. In general, in n tosses of a coin the probabilities of the various possible events are given by the successive terms of the binomial expansion of (q + p)^n, which is:

(q + p)^n = q^n + nC1 q^(n-1) p + nC2 q^(n-2) p² + ... + nCr q^(n-r) p^r + ... + p^n

We can represent these terms in the form of a probability distribution table as follows:

Number of heads X   :  0      1               2                ...   n
Probability P(X = r):  q^n    nC1 q^(n-1) p   nC2 q^(n-2) p²   ...   p^n

Since by expanding the binomial (q + p)^n we obtain the probabilities of 0, 1, 2, 3, ... heads, the probability distribution is naturally called the Binomial Probability Distribution, or simply the Binomial Distribution. The general form of the distribution is as follows:

P(r) = nCr q^(n-r) p^r

where p = probability of success in a single trial, q = 1 - p, n = number of trials, and r = number of successes in n trials.

If we want to obtain the probable frequencies of the various outcomes in N sets of n trials, the following expression shall be used:

N(q + p)^n = N[q^n + nC1 q^(n-1) p + nC2 q^(n-2) p² + ... + nCr q^(n-r) p^r + ... + p^n]
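The binomial coefficients nCr appearing in these expansions can be generated with a short Python sketch (our illustration; the function name is our own, and it uses the preceding-term rule described in the next section):

```python
# Binomial coefficients of (q + p)^n, built term by term:
# next coefficient = previous coefficient * (power of q) / (power of p + 1)
def binomial_coefficients(n):
    coeffs = [1]                      # first term q^n has coefficient 1
    for r in range(n):                # term r contains q^(n-r) p^r
        coeffs.append(coeffs[-1] * (n - r) // (r + 1))
    return coeffs

print(binomial_coefficients(5))   # [1, 5, 10, 10, 5, 1]
```

The integer division is exact at every step, so no floating-point arithmetic is needed.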

Obtaining Coefficients of the Binomial

For obtaining coefficients from the binomial expansion, the following rules may be remembered. To find the terms of the expansion of (q + p)^n:

1. The first term is q^n.
2. The second term is nC1 q^(n-1) p.
3. In each succeeding term, the power of q is reduced by 1 and the power of p is increased by 1.
4. The coefficient of any term is found by multiplying the coefficient of the preceding term by the power of q in that term, and dividing the product so obtained by one more than the power of p in that preceding term.

By expanding (q + p)^n we get:

(q + p)^n = q^n + nC1 q^(n-1) p + nC2 q^(n-2) p² + ... + nCr q^(n-r) p^r + ... + p^n

where nC1, nC2, ... are called the binomial coefficients. For example:

(q + p)^5 = q^5 + 5q^4 p + 10q³ p² + 10q² p³ + 5q p^4 + p^5

The coefficients here are 1, 5, 10, 10, 5, 1.

Mean and Variance of Binomial Distribution

If p is the probability of success and q the probability of failure in one trial, then in n independent trials the probabilities of 0, 1, 2, 3, ..., n successes are given by the 1st, 2nd, 3rd, ..., (n+1)th terms of the binomial expansion of (q + p)^n:

X    :  0      1               2                ...   n
p(x) :  q^n    nC1 q^(n-1) p   nC2 q^(n-2) p²   ...   p^n

with Σ p(x) = 1.

We know the arithmetic mean is given by the relation Mean = Σ x p(x):

Σ x p(x) = 0·q^n + 1·n q^(n-1) p + 2·[n(n-1)/2] q^(n-2) p² + ... + n·p^n
         = n q^(n-1) p + n(n-1) q^(n-2) p² + ... + n p^n

Taking np common:

Σ x p(x) = np [q^(n-1) + (n-1) q^(n-2) p + ... + p^(n-1)]
         = np (q + p)^(n-1) = np (1)^(n-1) = np

Thus Σ x p(x) = np, i.e., the mean of the binomial distribution is np. Similarly, we can derive that the standard deviation of the binomial distribution is √(npq), and therefore the variance is npq.

Importance of Binomial Distribution

The binomial probability distribution is a discrete probability distribution that is useful in describing an enormous variety of real-life events. For example, a quality control inspector who wants to know the probability of finding defective light bulbs in a random sample of 15 bulbs, when 15% of the bulbs are defective, can quickly obtain the answer from tables of the binomial probability distribution. The binomial distribution can be used when:

1. The outcome or result of each trial in the process is characterized as one of two possible types of outcomes; in other words, they are attributes.
2. The probability of the outcome of any trial does not change and is independent of the results of previous trials.

Example 8

A coin is tossed six times. What is the probability of obtaining four or more heads?

Solution

When a coin is tossed, the probabilities of heads and tails are equal, i.e., p = q = 1/2. The various possibilities for all the events are the terms of the expansion of (q + p)^6:
(q + p)^6 = q^6 + 6q^5 p + 15q^4 p² + 20q³ p³ + 15q² p^4 + 6q p^5 + p^6

The probability of obtaining 4 heads is:

15 q² p^4 = 15 × (1/2)² × (1/2)^4 = 15 × (1/2)^6 = 0.234

The probability of obtaining 5 heads is:

6 q p^5 = 6 × (1/2) × (1/2)^5 = 6 × (1/2)^6 = 0.094

The probability of obtaining 6 heads is:

p^6 = (1/2)^6 = 0.016

Therefore, the probability of obtaining 4 or more heads is 0.234 + 0.094 + 0.016 = 0.344.

Example

The incidence of a certain disease is such that on the average 20% of workers suffer from it. If 10 workers are selected at random, find the probability that:

1. exactly 2 workers suffer from the disease;
2. not more than 2 workers suffer from the disease.

Solution

The probability that a worker suffers from the disease is p = 1/5, and hence q = 4/5. By the binomial probability distribution, the probability that out of 10 workers, r workers suffer from the disease is given by:

P(r) = 10Cr q^(10-r) p^r = 10Cr (4/5)^(10-r) (1/5)^r

(i) The probability that exactly 2 workers suffer from the disease is:

P(2) = 10C2 (4/5)^8 (1/5)² = 0.302

(ii) The probability that not more than 2 workers suffer from the disease is:

P(0) + P(1) + P(2) = (1/5)^10 [10C0 4^10 + 10C1 4^9 + 10C2 4^8] = 0.678

Student Activity

1. A fair die is thrown 10 times. Find the probability that:
   (a) a score of 5 or less occurs on exactly 3 throws;
   (b) a score of more than 3 occurs on exactly 4 throws.

2. A random variable follows a binomial distribution. If its mean is 10 and its standard deviation is 2, what is the probability of success?
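Both worked examples can be checked with a short Python sketch of the binomial p.m.f. (our own illustration, using only the standard library):

```python
from math import comb

# Binomial p.m.f.: P(r) = nCr * q^(n-r) * p^r
def binom_pmf(r, n, p):
    return comb(n, r) * (1 - p) ** (n - r) * p ** r

# Example 8: six tosses of a fair coin, P(4 or more heads)
p_4_or_more = sum(binom_pmf(r, 6, 0.5) for r in (4, 5, 6))
print(round(p_4_or_more, 3))     # 0.344

# Disease example: n = 10 workers, p = 0.2
print(round(binom_pmf(2, 10, 0.2), 3))                           # 0.302
print(round(sum(binom_pmf(r, 10, 0.2) for r in (0, 1, 2)), 3))   # 0.678
```

The three printed values match the answers obtained by hand above.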

Poisson Distribution
The Poisson distribution is mostly used in queuing theory. It was developed by Simeon Denis Poisson (1781-1840). The distribution is used to describe the behaviour of rare events, for example the number of trains derailed per day, the number of printing mistakes in a book, etc.



Thus, when p (the probability of an event) is extremely small and n (the number of trials) is very large, such that their product np = m is a constant, the random variable follows the Poisson distribution, which is defined as:

P(r) = (m^r e^(-m)) / r!

where r = 0, 1, 2, 3, 4, 5, ...; e = 2.7183 (the base of natural logarithms); and m = the mean of the Poisson distribution, i.e., np, or the average number of occurrences of an event.

The Poisson distribution is a discrete distribution with a single parameter m. As m increases, the distribution shifts to the right, as shown in the figure below for values from m = 0.3 to m = 4.0.

[Figure: Poisson distributions for m = 0.3 to m = 4.0, plotted over values of r from 0 to 12.]

All Poisson probability distributions are skewed to the right. This is the reason why the Poisson probability distribution has been called the probability distribution of rare events. The Poisson distribution is concerned with certain processes that can be described by a discrete random variable. The probabilities of 0, 1, 2, ... successes are given by the successive terms of the expansion of:

e^(-m) (1 + m + m²/2! + m³/3! + ...)

No. of successes (x):  0         1           2               3               4
Probability P(x):      e^(-m)    m e^(-m)    m² e^(-m)/2!    m³ e^(-m)/3!    m⁴ e^(-m)/4!


Mean and Variance of the Poisson Distribution

We know that since p is very small in the Poisson distribution, the value of q is almost equal to 1, so we can put 1 in place of q in the results for the binomial distribution. We will focus on the mean, the standard deviation and the variance. The Poisson distribution is given as:

No. of successes (x):  0         1           2               3               4           ...
Probability p(x):      e^(-m)    m e^(-m)    m² e^(-m)/2!    m³ e^(-m)/3!    m⁴ e^(-m)/4! ...

Now we calculate the mean:

Mean = Σ x p(x)
     = 0 + 1·m e^(-m) + 2·m² e^(-m)/2! + 3·m³ e^(-m)/3! + 4·m⁴ e^(-m)/4! + ...
     = m e^(-m) + m² e^(-m) + m³ e^(-m)/2! + m⁴ e^(-m)/3! + ...
     = m e^(-m) [1 + m + m²/2! + m³/3! + ...]
     = m e^(-m) e^m = m

Hence the mean of the Poisson distribution is m.

For the variance, σ² = v2 - v1², where v1 = Σ x p(x) = m and v2 = Σ x² p(x):

v2 = Σ x² p(x)
   = 0 + 1²·m e^(-m) + 2²·m² e^(-m)/2! + 3²·m³ e^(-m)/3! + 4²·m⁴ e^(-m)/4! + ...
   = m e^(-m) [1 + 2m + 3m²/2! + 4m³/3! + ...]
   = m e^(-m) [(1 + m + m²/2! + m³/3! + ...) + (m + 2m²/2! + 3m³/3! + ...)]
   = m e^(-m) [e^m + m e^m]
   = m e^(-m) e^m (1 + m) = m(1 + m) = m + m²
Therefore:

σ² = v2 - v1² = m + m² - m² = m      [since v1 = m]

Thus σ² = m and σ = √m.

Example 9

The mean of a Poisson distribution is 2.25. Find the other constants of the distribution.

Solution

We have the mean m = 2.25. Hence:

σ  = √m = √2.25 = 1.5
μ2 = m = 2.25
μ3 = m = 2.25
μ4 = m + 3m² = 2.25 + 3(2.25)² = 17.44
β1 = μ3²/μ2³ = 1/m = 1/2.25 = 0.444
β2 = 3 + 1/m = 3 + 0.444 = 3.444

Characteristics of the Poisson distribution:

a) It is a discrete distribution and is a limiting form of the binomial distribution when n is large and p or q is small.
b) Its mean and variance are equal.
c) It is usually positively skewed, and cannot be negatively skewed.
d) As m becomes very large, the Poisson distribution approximates the Normal distribution.
e) The mean is np.
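The result that the mean and variance are both m can be checked numerically; the sketch below (our own illustration) truncates the infinite sums at r = 60, where the tail is negligible for m = 2.25:

```python
import math

# Poisson p.m.f.: P(r) = e^(-m) m^r / r!
def poisson_pmf(r, m):
    return math.exp(-m) * m ** r / math.factorial(r)

m = 2.25
rs = range(60)   # truncation point; terms beyond r = 60 are negligibly small

mean = sum(r * poisson_pmf(r, m) for r in rs)
var = sum(r ** 2 * poisson_pmf(r, m) for r in rs) - mean ** 2

print(round(mean, 6), round(var, 6))   # both come out equal to m = 2.25
print(round(math.sqrt(var), 2))        # sigma = 1.5
```

The variance is computed as v2 - v1², mirroring the derivation above.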

Applications of the Poisson distribution

The Poisson distribution is similar to the binomial but is used when n, the number of items or events, is large or unknown, and p, the probability of an occurrence, is very small relative to q, the probability of non-occurrence. A rule of thumb is that the Poisson distribution may be used when n is greater than 50 and the mean np is less than 5. Some examples follow, but it is important to realize that the Poisson distribution only applies when the events occur randomly, i.e., they are independent of one another.

Example 10

Customers arrive randomly at a service point at an average rate of 30 per hour. Assuming a Poisson distribution, calculate the probability that:

a) no customer arrives in any particular minute;
b) exactly one customer arrives in any particular minute.

Solution

The time interval to be used is one minute, so the mean is m = 30/60 = 0.5 and P(r) = m^r e^(-m) / r!.

a) P(no customer) = P(r = 0, m = 0.5) = (0.5)^0 e^(-0.5) / 0! = e^(-0.5) = 0.6065, from Table VI(a).

b) P(1 customer) = P(r = 1, m = 0.5) = (0.5)^1 e^(-0.5) / 1! = 0.5 × e^(-0.5) = 0.3033, from Table VI(a).

Example 11

A firm buys springs in very large quantities, and from past records it is known that 0.2% are defective. The inspection department samples the springs in batches of 500. It is required to set a standard for the inspectors so that if more than the standard number of defectives is found in a batch, the consignment can be rejected with at least 90% confidence that the supply is truly defective. How many defectives per batch should be set as the standard?

Student Activity

1. Customers arrive randomly at a retail counter at an average rate of 10 per hour. Assuming a Poisson distribution, calculate the probability that:
   (a) no customer arrives in any particular minute;
   (b) exactly one customer arrives in any particular minute.

2. Assuming that the probability of a fatal accident in a factory during a year is 1/1200, calculate the probability that in a factory employing 300 workers there will be at least two fatal accidents in a year. [Take e^(-0.25) = 0.7787.]

Solution

With 0.2% defective and a sample size of 500, m = 500 × 0.002 = 1. To find the probability of 0, 1, 2, 3, etc. or more defectives, the probabilities from Appendix 5 are deducted from 1 as follows:

P(0 or more defectives) = certainty = 1
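Example 11 can be reproduced with a short Python sketch (our own illustration; `poisson_cdf` is our helper name, standing in for the book's printed tables):

```python
import math

# Poisson c.d.f.: P(X <= k) for mean m
def poisson_cdf(k, m):
    return sum(math.exp(-m) * m ** r / math.factorial(r) for r in range(k + 1))

m = 500 * 0.002   # = 1 expected defective per batch of 500

# P(c or more defectives) for candidate standards c = 1..4
for c in range(1, 5):
    print(c, round(1 - poisson_cdf(c - 1, m), 4))

# Smallest standard c whose "c or more" probability is at most 10%
standard = next(c for c in range(1, 20) if 1 - poisson_cdf(c - 1, m) <= 0.10)
print(standard)   # 3, matching the conclusion in the text
```

The loop prints the same 0.6321, 0.2642, 0.0803 and 0.0190 figures used in the worked solution.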

Normal Distribution
Till now we have been drawing curves which illustrated some of the forms that a frequency distribution may assume. These curves were based upon data of a few tens or hundreds of cases; each was a sample drawn from a much larger, possibly infinite, universe. Being a sample, a given curve would not necessarily have exactly the same shape as the curve for the universe, but if the sample is properly selected, the curve for the sample will tend to be of the same general shape as the curve for the universe. The normal curve represents a distribution of values that may occur, under certain conditions, when chance is given full play. In every case the necessary conditions include the existence of a large number of causes, each operating independently in a random manner.

Graphically, the normal distribution looks like a bell-shaped curve:

[Figure 10.5: Normal Probability Distribution — a bell-shaped curve, symmetric about the mean.]

There are certain properties of the normal distribution that you can notice from the above curve:
1. It is symmetrical on both sides of the mean, i.e. the mean, median and mode coincide at the central value.
2. The curve never touches the x-axis and extends from −infinity on the left-hand side to +infinity on the right-hand side.

The normal distribution is an extremely important distribution. It is easier to manipulate mathematically than many other distributions and is a good approximation for several of the others. In many cases, the normal distribution is a reasonable approximation to the binomial probability distribution for business decision purposes, and in the following chapters we shall use the normal distribution in many of the applications. Despite its general applicability, it should not be assumed that every process can be described as having a normal distribution. The normal distribution is a function of z, the standard normal variate, and is defined as:

f(z) = (1/√(2π)) e^(−z²/2)

Here the value of z is given by:

z = (Value − Mean)/(Standard Deviation) = (x − μ)/σ

The normal distribution is completely determined by its expected value or mean (denoted by μ) and standard deviation (σ); that is, once we know the mean and standard deviation, the shape and location of the distribution are set. The curve reaches a maximum at the mean of the distribution. One half of the area lies on either side of the mean. The greater the value of the standard deviation σ, the more spread out the curve.


With any normal distribution, approximately 0.50 of the area lies within 0.67 standard deviations of the mean; about 0.68 of the area lies within 1.0 standard deviation; and 0.95 of the area lies within 1.96 standard deviations.
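These three benchmark areas can be verified with the error function in Python's standard library; a short illustrative sketch (the helper name `area_within` is our own):

```python
import math

def area_within(z):
    """Proportion of a normal distribution lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (0.67, 1.0, 1.96):
    print(f"within {z} s.d.: {area_within(z):.2f}")  # 0.50, 0.68, 0.95
```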


Example 12
Assume that your working hours X are distributed normally with μ = 5 and σ = 2. What is the probability of your working 9 hours or more?

Solution
First we standardize the specified X value. Thus,

z = (x − μ)/σ = (9 − 5)/2 = 2

Looking in Appendix 3, we find that Q(z), the probability of obtaining a value of z as large as or larger than specified, is Q(z) = Q(2) = 0.02275. Alternatively, we may say that 2.275 per cent of the area in the distribution lies to the right of z = 2. This means that there is a 2.275 per cent probability that you would be working 9 or more hours in the day.

This could be looked at in another way. What is the probability that you would be working 9 hours or less in a day? The answer can be found by subtracting the above value from 1, so that we get the area which lies to the left of the 9-hour line rather than to the right. The answer is 0.97725, or a 97.725 per cent probability that you would be working 9 or fewer hours on any given day.
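Instead of Appendix 3, the same tail probability can be computed directly; a hedged sketch in standard-library Python (`upper_tail` is an illustrative helper, not part of the text):

```python
import math

def upper_tail(x, mu, sigma):
    """P(X >= x) for a normal variable, via the standard normal variate z."""
    z = (x - mu) / sigma
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

p = upper_tail(9, mu=5, sigma=2)
print(f"P(X >= 9) = {p:.5f}")  # 0.02275, matching Q(2) from the table
```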
Figure 10.6: Normal distribution with mean = 5; the area to the right of x = 9 gives the required probability

Example 13
An assembly line contains 2,000 components, each of which has a limited life. Records show that the life of the components is normally distributed with a mean of 900 hours and a standard deviation of 80 hours.
a) What proportion of components will fail before 1,000 hours?
b) What proportion will fail before 750 hours?
c) What proportion of components fail between 850 and 880 hours?
d) Given that the standard deviation will remain at 80 hours, what would the average life have to be to ensure that not more than 10% of components fail before 900 hours?

Solution
a) z = (1,000 − 900)/80 = 1.25
(i.e. the value being investigated, 1,000 hours, is 1.25 standard deviations away from the mean of 900 hours). If Appendix 3 is examined it will be seen that the value for a z score of 1.25 is 0.3944. As one half of the distribution is less than 900, the proportion which fail before 1,000 hours is 0.5 + 0.3944 = 0.8944, i.e. 89.44%. If required, this could be expressed as the number of components expected to fail: 2,000 × 0.8944 = 1,788.8, which can be rounded to 1,789.

b) z = (900 − 750)/80 = 1.875
From the tables we obtain the value 0.4696. In this case, as we require the proportion that will fail before 750 hours, the table value is deducted from 0.5.
∴ Proportion expected to fail before 750 hours = 0.5 − 0.4696 = 0.0304, i.e. 3.04%.

c)

Student Activity
The average monthly sales of 5,000 firms are normally distributed with mean sales Rs 36,000 and standard deviation Rs 10,000. Calculate: (a) the number of firms with sales over Rs 50,000; (b) the percentage of firms with sales between Rs 38,500 and Rs 41,000.

When it is required to find the proportion between two values (neither of which is the mean), it is necessary to use the tables to find the proportion between the mean and one value and the proportion between the mean and the other value, and then find the difference between the two proportions.

z = (900 − 850)/80 = 0.625, which gives a proportion of 0.2340
z = (900 − 880)/80 = 0.25, which gives a proportion of 0.0987

∴ Proportion between 850 and 880 is 0.2340 − 0.0987 = 0.1353, i.e. 13.53%.

Punjab Technical University



This part of the example illustrates the proportion between two values on the same side of the mean. If the two values were on opposite sides of the mean, the calculated proportions would be added.

d) This problem is the reverse of the earlier questions based on the same principles. The earlier problems started with the mean and standard deviation, found the z score and then the proportion from the tables. We now start with the proportion and work back, through the tables, to find a new mean value. If not more than 10% should be under 900, it follows that 90% of the area of the curve must be greater than 900. Bearing in mind that the tables only show values for half the distribution (because both halves are identical), we have to look in the tables for a value close to 0.4 (i.e. 0.9 − 0.5). It will be seen that there is a value in the table in Appendix 3 of 0.3997, i.e. virtually 0.4. This value has a z score of 1.28. Thus

1.28 = (mean − 900)/80


∴ mean − 900 = 1.28 × 80 = 102.4, so mean = 1,002.4 hours. Thus if the mean life of the components is 1,002.4 hours with a standard deviation of 80 hours, less than 10% of the components will fail before 900 hours.
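All four parts of Example 13 can be cross-checked numerically; a standard-library Python sketch (`phi` is our own helper for the normal cumulative distribution function):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sd = 900, 80
print(f"a) fail before 1,000 h: {phi((1000 - mu) / sd):.4f}")  # 0.8944
print(f"b) fail before 750 h:   {phi((750 - mu) / sd):.4f}")   # 0.0304
print(f"c) fail in 850-880 h:   {phi((880 - mu) / sd) - phi((850 - mu) / sd):.4f}")  # 0.1353
# d) work back from the 10% requirement: the z score for an area of 0.40 is about 1.28
print(f"d) required mean life:  {mu + 1.28 * sd:.1f} hours")   # 1002.4
```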

Summary
Probability is a measure of uncertainty associated with an event. A subjective interpretation of probabilities is often useful for business decision making. In the case of objective probability, definitive historical information, common experience (objective evidence), or rigorous analysis lies behind the probability assignment. In the case of subjective interpretation, quantitative historical information may not be available; instead of objective evidence, personal experience becomes the basis of the probability assignment. Probabilities of all the various possible outcomes of a trial must sum to one. Probabilities are always greater than or equal to zero (i.e., probabilities are never negative) and are less than or equal to one. The smaller the probability, the less likely the chance of the event happening. There are two kinds of probability distributions: discrete and continuous. Discrete probability distributions are those where only a finite number of outcomes are possible. Continuous probability distributions are those which represent continuous random variables.

Keywords
Probability: A mathematical measure of uncertainty.
Experiment: An activity that produces some result and which can be repeated in an identical environment.
Deterministic experiment: An experiment whose outcome can be predicted.
Sample space: The set of all possible outcomes of a random experiment.
Event: A subset of the sample space of a random experiment.
Equally likely outcomes: The outcomes of a random experiment are said to be equally likely if each of the outcomes stands an equal chance of occurrence.
Mutually exclusive events: A collection of events in which the happening of one rules out the happening of the others.
Exhaustive events: The combination of all the possible events of a random experiment.
Random variable: A variable whose value is determined by the outcome of a random experiment; also known as a chance variable or stochastic variable.
Discrete random variable: A random variable that can assume one of a finite number of values.
Probability distributions: Discrete probability distributions are those in which the random variable can take on only specific values.
Countable set: A set that can be placed into one-to-one correspondence with the set of integers.
Binomial distribution: One of the simplest and yet extremely useful theoretical distributions of discrete random variables; it was discovered by J. Bernoulli.

Probability Theory

Notes

Review Questions
1. From past experience it is known that a machine is set up correctly on 90% of occasions. If the machine is set up correctly then 95% of good parts are expected, but if the machine is not set up correctly then the probability of a good part is only 30%. On a particular day the machine is set up and the first component produced is found to be good. What is the probability that the machine is set up correctly?

2. If the probability of obtaining heads when tossing a certain coin is ½, what is the probability of obtaining heads four times in nine tosses?



3. If the probability of obtaining a 6 when throwing a certain die is ½, what is the probability of obtaining a 6 four times in nine throws?

4. In how many different ways can the first three places be filled in a race in which there are 11 horses?

5. Five bolts are selected at random from a box containing six sound and three faulty bolts. What is the probability of obtaining (i) five sound, (ii) four sound and one faulty, (iii) three sound and two faulty, (iv) two sound and three faulty bolts?

6. A batch of 5,000 electric lamps has a mean life of 1,000 hours and a standard deviation of 75 hours. Assume a normal distribution.
   a) How many lamps will fail before 900 hours?
   b) How many lamps will fail between 950 and 1,000 hours?
   c) What proportion of lamps will fail before 925 hours?
   d) Given the same mean life, what would the standard deviation have to be to ensure that not more than 20% of lamps fail before 916 hours?

7. Thirty chief executive officers in a certain industry are classified by age and by their previous functional position as shown in the table below:

   Previous Functional Position   Under 55   55 and older   Total
   Finance                            4           14          18
   Marketing                          1            5           6
   Other                              4            2           6
   Total                              9           21          30

   Suppose an executive is selected at random from this group.
   a. What is the probability that the executive chosen is under 55? What type (marginal, conditional, joint) of probability is this?
   b. What is the probability that an executive chosen at random is 55 or older and with Marketing as the previous functional position? What type of probability is this?
   c. Suppose an executive is selected, and you are told that the previous position was in Finance. What is the probability that the executive is under 55? What kind of probability is this?
   d. Are age and previous functional position independent factors for this group of executives?

Further Readings
E.R. Tufte, The Visual Display of Quantitative Information, Graphics Press
P.N. Mishra, Quantitative Techniques for Managers, Excel Books
B.H. Erickson and T.A. Nosanchuk, Understanding Data, McGraw Hill


Unit 11: Theory of Estimation and Test of Hypothesis


Unit Structure
Introduction
Theory of Estimation
Point Estimation (Properties of Good Estimators)
Methods of Point Estimation
Interval Estimation
Sampling Distributions
Sampling Theory
The Quantitative Models of Inferential Decisions
Statistical Approaches to Inferential Decision-making
The General Inferential Decision Algorithm
Specific Decision Areas
Chi-square Distribution
The Z and t Distributions
The F-Distribution
Concluding Comments
Summary
Keywords
Review Questions
Further Readings

Learning Objectives
After reading this unit you should be able to:
Describe the theory of estimation
Apply point estimation techniques to statistical problems
Apply interval estimation techniques to statistical problems
Explain the conceptual model for inferential decisions
Explain the quantitative models of inferential decisions
Read, comprehend and apply algorithms
Apply the general inferential decision algorithm
Specify decision areas
Solve the one-sample mean problem with a small sample size
Apply and interpret the t-test, z-test etc.
Solve the more-than-two-sample mean problem


Introduction
One of the most useful applications of the theory of statistics is to enable one to make intelligent and informed guesses about different characteristics of a population under study, using a small sample selected from the population.

A hypothesis is a statement (that may be true or false) aimed at explaining some event or phenomenon, the applicability of which depends on testing it on samples taken from the population. The term is rooted in the ancient Greek hypotithenai, meaning 'to put under' or 'to suppose'. The scientific method requires that one can test a scientific hypothesis. Scientists generally base such hypotheses on previous observations or on extensions of scientific theories. This unit deals with various aspects of estimation and hypothesis testing.


Theory of Estimation
Let X be a random variable with probability density function (or probability mass function) f(X; θ₁, θ₂, …, θ_k), where θ₁, θ₂, …, θ_k are k parameters of the population. Given a random sample X₁, X₂, …, Xₙ from this population, we may be interested in estimating one or more of the k parameters θ₁, θ₂, …, θ_k. To be specific, let X be a normal variate, so that its probability density function can be written as N(X; μ, σ). We may be interested in estimating μ or σ or both on the basis of a random sample obtained from this population.

It should be noted here that there can be several estimators of a parameter, e.g., we can have any of the sample mean, median, mode, geometric mean, harmonic mean, etc., as an estimator of the population mean μ. Similarly, we can use either

S = √((1/n) Σ(Xᵢ − X̄)²)   or   s = √((1/(n − 1)) Σ(Xᵢ − X̄)²)

as an estimator of the population standard deviation σ.

This method of estimation, where a single statistic like the mean, median, standard deviation, etc. is used as an estimator of a population parameter, is known as Point Estimation. Contrary to this, it is possible to estimate an interval in which the value of the parameter is expected to lie. Such a procedure is known as Interval Estimation, and the estimated interval is often termed a Confidence Interval.

Point Estimation (Properties of Good Estimators)


As mentioned above, there can be more than one estimator of a population parameter. Therefore, it becomes necessary to determine a good estimator out of a number of available estimators. We may recall that an estimator, being a function of the random variables X₁, X₂, …, Xₙ, is itself a random variable. Therefore, we can say that a good estimator is one whose distribution is more concentrated around the population parameter. R.A. Fisher has given the following properties of a good estimator: (i) Unbiasedness, (ii) Consistency, (iii) Efficiency, (iv) Sufficiency.



Unbiasedness

An estimator t(X₁, X₂, …, Xₙ) is said to be an unbiased estimator of parameter θ if E(t) = θ. If E(t) ≠ θ, then t is said to be a biased estimator of θ, the magnitude of the bias being E(t) − θ.

We have seen in 20.2 that E(X̄) = μ; therefore X̄ is said to be an unbiased estimator of the population mean μ. Further, referring to 20.4.1, we note that E(S²) = ((n − 1)/n)σ², where S² = (1/n)Σ(Xᵢ − X̄)². Therefore S² is a biased estimator of σ², the magnitude of the bias being E(S²) − σ² = −σ²/n.

Consistency

It is desirable to have an estimator whose probability distribution comes closer and closer to the population parameter as the sample size is increased. An estimator possessing this property is called a consistent estimator. An estimator tₙ(X₁, X₂, …, Xₙ) is said to be consistent if its probability distribution converges to θ as n → ∞. Symbolically, we can write P(tₙ → θ) = 1 as n → ∞. Alternatively, tₙ is said to be a consistent estimator of θ if E(tₙ) → θ and Var(tₙ) → 0 as n → ∞.

We may note that X̄ is a consistent estimator of the population mean μ because E(X̄) = μ and Var(X̄) = σ²/n → 0 as n → ∞.

Note: A consistent estimator is not necessarily an unbiased estimator; for example, S² is a consistent but biased estimator of σ².

Efficiency
Let t₁ and t₂ be two estimators of a population parameter θ such that both are either unbiased or consistent. To select a good estimator from t₁ and t₂, we consider another property that is based upon variance. If t₁ and t₂ are two estimators of a parameter θ such that both of them are either unbiased or consistent, then t₁ is said to be more efficient than t₂ if Var(t₁) < Var(t₂). The efficiency of an estimator is thus measured by its variance.

For a normal population, we know that both the sample mean and the sample median are unbiased estimators of the population mean. However, their respective variances are σ²/n and (π/2)·(σ²/n), where σ² is the population variance. Since σ²/n < (π/2)·(σ²/n), the sample mean is said to be the more efficient estimator of the population mean.

Remarks: The precision of an estimator = 1/ S. E. of estimator.
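The mean-versus-median efficiency comparison can be illustrated by simulation; a rough standard-library Python sketch (the sample size, trial count and seed are arbitrary choices of ours, purely for illustration):

```python
import random
import statistics

# For normal samples, both the mean and the median are unbiased estimators
# of mu, but the median has roughly (pi/2) times the variance of the mean,
# so the sample mean is the more efficient estimator.
random.seed(2)
n, trials = 25, 10000
means, medians = [], []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))

ratio = statistics.variance(medians) / statistics.variance(means)
print(round(ratio, 2))  # close to pi/2, i.e. about 1.57
```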

An estimator having minimum variance among all the estimators of a population parameter is termed the Most Efficient Estimator or Best Estimator. If an estimator is unbiased and best, it is termed the Best Unbiased Estimator. Further, if the best unbiased estimator is a linear function of the sample observations, it is termed the Best Linear Unbiased Estimator (BLUE). It may be pointed out here that the sample mean is the best linear unbiased estimator of the population mean.

Cramer-Rao Inequality
This inequality gives the minimum possible value of the variance of an unbiased estimator. If t is an unbiased estimator of parameter θ of a continuous population with probability density function f(X; θ), then

Var(t) ≥ 1 / ( n · E[ (∂ log f(X; θ)/∂θ)² ] )

Sufficiency
An estimator t is said to be a sufficient estimator of parameter θ if it utilises all the information given in the sample about θ. For example, the sample mean X̄ is a sufficient estimator of μ because no other estimator of μ can add any further information about μ.

Let X₁, X₂, …, Xₙ be a random sample of n independent observations from a population with p.d.f. (or p.m.f.) given by f(X; θ₁, θ₂), where θ₁ and θ₂ are two parameters. The joint probability distribution of X₁, X₂, …, Xₙ, denoted by L(X; θ₁, θ₂), is given by

L(X; θ₁, θ₂) = f(X₁; θ₁, θ₂) × f(X₂; θ₁, θ₂) × … × f(Xₙ; θ₁, θ₂)

An estimator t is said to be sufficient for θ₁ if the conditional p.d.f. (or p.m.f.) of X₁, X₂, …, Xₙ given t is independent of θ₁, i.e.,

f(X₁; θ₁, θ₂) × f(X₂; θ₁, θ₂) × … × f(Xₙ; θ₁, θ₂) = g(t, θ₁) × h(X₁, X₂, …, Xₙ)

where g(t, θ₁) is the p.d.f. (or p.m.f.) of t and h is a function of the sample values that is independent of θ₁. We may note that each of the functions g(t, θ₁) and h(X₁, X₂, …, Xₙ) may or may not be a function of θ₂. This condition implies that if the joint p.d.f. (or p.m.f.) of X₁, X₂, …, Xₙ can be written as a function of t and θ₁ multiplied by a function independent of θ₁, then t is a sufficient estimator of θ₁.


Sufficient estimators are the most desirable but are not very commonly available. The following points must be noted about sufficient estimators:
1. A sufficient estimator is always consistent.
2. A sufficient estimator is most efficient if an efficient estimator exists.
3. A sufficient estimator may or may not be unbiased.

Example 1
If X₁, X₂, …, Xₙ is a sample of n independent observations from a normal population with mean μ and variance σ², show that X̄ is a sufficient estimator of μ but S² = (1/n)Σ(Xᵢ − X̄)² is not a sufficient estimator of σ².

Solution

The probability density function of a normal variate is given by

f(X; μ, σ) = (1/(σ√(2π))) e^(−(X − μ)²/2σ²)

Thus, the joint probability density function of X₁, X₂, …, Xₙ is given by

f(X₁; μ, σ) × f(X₂; μ, σ) × … × f(Xₙ; μ, σ) = (1/(σ√(2π)))ⁿ e^(−(1/2σ²)Σ(Xᵢ − μ)²)

We can write Xᵢ − μ = (Xᵢ − X̄) + (X̄ − μ). Squaring both sides and taking the sum over n observations, we get

Σ(Xᵢ − μ)² = Σ(Xᵢ − X̄)² + n(X̄ − μ)² + 2(X̄ − μ)Σ(Xᵢ − X̄)
           = Σ(Xᵢ − X̄)² + n(X̄ − μ)²    (the last term is zero)
           = nS² + n(X̄ − μ)²

Therefore, we can write

(1/2σ²)Σ(Xᵢ − μ)² = (n/2σ²)S² + (n/2σ²)(X̄ − μ)²

Hence

f(X₁; μ, σ) × … × f(Xₙ; μ, σ) = [e^(−n(X̄ − μ)²/2σ²)] × [(1/(σ√(2π)))ⁿ e^(−nS²/2σ²)] = g(X̄; μ, σ) × h(S²; σ)

Since h is independent of μ, X̄ is a sufficient estimator of μ. However, S² is not a sufficient estimator of σ², because g is not independent of σ².

Further, if we define S² = (1/n)Σ(Xᵢ − μ)², then

f(X₁; μ, σ) × f(X₂; μ, σ) × … × f(Xₙ; μ, σ) = (1/(σ√(2π)))ⁿ e^(−nS²/2σ²)

Thus the newly defined S² becomes a sufficient estimator of σ². We note that h(X₁, X₂, …, Xₙ) = 1 in this case.

The above result suggests that if μ is known, we should use S² = (1/n)Σ(Xᵢ − μ)² rather than S² = (1/n)Σ(Xᵢ − X̄)², because the former is the better estimator of σ².

Student Activity
1. Show that s² = (1/(n − 1))Σ(Xᵢ − X̄)² is an unbiased estimator of σ².
2. What are the most desirable characteristics of an estimator?
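The bias of the divisor-n estimator S², discussed above, can be checked empirically; a small standard-library Python simulation (the parameter choices and seed are ours, purely for illustration):

```python
import random
import statistics

# Simulation: the divisor-n sample variance underestimates sigma^2 on average
# (its expectation is ((n-1)/n) * sigma^2), while the divisor-(n-1) version
# corrects the bias.
random.seed(1)
n, trials = 5, 20000          # samples of size 5 from N(0, 2), sigma^2 = 4
biased, unbiased = [], []
for _ in range(trials):
    xs = [random.gauss(0, 2) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased.append(ss / n)
    unbiased.append(ss / (n - 1))

print(round(statistics.mean(biased), 2))    # near (n-1)/n * 4 = 3.2
print(round(statistics.mean(unbiased), 2))  # near sigma^2 = 4.0
```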

Methods of Point Estimation


Given the various criteria of a good estimator, the next logical step is to obtain an estimator possessing some or all of the above properties. There are several methods of obtaining a point estimator of the population parameter. For example, we can use the method of maximum likelihood, the method of least squares, the method of minimum variance, the method of minimum χ², the method of moments, etc. We shall, however, use the most popular method of maximum likelihood.

Method of Maximum Likelihood
Let X₁, X₂, …, Xₙ be a random sample of n independent observations from a population with probability density function (or p.m.f.) f(X; θ), where θ is an unknown parameter for which we desire to find an estimator. Since X₁, X₂, …, Xₙ are independent random variables, their joint probability function, or the probability of obtaining the given sample, termed the likelihood function, is given by

L = f(X₁; θ) · f(X₂; θ) ⋯ f(Xₙ; θ) = Π f(Xᵢ; θ)

We have to find that value of θ for which L is maximum. The conditions for a maximum of L are dL/dθ = 0 and d²L/dθ² < 0. The value of θ satisfying these conditions is known as the Maximum Likelihood Estimator (MLE).

Generalising the above, if L is a function of k parameters θ₁, θ₂, …, θ_k, the first-order conditions for a maximum of L are ∂L/∂θ₁ = ∂L/∂θ₂ = … = ∂L/∂θ_k = 0. This gives k simultaneous equations in the k unknowns θ₁, θ₂, …, θ_k, which can be solved to get the k maximum likelihood estimators.



Sometimes it is convenient to work with the logarithm of L. Since log L is a monotonic transformation of L, the maxima of L and of log L occur at the same value.

Properties of Maximum Likelihood Estimators
1. The maximum likelihood estimators are consistent.
2. The maximum likelihood estimators are not necessarily unbiased. If a maximum likelihood estimator is biased, then by a slight modification it can be converted into an unbiased estimator.
3. If a maximum likelihood estimator is unbiased, then it will also be most efficient.
4. A maximum likelihood estimator is sufficient, provided a sufficient estimator exists.
5. The maximum likelihood estimators are invariant under functional transformation, i.e., if t is a maximum likelihood estimator of θ, then f(t) would be the maximum likelihood estimator of f(θ).
Example 2
Obtain the maximum likelihood estimator of π (the proportion of successes) in a population with p.m.f. given by f(X; π) = ⁿC_X π^X (1 − π)^(n−X), where X denotes the number of successes in a sample of n trials.

Solution
Since ⁿC_X π^X (1 − π)^(n−X) is the probability of X successes out of n trials, this is also the likelihood function. Thus, we can write

L = ⁿC_X π^X (1 − π)^(n−X)

Taking logarithms of both sides, we get

log L = log ⁿC_X + X log π + (n − X) log(1 − π)

Differentiating w.r.t. π, we get

d log L/dπ = X/π − (n − X)/(1 − π) = 0 for a maximum of L,

or X(1 − π) − (n − X)π = 0.

This gives π̂ = X/n, where π̂ denotes the estimator of π. It can also be shown that d² log L/dπ² < 0 when π̂ = X/n.
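The result π̂ = X/n can be confirmed numerically by evaluating the log-likelihood over a grid of candidate values; a standard-library Python sketch (the sample figures and grid resolution are arbitrary choices of ours):

```python
import math

def binom_log_likelihood(p, n, x):
    """log L = log nCx + x log p + (n - x) log(1 - p)."""
    return math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)

n, x = 20, 7
# Evaluate log L on a fine grid; the maximum should sit at p = x/n = 0.35
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: binom_log_likelihood(p, n, x))
print(best)  # 0.35
```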

Interval Estimation
Using point estimation, it is possible to provide a single quantity as an estimator of a parameter. Any point estimator, even if it satisfies all the characteristics of a good estimator, has the limitation that it provides no information about the magnitude of errors due to sampling. This problem is taken care of by the method of interval estimation, which gives a range for the estimator of the parameter.

The method of interval estimation is based upon the sampling distribution of an estimator. The standard error of the estimator is used in the construction of an interval so that the probability of the parameter lying within the interval can be specified. Given a random sample of n observations X₁, X₂, …, Xₙ, we can find two values l₁ and l₂ such that the probability of the population parameter θ lying between l₁ and l₂ is (say) η. Using symbols, we can write P(l₁ ≤ θ ≤ l₂) = η.

Such an interval is termed a Confidence Interval for θ and the two limits l₁ and l₂ are termed Confidence or Fiducial Limits. The percentage probability or confidence is termed the Level of Confidence or Confidence Coefficient of the interval. For example, the level of confidence of the above interval is 100η%. The level of confidence implies that if a large number of random samples are taken from a population and confidence intervals are constructed for each, then 100η% of these intervals are expected to contain the population parameter θ.
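For the common case of the population mean with known σ, the limits l₁ and l₂ take the familiar form X̄ ± z·σ/√n; a standard-library Python sketch (the sample figures are invented purely for illustration):

```python
import math

def mean_confidence_interval(xbar, sigma, n, z=1.96):
    """Confidence limits l1, l2 with P(l1 <= mu <= l2) = eta (95% for z = 1.96)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# e.g. a sample of 64 observations with xbar = 50 and known sigma = 8
l1, l2 = mean_confidence_interval(50, 8, 64)
print(f"95% confidence interval: ({l1:.2f}, {l2:.2f})")  # (48.04, 51.96)
```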

As compared to point estimation, interval estimation is better because it takes into account the variability of the estimator in addition to its single value and thus provides a range of values. Unlike point estimation, interval estimation indicates that estimation is an uncertain process.

Determination of an Approximate Sample Size for a Given Degree of Accuracy
Let us assume that we want to find the size of a sample to be taken from the population such that the difference between the sample mean and the population mean would not exceed a given value, say e, with a given level of confidence. In other words, we want to find n such that

P(|X̄ − μ| ≤ e) = 0.95 (say) .... (1)

Student Activity
1. A random sample of 500 pineapples was taken from a large consignment and 65 were found to be bad. Estimate the proportion of bad pineapples in the consignment and obtain the standard error of the estimator. Deduce that the percentage of bad pineapples in the consignment almost certainly lies between 8.5 and 17.5. 2. A group of 1000 students was interviewed from Delhi University. 358 of them were found to be science students. Estimate the proportion of science students in the University.

Assuming that the sampling distribution of X̄ is normal with mean μ and S.E. σ/√n, we can write

P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 0.95

or P(|X̄ − μ|/(σ/√n) ≤ 1.96) = 0.95

or P(|X̄ − μ| ≤ 1.96·σ/√n) = 0.95 .... (2)

Comparing (1) and (2), we get

e = 1.96·σ/√n, or n = (1.96σ/e)² = 3.84σ²/e²

Remarks:

1. The sample size required with a maximum error of estimation e and a given level of confidence is n = z²σ²/e², where z is the value of the standard normal variate for the given level of confidence and σ² is the variance of the population.
2. For a given level of confidence and σ², n is inversely related to e², the square of the maximum error of estimation. This implies that to reduce e to e/k, the size of the sample must be k² times the original sample size.
3. The lesser the magnitude of e, the more precise will be the interval estimate.
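Remark 1 translates directly into a one-line calculation; a standard-library Python sketch (the figures reuse the Student Activity values of σ = Rs 1,600 and e = Rs 300 at 95% confidence; `sample_size` is our own helper name):

```python
import math

def sample_size(sigma, e, z=1.96):
    """n = (z * sigma / e)^2, rounded up to the next whole observation."""
    return math.ceil((z * sigma / e) ** 2)

print(sample_size(1600, 300))  # 110
# halving e roughly quadruples the required sample size (exactly, before rounding up)
print(sample_size(1600, 150))  # 438
```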

Example 3
What should be the sample size for estimating the mean of a normal population if the probability that the sample mean differs from the population mean by not more than 30% of the standard deviation is 0.99?

Solution
Let n be the size of the sample. It is given that

P(|X̄ − μ| ≤ 0.30σ) = 0.99 .... (1)

Student Activity
1. A survey of middle-class families of Delhi is proposed to be conducted for the estimation of average monthly consumption (in Rs) per family. What should be the size of the sample so that the average consumption is estimated within a range of Rs 300 with a 95% level of confidence? It is known that the standard deviation of consumption in the population is Rs 1,600.
2. What should be the size of the sample in the above case so that the average consumption is estimated within a range of Rs 300 with 90% and 99% levels of confidence respectively?

Assuming that the sampling distribution of X̄ is normal with mean μ and S.E. σ/√n, we can write

P(|X̄ − μ| ≤ 2.58·σ/√n) = 0.99 (from the table of areas) .... (2)

Comparing (1) and (2), we get

0.30σ = 2.58·σ/√n, so that √n = 2.58/0.30 = 8.6 and n = 73.96, or 74 (approx.).

Confidence Interval for Population Standard Deviation
Let S = √((1/n)Σ(Xᵢ − X̄)²) be the sample standard deviation of a random sample of size n drawn from a normal population with standard deviation σ. It can be shown that the sampling distribution of S is approximately normal, for large values of n, with mean σ and standard error σ/√(2n), so that z = (S − σ)/(σ/√(2n)) can be taken as a standard normal variate.

Example 4
A random sample of 50 observations gave a value of its standard deviation equal to 24.5. Construct a 95% confidence interval for the population standard deviation σ.

Solution
It is given that S = 24.5 and n = 50 (large). We know that S.E.(S) = σ/√(2n). Since σ is not known, we use its estimate based on the sample. Thus, we can write

S.E.(S) = 24.5/√(2 × 50) = 24.5/√100 = 2.45

Hence the 95% confidence interval for σ is given by

24.5 − 1.96 × 2.45 ≤ σ ≤ 24.5 + 1.96 × 2.45, or 19.7 ≤ σ ≤ 29.3
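Example 4 can be reproduced with a short helper; a standard-library Python sketch (`sd_confidence_interval` is our own name, using S to estimate S.E.(S) exactly as the text does):

```python
import math

def sd_confidence_interval(s, n, z=1.96):
    """Large-sample CI for sigma: S +/- z * S / sqrt(2n)."""
    se = s / math.sqrt(2 * n)  # sample estimate of S.E.(S)
    return s - z * se, s + z * se

l1, l2 = sd_confidence_interval(24.5, 50)
print(f"95% confidence interval for sigma: ({l1:.1f}, {l2:.1f})")  # (19.7, 29.3)
```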

Sampling Distributions
Assuming that experimental conditions are carefully defined so that the subjects and their achievement are compatible in each case, the behaviour is quantified and shown in graphical form in Figure 11.1, where the achievement (x) is plotted on one axis and the frequency of the achievement (f) attained by the number of subjects on the other. Just to remind you, the bar-form presentation is known as a histogram and the linked midpoints of the bars as a frequency polygon. The smoothed frequency polygon is known as the frequency curve or simply as a frequency distribution. All of the performance variables cited above result in similar pictorial presentations. The actual curves may not be exactly symmetrical (there may be minor skewness) but they all have the appearance of the normal distribution. The normal variable is one of the most widely encountered stochastic variables.

Figure 11.1: Frequency Distribution of a Normal Variable (histogram and frequency polygon; frequency curve)

Quantitative Techniques


Similarly, other variable clusters may be detected in the decision environment. For example, to name and show just two more: any growth process, e.g., the growth of a person or the growth of product sales, is usually exponentially distributed, while behaviour that is related to arrivals, like cars arriving at a traffic light, customers arriving at a bank teller's window, or telephone calls arriving at a switchboard, is usually Poisson distributed, as shown in Figure 11.2.

Figure 11.2: Exponential and Poisson Distributions

The major distributions will be examined in this unit.


Student Activity

1. The probability distribution of the number of heads in a toss of two fair coins is given below:

   Number of heads (X):   0     1     2
   P(X):                  0.25  0.50  0.25

2. As the number of coins increases, the probability distribution tends more and more to normality. Draw histograms for 10, 15 or more coins to demonstrate that this actually is the case.
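The activity can be sketched numerically: the exact binomial probabilities for n fair coins can be computed and printed as a crude text histogram, whose bars visibly approach the bell shape as n grows. A minimal sketch using only the standard library:

```python
from math import comb

# Exact probability distribution of the number of heads in n tosses of a
# fair coin: P(X = k) = C(n, k) / 2^n.
def heads_distribution(n):
    return [comb(n, k) / 2 ** n for k in range(n + 1)]

print(heads_distribution(2))  # → [0.25, 0.5, 0.25]

# Crude text histogram for 15 coins: the bars trace out a bell shape.
for k, p in enumerate(heads_distribution(15)):
    print(f"{k:2d} {'#' * round(p * 100)}")
```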

Sampling Theory
For time and cost reasons inferential decision-making is usually based on a sample in which the variable may be studied. A sample is a small portion of the population which is used to study the behavior of the population. Then from the study of the sample behavior, inferences concerning the parent population behavior may be made. A conceptual model depicting these relationships is shown in Figure 11.3.
Sample statistics:
n = sample size
p = sample proportion
X̄ = sample arithmetic mean
s² = sample variance
s = sample standard deviation
X = sample observation (data)

Population parameters:
N = population size
P = population proportion
μ = population arithmetic mean
σ² = population variance
σ = population standard deviation
X = population observation (data)

Figure 11.3: Conceptual Model for Inferential Decision Making

Given a well-defined variable or set of variables about which a decision has to be made (in the present case TV-viewing), the population is identified. Obviously the population consists of all people who have access to a television set. The variable is quantified by obtaining the parameters as shown in Figure 11.3. One may equate this operation to a bookkeeping function of decision-making: to generate data sets. First the population size (N) is ascertained. Then, depending upon the problem, which in inferential decision-making may be classified as a proportion, mean or variance problem, the pertinent data are obtained. Keep in mind, though, that except in rare cases these data are unknown for the population because of time or cost reasons, and appropriate sample statistics are used as shown in the model.

A proportion problem would be one in which the decision had to do with how many viewers of the population or sample were tuned into a certain show, and not to others, at a certain time.

Means, it will be recalled as a basic statistics topic, are measures of central tendency, or some average behavior. There are a number of ways of measuring averages: medians, modes and arithmetic, geometric and harmonic means may be recalled. Only the arithmetic mean is used in the inferential decision formulas. A mean problem in the TV-viewing setting may be to find the mean number of hours per day that viewers spend in front of the tube.

Finally, variances are measures of dispersion or scatter of the data within the data set. Dispersion identifies the consistency of a behavior. Again there are several measures of dispersion, such as the range, average deviation, standard deviation or its squared version the variance, and quantiles like percentiles, quartiles and so on. Only standard deviations and variances are used in inferential decision-making as discussed here.
Empirical sample sizes are based on given specifications concerning the confidence level (z), the allowable sampling error (E) and either the variance (σ²) or the proportion (P), depending on the type of problem. The appropriate formulas are shown below for review purposes. These formulas typically result in sample sizes that are considered too large by most decision makers who operate under time and cost constraints. They are used in those cases where special care must be exercised, such as decisions involving possible subsequent litigation. Their form is

n = (zσ/E)²  and  n = z²P(1 − P)/E²


In both a priori and empirical samples, the sampling technique must result in a random sample. But the sample should also be quantitatively representative of the parent population.
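As a rough sketch of the two sample-size formulas above; the numeric inputs used below (σ = 10 with E = 2, and P = 0.5 with E = 0.03) are illustrative assumptions, not values from the text:

```python
import math

# Required sample sizes from the two formulas above: one for estimating a
# mean (given sigma), one for estimating a proportion (given P), both at
# confidence level z and allowable sampling error E. Rounded up, since a
# sample size must be a whole number.
def n_for_mean(z, sigma, E):
    return math.ceil((z * sigma / E) ** 2)

def n_for_proportion(z, P, E):
    return math.ceil(z ** 2 * P * (1 - P) / E ** 2)

print(n_for_mean(1.96, 10, 2))            # → 97
print(n_for_proportion(1.96, 0.5, 0.03))  # → 1068
```

P = 0.5 is the conservative worst case for the proportion formula, which is why the resulting n is so large, illustrating the text's point about these formulas yielding sizes many decision makers find impractical.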

The Quantitative Models of Inferential Decisions


The relationships between the decision maker's strategies and the ultimate decision of the states of nature concerning the initial hypothesis are shown in Figure 11.4.
                        States of Nature
Decision Strategy       Hypothesis is True                             Hypothesis is False
Accept Hypothesis       Correct decision                               Incorrect decision: Type II error (beta risk)
Reject Hypothesis       Incorrect decision: Type I error (alpha risk)  Correct decision

Figure 11.4: Decision Pay-Off Matrix

Inspection of Figure 11.4 shows that it represents a pay-off matrix. If a decision maker accepts a hypothesis that ultimately turns out to be true, or similarly rejects a hypothesis that ultimately turns out to be false, a correct decision has been made. Laurels, bonuses and profits go to that decision maker. Quite the opposite goes to the decision maker who accepts a false hypothesis or rejects a true one. Obviously, an error has been made that led to the incorrect decision. In order to differentiate between the two different types of error, they are known as Type I and Type II errors, as indicated.

Any decision maker who makes a decision in an uncertain environment knows that he is vulnerable to committing these errors and will, therefore, try to guard against that risk. Again, in order to be able to differentiate between the two types of risk, they are called alpha and beta risks, as indicated. Depending upon the underlying distribution, this risk can be calculated or determined before the decision maker acts. Thus, if its odds or probability is sufficiently low, the pertinent decision strategy is played, as shown in detail later. Obviously, since either decision strategy will be used, the decision probability is 1.00. If α increases, β decreases; in order to reduce both α and β, the sample size must be increased. It may be noted that (α + β) need not equal 1, which can be easily understood from Table 11.1.


Table 11.1
Decision Strategy         Hypothesis is True               Hypothesis is False
Accept Hypothesis         1 − α                            β (Beta): loss of money/cash
Reject Hypothesis         α (Alpha): loss of opportunity   1 − β
Summation of probability  1                                1

Before turning to these quantitative aspects, a few words may be said about the qualitative differences between the two types of errors. Nobody likes to make an incorrect decision; an incorrect decision is bad per se. But is there a qualitative difference? Could it be that one is worse? The answer can be seen by looking at the actual pay-offs.

Let us take a familiar example from the American automobile industry. When Ford Motor Co. accepted the hypothesis in the late 1950s that it could sell the Edsel, it committed a Type II error, because the market proved that hypothesis to be false and Ford lost some $750 million. When, at about the same time and throughout the following two decades, the entire American auto industry, excepting American Motors, rejected the hypothesis that there was a large domestic market for well-built smaller cars, it committed a Type I error. The states of nature determined that a demand existed and the vacuum was filled by imports. The American auto industry lost billions of dollars because of this incorrect decision. But, it may be asked, did the companies actually "lose" this money as Ford lost its $750 million? Of course they lost it. Granted, Ford's loss was a financial one, an out-of-pocket loss. In the case of the imports the industry "only" lost the opportunity to earn the billions of dollars that were forfeited to foreign companies; a loss, nevertheless, which was accentuated by a multiplier effect reflected in fewer jobs and fewer subcontracts.

One may generalize from this example: a Type I error usually results in an opportunity loss, while a Type II error typically results in a financial loss in business decisions. Which one is the more serious error? Well, it could be argued that a large financial loss can put a firm out of business. This is true. But let us take Ford's example again. The company "shrugged off" this loss and began to look immediately for new opportunities. They found one in the spectacularly successful Mustang. In no time at all the Edsel losses were recouped and quickly forgotten. How about the imports? The chances are very good that Detroit will never forget that mistake; for, when an opportunity has been lost, it is usually gone forever. This holds even in a free-market environment, which may allow second, third and fourth chances, as the sequels to successful movies clearly show: it is typically the first one in the market who accepts a true hypothesis who gains the most, particularly in a situation, to return to the auto illustration, where massive capital investments are necessary.

Statistical Approaches to Inferential Decision-making

There are two general statistical approaches to inferential decision-making. They are known as Classical and Bayesian. The primary difference between the two approaches is that Classical methodology is based exclusively on the frequency of an event's occurrence in determining the probability of occurrence, while Bayesian analysis incorporates subjective aspects. One is strictly empirical while the other allows a priori considerations: a Bayesian decision maker is allowed to bring subjective components, that is, his belief or bias, into the decision process. This bias is unacceptable to many decision makers who prefer to base their decisions on the empirical data alone. The algorithms that are now discussed are rooted in classical methodology.

The General Inferential Decision Algorithm


There are four distinct steps in this algorithm. For every decision they are repeated in the order in which they are listed below.
1. Formulating Hypotheses
2. Selecting a Confidence Level
3. Selecting the Decision Making Tool
4. Making the Decision

Formulating Hypotheses

There are two types of hypotheses that must be formulated at this point. One is known as the null hypothesis and the other as the alternative hypothesis. The null hypothesis may be likened to a buoy that guides the boatman into the harbor. To moor the boat to a dock is what really counts; that latter task may be likened to the alternative hypothesis. The null hypothesis exists for statistical testing purposes only. In the alternative hypothesis, the decision maker's hunch concerning the decision situation is reflected.

Selecting a Confidence Level

The meaning of 95% confidence is that if a decision is made one hundred times, ninety-five times it will be correct and five times false. Obviously, the decision maker who selects 99% confidence in this second step of the decision algorithm wishes to be more confident; less risk is taken. As a general rule of thumb, it may be said that the 95% confidence level is chosen for all internal decisions, that is, decisions that pertain to the organisation proper, whereas the higher 99% level is applied when the outside public is involved, especially when there is a possibility of a legal review of the decision's quality. The choice, however, is left to the decision maker. The confidence level is usually communicated in terms of the risk that is involved. Thus, in the present illustration, the levels would be shown as

Student Activity
There are two brands of car tyres, A and B. A sample of 100 tyres of type A has an average life of 37,500 km and a sample of 75 tyres of type B has an average life of 39,000 km. You wish to determine whether type B is better in terms of lifetime. Formulate the null and alternative hypotheses.
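One possible sketch of where this activity leads: with Ho: μA = μB against Ha: μB > μA, a two-sample z test could be applied. The activity supplies no standard deviations, so the value s = 3,000 km used for both samples below is a purely hypothetical assumption for illustration:

```python
import math

# Hypothetical sketch of a two-sample z test for the tyre activity:
# Ho: mu_A = mu_B against Ha: mu_B > mu_A. The standard deviations are
# NOT given in the activity; s = 3000 km for both samples is assumed
# purely for illustration.
x_a, n_a, s_a = 37_500, 100, 3_000
x_b, n_b, s_b = 39_000, 75, 3_000

se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)  # S.E. of the difference
z = (x_b - x_a) / se
print(round(z, 2))  # → 3.27 (beyond 1.645, significant at P.05 under these assumptions)
```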

P.05 or P.01

which is read "P at the .05 or .01 level". This means that the risk probability is 5% or 1%.

Selecting the Decision Making Tool

The third step of the algorithm is the most important one. At this point the decision formula, or the test as it is called, must be chosen that is appropriate for the variable's distribution, the sample size and the number of samples involved. The details will be discussed in each of the specific decision areas that are examined later. For present purposes it should be recalled that any stochastic variable has its own distribution and that many variable distributions possess similar shapes. The mathematical properties of many of the major distributions have been identified in terms of the area under the distribution for given decision values (x-values) and, therefore, the probabilities of their occurrence. If the distribution is "known", a parametric tool or test may be employed. When it is unknown, nonparametric tests are used. Some of the parametric tests are the t-test, F-test, etc., whereas a nonparametric test is the χ² (chi-square) test. For some tests, the sample size is a determining factor. It may be recalled that a distinction is made between large and small samples. Also, the number of samples may preclude the application of a given test. For this reason it has already been indicated that decisions involving proportions, means or variances are classified into 1-sample, 2-samples and more-than-2-samples tests. In the anti-acid illustration a one-sample proportion test would be used, as shown later.

Making the Decision

After the quantitative analysis in the third step of the algorithm has been performed, the decision is made. Returning to the beginning of the algorithm and the null and alternative hypotheses that have been formulated for the decision situation, the following options are available. A look at the decision matrix in Figure 11.4 facilitates an understanding of these options.
Remember that the decision strategies are to accept or reject the hypothesis. Using the null and alternative hypotheses in such a way that only one can be accepted or rejected, the options present themselves as follows:

Option 1: Accept the null hypothesis and reject the alternative hypothesis.
Option 2: Accept the alternative hypothesis and reject the null hypothesis.

The details of how to select the proper decision option are shown later. At this point, however, the ties that exist between the two types of risks and the two types of hypotheses should be noted. Acceptance of a null hypothesis makes the decision maker vulnerable to committing a Type II error. Thus, the decision carries a beta risk. On the other hand, acceptance of an alternative hypothesis entails an alpha risk and a potential Type I error. Realise for a moment the meaning of the null and alternative hypotheses in decision terms.

Using the anti-acid illustration, acceptance of the null hypothesis would mean that we have substantiated the competitor's claim. Then our own test, which consisted of selecting some subjects and studying the effect of the competitor's product on their heartburn, would leave us with no choice but to accept "the equality condition", that is, Ho: P = 80%. In other words, we could not find a difference between our findings and the competitor's claim. As we will see in a moment, small differences may have been observed, but these differences are considered to be insignificant. If, on the other hand, we had been able to accept the alternative hypothesis, Ha: P < 80%, we would have observed significant differences. Thus, inferential decision-making always involves a test about the significance of differences. The answer at the end of the quantitative analysis conducted in Step 3 of the algorithm is always either significant or insignificant at the chosen risk level. More will be said about the concept of significance in the following discussion.

Specific Decision Areas


The time has come when the various types of inferential problems may be classified as proportion, mean or variance problems. Each type of problem has its own test or tests. Each illustration may be seen as a blueprint for the solution of similar problems. We will go through the calculations manually because they convey the reasoning or concept behind inferential decision-making. In real-life situations virtually all decision-making tools that are discussed in this book are computerised.

The χ² Distribution

Chi-square, represented by χ², is the distribution of the sum of squares of a number of independent variables, each with a standard normal distribution. Let X₁, X₂, ..., Xₙ be n independent normal variates with means μ₁, μ₂, ..., μₙ and standard deviations σ₁, σ₂, ..., σₙ. Then the random variable formed by adding the squares of the standardised variates, as given below, is distributed according to χ²:

χ² = Σ [(Xᵢ − μᵢ)/σᵢ]²,  i = 1, 2, ..., n
This theoretical distribution is one of the most widely used distributions in inferential statistics. It can be proved that, under reasonable assumptions, many sampling distributions approximate the chi-square distribution if the null hypothesis is true.

Chi-square distribution is characterized by a single parameter (n). The degree of freedom of this distribution is (n), the number of variates.


The various statistical characteristics of this distribution are listed below.

Parameter:           n (> 0)
Degrees of freedom:  n
Mean:                n
Median:              n (approximately)
Mode:                n − 2 (for n ≥ 2)
Variance:            2n
Skewness:            √(8/n)
Kurtosis:            3 + 12/n

The chi-square distribution is commonly used for the following purposes:
1. Testing the goodness of fit of an observed distribution to a theoretical one
2. Testing the independence of two qualitative variables
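The mean and variance entries in the table above (mean n, variance 2n) can be checked by simulation, since χ² is defined as a sum of squared standard normal variates. A seeded, purely illustrative sketch:

```python
import random

# Simulation check of the characteristics listed above: a sum of n squared
# standard normal variates should average about n, with variance about 2n.
# Seeded so the run is repeatable; purely illustrative.
random.seed(1)
n, trials = 6, 100_000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(trials)]
mean = sum(draws) / trials
var = sum((d - mean) ** 2 for d in draws) / trials
print(round(mean, 2), round(var, 2))  # close to 6 and 12
```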

The Proportion Problem: Chi-Square Test

Proportion problems are of the "so many out of so many" variety. Consider some examples:
1. 3 out of 5 doctors prescribe a certain aspirin
2. There were so many democratic, republican, liberal and socialist votes cast out of a total number of votes in a certain election district
3. So many defective parts per shift were manufactured by the morning versus the afternoon shift

In each case, the intent of the decision is to test for differences against another claim citing another proportion of physicians who prescribe that aspirin, results in other election districts or quality control experience ratios of other shifts or companies.
Punjab Technical University 249

All proportion problems may be solved by means of the chi-squared (χ²) distribution, whose form is

χ² = Σ (o − e)²/e

where the observed values (o) and expected values (e) are related. The degrees of freedom of the distribution are found by

df = (r − 1)(k − 1)

where r = rows and k = columns of the table in which the observed and expected values may be recorded. The best way to understand the meaning of the expected values is to see them as the values that would be "true" or expected if the null hypothesis were true. If the degrees of freedom equal 1, Yates' correction must be applied to the formula, which then appears as

χ² = Σ (|o − e| − 0.5)²/e
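The two formulas above translate directly into a small helper. The function name chi_square is our own choice, and Yates' correction is applied only when requested:

```python
# Helper implementing the two chi-square formulas above. Yates' continuity
# correction (subtract 0.5 from each |o - e|) is applied only on request,
# as the text prescribes for df = 1.
def chi_square(observed, expected, yates=False):
    total = 0.0
    for o, e in zip(observed, expected):
        d = abs(o - e) - 0.5 if yates else abs(o - e)
        total += d * d / e
    return total

# Checked against the one-sample heartburn example discussed later
# (144 of 200 relieved, against an expected 80% of 200 = 160):
print(round(chi_square([144, 56], [160, 40], yates=True), 2))  # → 7.51
```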

One-Sample Proportion Problem with Two Response Choices

Let us see the χ² test in operation in the case of a one-sample proportion problem with two response choices. Continuing the ongoing illustration, let us test the hypothesis that the proportion is equal to 80%:

Ho: P = 80%  and  Ha: P < 80%

In order to perform the analysis, data must be generated. Suppose that we test a sample of n = 200 persons who suffer from acute heartburn conditions. These conditions were induced by us under controlled experimental conditions. Upon receiving the competitor's medication, readings were taken on the proportion of subjects who were relieved of their symptoms. Let us say that 144 indicated that they had found relief. The decision situation up to this point is shown in Figure 11.5.

Figure 11.5: The One-Sample Proportion Test


Note the known population parameter of 80%, or, expressed as a proportion, 0.80. Remember that population parameters are usually not known because of time and cost constraints. In this case the population proportion (P) is known because it is given by the claim. The sample proportion (p) has been found by the experiment that was performed: it is 144/200 or p = 0.72. There is a difference between the population and sample proportions, but the question is whether this difference is due to chance phenomena or due to the fact that the sample was drawn from a different parent population, that is, one that shows in general a less than 80% effectiveness result. This possibility is indicated by the broken-line circle.

A difference between the sample statistic (p) and the population parameter (P) that is caused by chance is known as an insignificant difference, whereas a difference that is caused by a different parent population (than claimed) is a significant difference. Note that an insignificant difference would lead to the acceptance of the null hypothesis, whereas a significant difference leads to the acceptance of the alternative hypothesis.

In order to find out whether the experimental results are significant, the χ² test is now performed. In the following table the observed as well as the expected values are shown. Remember that the expected values would be true if the null hypothesis were true. Thus, 80% of the 200 subjects, or 160, would have been expected to have found relief. The remaining calculations in the table are self-explanatory. Since there is only one sample (k = 1) and two experimental outcomes (r = 2), the degrees of freedom in this case are df = (2 − 1)(1 − 1) which, by the definition not allowing less than 1 degree of freedom, equals 1. Therefore, Yates' correction must be employed. One additional note: the sum of the observed and expected outcomes is always the same.
             o      e      |o − e| − 0.5     (|o − e| − 0.5)²/e
Relief       144    160    15.5              1.50
No Relief     56     40    15.5              6.01
Total        200    200                      χ²₁ = 7.51

Given this Experimentally Obtained Value (EOV) of χ²₁ = 7.51, where the subscript denotes the applicable degrees of freedom, it must now be compared to the Mathematically Expected Value (MEV) of a standardised χ² distribution. These distributions may be generated for each positive degrees-of-freedom value (df) with density function

f(x) = x^(df/2 − 1) e^(−x/2) / (2^(df/2) Γ(df/2)),  0 < x < ∞

Tables for these distributions are available, as shown in Appendix 4. By matching the decision-related risk (P = 0.05) and degrees-of-freedom value (df = 1), MEV = 3.84 is found.

Quantitative Techniques

MX)

Notes

j
1
MEV3.84 7.51

O. 01 O O \ _ ^

M E V 6.63 7.51

Figure 11.6: Decision with Chi-Squared Distribution

The decision may now be made. It is sketched out in Figure 11.6. The shaded area beyond MEV is known as the area of rejection. If EOV falls into this region, the null hypothesis is rejected and consequently the alternative hypothesis is accepted. In this case a significant difference has been demonstrated. Note that the area of rejection, in terms of the area under the distribution, equals 5%. This is the alpha risk that the decision maker is willing to take when accepting the alternative hypothesis. Similarly, if 99% confidence had been stipulated in Step II of the algorithm, MEV = 6.63 would have applied to the decision. As can be seen from Figure 11.6, the decision maker can now be highly certain that a significant difference exists, because EOV still falls into the area of rejection. Of course, if EOV falls into the non-shaded area, sometimes called the area of acceptance, the null hypothesis would have to be accepted and any difference between the claimed population proportion and the experimentally obtained sample proportion would be called insignificant. To summarise the decision-making process by using the inferential decision algorithm, it may be shown as follows:
Step I:   Ho: P = 80%;  Ha: P < 80%
Step II:  P.05 or P.01
Step III: χ² = Σ (|o − e| − 0.5)²/e
Step IV:  7.51 > 3.84 or 6.63 (EOV > MEV). A significant difference exists. Reject Ho. The claim is false and should not be made.
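The tabled MEV values used in Steps II and IV can be checked against the density function given above by numerically integrating its tail. A sketch; the upper integration limit and step count are pragmatic choices, not values from the text:

```python
import math

# Check of the tabled MEV values: integrate the chi-square density given
# in the text beyond MEV, using a simple midpoint rule.
def chi2_pdf(x, df):
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))

def tail_area(mev, df, upper=200.0, steps=100_000):
    h = (upper - mev) / steps
    return sum(chi2_pdf(mev + (i + 0.5) * h, df) for i in range(steps)) * h

print(round(tail_area(3.84, 1), 3))  # → 0.05
print(round(tail_area(6.63, 1), 3))  # → 0.01
```

The recovered tail areas match the alpha risks of 5% and 1% attached to MEV = 3.84 and MEV = 6.63 for df = 1.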


Student Activity

The record for a period of 180 days, showing the number of electricity failures per day in Delhi, is given in the following table:

No. of failures:   0    1    2    3    4    5    6    7
No. of days:      12   39   47   40   20   17    3    2

Determine, by using the χ² test, whether the number of failures can be regarded as a Poisson variate.
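A possible sketch of this activity: estimate the Poisson mean from the data, compute expected frequencies for 0 to 6 failures plus a lumped 7-or-more class, and form the χ² statistic. With one estimated parameter, df = 8 − 1 − 1 = 6 by the df = r − m − 1 rule given later in this unit:

```python
import math

# Sketch of the activity: fit a Poisson by the sample mean, lump the 7+
# tail into one class, and compute chi-square with df = 8 - 1 - 1 = 6.
failures = [0, 1, 2, 3, 4, 5, 6, 7]
days = [12, 39, 47, 40, 20, 17, 3, 2]
n = sum(days)                                          # 180 days
mean = sum(f * d for f, d in zip(failures, days)) / n  # 2.5 failures/day

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

expected = [n * poisson_pmf(k, mean) for k in range(7)]
expected.append(n - sum(expected))  # tail class: 7 or more failures
chi2 = sum((o - e) ** 2 / e for o, e in zip(days, expected))
print(round(mean, 2), round(chi2, 2))  # → 2.5 4.38, below MEV = 12.59 at P.05
```

Since EOV falls well short of the tabled MEV for df = 6, the Poisson hypothesis would not be rejected under these assumptions.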

One-Sample Proportion Problem with More than 2 Response Choices

In a number of decision situations that involve the one-sample proportion problem, more than two proportional outcomes are possible. For example, the items in an attitude survey may be tested for their discrimination value by showing that a significant difference exists between responses such as "yes", "don't know" and "no", in terms of the actually observed responses and uniformly distributed expected responses. If the actual responses were uniformly distributed, obviously the item would not possess discrimination value, since an expressed attitude should result in more "yes" or "no" answers, as the case may be. Let us take one item of a 15-item attitude survey, say number 14, for which 36 responses have been obtained with proportions of "yes", "don't know" and "no" as indicated in the following table. Using the inferential algorithm, the decision process would be as follows:
Step I:   Ho: P(yes) = P(don't know) = P(no)
          Ha: P(yes) ≠ P(don't know) ≠ P(no)
Step II:  P.05 or P.01
Step III:

Responses     o     P      e     (o − e)²/e
Yes          25    1/3    12     14.08
Don't know    7    1/3    12      2.08
No            4    1/3    12      5.33
Total        36    1.00   36     χ²₂ = 21.49
Note that since the expected values are uniformly distributed, their probability of occurrence is the same, namely 1/3, because there are three response choices. Each expected value is found by multiplying the probability of occurrence of each choice by the total number of responses, yielding 12 in each case. Note further that the degrees of freedom equal df = 3 − 1 = 2, so that Yates' correction does not have to be employed, and the experimentally obtained value is EOV = χ²₂ = 21.49.

Step IV: MEV = 5.99 or 9.21
21.49 > 5.99 or 9.21 (EOV > MEV)


A significant difference exists. Reject Ho. The item possesses good discrimination value.
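The attitude-item computation above can be restated compactly (no Yates' correction, since df = 2):

```python
# The attitude-item test above: observed responses against a uniform
# expectation of 12 each, with no Yates correction since df = 2.
observed = [25, 7, 4]               # yes / don't know / no
expected = [sum(observed) / 3] * 3  # uniform: 12 each
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # → 21.5 (the text's 21.49 comes from rounded terms)
```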


This application of the one-sample proportion test has another interesting side to it. It is intuitively clear that if a given data set can be analysed in terms of an underlying uniform distribution, as in the above illustration, any given data set can be tested against any stipulated known distribution like the normal, binomial, Poisson, hypergeometric or exponential distributions, to name commonly encountered ones. This is indeed the case. All that is necessary is to generate the probabilities of the stipulated distribution, either by looking them up in readily available tables or by computer subroutine. The χ² analysis is the same as shown above. It should be noted, however, that in the distribution-testing case a special degrees-of-freedom formula must be used. It is

df = r − m − 1

where r stands for rows as before and m for the number of parameters, e.g., mean and variance, that define the stipulated distribution. In the case of the uniform distribution, we did not have to concern ourselves with the parameters because it is defined by f(x, N) = 1/N, x = 1, 2, 3, ..., N, and m = 0.

This type of χ² analysis represents the empirical test in Step III of the algorithm for the selection of the appropriate distribution that underlies the variable about which the decision must be made. As pointed out previously, the selection of the distribution is usually made on a priori grounds. Here, then, is the empirical test for the decision maker who may have doubts about the applicability of a certain distribution stipulated a priori for a given data set.

Two-Samples Proportion Problem with 2 Response Choices

In decisions involving one-sample problems there is always one known population parameter (the proportion, mean or variance) and one corresponding sample statistic found by experimentation (again the proportion, mean or variance). The known population parameter often involves a claim which may be based on large-scale experimentation, as in the previous anti-acid illustration; a company policy which sets certain goals, like a fixed minimum return on investments that may be undertaken; or an engineering or scientific specification, like the tensile strength of the steel that goes into a given construction project. In each decision situation the sample statistic is compared against this known or given parameter and tested for the significance or insignificance of the difference between them. A conceptual model of this methodology was shown in Figure 11.5.

In the two-samples problems a comparison of two sample statistics, that is, proportions, means or variances, is made and their differences tested for significance. The pertinent conceptual model is shown in Figure 11.7. It should be noted that in this case an insignificant difference means that the samples were drawn from the same parent population of data exhibiting the variable characteristics under examination. A significant difference indicates that the parent populations from which the samples were drawn differ in their variable characteristics. They may possess more or less of them.


Figure 11.7: The Two-Sample Proportion Test (insignificant difference: same parent population; significant difference: different parent populations)

To illustrate the type of proportion decision problem that compares two sample proportions, where each sample has two response choices, let us suppose that we are interested in finding out whether a certain cigarette brand is perceived as a "male" or "female" brand. A survey was taken among male and female smokers who were asked whether they "fairly regularly" smoked this particular brand. Thirty-five out of 78 men and 19 out of 116 women indicated a preference for the brand. The decision algorithm reads as follows:
Step I:  Ho: P(men) = P(women)  and  Ha: P(men) ≠ P(women)

Note that the null hypothesis stipulates no difference in preference whereas the alternative hypothesis reflects the decision maker's hunch that there may be a difference. It is not necessary to stipulate the direction of this difference. Hence, the two-sided alternative hypothesis is used.

               Women (o/e)     Men (o/e)
Smoke          19/32.29        35/21.71
Don't Smoke    97/83.71        43/56.29
Total          116             78          Grand total: 194

Note that the data presentation within the table is somewhat different from the one-sample problem. These χ² tables are known as row-by-column, or r × k, tables. In this case we have a 2 × 2 table with df = (2 − 1)(2 − 1) = 1, hence Yates' correction must be used. Each cell of the table contains two data: the observed and expected values. Remember that the expected values would be "true if the null hypothesis were true". In order to find them for the SMOKE response, the pertinent proportions of the two samples are pooled. Then this pooled proportion is applied to both samples as if the null hypothesis were true. In order to calculate the pooled proportion, the computation

p̄ = (x₁ + x₂)/(n₁ + n₂) = (19 + 35)/(116 + 78) = 54/194 = 0.2784

Punjab Technical University

255

Quantitative Techniques


is performed. Thus, if the null hypothesis were true, 27.8% of both female and male smokers would prefer the brand. A technical term may be observed at this point: the total number of subjects participating in the experiment (n = 194) is known as the grand total. The expected value for SMOKE is found by calculating (0.278)(116) = 33.25 and (0.278)(78) = 21.68 and entering the results in the table. The DON'T SMOKE expected values are found by subtraction, since the sums of the observed and expected values are the same. Then
χ² = (|19 − 33.25| − 0.5)²/33.25 + (|35 − 21.68| − 0.5)²/21.68 + (|97 − 82.75| − 0.5)²/82.75 + (|43 − 56.32| − 0.5)²/56.32 = 19.80

Step IV:  EOV = χ² = 19.80;  MEV = 3.84
          19.80 > 3.84 (EOV > MEV)

A significant difference (indeed, even at P.01) exists between the sample proportions. Reject H0.
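The 2 × 2 computation above can be sketched in Python. This is an illustration, not part of the text: because it derives expected values from exact row and column totals rather than the rounded pooled proportion used in the hand calculation, the statistic comes out near 17.5 rather than 19.80, but the conclusion (reject H0 at P.05) is unchanged.

```python
# Sketch of the 2x2 chi-square test with Yates' correction, using the
# observed smoker counts from the worked example.
def chi2_2x2_yates(table):
    """table: [[a, b], [c, d]] of observed counts; returns the corrected chi-square."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return chi2

# Women: 19 smoke, 97 don't; men: 35 smoke, 43 don't.
eov = chi2_2x2_yates([[19, 97], [35, 43]])
mev = 3.84  # chi-square critical value at P.05 with df = 1
print(round(eov, 2), eov > mev)  # 17.46 True
```

The observed value comfortably exceeds the critical value, so the null hypothesis of equal preference is rejected, exactly as in the hand calculation.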

Student Activity
The employees in 4 different firms are distributed in three skill categories shown in the following table. Test the hypothesis that there is no relationship between the firm and the type of labor. Let the level of significance be 5%.

As seen from the size of the sample proportions, this brand is perceived as a "male cigarette". The χ² table as constructed for the preceding illustration may be extended to more than two samples with two response choices. Rather than being a 2 × 2 table, it would then become a 2 × k table, where k (columns) indicates the number of samples. The only difference would be that the degrees of freedom no longer equal one, and therefore Yates' correction need not be applied.

More Than Two-Samples Proportion Problem with More Than Two Response Choices

The last type of proportion problem that may be encountered by the decision maker is an r × k experiment where r, k > 2. It is obvious that in this situation the degrees of freedom are always greater than one; hence, Yates' correction is not used. More important, however, is the computational fact that because there are more than two response choices, the sets of expected values cannot be found by subtraction but must be calculated for each response choice, or row, in the χ² table. Again, a simple example may illustrate the computational procedures in the light of the inferential algorithm. Suppose the personnel department of a large organisation is confronted with some kind of attitudinal problem, say, high absenteeism. It wants to find out if the attitudinal factors that cause this symptom are equally distributed throughout the various departments and sections of the organisation or if there are significant differences among them. Obviously this knowledge is of great importance in designing the action programs that may alleviate this problem. Let us say that an attitude survey, addressing itself to this problem, has been administered throughout the organisation and the responses have been scored and quantified in such a way that it is possible to classify the employees of each unit of the organisation as possessing either HIGH, MEDIUM or LOW morale. To make it simple only three units (A, B, C) are


analysed. The experimental findings and the sample sizes are shown in the 3 × 3 table below:

Steps:
I:   H0: PA = PB = PC for each response category (HI, MD, LO)
     Ha: the proportions are not all equal

          Response (o/e)
Unit      HI        MD         LO         Total
A         10/9      10/10.98   10/9.99     30
B         35/21     35/25.62    0/23.31    70
C          0/15     10/18.30   40/16.65    50
Grand total                               150

The pooled proportions for each row are found by means of the previously shown formula

P = (x1 + x2 + x3)/(n1 + n2 + n3)

or

pHI = (10 + 35 + 0)/150 = 0.30
pMD = (10 + 35 + 10)/150 = 0.366
pLO = (10 + 0 + 40)/150 = 0.333

so that, for example,

eA,HI = 0.30 × 30 = 9

and so on. Then the expected values are calculated as previously shown. Small rounding errors may occur when totals are checked. Performing the calculations in the usual manner, the χ² experimentally obtained value (EOV) with (r-1)(k-1) = (3-1)(3-1) = 4 degrees of freedom is obtained.
EOV = χ² = Σ (o − e)²/e = 87.79
MEV = 9.49
87.79 > 9.49 (EOV > MEV)

A significant difference (indeed, even at P.01) exists among the three sample units of the organisation. Therefore H0 is rejected.

The attitudinal climate in the three units is different, and further analysis must now be conducted to find the reasons for this difference.
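The same computation generalises to any r × k table. The following is a minimal Python sketch, not part of the text, using exact expected values (row total × column total / grand total); the small departure from the book's 87.79 comes from rounding in the pooled proportions.

```python
# Sketch of the r x k chi-square computation for the 3 x 3 morale table.
# No Yates' correction is applied, since df = (r-1)(k-1) > 1.
def chi2_rxk(observed):
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    grand = sum(rows)
    return sum((observed[i][j] - rows[i] * cols[j] / grand) ** 2
               / (rows[i] * cols[j] / grand)
               for i in range(len(rows)) for j in range(len(cols)))

# Units A, B, C (rows); responses HI, MD, LO (columns).
observed = [[10, 10, 10],
            [35, 35, 0],
            [0, 10, 40]]
eov = chi2_rxk(observed)
mev = 9.49  # chi-square critical value at P.05 with df = (3-1)(3-1) = 4
print(round(eov, 2), eov > mev)  # 87.72 True
```

With exact expected values the statistic is about 87.72, agreeing with the hand-worked 87.79 to rounding, and the null hypothesis is again rejected.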




The Z and t Distributions


The vast majority of decision situations that are encountered by the industrial decision maker (indeed, in life in general) are somehow related to averages. A student may be admonished to maintain a grade point average, insurance premiums are determined by average life expectancies, automobile clubs tell travellers the average travel time required to cover the distance between cities A and B, and even the time-honoured solution to any business argument, "let's split the difference", reflects nothing more than an averaging process. It should come as no surprise, then, that the empirical distribution that governs the behaviour of averages (to return to the specific type of average: the arithmetic mean) is the cornerstone of inferential decision making. This distribution is known as the normal or z-distribution. Like χ², it is a continuous distribution. But its most important statistical characteristic is that all known distributions (the parametric tests) approach its shape if certain conditions are met. Fundamental in each case is a large sample size n. The wondrous natural law that is reflected in the normal distribution was first observed and quantified by the French mathematician de Moivre. But Gallic esprit was not enough to get the good tale told to the rest of the world. It took the independent observation and quantification, plus the Germanic professorial ponderosity and specificity, of Gauss to get the message across. Ever since, the world has had a bell; for the normal (sometimes even called Gaussian) curve is bell-shaped, as shown in Figure 11.8.
Figure 11.8: The z-Distribution and t-Distribution (for z: 95% confidence interval between ±1.96, 99% between ±2.58; for t with df = 30: 95% between ±2.042, 99% between ±2.75)
Later the distribution that governs the behaviour of a normal variable within the constraints of a small sample size, the t-distribution, was developed by Gosset. As can be seen from Figure 11.8, the t-distribution is flatter; that is, the area for the 95% and 99% confidence intervals is wider, and hence the standard deviation values are larger than for the z-distribution. The tails of the t-distribution are thicker than those of the z-distribution, which thins out more rapidly as we move away from the mean. As for all parametric tests, significance values, or area-under-the-curve probability values, have been calculated and are available in


table format. Appendices 3 and 6 show the z and t tables. From these tables it can be seen that the t-values equal the z-values for given confidence levels at infinity; or, to see the comparison in terms of the shapes of the distributions, with an infinitely large sample the two distributions are the same. In practice an infinitely large sample is not necessary. As pointed out earlier, as long as the sample size is large (n ≥ 50), the decision tool for mean problems is the z-distribution. For small samples (n < 50) the t-distribution must be used. A number of additional observations should be made concerning the z-distribution. The height of the normal frequency distribution shown in Figure 10.9 is given by the density function

f(x) = Y = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²)

for any given x-value, where π is the ratio of the circumference of a circle to its diameter (√(2π) = 2.5066), e the base of natural (Napierian) logarithms (e = 2.7183), μ the known population arithmetic mean, and σ the known population standard deviation. Of course, as already shown in the application of the χ² distribution to decision problems, it is the area under the curve rather than its height that is of interest. In order to have a testing criterion against which an infinitely large number of normal distributions (−∞ < x < ∞) may be measured, the standard normal distribution, which transforms the x-scale into a z-scale by z = (x − μ)/σ, is used. Note that z appears in the exponent of the density function. Given the z-value for any x-value, the area percentage under the distribution, or probability for x, may be looked up in Appendix 3. Note that the point of inflection occurs at ±1σ, with a corresponding area percentage of 15.866 per cent as shown in Figure 10.9. Of course, the area under any frequency distribution is 100%, or total probability equal to 1.00. The z- and t-distributions are symmetrical, so that the probability of occurrence of an x-value above or below the arithmetic mean is P = 0.5. Note also that unlike the positive χ² value, z- or t-values may be positive or negative. Thus, it is customary to refer to distributions that extend from 0 to +∞ as one-tailed distributions, whereas distributions extending from −∞ to +∞ are known as two-tailed distributions. As will be seen shortly, this characteristic of the distributions has important implications for the formulation of the alternative hypothesis. Depending upon the direction chosen, it may be either one-sided (that is, "greater than" with positive z- or t-values, or "less than" with negative z- or t-values) or two-sided, in which case the direction is unimportant and only the "not equal to" option is of importance in the decision situation.
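As a quick numerical check of the area figure quoted above, the standard normal CDF can be evaluated with the error function from Python's standard library. This is an illustrative sketch, not part of the text.

```python
# Check the area to the left of the inflection point at -1 sigma,
# assuming the standard normal density f(x) = (1/sqrt(2*pi)) * exp(-x**2 / 2).
import math

def phi(z):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

area_below_minus_one_sigma = phi(-1.0)
print(round(100 * area_below_minus_one_sigma, 3))  # 15.866 (per cent)
```

The result reproduces the 15.866 per cent area below −1σ stated in the text, and by symmetry the same area lies above +1σ.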

The One-Sample Mean Problem with a Large Sample Size: t-test and z-test

Whenever a proportion problem is encountered in the decision situation, the χ² distribution should come to mind immediately as the distribution that quantitatively governs this type of problem. For the mean problem it is the z- or t-distribution, depending upon the sample size and assuming that the variable is normally distributed. If the variable is not normally distributed


(draw a histogram and inspect visually, or perform the χ² test that was discussed earlier), non-parametric methods must be used, as discussed later. As seen from the density function of the z-distribution, the distribution is completely defined by the two population parameters μ (mean) and σ (standard deviation). In most decision situations they are not known, for reasons discussed earlier, and must be estimated by the appropriate sample statistics x̄ and s. As already encountered in the one-sample proportion problem, in the one-sample mean problem the population parameter is known and a known sample statistic is tested against it for significance. A visual presentation is found in later figures. Whenever such population parameters are tested for significance given a sample statistic that was found by performing an appropriate experiment, the decision situation usually involves a claim, policy or technical specification, as previously discussed for the proportion problem.

In order to illustrate the application of the z-test in a decision problem, let us suppose that we have to make a decision concerning the large-scale purchase of light bulbs. We are interested in buying a certain brand of long-life 60 watt bulbs that have a specified average life of 2500 hours. However, before we commit ourselves, we want to be at least 95% certain that the bulbs perform indeed as specified. Therefore, we obtain a sample of, say, n = 100 bulbs and either test their mean life span ourselves or ask one of the many testing laboratories to do the job for us. The life span of each bulb (an x-value) is thus recorded and the mean for the sample calculated. There are two methods of recording the data. Either each x-value is recorded, such as x1 = , x2 = , x3 = , ..., x100 = hours, in which case an ungrouped data set results; or the x-values may be grouped within intervals of, say, 200 hours and only the frequencies are recorded from a tally sheet. The result is known as a grouped data set.

Note that in this case the actual x-values are not known. Only the frequencies of light bulbs having a life span that fell into a certain interval are known. It is clear that the calculated values for the mean and standard deviations for ungrouped data sets are more precise and, therefore, ungrouped data sets should be used whenever possible. In some cases, however, either the data are available in grouped form only or the size of the data sets (n = thousands) makes it impractical to store them in ungrouped form.

At any rate the decision maker must be able to calculate the mean and standard deviation for both types of data sets. In the present large-sample-size situation, the pertinent calculation methods for grouped data will be reviewed. In the next section, examining small sample sizes, the mean and standard deviation formulas for ungrouped data sets will be considered. It may be recalled from basic statistics readings that the mean and standard deviation of a grouped data set may be found on the basis of the individual midpoints of each class interval. This approach must be used when the class intervals are not equal. It is also used for computer programming and data input. However, most class intervals are kept equal or can be made equal. For this reason the more efficient formula for manual calculations, which utilises just one general data set midpoint and simple coded midpoints for each class, is illustrated here. The two formulas are

x̄ = M + (Σfd/n)c

and

s = c √( Σfd²/n − (Σfd/n)² )

where M represents the general midpoint, which may be placed into any class but which should be placed in the modal class (where frequency density is maximum) in order to facilitate calculations; the d-values represent a simple unit-distance integer code whose origin is placed at M; c is the width of the class interval, chosen here to be 200 hours; and f is the frequency of observations in each class. The grouped data set resulting from testing the life span of 100 light bulbs, and the work sheet for the necessary calculations, is as follows:
Interval in hours (c)    Frequency (f)    M       d     fd     fd²
2000-2199                17                      -1    -17     17
2200-2399                35               2300    0      0      0
2400-2599                32                      +1    +32     32
2600-2799                16                      +2    +32     64
Total                   100                            +47    113

Then,

x̄ = 2300 + (47/100)(200) = 2394 hours

and

s = 200 √( 113/100 − (47/100)² ) = 190.69 hours
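The coded-midpoint work sheet lends itself to a short computation. The following Python sketch, offered as an illustration, reproduces the grouped mean and standard deviation above.

```python
# Coded-midpoint calculation for the light-bulb work sheet
# (general midpoint M = 2300 in the modal class, class width c = 200 hours).
import math

f = [17, 35, 32, 16]   # class frequencies
d = [-1, 0, 1, 2]      # coded unit distances from M
M, c, n = 2300, 200, sum(f)

sum_fd = sum(fi * di for fi, di in zip(f, d))        # +47
sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))  # 113

mean = M + sum_fd * c / n
s = c * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
print(mean, round(s, 2))  # 2394.0 190.69
```

Both totals (Σfd = 47, Σfd² = 113) and the final values agree with the work sheet.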

Now the difference between the experimental value of 2394 hours and the claimed value of 2500 hours may be tested for significance by the inferential decision algorithm.

Step I:  H0: μ = 2500

Note that the population mean of this particular brand of light bulbs is tested.

         Ha: μ < 2500



A one-sided alternative hypothesis should be selected because we would not want to purchase bulbs that do not have a minimum average life of 2500 hours as claimed. Of course, if they burn longer than 2500 hours, that is all right with us and does not require a special test or decision.

Steps:
I:    H0: μ = 2500
      Ha: μ < 2500
II:   P.05
III:  z = (x̄ − μ)/(s/√n)

Note the z configuration. In the basic formula x̄, μ and σ must be known. In our problem x̄ is a sample mean and μ is known because of the claim, but the population standard deviation must be estimated by s. The quantity σx̄ = s/√n is known as the standard error of the mean. Hence, the experimentally obtained value (EOV) is found by

      EOV = z = (2394 − 2500)/(190.69/√100) = −5.56σ

IV:   The mathematically expected value (MEV) is looked up in Appendix 3 and found to be −1.64σ. Therefore,
      EOV < MEV or −5.56 < −1.64 (disregarding the sign)
      A significant difference exists. These light bulbs should not be purchased because the test has not verified the claim.
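Steps III and IV can be verified numerically. This is a sketch using the values from the worked example, not part of the text.

```python
# One-sample z-test: sample mean 2394 hours against the claimed 2500 hours,
# with s = 190.69 and n = 100.
import math

x_bar, mu, s, n = 2394, 2500, 190.69, 100
standard_error = s / math.sqrt(n)        # s / sqrt(n) = 19.069
eov = (x_bar - mu) / standard_error
mev = -1.64  # one-sided ("less than") critical z at P.05
print(round(eov, 2), eov < mev)  # -5.56 True
```

Since the experimentally obtained z falls far beyond the critical value, the null hypothesis is rejected, matching the decision in Step IV.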

For conceptual reasons and review purposes, the decision is shown again in Figure 11.9.

Figure 11.9: The z-Test with One-Sided Alternative Hypothesis (area of acceptance 0.9500; MEV = −1.64σ; EOV = −5.56σ)

Note that the sign of the alternative hypothesis determines the location of the region of rejection. In this case it is "less than" the claimed mean value, and therefore the α risk is under the negative tail of the distribution. As the figure shows, the experimentally obtained value is significant at both P.05 and P.01. Look at Figure 11.9 and see how very far this value (z = −5.56σ) is removed from the claimed mean, and how very small the actual α risk


happens to be in this case when the alternative hypothesis is accepted. In fact, it is so small that it cannot be looked up in Appendix 3, which extends only to z = 3.00σ. Now suppose that rather than test a one-sided alternative hypothesis, we had formulated a two-sided one. This is somewhat difficult to justify in a case that involves a claim, but let us say that while interested in purchasing bulbs with a certain minimum life span, at the same time we did not want a bulb that burned too long. Perhaps we wanted to keep our maintenance people (or does the union contract call for electricians?) happy and employed, because it is they who change the burned-out bulbs. The "less than" and "greater than" probabilities are both reflected in the hypothesis. The calculations in Step III of the algorithm remain the same, but note the decision difference as shown by Figure 11.10.

Figure 11.10: The z-Test with a Two-Sided Alternative Hypothesis (rejection limits at ±1.96σ and ±2.58σ; EOV = −5.56σ)

Notice how the α risk is now split between the negative tail (less than) and the positive tail (greater than) while the area of acceptance remains unaffected at either 95% or 99%. Under each tail the area of rejection is now 2.5% and 0.5%, respectively. These new probability values result in different mathematically expected values. From Appendix 3 they are found to be ±1.96σ and ±2.58σ, respectively. Again the difference is significant. But more important is the fact that these MEVs are fixed for all decision problems involving the z-distribution at the indicated confidence levels, because in the absence of degrees of freedom, differences in sample sizes do not affect them. And since the 95% and 99% confidence levels are widely used in industrial decision making, these z-values should be memorised. The table below may provide some comfort in this endeavour.
                                     95% Confidence     99% Confidence
One-Sided Alternative Hypothesis     +1.64 or −1.64     +2.33 or −2.33
Two-Sided Alternative Hypothesis     ±1.96              ±2.58

The insight that has been gained from an examination of Figures 11.9, 11.10 and 11.11 may now be distilled and used to deepen an understanding of the concept of significance. Notice, for example, that when a 99% confidence interval is constructed for the sample mean x̄ = 2394 hours by the familiar

x̄ ± z(s/√n)

or

2394 ± 2.58(190.69/√100)

yielding an upper confidence limit of 2443.20 hours and a lower confidence limit of 2344.80 hours, the claimed population mean μ = 2500 hours does not fall into this interval. This indicates that the sample was drawn from a parent population whose mean was significantly less than the claimed population mean of 2500 hours, and we can make this statement with 99% confidence (z = 2.58). How large a sample mean would have had to be obtained, in the worst case, to make the difference between it and the claimed population mean of μ = 2500 insignificant? Well, just set the upper confidence limit equal to 2500 and solve:

2500 − 2.58(190.69/√100) = 2450.80
Student Activity
A manufacturer claims that the average mileage of scooters of his company is 40 kms/litre. A random sample of 20 scooters of the company showed an average mileage of 42 kms/litre. Test the claim of the manufacturer on the assumption that the mileage of scooter is normally distributed with a standard deviation of 2 kms/litre.

In other words, we could not have rejected the null hypothesis if the sample mean had been x̄ = 2450.80 instead of x̄ = 2394 and been 99% confident that the observed difference between the claimed population mean μ = 2500 and the new sample mean of x̄ = 2450.80 was significant. However, by shortening the interval to z = 1.96, but being only 95% confident, a significant difference could still be demonstrated, or

2450.80 + 1.96(190.69/√100) = 2488.18

which yields an upper confidence limit of "only" 2488.18 hours, thereby not including the claimed 2500 hours. The lower confidence limit does not really concern us here. It is clear, therefore, that the selection of the α risk in Step II of the algorithm should not be undertaken lightly. Granted, the decision maker is usually interested in accepting the alternative hypothesis because it reflects his hunch that led to the experimentation and decision process in the first place. By selecting a very short interval, which means a high α risk, the alternative hypothesis can be accepted much more readily. So why not use P.50, which is tantamount to flipping a coin? That is precisely why you, the reader, study these decision tools. You pay a price by working with wide confidence intervals, but you can sleep well at night knowing that 95% or 99% of the time you are right in an uncertain world.
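The confidence-limit arithmetic above can be reproduced directly. A small sketch, using the sample values from the light-bulb example:

```python
# The 99% confidence interval around the sample mean, and the "worst case"
# sample mean whose 99% upper limit just reaches the claimed 2500 hours.
import math

x_bar, s, n, z99 = 2394, 190.69, 100, 2.58
half_width = z99 * s / math.sqrt(n)

lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 2), round(upper, 2))   # 2344.8 2443.2

worst_case_mean = 2500 - half_width
print(round(worst_case_mean, 2))          # 2450.8
```

The interval (2344.80, 2443.20) excludes the claimed 2500 hours, and 2450.80 is the borderline sample mean derived in the text.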

The One-Sample Mean Problem


The entire discussion of the preceding section is applicable again here. The only difference in the decision situation is a small sample size. Since the a priori definition of the sample size is usually based on time and cost considerations, as earlier pointed out, small-sample-size conditions are very frequently encountered. Indeed, the t-test, which is the appropriate decision tool in this case assuming a normally distributed variable, is often used for large-sample conditions as well. Many computer libraries, for example, do not contain z-tests and, as will be seen shortly, in the case of a one-sample mean problem, the right-hand side of the test equation is the same. Only the mathematically expected values differ, as already shown in Figure 11.9.

To illustrate the application of the t-test, let us again test a claim and select a problem that would call for a "greater than" alternative hypothesis. Think of a group of eight people who consider themselves "on a diet". They are healthy, perfectly alright, but bored. So every day they sip a concoction that contains 1347 calories from a 0.20 litre crystal vessel. "Uma, imagine there are more than 1347!?" That was all Amit, the leading decision maker of the group, needed to hear. He knew an old college chum who enhanced his Food Safety Administration salary by moonlighting in people's kitchens, delicatessens and chest stands. One day this fellow showed up with the necessary calorie-counting apparatus. The sample of eight had their filled crystal vessels in front of them. Caloric means and standard deviations were about to be calculated. Because only eight vessels were involved, it was decided to treat them as ungrouped data. The eight measurements checked in as follows: {1353, 1348, 1354, 1344, 1345, 1349, 1356, 1346}. The formulas for the mean and standard deviation are

x̄ = Σx/n


and

s = √( (Σx² − (Σx)²/n) / (n − 1) )

It may be recalled that the variance (s²) is the sum of the squared deviations (SS) divided by the degrees of freedom; in the case of the small sample size, df = n − 1. Expanding SS binomially [(a − b)² = a² − 2ab + b²] results in the second formula for s, which is easier to use for manual calculations because it uses totals rather than deviations. Inputting the data, the following values are obtained:

x̄ = 10795/8 = 1349.38

and

s = √( (14,566,643 − (10795)²/8) / 7 ) = 4.47

Uma blanched. There were indeed more than 1347 calories. Amit was annoyed because he could not follow the meaning of 0.38 calories in the mean, although he should have remembered that each caloric observation could have resulted in a fractional measurement, since calories are continuous data. The FSA man calmed the situation by suggesting a significance test, which proceeded according to the algorithm.

Steps:
I:    H0: μ = 1347
      Ha: μ > 1347 (God forbid)
II:   P.05
III:  EOV = t = (x̄ − μ)/(s/√n) = (1349.38 − 1347)/(4.47/√8) = +1.51


IV:   EOV < MEV or +1.51 < 1.895
      The difference is insignificant, i.e., the null hypothesis H0 is accepted.


The concoction may not taste like much, but it sure has only 1347 calories. Everybody let out a sigh of relief. The dieters agreed that some government employees weren't such bad chaps after all. They toasted each other with what was left in their crystal vessels. The FSA man smiled.
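The ungrouped-data t computation can be checked from the raw measurements. In this sketch, carrying full precision gives t of about +1.50, a whisker below the rounded +1.51 in the text, with the same decision.

```python
# One-sample t-test for the diet example, computed from the eight
# ungrouped caloric measurements with the totals-based formula for s.
import math

x = [1353, 1348, 1354, 1344, 1345, 1349, 1356, 1346]
n = len(x)
mean = sum(x) / n                                            # 1349.375
s = math.sqrt((sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1))

eov = (mean - 1347) / (s / math.sqrt(n))
mev = 1.895  # one-sided critical t at P.05 with df = 7
print(round(mean, 2), round(s, 2), round(eov, 2), eov < mev)
```

The statistic stays below the critical value, so the null hypothesis of 1347 calories is accepted, as in the story.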

Figure 11.11: The t-Test with One- and Two-Sided Alternative Hypotheses

The decision is reproduced in Figure 11.11. The t-distribution has n − 1 degrees of freedom for the one-sample problem. By locating the proper α risk column, the mathematically expected t-value (1.895) is found for 7 degrees of freedom in Appendix 6. If the alternative hypothesis had been two-sided (perhaps our dieters may have been concerned about too lean a mixture), MEV = ±2.365, as shown. Note that if you continue downward in these columns to successively larger sample sizes, eventually the t-values will equal the z-values at infinity, or 1.645 and 1.96, respectively. At this point the two distributions are superimposed, as previously discussed. But note further how much wider the t-interval is compared to the z-interval. This is the price one pays for the small sample size, which makes the decision environment even more uncertain.
Student Activity
500 units from a factory are inspected and 12 are found to be defective; 800 units from another factory are inspected and 12 are found to be defective. Can it be concluded at the 5% level of significance that the production at the second factory is better than in the first factory? Hint: Use a one-tailed test.

Student Activity
The following data show the cost per square foot of floor area for 7 randomly selected schools and 5 office blocks from those completed during the period 1984 to 1989:

Building Type      Cost per sq. ft
School             28  31  26  27  23  38  37
Office Blocks      37  42  34  37  35

Can we conclude that the mean cost per sq. ft is the same for the two types of buildings? Hint: Use a two-tailed t-test.

The F-Distribution
The general decision environment of the proportion problem arises because of the different impact ratios produce, not only in terms of a quantitative aspect (3 out of 5 prescribe x) but also because a ratio represents an aesthetic stimulus (could it be 10 out of 10, or 5 out of 10 for y?). Human beings relate to symmetry, or to expected and unexpected asymmetry, in predictable and therefore measurable ways. The mean problem addresses itself to some aspect of performance. Rather than measuring shapes of distributions, the mean problem addresses itself to the quantity of performance with an eye on


sufficiency or insufficiency. The variance problem deals with consistency of performance; sufficiency and insufficiency are not important. Important is whether performance is consistent or not. The general parametric decision tool for variance problems is the F-distribution. The F-test is a ratio of sample variances obtained from a normally distributed variable. This means two things immediately to the decision maker: first, for all non-normal variables F cannot be used, and second, a one-sample variance test cannot be performed, because for a ratio a minimum of two sample variances is necessary. A short statistical background review may place this very important decision-making tool into perspective. Indeed, the F-test opened a new era in inferential decision making, and entire books have been written about its applications. Given normally distributed random variables x1, x2, x3, ..., xn with mean and variance equal to μ and σ², respectively, they may be standardised as in the z-distribution, with μ = 0 and σ² = 1, by

z = (x − μ)/σ

so that

u = z1² + z2² + ... + zn²

follows a χ² distribution. If there are n independent random variables, there are ν = n degrees of freedom. Note the χ² density function previously shown. If we now replace the population mean by the sample mean, we can write

u = Σ ((xi − x̄)/σ)²

which has a χ² distribution with n − 1 degrees of freedom. With the sample size n,

u = ns²/σ²

where ns² is a convenient way of writing the sum of the squared deviations, and u has a χ² distribution with n − 1 degrees of freedom. Now suppose that we have a second sample v; then


u = n₁s₁² / σ₁²  and  v = n₂s₂² / σ₂²

each having χ² distributions with ν₁ = n₁ − 1 and ν₂ = n₂ − 1 degrees of freedom. Then

F = (u / ν₁) / (v / ν₂)

with ν₁ and ν₂ degrees of freedom. This is the ratio of sample variances with independent χ² distributions and their respective degrees of freedom. Rewriting F we obtain

F = [n₁s₁² / σ₁² ÷ (n₁ − 1)] / [n₂s₂² / σ₂² ÷ (n₂ − 1)]

Since the null hypothesis in the two-sample variance test is H₀: σ₁² = σ₂², F becomes

F = [Σ(x − x̄₁)² / (n₁ − 1)] / [Σ(x − x̄₂)² / (n₂ − 1)] = ŝ₁² / ŝ₂²

the ratio of the two unbiased sample variances.
The density function of F is given by

f(F) = [Γ((ν₁ + ν₂)/2) / (Γ(ν₁/2) Γ(ν₂/2))] (ν₁/ν₂)^(ν₁/2) · F^((ν₁ − 2)/2) / (1 + ν₁F/ν₂)^((ν₁ + ν₂)/2),   F > 0
Again, as in the case of χ², it can be seen that f(F) varies as its numerator and denominator degrees of freedom vary. Like χ², F is a unimodal distribution that is skewed to the right; therefore, only positive F values are possible. As the degrees of freedom increase, F approaches the normal distribution. Unlike in the application of z or t, we do not have to be concerned about the sample size.

The One-Sample Variance Problem

As already mentioned, in this particular problem F cannot be used. Yet. It takes two to F, and in this case there is only one sample. However, we just saw that the variances in the F ratio follow independent χ² distributions. Therefore, we may use the χ² distribution for the one-sample variance problem.

As the Japanese worker follows a regular stretch exercise regimen during the work day to keep himself alert and happy, do the same and go back to the one-sample mean problem with the large sample size. Meanwhile we sit back and enjoy this activity, because we do not have to come up with a new illustration. For we want to use that sample variance and test it against a hypothesised population variance of the light bulbs. Say these bulbs are to be used in some electric billboard where, every time the bulbs have to be changed, a scaffold has to be erected at substantial cost. Obviously in this case we are not only interested in their mean life span; we would also like consistency of performance. What good is it that some bulbs may burn for decades while others burn out within a short period of time? That would not make for an attractive billboard, would it? Let us say that the manufacturer claims a standard deviation not exceeding 175 hours. We found 190.69 hours. Is there a significant difference? Run the algorithm.

Step I:    H₀: σ² = 30,625 hours²
           Hₐ: σ² > 30,625 hours²

Step II:   P = .05

Step III:  χ² = s²(n − 1) / σ² = 36,363(100 − 1) / 30,625 = 117.55

Note the χ² configuration. The variance ratio has been formed with one sample variance and one population variance. This ratio is χ² distributed with n − 1 degrees of freedom. In this case df = 100 − 1 = 99.

Step IV:   EOV < MEV or 117.55 < 124.3

The closest MEV is used because the table does not show 99 degrees of freedom. There is an insignificant difference.

The bulbs meet the variance specification.
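The arithmetic of Steps III and IV can be sketched in a few lines (an illustration using the example's figures; 124.3 is the tabled MEV quoted above):

```python
# One-sample variance test from the bulb example: chi-square statistic.
s_squared = 36_363         # sample variance of bulb life, hours^2 (about 190.69^2)
sigma0_squared = 175 ** 2  # hypothesised population variance: 30,625 hours^2
n = 100                    # sample size

chi_sq = s_squared * (n - 1) / sigma0_squared
df = n - 1
mev = 124.3                # tabled critical value used in the text

print(round(chi_sq, 2), df, chi_sq < mev)  # 117.55 99 True
```

Since the empirically obtained value falls below the tabled value, the difference is insignificant, matching Step IV.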

Well, at least the manufacturer met the variance specification after having flunked the mean life test.

The Two-Sample Variance Problem: The Parametric Case

Now that there are two sample variances, the F test may be used. As already pointed out, this particular type of decision problem is of special importance for the application of the t and z tests, which assume that the variances of the parent populations, σ²₁ and σ²₂, are the same. Let us then use the data for the two-sample t-test and see if it was properly applied to that group of (almost) unhappy dieters. The standard deviations for gold and silver were 0.3 and 0.2 calories, respectively. The two sample sizes were n_G = 13 and n_S = 17. Here is the algorithm.
Step I:    H₀: σ²_G = σ²_S
           Hₐ: σ²_G ≠ σ²_S

Step II:   P = .05

Step III:  F = s²_greater / s²_smaller = 0.3² / 0.2² = 0.09 / 0.04 = 2.25, with ν₁ = n₁ − 1 = 12 and ν₂ = n₂ − 1 = 16 degrees of freedom
Note that in this application of the F test the larger variance (the larger of the mean squares, a mean square being the sum of squares divided by the degrees of freedom, in other words the variance) is placed in the numerator so that F ≥ 1 in all cases. Note also the numerator and denominator degrees of freedom for F as shown. They must correspond to the sample sizes from which the variances were obtained in each case.

Step IV:   EOV < MEV or 2.25 < 2.42

The mathematically expected F values are shown for 95% and 99% confidence levels in Appendix 7. Locate the numerator degrees of freedom (12) and read down in that column to the denominator degrees of freedom (16). There is an insignificant difference. The variances of the gold and silver populations are the same, and the t test has been properly used. Had a significant difference been obtained, a non-parametric test like the Mann-Whitney U test should have been used instead.
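The same steps can be sketched in code (an illustration; 2.42 is the tabled MEV for 12 and 16 degrees of freedom quoted above):

```python
# Two-sample variance (F) test from the dieter example.
s_gold, n_gold = 0.3, 13      # gold group: standard deviation and size
s_silver, n_silver = 0.2, 17  # silver group: standard deviation and size

# The larger variance goes in the numerator so that F >= 1.
if s_gold ** 2 >= s_silver ** 2:
    F = s_gold ** 2 / s_silver ** 2
    df_num, df_den = n_gold - 1, n_silver - 1
else:
    F = s_silver ** 2 / s_gold ** 2
    df_num, df_den = n_silver - 1, n_gold - 1

mev = 2.42  # tabled F for (12, 16) degrees of freedom, 95% level
print(round(F, 2), df_num, df_den, F < mev)  # 2.25 12 16 True
```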

Concluding Comments
The little black bag that we set out to fill is now equipped for every decision environment that you may encounter involving proportions, means or variances. You have the decision tools. Needless to say, there are other tools that may be used, or tools that have been applied in one case and could be applied in another. An ANOVA, for example, may be performed in the two-mean comparison situation rather than the t-test. If you did that, you could show that t² = F. But ANOVA would be a less efficient decision tool, and the tools that have been discussed come recommended, at least from this source. Perhaps you do not realise how much stuff you carry in this little bag. Check the overview that is provided as Table 11.2 and start using the tools in some of your own personal job-related or academic decision situations. To get you started, there are some case studies that follow.

Student Activity
Two independent samples of sizes 10 and 12 from two normal populations have their mean square deviations about their respective means equal to 12.8 and 15.2 respectively. Test the equality of the variances of the two populations.
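One possible way to attack this activity (a sketch of an approach, not an answer key): convert each mean square deviation about the sample mean into an unbiased variance estimate, n·msd/(n − 1), then form the F ratio with the larger estimate in the numerator and compare it with the tabled value at your chosen confidence level.

```python
# Student activity sketch: F ratio from mean square deviations.
n1, msd1 = 10, 12.8  # first sample: size and mean square deviation
n2, msd2 = 12, 15.2  # second sample: size and mean square deviation

var1 = n1 * msd1 / (n1 - 1)  # unbiased variance estimate, sample 1
var2 = n2 * msd2 / (n2 - 1)  # unbiased variance estimate, sample 2

F = max(var1, var2) / min(var1, var2)
print(round(F, 2))  # compare with the tabled F for the matching df
```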



Table 11.2: Overview of the statistical decision tools for proportion, mean and variance problems

Punjab Technical University


Summary
The populations associated with studies, investigations and explorations are often very large, larger than one can practically handle by treating each member of the population individually. This makes studying the population extremely difficult. One of the most useful applications of the theory of statistics is to enable one to make an intelligent and informed guess about different characteristics of a population under study using a small sample selected from the population in some way. The sample mean, median, mode, geometric mean, harmonic mean, etc., can be used as estimators of the population mean. The method of estimation where a single statistic like the mean, median, standard deviation, etc. is used as an estimator of a population parameter is known as Point Estimation. A hypothesis is a statement (that may be true or false) aimed at explaining some event or phenomenon, the applicability of which depends on testing the same on samples taken from the population. A statistical hypothesis test, or more briefly, hypothesis test, is an algorithm to state the alternative (for or against the hypothesis) which minimizes certain risks. Propositions come in the form of an assertion of a correlation between, or among, two or more things, but without necessarily asserting a cause-and-effect relationship. Empirical sample sizes are based on given specifications concerning the confidence level (z), the allowable sampling error (E) and either the variance (s²) or the proportion (P), depending on the type of problem.


There are two general statistical approaches to inferential decision making: Classical and Bayesian. There are two types of hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis (H₀) treats the claim as valid; it does not take exception to it.

Keywords
Cramer-Rao Inequality: This inequality gives the minimum possible value of the variance of an unbiased estimator.

Sufficiency: An estimator t is said to be a sufficient estimator of a parameter θ if it utilises all the information given in the sample about θ.

Review Questions
1. What do you mean by the term "Estimation"?
2. What are the properties of good estimators?
3. Write a note on the difference between point estimation and interval estimation.


Further Readings
R S Bhardwaj, Business Statistics, Excel Books
Abraham, B., and J. Ledolter, Statistical Methods for Forecasting, John Wiley & Sons
Freund, John E., and Ronald E. Walpole, Mathematical Statistics, 5th edition, Englewood Cliffs, NJ: Prentice Hall

