
MBA SEMESTER 1 MB0040 STATISTICS FOR MANAGEMENT Assignment Set- 1 (60 Marks)

Ques1: (a) Statistics is the backbone of decision-making. Comment. (b) Statistics is as good as the user. Comment.

Ans a) Statistics is the backbone of decision-making: With the proper application of statistics and statistical software packages to the collected data, managers can take effective decisions, which can increase the profits of a business. The word "decision" suggests a deliberate choice made out of several possible alternative courses of action after carefully considering them. The act of choice signifying the solution to an economic problem is economic decision-making. Decision-making is essentially a process of selecting the best out of many alternative opportunities or courses of action open to a management. The choices made by business executives are difficult and crucial, and they have far-reaching consequences.

The basic aim of taking a decision is to select the course of action which maximises the economic benefits and minimises the use of the scarce resources of a firm. Hence, each decision involves a cost-benefit analysis. Any slight error or delay in decision-making may cause considerable economic and financial damage to a firm. It is for this reason that management experts are of the opinion that right decision-making at the right time is the secret of a successful manager. Due to advanced communication networks, rapid changes in consumer behaviour, the varied expectations of a variety of consumers and new market openings, modern managers have the difficult task of making quick and appropriate decisions. Therefore, there is a need for them to depend more upon quantitative techniques like mathematical models, statistics, operations research and econometrics.

Decision-making is a key part of our day-to-day life. Even when we wish to purchase a television, we like to know the price, quality, durability and maintainability of various brands and models before buying one. In this scenario we are collecting data and making an optimum decision; in other words, we are using Statistics. Again, suppose a company wishes to introduce a new product: it has to collect data on market potential, consumer liking, availability of raw materials and the feasibility of producing the product. Hence, data collection is the backbone of any decision-making process. Many organisations find themselves data-rich but poor in drawing information from it. Therefore, it is important to develop the ability to extract meaningful information from raw data to make better decisions, and Statistics plays an important role in this respect.

Statistics is broadly divided into two main categories: descriptive statistics and inferential statistics.

Descriptive Statistics: Descriptive statistics is used to present a general description of data which is summarised quantitatively. This is mostly useful in clinical research and when communicating the results of experiments.

Inferential Statistics: Inferential statistics is used to make valid inferences from data, which helps managers and professionals take effective decisions. Statistical methods such as estimation, prediction and hypothesis testing belong to inferential statistics. Researchers draw conclusions from collected data samples regarding the characteristics of the large population from which the samples are taken. So we can say that Statistics is the backbone of decision-making.

Ans (b) Statistics is as good as the user: Statistics is used for various purposes. It is used to simplify mass data and to make comparisons easier. It is also used to bring out trends and tendencies in the data, as well as the hidden relations between variables. All this makes decision-making much easier. Let us look at each function of Statistics in detail:

1. Statistics simplifies mass data. The use of statistical concepts helps in the simplification of complex data, so managers can make decisions more easily. Statistical methods reduce the complexity of the data and consequently aid the understanding of any huge mass of data.

2. Statistics makes comparison easier. Without statistical methods and concepts, the collection and comparison of data cannot be done easily. Statistics helps us to compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, and coefficients of correlation all provide ample scope for comparison.

3. Statistics brings out trends and tendencies in the data. After data is collected, it is easy to analyse trends and tendencies in the data using the various concepts of Statistics.

4. Statistics brings out the hidden relations between variables. Statistical analysis helps in drawing inferences from data and brings out the hidden relations between variables.

5. Decision-making becomes easier. With the proper application of Statistics and statistical software packages on the collected data, managers can take effective decisions, which can increase the profits in a business.

The quality of all these outcomes, however, depends entirely on how correctly the methods are applied and interpreted, so Statistics is only as good as the user.

Ques2: Distinguish between the following with example. (a) Inclusive and Exclusive limits: An exclusive series is one in which the upper limit of each class is not included in that class. For example: 00-10, 10-20, 20-30, 30-40,

40-50. In the first class (00-10) we count values from 0 up to, but not including, 10 (i.e. up to 9.99), and 10 itself falls in the 10-20 class: the upper limit is excluded, so this is an exclusive series. An inclusive series is one in which both limits are included. For example: 00-09, 10-19, 20-29, 30-39, 40-49. Here both 00 and 09 come under the first class (00-09), and 10 comes under the next one.

(b) Continuous and discrete data: The numerical data that we use falls into one of two categories: discrete and continuous. A type of data is discrete if there are only a finite number of values possible, or if there is a space on the number line between each two possible values.

Ex. A 5-question quiz is given in a Math class. The number of correct answers on a student's quiz is an example of discrete data, since it must be one of the following: 0, 1, 2, 3, 4, or 5. There is not an infinite number of values, therefore this data is discrete. Also, if we were to draw a number line and place each possible value on it, we would see a space between each pair of values.

Ex. In order to obtain a taxi license in Las Vegas, a person must pass a written exam regarding different locations in the city. How many attempts it takes a person to pass this test is also an example of discrete data. A person could take it once, or twice, or 3 times, or 4 times, and so on, so the possible values are 1, 2, 3, .... There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values. Discrete data usually occurs when there are only a certain number of values, or when we are counting something (using whole numbers).

Continuous data makes up the rest of numerical data. This is a type of data that is usually associated with some sort of physical measurement.

Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You bet. The possibilities depend only upon the accuracy of our measuring device. One general way to tell if data is continuous is to ask yourself whether the data can take on values that are fractions or decimals. If your answer is yes, it is usually continuous data.

Ex. The length of time it takes for a light bulb to burn out is an example of continuous data. Could it take 800 hours? How about 800.7? 800.7354? The answer to all three is yes.

(c) Qualitative and Quantitative data: Qualitative data is a categorical measurement expressed not in terms of numbers, but rather by means of a natural-language description. In statistics, it is often used interchangeably with "categorical" data. For example: favorite color = "blue"; height = "tall". Although we may have categories, the categories may have a structure to them. When there is no natural ordering of the categories, we call these nominal categories; examples might be gender, race, religion, or sport. When the categories may be ordered, these are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal, as are attitudes (strongly disagree, disagree, neutral, agree, strongly agree), although for attitudes we may not always know which value is the best or the worst. Note that the distance between these categories is not something we can measure.

Quantitative data is a numerical measurement expressed not by means of a natural-language description, but rather in terms of numbers. However, not all numbers are continuous and measurable: a social security number is a number, but not something that one can add or subtract. For example: favorite color = "450 nm"; height = "1.8 m". Quantitative data are always associated with a scale of measurement.

(d) Class limits and class intervals: Class limits: If we divide a set of data into classes, there are clearly going to be values which form dividing lines between the classes. These values are called class limits. Class limits must be chosen with considerable care, paying attention both to the form of the data and to the use to which it is to be put. Consider our grouped distribution of heights. Why could we not simply state the first two classes as 160-165 cm and 165-170 cm, rather than "160 to under 165 cm", etc.? The reason is that it would not be clear into which class a measurement of exactly 165 cm should be put.

We could not put it into both, as this would produce double counting, which must be avoided at all costs. Is one possible solution to state the classes as 160-164 cm, 165-169 cm? That would appear to solve the problem as far as our data is concerned, but what would we do with a value of 164.5 cm? This immediately raises a query regarding the recording of the raw data.

Class intervals: The width of a class is the difference between its two class limits, and is known as the class interval. It is essential that the class interval can be used in calculations, and for this reason we make a slight approximation in the case of continuous variables. The true class limits of the first class in our distribution of heights (if the data has been rounded) are 159.5 cm and 164.4999... cm, so the class interval is 4.999... cm. For calculation purposes, however, we approximate slightly: since the lower limit of the first class is 159.5 cm and that of the next class is 164.5 cm, the class interval of the first class is taken as the difference between the two, i.e. 5 cm.

Ques3: In a management class of 70 students three languages are offered as an additional subject viz. Hindi, English and Kannada. There are 28 students taking Hindi, 26 taking Kannada and 16 taking English. There are 12 students taking both Hindi and Kannada, 4 taking Hindi and English and 6 taking English and Kannada. In addition, we know that 2 students are taking all the three languages. i) If a student is chosen randomly, what is the probability that he/she is taking exactly one language?

Ans: Let H, E and K denote the events that a randomly chosen student takes Hindi, English and Kannada respectively. Then:

P(H) = 28/70, P(E) = 16/70, P(K) = 26/70

P(H∩K) = 12/70, P(E∩K) = 6/70, P(H∩E) = 4/70, P(H∩K∩E) = 2/70

The number of students taking Hindi only is 28 − 12 − 4 + 2 = 14; Kannada only, 26 − 12 − 6 + 2 = 10; English only, 16 − 4 − 6 + 2 = 8. So 14 + 10 + 8 = 32 students take exactly one language, and

P(exactly one language) = 32/70 = 16/35 ≈ 0.457

(For comparison, the probability of taking at least one language is P(H∪K∪E) = P(H) + P(E) + P(K) − P(H∩K) − P(H∩E) − P(E∩K) + P(H∩K∩E) = (28 + 16 + 26 − 12 − 4 − 6 + 2)/70 = 50/70 = 5/7.)
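The counting above can be sanity-checked in a few lines of Python; this is only an illustrative check using the counts given in the question, not part of any prescribed solution method:

```python
# Inclusion-exclusion check for the language-enrolment problem.
n_total = 70
n_H, n_E, n_K = 28, 16, 26           # single-language enrolment counts
n_HK, n_HE, n_EK = 12, 4, 6          # pairwise overlaps
n_HEK = 2                            # taking all three

# Students taking each language only:
only_H = n_H - n_HK - n_HE + n_HEK   # 14
only_K = n_K - n_HK - n_EK + n_HEK   # 10
only_E = n_E - n_HE - n_EK + n_HEK   # 8

exactly_one = only_H + only_K + only_E
print(exactly_one, exactly_one / n_total)     # 32 0.457...

# At least one language, for comparison:
at_least_one = n_H + n_E + n_K - n_HK - n_HE - n_EK + n_HEK
print(at_least_one, at_least_one / n_total)   # 50 0.714...
```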

Ques 4: List down various measures of central tendency and explain the difference between them?

Ans: Condensation of data is necessary for a proper statistical analysis: a large mass of figures is not only confusing to the mind but also difficult to analyse.

After a thorough scrutiny of the collected data, classification, which is a process of arranging data into different homogeneous classes according to resemblances and similarities, is carried out first; then tabulation of the data is resorted to. The classification and tabulation of the collected data, besides removing its complexity, render condensation and comparison.

An average is defined as a value which represents the whole mass of data. It is a typical or central value summarising the whole data, and is also called a measure of central tendency because the individual values in the data show some tendency to centre about this average. It will be located between the minimum and the maximum of the values in the data. There are five types of average: the arithmetic mean, the median, the mode, the geometric mean and the harmonic mean.

Arithmetic Mean: The arithmetic mean, or simply the mean, is the best known, most easily understood and most frequently used average in any statistical analysis. It is defined as the sum of all the values in the data divided by the number of values.

Median: The median is another widely known and frequently used average. It is defined as the most central or middle-most value of the data given in the form of an array. By an array, we mean an arrangement of the data in either ascending or descending order of magnitude. In the case of ungrouped data one has to form the array first and then locate the middle-most value, which is the median. For ungrouped data the median is found as: Median = value of the (n + 1)/2-th item in the array.

Mode: The word mode seems to have been derived from the French "a la mode", which means "that which is in fashion". It is defined as the value in the data which occurs most frequently. For ungrouped data we form the array and then fix the mode as the value which occurs most frequently; if all the values are distinct from each other, the mode cannot be fixed. For a frequency distribution with just one highest frequency (such data are called unimodal; data with two highest frequencies are called bimodal), the mode is found using the formula: Mode = l + c·f2/(f1 + f2), where l is the lower limit of the modal class, c is its class interval, f1 is the frequency preceding the highest frequency and f2 is the frequency succeeding the highest frequency.

Relative merits and demerits of mean, median and mode:

Mean: The mean is the most commonly and frequently used average. It is a simple average, understandable even to a layman. It is based on all the values in the data, is easy to calculate, and is basic to the calculation of further statistical measures of dispersion, correlation, etc. Of all the averages, it is the most stable one. However, it has some demerits: it gives undue weight to extreme values, i.e. it is greatly influenced by them; it cannot be calculated for data with open-ended classes at the extremes; and it cannot be fixed graphically, unlike the median or the mode. It is the most useful average when the analysis is made with full reference to the nature of the individual values of the data. In spite of a few shortcomings, it is the most satisfactory average.

Median: The median is another well-known and widely used average. It has a well-defined formula and is easily understood. It is advantageously used as a representative value of such factors or qualities which cannot be measured directly. Unlike the mean, the median can be located graphically.
It is also possible to find the median for data with open-ended classes at the extremes. However, it is not amenable to further algebraic treatment, it is not based on all the values of the given data, it is not as stable as the mean, and it has only a limited use in practice.

Mode: The mode is a useful measure of central tendency, as a representative of the majority of values in the data. It is a practical average, easily understood even by laymen, and its calculation is not difficult. It can be ascertained even for data with open-ended classes at the extremes, and it can be located by graphical means using a frequency curve.

The mode, however, is not based on all the values in the data. It becomes less useful when the distribution is not unimodal, and of all the averages it is the most unstable.

Ques5: Define population and sampling unit for selecting a random sample in each of the following cases. a) Hundred voters from a constituency b) Twenty stocks of National Stock Exchange c) Fifty account holders of State Bank of India d) Twenty employees of Tata motors.

Ans: Population: A population is a collection of data whose properties are analyzed. The population is the complete collection to be studied; it contains all subjects of interest. A population can be defined as including all people or items with the characteristic one wishes to understand. The sampling unit is the individual element of the population that can be selected at each draw. For the four cases above: a) the population is all eligible voters of the constituency, and the sampling unit is a voter; b) the population is all stocks listed on the National Stock Exchange, and the sampling unit is a stock; c) the population is all account holders of State Bank of India, and the sampling unit is an account holder; d) the population is all employees of Tata Motors, and the sampling unit is an employee.

Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population. A sample is a part of the population of interest, a sub-collection selected from a population.

Random sampling: A random sample is one chosen by a method involving an unpredictable component. Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. A sample is usually not perfectly representative of the population from which it was drawn; this random variation in the results is known as sampling error. In the case of random samples, mathematical theory is available to assess the sampling error, so estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate. This can take the form of a standard error, or, if the sample is large enough for the central limit theorem to take effect, confidence intervals may be calculated.

Types of random sample

A simple random sample is selected so that all samples of the same size have an equal chance of being selected from the population.

A self-weighting sample, also known as an EPSEM (Equal Probability of Selection Method) sample, is one in which every individual, or object, in the population of interest has an equal opportunity of being selected for the sample. Simple random samples are self-weighting.

Stratified sampling involves selecting independent samples from a number of subpopulations, groups or strata within the population. Great gains in efficiency are sometimes possible from judicious stratification.

Cluster sampling involves selecting the sample units in groups. For example, a sample of telephone calls may be collected by first taking a collection of telephone lines and collecting all the calls on the sampled lines. The analysis of cluster samples must take

into account the intra-cluster correlation which reflects the fact that units in the same cluster are likely to be more similar than two units picked at random.
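To make the first and third of these schemes concrete, here is a minimal Python sketch of simple random and stratified sampling; the 1,000-voter frame and the two strata are invented purely for illustration:

```python
import random

# Hypothetical sampling frame: 1,000 voter IDs (illustrative only).
voters = [f"voter_{i:04d}" for i in range(1000)]

# Simple random sample: every subset of 100 voters is equally likely.
srs = random.sample(voters, 100)

# Stratified sample: split the frame into strata (two invented wards
# here) and draw independently from each, proportionally to its size.
ward_a, ward_b = voters[:600], voters[600:]
stratified = random.sample(ward_a, 60) + random.sample(ward_b, 40)

print(len(srs), len(stratified))  # 100 100
```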

The most widely known type of random sample is the simple random sample (SRS). This is characterized by the fact that the probability of selection is the same for every case in the population. Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn.

An example may make this easier to understand. Imagine you want to carry out a survey of 100 voters from a constituency with a population of 1,000 eligible voters. Within a constituency, there are "old-fashioned" ways to draw such a sample: write the name of each voter on a slip of paper, put all the slips into a box, shake the box, draw a slip and set it aside, shake again, draw another, and so on until you have 100 slips. These 100 voters form the sample, and the sample has been drawn through a simple random sampling procedure: at each draw, every name in the box had the same probability of being chosen.

If you are collecting data on a large group of people (called a "population"), you might want to minimize the impact that the survey will have on the group you are surveying. It is often not necessary to survey the entire population; instead, you can select a random sample of people from the population and survey just them, and then draw conclusions about how the entire population would respond based on the responses from this randomly selected group. This is exactly what political pollsters do: they ask a group of people a list of questions and, based on their results, draw conclusions about the population as a whole, with those often-heard disclaimers of "plus or minus 5%".

If your population consists of just a few hundred people, you might find that you need to survey almost all of them in order to achieve the level of accuracy that you desire. As the population size increases, the percentage of people needed to achieve a high level of accuracy decreases rapidly. In other words, to achieve the same level of accuracy: larger population = smaller percentage of people surveyed; smaller population = larger percentage of people surveyed.

Ques6: What is a confidence interval, and why is it useful? What is a confidence level?

Ans: Confidence Interval: A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. The confidence interval is the plus-or-minus figure usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of ±4 and 47% of your sample picks an answer, you can be "sure" that if you had asked the question of the entire relevant population, between 43% (47 − 4) and 51% (47 + 4) would have picked that answer.

In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate.

It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. A confidence interval with a particular confidence level is intended to give the assurance that, if the statistical model is correct, then, taken over all the data that might have been obtained, the procedure for constructing the interval would deliver an interval that includes the true value of the parameter the proportion of the time set by the confidence level. More specifically, if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will approximately match the confidence level; this is guaranteed by the reasoning underlying the construction of confidence intervals.

Example: Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5 and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the population mean at a 95% confidence level? In other words, the student wishes to estimate the true mean boiling temperature of the liquid using the results of his measurements. If the measurements follow a normal distribution, then the sample mean has the distribution N(μ, σ²/n). Since the sample size is 6, the standard deviation of the sample mean is 1.2/√6 = 0.49, and the 95% confidence interval is therefore 101.82 ± 1.96 × 0.49, i.e. approximately (100.86, 102.78) degrees.
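A short Python sketch reproducing this calculation (1.96 is the two-sided 95% critical value of the standard normal distribution):

```python
import math

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
sigma = 1.2                        # known measurement std. deviation
n = len(readings)

mean = sum(readings) / n           # sample mean, ~101.82
se = sigma / math.sqrt(n)          # std. error = 1.2/sqrt(6) ~ 0.49
z = 1.96                           # 95% two-sided normal critical value

lower, upper = mean - z * se, mean + z * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")   # (100.86, 102.78)
```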

Confidence level: The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.

A confidence level is a percentage of confidence in a finding. For example, if an insurance company's total loss reserves should be $10,000,000 in order to attain an 80% confidence level that enough money will be available to pay anticipated claims, then in 8 times out of 10, after all claims have been settled, the total claims paid out will be less than $10,000,000; conversely, in 2 times out of 10 the total claims paid out will be greater than $10,000,000. In another example, a 70% confidence level of one's house burning would mean that the house would be expected to burn approximately once every 3.33 years [1 ÷ (1 − 0.70) = 3.33].

When you put the confidence level and the confidence interval together, you can say that you are 95% sure that the true percentage of the population is between 43% and 51%. The confidence level is a statistical measure of the number of times out of 100 that test results can be expected to be

within a specified range. For example, a confidence level of 95% means that the result of an action will probably meet expectations 95% of the time. Most analyses of variance or correlation are described in terms of some level of confidence. The wider the confidence interval you are willing to accept, the more certain you can be that the whole population's answer lies within that range. For example, if you asked a sample of 1,000 people in a city which brand of cola they preferred and 60% said Brand A, you can be very certain that between 40% and 80% of all the people in the city actually do prefer that brand, but you cannot be so sure that between 59% and 61% of the people in the city prefer the brand.

-*-*-*-*-

MBA SEMESTER 1 MB0040 STATISTICS FOR MANAGEMENT - 4 Credits (Book ID: B1129) Assignment Set- 2 (60 Marks) Note: Each question carries 10 Marks. Answer all the questions

Ques1: What are the characteristics of a good measure of central tendency?

Ans: Central tendency is a statistical measure that determines a single score defining the centre of a distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group. Three measures of central tendency are the mean, the median, and the mode.

The mean for a distribution is the sum of the scores divided by the number of scores:

Sample mean: M = Σx / n
Population mean: μ = ΣX / N

Population of Journalism Faculty Salaries (in thousands of dollars): 49, 51, 45, 59, 51, 60, 32

Sample of Psychology Faculty Salaries (in thousands of dollars): 35, 33, 42, 55, 65, 46, 46, 50, 42

Sample Mean = 414 / 9 = 46. Interpretation: the average salary in the psychology faculty sample is $46,000.

Population Mean = 347 / 7 ≈ 49.57. Interpretation: the average salary across the whole population of journalism faculty is about $49,570.
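A minimal Python check of both means, using the salary data above:

```python
journalism = [49, 51, 45, 59, 51, 60, 32]           # full population
psychology = [35, 33, 42, 55, 65, 46, 46, 50, 42]   # a sample

pop_mean = sum(journalism) / len(journalism)        # 347/7 ~ 49.57
sample_mean = sum(psychology) / len(psychology)     # 414/9 = 46.0
print(round(pop_mean, 2), sample_mean)
```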

Some characteristics of the mean include:
1. Every score influences the mean: changing a score changes the mean, and adding or removing a score changes the mean (unless that score equals the mean).
2. If a constant value is added to every score, the same constant is added to the mean; if a constant value is subtracted from every score, the same constant is subtracted from the mean.
3. If every score is multiplied or divided by a constant, the mean changes in the same way.
4. It is inappropriate to use the mean to summarize nominal and ordinal data; it is appropriate to use the mean to summarize interval and ratio data.
5. If the distribution is skewed or has some outliers, the mean will be distorted.

Median: If the scores in a distribution are listed in order, the median is the midpoint of the list: half of the scores are below the median and half of the scores are above the median. To find it:
1. Place the data in descending order (ascending would have worked too).
2. Find the score that cuts the sample into two halves.

Ages: 19, 18, 21, 35, 40, 56, 0. Sorted: 0, 18, 19, 21, 35, 40, 56. Median = 21. Interpretation: half of the ages fall below 21 and half above.

Soda pops consumed today: 6, 5, 7, 4, 2, 2. Sorted: 2, 2, 4, 5, 6, 7. With an even number of scores, the median is the average of the two middle values: Median = (4 + 5) / 2 = 4.5. Interpretation: half of the consumption counts fall below 4.5 and half above.
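These medians can be confirmed with Python's standard statistics module:

```python
import statistics

ages = [19, 18, 21, 35, 40, 56, 0]
sodas = [6, 5, 7, 4, 2, 2]

print(statistics.median(ages))    # 21
print(statistics.median(sodas))   # 4.5 (mean of the two middle values)
```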

Characteristics of the median include:
1. It is inappropriate to use the median to summarize nominal data; it is appropriate to use the median to summarize ordinal, interval, and ratio data.
2. The median depends on the frequency of the scores, not on their actual values.
3. The median is not distorted by outliers or extreme scores.
4. The median is the preferred measure of central tendency when the distribution is skewed or distorted by outliers.

Mode: In a frequency distribution, the mode is the score or category that has the greatest frequency.

Restaurant meals during the past week: 6, 4, 6, 7, 6, 5, 13, 8, 8. Mode = 6, since 6 occurs three times, more than any other value. Interpretation: the most typical number of restaurant meals eaten in the past week was 6.

Favorite restaurant: Chilis, Charleys, La Siesta, La Siesta, O Charleys, La Siesta, La Siesta, Outback Steakhouse, Taco Bell. Mode = La Siesta, named four times. Interpretation: La Siesta is the most frequently named favorite restaurant; note that a mode exists even for purely nominal data like this.
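Both modes can be verified with statistics.multimode, which also reports ties and works on nominal data:

```python
import statistics

meals = [6, 4, 6, 7, 6, 5, 13, 8, 8]
restaurants = ["Chilis", "Charleys", "La Siesta", "La Siesta",
               "O Charleys", "La Siesta", "La Siesta",
               "Outback Steakhouse", "Taco Bell"]

print(statistics.multimode(meals))        # [6]
print(statistics.multimode(restaurants))  # ['La Siesta']
```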

Characteristics of the mode include:
1. The mode may be used to summarize nominal, ordinal, interval, and ratio data.
2. There may be more than one mode.
3. The mode may not exist.

Relationships among the mean, median, and mode: The mean and median are equal if the distribution is symmetric. The mean, median, and mode are all equal if the distribution is unimodal and symmetric. Otherwise, they do not give you the same answer.

(b) What are the uses of averages?

Ans: Definition of average: In mathematics, an average, or central tendency, of a data set is a measure of the "middle" value of the data set. There are many different descriptive statistics that can be chosen as a measurement of the central tendency of the data items, including the arithmetic mean, the median and the mode. Other statistical measures, such as the standard deviation and the range, are called measures of spread and describe how spread out the data is.

An average is a single value that is meant to typify a list of values. If all the numbers in the list are the same, then this number should be used. If the numbers are not the same, the average is calculated by combining the values from the set in a specific way and computing a single number as the average of the set. The most common method is the arithmetic mean, but there are many other types of central tendency, such as the median (which is used most often when the distribution of the values is skewed by a small number of very high values, as seen with house prices or incomes).

The mean of a list of numbers is the sum of all the terms divided by the number of terms; this mean value is what is called the average in everyday usage. The formula is: Mean = (sum of the given numbers) / (count of the given numbers).

Example 1: Find the mean of the numbers 60, 47, 55, 39, 18, 22.
Solution: The sum of the numbers is 60 + 47 + 55 + 39 + 18 + 22 = 241, and there are 6 numbers. Therefore Mean = 241 / 6 ≈ 40.17.
Answer: Mean = 40.17.

Ques2: Your company has launched a new product. Your company is a reputed company with 50% market share of a similar range of products. Your competitors also enter with their new products equivalent to your new product. Based on your earlier experience, you initially estimated that your market share of the new product would be 50%. You carry out random sampling of 25 customers who have purchased the new product and realize that only eight of them have actually purchased your product. Plan a hypothesis test to check whether you are likely to have a half of market share.

Ans: The null hypothesis is H0: P = 0.5 (so Q = 1 − P = 0.5).
The sample proportion is ps = 8/25 = 0.32.
Under H0, the standard error of the sample proportion is σp = (PQ/n)^(1/2) = (0.5 × 0.5 / 25)^(1/2) = 0.1.
The test statistic is Z = |P − ps| / σp = |0.5 − 0.32| / 0.1 = 1.8.
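A small Python check of this one-sample proportion test, using the 1% critical value quoted in the conclusion below:

```python
import math

p0 = 0.5        # hypothesised market share (H0: P = 0.5)
n = 25          # customers sampled
x = 8           # customers who bought our product
p_hat = x / n   # sample proportion = 0.32

se = math.sqrt(p0 * (1 - p0) / n)   # 0.10 under H0
z = abs(p0 - p_hat) / se            # 1.8
z_crit = 2.33                       # critical value at the 1% level

print(round(z, 2), "reject H0" if z > z_crit else "fail to reject H0")
```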

Zcal = 1.8. At the 1% significance level, Ztab = 2.33. Since Zcal < Ztab, H0 is accepted; hence, at the 1% significance level, we are likely to have half of the market share.

Ques3: The upper and the lower quartile income of a group of workers are Rs 8 and Rs 3 per day respectively. Calculate the quartile deviation and its coefficient.

Ans: Unlike the range, the quartile deviation does not involve the extreme values. It is defined as:

Q.D. = |Q3 − Q1| / 2

Here Q3 = Rs 8 and Q1 = Rs 3, so Q.D. = |8 − 3| / 2 = 5/2 = 2.5.

To find the coefficient of quartile deviation, the formula is:

Coefficient of Q.D. = (Q3 − Q1) / (Q3 + Q1) = (8 − 3) / (8 + 3) = 5/11 ≈ 0.45

Therefore, the coefficient of quartile deviation is 0.45.

Ques4: The cost of living index number on a certain date was 200. From the base period, the percentage increases in prices were: Rent 60, Clothing 250, Fuel and Light 150 and Miscellaneous 120. The weights for the different groups were: Food 60, Rent 16, Clothing 12, Fuel and Light 8 and Miscellaneous 4. Calculate the percentage change in food prices.

Ans:

Group         | % increase in price | Current index (P) | Weight (w) | P × w
Food          | X                   | 100 + X           | 60         | 60(100 + X)
Rent          | 60                  | 160               | 16         | 2560
Clothing      | 250                 | 350               | 12         | 4200
Fuel & light  | 150                 | 250               | 8          | 2000
Misc          | 120                 | 220               | 4          | 880
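A quick Python check of the weighted-index computation carried out algebraically below; the figures come straight from the table above:

```python
# Known groups: (current index P, weight w); food's increase X is unknown.
known = {"Rent": (160, 16), "Clothing": (350, 12),
         "Fuel & light": (250, 8), "Misc": (220, 4)}
w_food, target = 60, 200

known_pw = sum(p * w for p, w in known.values())      # 9640
total_w = w_food + sum(w for _, w in known.values())  # 100

# Solve (60*(100 + X) + 9640) / 100 = 200 for X:
X = (target * total_w - known_pw - 100 * w_food) / w_food
print(round(X, 2))   # 72.67
```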

The cost of living index of the current year is 200, so ΣPw / Σw = 200, i.e. (60(100 + X) + 9640) / 100 = 200. This gives 60X + 15640 = 20000, so X = 4360 / 60 = 72.67. The percentage increase in food prices is therefore about 72.67%.

Ques5: Education seems to be a difficult field in which to use quality techniques. One possible outcome measure for colleges is the graduation rate (the percentage of the students matriculating who graduate on time). Would you recommend using P or R charts to examine graduation rates at a school? Would this be a good measure of quality?

Ans: The p-chart is a type of control chart used to monitor the proportion of nonconforming units in a sample, where the sample proportion nonconforming is defined as the ratio of the number of nonconforming units to the sample size, n. The p-chart only accommodates "pass"/"fail"-type inspection, as determined by one or more go/no-go gauges or tests, effectively applying the specifications to the data before they are plotted on the chart. Due to this sensitivity to the underlying assumptions, p-charts are often implemented incorrectly, with control limits that are either too wide or too narrow, leading to incorrect decisions regarding process stability. A p-chart is a form of the individuals chart.

In statistical quality control, the R chart is a type of control chart used to monitor variables data when samples are collected at regular intervals from a business or industrial process. The chart is advantageous in the following situations:
1. The sample size is relatively small.
2. The sample size is constant.
3. Humans must perform the calculations for the chart.

The "chart" actually consists of a pair of charts: one to monitor the process standard deviation (as approximated by the sample moving range) and another to monitor the process mean. Since a graduation rate is the proportion of a cohort that "passes" (graduates on time), it is attribute data, and a p-chart is the natural recommendation, with each year's cohort as a sample. Whether this is a good measure of quality is debatable: graduation rates depend on many factors besides the quality of teaching, such as admissions standards and student circumstances, so they should be interpreted with care.
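A minimal sketch of p-chart control limits applied to invented graduation data (the cohort counts below are hypothetical, purely to illustrate the computation):

```python
import math

# Hypothetical yearly cohorts: (on-time graduates, cohort size).
cohorts = [(410, 500), (395, 500), (430, 500), (402, 500), (418, 500)]

p_bar = sum(g for g, _ in cohorts) / sum(n for _, n in cohorts)
n = cohorts[0][1]   # constant cohort size in this example

sigma = math.sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + 3 * sigma             # upper control limit
lcl = max(0.0, p_bar - 3 * sigma)   # lower control limit, floored at 0
print(f"centre={p_bar:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}")

for g, size in cohorts:             # flag out-of-control years
    p = g / size
    print(p, "out of control" if not lcl <= p <= ucl else "ok")
```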

Ques6: (a) Why do we use a chi-square test?

Ans: Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. A chi-square test (also chi-squared test or χ² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.

For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.

The formula for calculating chi-square is χ² = Σ (O − E)² / E, where O is an observed frequency, E is the corresponding expected frequency, and the sum runs over all categories or cells.

The chi-square is one of the most popular statistics because it is easy to calculate and interpret. There are two kinds of chi-square tests: a one-way analysis and a two-way analysis. The purpose of both is to determine whether the observed frequencies (counts) markedly differ from the frequencies that we would expect by chance. The chi-square test statistic can be used to evaluate whether there is an association between the rows and columns in a contingency table; more specifically, it can be used to determine whether there is any difference between study groups in the proportions of the risk factor of interest. For instance, the chi-square statistic could be used to test whether the proportion of individuals who smoke differs by asthmatic status.

The chi-square test statistic is designed to test the null hypothesis that there is no association between the rows and columns of a contingency table.

Example: A year group in school chooses between drama and history as below. Is there any difference between boys' and girls' choices?

Observed:
                Chose drama   Chose history   Total
Boys                 43             55          98
Girls                52             54         106
Total                95            109         204

Expected = (row total × column total) / overall total:
                Chose drama   Chose history   Total
Boys               45.6           52.4          98
Girls              49.4           56.6         106
Total              95            109           204

(Observed − Expected)² / Expected:
                Chose drama   Chose history
Boys               0.15           0.13
Girls              0.14           0.12
Sum of all four cells = 0.55

Chi-square is 0.55, with (2 − 1) × (2 − 1) = 1 degree of freedom. Checking the chi-square table for 1 degree of freedom, 0.55 lies well below the 5% critical value of 3.84, so the null hypothesis is not rejected: the data give no evidence of a difference between boys' and girls' choices (though this does not prove the choices are independent).
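The same computation in a short pure-Python sketch:

```python
# 2x2 contingency table from the example above.
observed = [[43, 55],    # boys: drama, history
            [52, 54]]    # girls: drama, history

row_tot = [sum(row) for row in observed]        # [98, 106]
col_tot = [sum(col) for col in zip(*observed)]  # [95, 109]
grand = sum(row_tot)                            # 204

chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_tot[i] * col_tot[j] / grand
        chi2 += (observed[i][j] - expected) ** 2 / expected

print(round(chi2, 2))   # 0.55, with (2-1)*(2-1) = 1 degree of freedom
```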

(b) Why do we use analysis of variance?

Ans: Analysis of variance (ANOVA) is a statistical technique that can be used to evaluate whether there are differences between the average value, or mean, across several population groups. With this model, the response variable is continuous in nature, whereas the predictor variables are categorical. For example, in a clinical trial of hypertensive patients, ANOVA methods could be used to compare the effectiveness of three different drugs in lowering blood pressure. Alternatively, ANOVA could be used to determine whether infant birth weight is significantly different among mothers who smoked during pregnancy relative to those who did not. In the simplest case, where two population means are being compared, ANOVA is equivalent to the independent two-sample t-test.

The analysis of variance is the process of resolving the total variation into separate components that measure different sources of variance. If we have to test the equality of means across more than two populations, analysis of variance is used. To test the equality of two population means we use the t-test; with more than two populations, the t-test would have to be applied pairwise to all the populations, which is practically cumbersome and time consuming, so we use analysis of variance instead. In analysis of variance, all the populations of interest must have a normal distribution, we assume that all the normal populations have equal variances, and the populations from which the samples are taken are considered independent. There are three common designs in the analysis of variance: a completely randomized design is used when one

variable is involved; when two variables are involved, a randomized complete block design is used; and the Latin square design is a very effective method for three variables.

ANOVA is an analysis of the variation between all of the variables used in an experiment. It is used in finance in several different ways, such as forecasting the movements of security prices by first determining which factors influence stock fluctuations; this analysis can provide valuable insight into the behaviour of a security or market index under various conditions.

The easiest way to understand ANOVA is through a concept known as value splitting. ANOVA splits the observed data values into components that are attributable to the different levels of the factors. Value splitting is best explained by example. The simplest example of value splitting is when we just have one level of one factor. Suppose we have a turning operation in a machine shop where we are turning pins to a diameter of .125 +/− .005 inches. Throughout the course of a day we take five samples of pins and obtain the following measurements: .125, .127, .124, .126, .128. We can split these data values into a common value (the mean) and residuals (what's left over) as follows:

Data:       .125   .127   .124   .126   .128
= Mean:     .126   .126   .126   .126   .126
+ Residual: -.001   .001  -.002   .000   .002

From these tables, also called overlays, we can easily calculate the location and spread of the data: mean = .126 and standard deviation = .0016.
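A compact Python version of this value splitting, using only the standard library:

```python
import statistics

diameters = [0.125, 0.127, 0.124, 0.126, 0.128]

mean = statistics.mean(diameters)                 # 0.126
residuals = [round(x - mean, 3) for x in diameters]
sd = statistics.stdev(diameters)                  # ~0.0016

print(round(mean, 3), residuals, round(sd, 4))
# e.g. 0.126 [-0.001, 0.001, -0.002, 0.0, 0.002] 0.0016
```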

-*-*-*-*-
