
SOLOMON DUTKA*

*Solomon Dutka is Chief Executive Officer, Audits & Surveys, Inc.

One of the most frequently used experimental designs in test marketing research is for evaluating the effects of changes in marketing strategy or choosing between alternative strategies. Generally, several test markets are set up to reflect different marketing environments within which to test the new product or program. The author discusses a way to take advantage of all of the statistical information available from such test market experiments.

Combining Tests of Significance in Test Marketing Research Experiments

One of the most frequently used experimental designs in test marketing research is for evaluating the effects of changes in marketing strategy or choosing between alternative strategies. In such designs, one may test whether a new product, advertising campaign, pricing strategy, couponing, or point-of-sale display, singly or often in combination, will yield significantly better results than current programs or products. The results obtained from the test markets are incorporated into the decision process to (1) "go national," (2) make modifications and continue testing, or (3) stop testing. Generally, a number of different test markets are set up to reflect different marketing environments within which to test the new product or program. Statistically, using such independent replicates helps increase the generalizability of any "significant" results obtained. In the analysis of the data from the test markets the results of all markets may be pooled, but it is critical to provide a priori for analysis of the results from the individual replicates as well.

Type I errors for such tests, familiar to most marketing researchers, occur when a statistical difference is declared to be present but in actuality there is no difference. By setting, for example, a .05 level of significance (95% confidence), one controls the size of that error. Errors of the opposite type, known as Type II errors, are no less important but are not as familiar. They occur when an experiment leads to the conclusion of no significant difference when in fact there is a true difference between treatments. Type II errors arise when (1) the statistical test used is not sensitive enough, (2) the sample sizes are too small, or (3) all of the evidence (especially in the case of replications) is not used.

The consequences of making an error of this sort could be costly to a company testing a new product or marketing procedure. Conceivably, as a consequence of a Type II error, new products or innovations that might be successful in the marketplace may not be introduced. In decision theory such "lost opportunities" are translated into their estimated dollar values as part of a "go/no go" decision.

This note offers market researchers an additional tool for taking advantage of all the information available from a test market experiment. Including the individual data for each of the replicates provides the sponsoring company with greater statistical sensitivity in analyzing the results and aids the decision process as to the next step.

The problem of Type II error arises when there is apparently no statistically significant difference in the results for any individual test market, yet the data across markets show consistent differences favoring the new over the current, or alternative A over alternative B. Is there a statistical test which can measure the significance of these differences when cumulated over all replicates? Is the fact that one has a number of "short straws" pointing in the same direction evidence of the possible superiority of the new over the current, or A over B?

As an example, let us assume that the "Z"-values calculated for a one-tailed series of tests for each market in such an experimental study are as follows:
Market no.    "Z"-value    Probability level attained (P)
    1           +1.45        1 - .9265 = .0735
    2           +1.27        1 - .8980 = .1020
    3           -0.42        1 - .3372 = .6628
    4           +1.60        1 - .9452 = .0548
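
The one-tailed probabilities in the table can be reproduced directly from the "Z"-values. The short sketch below (Python with scipy, added here for illustration and not part of the original note) computes P = 1 - Phi(Z), the upper-tail probability under the standard normal distribution, for each market.

```python
# Illustrative sketch (not from the original article): upper-tail
# probabilities for the "Z"-values reported for the four test markets.
from scipy.stats import norm

z_values = [1.45, 1.27, -0.42, 1.60]

for market, z in enumerate(z_values, start=1):
    p = norm.sf(z)  # P = 1 - Phi(Z); a negative Z yields P > .50
    print(f"Market {market}: Z = {z:+.2f}, one-tailed P = {p:.4f}")
```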

In this illustration, three of the four markets indicate positive differences. No one market is significant at a predesignated level, in this case the .05 level of significance. One therefore might accept the "null hypothesis" and conclude that no differences exist. Note also the presence of one replicate showing a difference in the opposite direction.

Fisher (1970) presents a method for combining tests of significance but does not provide for differences in the "wrong" direction ("Z" = -0.42). Wallis (1942) elaborated on the Fisher procedure. The procedure outlined hereafter combines such tests by basing significance on the product of the probabilities among the various replicates. Allowance can be made for the occasional occurrence of negative values of "Z" by calculating the corresponding probability for a single tail. Thus, negative "Z"-values yield probabilities greater than .50, which can be combined with the probabilities less than .50 associated with the positive "Z"-values.

The following theorem can be made the basis for combining such tests: if $\chi_1^2, \chi_2^2, \ldots, \chi_K^2$ have independent chi-square distributions with $n_1, n_2, \ldots, n_K$ degrees of freedom, respectively, then $\chi_1^2 + \chi_2^2 + \cdots + \chi_K^2$ is distributed as chi square with $n_1 + n_2 + \cdots + n_K$ degrees of freedom. Note that this chi-square transformation is introduced solely for ease of computation and not as a transformation into another statistical variate. In the particular case where n = 2, the natural log of the probability is equal to $-\tfrac{1}{2}\chi^2$. If we therefore take the natural log of the probability of the difference obtained for each replicate, change its sign, and double it, we have the equivalent of $\chi^2$ for n = 2 degrees of freedom.

Replicate no.       P       -log_e P
      1           .0735      2.6105
      2           .1020      2.2828
      3           .6628      0.4113
      4           .0548      2.9041
    Total                    8.2087
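
To make the arithmetic concrete, the sketch below (Python with scipy, added for illustration and not part of the original note) applies the procedure to the four one-tailed probabilities in the table above: each P is transformed to -2 ln P, the transformed values are summed, and the total is referred to a chi-square distribution with 2 degrees of freedom per replicate.

```python
# Illustrative sketch (not the author's computation): combining independent
# one-tailed probabilities via the chi-square distribution, using the four
# replicate P-values reported above.
import math
from scipy.stats import chi2

p_values = [0.0735, 0.1020, 0.6628, 0.0548]

chi_square = sum(-2.0 * math.log(p) for p in p_values)  # 2 x 8.2087 = 16.4174
df = 2 * len(p_values)                                   # 8 degrees of freedom
composite_p = chi2.sf(chi_square, df)                    # upper-tail probability

print(f"Composite chi-square = {chi_square:.4f} on {df} d.f.")
print(f"Composite P = {composite_p:.4f}")  # below .05, as concluded in the text
```

The same calculation applies to any number of replicates; with k replicates the reference distribution has 2k degrees of freedom.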

In the table, doubling the total of the natural logs gives $\chi^2 = 16.4174$ with n = 8 degrees of freedom, which corresponds to P < .05, a significant composite difference for the replicates at the designated level. This composite test can be used for two-tailed tests as well. Note also that different degrees of freedom for each replicate are reflected in the values of P.

In this particular application, because of its mathematical properties, the chi square distribution is used to combine tests involving means, such as average sales, or percentages, such as share of market, within test markets. This application should not be confused with the more usual applications of chi square for hypothesis testing using attribute data from fourfold tables. Fleiss (1973) gives an excellent discussion of such applications of chi square in combining measures of association among various populations.

REFERENCES

Fisher, R. A. (1970), Statistical Methods for Research Workers, 14th ed. New York: Hafner Publishing Co.

Fleiss, J. L. (1973), Statistical Methods for Rates and Proportions. New York: John Wiley & Sons, Inc.

Wallis, W. A. (1942), "Compounding Probabilities from Independent Significance Tests," Econometrica, 10, 229-48.

