One- and Two-Sample Tests of Hypotheses 10.1 Statistical Hypotheses: General Concepts

January 15, 2018 | Author: Anonymous | Category: science, mathematics, statistics, algebra

One- and Two-Sample Tests of Hypotheses

10.1 Statistical Hypotheses: General Concepts

Often, the problem confronting the scientist or engineer is not so much the estimation of a population parameter, as discussed in Chapter 9, but rather the formation of a data-based decision procedure that can produce a conclusion about some scientific system. For example, a medical researcher may decide on the basis of experimental evidence whether coffee drinking increases the risk of cancer in humans; an engineer might have to decide on the basis of sample data whether there is a difference between the accuracy of two kinds of gauges; or a sociologist might wish to collect appropriate data to enable him or her to decide whether a person's blood type and eye color are independent variables. In each of these cases the scientist or engineer postulates or conjectures something about a system. In addition, each must involve the use of experimental data and decision making that is based on the data. Formally, in each case, the conjecture can be put in the form of a statistical hypothesis. Procedures that lead to the acceptance or rejection of statistical hypotheses such as these comprise a major area of statistical inference. First, let us define precisely what we mean by a statistical hypothesis.

Definition 10.1

A statistical hypothesis is an assertion or conjecture concerning one or more populations.

The truth or falsity of a statistical hypothesis is never known with absolute certainty unless we examine the entire population. This, of course, would be impractical in most situations. Instead, we take a random sample from the population of interest and use the data contained in this sample to provide evidence that either supports or does not support the hypothesis. Evidence from the sample that is inconsistent with the stated hypothesis leads to a rejection of the hypothesis, whereas evidence supporting the hypothesis leads to its acceptance.

It should be made clear to the reader that the design of a decision procedure must be done with the notion in mind of the probability of a wrong conclusion. For example, suppose that the conjecture (the hypothesis) postulated by the engineer is that the fraction defective $p$ in a certain process is 0.10. The experiment is to observe a random sample of the product in question. Suppose that 100 items are tested and 12 items are found defective. It is reasonable to conclude that this evidence does not refute the condition $p = 0.10$, and thus it may lead to an acceptance of the hypothesis. However, it also does not refute $p = 0.12$ or perhaps even $p = 0.15$. As a result, the reader must become accustomed to understanding that the acceptance of a hypothesis merely implies that the data do not give sufficient evidence to refute it. On the other hand, rejection implies that the sample evidence refutes it.

Put another way, rejection means that there is a small probability of obtaining the sample information observed when, in fact, the hypothesis is true. For example, in our proportion-defective hypothesis, a sample of 100 revealing 20 defective items is certainly evidence for rejection. Why? If, indeed, $p = 0.10$, the Poisson approximation with $np = 10$ gives a probability of about 0.0035 for obtaining 20 or more defectives. The exact binomial probability is even smaller, about 0.002. With the resulting small risk of a wrong conclusion, it would seem safe to reject the hypothesis that $p = 0.10$. In other words, rejection of a hypothesis tends to all but "rule out" the hypothesis. On the other hand, it is very important to emphasize that acceptance, or rather failure to reject, does not rule out other possibilities. As a result, the data analyst establishes a firm conclusion only when a hypothesis is rejected.

The formal statement of a hypothesis is often influenced by the structure of the probability of a wrong conclusion. If the scientist is interested in strongly supporting a contention, he or she hopes to arrive at the contention in the form of rejection of a hypothesis. If the medical researcher wishes to show strong evidence in favor of the contention that coffee drinking increases the risk of cancer, the hypothesis tested should be of the form "there is no increase in cancer risk produced by drinking coffee." As a result, the contention is reached via a rejection. Similarly, to support the claim that one kind of gauge is more accurate than another, the engineer tests the hypothesis that there is no difference in the accuracy of the two kinds of gauges.
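The probability quoted for 20 or more defectives when $p = 0.10$ can be checked directly. The sketch below uses only Python's standard library, and the helper names are illustrative rather than from the text. It compares the exact binomial tail $P(X \ge 20)$ for $n = 100$ with the Poisson approximation (mean $np = 10$). The Poisson value comes out near 0.0035, and the exact binomial value near 0.002.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability b(x; n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(binom_pmf(x, n, p) for x in range(k, n + 1))

def poisson_upper_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu), computed as 1 minus the CDF at k - 1."""
    cdf = sum(math.exp(-mu) * mu**x / math.factorial(x) for x in range(k))
    return 1.0 - cdf

# 100 items tested, hypothesized fraction defective p = 0.10
exact = binom_upper_tail(20, 100, 0.10)
approx = poisson_upper_tail(20, 100 * 0.10)
print(f"exact binomial P(X >= 20) = {exact:.4f}")
print(f"Poisson approximation     = {approx:.4f}")
```

Both numbers are small, and that smallness is what justifies rejecting $p = 0.10$. Summing exactly is cheap at $n = 100$, so no approximation is needed.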

The Null and Alternative Hypotheses

The structure of hypothesis testing will be formulated with the use of the term null hypothesis. This refers to any hypothesis we wish to test and is denoted by $H_0$. The rejection of $H_0$ leads to the acceptance of an alternative hypothesis, denoted by $H_1$. A null hypothesis concerning a population parameter will always be stated so as to specify an exact value of the parameter, whereas the alternative hypothesis allows for the possibility of several values. Hence, if $H_0$ is the null hypothesis $p = 0.5$ for a binomial population, the alternative hypothesis $H_1$ would be one of the following:

$$p > 0.5, \qquad p < 0.5, \qquad \text{or} \qquad p \ne 0.5.$$

10.2 Testing a Statistical Hypothesis


To illustrate the concepts used in testing a statistical hypothesis about a population, consider the following example. A certain type of cold vaccine is known to be only 25% effective after a period of 2 years. To determine if a new and somewhat more expensive vaccine is superior in providing protection against the same virus for a longer period of time, suppose that 20 people are chosen at random and inoculated. In an actual study of this type the participants receiving the new vaccine might number several thousand. The number 20 is being used here only to demonstrate the basic steps in carrying out a statistical test. If more than 8 of those receiving the new vaccine surpass the 2-year period without contracting the virus, the new vaccine will be considered superior to the one presently in use. The requirement that the number exceed 8 is somewhat arbitrary but appears reasonable in that it represents a modest gain over the 5 people that could be expected to receive protection if the 20 people had been inoculated with the vaccine already in use. We are essentially testing the null hypothesis that the new vaccine is equally effective after a period of 2 years as the one now commonly used. The alternative hypothesis is that the new vaccine is in fact superior. This is equivalent to testing the hypothesis that the binomial parameter for the probability of a success on a given trial is $p = 1/4$ against the alternative that $p > 1/4$. This is usually written as follows:

$$H_0: p = \tfrac{1}{4}, \qquad H_1: p > \tfrac{1}{4}.$$

The test statistic on which we base our decision is $X$, the number of individuals in our test group who receive protection from the new vaccine for a period of at least 2 years. The possible values of $X$, from 0 to 20, are divided into two groups: those numbers less than or equal to 8 and those greater than 8. All possible scores greater than 8 constitute the critical region, and all possible scores less than or equal to 8 determine the acceptance region. The last number that we observe in passing from the acceptance region into the critical region is called the critical value. In our illustration the critical value is the number 8. Therefore, if $x > 8$, we reject $H_0$ in favor of the alternative hypothesis $H_1$. If $x \le 8$, we accept $H_0$. This decision criterion is illustrated in Figure 10.1.

[Figure 10.1: Decision criterion for testing $p = 0.25$ versus $p > 0.25$. On an axis of $x$ from 0 to 20, the values 0 through 8 form the region "Accept $H_0$ ($p = 0.25$)" and the values 9 through 20 form the region "Reject $H_0$ ($p > 0.25$)".]

The decision procedure just described could lead to either of two wrong conclusions. For instance, the new vaccine may be no better than the one now in use and, for this particular randomly selected group of individuals, more than 8 surpass the 2-year period without contracting the virus. We would be committing an error by rejecting $H_0$ in favor of $H_1$ when, in fact, $H_0$ is true. Such an error is called a type I error.

Definition 10.2

Rejection of the null hypothesis when it is true is called a type I error.

A second kind of error is committed if 8 or fewer of the group surpass the 2-year period successfully and we conclude that the new vaccine is no better when it actually is better. In this case we would accept $H_0$ when it is false. This is called a type II error.

Definition 10.3

Acceptance of the null hypothesis when it is false is called a type II error.

In testing any statistical hypothesis, there are four possible situations that determine whether our decision is correct or in error. These four situations are summarized in Table 10.1.

Table 10.1 Possible Situations for Testing a Statistical Hypothesis

                  $H_0$ is true        $H_0$ is false
Accept $H_0$      Correct decision     Type II error
Reject $H_0$      Type I error         Correct decision
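One way to make the cells of Table 10.1 concrete is to simulate the vaccine experiment many times. The sketch below is an illustration only, and the function names, seed, and number of repetitions are arbitrary choices that do not come from the text. It draws groups of 20 people with protection probability $p$ and applies the rule "reject $H_0$ if more than 8 are protected." It then records how often each kind of wrong decision occurs, once with $p = 1/4$ ($H_0$ true) and once with $p = 1/2$ ($H_0$ false).

```python
import random

N_PEOPLE = 20        # size of the test group
CRITICAL_VALUE = 8   # reject H0 when more than 8 are protected

def rejects_h0(rng: random.Random, p: float) -> bool:
    """Simulate one group of 20 and apply the decision rule x > 8."""
    protected = sum(rng.random() < p for _ in range(N_PEOPLE))
    return protected > CRITICAL_VALUE

def error_rates(reps: int = 50_000, seed: int = 1):
    """Return (type I rate, type II rate) estimated by simulation."""
    rng = random.Random(seed)
    # H0 true (p = 1/4): every rejection is a type I error
    type_i = sum(rejects_h0(rng, 0.25) for _ in range(reps)) / reps
    # H0 false with p = 1/2: every acceptance is a type II error
    type_ii = sum(not rejects_h0(rng, 0.5) for _ in range(reps)) / reps
    return type_i, type_ii

type_i_rate, type_ii_rate = error_rates()
print(f"estimated P(type I error)  = {type_i_rate:.4f}")
print(f"estimated P(type II error) = {type_ii_rate:.4f}")
```

The simulated frequencies should settle near the exact binomial values derived in the text that follows (about 0.04 and 0.25). Simulation is used here only because it mirrors the "repeat the experiment many times" meaning of an error probability.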

The probability of committing a type I error, also called the level of significance, is denoted by the Greek letter $\alpha$. In our illustration, a type I error will occur when more than 8 individuals surpass the 2-year period without contracting the virus using a new vaccine that is actually equivalent to the one in use. Hence, if $X$ is the number of individuals who remain free of the virus for at least 2 years,

$$\alpha = P(\text{type I error}) = P\left(X > 8 \text{ when } p = \tfrac{1}{4}\right) = \sum_{x=9}^{20} b\left(x; 20, \tfrac{1}{4}\right) = 1 - \sum_{x=0}^{8} b\left(x; 20, \tfrac{1}{4}\right) = 1 - 0.9591 = 0.0409.$$
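The sum above can be evaluated directly instead of reading $\sum_{x=0}^{8} b(x; 20, 1/4) = 0.9591$ from a binomial table. This minimal sketch uses only Python's `math.comb`. The helper name `binom_cdf` is ours, not the text's.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Cumulative binomial probability: sum of b(x; n, p) for x = 0..k."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Vaccine example: n = 20, reject H0 when X > 8, H0: p = 1/4
alpha = 1 - binom_cdf(8, 20, 0.25)
print(f"alpha = {alpha:.4f}")
```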

We say that the null hypothesis, $p = 1/4$, is being tested at the $\alpha = 0.0409$ level of significance. Sometimes the level of significance is called the size of the critical region. A critical region of size 0.0409 is very small, and therefore it is unlikely that a type I error will be committed. Consequently, it would be most unusual for more than 8 individuals to remain immune to a virus for a 2-year period using a new vaccine that is essentially equivalent to the one now on the market.

The probability of committing a type II error, denoted by $\beta$, is impossible to compute unless we have a specific alternative hypothesis. If we test the null hypothesis that $p = 1/4$ against the alternative hypothesis that $p = 1/2$, then we are able to compute the probability of accepting $H_0$ when it is false. We simply find the probability of obtaining 8 or fewer in the group that surpass the 2-year period when $p = 1/2$. In this case

$$\beta = P(\text{type II error}) = P\left(X \le 8 \text{ when } p = \tfrac{1}{2}\right) = \sum_{x=0}^{8} b\left(x; 20, \tfrac{1}{2}\right) = 0.2517.$$

This is a rather high probability, indicating a test procedure in which it is quite likely that we shall reject the new vaccine when, in fact, it is superior to that now in use. Ideally, we like to use a test procedure for which both the type I and type II errors are small. It is possible that the director of the testing program is willing to make a type II error if the more expensive vaccine is not significantly superior. In fact, the only time he wishes to guard against the type II error is when the true value of $p$ is at least 0.7. If $p = 0.7$, this test procedure gives

$$\beta = P(\text{type II error}) = P(X \le 8 \text{ when } p = 0.7) = \sum_{x=0}^{8} b(x; 20, 0.7) = 0.0051.$$
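To see how $\beta$ depends on which alternative is true, the same rule (accept $H_0$ when $X \le 8$) can be evaluated over a range of values of $p$. This sketch is illustrative. The grid of $p$ values beyond 0.5 and 0.7 is our own choice, and so are the function names.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Sum of b(x; n, p) for x = 0..k."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def beta(p_true: float, n: int = 20, critical_value: int = 8) -> float:
    """P(accept H0) = P(X <= critical_value) when the true proportion is p_true."""
    return binom_cdf(critical_value, n, p_true)

betas = {p: beta(p) for p in (0.5, 0.6, 0.7, 0.8, 0.9)}
for p, b in betas.items():
    print(f"p = {p:.1f}: beta = {b:.4f}")
```

The values shrink rapidly as $p$ moves further above $1/4$. This matches the remark that $\beta$ diminishes toward zero as the value of $p$ under the alternative approaches unity.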

With such a small probability of committing a type II error, it is extremely unlikely that the new vaccine would be rejected when it is 70% effective after a period of 2 years. As the value of $p$ specified by the alternative hypothesis approaches unity, the value of $\beta$ diminishes to zero.

Let us assume that the director of the testing program is unwilling to commit a type II error when the alternative hypothesis $p = 1/2$ is true, even though we have found the probability of such an error to be $\beta = 0.2517$. A reduction in $\beta$ is always possible by increasing the size of the critical region. For example, consider what happens to the values of $\alpha$ and $\beta$ when we change our critical value to 7, so that all scores greater than 7 fall in the critical region and those less than or equal to 7 fall in the acceptance region. Now, in testing $p = 1/4$ against the alternative hypothesis that $p = 1/2$, we find that

$$\alpha = \sum_{x=8}^{20} b\left(x; 20, \tfrac{1}{4}\right) = 1 - \sum_{x=0}^{7} b\left(x; 20, \tfrac{1}{4}\right) = 1 - 0.8982 = 0.1018$$

and

$$\beta = \sum_{x=0}^{7} b\left(x; 20, \tfrac{1}{2}\right) = 0.1316.$$
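The trade-off between the two error probabilities can be tabulated for several critical values at once. In this sketch the helper names are ours, for illustration only. Each critical value $c$ means "reject $H_0$ when $X > c$." The code computes $\alpha$ under $p = 1/4$ and $\beta$ under the alternative $p = 1/2$.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Sum of b(x; n, p) for x = 0..k."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

N = 20
rows = []
for c in range(6, 10):                   # candidate critical values
    alpha = 1 - binom_cdf(c, N, 0.25)    # P(X > c | p = 1/4)
    beta = binom_cdf(c, N, 0.5)          # P(X <= c | p = 1/2)
    rows.append((c, alpha, beta))
    print(f"c = {c}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Lowering the critical value reduces $\beta$ and raises $\alpha$, and raising it does the opposite. This is the trade-off described next for a fixed sample size.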

By adopting a new decision procedure, we have reduced the probability of committing a type II error at the expense of increasing the probability of committing a type I error. For a fixed sample size, a decrease in the probability of one error will usually result in an increase in the probability of the other error. Fortunately, the probability of committing both types of error can be reduced by increasing the sample size. Consider the same problem using a random sample of 100 individuals. If more than 36 of the group surpass the 2-year period, we reject the null hypothesis that $p = 1/4$ and accept the alternative hypothesis that $p > 1/4$. The critical value is now 36. All possible scores above 36 constitute the critical region, and all possible scores less than or equal to 36 fall in the acceptance region. To determine the probability of committing a type I error, we shall use the normal-curve approximation with

$$\mu = np = (100)\left(\tfrac{1}{4}\right) = 25 \qquad \text{and} \qquad \sigma = \sqrt{npq} = \sqrt{(100)\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)} = 4.33.$$

Referring to Figure 10.2, we need the area under the normal curve to the right of $x = 36.5$. The corresponding $z$-value is

$$z = \frac{36.5 - 25}{4.33} = 2.66.$$

[Figure 10.2: Probability of a type I error; the normal curve is centered at $\mu = 25$.]

From Table A.3 we find that

$$\alpha = P(\text{type I error}) = P\left(X > 36 \text{ when } p = \tfrac{1}{4}\right) \approx P(Z > 2.66) = 1 - P(Z < 2.66) = 1 - 0.9961 = 0.0039.$$

If $H_0$ is false and the true value of $p$ is the alternative $p = 1/2$, we can determine the probability of a type II error using the normal-curve approximation with

$$\mu = np = (100)\left(\tfrac{1}{2}\right) = 50 \qquad \text{and} \qquad \sigma = \sqrt{npq} = \sqrt{(100)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = 5.$$

The probability of falling in the acceptance region when $H_1$ is true is given by the area of the shaded region to the left of $x = 36.5$ in Figure 10.3. The $z$-value corresponding to $x = 36.5$ is

$$z = \frac{36.5 - 50}{5} = -2.7.$$

[Figure 10.3: Probability of a type II error; the axis marks 25 and 50, with $\sigma = 5$.]

Therefore,

$$\beta = P(\text{type II error}) = P\left(X \le 36 \text{ when } p = \tfrac{1}{2}\right) \approx P(Z < -2.7) = 0.0035.$$
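The two normal-curve approximations for $n = 100$ can be compared with exact binomial sums. The sketch assumes Python 3.8 or later for `statistics.NormalDist`, and the helper names are ours. It uses the same continuity correction (36.5) as the text but does not round $z$ to two decimals, so the approximate values can differ from the table-based ones in the last digit.

```python
import math
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Sum of b(x; n, p) for x = 0..k."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) via the normal curve with continuity correction at k + 0.5."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, c = 100, 36   # reject H0 when X > 36
alpha_normal = 1 - normal_approx_cdf(c, n, 0.25)
alpha_exact = 1 - binom_cdf(c, n, 0.25)
beta_normal = normal_approx_cdf(c, n, 0.5)
beta_exact = binom_cdf(c, n, 0.5)
print(f"alpha: normal {alpha_normal:.4f}, exact {alpha_exact:.4f}")
print(f"beta:  normal {beta_normal:.4f}, exact {beta_exact:.4f}")
```

Both approaches agree that each error probability is well under 1% with 100 people. When computation is cheap, the exact sums avoid the approximation altogether.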

Obviously, the type I and type II errors will rarely occur if the experiment consists of 100 individuals. The illustration above underscores the strategy of the scientist in hypothesis testing. After the null and alternative hypotheses are stated, it is important to consider the sensitivity of the test procedure. By this we mean that there should be a determination, for a fixed $\alpha$, of a reasonable value for the probability of wrongly accepting $H_0$ (i.e., the value of $\beta$) when the true situation represents some important deviation from $H_0$. A sample size can usually be determined for which there is a reasonable balance between $\alpha$ and the value of $\beta$ computed in this fashion. The vaccine problem is an illustration.

The concepts discussed here for a discrete population can equally well be applied to continuous populations. Consider the null hypothesis that the average weight of male students in a certain college is 68 kilograms against the alternative hypothesis that it is unequal to 68. That is, we wish to test

$$H_0: \mu = 68, \qquad H_1: \mu \ne 68.$$

The alternative hypothesis allows for the possibility that $\mu < 68$ or $\mu > 68$. A sample mean that falls close to the hypothesized value of 68 would be considered evidence in favor of $H_0$. On the other hand, a sample mean that is considerably less than or more than 68 would be evidence inconsistent with $H_0$ and therefore favoring $H_1$. The sample mean is the test statistic in this case. A critical region for the test statistic might arbitrarily be chosen to be the two intervals $\bar{x} < 67$ and $\bar{x} > 69$. The acceptance region will then be the interval $67 \le \bar{x} \le 69$. This decision criterion is illustrated in Figure 10.4.

[Figure 10.4: Decision criterion for testing $\mu = 68$ versus $\mu \ne 68$. The axis marks 67, 68, and 69. "Reject $H_0$ ($\mu \ne 68$)" lies below 67, "Accept $H_0$ ($\mu = 68$)" lies between 67 and 69, and "Reject $H_0$ ($\mu \ne 68$)" lies above 69.]


Let us now use the decision criterion of Figure 10.4 to calculate the probabilities of committing type I and type II errors when testing the null hypothesis that $\mu = 68$ kilograms against the alternative that $\mu \ne 68$ kilograms for the continuous population of students' weights. Assume the standard deviation of the population of weights to be $\sigma = 3.6$. For large samples we may substitute $s$ for $\sigma$ if no other estimate of $\sigma$ is available. Our decision statistic, based on a random sample of size $n = 36$, will be $\bar{X}$, the most efficient estimator of $\mu$. From the central limit theorem, we know that the sampling distribution of $\bar{X}$ is approximately normal with standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.6/6 = 0.6$. The probability of committing a type I error, or the level of significance of our test, is equal to the sum of the areas that have been shaded in each tail of the distribution in Figure 10.5. Therefore,

$$\alpha = P(\bar{X} < 67 \text{ when } \mu = 68) + P(\bar{X} > 69 \text{ when } \mu = 68).$$

[Figure 10.5: Critical region for testing $\mu = 68$ versus $\mu \ne 68$; the axis marks 67 and 69.]

The $z$-values corresponding to $\bar{x}_1 = 67$ and $\bar{x}_2 = 69$ when $H_0$ is true are

$$z_1 = \frac{67 - 68}{0.6} = -1.67 \qquad \text{and} \qquad z_2 = \frac{69 - 68}{0.6} = 1.67.$$

Therefore,

$$\alpha = P(Z < -1.67) + P(Z > 1.67) = 2P(Z < -1.67) = 0.0950.$$

Thus 9.5% of all samples of size 36 would lead us to reject $\mu = 68$ kilograms when it is true. To reduce $\alpha$, we have a choice of increasing the sample size or widening the acceptance region. Suppose that we increase the sample size to $n = 64$. Then $\sigma_{\bar{X}} = 3.6/8 = 0.45$. Now

$$z_1 = \frac{67 - 68}{0.45} = -2.22 \qquad \text{and} \qquad z_2 = \frac{69 - 68}{0.45} = 2.22.$$

Hence

$$\alpha = P(Z < -2.22) + P(Z > 2.22) = 2P(Z < -2.22) = 0.0264.$$
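The level of significance for this fixed acceptance region, $67 \le \bar{x} \le 69$, can be computed for any sample size. The sketch below follows the text's assumptions: $\sigma = 3.6$ and an approximately normal sampling distribution of $\bar{X}$. The function name is ours. Because it does not round $z$ to two decimals, its results can differ slightly in the last digit from the values above.

```python
import math
from statistics import NormalDist

SIGMA = 3.6                 # population standard deviation (kg)
MU_0 = 68.0                 # hypothesized mean under H0
LOWER, UPPER = 67.0, 69.0   # acceptance region for the sample mean

def alpha_for(n: int) -> float:
    """P(sample mean falls outside [67, 69]) when mu = 68."""
    xbar = NormalDist(MU_0, SIGMA / math.sqrt(n))
    return xbar.cdf(LOWER) + (1 - xbar.cdf(UPPER))

for n in (36, 64, 100):
    print(f"n = {n:3d}: alpha = {alpha_for(n):.4f}")
```

With the acceptance region held fixed, $\alpha$ keeps falling as $n$ grows. The cost shows up in $\beta$ for alternatives close to 68, as the discussion that follows shows.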

The reduction in $\alpha$ is not sufficient by itself to guarantee a good testing procedure. We must evaluate $\beta$ for various alternative hypotheses that we feel should be accepted if true. Therefore, if it is important to reject $H_0$ when the true mean is some value $\mu \ge 70$ or $\mu \le 66$, then the probability of committing a type II error should be computed and examined for the alternatives $\mu = 66$ and $\mu = 70$. Because of symmetry, it is only necessary to consider the probability of accepting the null hypothesis that $\mu = 68$ when the alternative $\mu = 70$ is true. A type II error will result when the sample mean $\bar{x}$ falls between 67 and 69 when $H_1$ is true. Therefore, referring to Figure 10.6, we find that

$$\beta = P(67 \le \bar{X} \le 69 \text{ when } \mu = 70).$$

[Figure 10.6: Type II error for testing $\mu = 68$ versus $\mu = 70$; the axis marks 67, 68, 69, and 70.]

The $z$-values corresponding to $\bar{x}_1 = 67$ and $\bar{x}_2 = 69$ when $H_1$ is true are

$$z_1 = \frac{67 - 70}{0.45} = -6.67 \qquad \text{and} \qquad z_2 = \frac{69 - 70}{0.45} = -2.22.$$

Therefore,

$$\beta = P(-6.67 < Z < -2.22) = P(Z < -2.22) - P(Z < -6.67) = 0.0132 - 0.0000 = 0.0132.$$

If the true value of $\mu$ is the alternative $\mu = 66$, the value of $\beta$ will again be 0.0132. For all possible values of $\mu < 66$ or $\mu > 70$, the value of $\beta$ will be even smaller when $n = 64$, and consequently there would be little chance of accepting $H_0$ when it is false.

The probability of committing a type II error increases rapidly when the true value of $\mu$ approaches, but is not equal to, the hypothesized value. Of course, this is usually the situation where we do not mind making a type II error. For example, if the alternative hypothesis $\mu = 68.5$ is true, we do not mind committing a type II error by concluding that the true answer is $\mu = 68$. The probability of making such an error will be high when $n = 64$. Referring to Figure 10.7, we have

$$\beta = P(67 \le \bar{X} \le 69 \text{ when } \mu = 68.5).$$

[Figure 10.7: Type II error for testing $\mu = 68$ versus $\mu = 68.5$; the axis marks 67, 68, 68.5, and 69.]


The $z$-values corresponding to $\bar{x}_1 = 67$ and $\bar{x}_2 = 69$ when $\mu = 68.5$ are

$$z_1 = \frac{67 - 68.5}{0.45} = -3.33 \qquad \text{and} \qquad z_2 = \frac{69 - 68.5}{0.45} = 1.11.$$

Therefore,

$$\beta = P(-3.33 < Z < 1.11) = P(Z < 1.11) - P(Z < -3.33) = 0.8665 - 0.0004 = 0.8661.$$
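The same calculation can be written as a function of the true mean. This makes the pattern in these examples easy to see: $\beta$ is small for $\mu = 66$ or $\mu = 70$ and large for $\mu = 68.5$. The sketch assumes $n = 64$, $\sigma = 3.6$, and the acceptance region $67 \le \bar{x} \le 69$ from the text. The function name is ours. Unrounded $z$-values may shift the last digit relative to the table-based results.

```python
import math
from statistics import NormalDist

SIGMA, N = 3.6, 64
LOWER, UPPER = 67.0, 69.0   # accept H0: mu = 68 when 67 <= xbar <= 69

def beta_at(mu_true: float) -> float:
    """P(67 <= sample mean <= 69) when the true mean is mu_true."""
    xbar = NormalDist(mu_true, SIGMA / math.sqrt(N))
    return xbar.cdf(UPPER) - xbar.cdf(LOWER)

results = {mu: beta_at(mu) for mu in (66.0, 68.5, 70.0)}
for mu, b in results.items():
    print(f"mu = {mu:4.1f}: beta = {b:.4f}")
```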

The preceding examples illustrate the following important properties:

1. The type I error and type II error are related. A decrease in the probability of one generally results in an increase in the probability of the other.
2. The size of the critical region, and therefore the probability of committing a type I error, can always be reduced by adjusting the critical value(s).
3. An increase in the sample size $n$ will reduce $\alpha$ and $\beta$ simultaneously, provided the critical values are adjusted appropriately.
4. If the null hypothesis is false, $\beta$ is a maximum when the true value of a parameter approaches the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller $\beta$ will be.

One very important concept that relates to error probabilities is the notion of the power of a test.

Definition 10.4

The power of a test is the probability of rejecting $H_0$ given that a specific alternative is true.

The power of a test can be computed as $1 - \beta$. Often different types of tests are compared by contrasting power properties. Consider the previous illustration, in which we were testing $H_0: \mu = 68$ and $H_1: \mu \ne 68$. As before, suppose we are interested in assessing the sensitivity of the test. The test is governed by the rule that we accept $H_0$ if $67 \le \bar{x} \le 69$. We seek the capability of the test for properly rejecting $H_0$ when indeed $\mu = 68.5$. We have seen that the probability of a type II error is given by $\beta = 0.8661$. Thus the power of the test is $1 - 0.8661 = 0.1339$. In a sense, the power is a more succinct measure of how sensitive the test is for "detecting differences" between a mean of 68 and 68.5. In this case, if $\mu$ is truly 68.5, the test as described will properly reject $H_0$ only 13.39% of the time. As a result, the test would not be a good one if it is important that the analyst have a reasonable chance of truly distinguishing between a mean of 68.0 (specified by $H_0$) and a mean of 68.5. From the foregoing, it is clear that to produce a desirable power (say, greater than 0.8), one must either increase $\alpha$ or increase the sample size.

In what has preceded in this chapter, much of the text on hypothesis testing revolves around foundations and definitions. In the sections that follow we get more specific and put hypotheses in categories as well as discuss tests of hypotheses on various parameters of interest. We begin by drawing the distinction between a one-sided and a two-sided hypothesis.
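The claim that a power above 0.8 requires a larger $\alpha$ or a larger sample can be explored numerically. The sketch below is an illustration that rests on several assumptions of our own; it is not a procedure from the text. It holds a two-sided $\alpha = 0.05$ fixed and keeps $\sigma = 3.6$. For each $n$ it recenters the acceptance region at 68 as $68 \pm z_{0.025}\,\sigma/\sqrt{n}$, so the critical values shrink as $n$ grows instead of staying at 67 and 69. It then searches for the smallest $n$ whose power against $\mu = 68.5$ reaches 0.8. The function names are ours.

```python
import math
from statistics import NormalDist

SIGMA, MU_0, MU_ALT = 3.6, 68.0, 68.5
ALPHA, TARGET_POWER = 0.05, 0.8
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # z_{0.025}, about 1.96

def power(n: int) -> float:
    """Power against mu = 68.5 of the two-sided test with alpha held at 0.05."""
    se = SIGMA / math.sqrt(n)
    lower, upper = MU_0 - Z_CRIT * se, MU_0 + Z_CRIT * se   # acceptance region
    xbar = NormalDist(MU_ALT, se)                            # sampling dist. under mu = 68.5
    beta = xbar.cdf(upper) - xbar.cdf(lower)
    return 1 - beta

def smallest_n(max_n: int = 5000) -> int:
    """Smallest sample size whose power reaches the target (linear search)."""
    for n in range(2, max_n + 1):
        if power(n) >= TARGET_POWER:
            return n
    raise ValueError("target power not reached")

n_needed = smallest_n()
print(f"n = {n_needed}, power = {power(n_needed):.3f}")
```

Under these assumptions several hundred students are needed, far more than 64. This shows how weak the test is against an alternative as close as 68.5.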

10.3 One- and Two-Tailed Tests

A test of any statistical hypothesis where the alternative is one-sided, such as

$$H_0: \theta = \theta_0, \qquad H_1: \theta > \theta_0$$

or perhaps

$$H_0: \theta = \theta_0, \qquad H_1: \theta < \theta_0,$$

is called a one-tailed test. In Section 10.2 we made reference to the test statistic for a hypothesis. Generally, the critical region for the alternative hypothesis $\theta > \theta_0$ lies in the right tail of the distribution of the test statistic, while the critical region for the alternative hypothesis $\theta < \theta_0$ lies entirely in the left tail. In a sense, the inequality symbol points in the direction where the critical region lies. A one-tailed test is used in the vaccine experiment of Section 10.2 to test the hypothesis $p = 1/4$ against the one-sided alternative $p > 1/4$ for the binomial distribution. The one-tailed critical region is usually obvious. For an understanding, the reader should visualize the behavior of the test statistic and notice the obvious signal that would produce evidence supporting the alternative hypothesis.

A test of any statistical hypothesis where the alternative is two-sided, such as

$$H_0: \theta = \theta_0, \qquad H_1: \theta \ne \theta_0,$$

is called a two-tailed test, since the critical region is split into two parts, often having equal probabilities placed in each tail. A two-tailed test was used to test the null hypothesis that $\mu = 68$ kilograms against the two-sided alternative $\mu \ne 68$ kilograms for the continuous population of student weights in Section 10.2. The null hypothesis, $H_0$, will always be stated using the equality sign so as to specify a single value. In this way the probability of committing a type I error can be controlled.

Whether one sets up a one-tailed or a two-tailed test will depend on the conclusion to be drawn if $H_0$ is rejected. The location of the critical region can be determined only after $H_1$ has been stated. For example, in testing a new drug, one sets up the hypothesis that it is no better than similar drugs now on the market and tests this against the alternative hypothesis that the new drug is superior. Such an alternative hypothesis will result in a one-tailed test with the critical region in the right tail. However, if we wish to compare a new teaching technique with the conventional classroom procedure, the alternative hypothesis should allow for the new approach to be either inferior or superior to the conventional procedure. Hence the test is two-tailed, with the critical region divided equally so as to fall in the extreme left and right tails of the distribution of our statistic.

Certain guidelines are desirable in determining which hypothesis should be stated as $H_0$ and which should be stated as $H_1$. First, read the problem carefully and determine the claim that you want to test. Should the claim suggest a simple direction such as more than, less than, superior to, inferior to, and so on, then $H_1$ will be stated using the inequality symbol (< or >) corresponding to the suggested direction. If, for example, in testing a new drug we wish to show strong evidence that more than 30% of the people will be helped, we immediately write $H_1: p > 0.3$, and then the null hypothesis is written $H_0: p = 0.3$. Should the claim suggest a compound direction (equality as well as direction) such as at least, equal to or greater, at most, no more than, and so on, then this entire compound direction (≥ or ≤) is expressed as $H_0$, but using only the equality sign, and $H_1$ is given by the opposite direction. Finally, if no direction whatsoever is suggested by the claim, then $H_1$ is stated using the not-equal symbol (≠).

Example 10.1 A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams. State the null and alternative hypotheses to be used in testing this claim and determine where the critical region is located.

SOLUTION

The manufacturer's claim should be rejected only if $\mu$ is greater than 1.5 grams and should be accepted if $\mu$ is less than or equal to 1.5 grams. Since the null hypothesis always specifies a single value of the parameter, we test

$$H_0: \mu = 1.5, \qquad H_1: \mu > 1.5.$$

Although we have stated the null hypothesis with an equal sign, it is understood to include any value not specified by the alternative hypothesis. Consequently, the acceptance of $H_0$ does not imply that $\mu$ is exactly equal to 1.5 grams but rather that we do not have sufficient evidence favoring $H_1$. Since we have a one-tailed test, the greater-than symbol indicates that the critical region lies entirely in the right tail of the distribution of our test statistic $\bar{X}$.



Example 10.2 A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. To test this claim, a large sample of new residences is inspected; the proportion of these homes with 3 bedrooms is recorded and used as our test statistic. State the null and alternative hypotheses to be used in this test and determine the location of the critical region.

SOLUTION

If the test statistic is substantially higher or lower than $p = 0.6$, we would reject the agent's claim. Hence we should make the hypothesis

$$H_0: p = 0.6, \qquad H_1: p \ne 0.6.$$

The alternative hypothesis implies a two-tailed test with the critical region divided equally in both tails of the distribution of $\hat{P}$, our test statistic.
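When the test statistic is (approximately) standard normal under $H_0$, the location of the critical region follows directly from the form of $H_1$. The sketch below assumes a standardized statistic $z$ and $\alpha = 0.05$. Both are our choices for illustration, since Examples 10.1 and 10.2 do not fix $\alpha$. The function and the alternative labels (">", "<", "!=") are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def critical_region(alternative: str, alpha: float = 0.05) -> str:
    """Describe where H0 is rejected for a standard normal test statistic z.

    alternative: ">" (right-tailed), "<" (left-tailed) or "!=" (two-tailed).
    """
    if alternative == ">":
        return f"z > {STD_NORMAL.inv_cdf(1 - alpha):.3f}"
    if alternative == "<":
        return f"z < {STD_NORMAL.inv_cdf(alpha):.3f}"
    if alternative == "!=":
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)   # alpha split equally between the tails
        return f"z < {-z:.3f} or z > {z:.3f}"
    raise ValueError(f"unknown alternative: {alternative!r}")

print(critical_region(">"))    # like Example 10.1, H1: mu > 1.5 (right tail)
print(critical_region("!="))   # like Example 10.2, H1: p != 0.6 (both tails)
```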

10.4 The Use of P-Values for Decision Making

In testing hypotheses in which the test statistic is discrete, the critical region may be chosen arbitrarily and its size determined. If $\alpha$ is too large, it can be reduced by making an adjustment in the critical value. It may be necessary to increase the sample size to offset the decrease that occurs automatically in the power of the test.

Over a number of generations of statistical analysis, it had become customary to choose an $\alpha$ of 0.05 or 0.01 and select the critical region accordingly. Then, of course, strict rejection or nonrejection of $H_0$ would depend on that critical region. For example, if the test is two-tailed, $\alpha$ is set at the 0.05 level of significance, and the test statistic involves, say, the standard normal distribution, then a $z$-value is observed from the data and the critical region is

$$z > 1.96 \qquad \text{or} \qquad z < -1.96,$$

where the value 1.96 is found as $z_{0.025}$ in Table A.3. A value of $z$ in the critical region prompts the statement: "The value of the test statistic is significant." We can translate that into the user's language. For example, if the hypothesis is given by

$$H_0: \mu = 10, \qquad H_1: \mu \ne 10,$$

one might say: "The mean differs significantly from the value 10."

This preselection of a significance level $\alpha$ has its roots in the philosophy that the maximum risk of making a type I error should be controlled. However, this approach does not account for values of test statistics that are "close" to the critical region. Suppose, for example, in the illustration with $H_0: \mu = 10$ versus $H_1: \mu \ne 10$, a value of $z = 1.87$ is observed; strictly speaking, with $\alpha = 0.05$ the value is not significant. But the risk of committing a type I error if one rejects $H_0$ in this case could hardly be considered severe. In fact, in a two-tailed scenario one can quantify this risk as

$$P = 2P(Z > 1.87 \text{ when } \mu = 10) = 2(0.0307) = 0.0614.$$
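The two-sided $P$-value is simply twice the standard normal tail area beyond the observed $|z|$. This sketch uses only the standard library, and the function name is ours. It reproduces the value for $z = 1.87$ and also handles the section's other example, $z = 2.73$. It uses unrounded normal probabilities, so the last digit can differ slightly from a table lookup.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """P = 2 * P(Z > |z|) for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05
for z in (1.87, 2.73):
    p = two_sided_p_value(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z}: P = {p:.4f} ({verdict} at alpha = {ALPHA})")
```

Reporting $P$ instead of only "significant" or "not significant" is the point of this section. Here $z = 1.87$ just misses the 0.05 cutoff, while $z = 2.73$ clears it by a wide margin.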

As a result, 0.0614 is the probability of obtaining a value of $z$ as large or larger (in magnitude) than 1.87 when in fact $\mu = 10$. Although this evidence against $H_0$ is not as strong as that which would result from a rejection at an $\alpha = 0.05$ level, it is important information to the user. Indeed, continued use of $\alpha = 0.05$ or 0.01 is only a result of what standards have been passed through the generations. The $P$-value approach has been adopted extensively by users in applied statistics. The approach is designed to give the user an alternative (in terms of a probability) to a mere "reject" or "do not reject" conclusion. The $P$-value computation also gives the user important information when the $z$-value falls well into the ordinary critical region. For example, if $z$ is 2.73, it is informative for the user to observe that

$$P = 2(0.0032) = 0.0064,$$

and thus the $z$-value is significant at a level considerably less than 0.05. It is important to know that, under the condition of $H_0$, a value of $z = 2.73$ is an extremely rare event. Namely, a value at least that large in magnitude would occur only 64 times in 10,000 experiments.

One very simple way of explaining a P-value graphically is to consider, somewhat prematurely, two distinct samples. Suppose that two materials are considered for coating a particular type of metal in order to inhibit corrosion. Specimens are obtained; one collection is coated with material 1 and another collection is coated with material 2. The sample sizes are $n_1 = n_2 = 10$, and corrosion was measured in percent of surface area affected. The hypothesis is that the samples came from a common distribution with mean $\mu = 10$. Let us assume that the population variance is 1.0. Then we are testing

$$H_0: \mu_1 = \mu_2 = 10.$$

Let Figure 10.8 represent a point plot of the data; the data are placed on the distribution stated by the null hypothesis. It now seems clear that the data refute the null hypothesis. But how can this be summarized in one number? The P-value can be viewed as the probability of obtaining this data set (or one even less consistent with $H_0$) given that the samples come from the distribution depicted. Clearly, this probability is quite small, say 0.00000001! Thus the small P-value clearly refutes $H_0$, and the conclusion is that the population means are significantly different.

Figure 10.8: Data that are likely generated from populations having two different means (the hypothesized distribution is centered at $\mu = 10$).

The P-value approach as an aid in decision making is quite natural because nearly all computer packages that provide hypothesis-testing computation print out P-values along with values of the appropriate test statistic. The following is a formal definition of a P-value.

Definition 10.5: A P-value is the lowest level (of significance) at which the observed value of the test statistic is significant.
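As a rough check on the two-tailed P-values quoted earlier in this section ($z = 1.87$ and $z = 2.73$), the short Python sketch below computes $P = 2P(Z > |z|)$ with the standard library's `statistics.NormalDist` in place of Table A.3. The function name is ours, chosen for illustration. Because the table is rounded to four decimals, the last printed digit may differ slightly from the table-based values 0.0614 and 0.0064.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Two-tailed P-value 2 * P(Z > |z|) for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


for z in (1.87, 2.73):
    print(f"z = {z}: P = {two_tailed_p_value(z):.4f}")
```

Using `abs(z)` makes the function symmetric, so a negative observed $z$ gives the same two-tailed P-value as its positive counterpart.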


It might be appropriate at this point to summarize the procedures for testing hypotheses on a single mean. Consider first the hypothesis

$$H_0: \mu = \mu_0, \qquad H_1: \mu \neq \mu_0.$$

The appropriate test statistic should be based on the random variable $\bar X$. In Chapter 8, the central limit theorem is introduced, which essentially states that, regardless of the distribution of $X$, the random variable $\bar X$ has approximately a normal distribution with mean $\mu$ and variance $\sigma^2/n$ for reasonably large sample sizes. So, $\mu_{\bar X} = \mu$ and $\sigma^2_{\bar X} = \sigma^2/n$. We can then determine a critical region based on the computed sample average, $\bar x$. It should be clear to the reader by now that there will be a two-tailed critical region for the test. It is convenient to standardize $\bar X$ and formally involve the standard normal random variable $Z$, where

$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}}.$$

We know that under $H_0$, that is, if $\mu = \mu_0$, then $(\bar X - \mu_0)/(\sigma/\sqrt{n})$ has an $N(0, 1)$ distribution, and hence the expression

$$P\left(-z_{\alpha/2} < \frac{\bar X - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

can be used to write an appropriate acceptance region. The reader should keep in mind that, formally, the critical region is designed to control $\alpha$, the probability of a type I error. It should be obvious that a two-tailed signal of evidence is needed to support $H_1$. Thus, given a computed value $\bar x$, the formal test involves rejecting $H_0$ if the computed test statistic

$$z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha/2} \quad \text{or} \quad z < -z_{\alpha/2}.$$

If $-z_{\alpha/2} < z < z_{\alpha/2}$, do not reject $H_0$. Rejection of $H_0$, of course, implies acceptance of the alternative hypothesis $\mu \neq \mu_0$. With this definition of the critical region, it should be clear that there will be probability $\alpha$ of rejecting $H_0$ (falling into the critical region) when, indeed, $\mu = \mu_0$.

Although it is easier to understand the critical region written in terms of $z$, we can write the same critical region in terms of the computed average $\bar x$. The following is an identical decision procedure: reject $H_0$ if $\bar x > b$ or $\bar x < a$, where

$$a = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad b = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
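The equivalence of the two forms of the critical region can be checked numerically. The Python sketch below (the function name `xbar_critical_values` is ours, for illustration only) computes $a$ and $b$, taking $z_{\alpha/2}$ from `statistics.NormalDist().inv_cdf`. The usage line borrows $\mu_0 = 10$, $\sigma = 1$ and $n = 10$ from the corrosion illustration above purely as sample inputs.

```python
from math import sqrt
from statistics import NormalDist


def xbar_critical_values(mu0: float, sigma: float, n: int, alpha: float):
    """Return (a, b): reject H0: mu = mu0 (two-tailed) if xbar < a or xbar > b."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z_half * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


a, b = xbar_critical_values(mu0=10, sigma=1.0, n=10, alpha=0.05)
print(f"reject H0 if xbar < {a:.3f} or xbar > {b:.3f}")
```

An observed $\bar x$ falls outside $(a, b)$ exactly when its standardized value $z$ falls outside $(-z_{\alpha/2}, z_{\alpha/2})$.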

Hence, for an $\alpha$ level of significance, the critical values of the random variable $Z$ and of $\bar X$ are both depicted in Figure 10.9.

Figure 10.9: Critical region for the alternative hypothesis $\mu \neq \mu_0$, shown on both the $\bar x$-scale and the $z$-scale.

Tests of one-sided hypotheses on the mean involve the same statistic described in the two-sided case. The difference, of course, is that the critical region is only in one tail of the standard normal distribution. For example, suppose that we seek to test

$$H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0.$$

The signal that favors $H_1$ comes from large values of $z$. Thus rejection of $H_0$ results when the computed $z > z_\alpha$. Obviously, if the alternative is $H_1: \mu < \mu_0$, the critical region is entirely in the lower tail, and thus rejection results from $z < -z_\alpha$. The following two examples illustrate tests on means for the case in which $\sigma$ is known.

Example 10.3: A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level of significance.

SOLUTION
1. $H_0: \mu = 70$ years.
2. $H_1: \mu > 70$ years.
3. $\alpha = 0.05$.
4. Critical region: $z > 1.645$, where $z = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}$.
5. Computations: $\bar x = 71.8$ years, $\sigma = 8.9$ years, and
$$z = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02.$$
6. Decision: Reject $H_0$ and conclude that the mean life span today is greater than 70 years.

In Example 10.3 the P-value corresponding to $z = 2.02$ is given by the area of the shaded region in Figure 10.10.

Figure 10.10: P-value for Example 10.3 (shaded area under the standard normal curve to the right of $z = 2.02$).

Using Table A.3, we have

$$P = P(Z > 2.02) = 0.0217.$$

As a result, the evidence in favor of $H_1$ is even stronger than that suggested by a 0.05 level of significance.
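The six steps of Example 10.3 can be sketched in a few lines of Python. The helper `one_sided_z_test` is a hypothetical name, not a library function. It uses the unrounded $z \approx 2.0225$, so its P-value agrees with the table-based 0.0217 only to about the last digit. Rejecting when $P \le \alpha$ is equivalent to rejecting when $z > z_\alpha$.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(xbar, mu0, sigma, n, alpha, upper=True):
    """z-test of H0: mu = mu0 with sigma known.

    upper=True tests H1: mu > mu0; upper=False tests H1: mu < mu0.
    Returns (z, P-value, whether H0 is rejected at level alpha).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    p_value = 1 - std_normal.cdf(z) if upper else std_normal.cdf(z)
    return z, p_value, p_value <= alpha


# Example 10.3: xbar = 71.8, mu0 = 70, sigma = 8.9, n = 100, alpha = 0.05
z, p, reject = one_sided_z_test(71.8, 70, 8.9, 100, 0.05)
print(f"z = {z:.2f}, P = {p:.4f}, reject H0: {reject}")
```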

Example 10.4: A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard devi[…]

[…] $n_1 + n_2 - 2$.

Example 10.6: An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average of 81 and a sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances.

SOLUTION
Let $\mu_1$ and $\mu_2$ represent the population means of the abrasive wear for material 1 and material 2, respectively.

5. Computations: $\bar x_1 = 85$, $s_1 = 4$, $\bar x_2 = 81$, $s_2 = 5$. Hence

$$s_p = \sqrt{\frac{(11)(16) + (9)(25)}{12 + 10 - 2}} = 4.478,$$

$$t = \frac{(85 - 81) - 2}{4.478\sqrt{(1/12) + (1/10)}} = 1.04,$$

$$P = P(T > 1.04) \approx 0.16,$$

with $12 + 10 - 2 = 20$ degrees of freedom.
6. Decision: Do not reject $H_0$. We are unable to conclude that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units.
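A sketch of the pooled two-sample t-test of Example 10.6 in plain Python follows. Since the standard library has no Student-t distribution, the helper `t_sf` (our own, hypothetical name) approximates the upper-tail probability by Simpson's rule applied to the t density; where SciPy is available, `scipy.stats.t.sf` would serve instead. The tail area comes out slightly above 0.15, which the text rounds to 0.16.

```python
import math


def t_sf(t: float, v: float, steps: int = 2000) -> float:
    """Approximate P(T > t) for Student's t with v degrees of freedom.

    Simpson's rule integrates the t density over [0, |t|] (steps must be even);
    the symmetry of the distribution about zero then gives the tail area.
    """
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)

    def density(x: float) -> float:
        return c * (1 + x * x / v) ** (-(v + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3  # approximately P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, d0):
    """Pooled-variance t for H0: mu1 - mu2 = d0; returns (s_p, t, degrees of freedom)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = ((xbar1 - xbar2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df


# Example 10.6: H1 is mu1 - mu2 > 2, so the P-value is an upper-tail area.
sp, t, df = pooled_t(85, 4, 12, 81, 5, 10, d0=2)
p_value = t_sf(t, df)
print(f"s_p = {sp:.3f}, t = {t:.2f}, df = {df}, P = {p_value:.3f}")
```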

Unknown But Unequal Variances

There are situations where the analyst is not able to assume that $\sigma_1 = \sigma_2$. Recall from Chapter 9 that, if the populations are normal, the statistic

$$T' = \frac{(\bar X_1 - \bar X_2) - d_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$

has an approximate t-distribution with approximate degrees of freedom

$$v = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\left[(s_1^2/n_1)^2/(n_1 - 1)\right] + \left[(s_2^2/n_2)^2/(n_2 - 1)\right]}.$$

As a result, the test procedure is to not reject $H_0$ when

$$-t_{\alpha/2} < t' < t_{\alpha/2},$$

with $v$ given as above. Again, as in the case of the pooled t-test, one-sided alternatives suggest one-sided critical regions.

Paired Observations

When the student of statistics studies the two-sample t-test or confidence interval on the difference between means, he or she should realize that some elementary notions dealing in experimental design become relevant and must be […]

[…] 2.145, where $t = \dfrac{\bar d - d_0}{s_d/\sqrt{n}}$ with $v = 14$ degrees of freedom.
5. Computations: The sample mean and standard deviation for the $d_i$'s are
$$\bar d = 9.848 \quad \text{and} \quad s_d = 18.474.$$
Therefore,
$$t = \frac{9.848 - 0}{18.474/\sqrt{15}} = 2.06.$$
6. Though the t-statistic is not significant at the 0.05 level,
$$P = P(|T| > 2.06) = 0.06.$$

As a result, there is some evidence that there is a difference in mean circulating levels of androgen.

In the case of paired observations, it is important that there be no interaction between the treatments and the experimental units. This was discussed in Chapter 9 in the development of confidence intervals. The no-interaction assumption implies that the effect of the experimental, or pairing, unit is the same for each of the two treatments. In Example 10.7, we are assuming that the effect of the deer is the same for the two conditions under study, namely "at injection" and 30 minutes after injection.

Annotated Computer Printout for Paired t-Test

Figure 10.13 displays a SAS computer printout for a paired t-test using the data of Example 10.7. Notice that the appearance of the printout is that of a single-sample t-test and, of course, that is exactly what is accomplished, since the test seeks to determine if $\bar d$ is significantly different from zero.

Analysis Variable : DIFF Difference in Levels of Androgens

N     Mean         Std Error    T            Prob>|T|
15    9.8480000    4.7698699    2.0646265    0.0580

Figure 10.13: SAS printout of paired t-test for data of Example 10.7.
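The printout's figures can be reproduced from the summary statistics alone. The sketch below treats the paired test as a one-sample t-test on the differences $d_i$, exactly as the printout does, using $\bar d = 9.848$, $s_d = 18.474$ and $n = 15$. The helper `t_sf` is our own Simpson's-rule approximation to the Student-t upper-tail probability (a hypothetical name; `scipy.stats.t.sf` could be used instead where SciPy is available).

```python
import math


def t_sf(t: float, v: float, steps: int = 2000) -> float:
    """Approximate P(T > t) for Student's t with v df (Simpson's rule on [0, |t|])."""
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)

    def density(x: float) -> float:
        return c * (1 + x * x / v) ** (-(v + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3  # approximately P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t(dbar: float, sd: float, n: int, d0: float = 0.0):
    """One-sample t on paired differences; returns (standard error, t, df)."""
    se = sd / math.sqrt(n)
    return se, (dbar - d0) / se, n - 1


se, t, df = paired_t(9.848, 18.474, 15)
p_two_sided = 2 * t_sf(abs(t), df)  # the printout's Prob>|T|
print(f"Std Error = {se:.4f}, T = {t:.4f}, Prob>|T| = {p_two_sided:.4f}")
```

Small differences from the printout in the fourth decimal place come from using the rounded $s_d$ rather than the raw data.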

Summary of Test Procedures

As we complete the formal development of tests on population means, we offer Table 10.2, which summarizes the test procedures for the cases of a single mean and two means. Notice the approximate procedure when distributions are normal and variances are unknown but not assumed to be equal. This statistic was introduced in Chapter 9.
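For the approximate procedure with unequal, unknown variances, the statistic $t'$ and its approximate degrees of freedom $v$ can be computed as in the sketch below (the function name is hypothetical). The summary statistics of Example 10.6 are reused purely as sample inputs, even though that example itself assumes equal variances. Note that $v$ is generally not an integer; in practice it is often rounded down before a t-table is consulted.

```python
import math


def welch_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Return (t_prime, v) for H0: mu1 - mu2 = d0 when sigma1 != sigma2 are unknown."""
    a = s1**2 / n1  # estimated variance of Xbar1
    b = s2**2 / n2  # estimated variance of Xbar2
    t_prime = ((xbar1 - xbar2) - d0) / math.sqrt(a + b)
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t_prime, v


t_prime, v = welch_t(85, 4, 12, 81, 5, 10, d0=2)
print(f"t' = {t_prime:.3f}, v = {v:.2f}")
```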

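Table 10.2 Tests Concerning Means

$H_0: \mu = \mu_0$ ($\sigma$ known). Test statistic: $z = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}$. Critical regions: $z < -z_\alpha$ for $H_1: \mu < \mu_0$; $z > z_\alpha$ for $H_1: \mu > \mu_0$; $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ for $H_1: \mu \neq \mu_0$.

$H_0: \mu = \mu_0$ ($\sigma$ unknown). Test statistic: $t = \dfrac{\bar x - \mu_0}{s/\sqrt{n}}$, $v = n - 1$. Critical regions: $t < -t_\alpha$ for $H_1: \mu < \mu_0$; $t > t_\alpha$ for $H_1: \mu > \mu_0$; $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ for $H_1: \mu \neq \mu_0$.

$H_0: \mu_1 - \mu_2 = d_0$ ($\sigma_1$ and $\sigma_2$ known). Test statistic: $z = \dfrac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$. Critical regions: $z < -z_\alpha$ for $H_1: \mu_1 - \mu_2 < d_0$; $z > z_\alpha$ for $H_1: \mu_1 - \mu_2 > d_0$; $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ for $H_1: \mu_1 - \mu_2 \neq d_0$.

$H_0: \mu_1 - \mu_2 = d_0$ ($\sigma_1 = \sigma_2$ but unknown). Test statistic: $t = \dfrac{(\bar x_1 - \bar x_2) - d_0}{s_p\sqrt{1/n_1 + 1/n_2}}$, $v = n_1 + n_2 - 2$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. Critical regions: $t < -t_\alpha$ for $H_1: \mu_1 - \mu_2 < d_0$; $t > t_\alpha$ for $H_1: \mu_1 - \mu_2 > d_0$; $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ for $H_1: \mu_1 - \mu_2 \neq d_0$.

$H_0: \mu_1 - \mu_2 = d_0$ ($\sigma_1 \neq \sigma_2$ and unknown). Test statistic: $t' = \dfrac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, $v = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$. Critical regions: $t' < -t_\alpha$ for $H_1: \mu_1 - \mu_2 < d_0$; $t' > t_\alpha$ for $H_1: \mu_1 - \mu_2 > d_0$; $t' < -t_{\alpha/2}$ or $t' > t_{\alpha/2}$ for $H_1: \mu_1 - \mu_2 \neq d_0$.

$H_0: \mu_D = d_0$ (paired observations). Test statistic: $t = \dfrac{\bar d - d_0}{s_d/\sqrt{n}}$, $v = n - 1$. Critical regions: $t < -t_\alpha$ for $H_1: \mu_D < d_0$; $t > t_\alpha$ for $H_1: \mu_D > d_0$; $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ for $H_1: \mu_D \neq d_0$.

10.9 Choice of Sample Size for Testing Means

In Section 10.2 we demonstrated how the analyst can exploit relationships among the sample size, the significance level $\alpha$, and the power of the test to achieve a certain standard of quality. […]

Suppose that we wish to test the hypothesis […]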


Figure 10.14: Testing $\mu = \mu_0$ versus $\mu = \mu_0 + \delta$ (the critical value $a$ is marked on the $\bar x$ axis).

Therefore,

$$\beta = P(\bar X < a \text{ when } \mu = \mu_0 + \delta) = P\left[\frac{\bar X - (\mu_0 + \delta)}{\sigma/\sqrt{n}} < \frac{a - (\mu_0 + \delta)}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_0 + \delta\right].$$

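Under the alternative hypothesis $\mu = \mu_0 + \delta$, the statistic

$$\frac{\bar X - (\mu_0 + \delta)}{\sigma/\sqrt{n}}$$

is the standard normal variable $Z$. Therefore, since the critical value satisfies $(a - \mu_0)/(\sigma/\sqrt{n}) = z_\alpha$,

$$\beta = P\left(Z < \frac{a - \mu_0}{\sigma/\sqrt{n}} - \frac{\delta}{\sigma/\sqrt{n}}\right) = P\left(Z < z_\alpha - \frac{\delta}{\sigma/\sqrt{n}}\right),$$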

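from which we conclude that

$$-z_\beta = z_\alpha - \frac{\delta\sqrt{n}}{\sigma},$$

and hence

Choice of Sample Size:
$$n = \frac{(z_\alpha + z_\beta)^2\sigma^2}{\delta^2},$$

a result that is also true when the alternative hypothesis is $\mu < \mu_0$. In the case of a two-tailed test we obtain the power $1 - \beta$ for a specified alternative when

$$n \approx \frac{(z_{\alpha/2} + z_\beta)^2\sigma^2}{\delta^2}.$$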

Example 10.8: Suppose that we wish to test the hypothesis

$$H_0: \mu = 68 \text{ kilograms}, \qquad H_1: \mu > 68 \text{ kilograms}$$

for the weights of male students at a certain college using an $\alpha = 0.05$ level of significance when it is known that $\sigma = 5$. Find the sample size required if the power of our test is to be 0.95 when the true mean is 69 kilograms.

SOLUTION
Since $\alpha = \beta = 0.05$, we have $z_\alpha = z_\beta = 1.645$. For the alternative $\mu = 69$, we take $\delta = 1$ and then

$$n = \frac{(1.645 + 1.645)^2(25)}{1} = 270.6.$$

Therefore, 271 observations are required if the test is to reject the null hypothesis 95% of the time when, in fact, $\mu$ is as large as 69 kilograms.

A similar procedure can be used to determine the sample size $n = n_1 = n_2$ required for a specific power of the test in which two population means are being compared. For example, suppose that we wish to test the hypothesis

$$H_0: \mu_1 - \mu_2 = d_0, \qquad H_1: \mu_1 - \mu_2 \neq d_0,$$

when $\sigma_1$ and $\sigma_2$ are known. For a specific alternative, say $\mu_1 - \mu_2 = d_0 + \delta$, the power of our test is shown in Figure 10.15 to be

$$1 - \beta = P(|\bar X_1 - \bar X_2 - d_0| > a \text{ when } \mu_1 - \mu_2 = d_0 + \delta).$$

Figure 10.15: Testing $\mu_1 - \mu_2 = d_0$ versus $\mu_1 - \mu_2 = d_0 + \delta$.

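Therefore,

$$\beta = P(-a < \bar X_1 - \bar X_2 - d_0 < a \text{ when } \mu_1 - \mu_2 = d_0 + \delta)$$

$$= P\left[\frac{-a - \delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} < \frac{(\bar X_1 - \bar X_2) - (d_0 + \delta)}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} < \frac{a - \delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} \text{ when } \mu_1 - \mu_2 = d_0 + \delta\right].$$

Under the alternative hypothesis $\mu_1 - \mu_2 = d_0 + \delta$, the statistic

$$\frac{(\bar X_1 - \bar X_2) - (d_0 + \delta)}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}}$$

is the standard normal variable $Z$. Now, writing

$$z_{\alpha/2} = \frac{a}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}},$$

we have

$$\beta = P\left[-z_{\alpha/2} - \frac{\delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} < Z < z_{\alpha/2} - \frac{\delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}}\right],$$

from which we conclude (the lower-tail term being negligible for $\delta > 0$) that

$$-z_\beta \approx z_{\alpha/2} - \frac{\delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}},$$

and hence

$$n \approx \frac{(z_{\alpha/2} + z_\beta)^2(\sigma_1^2 + \sigma_2^2)}{\delta^2}.$$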

For the one-tailed test, the expression for the required sample size when $n = n_1 = n_2$ is

Choice of Sample Size:
$$n = \frac{(z_\alpha + z_\beta)^2(\sigma_1^2 + \sigma_2^2)}{\delta^2}.$$

When the population variance (or variances, in the two-sample situation) is unknown, the choice of sample size is not straightforward. In testing the hypothesis $\mu = \mu_0$ when the true value is $\mu = \mu_0 + \delta$, the statistic

$$\frac{\bar X - \mu_0}{S/\sqrt{n}}$$

does not follow the t-distribution, as one might expect, but instead follows the noncentral t-distribution. However, tables or charts based on the noncentral t-distribution do exist for determining the appropriate sample size if some estimate of $\sigma$ is available or if $\delta$ is a multiple of $\sigma$. Table A.8 gives the sample sizes needed to control the values of $\alpha$ and $\beta$ for various values of

$$\Delta = \frac{|\delta|}{\sigma} = \frac{|\mu - \mu_0|}{\sigma}$$

for both one- and two-tailed tests. In the case of the two-sample t-test in which the variances are unknown but assumed equal, we obtain the sample sizes $n = n_1 = n_2$ needed to control the values of $\alpha$ and $\beta$ for various values of

$$\Delta = \frac{|\delta|}{\sigma} = \frac{|\mu_1 - \mu_2 - d_0|}{\sigma}$$

from Table A.9.



Example 10.9: In comparing the performance of two catalysts on the effect of a reaction yield, a two-sample t-test is to be conducted with $\alpha = 0.05$. The variances in the yields are considered to be the same for the two catalysts. How large a sample for each catalyst is needed to test the hypothesis

$$H_0: \mu_1 = \mu_2, \qquad H_1: \mu_1 \neq \mu_2$$

if it is essential to detect a difference of $0.8\sigma$ between the catalysts with probability 0.9?

SOLUTION
From Table A.9, with $\alpha = 0.05$ for a two-tailed test, $\beta = 0.1$, and

$$\Delta = \frac{|0.8\sigma|}{\sigma} = 0.8,$$

we find the required sample size to be $n = 34$.

It is emphasized that in practical situations it might be difficult to force a scientist or engineer to make a commitment on information from which a value of $\Delta$ can be found. The reader is reminded that the $\Delta$-value quantifies the kind of difference between the means that the scientist considers important, that is, a difference considered significant from a scientific, not a statistical, point of view. Example 10.9 illustrates how this choice is often made, namely, by selecting a fraction of $\sigma$. Obviously, if the sample size is based on a choice of $|\delta|$ that is a small fraction of $\sigma$, the resulting sample size may be quite large compared to what the study allows.
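The known-variance sample-size formulas derived in this section share one shape, $n = (z_\alpha + z_\beta)^2\sigma^2/\delta^2$, with $z_{\alpha/2}$ replacing $z_\alpha$ in the (approximate) two-tailed case and $\sigma_1^2 + \sigma_2^2$ replacing $\sigma^2$ when two means are compared with $n_1 = n_2 = n$. The Python sketch below (the function name is ours) evaluates that shape with exact normal quantiles; applied to Example 10.8 it recovers $n = 270.6$, hence 271 observations. It does not reproduce the noncentral-t tables (Tables A.8 and A.9) needed when $\sigma$ is unknown, as in Example 10.9.

```python
import math
from statistics import NormalDist


def required_n(variance, delta, alpha, beta, two_tailed=False):
    """Unrounded n = (z_a + z_b)^2 * variance / delta^2 for a z-test on means.

    variance: sigma^2 for one mean, or sigma1^2 + sigma2^2 for two means (n1 = n2 = n).
    two_tailed=True uses z_{alpha/2} (the approximate two-tailed formula).
    """
    quantile = NormalDist().inv_cdf
    z_alpha = quantile(1 - alpha / 2) if two_tailed else quantile(1 - alpha)
    z_beta = quantile(1 - beta)
    return (z_alpha + z_beta) ** 2 * variance / delta**2


# Example 10.8: sigma = 5, delta = 69 - 68 = 1, alpha = beta = 0.05, one-tailed
n = required_n(variance=25, delta=1, alpha=0.05, beta=0.05)
print(f"n = {n:.1f}, so take {math.ceil(n)} observations")
```

Rounding up with `math.ceil` guarantees that the achieved power is at least the target rather than slightly below it.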

10.10 Graphical Methods for Comparing Means

In Chapter 3 considerable attention is directed toward displaying data in graphical form. Stem-and-leaf displays and, in Chapter 8, box-and-whisker plots, quantile plots, and quantile-quantile normal plots are used to provide a "picture" to summarize a set of experimental data. Many computer software packages produce graphical displays. As we proceed to other forms of data analysis (e.g., regression analysis and analysis of variance), graphical methods become even more informative.

Graphical aids used in conjunction with hypothesis testing are not a replacement for the test procedure. Certainly, the value of the test statistic indicates the proper type of evidence in support of $H_0$ or $H_1$. However, a pictorial display provides a good illustration and is often a better communicator of evidence to the beneficiary of the analysis. Also, a picture will often clarify why a significant difference was found. Failure of an important assumption may be exposed by a summary type of graphical display.

For the comparison of means, side-by-side box-and-whisker plots provide a telling display. The reader should recall that these plots display the 25th percentile, 75th percentile, and the median in a data set. In addition, the whiskers display the extremes in a data set. Consider Exercise 22 following this section. Plasma ascorbic acid levels were measured in two groups of pregnant women, smokers and nonsmokers. Figure 10.16 shows the box-and-whisker plots for […]


[…] at the $\alpha$-level of significance, we compute

$$P = P(X \ge x \text{ when } p = p_0)$$

and reject $H_0$ in favor of $H_1$ if this P-value is less than or equal to $\alpha$. Finally, to test the hypothesis

$$H_0: p = p_0, \qquad H_1: p \neq p_0,$$

at the $\alpha$-level of significance, we compute

$$P = 2P(X \le x \text{ when } p = p_0) \quad \text{if } x < np_0,$$

or

$$P = 2P(X \ge x \text{ when } p = p_0) \quad \text{if } x > np_0,$$

and reject $H_0$ in favor of $H_1$ if the computed P-value is less than or equal to $\alpha$.

The steps for testing a null hypothesis about a proportion against various alternatives using the binomial probabilities of Table A.1 are as follows:

Testing a Proportion: Small Samples

1. $H_0: p = p_0$.
2. $H_1$: Alternatives are $p < p_0$, $p > p_0$, or $p \neq p_0$.
3. Choose a level of significance equal to $\alpha$.
4. Test statistic: Binomial variable $X$ with $p = p_0$.
5. Computations: Find $x$, the number of successes, and compute the appropriate P-value.
6. Decision: Draw appropriate conclusions based on the P-value.

Example 10.10: A builder claims that heat pumps are installed in 70% of all homes being constructed today in the city of Richmond. Would you agree with this claim if a random survey of new homes in this city shows that 8 out of 15 had heat pumps installed? Use a 0.10 level of significance.

SOLUTION

1. $H_0: p = 0.7$.
2. $H_1: p \neq 0.7$.
3. $\alpha = 0.10$.
4. Test statistic: Binomial variable $X$ with $p = 0.7$ and $n = 15$.
5. Computations: $x = 8$ and $np_0 = (15)(0.7) = 10.5$. Therefore, from Table A.1, the computed P-value is
$$P = 2P(X \le 8 \text{ when } p = 0.7) = 2\sum_{x=0}^{8} b(x; 15, 0.7) = 0.2622 > 0.10.$$
6. Decision: Do not reject $H_0$. Conclude that there is insufficient reason to doubt the builder's claim.
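The small-sample procedure can be carried out with exact binomial sums instead of Table A.1. The sketch below (function names ours) follows the doubling rule stated above: twice the lower tail when $x < np_0$ and twice the upper tail otherwise. Capping the result at 1 is our own addition. For Example 10.10 it reproduces $P \approx 0.2622$.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p); equals 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def two_sided_binomial_p(x: int, n: int, p0: float) -> float:
    """Two-sided P-value for H0: p = p0 by doubling the tail on x's side of n*p0."""
    if x < n * p0:
        tail = binom_cdf(x, n, p0)  # P(X <= x)
    else:
        tail = 1 - binom_cdf(x - 1, n, p0)  # P(X >= x)
    return min(1.0, 2 * tail)


# Example 10.10: 8 of 15 homes, p0 = 0.7, alpha = 0.10
p_value = two_sided_binomial_p(8, 15, 0.7)
print(f"P = {p_value:.4f}; reject H0: {p_value <= 0.10}")
```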

In Section 5.3, we saw that binomial probabilities were obtainable from the actual binomial formula or from Table A.1 when $n$ is small. For large $n$, approximation procedures are required. When the hypothesized value $p_0$ is very close to 0 or 1, the Poisson distribution, with parameter $\mu = np_0$, may be used. However, the normal-curve approximation, with parameters $\mu = np_0$ and $\sigma^2 = np_0q_0$, is usually preferred for large $n$ and is very accurate as long as $p_0$ is not extremely close to 0 or to 1. If we use the normal approximation, the $z$-value for testing $p = p_0$ is given by

$$z = \frac{x - np_0}{\sqrt{np_0q_0}},$$

which is a value of the standard normal variable $Z$. Hence, for a two-tailed test at the $\alpha$-level of significance, the critical region is $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$. For the one-sided alternative $p < p_0$, the critical region is $z < -z_\alpha$, and for the alternative $p > p_0$, the critical region is $z > z_\alpha$.

Example 10.11: A commonly prescribed drug for relieving nervous tension is believed to be only 60% effective. Experimental results with a new drug administered to a random sample of 100 adults who were suffering from nervous tension show that 70 received relief. Is this sufficient evidence to conclude that the new drug is superior to the one commonly prescribed? Use a 0.05 level of significance.

SOLUTION

1. $H_0: p = 0.6$.
2. $H_1: p > 0.6$.
3. $\alpha = 0.05$.
4. Critical region: $z > 1.645$.
5. Computations: $x = 70$, $n = 100$, $np_0 = (100)(0.6) = 60$, and
$$z = \frac{70 - 60}{\sqrt{(100)(0.6)(0.4)}} = 2.04, \qquad P = P(Z > 2.04) < 0.025.$$
6. Decision: Reject $H_0$ and conclude that the new drug is superior.
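For large $n$, the normal-approximation test of Example 10.11 reduces to a few arithmetic steps. A minimal Python sketch follows; the function name is hypothetical, and `statistics.NormalDist` stands in for Table A.3.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x: int, n: int, p0: float):
    """Large-sample z for H0: p = p0; returns (z, upper-tail P-value for H1: p > p0)."""
    z = (x - n * p0) / sqrt(n * p0 * (1 - p0))
    return z, 1 - NormalDist().cdf(z)


# Example 10.11: 70 of 100 adults received relief, p0 = 0.6, H1: p > 0.6
z, p_value = proportion_z_test(70, 100, 0.6)
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

For $H_1: p < p_0$ one would use the lower tail `NormalDist().cdf(z)` instead.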



10.12 Two Samples: Tests on Two Proportions

Situations often arise where we wish to test the hypothesis that two proportions are equal. For example, we might try to show evidence that the proportion of doctors who are pediatricians in one state is equal to the proportion of pediatricians in another state. A person may decide to give up smoking only if he or she is convinced that the proportion of smokers with lung cancer exceeds the proportion of nonsmokers with lung cancer.

In general, we wish to test the null hypothesis that two proportions, or binomial parameters, are equal. That is, we are testing $p_1 = p_2$ against one of the alternatives $p_1 < p_2$, $p_1 > p_2$, or $p_1 \neq p_2$. Of course, this is equivalent to testing the null hypothesis that $p_1 - p_2 = 0$ against one of the alternatives $p_1 - p_2 < 0$, $p_1 - p_2 > 0$, or $p_1 - p_2 \neq 0$. The statistic on which we base our decision is the random variable $\hat P_1 - \hat P_2$. Independent samples of sizes $n_1$ and $n_2$ are selected at random from two binomial populations, and the proportions of successes $\hat P_1$ and $\hat P_2$ for the two samples are computed.

In our construction of confidence intervals for $p_1 - p_2$ we noted, for $n_1$ and $n_2$ sufficiently large, that the point estimator $\hat P_1$ minus $\hat P_2$ was approximately normally distributed with mean

$$\mu_{\hat P_1 - \hat P_2} = p_1 - p_2$$

and variance

$$\sigma^2_{\hat P_1 - \hat P_2} = \frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}.$$

Therefore, our acceptance and critical regions can be established by using the standard normal variable

$$Z = \frac{(\hat P_1 - \hat P_2) - (p_1 - p_2)}{\sqrt{(p_1q_1/n_1) + (p_2q_2/n_2)}}.$$

When $H_0$ is true, we can substitute $p_1 = p_2 = p$ and $q_1 = q_2 = q$ (where $p$ and $q$ are the common values) in the preceding formula for $Z$ to give the form

$$Z = \frac{\hat P_1 - \hat P_2}{\sqrt{pq[(1/n_1) + (1/n_2)]}}.$$

To compute a value of $Z$, however, we must estimate the parameters $p$ and $q$ that appear in the radical. Upon pooling the data from both samples, the pooled estimate of the proportion $p$ is

$$\hat p = \frac{x_1 + x_2}{n_1 + n_2},$$


where $x_1$ and $x_2$ are the numbers of successes in each of the two samples. Substituting $\hat p$ for $p$ and $\hat q = 1 - \hat p$ for $q$, the $z$-value for testing $p_1 = p_2$ is determined from the formula

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\hat q[(1/n_1) + (1/n_2)]}}.$$

The critical regions for the appropriate alternative hypotheses are set up as before using critical points of the standard normal curve. Hence, for the alternative $p_1 \neq p_2$ at the $\alpha$-level of significance, the critical region is $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$. For a test where the alternative is $p_1 < p_2$, the critical region is $z < -z_\alpha$, and when the alternative is $p_1 > p_2$, the critical region is $z > z_\alpha$.

Example 10.12: A vote is to be taken among the residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. The construction site is within the town limits, and for this reason many voters in the county feel that the proposal will pass because of the large proportion of town voters who favor the construction. To determine if there is a significant difference in the proportion of town voters and county voters favoring the proposal, a poll is taken. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it, would you agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters? Use a 0.025 level of significance.

SOLUTION
Let $p_1$ and $p_2$ be the true proportions of voters in the town and county, respectively, favoring the proposal.
1. $H_0: p_1 = p_2$.
2. $H_1: p_1 > p_2$.
3. $\alpha = 0.025$.
4. Critical region: $z > 1.96$.
5. Computations:
$$\hat p_1 = \frac{x_1}{n_1} = \frac{120}{200} = 0.60, \qquad \hat p_2 = \frac{x_2}{n_2} = \frac{240}{500} = 0.48,$$
and
$$\hat p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{120 + 240}{200 + 500} = 0.51.$$
Therefore,
$$z = \frac{0.60 - 0.48}{\sqrt{(0.51)(0.49)[(1/200) + (1/500)]}} = 2.9, \qquad P = P(Z > 2.9) = 0.0019.$$
6. Decision: Reject $H_0$ and agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters.
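A sketch of the pooled two-proportion z-test of Example 10.12 in Python follows (the function name is ours). Because it keeps $\hat p = 360/700$ unrounded, it gives $z \approx 2.87$ rather than 2.9, and the one-sided P-value is correspondingly a little larger than 0.0019; the decision at $\alpha = 0.025$ is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled z for H0: p1 = p2; returns (z, upper-tail P-value for H1: p1 > p2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return z, 1 - NormalDist().cdf(z)


# Example 10.12: 120 of 200 town voters vs. 240 of 500 county voters
z, p_value = two_proportion_z(120, 200, 240, 500)
print(f"z = {z:.2f}, P = {p_value:.4f}, reject H0 at 0.025: {p_value <= 0.025}")
```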




Exercises

1. A marketing expert for a pasta-making company believes that 40% of pasta lovers prefer lasagna. If 9 out of 20 pasta lovers choose lasagna over other pastas, what can be concluded about the expert's claim? Use a 0.05 level of significance.

2. Suppose that, in the past, 40% of all adults favored capital punishment. Do we have reason to believe that the proportion of adults favoring capital punishment today has increased if, in a random sample of 15 adults, 8 favor capital punishment? Use a 0.05 level of significance.

3. A coin is tossed 20 times, resulting in 5 heads. Is this sufficient evidence to reject the hypothesis that the coin is balanced in favor of the alternative that heads occur less than 50% of the time? Quote a P-value.

4. It is believed that at least 60% of the residents in a certain area favor an annexation suit by a neighboring city. What conclusion would you draw if only 110 in a sample of 200 voters favor the suit? Use a 0.05 level of significance.

5. A fuel oil company claims that one-fifth of the homes in a certain city are heated by oil. Do we have reason to doubt this claim if, in a random sample of 1000 homes in this city, it is found that 136 are heated by oil? Use a 0.01 level of significance.

6. At a certain college it is estimated that at most 25% of the students ride bicycles to class. Does this seem to be a valid estimate if, in a random sample of 90 college students, 28 are found to ride bicycles to class? Use a 0.05 level of significance.

7. A new radar device is being considered for a certain defense missile system. The system is checked by experimenting with actual aircraft in which a kill or a no kill is simulated. If in 300 trials, 250 kills occur, accept or reject, at the 0.04 level of significance, the claim that the probability of a kill with the new system does not exceed the 0.8 probability of the existing device.

8. In a controlled laboratory experiment, scientists at the University of Minnesota discovered that 25% of a certain strain of rats subjected to a 20% coffee bean diet and then force-fed a powerful cancer-causing chemical later developed cancerous tumors. Would we have reason to believe that the proportion of rats developing tumors when subjected to this diet has increased if the experiment were repeated and 16 of 48 rats developed tumors? Use a 0.05 level of significance.

9. In a study to estimate the proportion of residents in a certain city and its suburbs who favor the construction of a nuclear power plant, it is found that 63 of 100 urban residents favor the construction while only 59 of 125 suburban residents are in favor. Is there a significant difference between the proportion of urban and suburban residents who favor construction of the nuclear plant? Make use of a P-value.

10. In a study on the fertility of married women conducted by Martin O'Connell and Carolyn C. Rogers for the Census Bureau in 1979, two groups of childless wives aged 25 to 29 were selected at random, and each wife was asked if she eventually planned to have a child. One group was selected from among those wives married less than two years and the other from among those wives married five years. Suppose that 240 of 300 wives married less than two years planned to have children some day compared to 288 of the 400 wives married five years. Can we conclude that the proportion of wives married less than two years who planned to have children is significantly higher than the proportion of wives married five years? Make use of a P-value.

11. An urban community would like to show that the incidence of breast cancer is higher than in a nearby rural area. (PCB levels were found to be higher in the soil of the urban community.) If it is found that 20 of 200 adult women in the urban community have breast cancer and 10 of 150 adult women in the rural community have breast cancer, can we conclude at the 0.05 level of significance that breast cancer is more prevalent in the urban community?

12. In a winter of an epidemic flu, 2000 babies were surveyed by a well-known pharmaceutical company to determine if the company's new medicine was effective after two days. Among 120 babies who had the flu and were given the medicine, 29 were cured within two days. Among 280 babies who had the flu but were not given the medicine, 56 were cured within two days. Is there any significant indication that supports the company's claim of the effectiveness of the medicine?
