Table of Contents
- Fundamentals of Hypothesis Testing: Two-Sample Tests
- Overview
- Comparing the mean of Two Independent Populations
- Independent Sample
- Example: Pooled-Variance t-test
- Problems
👀, 🤷♀️ , 📜, 📝
Click these icons to reveal the answers and additional explanations of the concepts :)
Fundamentals of Hypothesis Testing: Two-Sample Tests
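In other words, this is hypothesis testing.
The word "test" means we go beyond estimation: we also make a decision based on the result.
Two-Sample Tests: there are two data sets.
- One-Sample Tests: "The average IQ of Koreans is at least 100" -> requires a single data set, the IQs of Koreans.
- Two-Sample Tests: "Koreans have a higher average IQ than Japanese" -> requires two data sets, one of Korean IQs and one of Japanese IQs. In other words, two-sample tests are commonly used to verify comparisons and changes.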
Overview
We will go through each item in the diagram below in detail.
- Comparing the mean of Two Independent Populations
- Comparing the mean of Two Related Populations
- Comparing the Proportions of Two Independent Populations
- F Test for the Ratio of Two Variances
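How to decide which test to use
Formula for each case
We will now apply each of these formulas to an example!
(For now, it is enough to know that they exist.)
Knowing which formula fits which kind of problem matters more than knowing how each formula is derived.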
➕ Tip: there are calculators that handle each of these cases. You still need to know which calculator fits which case, so read this post to the end!
Comparing the mean of Two Independent Populations
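When σ (the population standard deviation, i.e. the degree of spread) is known
➡️ Z-test
When σ is unknown
➡️ t-test
Pooled-Variance t-test
- assumes the two population variances are equal
Separate-variance t-test
- assumes the two population variances are not equal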
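The decision rule above can be sketched as a small helper. This is only an illustrative sketch: the function name `choose_test` and its parameters are hypothetical and simply encode the three cases listed in this section.

```python
def choose_test(sigma_known: bool, equal_variances: bool = True) -> str:
    """Pick the test for the difference between two independent means.

    sigma_known: True when both population standard deviations are given.
    equal_variances: only matters when sigma is unknown.
    """
    if sigma_known:
        return "Z-test"
    if equal_variances:
        return "Pooled-variance t-test"
    return "Separate-variance t-test"


print(choose_test(sigma_known=True))
print(choose_test(sigma_known=False, equal_variances=True))
print(choose_test(sigma_known=False, equal_variances=False))
```

The helper covers only independent samples; related (paired) samples need a different test.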
Independent Sample
That is, the two data sets are independent samples.
For example, samples of TV prices would look something like the following.
Example: Pooled-Variance t-test
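Now let's work through an example problem.
The case where σ is known (➡️ Z-test) is straightforward, so it is covered in the practice problems below!
For now, let's look at a problem where σ is unknown (➡️ t-test) and see how to solve it.
The overall procedure follows the earlier posts:
- One-Sample Tests_Hypothesis & Z-Test
- 📌 One-Sample Tests_𝜎 Unknown (t test)
- One-Sample Tests_one tail test
- One-Sample Tests_Hypothesis Tests for Proportions
It helps to review these and keep them in mind, since the overall process is similar.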
PROBLEM
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE & NASDAQ? You collect the following data:
[Table: sample data for NYSE and NASDAQ stocks]
Assuming both populations are approximately normal with equal variances, is there a difference in mean yield (α = 0.05)?
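SOLVE 1
1) Check: are the data independent or related?
The Number row, i.e. the sample size, differs between the two groups.
➡️ 📌 Related samples must always have the same sample size.
So these samples are independent, and since equal variances are assumed -> follow the Pooled-Variance t-test procedure.
2) Check: one tail or two tails?
is there a difference in mean yield
➡️ The question is ultimately whether the sample means 3.27 and 2.53 are far enough apart to conclude that the population means differ.
➡️ Two-tail test
3) Check normality
Why the problem says "Assuming both populations are approximately normal": the sample sizes are 21 and 25, both below 30, so this assumption is needed.
➡️ Normality is satisfied!
4) Set up the hypotheses and conditions
- \(H_0\): \(\mu_1 - \mu_2 = 0\), i.e. \(\mu_1 = \mu_2\)
- \(H_1\): \(\mu_1 - \mu_2 \neq 0\), i.e. \(\mu_1 \neq \mu_2\)
- \(\alpha = 0.05\)
- df = 21 + 25 - 2 = 44
5) Find Rejection Region: Critical Values
Look them up in the t-table using α: for a two-tail test each tail holds α/2 = 0.025, with df = 44.
Critical Values: t = ±2.0154
6) Test Statistic
The pooled-variance t statistic is
\(t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\), where \(S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}\),
so we simply plug the sample values into it.
7) Decision
Reach a decision and interpret the result: the test statistic falls in the rejection region.
➡️ Reject \(H_0\) at α = 0.05
- There is sufficient evidence of a difference in means.
Now that we know the overall calculation process, let's look at the remaining cases through the problems below!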
Problems
[TABLE A]
Any difference between American and Korean? A randomly selected group of each were administered the Sarnoff Survey of Attitudes Toward Life (SSATL), which measures motivation for upward mobility. The SSATL scores are summarized below.
1. Referring to Table A, judging from the way the data were collected, which test would likely be most appropriate to employ?
a) Paired t test
b) Pooled-variance t test for the difference between two means
c) Independent samples Z test for the difference between two means
d) Related samples Z test for the mean difference
📜 Show answer
Any difference between American and Korean? ➡️ two-tail
The table gives the population standard deviation for both groups, so this is case 1:
σ known ➡️ Z-test.
The sample sizes also differ, so the samples are independent.
So the answer is c) Independent samples Z test for the difference between two means.
2. Referring to Table A, give the null and alternative hypotheses to determine if the average SSATL score of Korean managers differs from the average SSATL score of American managers.
a) \(𝐻_0\): \(𝜇_{A} - 𝜇_{k}\) ≥ 0 versus \(𝐻_1\): \(𝜇_{A} - 𝜇_{k}\) > 0
b) \(𝐻_0\): \(𝜇_{A} - 𝜇_{k}\) ≤ 0 versus \(𝐻_1\): \(𝜇_{A} - 𝜇_{k}\) < 0
c) \(𝐻_0\): \(𝜇_{A} - 𝜇_{k}\) = 0 versus \(𝐻_1\): \(𝜇_{A} - 𝜇_{k}\) ≠ 0
d) \(𝐻_0\): \(\bar{X}_{A} - \bar{X}_{k}\) = 0 versus \(𝐻_1\): \(\bar{X}_{A} - \bar{X}_{k}\) ≠ 0
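📜 Show answer
Any difference between American and Korean? ➡️ two-tail
Therefore,
c) \(𝐻_0\): \(𝜇_{A} - 𝜇_{k}\) = 0 versus \(𝐻_1\): \(𝜇_{A} - 𝜇_{k}\) ≠ 0
➕ \(\bar{X}\) cannot appear in a hypothesis: hypotheses are statements about population parameters, not sample statistics.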
3. Referring to Table A, assuming the independent samples procedure was used, calculate the value of the test statistic.
📜 Show answer
This is case 1 (σ known), so we use the corresponding Z statistic:
![image](https://user-images.githubusercontent.com/76824611/145292680-dc991bb4-5f0c-48e0-a86c-414ef873a2ad.pn
➕ Since \(H_0\) assumes \(𝜇_{A} - 𝜇_{k}\) = 0, that term is also 0 in the formula.
Therefore the answer is d).
4. Referring to Table A, suppose that the test statistic is Z = 2.45. Find the p-value if we assume that the alternative hypothesis was a two-tailed test (\(𝜇_{A} - 𝜇_{k}\) ≠ 0).
a) 0.0071
b) 0.0142
c) 0.4929
d) 0.9858
📜 Show answer
Looking up Z = 2.45 in the z-table gives an upper-tail area of 0.00714.
Since this is a two-tail test, we double it: 2 × 0.0071 = 0.0142, which is b).
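The table lookup can be cross-checked with Python's standard library. This is a minimal sketch that assumes the standard normal distribution applies; the helper name `two_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Two-tailed p-value of a Z statistic under the standard normal."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return 2.0 * upper_tail                      # double it for two tails


p_value = two_tailed_p_value(2.45)
print(round(p_value, 4))
```

The result is about 0.014. It differs from option b) only in the last digit, because the table-based answer doubles the already rounded value 0.0071.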
[TABLE B]
A real estate company is interested in testing whether, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have. Assume that the two population variances are equal. A random sample of 100 families from Gotham and a random sample of 150 families in Metropolis yield the following data on length of residence in current homes.
Gotham: \(\bar{X}_{G}\) = 35 months, \(s_{G}^2\) = 900
Metropolis: \(\bar{X}_{M}\) = 50 months, \(s_{M}^2\) = 1050
5. Referring to Table B, which of the following represents the relevant hypotheses tested by the real estate company?
a) \(𝐻_0\): \(𝜇_{G} - 𝜇_{M}\) ≥ 0 versus \(𝐻_1\): \(𝜇_{G} - 𝜇_{M}\) < 0
b) \(𝐻_0\): \(𝜇_{G} - 𝜇_{M}\) ≤ 0 versus \(𝐻_1\): \(𝜇_{G} - 𝜇_{M}\) > 0
c) \(𝐻_0\): \(𝜇_{G} - 𝜇_{M}\) = 0 versus \(𝐻_1\): \(𝜇_{G} - 𝜇_{M}\) ≠ 0
d) \(𝐻_0\): \(\bar{X}_{G} - \bar{X}_{M}\) ≥ 0 versus \(𝐻_1\): \(\bar{X}_{G} - \bar{X}_{M}\) < 0
📜 Show answer
less time than ➡️ one-tail test
Let's express "Gotham have been living in their current homes for less time than families in Metropolis have" in symbols!
- Gotham: \(\bar{X}_{G}\) = 35 months, \(s_{G}^2\) = 900
- Metropolis: \(\bar{X}_{M}\) = 50 months, \(s_{M}^2\) = 1050
\(𝜇_{G}\) < \(𝜇_{M}\)
Hypotheses are always stated about the population, and this claim goes in \(H_1\), so the answer is
a) \(𝐻_0\): \(𝜇_{G} - 𝜇_{M}\) ≥ 0 versus \(𝐻_1\): \(𝜇_{G} - 𝜇_{M}\) < 0
6. Referring to Table B, what is an unbiased point estimate for the mean of the sampling distribution of the difference between the 2 sample means?
a. – 22
b. – 10
c. – 15
d. 0
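📜 Show answer
📌 point estimate:
- the difference between the two sample means, used as the estimate of the difference between the population means
35 - 50 = -15, so the answer is c.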
7. Referring to Table B, what is(are) the critical value(s) of the relevant hypothesis test if the level of significance is 0.05?
a) t ≅ Z = – 1.645
b) t ≅ Z = ± 1.96
c) t ≅ Z = – 1.96
d) t ≅ Z = – 2.080
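📜 Show answer
t ≅ Z -> this is a t-test, but the critical value is read from the Z distribution
➡️ Once the sample size is large, the t distribution is nearly identical to the standard normal distribution
➡️ Here df = 150 + 100 - 2 = 248, well above 100, so using the Z value is acceptable.
The alternative hypothesis is lower-tail (\(𝜇_{G} - 𝜇_{M}\) < 0), so with α = 0.05 the answer is a) t ≅ Z = – 1.645.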
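The Z approximation of this critical value can be reproduced without a table. A minimal sketch, assuming the large-sample approximation t ≅ Z described above; the helper name `lower_tail_critical_z` is hypothetical.

```python
from statistics import NormalDist


def lower_tail_critical_z(alpha: float) -> float:
    """Z value that leaves an area of alpha in the lower tail."""
    return NormalDist().inv_cdf(alpha)


critical_z = lower_tail_critical_z(0.05)
print(round(critical_z, 3))  # -1.645
```

For a two-tail test the same function would be called with alpha / 2 and the result used with both signs.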
8. Referring to Table B, suppose 𝛼 = 0.05. Which of the following represents the result of the relevant hypothesis test?
a) The alternative hypothesis is rejected.
b) The null hypothesis is rejected.
c) The null hypothesis is not rejected.
d) Insufficient information exists on which to make a decision.
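📜 Show answer
This is case 2: the Pooled-Variance t-test. With \(S_p^2 = \dfrac{99(900) + 149(1050)}{248} \approx 990.12\), the test statistic is
\(t = \dfrac{(35 - 50) - 0}{\sqrt{990.12\left(\frac{1}{100} + \frac{1}{150}\right)}} \approx -3.69\).
The rejection region is t < -1.645,
and -3.69 falls inside it, so we reject \(H_0\).
So the answer is
b) The null hypothesis is rejected.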
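The pooled-variance calculation for Table B can be sketched in a few lines of Python. The function name `pooled_variance_t` and its parameter names are hypothetical; the inputs are the Table B summary statistics, and the critical value is the Z approximation from problem 7.

```python
import math


def pooled_variance_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t statistic, degrees of freedom) for the pooled-variance t-test."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = ((mean1 - mean2) - hypothesized_diff) / std_error
    return t_stat, df


# Gotham: mean 35, variance 900, n = 100; Metropolis: mean 50, variance 1050, n = 150
t_stat, df = pooled_variance_t(35, 900, 100, 50, 1050, 150)
critical_value = -1.645           # lower-tail critical value at alpha = 0.05 (t ≅ Z)
reject_h0 = t_stat < critical_value
print(round(t_stat, 2), df, reject_h0)  # -3.69 248 True
```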
9. Referring to Table B, suppose 𝛼 = 0.05. Which of the following represents the correct conclusion?
A) There is not enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
B) There is enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
C) There is not enough evidence that, on average, families in Gotham have been living in their current homes for no less time than families in Metropolis have.
D) There is enough evidence that, on average, families in Gotham have been living in their current homes for no less time than families in Metropolis have.
📜 Show answer
The null hypothesis is rejected, so
the evidence supports the alternative hypothesis:
B) There is enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
10. Referring to Table B, what is(are) the critical value(s) of the relevant hypothesis test if the level of significance is 0.01?
A. t ≅ Z = – 1.96
B. t ≅ Z = ± 1.96
C. t ≅ Z = – 2.080
D. t ≅ Z = – 2.33
📜 Show answer
For a lower-tail test with α = 0.01 the critical Z value is -2.326, so
D. t ≅ Z = – 2.33
11. Referring to Table B, suppose 𝛼 = 0.01. Which of the following represents the result of the relevant hypothesis test?
A. The alternative hypothesis is rejected.
B. The null hypothesis is rejected.
C. The null hypothesis is not rejected.
D. Insufficient information exists on which to make a decision.
📜 Show answer
The test statistic is -3.69, and the z-table gives a p-value of 0.00011 for it.
Because this is a one-tail test, we do not double it.
This value is smaller than 𝛼 = 0.01, so we reject \(H_0\).
B. The null hypothesis is rejected.
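The one-tail p-value can also be checked numerically. A minimal sketch, assuming the Z approximation is used for the test statistic -3.69 from problem 8; the variable names are illustrative.

```python
from statistics import NormalDist

z_stat = -3.69   # test statistic from problem 8
alpha = 0.01

# Lower-tail test: the p-value is the area to the left of z_stat (not doubled)
p_value = NormalDist().cdf(z_stat)
reject_h0 = p_value < alpha
print(f"{p_value:.5f}", reject_h0)
```

The printed p-value is about 0.00011, far below 0.01, which matches the decision to reject the null hypothesis.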
12. Referring to Table B, suppose 𝛼 = 0.01. Which of the following represents the correct conclusion?
A. There is not enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
B. There is enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
C. There is not enough evidence that, on average, families in Gotham have been living in their current homes for no less time than families in Metropolis have.
D. There is enough evidence that, on average, families in Gotham have been living in their current homes for no less time than families in Metropolis have.
📜 Show answer
From problem 11, the answer is B. There is enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
[TABLE H]
A problem with a telephone line that prevents a customer from receiving or making calls is disconcerting to both the customer and the telephone company. The data on samples of 20 problems reported to two different offices of a telephone company and the time to clear these problems (in minutes) from the customers’ lines are collected. Below is the Excel output to see whether there is evidence of a difference in the mean waiting time between the two offices assuming that the population variances in the two offices are not equal.
13. Referring to Table H, which of the following represents the relevant hypotheses tested by the telephone company?
a) \(𝐻_0\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) ≥ 0 versus \(𝐻_1\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) > 0
b) \(𝐻_0\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) ≤ 0 versus \(𝐻_1\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) < 0
c) \(𝐻_0\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) = 0 versus \(𝐻_1\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) ≠ 0
d) \(𝐻_0\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) ≠ 0 versus \(𝐻_1\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) = 0
📜 Show answer
First, this case calls for the Separate-variance t-test:
- σ is unknown
- variances in the two offices are not equal.
Now for the hypotheses:
"there is evidence of a difference" ➡️ the question is whether the means differ or not, so this is a two-tail test.
So, c) \(𝐻_0\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) = 0 versus \(𝐻_1\): \(𝜇_{𝐼} - 𝜇_{𝐼𝐼}\) ≠ 0.
14. Referring to Table H, what is(are) the critical value(s) of the relevant hypothesis test if the level of significance is 0.05?
a) 1.6860
b) ± 1.6860
c) 2.0244
d) ± 2.0244
📜 Show answer
This is a two-tail test, so
each tail holds α/2 = 0.025.
With df = 38, the t-table gives
a critical value of about 2.024, and it applies to both tails, so
d) ± 2.0244
15. Referring to Table H, what is(are) the critical value(s) of the relevant hypothesis test if the level of significance is 0.10?
a) 1.6860
b) ± 1.6860
c) 2.0244
d) ± 2.0244
📜 Show answer
Using the same approach as in problem 14, the answer is b) ± 1.6860.
17. Referring to Table H, what is the value of the test statistic?
a) 0.2025
b) 0.3544
c) 2.0115
d) 2.2140
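📜 Show answer
Using the separate-variance t statistic
\(t = \dfrac{(\bar{X}_{I} - \bar{X}_{II}) - (\mu_{I} - \mu_{II})}{\sqrt{\frac{S_{I}^2}{n_{I}} + \frac{S_{II}^2}{n_{II}}}}\)
with the values from the Excel output, the answer is b) 0.3544.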
18. Referring to Table H, suppose 𝛼 = 0.10. Which of the following represents the result of the relevant hypothesis test?
a) The alternative hypothesis is rejected.
b) The null hypothesis is rejected.
c) The null hypothesis is not rejected.
d) Insufficient information exists on which to make a decision.
📜 Show answer
The test statistic 0.3544 lies between the critical values ± 1.6860, so
c) The null hypothesis is not rejected.
19. Referring to Table H, suppose 𝛼 = 0.05. Which of the following represents the result of the relevant hypothesis test?
a) The alternative hypothesis is rejected.
b) The null hypothesis is rejected.
c) The null hypothesis is not rejected.
d) Insufficient information exists on which to make a decision.
📜 Show answer
The test statistic 0.3544 lies between the critical values ± 2.0244, so
c) The null hypothesis is not rejected.
20. Referring to Table H, suppose 𝛼 = 0.10. Which of the following represents the correct conclusion?
a) There is not enough evidence of a difference in the mean waiting time between the two offices.
b) There is enough evidence of a difference in the mean waiting time between the two offices.
c) There is not enough evidence that the mean waiting time between the two offices are the same.
d) There is enough evidence that the mean waiting time between the two offices are the same.
📜 Show answer
The null hypothesis is not rejected, so the answer is
a) There is not enough evidence of a difference in the mean waiting time between the two offices.
21. Referring to Table H, suppose 𝛼 = 0.05. Which of the following represents the correct conclusion?
a) There is not enough evidence of a difference in the mean waiting time between the two offices.
b) There is enough evidence of a difference in the mean waiting time between the two offices.
c) There is not enough evidence that the mean waiting time between the two offices are the same.
d) There is enough evidence that the mean waiting time between the two offices are the same.
📜 Show answer
As in problem 20,
a) There is not enough evidence of a difference in the mean waiting time between the two offices.
22. The production manager at a battery factory wants to determine whether there is any difference in the mean life expectancy of batteries manufactured on two different types of machines. A random sample of 25 batteries from machine A indicates a sample mean of 250 hours and a population standard deviation of 100 hours, and a similar sample of 25 from machine B indicates a sample mean of 242 hours and a population standard deviation of 75 hours. Using 0.05 Sig level, is there any evidence of a difference in the mean life of batteries produced by the two types of machines?
📜 Show answer
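Because problem 22 gives population standard deviations, it falls under the σ-known case (Z-test). The following is a hedged sketch of that calculation rather than an official solution; the function name `two_sample_z` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two independent means (sigma known)."""
    std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / std_error


# Machine A: mean 250, sigma 100, n = 25; Machine B: mean 242, sigma 75, n = 25
z_stat = two_sample_z(250, 100, 25, 242, 75, 25)
critical_z = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tail test at alpha = 0.05
reject_h0 = abs(z_stat) > critical_z
print(round(z_stat, 2), round(critical_z, 2), reject_h0)  # 0.32 1.96 False
```

Under these assumptions the statistic stays well inside ±1.96, so the null hypothesis of equal mean life would not be rejected.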
23. The production manager at a battery factory wants to determine whether there is any difference in the mean life expectancy of batteries manufactured on two different types of machines. A random sample of 25 batteries from machine A indicates a sample mean of 250 hours and a sample standard deviation of 100 hours, and a similar sample of 25 from machine B indicates a sample mean of 242 hours and a sample standard deviation of 75 hours. Using 0.05 Sig level, and assuming that the population variances are equal, is there any evidence of a difference in the mean life of batteries produced by the two types of machines?
📜 Show answer
24. The production manager at a battery factory wants to determine whether there is any difference in the mean life expectancy of batteries manufactured on two different types of machines. A random sample of 25 batteries from machine A indicates a sample mean of 250 hours and a sample standard deviation of 100 hours, and a similar sample of 25 from machine B indicates a sample mean of 242 hours and a sample standard deviation of 75 hours. Using 0.05 Sig level, and assuming that the population variances are not equal, is there any evidence of a difference in the mean life of batteries produced by the two types of machines?
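For problem 24 the variances are assumed unequal, so the separate-variance t-test applies. The sketch below is illustrative only, not an official solution: the function name `separate_variance_t` is hypothetical, and the degrees of freedom use the Welch-Satterthwaite approximation, which this post does not derive.

```python
import math


def separate_variance_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (t statistic, approximate df) for the separate-variance t-test."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    t_stat = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df


# Machine A: mean 250, s = 100, n = 25; Machine B: mean 242, s = 75, n = 25
t_stat, df = separate_variance_t(250, 100, 25, 242, 75, 25)
print(round(t_stat, 2), round(df, 1))  # 0.32 44.5

# Any two-tail t critical value at alpha = 0.05 is larger than the Z value 1.96,
# so a statistic this small cannot fall in the rejection region.
cannot_reject = abs(t_stat) <= 1.96
print(cannot_reject)  # True
```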