목차
👀, 🤷♀️ , 📜, 📝
이 아이콘들을 누르시면 정답, 개념 부가 설명을 보실 수 있습니다:)
INTRO
목표: 데이터 전체의 분포 Distributions을 본다.
[Definitions]
• Discrete: variables produce outcomes that come from a counting process
* e.g. number of classes you are taking
• Continuous: variables produce outcomes that come from a measurement
* e.g. your annual salary, or your weight
[Types Of Variables]
Discrete Variables
[A probability distribution for a discrete variable]
- 이산형 변수에 대한 확률 분포
- a mutually exclusive listing of all possible numerical outcomes for that variable and a probability of occurrence associated with each outcome
[Expected Value (or mean)]
📝 예시 보기
[Variance]
기존과 차이점: 그 사건이 일어날 확률 weight(\(P(X=x_i)\))를 곱해줌
- \(E(X)\) = Expected value of the discrete variable X
- \(x_i\) = the \(i^{th}\) outcome of X
- \(P(X=x_i)\) = Probability of the \(i^{th}\) occurrence of X
[Standard Deviation]
📝 예시 보기
Probability Distributions
- +) Bernoulli distribution(베르누이 분포):경우의 수가 2가지
- Binomial(이항 분포): 베르누이 분포 반복시행
- ex) A marketing research firm receives survey responses of “yes I will buy” or “no I will not”
- ex) Suppose the event of interest is obtaining heads on the toss of a fair coin. You are to toss the coin three times. In how many ways can you get two heads?
- Possible ways: HHT, HTH, THH, so there are three ways
you can getting two heads.
- Possible ways: HHT, HTH, THH, so there are three ways
- Poisson(포아송 분포): 일어날 확률이 드문 경우에 씀
Bernoulli distribution
성공확률이 θ∈[0,1]인 시행을 실시해,
- 성공이 관측되면 1
-
실패가 관측되면 0
- 이항분포의 특별한 경우인 B(1, θ)
- 어떤 시행의 결과가 ‘성공’,’실패’인 시행
확률질량함수는 \(p(x) =P(X=x) =θ^x(1−θ)^1−x\), \(x= 0,1\)
[베르누이 시행Bernoulli trial]
베르누이 실험을 독립적으로 반복하여 시행하는 과정
이때, 확률변수 X가 모수 p인 베르누이 분포를 따른다고 했을 때,
Mean
\(E(X) =θ\)
\(E(X) = (1 - θ) * 0 + θ * 1 = θ\)
Variance & Standard Deviation
\(Var(X) =θ(1−θ)\)
\(Var(X) = E(X^2) - {E(X)}^2\),
\(= p - p^2\),
\(= p(1-p)\),
- \(E(X^2) = (1 - θ) * 0^2 + θ * 1^2 = θ\),
Binomial Distributions
이항분포다.
[Formula]
이 공식을 미친듯이 외우기 보단 그냥 상식적으로 생각해보면 된다.
성공확률이θ로 동일한 베르누이 시행을 n번 독립적으로 실시할 때,
- 성공횟수X의 분포를 이항분포라고 한다.
- P(X) = probability of X successes in n trials, with probability of success p on each trial
- X = number of ‘successes’ in sample, (X = 0, 1, 2, …, n)
- n = sample size (number of trials or observations)
- π = probability of “success”
[Distribution Shape]
Here, n = 5 and π = .1
Here, n = 5 and π = .5
Mean
Variance & Standard Deviation
- n = sample size
- π = probability of the event of interest for any trial
- (1 – π) = probability of no event of interest for any trial
📝 예시 보기
📝 예시 문제 보기
ex0)
동전을 두 개 던질 때 둘 다 앞면이 나올 확률은θ= 1/4이다.
이러한 시행을 8회 실시해 둘 다 앞면이 나온 횟수를X라 하면X∼B(8,0.25)가 되며,
기대값과 분산은 각각?
- E(X) = 8×0.25 = 2,
- Var(X) = 8×0.25×0.75 = 1.5이다.
X의 값이 3이상이 될 확률은?
- P(X≥3) = 1−P(X≤2) = 1−P(X= 0)−P(X= 1)−P(X= 2) = 0.3216
ex1)
ex2) What is the probability of one success in five observations if the probability of an event of interest is 0.1?
- x = 1, n = 5, and π = 0.1
ex3)
- x = 2, n = 10, and π = 0.02
Poisson Distributions
= 포아송 분포
계수(count)데이터의 확률분포를 모형화하는 데 유용한 분포
되게 희박한 경우에 쓰임
평균이 주어질때 많이 쓰임
[쓰는 경우]
- You wish to count the number of times an event occurs in a given area of opportunity
- The probability that an event occurs in one area of opportunity is the same for all areas of opportunity
- The number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity
- The probability that two or more events occur in an area of opportunity approaches zero as the area of opportunity becomes smaller
- The average number of events per unit is lambda
[ex]
- 책에서 발견되는 오타 수
- 치킨을 먹다가 발견되는 머리카락 수
- 커피를 마시다가 실수로 쏟는 횟수
- 기계가 고장나는 횟수
[Formula]
- x = number of events in an area of opportunity, 실제로 발생한 횟수
- λ = expected number of events, 단위 시간(또는공간) 내에 평균 발생 횟수 즉, np
- e = base of the natural logarithm system (2.71828…)
🔥 조건) λ >0
📝 이항분포 ➡️ 포아송비
- n과 p를 λ 파라미터 하나로 바꾼다.
- n이 무한대로 갈 때 이항 분포의 확률 밀도 함수의 극한값을 도출한다.(매우 의박한 상황에 쓰이는 것 이므로)
Mean
Variance & Standard Deviation
- λ = expected number of events
📝 예시 보기
📝 예시 문제 보기
ex0)
경부고속도로 양재-오산 구간에서 하루 평균 0.2회의 교통사고가 일어난다고한다.
이 구간에서 하루에 발생하는 교통 사고 횟수를X라 하자.
내일 이 구간에서2건 이상 교통사고가 발생할 확률을 예측해보면,
문제
1. A probability distribution is an equation that
a) associates a particular probability of occurrence with each outcome in the sample space.
b) measures outcomes and assigns values of X to the simple events.
c) assigns a value to the variability in the sample space.
d) assigns a value to the center of the sample space.
📜 정답 보기
a)
[TABLE 5]
The following table contains the probability distribution for X = the number of retransmissions necessary to successfully transmit a 1024K data package through a double satellite media.
2. Referring to Table 5, the probability of at least one retransmission is ________.
📜 정답 보기
least one: 이 표현 유의해서 보자!!
적어도 1이상이니 p(1)+p(2)+p(3) = 0.65
3. Referring to Table 5, the mean or expected value for the number of retransmissions is ________.
📜 정답 보기
4. Referring to Table 5, the standard deviation of the number of retransmissions is ________.
📜 정답 보기
5. The on-line access computer service industry is growing at an extraordinary rate. Current estimates suggest that only 20% of the home-based computers have access to on-line services. This number is expected to grow quickly over the next 5 years. Suppose 20 people with home-based computers were randomly and independently sampled. Find the probability that fewer than 10 of those sampled currently have access to on-line services.
📜 정답 보기
20% -> p = 0.2
20 people -> n = 20
fewer than 10 of those sampled -> x = 1~9
= 0.9974
6. In a game called Taxation and Evasion, a player rolls a pair of dice. If on any turn the sum is 7, 11, or 12, the player gets audited. Otherwise, she avoids taxes. Suppose a player takes 5 turns at rolling the dice. The probability that she gets audited once is ________.
📜 정답 보기
a pair of dice(2개의 주사위)의 합이 7이 나올 확률 -> p = 23%
takes 5 turns -> n = 5
한번 나오고 나머지는 안나오는 확률 모두 곱하고 그 한번이 5번중 어디 나올지 모르니까 5까지 곱하는 식을 세우는 것이다.
7. The quality control manager of Marilyn’s Cookies is inspecting a batch of chocolate chip cookies. When the production process is in control, the average number of chocolate chip parts per cookie is 6.0. What is the probability that any particular cookie being inspected has 4.0 chip parts?
📜 정답 보기
평균이 주어졌으니 포아송 문제이다
- λ = 6
- x = 4
공학용 계산기를 이용하여 계산하자!
= 0.1339
8. The Department of Commerce in a particular state has determined that the number of small businesses that declare bankruptcy per month is approximately a Poisson distribution with a mean of 6.4. Find the probability that more than 3 bankruptcies occur next month.
📜 정답 보기
평균이 주어졌으니 포아송 문제이다
- λ = 6.4
- more than x = 4~ 한계를 모름 -> 전체에서 1,2,3을 빼줘야 함
공학용 계산기를 이용하여 계산하자!
if \(e^{-6.4}=0.001694\),
= 0.881
9. What type of probability distribution will the consulting firm most likely employ to analyze the insurance claims in the following problem?
An insurance company has called a consulting firm to determine if the company has an unusually high number of false insurance claims. It is known that the industry proportion for false claims is 3%. The consulting firm has decided to randomly and independently sample 100 of the company’s insurance claims. They believe the number of these 100 that are false will yield the information the company desires.
a) binomial distribution.
b) Poisson distribution.
c) none of the above.
📜 정답 보기
a)
포아송은 평균인 람다가 주어져야함
10. (T/F) One difference between the binomial distribution and Poisson distribution is that the binomial’s upper bound is the number of trial while the Poisson has no particular upper bound.
📜 정답 보기
F