# Week 20: Chi-Square distribution

To clarify, does the variable “n” in Estimation[S2]-Exercise refer to the sample size?

I think ‘n’ refers to number of samples.

Yes @Abhineet is correct, n here refers to the number of samples.
“Estimating E[S2] Exercise- 4:30:00”

I suppose there is a slight confusion regarding the naming convention. If you consider the image from the video Estimating[S2], the use of n is somewhat arbitrary.

Thank you.

Yes, this is what I was trying to refer to…
The number of samples collected within a sample.

I am unable to understand. Please explain with an example if suitable.
In the case of Variance(Xbar) = (sigma ** 2) / n, the variance of the sample means, the n here refers to the number of samples, right?

Yes, that is correct.

Just to recap, number of samples != size of each sample.
I would suggest you go through the videos once again; it will definitely become clear.
I’m also getting a bit confused now, will check once again.

My confusion still persists. The given image shows the empirical proof that Var(Xbar) is inversely proportional to the sample size (n), but n here denotes the number of samples, and that is proved mathematically too.

Okay, it’s fine… Just before coming to a conclusion, can you simplify your query?
Just to make sure we are all on the same page…
Just state the assumptions that you have, and the doubt regarding them.

I do have 2 confusions:

• What do the notations in Var(Xbar) refer to? How do we prove it mathematically?

• In E[sigma2 - S2], what does n refer to?

1. The notation for Var(Xbar) is just Var(Xbar), nothing else. What exactly do you want to prove mathematically here? Is it the formula? If yes, then Pratyush sir has already explained its derivation, as also shown in your screenshot.

2. Please do not include sigma^2 inside E, as sigma is assumed to be a given value: it is the standard deviation of the entire population. Coming to your question about ‘n’, it represents the total number of samples at all times, at least in the topic of the Chi-Square distribution.
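
One way to probe what n does in E[S2] is a small simulation. The following is only a sketch: the thread does not state the formula for S2, so it assumes the usual sample variance with divisor n - 1, computed from n throws of a fair die. The function name, the repeat count, and the seed are illustrative choices, not something from the course material.

```python
import random
import statistics

def mean_sample_variance(n, num_samples=5000, seed=1):
    """Average of S^2 (divisor n - 1) over num_samples groups of n die throws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        throws = [rng.randint(1, 6) for _ in range(n)]
        total += statistics.variance(throws)  # statistics.variance divides by n - 1
    return total / num_samples

# Variance of one throw of a fair die: E[X^2] - E[X]^2 = 91/6 - 49/4 = 35/12.
SIGMA_SQ = 35 / 12

for n in (2, 5, 20):
    print(f"n={n:2d}  mean S^2={mean_sample_variance(n):.4f}  sigma^2={SIGMA_SQ:.4f}")
```

Under these assumptions the average of S2 should stay near sigma^2 for every n tried, where n counts the throws that go into one value of S2.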

Hi @Abhineet,

thank you for the help.

• By notations, I refer to the variables used in the formula. As stated in the screenshots above, n represents sample size while trying to empirically compute the formula, whereas in mathematical proof n is considered to be the number of samples.

Could you please elaborate on the 2nd point stated above?

Thank You,

I will have to disagree with your point, because even in your screenshots ‘n’ only represents the number of samples, and if sir used the term sample size for ‘n’ during the explanation, then we can treat it as human error. We have no use for each sample’s individual size because we don’t need the mean value of every sample (X1bar, X2bar, …).

What exactly is your doubt in the second point? Is it still about ‘n’ or something else?

Please refer to exercise part 1: there the number of samples is fixed at 1000 and the sample size is varied, which is what ‘n’ represents.
If you look at the screenshot above, it states that Xbar = average of n throws of a die, implying that n is the sample size and not the number of samples (which is 1000 here).
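
The setup described here (1000 values of Xbar, each the average of n throws of a die) can be tried out directly. This is a minimal sketch and not the course's own code; the function name, the values of n, and the seed are assumptions made for illustration.

```python
import random
import statistics

def variance_of_sample_means(n, num_samples=1000, seed=0):
    """Empirical variance of Xbar, where each Xbar averages n throws of a fair die."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.randint(1, 6) for _ in range(n)])
        for _ in range(num_samples)  # the count of Xbar values stays fixed at 1000
    ]
    return statistics.pvariance(means)

# Variance of one throw of a fair die: E[X^2] - E[X]^2 = 91/6 - 49/4 = 35/12.
SIGMA_SQ = 35 / 12

for n in (1, 3, 10, 30):
    empirical = variance_of_sample_means(n)
    print(f"n={n:2d}  empirical={empirical:.4f}  sigma^2/n={SIGMA_SQ / n:.4f}")
```

In this sketch only the number of throws per average changes between runs, so any change in the spread of Xbar comes from that n and not from the fixed count of 1000.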

Besides, Xbar denotes the sample mean, hence the use of X1bar, X2bar and so on as we iterate over all the samples. Hope my query is now clear to you.

See, sir has also mentioned it as 1000 SAMPLES. So, ‘n’ means number of samples.

Also, Xbar represents the mean of all samples, i.e. the mean of (X1, X2, …, Xn) collectively, and not an individual sample mean.
Hope this clears the misconception…

Yes, what you have said is exactly right… Xbar is the mean of all the sample means… and as per my understanding I can say:

1. n = size of each sample (therefore sir considered n = 3)
2. k = number of samples (but the “k” notation is not mentioned anywhere in the above formulae, so don’t get confused by this point)
3. Items in sample (i.e. X1, X2, X3, X4, …, X1000)

As for the population considered:

1. Population size is N…

There is a lot of difference between n and N…

If I am wrong anywhere, please correct me.

thank you

@swaroopyadav49, you are right about Xbar but not for ‘n’ and ‘k’.
‘n’ is not the size of each sample; it represents the number of samples. The size of each sample is 1, because a single sample in this example is defined as a single throw of a die. Hence X1, X2, …, X1000 are not items in a sample but the samples themselves (1000 of them). Also, it is best if we do not mention ‘N’ (the population, or the total number of times the die is thrown). We are not using ‘N’ in this study, so it’s better to omit extra things than to create more confusion because of them. Coming to ‘k’, it is not the number of samples; it is the degrees of freedom (DOF). This is easy to understand if you have followed sir’s explanation. Initially, the values of ‘k’ and ‘n’ seem to coincide, but that is not necessarily always the case.

Probably too late, but my understanding:

Let the population size (N) be 1000. Also, let a total of 40 samples be drawn such that each sample is of size 5 (number of elements per sample); then n = sample size = 5. The expectation and variance formulae follow from it.

• Sample mean: $\bar{X} = \frac{x_1+x_2+x_3+x_4+x_5}{5}$
• $E(\bar{X}) = E\left(\frac{x_1+x_2+x_3+x_4+x_5}{5}\right) = \frac{E(x_1)+E(x_2)+E(x_3)+E(x_4)+E(x_5)}{5} = \frac{\mu+\mu+\mu+\mu+\mu}{5} = \frac{5\mu}{5} = \mu$
• Note above that $E(x_1) = E(x_2) = \dots = E(x_i) = \mu$

Similarly:

• $Var(\bar{X}) = Var\left(\frac{x_1+x_2+x_3+x_4+x_5}{5}\right) = \frac{Var(x_1)+Var(x_2)+Var(x_3)+Var(x_4)+Var(x_5)}{5^2} = \frac{\sigma^2+\sigma^2+\sigma^2+\sigma^2+\sigma^2}{5^2} = \frac{5\sigma^2}{5^2} = \frac{\sigma^2}{5} = \frac{\sigma^2}{n}$, with $n = 5$
• Again note that $Var(x_1) = Var(x_2) = \dots = Var(x_i) = \sigma^2$; the variances simply add because the draws are taken to be independent.

Basically, for proving $Var(\bar{X}) = \frac{\sigma^2}{n}$, n is the sample size, i.e. the number of elements in each sample (not the number of samples).
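
The numbers in this example (N = 1000, samples of size 5) can be checked empirically. This is a rough sketch under stated assumptions: the population values are made up, elements are drawn with replacement so that each draw is independent with variance sigma^2, and the name `var_of_means` and the seed are illustrative.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of N = 1000 values (the values themselves are made up).
population = [rng.uniform(0, 100) for _ in range(1000)]
sigma_sq = statistics.pvariance(population)

def var_of_means(num_samples, n):
    """Variance of num_samples sample means, each from n draws with replacement."""
    means = [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(num_samples)
    ]
    return statistics.pvariance(means)

n = 5
print(f"sigma^2 / n = {sigma_sq / n:.2f}")
for num_samples in (40, 4000, 40000):
    estimate = var_of_means(num_samples, n)
    print(f"{num_samples:6d} samples of size {n}: Var(Xbar) ~ {estimate:.2f}")
```

With only 40 samples the estimate is noisy. Drawing more samples should make it steadier around sigma^2 / 5, while the divisor stays tied to the size of each sample.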

No. Give it a thought: if your above interpretation was correct, could you substitute $Var(X_1)$ etc. with $\sigma^2$? $X_1$, $X_2$, etc. represent individual elements of a sample, and hence $Var(X_i) = \sigma^2$.

Sorry, I wrongly said that x1, x2, x3, …, x1000 are items in a sample… but what you said about this point is 100% right, i.e. x1, x2, …, x1000 is the number of samples…

Thank you for identifying my mistake…
