Wk8 / Case Study & Problems / Q8 : Doubt

In the solution to Q8, the idea is:
If x is the current score, find all scores greater than x, average them, and then report that Sachin would score (average_of_subset - x) more runs.
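
To make sure I am reading the solution correctly, here is a minimal sketch of that idea as I understand it. The function name and the sample scores are my own and purely illustrative; they are not from the course notebook or Sachin's actual data.

```python
from statistics import mean

def expected_additional_runs(scores, x):
    """Average of all scores strictly greater than x, minus x."""
    higher = [s for s in scores if s > x]
    if not higher:
        return None  # no innings went past x, so there is nothing to average
    return mean(higher) - x

# Hypothetical innings scores, made up for illustration only.
sample_scores = [0, 4, 12, 25, 40, 60, 100]

# Scores above 10 are 12, 25, 40, 60, 100; their mean is 47.4,
# so the estimate is roughly 37.4 more runs.
print(expected_additional_runs(sample_scores, 10))
```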

My doubt is:
Wouldn’t it be more appropriate to split the mean into several means, one per histogram bar, and then estimate?

E.g.
Sachin’s histogram is:
(array([99, 36, 28, 16, 11, 17, 8, 8, 1, 1]), array([ 0. , 18.6, 37.2, 55.8, 74.4, 93. , 111.6, 130.2, 148.8, 167.4, 186. ]))
Thus the midpoint of each bar, which I take as its average, is:
[ 9.3 27.9 46.5 65.1 83.7 102.3 120.9 139.5 158.1 176.7]

So, if Sachin walks in to bat (x = 0), he is likely to score 9 runs.
If he crosses 9, the next mean is 27. So if the input is x = 10, Sachin would score 27 - 10 = 17 more runs.
If he crosses 27, the next mean is 46. So if the input is x = 30, Sachin would score 46 - 30 = 16 more runs.
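
The worked example above can be sketched as follows. This is only an illustration of the binned idea, using the bin edges from the histogram in this post; the function name is my own. It uses the exact midpoints (27.9, 46.5, ...) rather than the truncated 27 and 46, so the results differ slightly from the hand-computed 17 and 16.

```python
from bisect import bisect_right

# Bin edges copied from the histogram output in this post.
edges = [0.0, 18.6, 37.2, 55.8, 74.4, 93.0, 111.6, 130.2, 148.8, 167.4, 186.0]

# Midpoint of each bar: the average of its left and right edge.
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

def runs_to_next_midpoint(x):
    """Distance from x to the first bar midpoint strictly greater than x."""
    i = bisect_right(midpoints, x)
    if i == len(midpoints):
        return None  # x is already past the last midpoint
    return midpoints[i] - x

for x in (0, 10, 30):
    print(x, runs_to_next_midpoint(x))
```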

Why don’t we use this idea of subtracting the input x from the average of the next bar of the histogram?

Thanks,

Hi @pandurang,
Will check this out and get back to you soon.

@Ishvinder,
Could you take a look at this question?
I understand we are in lockdown, and I am not in a hurry.
Just checking whether you got a chance to look at it.

Hi @pandurang,
Really sorry, but this thread slipped my mind…

Coming to your point: the approach you describe is also a valid solution to the problem, but as I understand it, it has a couple of flaws:

  1. We are considering the number of runs Sachin will score “on average” after scoring ‘x’ runs. If we solve this by binning the values, the problem changes from “on average” to “as an estimate”.
  2. Treating the problem as an estimate, we cannot take the max value of each bin and use it as the estimated score. (In my opinion, the median score of each bin would be a better option.)

I agree with your point. The solution to a problem like this varies from person to person, which happens in many real-world scenarios as well.

Hope this helps; we can continue the discussion in this thread.
Sorry for this long wait of 9 days :slight_smile:

@Ishvinder, no problem at all!

Your arguments are valid too :slight_smile:

Regarding 1)
I agree that with binning it would be more of an estimate. The question would then become: “How many more runs is Sachin likely to score after having scored x runs?”

Regarding 2)
My suggestion was to take the mean of the bin. Perhaps the median score is more suitable.
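
For anyone comparing the two options, here is a rough sketch of how the mean and median of a single bin could be computed from raw scores. The scores and the function name are hypothetical, made up only to show that the two can differ noticeably (and differ from the bin midpoint) when scores inside a bin are skewed.

```python
from statistics import mean, median

def summarize_bin(scores, lo, hi):
    """Mean and median of the scores falling in the half-open bin [lo, hi)."""
    in_bin = [s for s in scores if lo <= s < hi]
    if not in_bin:
        return None  # empty bin, nothing to summarize
    return mean(in_bin), median(in_bin)

# Hypothetical scores, made up for illustration only.
sample_scores = [0, 1, 2, 4, 18, 20, 25, 36]

# First bin from the histogram in this thread: [0, 18.6), midpoint 9.3.
# The scores in it are 0, 1, 2, 4, 18, so the mean is 5 and the median is 2.
print(summarize_bin(sample_scores, 0, 18.6))
```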

Thanks for answering my doubts.
We could close this thread. :+1: