probability problem

Get your science fix here: research, quackery, activism and all the rest
Allo V Psycho
Catbabel
Posts: 791
Joined: Sat Nov 16, 2019 8:18 am

probability problem

Post by Allo V Psycho » Tue Dec 10, 2024 8:59 am

I'm fairly mathematically literate, but I have a blind spot for probability! If anyone can help me solve this puzzle, I'd be very grateful.

An exam has 3000 items in the bank. From this bank are drawn 200 items for an individual exam. A candidate knows 60% of the material in the bank, which is also the pass mark for the exam. What's the chance that this candidate will nonetheless fail the exam (i.e. get a score below 60%)?

dyqik
Princess POW
Posts: 8172
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: probability problem

Post by dyqik » Tue Dec 10, 2024 1:18 pm

Each question in the bank is either known ("green") or not known to the candidate ("red"). So the problem is now "how many ways are there of drawing without replacement at least 120 green balls in 200 trials from a bag of 3000 balls, 1800 of which are green."
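A quick Monte Carlo sketch of this framing, using only Python's standard library. It treats 1800 of the 3000 bank items as known ("green"), draws 200 items without replacement with `random.sample`, and counts how often fewer than 120 known items turn up. The trial count and seed are arbitrary illustrative choices, and the result is an estimate, not an exact answer.

```python
import random

BANK_SIZE = 3000   # items in the question bank
KNOWN = 1800       # items the candidate knows (60% of the bank)
EXAM_SIZE = 200    # items drawn for one exam
PASS_MARK = 120    # 60% of 200

def simulate_fail_rate(trials: int, seed: int = 1) -> float:
    """Estimate P(score < PASS_MARK) by simulating exams drawn without replacement."""
    rng = random.Random(seed)
    bank = range(BANK_SIZE)  # ids 0..KNOWN-1 are "green" (known), the rest "red"
    fails = 0
    for _ in range(trials):
        exam = rng.sample(bank, EXAM_SIZE)  # 200 distinct items, no replacement
        score = sum(1 for item in exam if item < KNOWN)
        if score < PASS_MARK:
            fails += 1
    return fails / trials

estimate = simulate_fail_rate(10_000)
print(f"estimated fail rate: {estimate:.3f}")
```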

It takes some googling to find the right terms for the solution here, but what you are looking for is the hypergeometric distribution, which, in Wikipedia's words, "describes the probability of k successes (random draws for which the object drawn has a specified feature) in n draws, without replacement, from a finite population of size N that contains exactly K objects with that feature." Here N = 3000, K = 1800 and n = 200.
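The formula behind that description is P(X = k) = C(K, k) C(N - K, n - k) / C(N, n): the number of ways to pick k green and n - k red balls, divided by the number of ways to pick any n balls. A minimal Python sketch of it, using exact integer arithmetic from `math.comb` so the very large binomial coefficients cause no overflow (the name `hypergeom_pmf` is just an illustrative choice):

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k green balls) when drawing n balls without replacement
    from N balls, K of which are green."""
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    # Ways to choose k green and n-k red balls, over all ways to choose n balls.
    # Python's int / int division stays accurate even for huge integers.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# The thread's numbers: N = 3000 bank items, K = 1800 known, n = 200 drawn.
p120 = hypergeom_pmf(120, 3000, 1800, 200)
print(f"P(exactly 120 known) = {p120:.4f}")
```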

Re: probability problem

Post by dyqik » Tue Dec 10, 2024 1:26 pm

(I edited the above to change blue to green to match the explanation on the wiki page. But "what are the chances of getting blue balls?" is objectively funnier)

Re: probability problem

Post by dyqik » Tue Dec 10, 2024 1:34 pm

Oh, and to turn it into "at least j successes in n trials", you'll need to sum the probabilities of exactly k successes for every k from j to n.

Or you can reverse things to count the number of failures, and use the CDF from the wiki page ("the probability of fewer than 81 failures", i.e. at most 80 unknown items).
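A sketch of both routes described above, again assuming plain Python with `math.comb`. Route 1 sums the exact probabilities for k = 120 to 200 known items. Route 2 counts unknown ("red") items instead, where passing means at most 80 of them. The two totals should agree.

```python
from math import comb

N, K, n = 3000, 1800, 200   # bank size, known items, items per exam
TOTAL = comb(N, n)          # number of equally likely exams

def p_green(k: int) -> float:
    """P(exactly k known items on the exam)."""
    return comb(K, k) * comb(N - K, n - k) / TOTAL

def p_red(r: int) -> float:
    """P(exactly r unknown items): hypergeometric with N - K red balls."""
    return comb(N - K, r) * comb(K, n - r) / TOTAL

# Route 1: "at least 120 successes" = sum of exactly-k terms for k = 120..200.
pass_by_green = sum(p_green(k) for k in range(120, n + 1))

# Route 2: pass <=> fewer than 81 unknown items, i.e. the red-count CDF at 80.
pass_by_red = sum(p_red(r) for r in range(0, 81))

print(f"{pass_by_green:.4f} {pass_by_red:.4f}")
```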

Re: probability problem

Post by dyqik » Tue Dec 10, 2024 1:59 pm

Final thought: with the numbers you give here, for most practical purposes you can use the simpler binomial distribution (which is for the same problem, but with replacement/independent trials).
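To get a feel for how close that approximation is, here is a short comparison in plain Python: the exact hypergeometric fail probability P(X ≤ 119) next to the binomial one with p = 0.6 and 200 independent draws. The binomial ignores the finite-population effect, so its spread is slightly wider, but for these numbers the two should differ only slightly.

```python
from math import comb

N, K, n = 3000, 1800, 200
p = K / N  # 0.6, chance that each independent draw is a known item

def hypergeom_cdf(x: int) -> float:
    """P(X <= x), drawing without replacement."""
    total = comb(N, n)
    return sum(comb(K, k) * comb(N - K, n - k) for k in range(x + 1)) / total

def binom_cdf(x: int) -> float:
    """P(X <= x) for n independent draws with success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

exact = hypergeom_cdf(119)
approx = binom_cdf(119)
print(f"hypergeometric: {exact:.4f}  binomial: {approx:.4f}")
```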

IvanV
Stummy Beige
Posts: 3111
Joined: Mon May 17, 2021 11:12 am

Re: probability problem

Post by IvanV » Tue Dec 10, 2024 3:03 pm

dyqik wrote (Tue Dec 10, 2024 1:59 pm):
Final thought: with the numbers you give here, for most practical purposes you can use the simpler binomial distribution (which is for the same problem, but with replacement/independent trials).

Since the cumulative distribution function of the binomial is itself a formula involving special functions (the regularized incomplete beta function), switching to the binomial doesn't simplify the calculation much, at least not if you want an algebraically precise answer that you can actually compute in a practical amount of time. Though if you are sitting a paper exam with a formula book containing a table of cumulative binomial probabilities, that may be the only approach you can carry out in the time available, provided the table contains the values you need.

Excel with a stats add-in package (at least as installed on my computer; maybe my company paid for something) has a hypergeometric distribution function, including a logical flag for the cumulative form. So it is no harder to use the hypergeometric than the binomial, if you allow the Excel approach.

We want the sum from 120 to 200, which is 1 minus the cumulative distribution function evaluated at 119.

So I can pop the numbers in. If I have correctly interpreted the parameters it asks for, and filled them in correctly, it tells me the answer is 53.15%.

That is at least roughly what I expected, in the sense of being a bit above 50%. The reason it is a bit above 50% is that this is a discrete distribution, and each individual value near the median carries a non-negligible probability of its own. The value 120 is the mean (200 × 0.6), so it is a useful value to check. In fact, if I put 120 itself into the function, the answer falls by 5.96 percentage points. So the single value 120 has a probability of about 5.96%, and since the probability of scoring at least 121 (about 47.19%) is below 50% while the probability of scoring at least 120 is above it, 120 is the median.
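A sketch that checks these numbers in plain Python rather than Excel, assuming the same parameters: the pass probability 1 - CDF(119), the single-value probability P(X = 120), the drop when 120 is used instead of 119, and the median condition.

```python
from math import comb

N, K, n = 3000, 1800, 200

def pmf(k: int) -> float:
    """P(X = k) for the hypergeometric distribution."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def cdf(x: int) -> float:
    """P(X <= x)."""
    return sum(pmf(k) for k in range(x + 1))

p_pass = 1 - cdf(119)       # P(X >= 120)
p_120 = pmf(120)            # P(X == 120)
p_121_up = p_pass - p_120   # P(X >= 121), i.e. "putting 120 in" instead

# 120 is a median if P(X <= 120) >= 0.5 and P(X >= 120) >= 0.5.
is_median = cdf(120) >= 0.5 and p_pass >= 0.5
print(f"pass {p_pass:.4f}, P(120) {p_120:.4f}, >=121 {p_121_up:.4f}, median 120: {is_median}")
```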

I have a maths degree, but I don't think I've ever faced an exam that would require me to know how to answer this question.

Re: probability problem

Post by dyqik » Tue Dec 10, 2024 4:18 pm

The basics of the question are AS Level discrete maths that I took in 1997, but the maths required to derive the answer are a bit beyond that.

The beta function (behind the binomial CDF) is a bit more common than the generalized hypergeometric function (behind the hypergeometric CDF), but only a bit.

Re: probability problem

Post by IvanV » Tue Dec 10, 2024 4:37 pm

IvanV wrote (Tue Dec 10, 2024 3:03 pm):
...
We want the sum from 120 to 200, so that is 1 minus the cumulative dist with a value of 119 placed into the cumulative function.

So I can pop the numbers in. If I have interpreted correctly what the parameters it asks for, and populated it correctly, then it tells me the answer is 53.15%.

I misremembered: the question asks for the probability of failing, not of passing, which is what I answered. So it's the plain CDF at 119, not 1 minus it. The answer is thus 46.85%, assuming no further errors.

This is the Excel formula that produces 46.85%, in case anyone can see that I've populated it wrong:
=HYPGEOM.DIST(119,200,1800,3000,TRUE)
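For anyone without Excel, here is a hypothetical Python stand-in, `hypgeom_dist`, written for illustration to mirror the argument order of Excel's `HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)`. It is a sketch using exact integer arithmetic, not a port of Excel's own implementation.

```python
from math import comb

def hypgeom_dist(sample_s: int, number_sample: int,
                 population_s: int, number_pop: int,
                 cumulative: bool) -> float:
    """Hypothetical stand-in mirroring Excel's HYPGEOM.DIST argument order."""
    total = comb(number_pop, number_sample)

    def ways(k: int) -> int:
        # Number of possible samples containing exactly k successes.
        return comb(population_s, k) * comb(number_pop - population_s, number_sample - k)

    if cumulative:
        return sum(ways(k) for k in range(sample_s + 1)) / total
    return ways(sample_s) / total

# Same inputs as =HYPGEOM.DIST(119,200,1800,3000,TRUE)
p_fail = hypgeom_dist(119, 200, 1800, 3000, True)
print(f"{p_fail:.2%}")
```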

Re: probability problem

Post by IvanV » Tue Dec 10, 2024 5:10 pm

dyqik wrote (Tue Dec 10, 2024 4:18 pm):
The basics of the question are AS Level discrete maths that I took in 1997, but the maths required to derive the answer are a bit beyond that.

The beta function (behind the binomial CDF) is a bit more common than the generalized hypergeometric function (behind the hypergeometric CDF), but only a bit.

I agree that the basic probability mechanics for calculating that kind of thing are A level, if the numbers are small enough that you can manipulate the basic probabilities by hand. But when the numbers are large enough that you need to know that repeated draws without replacement follow a hypergeometric distribution, that's something else; I don't think it was ever on any syllabus I studied. Though at uni I changed to Maths in year 2, and thus didn't do the year 1 syllabus, so I might have missed it. For the options I chose, I only needed continuous, not discrete, probability for finals.

If we ignore the "without replacement" aspect, which is a reasonable approximation for numbers of this size, and use the central limit theorem to approximate the result with a normal distribution, then that would seem to be within the scope of the Further Maths A-level syllabus I did. But the question would tell you to make those approximations. Because the normal is a continuous distribution, you'd have to treat each discrete score as having a width of 1 (a continuity correction), and so read off the normal CDF at 119.5.
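A sketch of that A-level style approximation, assuming Python's `math.erf` for the normal CDF: treat the score as binomial with n = 200 and p = 0.6, match its mean and standard deviation with a normal distribution, and read off the CDF at 119.5.

```python
from math import erf, sqrt

n, p = 200, 0.6
mu = n * p                      # mean score, 120
sigma = sqrt(n * p * (1 - p))   # sqrt(48), about 6.93

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution with mean mu and sd sigma, via erf."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Continuity correction: the discrete score 119 covers [118.5, 119.5],
# so P(score <= 119) is approximated by P(Normal <= 119.5).
p_fail_approx = normal_cdf(119.5, mu, sigma)
print(f"approximate fail probability: {p_fail_approx:.4f}")
```

This comes out at roughly 47%, close to the exact hypergeometric figure, since it ignores both the slight skew and the finite-population effect.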

Re: probability problem

Post by Allo V Psycho » Wed Dec 11, 2024 3:34 pm

Thanks, fam, that's very helpful, and special thanks, Ivan, for the Excel version. I've been varying the parameters and exploring the consequences.
AvP
