probability problem
- Catbabel
- Posts: 792
- Joined: Sat Nov 16, 2019 8:18 am
probability problem
I'm fairly mathematically literate, but I have a blind spot for probability! If anyone can help me solve this puzzle, I'd be very grateful.
An exam has 3000 items in the bank. From this bank are drawn 200 items for an individual exam. A candidate knows 60% of the material in the bank, which is also the pass mark for the exam. What's the chance that this candidate will nonetheless fail the exam (i.e. get a score below 60%)?
Re: probability problem
Each question in the bank is either known ("green") or not known ("red") to the candidate. So the problem becomes: "what is the probability of drawing, without replacement, at least 120 green balls in 200 draws from a bag of 3000 balls, 1800 of which are green?"
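To make the ball-drawing model concrete, here is a minimal Monte Carlo sketch in Python. It is only an illustration of the model, not a method anyone in the thread used: items 0 to 1799 stand for the known ("green") questions, and `random.sample` draws 200 distinct items from the 3000. The function name `simulate_fail_rate`, the trial count and the seed are arbitrary choices.

```python
import random

N_BANK = 3000     # items in the bank
N_KNOWN = 1800    # 60% of the bank is known ("green")
N_EXAM = 200      # items drawn for one exam
PASS_MARK = 120   # 60% of 200

def simulate_fail_rate(trials: int = 10_000, seed: int = 1) -> float:
    """Estimate P(score < 120) by repeatedly drawing exams without replacement."""
    rng = random.Random(seed)
    bank = range(N_BANK)  # items 0..1799 are the known ones
    fails = 0
    for _ in range(trials):
        exam = rng.sample(bank, N_EXAM)  # 200 distinct items, no replacement
        score = sum(1 for item in exam if item < N_KNOWN)
        if score < PASS_MARK:
            fails += 1
    return fails / trials

if __name__ == "__main__":
    print(f"estimated fail probability: {simulate_fail_rate():.3f}")
```

A simulation like this only gives an estimate with sampling noise of roughly half a percentage point at 10,000 trials; the exact answer needs the distribution named below.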
It takes some googling to find the right terms for the solution here, but you are looking for the hypergeometric distribution, which, in Wikipedia's words,
"describes the probability of k successes (random draws for which the object drawn has a specified feature) in n draws, without replacement, from a finite population of size N that contains exactly K objects with that feature."
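For reference, the textbook form of the hypergeometric probability mass function, with this thread's numbers substituted (N = 3000 items in the bank, K = 1800 known items, n = 200 items per exam), and the fail probability written as a cumulative sum:

```latex
P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
         = \frac{\binom{1800}{k}\binom{1200}{200-k}}{\binom{3000}{200}},
\qquad
P(\text{fail}) = P(X \le 119) = \sum_{k=0}^{119} P(X = k)
```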
Re: probability problem
(I edited the above to change blue to green to match the explanation on the wiki page. But "what are the chances of getting blue balls?" is objectively funnier.)
Re: probability problem
Oh, and to turn it into "at least j successes in n draws", you'll need to sum the probabilities of exactly k successes for every k from j to n.
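A minimal sketch of that summation in Python, using only the standard library. `math.comb` returns exact binomial coefficients as big integers, so the sum stays exact until the final division. The helper names `hypergeom_pmf` and `prob_at_least` are illustrative, not from the thread.

```python
from math import comb

N, K, n = 3000, 1800, 200  # bank size, known items, items per exam

def hypergeom_pmf(k: int) -> float:
    """P(exactly k known items among the n drawn), without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_at_least(j: int) -> float:
    """P(at least j known items): sum the exact-k terms for k = j..n."""
    numerator = sum(comb(K, k) * comb(N - K, n - k) for k in range(j, n + 1))
    return numerator / comb(N, n)  # big-int division, rounded once to float

p_pass = prob_at_least(120)
p_fail = 1 - p_pass
print(f"P(pass) = {p_pass:.4f}, P(fail) = {p_fail:.4f}")
```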
Or you can reverse things, count the number of failures instead, and use the CDF from the wiki page ("the probability of fewer than 81 failures").
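Counting failures works because the number of unknown items drawn is itself hypergeometric, with the 1200 unknown items playing the role of "successes". Passing (at least 120 known) is the same event as at most 80 unknown, i.e. fewer than 81 failures. A small self-contained Python check of that equivalence; `hypergeom_cdf` is an illustrative helper, not a library function.

```python
from math import comb

def hypergeom_cdf(x: int, N: int, K: int, n: int) -> float:
    """P(X <= x) for X ~ Hypergeometric(population N, K marked, n draws)."""
    lo = max(0, n - (N - K))  # smallest possible number of marked items
    numerator = sum(comb(K, k) * comb(N - K, n - k) for k in range(lo, x + 1))
    return numerator / comb(N, n)

# Failures = unknown items drawn; 1200 of the 3000 items are unknown.
p_pass_via_failures = hypergeom_cdf(80, N=3000, K=1200, n=200)
# Direct route: 1 - P(at most 119 known items).
p_pass_direct = 1 - hypergeom_cdf(119, N=3000, K=1800, n=200)
print(p_pass_via_failures, p_pass_direct)
```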
Re: probability problem
Final thought: with the numbers you give here, for most practical purposes you can use the simpler binomial distribution (which is for the same problem, but with replacement/independent trials).
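To see how close the binomial approximation is, the sketch below computes P(fail) = P(X ≤ 119) both ways with the standard library. The binomial version treats each of the 200 items as an independent draw that is known with probability p = 0.6; the function names are illustrative.

```python
from math import comb

N, K, n, p = 3000, 1800, 200, 0.6

def hypergeom_cdf(x: int) -> float:
    """Exact model: draws without replacement from the bank."""
    num = sum(comb(K, k) * comb(N - K, n - k) for k in range(x + 1))
    return num / comb(N, n)

def binom_cdf(x: int) -> float:
    """Approximation: independent draws, each known with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

print(f"hypergeometric: {hypergeom_cdf(119):.4f}")
print(f"binomial:       {binom_cdf(119):.4f}")
```

Because only 200 of 3000 items are drawn, removing items barely changes the proportion left in the bank, which is why the two results differ by well under a percentage point.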
Re: probability problem
Since the cumulative distribution function of the binomial is a formula involving special functions, switching to the binomial doesn't actually simplify the calculation much, at least if you want an algebraically precise answer that you can then actually calculate in a practical amount of time. Though if you are in a paper exam with a formula book containing a table of cumulative binomial distribution values, that may be the approach that can be done in a practical amount of time, provided the table has the values you need.
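For reference, the special function in question is the regularized incomplete beta function I_x(a, b). The standard identity for the binomial CDF, with this thread's numbers (n = 200, p = 0.6, and "fail" meaning at most 119 known items), is:

```latex
P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1-p)^{n-i} = I_{1-p}(n-k,\; k+1),
\qquad
P(X \le 119) = I_{0.4}(81,\; 120)
```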
Excel with a stats add-in package (at least as installed on my computer; maybe my company paid for something) has a hypergeometric distribution function, including a logical flag for the cumulative form. So it is no harder to do it with the hypergeometric than with the binomial, if you permit the approach of using Excel.
We want the sum from 120 to 200, so that is 1 minus the cumulative distribution evaluated at 119.
So I can pop the numbers in. If I have correctly interpreted the parameters it asks for, and populated them correctly, it tells me the answer is 53.15%.
That at least is roughly what I expected, in the sense of being a bit above 50%. The reason it is a bit above 50% is that this is a discrete distribution, and the individual values near the median each carry a material probability of their own. The value 120, which is probably the nearest whole number to the median, carries a lot of weight: if I put 120 itself into the function, the answer falls by 5.96 percentage points. So the single value 120 has a probability of about 5.96%, and since the cumulative probability crosses 50% at that value, the median is 120 or only very slightly different from it.
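The point-probability argument can be checked directly. A minimal Python sketch (helper names are illustrative) that computes P(X = 120) and finds the smallest score whose cumulative probability reaches 50%, which is the median of this discrete distribution:

```python
from math import comb

N, K, n = 3000, 1800, 200

def pmf(k: int) -> float:
    """Hypergeometric P(X = k): exactly k known items in the exam."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def median() -> int:
    """Smallest k with P(X <= k) >= 0.5."""
    total = 0.0
    for k in range(n + 1):
        total += pmf(k)
        if total >= 0.5:
            return k
    return n

print(f"P(X = 120) = {pmf(120):.4f}")
print(f"median score = {median()}")
```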
I have a maths degree, but I don't think I've ever faced an exam that would require me to know how to answer this question.
Re: probability problem
The basics of the question are AS Level discrete maths that I took in 1997, but the maths required to derive the answer are a bit beyond that.
The beta function is a bit more common than the generalized hypergeometric function, but only a bit.
Re: probability problem
I misremembered: the question was asking for "fail", rather than "pass" as I answered it. So it's the plain CDF, not 1 minus it. The answer would thus be 46.85%, assuming no further errors.
IvanV wrote: Tue Dec 10, 2024 3:03 pm
...
We want the sum from 120 to 200, so that is 1 minus the cumulative dist with a value of 119 placed into the cumulative function.
So I can pop the numbers in. If I have interpreted correctly what the parameters it asks for, and populated it correctly, then it tells me the answer is 53.15%.
This is the Excel formula that produces 46.85%, in case anyone can see that I've populated it wrong:
=HYPGEOM.DIST(119,200,1800,3000,TRUE)
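For anyone without Excel, here is a rough standard-library Python stand-in that takes its arguments in the same order as `HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)`. The function `hypgeom_dist` is a hypothetical helper written for illustration, not part of any library.

```python
from math import comb

def hypgeom_dist(sample_s: int, number_sample: int, population_s: int,
                 number_pop: int, cumulative: bool) -> float:
    """Mimics the argument order of Excel's HYPGEOM.DIST (illustrative only)."""
    def numerator(k: int) -> int:
        # Ways to get exactly k successes; comb() returns 0 for impossible k.
        return comb(population_s, k) * comb(number_pop - population_s,
                                            number_sample - k)
    total = comb(number_pop, number_sample)
    if cumulative:
        return sum(numerator(k) for k in range(sample_s + 1)) / total
    return numerator(sample_s) / total

print(f"{hypgeom_dist(119, 200, 1800, 3000, True):.2%}")
```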
Re: probability problem
I agree that the basic probability mechanics for calculating that kind of thing are A level, if the numbers are all small enough that you can manipulate the basic probabilities by hand. But once the numbers are large enough that you need to know that repeated draws without replacement follow a hypergeometric distribution, that's something else; I don't think it was ever on a syllabus I have studied. Though at uni I changed to Maths in year 2, and thus didn't do the year 1 syllabus, so I might have missed it. For the options I chose, I only needed continuous, not discrete, probability for finals.
If we make the approximation of ignoring the "without replacement" aspect, which is reasonable for numbers of this size, and use the central limit theorem to approximate it with a normal distribution, then that would seem to be within the scope of the Further Maths A-level syllabus I did. But the question would tell you to make those approximations. Using a continuous distribution, you'd have to treat the discrete values as having a width (a continuity correction), and so take the reading off the normal CDF at 119.5.
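A sketch of that approximation in Python, assuming the binomial model (mean np = 120, variance np(1 - p) = 48) and a continuity correction at 119.5. The normal CDF is built from `math.erf`; the function and variable names are illustrative.

```python
from math import erf, sqrt

n, p = 200, 0.6
mu = n * p                     # mean: 120
sigma = sqrt(n * p * (1 - p))  # sqrt(48), about 6.93

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Phi((x - mu) / sigma), written in terms of the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Continuity correction: "score <= 119" is read off at 119.5.
p_fail_approx = normal_cdf(119.5, mu, sigma)
print(f"normal approximation: {p_fail_approx:.4f}")
```

Under these assumptions the reading comes out at roughly 47%, within about half a percentage point of the exact hypergeometric figure quoted above.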
- Catbabel
Re: probability problem
Thanks, fam, that's very helpful, and special thanks, Ivan, for the Excel version. I've been varying the parameters and exploring the consequences.
AvP