Stats question from someone bad at stats

Posted: Sat Oct 03, 2020 2:43 pm
by sideshowjim
Hey,

I'm putting together a paper at the sec, which compares scattered radiation dose from 2 imaging modalities at various body parts. Now there is a significant difference between the 2 modalities (no big surprise from this), but what I'd like to do is show how big the difference is.

Should I just divide the big mean by the little one, or is there a better test that I can do? Or just report the 2 means and let the reader figure it out for themselves?

Re: Stats question from someone bad at stats

Posted: Sun Oct 04, 2020 8:25 am
by sTeamTraen
What do the raw data look like? That is, of what are your means the means? Do you have multiple measurements per person? Do you have independent people in each modality, or the same people tested twice?

Re: Stats question from someone bad at stats

Posted: Sun Oct 04, 2020 4:03 pm
by sideshowjim
sTeamTraen wrote:
Sun Oct 04, 2020 8:25 am
What do the raw data look like? That is, of what are your means the means? Do you have multiple measurements per person? Do you have independent people in each modality, or the same people tested twice?
It's measurements on a dummy (phantom), so technically the same "person" tested twice.

9 measurements per body part for the CT (measurement repeated 3 times at 3 exposure settings for S,M,L)

9 measurements per body part for the x-ray (4 different views, each with 3 exposure settings for S,M,L, each repeated 3 times, doses combined across the 4 views (4 views = 1 exam)).

Data is continuous, range is from about 0.0001 to 4. CT measurements are significantly higher (Student's t-test; I haven't got the results to hand but can dig them out).

Re: Stats question from someone bad at stats

Posted: Sun Oct 04, 2020 9:12 pm
by sTeamTraen
I'm not sure that any sort of inferential statistics will be very meaningful, since you're not testing a sample from a population, unless you expect substantial manufacturing variation in multiple dummies. (Put another way, exactly what is the null hypothesis?) Presumably the results are fairly reproducible with two modalities of a machine being used on a manufactured object. I think the statistic that would interest me most as a punter would be the variance within the repeated measurements, and how it relates to the reliability of the measuring instrument.
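
A minimal sketch of that kind of repeatability check, in Python. The readings are made-up placeholders, not data from this thread, and the names are hypothetical; only the layout (3 repeats at each of the S, M, L exposure settings) follows the description above.

```python
import math
import statistics

# Hypothetical readings for one body part on one modality:
# 3 repeats at each of 3 exposure settings (placeholder numbers).
readings = {
    "S": [1.0, 1.1, 0.9],
    "M": [2.0, 2.1, 1.9],
    "L": [3.0, 3.1, 2.9],
}

def repeatability(groups):
    """Return per-setting (mean, SD, CV) and the pooled within-setting SD."""
    per_setting = {}
    for name, values in groups.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD of the repeats at this setting
        per_setting[name] = (mean, sd, sd / mean)
    # Pool the within-setting variances, weighted by degrees of freedom.
    num = sum((len(v) - 1) * statistics.variance(v) for v in groups.values())
    den = sum(len(v) - 1 for v in groups.values())
    return per_setting, math.sqrt(num / den)

per_setting, pooled_sd = repeatability(readings)

# Lumping all 9 values together mixes repeat noise with setting differences.
all_values = [x for values in readings.values() for x in values]
naive_sd = statistics.stdev(all_values)

for name, (mean, sd, cv) in per_setting.items():
    print(f"{name}: mean={mean:.3f} SD={sd:.3f} CV={cv:.1%}")
print(f"pooled within-setting SD={pooled_sd:.3f}, SD of all 9 lumped={naive_sd:.3f}")
```

With numbers like these, the SD of the 9 lumped values is several times the within-setting SD, because it mostly reflects the deliberate change in exposure setting rather than measurement noise.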

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 6:46 am
by Allo V Psycho
How about calculating the Effect Size, e.g. Cohen's d? Presumably the sample sizes are similar: what do the SDs look like? If you can do Student's t-test, you should be able to do Cohen's d.

https://www.socscistatistics.com/effect ... ult3.aspx
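
For reference, the usual two-group calculation is short. This is a sketch of Cohen's d with a pooled SD, using placeholder numbers and a hypothetical function name; it treats each set of readings as an independent sample, which may or may not suit repeated measurements on a phantom.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two groups, using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Placeholder values only, chosen so the arithmetic is easy to follow.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(f"Cohen's d = {d:.2f}")
```

Note that d is expressed in units of the pooled SD, so when the repeats are very tight the value becomes enormous without saying much about how big the dose difference is in practical terms.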

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 8:20 am
by sTeamTraen
Allo V Psycho wrote:
Mon Oct 05, 2020 6:46 am
How about calculating the Effect Size, e.g. Cohen's d?
Based on what we've been told, I can't see what one would put into the calculation of Cohen's d as the pooled standard deviation, or what it would mean here. It doesn't appear that we're sampling from a population.

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 8:50 am
by shpalman
sideshowjim wrote:
Sat Oct 03, 2020 2:43 pm
Hey,

I'm putting together a paper at the sec, which compares scattered radiation dose from 2 imaging modalities at various body parts. Now there is a significant difference between the 2 modalities (no big surprise from this), but what I'd like to do is show how big the difference is.

Should I just divide the big mean by the little one, or is there a better test that I can do? Or just report the 2 means and let the reader figure it out for themselves?
Dividing the big mean by the little one is fine unless they're both really small numbers with what subjectively seems like a lot of uncertainty in them (i.e. the values which you averaged together have a wild scatter between them) in which case you're basically dividing zero by zero. In any case I'd want the means put into some sort of numerical context i.e. is the big mean "a lot" and/or is the little mean "not much" in the grand scheme of things.

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 10:34 am
by Allo V Psycho
sTeamTraen wrote:
Mon Oct 05, 2020 8:20 am
Allo V Psycho wrote:
Mon Oct 05, 2020 6:46 am
How about calculating the Effect Size, e.g. Cohen's d?
Based on what we've been told, I can't see what one would put into the calculation of Cohen's d as the pooled standard deviation, or what it would mean here. It doesn't appear that we're sampling from a population.
I thought there were repeat measures at each site, but, I was speed reading when I should be working!

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 11:19 am
by sTeamTraen
Allo V Psycho wrote:
Mon Oct 05, 2020 10:34 am
I thought there were repeat measures at each site, but, I was speed reading when I should be working!
I wish I had a pound etc etc. :D

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 3:15 pm
by monkey
shpalman wrote:
Mon Oct 05, 2020 8:50 am
sideshowjim wrote:
Sat Oct 03, 2020 2:43 pm
Hey,

I'm putting together a paper at the sec, which compares scattered radiation dose from 2 imaging modalities at various body parts. Now there is a significant difference between the 2 modalities (no big surprise from this), but what I'd like to do is show how big the difference is.

Should I just divide the big mean by the little one, or is there a better test that I can do? Or just report the 2 means and let the reader figure it out for themselves?
Dividing the big mean by the little one is fine unless they're both really small numbers with what subjectively seems like a lot of uncertainty in them (i.e. the values which you averaged together have a wild scatter between them) in which case you're basically dividing zero by zero. In any case I'd want the means put into some sort of numerical context i.e. is the big mean "a lot" and/or is the little mean "not much" in the grand scheme of things.
Wouldn't the best way of putting the numbers in context be to compare them both to some reference, rather than each other? (I realise that there may not be a suitable reference to compare to in every situation).

Also, if you are calculating the ratio, don't forget that it has an uncertainty associated with it, because the measurements used to calculate it have uncertainty.
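
One common way to attach an uncertainty to a ratio is first-order error propagation: for R = A/B with independent uncertainties, the relative uncertainties add in quadrature. A sketch in Python, assuming the standard error of each mean is a reasonable stand-in for its uncertainty. The dose values are placeholders, not the real measurements, and the function name is hypothetical.

```python
import math
import statistics

def ratio_with_uncertainty(high, low):
    """Ratio of two means with a first-order propagated uncertainty."""
    mean_high = statistics.mean(high)
    mean_low = statistics.mean(low)
    # Standard error of each mean from the repeat readings.
    sem_high = statistics.stdev(high) / math.sqrt(len(high))
    sem_low = statistics.stdev(low) / math.sqrt(len(low))
    ratio = mean_high / mean_low
    # Relative uncertainties add in quadrature (assumes independence).
    rel = math.sqrt((sem_high / mean_high) ** 2 + (sem_low / mean_low) ** 2)
    return ratio, ratio * rel

# Placeholder readings, not data from the thread.
ct = [2.0, 2.1, 1.9]
xray = [0.10, 0.11, 0.09]

ratio, uncertainty = ratio_with_uncertainty(ct, xray)
print(f"CT / x-ray = {ratio:.1f} +/- {uncertainty:.1f}")
```

The approximation gets poor when the relative uncertainty of the smaller mean is large, which is the "dividing zero by zero" situation described above.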

Re: Stats question from someone bad at stats

Posted: Mon Oct 05, 2020 7:09 pm
by sideshowjim
I'm hopefully doing another set of measurements on a 3rd system when I can get round to it.

SD on all the measures is pretty narrow, I do have them calculated but only have the details on my work computer...

***Edit- I hate stats...

Re: Stats question from someone bad at stats

Posted: Tue Oct 06, 2020 8:49 am
by sTeamTraen
sideshowjim wrote:
Mon Oct 05, 2020 7:09 pm
SD on all the measures is pretty narrow, I do have them calculated but only have the details on my work computer...
Again, I'm not sure what the SDs mean. You can't treat repeated measures as if they were independent. From what you've described, I don't think you need statistics at all. You're not estimating parameters of a population, because you don't have a population. (If you're trying to measure the variability between different examples of your new machine due to manufacturing tolerances, you have a population, but then you need to take some number of samples from that, in the form of 20 or 100 machines; the number will depend on how small a difference you're interested in.)

Think about two cars. You do three standing quarter miles in a Porsche 911 and three in a Renault 4. The Porsche times are 12.4, 12.6, and 12.7. The R4 times are 32.4, 33.6, and 34.1. How would you report that? And how is your research question meaningfully different from "Which car should I buy if I want to accelerate a lot?"
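
To make the car example concrete, here is a sketch of the plain descriptive summary it points towards: mean, range, and the ratio of the means, with no inferential test. The times are the ones quoted above; the function name is just illustrative.

```python
import statistics

# Standing quarter-mile times in seconds, as quoted in the example.
porsche = [12.4, 12.6, 12.7]
renault = [32.4, 33.6, 34.1]

def summarise(label, times):
    """Print the mean and range of a set of runs and return the mean."""
    mean = statistics.mean(times)
    print(f"{label}: mean {mean:.1f} s, range {min(times)} to {max(times)} s")
    return mean

mean_porsche = summarise("Porsche 911", porsche)
mean_renault = summarise("Renault 4", renault)

ratio = mean_renault / mean_porsche
print(f"The Renault 4 takes about {ratio:.1f} times as long")
```

The slowest Porsche run is still far quicker than the fastest Renault run, so the ranges alone already make the comparison clear.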

Re: Stats question from someone bad at stats

Posted: Wed Oct 07, 2020 9:47 am
by jimbob
sTeamTraen wrote:
Tue Oct 06, 2020 8:49 am
sideshowjim wrote:
Mon Oct 05, 2020 7:09 pm
SD on all the measures is pretty narrow, I do have them calculated but only have the details on my work computer...
Again, I'm not sure what the SDs mean. You can't treat repeated measures as if they were independent. From what you've described, I don't think you need statistics at all. You're not estimating parameters of a population, because you don't have a population. (If you're trying to measure the variability between different examples of your new machine due to manufacturing tolerances, you have a population, but then you need to take some number of samples from that, in the form of 20 or 100 machines; the number will depend on how small a difference you're interested in.)

Think about two cars. You do three standing quarter miles in a Porsche 911 and three in a Renault 4. The Porsche times are 12.4, 12.6, and 12.7. The R4 times are 32.4, 33.6, and 34.1. How would you report that? And how is your research question meaningfully different from "Which car should I buy if I want to accelerate a lot?"

Actually, rereading this - it looks like the sort of question that is open to https://en.wikipedia.org/wiki/ANOVA_gauge_R%26R as far as finding out any variation in the measurement noise. But from what you say, the standard deviation is small compared to the difference between them.

As for reporting the data: I like variations of a violin plot: https://en.wikipedia.org/wiki/Violin_plot with a table of the means below

But then I'm an engineer and tend to use an approach as in this NIST handbook (which I push on my colleagues at every opportunity):

https://www.itl.nist.gov/div898/handbook/

A lot of my data is definitely not normally-distributed so plotting it out is always a good start.