Stats question from someone bad at stats

sideshowjim
Sindis Poop
Posts: 72
Joined: Wed Nov 20, 2019 6:17 pm

Stats question from someone bad at stats

Post by sideshowjim » Sat Oct 03, 2020 2:43 pm

Hey,

I'm putting together a paper at the sec, which compares scattered radiation dose from 2 imaging modalities at various body parts. Now there is a significant difference between the 2 modalities (no big surprise from this), but what I'd like to do is show how big the difference is.

Should I just divide the big mean by the little one, or is there a better test that I can do? Or just report the 2 means and let the reader figure it out for themselves?

sTeamTraen
Catbabel
Posts: 901
Joined: Mon Nov 11, 2019 4:24 pm

Re: Stats question from someone bad at stats

Post by sTeamTraen » Sun Oct 04, 2020 8:25 am

What do the raw data look like? That is, of what are your means the means? Do you have multiple measurements per person? Do you have independent people in each modality, or the same people tested twice?
Something something hammer something something nail

sideshowjim
Sindis Poop
Posts: 72
Joined: Wed Nov 20, 2019 6:17 pm

Re: Stats question from someone bad at stats

Post by sideshowjim » Sun Oct 04, 2020 4:03 pm

sTeamTraen wrote:
Sun Oct 04, 2020 8:25 am
What do the raw data look like? That is, of what are your means the means? Do you have multiple measurements per person? Do you have independent people in each modality, or the same people tested twice?
It's measurements on a dummy (phantom), so technically the same "person" tested twice.

9 measurements per body part for the CT (measurement repeated 3 times at each of 3 exposure settings: S, M, L)

9 measurements per body part for the x-ray (4 different views, each at 3 exposure settings (S, M, L), each repeated 3 times; doses combined across the 4 views, as 4 views = 1 exam).

The data are continuous, ranging from about 0.0001 to 4. CT measurements are significantly higher (Student's t-test; I haven't got the results to hand but can dig them out).
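To make the design above concrete, the x-ray readings can be collapsed to one exam dose per (setting, repeat), giving the same 9 values per body part as the CT. This is a minimal sketch, assuming "combined" means summed over the 4 views and that the readings sit in a dict keyed by (view, setting, repeat); the layout and the numbers are hypothetical, not sideshowjim's data.

```python
from collections import defaultdict
from statistics import mean, stdev


def exam_doses(readings):
    """Combine per-view doses into one exam dose per (setting, repeat).

    readings: dict mapping (view, setting, repeat) -> dose.
    Assumption: "combined" means summed over the views of one exam.
    """
    exams = defaultdict(float)
    for (_view, setting, repeat), dose in readings.items():
        exams[(setting, repeat)] += dose
    return dict(exams)


def summarise_by_setting(exams):
    """Mean and SD of the repeats at each exposure setting, kept separate."""
    by_setting = defaultdict(list)
    for (setting, _repeat), dose in exams.items():
        by_setting[setting].append(dose)
    return {s: (mean(v), stdev(v)) for s, v in by_setting.items()}


# Hypothetical readings: 4 views x 3 settings x 3 repeats (not real data).
readings = {
    (view, setting, repeat): 0.01 * (repeat + 1)
    for view in range(4)
    for setting in "SML"
    for repeat in range(3)
}
exams = exam_doses(readings)
print(len(exams))  # 9 exam doses, matching the 9 CT values per body part
print(summarise_by_setting(exams))
```

Keeping the S, M and L settings separate matters: pooling all nine values into one mean and SD mixes the deliberate change in exposure with repeat-to-repeat noise, which is one reason the thread questions what those SDs mean.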

sTeamTraen
Catbabel
Posts: 901
Joined: Mon Nov 11, 2019 4:24 pm

Re: Stats question from someone bad at stats

Post by sTeamTraen » Sun Oct 04, 2020 9:12 pm

I'm not sure that any sort of inferential statistics will be very meaningful, since you're not testing a sample from a population, unless you expect substantial manufacturing variation in multiple dummies. (Put another way, exactly what is the null hypothesis?) Presumably the results are fairly reproducible with two modalities of a machine being used on a manufactured object. I think the statistic that would interest me most as a punter would be the variance within the repeated measurements, and how they relate to the reliability of the measuring instrument.
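One way to put a number on the repeat-to-repeat variability mentioned above is the coefficient of variation (SD divided by mean) of each set of repeats, which could then be set against the measuring instrument's specified reproducibility, if the manufacturer quotes one. A minimal sketch; the example readings are made up:

```python
from statistics import mean, stdev


def coefficient_of_variation(repeats):
    """SD of repeated readings divided by their mean (relative repeatability).

    Only meaningful for repeats taken under identical conditions
    (same phantom position, same exposure setting).
    """
    if len(repeats) < 2:
        raise ValueError("need at least two repeats")
    m = mean(repeats)
    if m == 0:
        raise ValueError("mean is zero, so the CV is undefined")
    return stdev(repeats) / m


# Made-up repeats for illustration only.
print(coefficient_of_variation([9.0, 10.0, 11.0]))  # 0.1
```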
Something something hammer something something nail

Allo V Psycho
Clardic Fug
Posts: 210
Joined: Sat Nov 16, 2019 8:18 am

Re: Stats question from someone bad at stats

Post by Allo V Psycho » Mon Oct 05, 2020 6:46 am

How about calculating an effect size, e.g. Cohen's d? Presumably the sample sizes are similar; what do the SDs look like? If you can do Student's t-test, you should be able to do Cohen's d.

https://www.socscistatistics.com/effect ... ult3.aspx
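To see what Cohen's d would look like for data like these, here is a minimal sketch using the standard pooled-SD formula for two independent groups. The inputs are the quarter-mile times from sTeamTraen's car analogy further down the thread, standing in for dose readings. The result, about 33, dwarfs Cohen's usual benchmarks (0.2 small, 0.5 medium, 0.8 large). That illustrates sTeamTraen's objection: with tight repeated measurements on one object, the "pooled SD" is measurement noise rather than spread across a population, so d says little about how big the difference is in practical terms.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled SD.

    Assumes two independent groups; variance() is the sample (n - 1) variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var)


porsche = [12.4, 12.6, 12.7]   # quarter-mile times in seconds
renault4 = [32.4, 33.6, 34.1]
print(round(cohens_d(porsche, renault4), 1))  # 33.2
```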

sTeamTraen
Catbabel
Posts: 901
Joined: Mon Nov 11, 2019 4:24 pm

Re: Stats question from someone bad at stats

Post by sTeamTraen » Mon Oct 05, 2020 8:20 am

Allo V Psycho wrote:
Mon Oct 05, 2020 6:46 am
How about calculating the Effect Size, e.g. Cohen's d?
Based on what we've been told, I can't see what one would put into the calculation of Cohen's d as the pooled standard deviation, or what it would mean here. It doesn't appear that we're sampling from a population.
Something something hammer something something nail

shpalman
After Pie
Posts: 1961
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Stats question from someone bad at stats

Post by shpalman » Mon Oct 05, 2020 8:50 am

sideshowjim wrote:
Sat Oct 03, 2020 2:43 pm
Hey,

I'm putting together a paper at the sec, which compares scattered radiation dose from 2 imaging modalities at various body parts. Now there is a significant difference between the 2 modalities (no big surprise from this), but what I'd like to do is show how big the difference is.

Should I just divide the big mean by the little one, or is there a better test that I can do? Or just report the 2 means and let the reader figure it out for themselves?
Dividing the big mean by the little one is fine unless they're both really small numbers with what subjectively seems like a lot of uncertainty in them (i.e. the values you averaged together have a wild scatter), in which case you're basically dividing zero by zero. In any case, I'd want the means put into some sort of numerical context, i.e. is the big mean "a lot" and/or is the little mean "not much" in the grand scheme of things?
molto tricky

Allo V Psycho
Clardic Fug
Posts: 210
Joined: Sat Nov 16, 2019 8:18 am

Re: Stats question from someone bad at stats

Post by Allo V Psycho » Mon Oct 05, 2020 10:34 am

sTeamTraen wrote:
Mon Oct 05, 2020 8:20 am
Allo V Psycho wrote:
Mon Oct 05, 2020 6:46 am
How about calculating the Effect Size, e.g. Cohen's d?
Based on what we've been told, I can't see what one would put into the calculation of Cohen's d as the pooled standard deviation, or what it would mean here. It doesn't appear that we're sampling from a population.
I thought there were repeat measures at each site, but I was speed-reading when I should be working!

sTeamTraen
Catbabel
Posts: 901
Joined: Mon Nov 11, 2019 4:24 pm

Re: Stats question from someone bad at stats

Post by sTeamTraen » Mon Oct 05, 2020 11:19 am

Allo V Psycho wrote:
Mon Oct 05, 2020 10:34 am
I thought there were repeat measures at each site, but I was speed-reading when I should be working!
I wish I had a pound etc etc. :D
Something something hammer something something nail

monkey
Stargoon
Posts: 143
Joined: Wed Nov 13, 2019 5:10 pm

Re: Stats question from someone bad at stats

Post by monkey » Mon Oct 05, 2020 3:15 pm

shpalman wrote:
Mon Oct 05, 2020 8:50 am
sideshowjim wrote:
Sat Oct 03, 2020 2:43 pm
Hey,

I'm putting together a paper at the sec, which compares scattered radiation dose from 2 imaging modalities at various body parts. Now there is a significant difference between the 2 modalities (no big surprise from this), but what I'd like to do is show how big the difference is.

Should I just divide the big mean by the little one, or is there a better test that I can do? Or just report the 2 means and let the reader figure it out for themselves?
Dividing the big mean by the little one is fine unless they're both really small numbers with what subjectively seems like a lot of uncertainty in them (i.e. the values you averaged together have a wild scatter), in which case you're basically dividing zero by zero. In any case, I'd want the means put into some sort of numerical context, i.e. is the big mean "a lot" and/or is the little mean "not much" in the grand scheme of things?
Wouldn't the best way of putting the numbers in context be to compare them both to some reference, rather than to each other? (I realise there may not be a suitable reference in every situation.)

Also, if you are calculating the ratio, don't forget that it has an uncertainty associated with it, because the measurements used to calculate it have uncertainty.
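A minimal sketch of the ratio with its uncertainty propagated to first order: the relative standard errors of the two means are added in quadrature. It assumes the two sets of readings are independent and roughly normal, and it borrows the Porsche 911 and Renault 4 times from sTeamTraen's analogy further down the thread as stand-in data.

```python
from math import sqrt
from statistics import mean, stdev


def ratio_with_uncertainty(big, small):
    """Ratio of means with a first-order (delta-method) standard error.

    Assumes the two sets of readings are independent and that neither mean
    is close to zero relative to its own standard error.
    """
    m_big, m_small = mean(big), mean(small)
    se_big = stdev(big) / sqrt(len(big))        # standard error of each mean
    se_small = stdev(small) / sqrt(len(small))
    ratio = m_big / m_small
    # For a quotient, relative errors add in quadrature.
    rel = sqrt((se_big / m_big) ** 2 + (se_small / m_small) ** 2)
    return ratio, ratio * rel


porsche = [12.4, 12.6, 12.7]   # quarter-mile times in seconds
renault4 = [32.4, 33.6, 34.1]
r, se = ratio_with_uncertainty(renault4, porsche)
print(f"R4 / 911 time ratio: {r:.2f} +/- {se:.2f}")  # R4 / 911 time ratio: 2.66 +/- 0.04
```

If the smaller mean is close to zero relative to its own standard error, the relative-error term blows up. That is shpalman's "dividing zero by zero" case: the first-order formula becomes unreliable and the ratio itself says very little.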

sideshowjim
Sindis Poop
Posts: 72
Joined: Wed Nov 20, 2019 6:17 pm

Re: Stats question from someone bad at stats

Post by sideshowjim » Mon Oct 05, 2020 7:09 pm

I'm hopefully doing another set of measurements on a 3rd system when I can get round to it.

SD on all the measures is pretty narrow, I do have them calculated but only have the details on my work computer...

Edit: I hate stats...

sTeamTraen
Catbabel
Posts: 901
Joined: Mon Nov 11, 2019 4:24 pm

Re: Stats question from someone bad at stats

Post by sTeamTraen » Tue Oct 06, 2020 8:49 am

sideshowjim wrote:
Mon Oct 05, 2020 7:09 pm
SD on all the measures is pretty narrow, I do have them calculated but only have the details on my work computer...
Again, I'm not sure what the SDs mean. You can't treat repeated measures as if they were independent. From what you've described, I don't think you need statistics at all. You're not estimating parameters of a population, because you don't have a population. (If you're trying to measure the variability between different examples of your new machine due to manufacturing tolerances, you have a population, but then you need to take some number of samples from that, in the form of 20 or 100 machines; the number will depend on how small a difference you're interested in.)

Think about two cars. You do three standing quarter miles in a Porsche 911 and three in a Renault 4. The Porsche times are 12.4, 12.6, and 12.7. The R4 times are 32.4, 33.6, and 34.1. How would you report that? And how is your research question meaningfully different from "Which car should I buy if I want to accelerate a lot?"
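For the car example, a purely descriptive report (mean, range and n for each car, with no inferential test) could be produced like this. It is a small sketch, and the output format is just one choice:

```python
from statistics import mean

times = {
    "Porsche 911": [12.4, 12.6, 12.7],   # seconds
    "Renault 4": [32.4, 33.6, 34.1],
}


def summary_line(name, values):
    """One descriptive line per car: mean, range and n, no test."""
    return (f"{name}: mean {mean(values):.1f} s "
            f"(range {min(values):.1f} to {max(values):.1f} s, n = {len(values)})")


for name, values in times.items():
    print(summary_line(name, values))
```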
Something something hammer something something nail

jimbob
Dorkwood
Posts: 1594
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: Stats question from someone bad at stats

Post by jimbob » Wed Oct 07, 2020 9:47 am

sTeamTraen wrote:
Tue Oct 06, 2020 8:49 am
sideshowjim wrote:
Mon Oct 05, 2020 7:09 pm
SD on all the measures is pretty narrow, I do have them calculated but only have the details on my work computer...
Again, I'm not sure what the SDs mean. You can't treat repeated measures as if they were independent. From what you've described, I don't think you need statistics at all. You're not estimating parameters of a population, because you don't have a population. (If you're trying to measure the variability between different examples of your new machine due to manufacturing tolerances, you have a population, but then you need to take some number of samples from that, in the form of 20 or 100 machines; the number will depend on how small a difference you're interested in.)

Think about two cars. You do three standing quarter miles in a Porsche 911 and three in a Renault 4. The Porsche times are 12.4, 12.6, and 12.7. The R4 times are 32.4, 33.6, and 34.1. How would you report that? And how is your research question meaningfully different from "Which car should I buy if I want to accelerate a lot?"

Actually, rereading this, it looks like the sort of question that is open to ANOVA gauge R&R (https://en.wikipedia.org/wiki/ANOVA_gauge_R%26R) for finding out how much of the variation is measurement noise. But from what you say, the standard deviation is small compared with the difference between the two modalities.
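Here is a stripped-down sketch of the ANOVA idea behind gauge R&R. A one-way analysis of variance on repeated readings splits the scatter into a repeatability part (within a condition) and a between-condition part. The sketch assumes a balanced design, with the same number of repeats for every condition. It omits the reproducibility terms (operators, days, set-ups) that a full gauge R&R study estimates, and the example readings are hypothetical.

```python
from statistics import mean


def repeatability_components(groups):
    """One-way ANOVA variance components for a balanced design.

    groups: list of lists, each holding the repeat readings for one measured
    condition (e.g. one body part at one setting), all the same length.
    Returns (repeatability variance, between-condition variance component).
    """
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 groups, each with the same number (>= 2) of repeats")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_between = n * sum((m - grand) ** 2 for m in group_means)
    ms_within = ss_within / (k * (n - 1))
    ms_between = ss_between / (k - 1)
    # Expected MS_between = var_repeat + n * var_between; clip at zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return ms_within, var_between


# Hypothetical repeats for two measured conditions (not real data).
var_repeat, var_between = repeatability_components([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(var_repeat, var_between, var_repeat / (var_repeat + var_between))
```

A small repeatability share, var_repeat / (var_repeat + var_between), would support the point that the measurement noise is small compared with the differences being reported.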

As for reporting the data, I like variations of a violin plot (https://en.wikipedia.org/wiki/Violin_plot) with a table of the means below.

But then I'm an engineer and tend to use an approach as in this NIST handbook (which I push on my colleagues at every opportunity):

https://www.itl.nist.gov/div898/handbook/

A lot of my data is definitely not normally distributed, so plotting it out is always a good start.
Have you considered stupidity as an explanation
