stats query

Allo V Psycho
Catbabel
Posts: 738
Joined: Sat Nov 16, 2019 8:18 am

stats query

Post by Allo V Psycho » Sat Jan 13, 2024 10:37 am

I have a variety of data sets and need to compare them for differences. The histograms show quite a complex relationship: a typical example is attached. What tests would I need to show whether this histogram is statistically different from another, broadly similar one? Grateful for any help from the hive mind.
Attachment: Histogram.png

bob sterman
Dorkwood
Posts: 1135
Joined: Mon Nov 11, 2019 10:25 pm
Location: Location Location

Re: stats query

Post by bob sterman » Sat Jan 13, 2024 10:51 am

The variable on your x-axis of the histogram - is that a continuous variable that has been binned to create 18 categories for the histogram? Or is it an ordinal variable that has 18 categories?

Allo V Psycho
Catbabel
Posts: 738
Joined: Sat Nov 16, 2019 8:18 am

Re: stats query

Post by Allo V Psycho » Sat Jan 13, 2024 11:34 am

bob sterman wrote:
Sat Jan 13, 2024 10:51 am
The variable on your x-axis of the histogram - is that a continuous variable that has been binned to create 18 categories for the histogram? Or is it an ordinal variable that has 18 categories?
Ordinal with 18 categories.

jimbob
Light of Blast
Posts: 5302
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: stats query

Post by jimbob » Sat Jan 13, 2024 1:58 pm

Allo V Psycho wrote:
Sat Jan 13, 2024 11:34 am
bob sterman wrote:
Sat Jan 13, 2024 10:51 am
The variable on your x-axis of the histogram - is that a continuous variable that has been binned to create 18 categories for the histogram? Or is it an ordinal variable that has 18 categories?
Ordinal with 18 categories.
Ouch.

Are there fewer parent categories you could use to combine some into?
Have you considered stupidity as an explanation

Allo V Psycho
Catbabel
Posts: 738
Joined: Sat Nov 16, 2019 8:18 am

Re: stats query

Post by Allo V Psycho » Sat Jan 13, 2024 2:17 pm

jimbob wrote:
Sat Jan 13, 2024 1:58 pm
Allo V Psycho wrote:
Sat Jan 13, 2024 11:34 am
bob sterman wrote:
Sat Jan 13, 2024 10:51 am
The variable on your x-axis of the histogram - is that a continuous variable that has been binned to create 18 categories for the histogram? Or is it an ordinal variable that has 18 categories?
Ordinal with 18 categories.
Ouch.

Are there fewer parent categories you could use to combine some into?
Alas, no... all unique.

science_fox
Snowbonk
Posts: 512
Joined: Mon Nov 11, 2019 1:34 pm
Location: Manchester

Re: stats query

Post by science_fox » Sat Jan 13, 2024 2:58 pm

How many samples have you got? Could you run a PCA on them? That would group together the sets with more features in common, even if it's a bit complex to work out what those common features are.

Not my area of expertise!
I'm not afraid of catching Covid, I'm afraid of catching idiot.

bob sterman
Dorkwood
Posts: 1135
Joined: Mon Nov 11, 2019 10:25 pm
Location: Location Location

Re: stats query

Post by bob sterman » Sat Jan 13, 2024 3:13 pm

To compare 2 of these distributions - how about a 2 x 18 chi-square test?

https://www.icalcu.com/stat/chisqtest.html
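
For anyone who would rather script this than paste numbers into the calculator, here is a minimal sketch of the 2 x 18 test in plain Python. The counts are made up for illustration, and the names `chi2_sf` and `chi_square_homogeneity` are just labels chosen for this example, not part of any library. The p-value uses the standard finite series for a chi-square tail with whole-number degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    # Standard normal density evaluated at chi.
    z = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    if df % 2 == 1:
        # Odd df: normal tail plus a finite sum of chi^(2r-1) / (1*3*...*(2r-1)).
        total, term = 0.0, chi
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += term
        return math.erfc(chi / math.sqrt(2.0)) + 2.0 * z * total
    # Even df: exp(-x/2) times a finite sum of (x/2)^r / r!.
    total, term = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        total += term
    return math.sqrt(2.0 * math.pi) * z * total


def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square test that two count vectors share one distribution."""
    if len(row_a) != len(row_b):
        raise ValueError("both samples need the same categories")
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat, used = 0.0, 0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # a category empty in both samples carries no information
        used += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    df = used - 1  # (2 - 1) * (categories - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for the 18 ordinal categories in two data sets.
sample_a = [12, 30, 45, 80, 120, 150, 170, 160, 140, 110, 90, 70, 50, 35, 25, 15, 8, 4]
sample_b = [10, 28, 50, 85, 115, 155, 165, 150, 145, 115, 95, 65, 55, 30, 20, 18, 10, 5]

stat, df, p_value = chi_square_homogeneity(sample_a, sample_b)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Two caveats: the test treats the 18 categories as unordered, so it ignores the ordinal information, and the approximation gets shaky when expected counts in a cell are small (below about 5 is the usual rule of thumb).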

KAJ
Fuzzable
Posts: 310
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: stats query

Post by KAJ » Sat Jan 13, 2024 8:27 pm

"statistically [significant] differen[ce]" is rarely of real interest. See Wikipedia. What kind of difference (location, dispersion, ...) and what size of difference would you consider worthy of mention?

Allo V Psycho
Catbabel
Posts: 738
Joined: Sat Nov 16, 2019 8:18 am

Re: stats query

Post by Allo V Psycho » Sun Jan 14, 2024 10:02 am

Bob:
That seems plausible: thanks for the handy calculator!
KAJ:
Yes, I think this is where I was heading. The numbers are so big that even quite a small difference might be statistically significant. I generally phrase this as "statistical significance is not the same as biological significance", and would aim to calculate the effect size.
As these data sets emerge (it's taking me quite a long time to analyse them), I think they plainly DO fall into the 'not biologically meaningful' category.
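
One common effect size for a chi-square table is Cramér's V, which rescales the statistic by the sample size. The sketch below uses invented counts and assumes that V is the effect size wanted here (other measures exist). It shows why large numbers make small differences "significant": multiplying every count by 100 multiplies chi-square by 100 but leaves V unchanged. The function names are illustrative only.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (N * min(rows - 1, cols - 1)))."""
    grand = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(pearson_chi2(table) / (grand * k))


# Hypothetical counts: two data sets of 900 observations over 18 categories.
small = [
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 90, 80, 70, 60, 50, 40, 30, 20, 10],
    [11, 19, 31, 39, 52, 58, 71, 79, 92, 88, 81, 69, 61, 49, 41, 29, 21, 9],
]
# Same proportions, 100 times as many observations.
big = [[100 * count for count in row] for row in small]

for label, table in (("small", small), ("big", big)):
    print(f"{label}: chi-square = {pearson_chi2(table):.2f}, V = {cramers_v(table):.4f}")
```

For a 2 x 18 table the `min(rows - 1, cols - 1)` term is 1, so V reduces to the square root of chi-square divided by the total count.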

Thanks to all responders!

IvanV
Stummy Beige
Posts: 2714
Joined: Mon May 17, 2021 11:12 am

Re: stats query

Post by IvanV » Mon Jan 15, 2024 9:11 am

In time series analysis, there are frequently high correlations between unrelated things, because anything with a time trend will have a high correlation with anything else with a time trend. Many completely different things have time trends. This raises a potential issue with these. Maybe the categories are such that many things tend to have similarities across those categories, for no particular reason.

sTeamTraen
After Pie
Posts: 2558
Joined: Mon Nov 11, 2019 4:24 pm
Location: Palma de Mallorca, Spain

Re: stats query

Post by sTeamTraen » Mon Jan 15, 2024 5:41 pm

IvanV wrote:
Mon Jan 15, 2024 9:11 am
In time series analysis, there are frequently high correlations between unrelated things, because anything with a time trend will have a high correlation with anything else with a time trend. Many completely different things have time trends. This raises a potential issue with these. Maybe the categories are such that many things tend to have similarities across those categories, for no particular reason.
Yes, Bob's suggestion of the 18-way chi-square test will not make sense if the histograms being compared are, say, the prevalence of 18 diseases in the UK in 2022 versus 2023. It would be OK for the prevalence of those diseases in the UK in 2022 versus France in 2022, etc.

FWIW, I often use a 10-way chi-square test if I think that the distribution of trailing digits in a dataset looks dodgy (because the numbers have been "proctologically derived"). You can compare the actual distribution against uniform or, if you want to be extra fancy and have good reason to think that Benford's Law is operating, against the distribution for the Nth digit predicted by that law (which is famous for the steeply falling logarithmic curve for leading digits, but also makes predictions about the 2nd, 3rd, etc.).
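
A rough sketch of that trailing-digit check in plain Python, in case it is useful. It assumes the values are whole numbers; the data and function names are invented for illustration. The Benford helper uses the standard generalisation of the law to later digits, summing log10(1 + 1/(10k + d)) over all possible preceding digits k.

```python
import math
from collections import Counter


def trailing_digit_counts(values):
    """Count how often each digit 0-9 appears as the last digit of each integer."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def chi2_goodness_of_fit(observed, probabilities):
    """Pearson chi-square statistic of observed counts against expected probabilities.

    Categories with zero expected probability are skipped, which is only
    sensible if nothing was observed in them.
    """
    n = sum(observed)
    stat = 0.0
    for count, p in zip(observed, probabilities):
        if p > 0:
            stat += (count - n * p) ** 2 / (n * p)
    return stat


def benford_nth_digit(position):
    """Benford probabilities for digits 0-9 at a 1-based digit position."""
    if position == 1:
        # A leading digit cannot be 0.
        return [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))
        for d in range(10)
    ]


# Hypothetical reported values with suspiciously many trailing 0s and 5s.
reported = [120, 135, 150, 205, 310, 415, 500, 525, 630, 745,
            121, 850, 955, 160, 265, 370, 475, 583, 690, 795]

counts = trailing_digit_counts(reported)
stat = chi2_goodness_of_fit(counts, [0.1] * 10)
# With 10 categories there are 9 degrees of freedom; the 5% critical value is about 16.92.
print(f"trailing-digit counts = {counts}, chi-square vs uniform = {stat:.2f}")
print("Benford 2nd-digit probabilities:", [round(p, 4) for p in benford_nth_digit(2)])
```

With only 20 values the expected count per digit is 2, far too few for the chi-square approximation to be trusted, so treat the toy data as a demonstration of the mechanics only.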
Something something hammer something something nail
