Stats problem - flood events

dyqik
Princess POW
Posts: 7527
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Stats problem - flood events

Post by dyqik » Tue Feb 25, 2020 11:20 am

plodder wrote:
Tue Feb 25, 2020 11:13 am
Perhaps you'll agree that if the gauges have all been connected through the whole time series (assuming they're all the same age etc - again not true), then comparing the last 10 years to the previous 50 will still give an indication of baseline change?
In the most extreme case, where the gauges are all 100% correlated, your N = 1, and the distribution of "which decade did all the gauges show a record" becomes uniform, as it's essentially a single roll of a six-sided die. You're only talking about a single gauge at that point.
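A quick simulation makes the single-die point concrete (the 60-year record length and the trial count are illustrative assumptions, not the real network):

```python
import random

# For a stationary (iid) series, how often does the all-time record land
# in the final decade? With fully correlated gauges, this one series is
# effectively all 400 of them.
random.seed(0)
N_TRIALS = 20_000
YEARS, DECADE = 60, 10

def record_in_last_decade() -> bool:
    """One iid 60-year series: is its all-time max in the final 10 years?"""
    series = [random.random() for _ in range(YEARS)]
    return series.index(max(series)) >= YEARS - DECADE

frac = sum(record_in_last_decade() for _ in range(N_TRIALS)) / N_TRIALS
print(frac)  # ~10/60 = 1/6, the single die roll
```

With 400 perfectly correlated gauges that one number is the whole story; with 400 independent gauges you would instead see a Binomial(400, 1/6) count of last-decade records.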

KAJ
Fuzzable
Posts: 310
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: Stats problem - flood events

Post by KAJ » Tue Feb 25, 2020 1:59 pm

dyqik wrote:
Tue Feb 25, 2020 11:17 am
However, the uncorrelated independent case is one extreme of the range of assumptions, so you should ask if it's unusual even in that case, especially as it's easy to answer that question.
Also that case gives the lowest probability.
plodder wrote:
Tue Feb 25, 2020 11:13 am
take a look at the tweet - it's 400 gauges nationally, and there's quite a spread of records. So whilst I agree that a proper engineer would do it properly, they'd also look at the dataseries themselves to find out if the records were broken by 1mm or 100mm, the duration of the time series, interconnectivity etc.

Perhaps you'll agree that if the gauges have all been connected through the whole time series (assuming they're all the same age etc - again not true), then comparing the last 10 years to the previous 50 will still give an indication of baseline change?
The more I think about it, the more I think statistical analysis won't give a valid answer to your question. The correlation issue is important and seems intractable.

More fundamentally, the question asked in the OP is essentially "Is the pattern noticed in the data unusual?" Of course it is, else you wouldn't have noticed it!
I think there's a name for the logical fallacy of testing a hypothesis with the data that generated the hypothesis, but I can't recall it. Would you have thought about "comparing the last 10 years to the previous 50" if they weren't different?
There are ways of testing for "structural change", but I think you'd need far more than five periods to compare with the latest.
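For the independent-gauge extreme at least, the arithmetic is easy to sketch (400 gauges and 60-year records are assumed round numbers standing in for the real network):

```python
from math import comb

# Each of 400 independent gauges with ~60 years of stationary data has
# p = 10/60 chance that its record falls in the last decade.
n, p = 400, 10 / 60
mean = n * p                   # expected last-decade records: ~66.7
sd = (n * p * (1 - p)) ** 0.5  # ~7.5

def tail_prob(k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more records."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(mean, sd, tail_prob(100))
```

So under independence you'd expect roughly 67 ± 7 gauges to show a last-decade record, and only a count well outside that band would need explaining. Positive correlation leaves the mean alone but widens the spread, which is why the independent case gives the lowest probability of an extreme count.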

Gfamily
Light of Blast
Posts: 5180
Joined: Mon Nov 11, 2019 1:00 pm
Location: NW England

Re: Stats problem - flood events

Post by Gfamily » Tue Feb 25, 2020 2:07 pm

Not a stats problem, but an impressive demonstration of the effectiveness of temporary flood barriers.

This is at Ironbridge
[Attachment: received_241888453494897.jpeg]
Usual view
[Attachment: IMG_20200225_141329.jpg]
My avatar was a scientific result that was later found to be 'mistaken' - I rarely claim to be 100% correct
ETA 5/8/20: I've been advised that the result was correct, it was the initial interpretation that needed to be withdrawn
Meta? I'd say so!

dyqik
Princess POW
Posts: 7527
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Stats problem - flood events

Post by dyqik » Tue Feb 25, 2020 3:04 pm

KAJ wrote:
Tue Feb 25, 2020 1:59 pm
dyqik wrote:
Tue Feb 25, 2020 11:17 am
However, the uncorrelated independent case is one extreme of the range of assumptions, so you should ask if it's unusual even in that case, especially as it's easy to answer that question.
Also that case gives the lowest probability.
I thought that was the case, but I don't statistic very much. And I'm not sure what anti-correlations do to this.
KAJ wrote:
Tue Feb 25, 2020 1:59 pm
plodder wrote:
Tue Feb 25, 2020 11:13 am
take a look at the tweet - it's 400 gauges nationally, and there's quite a spread of records. So whilst I agree that a proper engineer would do it properly, they'd also look at the dataseries themselves to find out if the records were broken by 1mm or 100mm, the duration of the time series, interconnectivity etc.

Perhaps you'll agree that if the gauges have all been connected through the whole time series (assuming they're all the same age etc - again not true), then comparing the last 10 years to the previous 50 will still give an indication of baseline change?
The more I think about it, the more I think statistical analysis won't give a valid answer to your question. The correlation issue is important and seems intractable.
I assume here you mean statistical analysis as opposed to trying to model the actual measured correlations and doing something Monte-Carlo.

I did try a quick look through Tamino's blog for posts on extreme or record values, and did find this one on estimating the likelihood of record values being set. It talks a lot about survival functions, and a note at the end mentions that this kind of analysis is fundamentally different to estimating the likelihood of multiple x% per time period events.
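The iid baseline behind that kind of record analysis is a textbook result (nothing here is specific to the linked post): for a stationary series, the probability that year n sets a new record is 1/n, so expected record counts are partial harmonic sums.

```python
# For an iid series, P(year n sets a new record) = 1/n, so the expected
# number of records in T years is the harmonic number H_T.

def expected_records(T: int) -> float:
    """Expected number of record-setting years in an iid series of length T."""
    return sum(1 / n for n in range(1, T + 1))

def expected_records_in_window(T: int, w: int) -> float:
    """Expected records falling in the last w of T years, iid case."""
    return sum(1 / n for n in range(T - w + 1, T + 1))

print(expected_records(60))                # ~4.68 records in 60 years
print(expected_records_in_window(60, 10))  # ~0.18 in the last decade
```

So even a stationary 60-year series sets nearly five records along the way, but only about 0.18 of one in its final decade; lots of gauges breaking records in the same recent decade is what the survival-function machinery tries to quantify.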

KAJ
Fuzzable
Posts: 310
Joined: Thu Nov 14, 2019 5:05 pm
Location: UK

Re: Stats problem - flood events

Post by KAJ » Tue Feb 25, 2020 3:15 pm

dyqik wrote:
Tue Feb 25, 2020 3:04 pm
KAJ wrote:
Tue Feb 25, 2020 1:59 pm
dyqik wrote:
Tue Feb 25, 2020 11:17 am
However, the uncorrelated independent case is one extreme of the range of assumptions, so you should ask if it's unusual even in that case, especially as it's easy to answer that question.
Also that case gives the lowest probability.
I thought that was the case, but I don't statistic very much. And I'm not sure what anti-correlations do to this.
Yes, I'd neglected anti-correlations, but I guess they're unlikely in this context.
dyqik wrote:
Tue Feb 25, 2020 3:04 pm
KAJ wrote:
Tue Feb 25, 2020 1:59 pm
plodder wrote:
Tue Feb 25, 2020 11:13 am
take a look at the tweet - it's 400 gauges nationally, and there's quite a spread of records. So whilst I agree that a proper engineer would do it properly, they'd also look at the dataseries themselves to find out if the records were broken by 1mm or 100mm, the duration of the time series, interconnectivity etc.

Perhaps you'll agree that if the gauges have all been connected through the whole time series (assuming they're all the same age etc - again not true), then comparing the last 10 years to the previous 50 will still give an indication of baseline change?
The more I think about it, the more I think statistical analysis won't give a valid answer to your question. The correlation issue is important and seems intractable.
I assume here you mean statistical analysis as opposed to trying to model the actual measured correlations and doing something Monte-Carlo.
I think the actual correlation structure would be very complicated, the different gauges being correlated with each other and with themselves with time lags. I wouldn't like to try to model that, but I guess experts in the field know more about it than me! I still think the kind of modelling discussed upthread has very little validity.
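A toy Monte Carlo shows why the correlation structure matters so much (rho, 50 gauges, and 60 years are invented knobs, not measured values): correlation leaves the expected number of last-decade records alone but blows up the spread.

```python
import random

# Each gauge's annual-maximum series mixes a shared regional component
# with gauge-specific noise; rho sets the inter-gauge correlation.
random.seed(1)
GAUGES, YEARS, DECADE, TRIALS = 50, 60, 10, 500

def last_decade_hits(rho: float) -> int:
    """Count gauges whose 60-year record falls in the final decade."""
    shared = [random.gauss(0, 1) for _ in range(YEARS)]
    hits = 0
    for _ in range(GAUGES):
        series = [rho * s + (1 - rho * rho) ** 0.5 * random.gauss(0, 1)
                  for s in shared]
        if series.index(max(series)) >= YEARS - DECADE:
            hits += 1
    return hits

spread = {}
for rho in (0.0, 0.9):
    counts = [last_decade_hits(rho) for _ in range(TRIALS)]
    m = sum(counts) / TRIALS
    s = (sum((c - m) ** 2 for c in counts) / TRIALS) ** 0.5
    spread[rho] = (m, s)

print(spread)  # means agree (~50/6); correlation inflates the spread
```

The mean is about 50/6 regardless of rho, but at rho = 0.9 the standard deviation is several times the independent-case value: the effective number of independent gauges collapses toward the single die roll.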

dyqik
Princess POW
Posts: 7527
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Stats problem - flood events

Post by dyqik » Tue Feb 25, 2020 3:32 pm

KAJ wrote:
Tue Feb 25, 2020 3:15 pm
dyqik wrote:
Tue Feb 25, 2020 3:04 pm
KAJ wrote:
Tue Feb 25, 2020 1:59 pm

Also that case gives the lowest probability.
I thought that was the case, but I don't statistic very much. And I'm not sure what anti-correlations do to this.
Yes, I'd neglected anti-correlations, but I guess they're unlikely in this context.
I can construct a toy model that would do that: fixed amount of precipitation arrives in a region, but it can fall on one side or another of a range of hills depending on wind direction or cloud height or something. River gauges for the rivers draining each side of the hills would then be anti-correlated to some degree.

But that's a Duplo model, rather than a Lego Technic one. ;)
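The Duplo model fits in a few lines of code (the uniform wind split is invented purely for illustration):

```python
import random

# A fixed amount of rain arrives each year; a random wind direction
# decides the split between the two sides of the hills.
random.seed(2)
YEARS, TOTAL_RAIN = 5_000, 100.0

west, east = [], []
for _ in range(YEARS):
    frac = random.random()              # share blown to the west side
    west.append(TOTAL_RAIN * frac)
    east.append(TOTAL_RAIN * (1 - frac))

def corr(xs, ys):
    """Pearson correlation, stdlib-only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

r = corr(west, east)
print(r)  # -1: with a hard-fixed total, the gauges are perfectly opposed
```

A hard-fixed total gives correlation of exactly -1; letting the yearly total vary would pull the two gauges back toward independence, so real catchments would land somewhere in between.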
