Anti-correlation does not disprove causation

IvanV
Stummy Beige
Posts: 2660
Joined: Mon May 17, 2021 11:12 am


Post by IvanV » Wed Jun 16, 2021 11:36 pm

This is a statistical fallacy that ought to be better known. We all know that correlation does not prove causation, but the mirror-image claim is also false. Anti-correlation does not disprove causation.

I was recently directed to a paper that started by observing that while it is well-known that correlation does not prove causation, anti-correlation plainly disproves causation. The author proceeded to do some nice statistical analysis of long-term climate and CO2 reconstructions, showed some relevant kind of anti-correlation, and thus "proved" that CO2 was not responsible for climate change.

The falseness of this assertion, that anti-correlation disproves causation, is seemingly not well-known. Thus a plausible academic could assert it in a paper, and get it past the referees.

After thinking about this for a while, I realised that I had recently seen a beautiful and simple explanation of just why anti-correlation does not disprove causation. The paper was written by the country's leading econometrician, Sir David Hendry. He wrote it to show why another academic's "disproof" of the role of CO2 in climate change was wrong. This same fallacy was the essential mistake in that earlier paper, though its manifestation was rather subtle and not easily spotted.

The author of the earlier erroneous paper was Michael Beenstock. He is not far behind Sir David in his reputation in econometrics. This is doubtless why Sir David felt it worth writing a formal debunking. Most people look at Beenstock's paper, observe that the relations he finds badly violate the laws of physics, laugh, and leave it there. But it is a reasonable question to ask why seemingly good and powerful statistics produced an analysis of physical quantities that so badly violated the laws of physics. Sir David presented the explanation. The details are complicated, but the fundamental principle is the matter in hand.

What Sir David did to explain it so nicely was to present two well-known datasets, but initially without identifying them. The dependent variable A trends down, and the independent variable B trends up, with very high anti-correlation. The natural reaction of nearly everyone is that this is very strong evidence that an increase in B cannot possibly cause A to increase. Either B is largely or completely irrelevant and just happens to have a strong time trend like the other series, or, if it is very relevant, then whatever effect it has on A cannot possibly be positive.

He then reveals that A is the number of road casualties, and B is the quantity of traffic. Plainly, at times when a busy road has only half the level of traffic, it tends to have only around half the level of casualties. This direct proportion is in fact the relationship Highways England uses when predicting road casualties, though there are some complications around speed and congestion. So the quantity of traffic has an immediate, large and positive causal effect on the quantity of road accidents. The more people out there indulging in a risky activity, the more accidents they'll have.

But what we see in the long term time graphs is the effect of many other changes that have reduced the level of road accidents and overwhelmed the effect of increasing traffic.
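To make that concrete, here is a minimal sketch in Python with entirely made-up numbers (the 3% annual traffic growth, the 7% annual fall in risk and the `casualties` function are assumptions for illustration, not figures or methods from Hendry's paper). Casualties are modelled as directly proportional to traffic in any given year, yet the long-run series are strongly anti-correlated because the risk per unit of traffic falls faster than traffic grows.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def casualties(traffic_level, year):
    """Hypothetical model: casualties are proportional to traffic, but the
    risk per unit of traffic falls 7% a year (safer cars, roads, laws)."""
    risk_per_unit = 0.05 * 0.93 ** year
    return risk_per_unit * traffic_level

years = range(40)
traffic = [100.0 * 1.03 ** t for t in years]          # assumed 3% annual growth
deaths = [casualties(traffic[t], t) for t in years]   # trends down overall

# Over the whole period, traffic and casualties move in opposite directions.
r = pearson(traffic, deaths)
print(f"long-run correlation of traffic with casualties: {r:.2f}")

# Within a single year, the causal effect is positive and proportional:
# halving the traffic halves the casualties.
t = 20
full = casualties(traffic[t], t)
half = casualties(traffic[t] / 2, t)
print(f"year {t}: full traffic -> {full:.3f}, half traffic -> {half:.3f}")
```

The correlation comes out negative even though, by construction, more traffic always means more casualties. The sign of the long-run correlation is set by the omitted factor (falling risk), not by the causal link being tested.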

The processes related to climate change have similar disguising effects. Today we are emitting CO2 directly, but that isn't what happened in the past. Back then, the tilt of the Earth changed first, and that had a climate effect. Only then did CO2 change, in a feedback effect which accentuated the change. So it looks like CO2 follows rather than leads the temperature change. The recent paper I saw was picking up that trailing effect, and came to its erroneous conclusion.
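The lag can be illustrated with a toy model, again sketched in Python under loud assumptions: the sinusoidal `forcing`, the relaxation rate `k` and the `feedback` strength are invented for illustration and are not taken from any climate reconstruction or from either paper. An external cycle drives temperature, CO2 drifts towards the earlier temperature, and CO2 in turn adds to the temperature.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(feedback, steps=1200, period=200, k=0.1):
    """Toy model: external forcing drives temperature, CO2 relaxes towards
    the previous temperature, and CO2 feeds back into temperature."""
    temps, co2s = [], []
    co2, prev_temp = 0.0, 0.0
    for t in range(steps):
        co2 = (1 - k) * co2 + k * prev_temp          # CO2 responds with a delay
        forcing = math.sin(2 * math.pi * t / period) # stand-in for orbital changes
        temp = forcing + feedback * co2              # CO2 amplifies the change
        temps.append(temp)
        co2s.append(co2)
        prev_temp = temp
    return temps[400:], co2s[400:]                   # drop the start-up transient

def lagged_corr(temps, co2s, lag):
    """Correlation of temperature at time t with CO2 at time t + lag."""
    if lag >= 0:
        return pearson(temps[:len(temps) - lag], co2s[lag:])
    return pearson(temps[-lag:], co2s[:lag])

temps, co2s = simulate(feedback=0.5)

# A positive best lag means CO2 trails temperature in the data.
best_lag = max(range(-30, 31), key=lambda lag: lagged_corr(temps, co2s, lag))
print("lag at which CO2 best matches temperature:", best_lag)

# Yet CO2 is causally warming: switching the feedback off shrinks the swings.
peak_with = max(abs(x) for x in temps)
peak_without = max(abs(x) for x in simulate(feedback=0.0)[0])
print(f"peak temperature swing with feedback {peak_with:.2f}, without {peak_without:.2f}")
```

In this sketch CO2 trails temperature in the cross-correlation, and yet removing the CO2 feedback reduces the size of the temperature swings. A series that follows can still be a cause.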

The reason Beenstock hadn't spotted that this was the nature of his error was rather subtle and difficult, and that's why it took Hendry to spot it.

If you want it, the paper is here. https://www.researchgate.net/publicatio ... ate_Change

Some econometricians, such as Hendry, are now taking an interest in climate statistics, because it turns out statistical methods of econometrics are suitable for application to climate data. In contrast, attempts to get biologists and medical scientists to talk to econometricians have failed because their methods tend to be quite different.

dyqik
Princess POW
Posts: 7526
Joined: Wed Sep 25, 2019 4:19 pm
Location: Masshole

Re: Anti-correlation does not disprove causation

Post by dyqik » Thu Jun 17, 2021 10:12 am

Not knowing that anti-correlation is also correlation is the first mistake here. The second is not understanding the concept of "lag".

Bird on a Fire
Princess POW
Posts: 10137
Joined: Fri Oct 11, 2019 5:05 pm
Location: Portugal

Re: Anti-correlation does not disprove causation

Post by Bird on a Fire » Thu Jun 17, 2021 11:34 am

And the concept of "things having more than one cause".

I didn't realise anyone was still publishing kooky climate denial though lol

I'll have a look at the paper but it sounds a bit like a sledgehammer to crack a nut.
We have the right to a clean, healthy, sustainable environment.


Re: Anti-correlation does not disprove causation

Post by Bird on a Fire » Thu Jun 17, 2021 12:13 pm

IvanV wrote:
Wed Jun 16, 2021 11:36 pm
Some econometricians, such as Hendry, are now taking an interest in climate statistics, because it turns out statistical methods of econometrics are suitable for application to climate data. In contrast, attempts to get biologists and medical scientists to talk to econometricians have failed because their methods tend to be quite different.
dyqik wrote:
Thu Jun 17, 2021 10:12 am
Not knowing that anti-correlation is also correlation is the first mistake here.
That's actually quite a good example - in biology we call it "negative correlation" rather than "anti-correlation", which makes the point rather clearer.

I think applying methods from one field to another can be very productive, if done with care. For instance, I've been using methods originally developed for social network analysis to look at spatial movements in birds. You've got to make sure you understand how the methods work first, and be intellectually honest in applying them rather than starting from an ideological position (as I suspect Beenstock et al did). It also helps to collaborate with people from the originating field(s) quite closely to spot mistakes.

There was a cool preprint (possibly a paper now) applying climate-modelling data-integration techniques to model the covid outbreak in the UK, for example. Unlike the methods epidemiologists seem to rely on, climate science (and biology) are very used to incorporating information about measurement errors, so the paper was quite successful in picking up the likely number of covid cases that weren't detected by the UK's (pisspoor) testing regime. (I'd thought of doing something similar with hidden Markov models, but never got round to it.)
