Nonreplicable publications cited more than replicable ones

Get your science fix here: research, quackery, activism and all the rest
Post Reply
User avatar
jimbob
Stummy Beige
Posts: 2721
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Nonreplicable publications cited more than replicable ones

Post by jimbob » Thu Jun 03, 2021 3:03 pm

https://advances.sciencemag.org/content/7/21/eabd1705
We use publicly available data to show that published papers in top psychology, economics, and general interest journals that fail to replicate are cited more than those that replicate. This difference in citation does not change after the publication of the failure to replicate. Only 12% of postreplication citations of nonreplicable findings acknowledge the replication failure. Existing evidence also shows that experts predict well which papers will be replicated. Given this prediction, why are nonreplicable papers accepted for publication in the first place? A possible answer is that the review team faces a trade-off. When the results are more “interesting,” they apply lower standards regarding their reproducibility.
Still, it keeps STeamTraen occupied.
Have you considered stupidity as an explanation

User avatar
shpalman
Light of Blast
Posts: 4925
Joined: Mon Nov 11, 2019 12:53 pm
Location: One step beyond

Re: Nonreplicable publications cited more than replicable ones

Post by shpalman » Thu Jun 03, 2021 3:47 pm

jimbob wrote:
Thu Jun 03, 2021 3:03 pm
https://advances.sciencemag.org/content/7/21/eabd1705
We use publicly available data to show that published papers in top psychology, economics, and general interest journals that fail to replicate are cited more than those that replicate. This difference in citation does not change after the publication of the failure to replicate. Only 12% of postreplication citations of nonreplicable findings acknowledge the replication failure. Existing evidence also shows that experts predict well which papers will be replicated. Given this prediction, why are nonreplicable papers accepted for publication in the first place? A possible answer is that the review team faces a trade-off. When the results are more “interesting,” they apply lower standards regarding their reproducibility.
Still, it keeps STeamTraen occupied.
A possible answer is that it's not the job of the referee to actually try to reproduce the experiment in the few weeks we are given to decide whether to accept or reject a paper.

But the general answer is that yes, "interesting" trumps "correct" when it comes to certain high-impact journals.
molto tricky

monkey
Catbabel
Posts: 673
Joined: Wed Nov 13, 2019 5:10 pm

Re: Nonreplicable publications cited more than replicable ones

Post by monkey » Thu Jun 03, 2021 4:06 pm

shpalman wrote:
Thu Jun 03, 2021 3:47 pm
jimbob wrote:
Thu Jun 03, 2021 3:03 pm
https://advances.sciencemag.org/content/7/21/eabd1705
We use publicly available data to show that published papers in top psychology, economics, and general interest journals that fail to replicate are cited more than those that replicate. This difference in citation does not change after the publication of the failure to replicate. Only 12% of postreplication citations of nonreplicable findings acknowledge the replication failure. Existing evidence also shows that experts predict well which papers will be replicated. Given this prediction, why are nonreplicable papers accepted for publication in the first place? A possible answer is that the review team faces a trade-off. When the results are more “interesting,” they apply lower standards regarding their reproducibility.
Still, it keeps STeamTraen occupied.
A possible answer is that it's not the job of the referee to actually try to reproduce the experiment in the few weeks we are given to decide whether to accept or reject a paper.

But the general answer is that yes, "interesting" trumps "correct" when it comes to certain high-impact journals.
I've always seen it as the reviewer's job to do their best to make sure a paper contains enough information that someone else could reproduce it if they tried. That's how I approach it, anyway; it's not my (unpaid) job to do more, and I have my own work to get on with. As far as I can tell, though, not all reviewers in my field do even this.

User avatar
Bird on a Fire
Light of Blast
Posts: 6390
Joined: Fri Oct 11, 2019 5:05 pm
Location: Iceland

Re: Nonreplicable publications cited more than replicable ones

Post by Bird on a Fire » Fri Jun 04, 2021 10:13 am

Maybe all the citations are saying "these guys are wrong", or at least noting that it's an unusual result.

The main problem here is the absurd level of veneration given to citations (and by extension impact factors). They don't actually mean anything.
He has the grace of a swan, the wisdom of an owl, and the eye of an eagle—ladies and gentlemen, this man is for the birds!

FlammableFlower
Dorkwood
Posts: 1087
Joined: Mon Nov 11, 2019 1:22 pm

Re: Nonreplicable publications cited more than replicable ones

Post by FlammableFlower » Fri Jun 04, 2021 11:12 am

I'm probably being overly pedantic... but in terms of experimental details - is there sufficient detail for it to be repeated, as the authors had done?

Reproducibility itself is a bit harder to pin down or demand - the authors may have reported something in good faith, but did they report everything (it's not a given), and did they miss any details, or were there other things involved? There's the example of ppb levels of trace palladium being the actual cause of catalysis in reactions that people had reported as palladium-free. It turned up impregnated in the Teflon-coated magnetic stirrer bars that had been used for reactions that had had palladium present, and even in freshly opened bottles of sodium and caesium carbonate straight from the supplier.

In my final year undergrad project I was supposed to be working off a particular catalysis paper. I ended up spending most of my time failing to reproduce their results. 85% yield? Not in my hands... always 35% - what was I doing wrong? You had Nobel laureates publishing on potential mechanisms. Then, just as I was writing up, I found they'd just published a paper in a really obscure journal that said (to paraphrase), "you know we said dry molecular sieves...? They shouldn't be dried. They need the trace water present to form the active catalyst. If you scrupulously dry them you'll only get a 35% yield..." So, either the oven they were storing their sieves in wasn't up to scratch, or someone didn't want to admit to their boss that they hadn't been bothering to dry the mol sieves before use.

Then does everything else fit together? If the experiments look ok, are the data from them ok? Are the conclusions from that sound?

Novelty does have a part to play - when reviewing I've turned down plenty on the account they are massively insignificant changes to (sometimes several) papers the same authors have published elsewhere - it's just an attempt to get paper and citation count up.

Additionally, around this issue - there are journals out there solely focused on reproducibility. Although maybe, coming from a synthetic discipline, I take that for granted. E.g. Organic Syntheses is generally considered a quite dull place to publish, but it's been going for a century now; all procedures are checked multiple times and come with notes and commentary on the published procedure, even down to differences in outcome due to the supplier of starting materials. If it's in there, you can pretty much rely on it to work as it says.

Nowadays there's the Nature Protocols:
Protocols are presented in a 'recipe' style, providing step-by-step descriptions of procedures that users can immediately apply in their own research. All of our Protocols have been proven to work already (used to generate data in published papers), with further validation provided by peer review of the Protocols themselves. As a supporting primary research paper is a requirement for publication, novelty is not a prerequisite. However, it is important that our Protocols add value to the published literature and expand significantly upon the information available in the supporting papers (for example, with additional detail relating to experimental design, troubleshooting, data analysis, etc.).

kerrya1
Stargoon
Posts: 90
Joined: Tue Nov 12, 2019 11:13 am

Re: Nonreplicable publications cited more than replicable ones

Post by kerrya1 » Fri Jun 04, 2021 11:40 am

If you are looking for a good way to create, share, and validate research protocols then I would recommend protocols.io. We're trying to get more groups to use it, not only for their own private protocols but also to share and receive feedback on protocols associated with publications. The ability to fork established protocols also allows tweaks to be made for different situations whilst keeping a record of when and why changes are made.

IvanV
Fuzzable
Posts: 279
Joined: Mon May 17, 2021 11:12 am

Re: Nonreplicable publications cited more than replicable ones

Post by IvanV » Sun Jun 06, 2021 3:10 pm

I think this is mainly about the kind of study that uses statistics to draw its conclusions. The subjects cited - psychology, economics - do a lot of that.

There's been a long literature on why so many studies that rely on stats are not replicable. I think some meta-studies have said that if you come across a paper reporting results at 90% significance, the probability that it is a replicable result is in reality nearer 30-50%, not 90%.
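One standard way to see how that gap can arise (the framing and every number here are illustrative assumptions, not figures from any particular meta-study) is to treat significance testing as a screening problem and compute the positive predictive value of a "significant" result:

```python
# Illustrative only: positive predictive value of a "significant" result,
# assuming some fraction of the hypotheses people test are truly non-null.
def ppv(prior, power, alpha):
    """P(effect is real | the test came out significant)."""
    true_pos = prior * power          # real effects correctly detected
    false_pos = (1 - prior) * alpha   # null effects flagged by chance
    return true_pos / (true_pos + false_pos)

# With 10% of tested hypotheses true, 50% power, and alpha = 0.10,
# a result "significant at 90%" is real far less than 90% of the time.
print(round(ppv(prior=0.10, power=0.50, alpha=0.10), 2))  # → 0.36
```

So even before any questionable research practices, low power plus a low base rate of true hypotheses can drag the replication probability down into the range quoted above.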

One big factor is that tests reported at 90% significance are only actually 90% under the proper conditions. One such condition is that you set the test up as a single, one-off test. But people don't usually do that: they often carry out some search to find a "good" functional form first, and such search processes use up some of the statistical power of the tests. Yet it is rare to report the adjustment to the statistical power implied by the search process you went through. One reason is that it's typically not even a tractable calculation, though for some canonical cases you can calculate a bound on what it might be.

I have read papers where people used a macro to do thousands of runs to exhaustively search all possible functional forms within a set of related specifications. As Professor Sir David Hendry, the most famous living econometrician, said in a paper on model specification search, it's a perfectly reasonable thing to do provided you cite the adjustment to the statistical power that results from doing it. He demonstrated a formula for calculating it for a certain type of exhaustive search, which shows that the adjustment can be large and indeed leave you with nothing, depending how much power you had in the first place. These are apparently standard methods in so-called "big data", but people seem to be less aware of statistical consequences of "fitting" the data.
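The way an unreported search erodes the nominal significance level is easy to demonstrate with a little simulation (a sketch only, and optimistic in that it treats the candidate specifications as independent, which real specifications fitted to the same data are not):

```python
import random

random.seed(0)

ALPHA_Z = 1.645    # two-sided 10% significance threshold for a z-statistic
N_SPECS = 20       # candidate functional forms tried per "study"
N_STUDIES = 10_000

false_positives = 0
for _ in range(N_STUDIES):
    # Under the null, each specification's test statistic is ~ N(0, 1).
    # A specification search keeps the best-looking one.
    best = max(abs(random.gauss(0, 1)) for _ in range(N_SPECS))
    if best > ALPHA_Z:
        false_positives += 1

# Nominal rate is 10%; searching 20 specifications pushes it far higher
# (analytically 1 - 0.90**20 ≈ 88% if the specs were independent).
rate = false_positives / N_STUDIES
print(rate)
```

That is the sense in which the search "uses up" the test: report only the winning specification at nominal 90% and the effective error rate is nothing like 10%.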

And then, on top of that, as mentioned, there is publication bias for "interesting results". Interesting results are, of course, the ones most likely to be wrong.

It is a particularly worrying issue in relation to medical tests of new drugs.

User avatar
jimbob
Stummy Beige
Posts: 2721
Joined: Mon Nov 11, 2019 4:04 pm
Location: High Peak/Manchester

Re: Nonreplicable publications cited more than replicable ones

Post by jimbob » Mon Jun 07, 2021 10:20 am

IvanV wrote:
Sun Jun 06, 2021 3:10 pm
I think this is mainly about the kind of study that uses statistics to draw its conclusions. The subjects cited - psychology, economics - do a lot of that.

There's been a long literature on why so many studies that rely on stats are not replicable. I think some meta-studies have said that if you come across a paper reporting results at 90% significance, the probability that it is a replicable result is in reality nearer 30-50%, not 90%.

One big factor is that tests reported at 90% significance are only actually 90% under the proper conditions. One such condition is that you set the test up as a single, one-off test. But people don't usually do that: they often carry out some search to find a "good" functional form first, and such search processes use up some of the statistical power of the tests. Yet it is rare to report the adjustment to the statistical power implied by the search process you went through. One reason is that it's typically not even a tractable calculation, though for some canonical cases you can calculate a bound on what it might be.

I have read papers where people used a macro to do thousands of runs to exhaustively search all possible functional forms within a set of related specifications. As Professor Sir David Hendry, the most famous living econometrician, said in a paper on model specification search, it's a perfectly reasonable thing to do provided you cite the adjustment to the statistical power that results from doing it. He demonstrated a formula for calculating it for a certain type of exhaustive search, which shows that the adjustment can be large and indeed leave you with nothing, depending how much power you had in the first place. These are apparently standard methods in so-called "big data", but people seem to be less aware of statistical consequences of "fitting" the data.

And then, on top of that, as mentioned, there is publication bias for "interesting results". Interesting results are, of course, the ones most likely to be wrong.

It is a particularly worrying issue in relation to medical tests of new drugs.
Which leads us to Didier Raoult...

Well, to be fair to most journals, he did choose a cowboy journal to publish that very influential HCQ study.
Have you considered stupidity as an explanation

User avatar
Bird on a Fire
Light of Blast
Posts: 6390
Joined: Fri Oct 11, 2019 5:05 pm
Location: Iceland

Re: Nonreplicable publications cited more than replicable ones

Post by Bird on a Fire » Mon Jun 07, 2021 11:23 am

In terms of drug trials, the idea of pre-trial registration of outcomes to be published addresses that. Or it would, if pharma companies actually stuck to it and publishing companies actually enforced it like they said they would. It's a nice idea, though.
He has the grace of a swan, the wisdom of an owl, and the eye of an eagle—ladies and gentlemen, this man is for the birds!

IvanV
Fuzzable
Posts: 279
Joined: Mon May 17, 2021 11:12 am

Re: Nonreplicable publications cited more than replicable ones

Post by IvanV » Tue Jun 08, 2021 10:40 am

Bird on a Fire wrote:
Mon Jun 07, 2021 11:23 am
In terms of drug trials, the idea of pre-trial registration of outcomes to be published addresses that. Or it would, if pharma companies actually stuck to it and publishing companies actually enforced it like they said they would. It's a nice idea, though.
Just because you can take their data and replicate their statistics doesn't mean it is a real result. True replication in such cases means repeating the actual experiment: get more data and see if the model is still significant with the new data. This is the usual point of failure of non-replicable experiments that depend on stats for their demonstration.

I've learned this the hard way. A client wanted a forecast of how much income a particular charging scheme would raise, given the total quantity of business the scheme was applied to. We had some history to go on. The actual income raised depended on the distribution of incomes of the individual businesses that made up the total, as the charging scheme was progressive, and the data on this distribution was very poor. So I ran a simple but "obviously wrong" model on it: "wrong" in the sense that the simplifying assumption was very crude, and we had a couple of facts indicating it was very crude indeed. But it meant I was only estimating one parameter from the historic data, and it was the best I could do with a single-parameter estimation. This "wrong" model acted as a pretty good predictor for many years, predicting income given the quantity of business to within about 5-10%.

I twice tried to make a "better" model, taking more account of the actual distribution of the individual businesses by more detailed statistical inference of what that was. This needed me to estimate a few more parameters in a multiple regression, the better to shape the distribution. I got a beautiful fit on historic data on each occasion, R² over 99%, all the parameters significant and plausible. But the forecasts didn't work at all; they produced completely wrong predictions. My beautiful fits were just "fitting": they did not find the true distribution at all, merely some random arrangement of those parameters that "fitted" history well. To show the model was completely wrong, one more year's data sufficed.
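A toy version of that failure mode is easy to reproduce (purely an illustrative sketch with made-up numbers, nothing to do with the actual charging model): fit a crude low-parameter model and a flexible many-parameter one to the same short noisy history, then compare them one step out of sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# True relationship: y = 2x + noise. Ten years of "history".
x_hist = np.arange(10, dtype=float)
y_hist = 2 * x_hist + rng.normal(0, 2, size=10)

def r2(y, yhat):
    """Coefficient of determination on the fitting sample."""
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# A crude straight-line fit vs. a flexible nine-parameter polynomial.
simple = np.polyfit(x_hist, y_hist, 1)
flexible = np.polyfit(x_hist, y_hist, 8)

print(r2(y_hist, np.polyval(simple, x_hist)))    # decent in-sample fit
print(r2(y_hist, np.polyval(flexible, x_hist)))  # near-perfect in-sample fit

# "One more year's data sufficed": compare forecasts at x = 10
# against the true trend value 2 * 10 = 20.
err_simple = abs(np.polyval(simple, 10.0) - 20.0)
err_flexible = abs(np.polyval(flexible, 10.0) - 20.0)
print(err_simple, err_flexible)
```

The flexible fit always wins in-sample, since the extra parameters can absorb the noise, but that is exactly why its out-of-sample forecast typically goes badly wrong.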

User avatar
Bird on a Fire
Light of Blast
Posts: 6390
Joined: Fri Oct 11, 2019 5:05 pm
Location: Iceland

Re: Nonreplicable publications cited more than replicable ones

Post by Bird on a Fire » Tue Jun 08, 2021 11:01 am

The point of trial registration is to prevent p-hacking/overfitting/fishing expeditions and prevent publication bias. The researchers say what kind of model they'll fit before they get the data, which means they can't go fishing for an overfitted, poorly predictive model.

The AllTrials campaign has had some success in getting funders and publishers to make it mandatory, but enforcement is lax: https://www.alltrials.net/

I was addressing your previous post, where you identified statistical jiggery-pokery as a major source of non-replicability that's particularly worrying when it happens with medical trials.

Collecting more data is indeed another problem. There's no funding for it, and nobody will publish articles that merely replicate previous findings. The economic incentives in academia reward shiny new surprises, not solid plodding quality control, so it shouldn't be too surprising that the system is churning out a lot of garbage.

Fixing it will require (at least) one of two things: a top-down change in the culture of what's funded and published, or a bottom-up change in what scientists do.
He has the grace of a swan, the wisdom of an owl, and the eye of an eagle—ladies and gentlemen, this man is for the birds!

Post Reply