
Many analysts reveal irreplicability of results

Posted: Wed Jul 29, 2020 8:02 pm
by Bird on a Fire
In another worrying development for the integrity of published scientific results, a project that got 70 teams of neuroscientists to analyse the same fMRI dataset found that they got wildly inconsistent results.

https://medium.com/the-spike/seventy-te ... d96c23dbf4

The implications probably stretch beyond analysis of fMRI:
The NARPS paper ends with the warning that “although the present investigation was limited to the analysis of a single fMRI dataset, it seems highly likely that similar variability will be present for other fields of research in which the data are high-dimensional and the analysis workflows are complex and varied”.

The Rest of Science: “Do they mean us?”

Yes, they mean you. These crises should give any of us working on data from complex pipelines pause for serious thought. There is nothing unique to fMRI about the issues they raise.
FWIW I recently teamed up with a friend and contributed an analysis to a similar project for ecology & evolutionary biology. I'm interested (if a bit anxious!) to see what the results are. On the one hand, analytical pipelines in eco/evo are generally less complex than in neuroimaging, though that is rapidly changing with the advent of high-dimensional data from e.g. biologgers (not to mention the -omics cans of worms). But on the other, field data are absolutely rife with confounding effects, many of which are difficult to measure, and it often seems to me a bit of a personal choice whether or not certain things get included in the analysis.
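
To show what I mean by "personal choice", here's a toy sketch (entirely simulated data, nothing from the real project) of how the same dataset can support opposite conclusions depending on whether an analyst decides to adjust for a confounder:

[code]
# Toy illustration (simulated data): the same dataset gives different answers
# depending on whether the analyst chooses to adjust for a confounder.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200

habitat_quality = rng.normal(size=n)                     # confounder, hard to measure in the field
treatment = 0.8 * habitat_quality + rng.normal(size=n)   # "exposure" of interest
outcome = 0.0 * treatment + 1.0 * habitat_quality + rng.normal(size=n)  # true treatment effect is zero

# Analyst A: leaves the confounder out
m_a = sm.OLS(outcome, sm.add_constant(treatment)).fit()

# Analyst B: adjusts for it
X_b = sm.add_constant(np.column_stack([treatment, habitat_quality]))
m_b = sm.OLS(outcome, X_b).fit()

print("Analyst A treatment effect: %.2f (p=%.3f)" % (m_a.params[1], m_a.pvalues[1]))
print("Analyst B treatment effect: %.2f (p=%.3f)" % (m_b.params[1], m_b.pvalues[1]))
[/code]

In that setup the unadjusted analysis reports a strongly "significant" effect even though the true effect is zero, while the adjusted one doesn't. Neither analyst has done anything obviously wrong - the difference is a single modelling choice that rarely makes it into the abstract.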

I think this kind of thing has interesting implications for open data and data publication - if 70 analysts can get 70 different results from the same dataset, the importance of enabling independent verification of results is clear.

Re: Many analysts reveal irreplicability of results

Posted: Thu Jul 30, 2020 9:04 am
by secret squirrel
The fundamental problem is that the systems involved are tremendously complicated, and unlike, say, high energy physics we don't have hugely expensive machines running large numbers of experiments to get 'reasonable' data. So a) the statistical tools we have are too wimpy to answer the kind of questions we want to ask with the data we have available, and b) many scientists don't use them properly anyway.
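
Just to put a back-of-envelope number on the "wimpy" point (a standard two-sample power approximation with hypothetical effect sizes, nothing specific to NARPS):

[code]
# Rough sample-size calculation for a two-sample t-test (normal approximation).
# Effect sizes here are the usual "large / medium / small" placeholders.
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate subjects per group needed to detect a standardized effect d."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) / effect_size) ** 2

for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: ~{n_per_group(d):.0f} subjects per group")
[/code]

A "small" effect needs roughly 400 subjects per group before you even start correcting for multiple comparisons, which is well beyond what a typical single study in these fields collects.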

Re: Many analysts reveal irreplicability of results

Posted: Thu Jul 30, 2020 9:09 am
by Aitch
Bird on a Fire wrote:
Wed Jul 29, 2020 8:02 pm
...

I think this kind of thing has interesting implications for open data and data publication - if 70 analysts can get 70 different results from the same dataset, the importance of enabling independent verification of results is clear.
I'm probably missing something here, but if 70 scientists can get 70 different results, what guarantee is there that independent verification is going to come up with something any more 'accurate' or 'right' or whatever you want to call it?

Re: Many analysts reveal irreplicability of results

Posted: Thu Jul 30, 2020 9:11 am
by shpalman
Aitch wrote:
Thu Jul 30, 2020 9:09 am
Bird on a Fire wrote:
Wed Jul 29, 2020 8:02 pm
...

I think this kind of thing has interesting implications for open data and data publication - if 70 analysts can get 70 different results from the same dataset, the importance of enabling independent verification of results is clear.
I'm probably missing something here, but if 70 scientists can get 70 different results, what guarantee is there that independent verification is going to come up with something any more 'accurate' or 'right' or whatever you want to call it?
Independent verification should be able to point out that the dataset in this case is just noise, so the result the first analyst got when they wrote the paper is meaningless.

Alternatively, what actually happens in the Nature paper is that one of the hypotheses stands out as being confirmed.
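
Roughly the kind of aggregation that makes one hypothesis "stand out" across teams (the numbers below are invented for illustration, not the actual NARPS results):

[code]
# Toy summary of many teams' yes/no decisions on the same dataset
# (invented counts, NOT the real NARPS figures).
n_teams = 70
significant_counts = {"H1": 12, "H2": 63, "H3": 30, "H4": 4}

for hyp, k in significant_counts.items():
    frac = k / n_teams
    if frac >= 0.8:
        verdict = "most teams agree it's there"
    elif frac <= 0.2:
        verdict = "most teams agree it's not there"
    else:
        verdict = "teams disagree - conclusion is analyst-dependent"
    print(f"{hyp}: {k}/{n_teams} teams significant ({frac:.0%}) -> {verdict}")
[/code]

The interesting cases are the middle rows: for those, whether the effect "exists" in the published record depends mostly on which team happened to write the paper.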

Re: Many analysts reveal irreplicability of results

Posted: Thu Jul 30, 2020 11:49 am
by AMS
Crikey.

Verification doesn't just mean "does a different team get the same result" though. It also encompasses making predictions from the data and designing (well-controlled) experiments to test those predictions. Not always easy, especially in neuroscience, but in my field, we use a lot of gene expression data ("x is upregulated in cancer" type stuff) and it's definitely a lot of work to follow up on the Big Data, which can be generated much faster than is useful sometimes!

Re: Many analysts reveal irreplicability of results

Posted: Wed Sep 02, 2020 2:58 pm
by kerrya1
I think there are two ways of looking at this:

1) Verifying a published paper and the conclusions the authors reached - to achieve this you need a full set of supporting documentation and metadata: not just the dataset but also detailed methodologies for data creation, processing, and analysis, copies of algorithms used or software written, specific details of the hardware and software used, exact lists of reagents including manufacturers' details, etc, etc (see the sketch after this list for the minimum I'd want in machine-readable form). Only when you put all the pieces of the puzzle together is there any chance of verifying the published output.

2) Replicating the results to prove that the conclusions are "real" and not an artifact of the analysis - this requires the complete dataset plus enough documentation and metadata to understand what the original researchers did, but the expectation is that a replicating team would then apply their own methodologies to the data to see if they reach the same results.
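
For point 1, a minimal sketch of the sort of machine-readable record I have in mind (my own improvisation, not any established standard) - just enough to pin down what data and environment a result came from:

[code]
# Minimal provenance manifest sketch (improvised, not an established standard).
import hashlib, json, platform, sys
from pathlib import Path

def manifest(data_path: str, analysis_script: str, notes: str) -> dict:
    """Capture just enough provenance to let someone else re-run the analysis."""
    return {
        "dataset_sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
        "analysis_script": analysis_script,   # ideally a commit hash or DOI as well
        "python_version": sys.version,
        "platform": platform.platform(),
        "notes": notes,                       # reagents, instruments, lot numbers, etc.
    }

if __name__ == "__main__":
    Path("counts.csv").write_text("sample,expression\nA,1.2\nB,3.4\n")  # stand-in data
    print(json.dumps(manifest("counts.csv", "fit_model.py",
                              "reagent lots and instrument IDs go here"), indent=2))
[/code]

Nothing fancy, but even that much - a dataset checksum, the exact code version, and the environment - is more than many published analyses ship with.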

Of course there is little support for, or glamour in, doing either verification or replication studies, so unless there is clear evidence of mistakes or malpractice the original results are often just accepted.