By Dorothy Bishop
For years, researchers squirreled their data away after completing a study. When I started out in research in the 1970s, there were few options for sharing data: there was no email or internet. I have dim memories of analysing data from the 1958 National Child Development Study. The files arrived on enormous disks that I had to take to the local computer centre to read.
Now, though, we have ways of not just storing, but electronically sharing data. Archiving is not trivial: it requires proper documentation of data, and anonymisation when human participants are involved. But the advantages are clear to see: data in an archive can be re-used by other scientists, increasing its potential value. Data can also be future-proofed, avoiding the scenario where key results exist only on a kind of floppy disk that can no longer be read.
But as we move to wider data-sharing, new questions arise. In particular, who should have access to the data? The simplest answer is everyone: the scientist could just put their data out there, and anyone and everyone could view it. In many areas, this is unproblematic, but some scientists have reservations about completely free access, even if they agree in principle with open data.
In some cases, there are concerns that data may be misused by people with conflicted interests or a specific ideological agenda. A few weeks ago, there was uproar when it was found that Robert De Niro planned to screen a film, Vaxxed, at the Tribeca Film Festival. The film highlights an analysis of data on autism and vaccination from a large database held by the US Centers for Disease Control and Prevention (CDC), which claimed to find a greatly increased rate of autism in children who had been vaccinated, provided they were African-American boys vaccinated in a specific time window. It was argued that there was a conspiracy to cover up this shocking statistic, even though the analysis was clearly flawed, the results were discrepant with the rest of the literature, and the paper was subsequently retracted. It could be argued that overall, this was a win for the self-correcting process of science, because the errors in the analysis were quickly discovered, and when Robert De Niro was made aware of the concerns about the misinformation in the film, he withdrew it from the festival. But there’s no doubt that damage was done. Once conspiracy theories get established, they can be difficult to dislodge. From the point of view of anti-vaxxers, the withdrawal of the film just provides further evidence that there is a conspiracy to silence those who speak the truth.
Would the situation have been different if there had been restrictions on access to the data? Probably not. The problem is not so much who has the data, as what they do with it. A particular danger comes from unrestricted data-trawling of the kind that was evident in the CDC analysis. Although these dangers are especially serious when those doing the analysis are determined to find a particular result, they are not negligible when reputable and relatively open-minded scientists do secondary analyses.
Large datasets allow for analytic flexibility, and it is all too tempting to trawl a dataset for “significant” associations. Exploratory analysis is important for scientific progress, but inferential statistics lose their meaning if the researcher has selected which data to analyse on the basis of the observed results. One answer is to reproduce findings in a new dataset. An alternative is to require those analysing the data to specify in advance what analyses they plan to do – this is directly parallel to the idea of pre-registration of yet-to-be-done studies, which is beginning to gain traction in many areas of science as a way of improving reproducibility by distinguishing hypothesis-testing from exploratory analyses.
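To see why trawling undermines inferential statistics, it can help to run the numbers. The sketch below is purely illustrative and is not based on any real dataset: it assumes a hypothetical dataset of pure noise in which two groups are compared on many unrelated outcomes, each with known unit variance so that a simple z-test applies. The function names, group size and number of outcomes are all assumptions made for the example.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def trawl_finds_something(rng, n_outcomes, n_per_group=20, alpha=0.05):
    """Compare two groups on n_outcomes pure-noise outcomes.

    Returns True if any comparison looks 'significant', even though
    no real effect exists by construction.
    """
    for _ in range(n_outcomes):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        # Standard error of the difference, assuming known unit variance.
        standard_error = math.sqrt(2 / n_per_group)
        if two_sided_p(diff / standard_error) < alpha:
            return True
    return False


def simulate(n_outcomes, n_datasets=1000, seed=1):
    """Proportion of noise-only datasets in which a trawl 'finds' an effect."""
    rng = random.Random(seed)
    hits = sum(trawl_finds_something(rng, n_outcomes) for _ in range(n_datasets))
    return hits / n_datasets


if __name__ == "__main__":
    print("Expected by formula:", round(familywise_error_rate(20), 3))
    print("Observed in simulation:", simulate(20))
```

With 20 independent outcomes and the conventional 0.05 threshold, the formula gives a chance of about 0.64 that at least one comparison comes out "significant" by chance alone, and the simulated proportion should land close to that. Real datasets have correlated variables, so the exact figure will differ, but the direction of the problem is the same.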
But how would we keep everyone honest? If we place restrictions on who has access to the data and what they do with it, we could end up with those who collected the data acting as gatekeepers. This runs the risk that if scientists themselves have conflicts of interest or ideological agendas, they might deny access to others on spurious grounds.