To use my own field of study, which involves a lot of tracking animal movements, as an example, I think the key is to integrate archiving into the data collection and/or publishing process as much as possible, and provide worthwhile incentives.
In recent years it has become increasingly common to use satellite tags (and similar devices) to track animals, as battery miniaturization lets us put them on smaller and smaller animals. Most manufacturers have configured them so that the data is automatically archived in a repository like Movebank, and researchers can then download their data from there, as well as specify things like public visibility and embargoes.
However, the traditional method was much lower-tech: normally attaching plastic tags with some kind of individual code to the animals, and then spending time in the field trying to spot them. A lot of my PhD time is spent hanging out in wetlands at unsociable hours, in all weathers, squinting through a telescope waiting for a bird to put its other leg down so I can read the code. Birdwatchers also contribute sightings to our project, and in return are sent a bird's "life history" showing where else it's been seen and when. Actually getting a data point can take hours of work, and my project supervisors also spend hours each evening liaising with members of the public, checking their records, replying to emails and so on. They also spend their weekends and holidays collecting this stuff.
Our project has been running since the early 90s and is unique, and like all projects of that age it's now quite useful for looking at things like the impacts of climate change and habitat loss. I think it's pretty understandable that they don't want to just give it all away before we've finished looking at it - while science as a whole would probably benefit, the people who've actually put in all the work would lose out.
At the very least, some kind of mechanism that allows data authors to set conditions for data use would be good - for instance, if everybody using "my" (hypothetical) dataset had to include me as a co-author on their publications, I'd be much more motivated to share it. If sharing just means that my competitors can also use my data to publish actual papers (which are counted) while all I get is a data citation (which generally is not, AFAICT), then the current (rather counterproductive) incentive structure in academia actually incentivises me to make my data unusable.
Similarly, I know volunteers who have amassed huge datasets of how birds use particular areas, based on their observations recorded over decades. Every now and then somebody will email them and say "can you send me all your data, I want to write it up" - which enormously pisses them off. I've seen that kind of attitude, of appearing entitled to the fruits of others' labour without even acknowledging the work that's gone into it, lead to conservation projects lacking access to the data they need to stop developments.
It's all about respecting people's time, money and egos. Punishing people for not sharing data is one option, but I think stuff like (supposedly) mandatory trial registration in medical publishing shows that those kinds of threats don't work without proper enforcement, and publishers are currently quite happy to take scientists' money without doing any actual work. I think a better solution would be to find ways to reward researchers for sharing data - publications, the de facto currency of academia, are probably the simplest thing to co-opt. A lot of journals now allow authors to specify individual 'author contributions', with 'provided data' being an option, which makes each person's role transparent.
(Our group has been putting volunteer bird observers on papers since forever - I know birders with multiple Nature papers who have no idea what that even means!)
All of which means that the kind of work kerrya1 is doing is both hugely valuable and probably something of an uphill struggle.