Scrutable

Posted: **Fri Jul 03, 2020 1:45 pm**

kerrya1 wrote: ↑
Fri Jul 03, 2020 9:40 am

Excellent, sounds similar to this work at Edinburgh https://media.ed.ac.uk/playlist/dedicat ... 0_bpqm3pzm

Do they use Omeka or another platform?

I'm not familiar with Omeka but here's one of their projects. Working in Arctic communities, there are additional challenges of internet bandwidth that I think precluded a bunch of common systems.

And at the moment, some of the people in that group are developing schema.org-based tools for discovery metadata as a lightweight tool for some of the less-resourced communities to document their data.
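For a rough idea of what such a discovery record might look like, here is a minimal sketch in Python that builds a schema.org `Dataset` description as JSON-LD. The property names (`name`, `description`, `keywords`, `creator`, `license`, `temporalCoverage`) are standard schema.org terms. The helper `make_dataset_record` and every value in the example are made up for illustration and are not taken from that group's actual tools.

```python
import json


def make_dataset_record(name, description, keywords, creator_name,
                        license_url, temporal_coverage):
    """Build a schema.org Dataset description as a plain dict (JSON-LD)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        "creator": {"@type": "Person", "name": creator_name},
        "license": license_url,
        # ISO 8601 interval, e.g. "2015-01-01/2019-12-31"
        "temporalCoverage": temporal_coverage,
    }


# Hypothetical example values, purely illustrative.
record = make_dataset_record(
    name="Community sea-ice observations",
    description="Hand-recorded sea-ice thickness notes from one community.",
    keywords=["sea ice", "community monitoring"],
    creator_name="Example Observer",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    temporal_coverage="2015-01-01/2019-12-31",
)

# Serialise to a small text document that can be shared or indexed.
jsonld_text = json.dumps(record, indent=2)
print(jsonld_text)
```

The record is plain text, so it can be written by hand or generated offline and uploaded later, which is presumably part of the appeal where bandwidth is limited.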

Posted: **Sat Jul 04, 2020 5:38 pm**

Are there repositories along the lines of github, but for sharing data instead of code? What would such a thing look like to satisfy various needs? How attractive would such a thing be? I can imagine requirements would vary hugely between disciplines.

Posted: **Sat Jul 04, 2020 6:53 pm**

bjn wrote: ↑
Sat Jul 04, 2020 5:38 pm
Are there repositories along the lines of github, but for sharing data instead of code? What would such a thing look like to satisfy various needs? How attractive would such a thing be? I can imagine requirements would vary hugely between disciplines.

Yeah, there's loads (which might be part of the problem!). For instance in ecology we have Dryad, as well as more generalist sites like Figshare where you can upload anything (data, media, manuscripts), get them a DOI and make them easily findable and citable.

Dryad has a review process where they check your submission makes sense - is it well documented? Can the files be opened? etc. Others are a bit more random, where the quality depends on what the user has uploaded.

It is also possible to upload data onto github as a CSV, of course - this is quite a common way of packaging example datasets along with analytical software packages, for instance. I published the data and code from my MSc thesis on github (not that anyone is likely to actually want it).

Posted: **Sat Jul 04, 2020 7:05 pm**

Bird on a Fire wrote: ↑
Sat Jul 04, 2020 6:53 pm

bjn wrote: ↑
Sat Jul 04, 2020 5:38 pm
Are there repositories along the lines of github, but for sharing data instead of code? What would such a thing look like to satisfy various needs? How attractive would such a thing be? I can imagine requirements would vary hugely between disciplines.

Yeah, there's loads (which might be part of the problem!). For instance in ecology we have Dryad, as well as more generalist sites like Figshare where you can upload anything (data, media, manuscripts), get them a DOI and make them easily findable and citable.

Dryad has a review process where they check your submission makes sense - is it well documented? Can the files be opened? etc. Others are a bit more random, where the quality depends on what the user has uploaded.

It is also possible to upload data onto github as a CSV, of course - this is quite a common way of packaging example datasets along with analytical software packages, for instance. I published the data and code from my MSc thesis on github (not that anyone is likely to actually want it).

There is also Zenodo, managed by CERN but open to everyone. It includes versioning of the dataset and provides a DOI for each dataset. It allows up to 50 GB of data per dataset (and you can ask for more on a case-by-case basis).

Posted: **Sat Jul 04, 2020 7:14 pm**

To use my own field of study, which involves a lot of tracking animal movements, as an example, I think the key is to integrate archiving into the data collection and/or publishing process as much as possible, and provide worthwhile incentives.

In recent years, it's becoming increasingly common to use satellite tags (and similar) to track animals, as battery miniaturization lets us put them on smaller and smaller animals. Most manufacturers have configured them so that the data is automatically archived in a repository like Movebank, and researchers can then download their data from there, along with specifying things like public visibility and embargoes.

However, the traditional method was much lower-tech, normally attaching plastic tags to the animals with some kind of individual code, and then spending time in the field trying to spot them. A lot of my PhD time is spent hanging out in wetlands at unsociable hours and in all weathers squinting through a telescope waiting for a bird to put its other leg down so I can read the code. Birdwatchers also contribute sightings to our project, and in return are sent a bird's "life history" showing where else it's been seen and when. Actually getting a data point can take hours of work, and my project supervisors also spend hours each evening liaising with members of the public, checking their records, replying to emails etc. They also spend their weekends and holidays collecting this stuff.

Our project has been running since the early 90s and is completely unique, and like all projects of that age is now quite useful for looking at stuff like impacts of climate change and habitat loss. I think it's pretty understandable that they don't want to just give it all away before we've finished looking at it - while science as a whole would probably benefit, the people who've actually put all the work in would lose out.

At the very least, some kind of mechanism that allows data-authors to set conditions for data use would be good - for instance, if everybody using "my" (hypothetical) dataset would have to include me as a co-author on their publications, I'd be much more motivated to share it. If sharing just means that my competitors can also use my data to publish actual papers (which are counted) and all I get is a data citation (which generally are not, AFAICT), then I'm actually incentivised to make my data unusable by the current (rather counterproductive) incentive structure in academia.

Similarly, I know volunteers who have amassed huge datasets of how birds use particular areas, based on their observations recorded over decades. Every now and then somebody will email them and say "can you send me all your data, I want to write it up" - which enormously pisses them off. I've seen that kind of attitude, of appearing entitled to the fruits of others' labour without even acknowledging the work that's gone into it, lead to conservation projects lacking access to the data they need to stop developments.

It's all about respecting people's time, money and egos. Punishing people for not sharing data is one option, but I think stuff like (supposedly) mandatory trial registration in medical publication shows that those kinds of threats don't work without proper enforcement, and publishers are currently quite happy to take scientists' money without doing any actual work. I think a better solution would be to find ways to reward researchers for sharing data - publications, the de facto currency of academia, are probably the simplest thing to co-opt. A lot of journals now allow authors to specify individual 'author contributions', with 'provided data' being an option, which makes it transparent.

(Our group has been putting volunteer bird observers on papers since forever - I know birders with multiple Nature papers who have no idea what that even means!)

All of which means that the kind of work kerrya1 is doing is both hugely valuable and probably something of an uphill struggle.

Posted: **Sat Jul 04, 2020 7:20 pm**

See also: open access publishing.

My funder requires me to publish open-access, which is fair enough as it's public money. They do not, however, actually provide me with extra money to pay for it. "Article processing charges" for a decent ecology journal are normally more than a month of my stipend.

Seeing as I'm writing it for free, it's reviewed for free, and the handling editors are paid at most a pittance, why on earth should I fork out a grand to a multi-billion euro company just to host a pdf on their website?

I think universities/other institutions should start running non-profit publications. PLOS are pretty decent. Elsevier can f.ck right off - all that money and their typeset manuscripts are still hella fugly.

Posted: **Sat Jul 04, 2020 11:44 pm**

Please, please don't use uncontrolled data stores like figshare unless you have absolutely no other option. Dryad is a little better but they're still a bit of a data dumping ground.

There are professionally managed data centres for particular institutions, nations, and scientific disciplines, that can look after your data properly for the long-term, perform basic QA, make sure that it finds its way to appropriate aggregators like Movebank (new similar initiatives are evolving all the time so there might be one for your data type in a few years), and often they can organise embargoes. So, metadata describing your dataset can be published and made searchable but access to the dataset itself can be made contingent on the owner's permission. I have never met or heard of a staffer from Dryad or Figshare attending any of the data conferences I've been a part of, which raises flags for me.

That said, the only Portuguese data manager I know works in Belgium, so BOAF would probably need to work with a discipline-specific data centre, rather than a local one.

As a professional data manager, I'm trying to persuade more people to work more openly because the data become much more valuable when they can be mashed together and resubsetted, but I don't have a heart of stone for people whose data are particularly labour- and time-intensive.

There's a trade-off to be found between "my dataset is a time series that will never be complete so I don't ever have to share it, even though the public paid for it" and "you must upload every datapoint to the cloud in real time so that office-based scientists can download it and start scooping you immediately".

In my discipline, for field data, a two-year embargo is pretty reasonable for most purposes, with extensions negotiated on a case-by-case basis. Admittedly, many of my scientists work in the Antarctic Treaty zone, so the legal expectation of data sharing has had time to percolate through the broader Southern Ocean community in a way that may not have happened as much in other parts of the world.

And I totally agree that the incentives need to be fixed to properly value the scientific contributions of people who make their data properly findable, accessible, interoperable, and reusable (the FAIR principles) by fully rewarding them for their contribution to science. Us professional data folk are working that side of it too, we promise.

Co-authorship and even data citations are really tricky spaces to navigate, with no one-size-fits-all solution. I play with data collections that have thousands of contributors and the tools are only just being developed to find ways that an author could credit them all. It'll probably involve putting permanent identifiers on everything (DOIs for each record, ORCIDs for humans, etc) and a DOI for a file that is simply a list of the DOIs for all the records used in a particular paper. Then you "simply" need a system that can see all DOIs in citation lists in all journals, trace through those sub files and then assign credit through some metric to say "BOAF provided 6 observations to a paper with Y impact factor and 21 observations to a paper with Z impact factor and this is worth X amounts of credit towards his next grant".
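As a toy illustration of that last step only, here is a minimal Python sketch. It assumes the hard part is already solved: each paper comes with a resolved list of record DOIs and a weight (standing in for impact factor), and a lookup table maps each record DOI to a contributor. The function `tally_credit`, the `10.0000/...` identifiers, the weights, and the "observations times weight" metric are all hypothetical, not an existing system or an agreed standard.

```python
from collections import defaultdict


def tally_credit(papers, record_owner):
    """Sum, per contributor, the weight of every paper use of their records."""
    credit = defaultdict(float)
    for paper in papers:
        for record_doi in paper["record_dois"]:
            owner = record_owner.get(record_doi)
            if owner is None:
                continue  # no known contributor for this record: skip it
            credit[owner] += paper["weight"]
    return dict(credit)


# Hypothetical identifiers: 27 records from "BOAF", 3 from someone else.
boaf_records = [f"10.0000/example.rec.{i}" for i in range(27)]
other_records = [f"10.0000/example.rec.{i}" for i in range(27, 30)]
record_owner = {doi: "BOAF" for doi in boaf_records}
record_owner.update({doi: "another contributor" for doi in other_records})

papers = [
    # Paper with weight "Y" uses 6 of BOAF's observations.
    {"weight": 2.0, "record_dois": boaf_records[:6]},
    # Paper with weight "Z" uses the other 21, plus 3 from someone else.
    {"weight": 4.0, "record_dois": boaf_records[6:] + other_records},
]

credit = tally_credit(papers, record_owner)
print(credit)  # BOAF: 6 * 2.0 + 21 * 4.0 = 96.0
```

The arithmetic is trivial. Everything the sketch takes as given (identifiers on every record, complete citation lists, an agreed weighting) is where the real difficulty lies.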

You can see why there isn't yet an easy solution for institutions to simply adopt and use as their next metric.

Posted: **Sun Jul 05, 2020 5:36 pm**

Bird on a Fire wrote: ↑
Sat Jul 04, 2020 7:20 pm
See also: open access publishing.

My funder requires me to publish open-access, which is fair enough as it's public money. They do not, however, actually provide me with extra money to pay for it. "Article processing charges" for a decent ecology journal are normally more than a month of my stipend.

Seeing as I'm writing it for free, it's reviewed for free, and the handling editors are paid at most a pittance, why on earth should I fork out a grand to a multi-billion euro company just to host a pdf on their website?

I think universities/other institutions should start running non-profit publications. PLOS are pretty decent. Elsevier can f.ck right off - all that money and their typeset manuscripts are still hella fugly.

Hence Plan S - https://www.coalition-s.org/ - intended to prevent academic publishers double-dipping and imposing longer and longer embargo periods before allowing open access publication.

Choosing an appropriate data repository is fraught with pitfalls, which is why we want researchers to speak to us before depositing their data in one. In terms of general open data repositories we generally recommend:
* Dryad for the biological sciences as it requires linkage to a published paper, which of course precludes data that doesn't link to one.
* One of the NCBI or EBI repositories for genomics/proteomics/etc. NERC funded people need to deposit in the appropriate data centre.
* ESRC and some MRC funded datasets qualify for the UK Data Archive.
* Astronomy and the big physics experiments do a pretty good job of managing their own data as handling that volume of data is a massive issue.
* Zenodo is a sustainable any-format repository but there is no quality assurance.

Beyond this are a myriad of different repositories specializing in different data types or disciplines. To get an idea how complicated it can get just look at https://www.re3data.org/

Oh, and finally most research-intensive unis now offer an institutional data repository, but we view ours as a repository of last resort for those who can't find a better home for their data.

Posted: **Sun Jul 05, 2020 5:45 pm**

Bird on a Fire wrote: ↑
Sat Jul 04, 2020 7:14 pm

All of which means that the kind of work kerrya1 is doing is both hugely valuable and probably something of an uphill struggle.

Woohoo - a researcher who actually values me *does happy dance around living room*.

Seriously though I love my job; I love working with PGRs and early career researchers to help them understand and implement RDM in their work so they can build their careers, I love working with PIs to help them understand the changing landscape of Open Science and the ways it can protect their contribution to science, and I love talking to people who aren't scientists to see how open science can give them opportunities to really connect with research and how it impacts their everyday lives.

Posted: **Mon Jul 06, 2020 8:00 am**

As a slight aside (or maybe not).

Years ago, I remember reading a science fiction book where each science - and its branches - had become sooooooo specialised that research institutes/universities had to employ what the book called JOATs - Jack of all Trades - to basically see if research in one area may have applications in another.
Can't for the life of me remember the name of the book, or the author.

Posted: **Mon Jul 06, 2020 8:41 am**

Gentleman Jim wrote: ↑
Mon Jul 06, 2020 8:00 am
As a slight aside (or maybe not).

Years ago, I remember reading a science fiction book where each science - and its branches - had become sooooooo specialised that research institutes/universities had to employ what the book called JOATs - Jack of all Trades - to basically see if research in one area may have applications in another.
Can't for the life of me remember the name of the book, or the author.

Taking the aside a bit further, I remember a similar book, only it was set on an exploration spaceship loaded with specialist scientists and one man who was not allowed to specialise, known as the Generalist, whose purpose was to do much the same as a JOAT, but with the problems they faced.

Posted: **Tue Jul 07, 2020 5:38 pm**

Aitch wrote: ↑
Mon Jul 06, 2020 8:41 am

Gentleman Jim wrote: ↑
Mon Jul 06, 2020 8:00 am
As a slight aside (or maybe not).

Years ago, I remember reading a science fiction book where each science - and its branches - had become sooooooo specialised that research institutes/universities had to employ what the book called JOATs - Jack of all Trades - to basically see if research in one area may have applications in another.
Can't for the life of me remember the name of the book, or the author.

Taking the aside a bit further, I remember a similar book, only it was set on an exploration spaceship loaded with specialist scientists and one man who was not allowed to specialise, known as the Generalist, whose purpose was to do much the same as a JOAT, but with the problems they faced.

I'd forgotten it until you mentioned it. No idea what it was, but think it's quite old.

I sometimes think that's my role at work. Knowing enough of the other subjects to actually contribute, but not really being the expert, except for a few minor areas. But being able to understand enough to talk to the experts.

Posted: **Wed Jul 08, 2020 2:31 am**

Aitch wrote: ↑
Mon Jul 06, 2020 8:41 am
Taking the aside a bit further, I remember a similar book, only it was set on an exploration spaceship loaded with specialist scientists and one man who was not allowed to specialise, known as the Generalist, whose purpose was to do much the same as a JOAT, but with the problems they faced.

That's The Voyage of the Space Beagle by A. E. van Vogt.

Posted: **Wed Jul 08, 2020 9:56 am**

I thought it might be "Sucker Bait" by Isaac Asimov but the generalists in that story are called Mnemonics.

Posted: **Wed Jul 08, 2020 10:25 am**

I think I was thinking of the Asimov one, sounds most like the story I remember, though I misremembered the term used for the generalist character.

Thanks, chaps.

I agree with your conclusions completely, and your paper is still terrible
