It was announced yesterday that the Assemblathon 2 paper has won the 2013 BioMed Central award for ‘Open Data’ (sponsored by Lab Archives). For more details on this see here and here.

While it is flattering to be recognized for our efforts to conduct science transparently, it still feels a little strange that we need to have awards for this kind of thing. All data that results from publicly funded scientific research should be open data. Although I feel there is growing support for the open science movement, much still needs to be done.

One of the things that needs to become commonplace is for scientists to put their data and code in stable online repositories that are, ideally, citable as independent resources (i.e., with a DOI). For too long, people have used their lab websites as the end point for all of their non-sequence-related[1] data (something that I have also been guilty of).

Part of the problem is that even when you take steps to submit data to an online repository of some kind, not all journals allow you to cite it. This tweet by Vince Buffalo from yesterday illustrated one such issue (see this Storify page for more details of the resulting discussion):

A user wants to cite Scythe, but the journal won't allow .com URLs. Any way around this? #citegithub #isthisreally21stcentury
— Vince Buffalo (@vsbuffalo) April 22, 2014

Tools like arXiv.org, bioRxiv, Figshare, Slideshare, GitHub, and GigaDB are making it easier to make our data, code, presentations, and preliminary results more available to others. I hope that we see more innovation in this area and I hope that more people take an ‘open’ approach to other aspects of science, not just the sharing of data[2]. Luckily, with people around like Jonathan Eisen and C. Titus Brown, we have some great role models for how to do this.

How will we know when we are all good practitioners of open science? When we no longer need to give out awards to people just for doing what we should all be doing.

[1] For the most part, journals require authors to submit nucleotide and protein sequences to an INSDC database, though this doesn’t always happen.

[2] I have written elsewhere about the steps that the Assemblathon 2 took to try to be open throughout the whole process of doing the science, writing the paper, and communicating the results.