A timely call to overhaul how scientists publish supplementary material [Link]

Great new editorial piece in BMC Bioinformatics by Mihai Pop and Steven Salzberg that tackles a subject that people probably don't think about too much:

They highlight some of the problems that arise from the growing trend in some journals to publish very short articles that are accompanied by extremely lengthy supplementary material. They single out a few particularly lop-sided papers — including a 6-page article that has 165 pages of supplementary material — and make some solid observations about why this facet of publishing has become problem. Perhaps most importantly, citations that are buried in supplementary material do not get tracked by citation indices.

They conclude the paper with a proposal:

The ubiquitous use of electronic media in modern scientific publishing provides an opportunity for the better integration of supplementary material with the primary article. Specifically, we propose that supplementary items, irrespective of format, be directly hyper-linked from the text itself. Such references should be to specific sections of the supplementary material rather than the full supplementary text.

Yes, yes, a thousand times yes!

We asked 272 bioinformaticians…name something that makes you angry: more reflections on the poor state of software documentation.

I'd like to share the details of a recent survey conducted by Nick Loman and Thomas Connor that tried to understand current issues with bioinformatics practice and training.

The survey was announced on twitter and attracted almost 300 responses. Nick and Tom have kindly placed the results of the survey on Figshare so that others can play with the data (it seems fitting to talk about this today as it is International Open Access Week):

When you ask a bunch of bioinformaticians the question What things most frustrate you or limit your ability to carry out bioinformatics analysis? you can be sure that you will attract some passionate, and often amusing, answers (I particularly liked someone's response to this question "Not enough Heng Li").

I was struck by how many people raised the issue of poor, incomplete, or otherwise terrible software documentation as a problem (there were at least 42 responses that mentioned this). The availability of 'good documentation' was also listed as the 2nd most important factor when choosing software to use.

I recently wrote about whether this problem is something that really needs to be dealt with by journals and by the review process. It shouldn't be enough that software is available and that it works, we should have some minimal expectation for what documentation should accompany bioinformatics software.

Keith's 10 point checklist for reviewing software

If you are ever in a position to review a software-based manuscript, please check for the following:

  1. Is there a plain text README file that accompanies the software and which explains what the program does and who created it?
  2. Is there a comprehensive manual available somewhere that describes what every option of the program does?
  3. Is there a clear version number or release date for the software?
  4. Does the software provide clear installation instructions (where relevant) that actually work?
  5. Is the software accompanied by an appropriate license?
  6. For command-line programs, does the program give some sensible output when no arguments are provided?
  7. For command-line programs, does the program give some sensible output when -h and/or --help is specified (see this old post of mine for more on this topic)?
  8. For command-line programs, does the built-in help/documentation agree with the external documentation (text/PDF), i.e. do they both list the same features/options?
  9. For script based software (Perl, Python etc.), does the code contain a reasonable level of comments that allow someone with relevant coding experience to understand what the major sections of the program are trying to do?
  10. Is there a contact email address (or link to support web page) provided so that a user can ask questions and get more help?

I'm not expecting every piece of bioinformatics software to tick all 10 of these boxes, but most of these are relatively low-hanging fruit. If you are not prepared to provide useful documentation for your software, then you should also be prepared for people to choose not to use your software, and for reviewers to reject your manuscript!

Making code available: lab websites vs GitHub vs Zenodo

Our 2007 CEGMA paper was received by the journal Bioinformatics on December 7, 2006. So this means it was about 9 years ago that we had to have the software on a website somewhere with a URL that we could use in a manuscript. This is what ended up in the paper:

Even though we don't want people to use CEGMA anymore, I'm at least happy that the link still works! I thought it would be prudent to make the paper link to a generic top-level page of our website (/Datasets) rather than to a specific page. This is because experience has taught me that websites often get reorganized, and pages can move.

If we were resubmitting the paper today and if we were still linking to our academic website, then I would probably suggest linking to the main website page (korflab.ucdavis.edu) and ensuring that the link to CEGMA (or 'Software') was easy to find. This would avoid any problems with moving/renaming pages.

However, even better would be to use a service like Zenodo to permanently archive the code; this would also give us a DOI for the software too. Posting code to repositories such as GitHub is (possibly) better than using your lab's website, but people can remove GitHub repositories! Zenodo can take a GitHub repository and make it (almost) permanent.

The Zenodo policies page makes it clear that even in the 'exceptional' event that a research object is removed, they still keep the DOI URL working and this page will state what happened to the code:

Withdrawal
If the uploaded research object must later be withdrawn, the reason for the withdrawal will be indicated on a tombstone page, which will henceforth be served in its place. Withdrawal is considered an exceptional action, which normally should be requested and fully justified by the original uploader. In any other circumstance reasonable attempts will be made to contact the original uploader to obtain consent. The DOI and the URL of the original object are retained.

There is a great guide on GitHub for how you can make your code citable and archive a repository with Zenodo.