We asked 272 bioinformaticians…name something that makes you angry: more reflections on the poor state of software documentation.

I'd like to share the details of a recent survey conducted by Nick Loman and Thomas Connor that tried to understand current issues with bioinformatics practice and training.

The survey was announced on twitter and attracted almost 300 responses. Nick and Tom have kindly placed the results of the survey on Figshare so that others can play with the data (it seems fitting to talk about this today as it is International Open Access Week):

When you ask a bunch of bioinformaticians the question What things most frustrate you or limit your ability to carry out bioinformatics analysis? you can be sure that you will attract some passionate, and often amusing, answers (I particularly liked someone's response to this question "Not enough Heng Li").

I was struck by how many people raised the issue of poor, incomplete, or otherwise terrible software documentation as a problem (there were at least 42 responses that mentioned this). The availability of 'good documentation' was also listed as the 2nd most important factor when choosing software to use.

I recently wrote about whether this problem is something that really needs to be dealt with by journals and by the review process. It shouldn't be enough that software is available and that it works, we should have some minimal expectation for what documentation should accompany bioinformatics software.

Keith's 10 point checklist for reviewing software

If you are ever in a position to review a software-based manuscript, please check for the following:

  1. Is there a plain text README file that accompanies the software and which explains what the program does and who created it?
  2. Is there a comprehensive manual available somewhere that describes what every option of the program does?
  3. Is there a clear version number or release date for the software?
  4. Does the software provide clear installation instructions (where relevant) that actually work?
  5. Is the software accompanied by an appropriate license?
  6. For command-line programs, does the program give some sensible output when no arguments are provided?
  7. For command-line programs, does the program give some sensible output when -h and/or --help is specified (see this old post of mine for more on this topic)?
  8. For command-line programs, does the built-in help/documentation agree with the external documentation (text/PDF), i.e. do they both list the same features/options?
  9. For script based software (Perl, Python etc.), does the code contain a reasonable level of comments that allow someone with relevant coding experience to understand what the major sections of the program are trying to do?
  10. Is there a contact email address (or link to support web page) provided so that a user can ask questions and get more help?

I'm not expecting every piece of bioinformatics software to tick all 10 of these boxes, but most of these are relatively low-hanging fruit. If you are not prepared to provide useful documentation for your software, then you should also be prepared for people to choose not to use your software, and for reviewers to reject your manuscript!