Some brief thoughts on developing and supporting command-line bioinformatics tools

A student from a class I teach emailed me recently to ask for help with a bioinformatics problem. Our email exchange prompted him to ask:

…given that so many of the command line tools are in such a difficult state to use, why would biologists start using the command line? The bioinformatics community ought to not only supply more free easy-to-use tools, but more such tools that work via the command line. Command line tools should be installable without having to compile the program, for example.

I sympathize with the frustration of everyone who’s tried, and subsequently failed, to install and/or correctly use a piece of command-line-driven bioinformatics software. I especially think that lack of documentation (not to mention good documentation) is a huge issue. However, I also sympathize with the people who have to write, release, and support bioinformatics code. It can be a thankless task. This was my reply to the student:

I agree with just about everything you have said, but bear in mind:

  1. Writing code is fun; writing documentation (let alone ‘good’ documentation) is a pain and often receives no reward or thanks (not that this is an excuse to not write documentation).
  2. So much software becomes obsolete so quickly that it can be hard to be motivated to spend a lot of time making something easy (or easier) to use when you know that it will be supplanted by someone else’s tool in 12 months’ time.
  3. Providing ‘free’ tools is always good, but someone has to pay for it somewhere (at least in time spent working on code, writing documentation, fixing bugs etc).
  4. Providing tools that don’t require compilation means someone would have to make software available for every different flavor of Unix and Linux (not to mention Windows). It can be hard enough to make pre-compiled code work on one distribution of Linux. Software might never be released if someone had to first compile it for multiple systems. In this sense, it is a good thing that source code can be provided so at least you can try to compile it yourself.
  5. A lot of software development doesn’t result in much in the way of publication success. People sometimes try publishing papers on major updates to their software and are rejected by journals (for lack of novelty). Without a good reward in the form of the currency that scientists need (e.g. publications) it is sometimes hard to encourage someone to spend any more time than is necessary to work on code.
  6. People who use freely developed software are often bad at citing it properly and/or including full command-line examples of how they used it (both of which can end up hurting the original developer through lack of acknowledgement). It is great when people decide to blog about how something worked, or share a post on forums like SeqAnswers.
  7. Many users never bother to contact the developer to see if things could be changed/improved. Some developers are working in the dark as to what people actually need from software. In this sense, improving code and documentation is a two-way street.
  8. To a degree, the nature of the command-line means that command-line tools will always be more user-unfriendly than a nice, shiny GUI. But at the same time, the nature of command-line tools means that they will always be more powerful than a GUI.