Teaser: a solution for our read mapping dilemma?

A paper recently published in Genome Biology by Smolka et al. may offer some help with the problem of choosing which read mapping program to use when aligning a set of sequencing reads to a genome.

The paper starts by neatly summarising the problem:

Recent and ongoing advances in sequencing technologies and applications lead to a rapid growth of methods that align next generation sequencing reads to a reference genome (read mapping). By mid 2015, nearly 100 different mappers are available, although not all are equally suited for a given application or dataset.

The program Teaser attempts to automate the benchmarking of not just different mappers, but also (some of) the different parameters that are available to these programs. The latter problem should not be underestimated: the Bowtie 2 documentation describes almost 100 different command-line options, and many of them control how Bowtie 2 runs and/or what output it generates.
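To see why the parameter problem grows so quickly, here is a minimal sketch that enumerates every combination of just three Bowtie 2 settings and builds one command line per combination. The option names (`--end-to-end`, `--local`, `-N`, `-L`, `-x`, `-U`, `-S`) are real Bowtie 2 options, but the chosen values, the index name `genome_index`, the reads file `reads.fq`, and the output naming scheme are hypothetical, and this is not how Teaser itself builds its runs.

```python
from itertools import product

# Hypothetical grid: real Bowtie 2 option names, illustrative values only.
PARAMETER_GRID = {
    "mode": ["--end-to-end", "--local"],  # alignment mode (a bare flag)
    "-N": ["0", "1"],                     # mismatches allowed in a seed
    "-L": ["20", "22", "25"],             # seed length
}


def build_commands(index: str, reads: str, grid: dict) -> list:
    """Return one bowtie2 command line (as an argument list) per combination."""
    keys = list(grid)
    commands = []
    for values in product(*(grid[key] for key in keys)):
        cmd = ["bowtie2", "-x", index, "-U", reads]
        for key, value in zip(keys, values):
            if key == "mode":
                cmd.append(value)           # mode flags take no argument
            else:
                cmd.extend([key, value])    # e.g. "-N", "0"
        # Hypothetical naming scheme so each run writes its own SAM file.
        tag = "_".join(value.lstrip("-") for value in values)
        cmd.extend(["-S", f"aln_{tag}.sam"])
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    cmds = build_commands("genome_index", "reads.fq", PARAMETER_GRID)
    print(len(cmds))  # 2 * 2 * 3 = 12
    print(" ".join(cmds[0]))
```

Three options with two or three values each already produce 12 runs; every extra option multiplies that number again, which is why a tool that automates the comparison is attractive.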

Teaser uses small sets of simulated reads, which keeps run times very short (under 30 minutes for many comparisons), but you can also supply your own real data. By default, Teaser will test the performance of five read mapping programs: BWA, BWA-MEM, BWA-SW, Bowtie 2, and NextGenMap.
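The advantage of simulated reads is that the true origin of every read is known, so a mapper's output can be scored directly against it. The sketch below illustrates that general idea; it is not Teaser's implementation. It assumes exactly one primary alignment per read, a hypothetical `Alignment` record, and a hypothetical tolerance of 5 bp for counting a position as correct.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Alignment:
    """Hypothetical record of one read's primary alignment."""
    read_id: str
    chrom: Optional[str]  # None if the mapper left the read unmapped
    pos: Optional[int]


def score_mapper(
    truth: Dict[str, Tuple[str, int]],
    alignments: List[Alignment],
    tolerance: int = 5,  # assumed slack (bp) around the simulated position
) -> Dict[str, float]:
    """Fraction of reads mapped correctly, mapped wrongly, or left unmapped."""
    correct = wrong = unmapped = 0
    for aln in alignments:
        true_chrom, true_pos = truth[aln.read_id]
        if aln.chrom is None or aln.pos is None:
            unmapped += 1
        elif aln.chrom == true_chrom and abs(aln.pos - true_pos) <= tolerance:
            correct += 1
        else:
            wrong += 1
    total = len(alignments) or 1  # avoid dividing by zero on empty input
    return {
        "correct": correct / total,
        "wrong": wrong / total,
        "unmapped": unmapped / total,
    }


if __name__ == "__main__":
    truth = {"r1": ("chr1", 100), "r2": ("chr1", 5000),
             "r3": ("chr2", 42), "r4": ("chr2", 900)}
    alignments = [
        Alignment("r1", "chr1", 102),   # within tolerance: correct
        Alignment("r2", "chr1", 7000),  # wrong position
        Alignment("r3", "chr2", 42),    # exact: correct
        Alignment("r4", None, None),    # unmapped
    ]
    print(score_mapper(truth, alignments))
    # {'correct': 0.5, 'wrong': 0.25, 'unmapped': 0.25}
```

Running the same scoring over every mapper and parameter combination gives a table of results that can be compared side by side.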

Impressively, you can run Teaser on the web as well as run it as a standalone program. The web output displays the results graphically for many different test datasets, which are shown along the x-axis.

The paper concludes by asking the community to submit optimal parameter combinations to the Teaser GitHub repository:

Teaser is easy to use and at the same time extendable to other methods and parameters combinations. Future work will include the incorporation of benchmarking RNA-Seq mappers and variant calling methods. We furthermore encourage the scientific community to contribute the optimal parameter combinations they detected to our github repository (available at github.com/Cibiv/Teaser) for their particular organism of interest. This will help others to quickly select the optimal combination of mapper and parameter values using Teaser.

I can't wait for the companion program Firecat!


2015-10-26 11.05: Updated to remove specific references to software versions of mapping tools.


Help us do science! I’ve teamed up with researcher Paige Brown Jarreau to create a survey of ACGT readers. By participating, you’ll be helping me improve ACGT and contributing to the SCIENCE on blog readership. You will also get FREE science art from Paige's Photography for participating, as well as a chance to win a t-shirt and other perks! It should only take 10–15 minutes to complete.

You can find the survey here: http://bit.ly/mysciblogreaders

The Bioboxes paper is now out of the box!

The Bioboxes project now has its first formal publication, with the software described today in the journal GigaScience.

I love the concise abstract:

Software is now both central and essential to modern biology, yet lack of availability, difficult installations, and complex user interfaces make software hard to obtain and use. Containerisation, as exemplified by the Docker platform, has the potential to solve the problems associated with sharing software. We propose bioboxes: containers with standardised interfaces to make bioinformatics software interchangeable.

Congratulations to Michael Barton, Peter Belmann, and the rest of the Bioboxes team!


Updated 2015-10-15 18.18: Added specific acknowledgement of Peter Belmann.

New paper provides a great overview of the current state of genome assembly

The following paper by Stephen Richards and Shwetha Murali has just appeared in the journal Current Opinion in Insect Science:

Best practices in insect genome sequencing: what works and what doesn’t

In some ways I wish they had chosen a different title, as the paper is much more about genome assembly than genome sequencing. That said, it provides a great overview of all of the current strategies in genome assembly, and it should interest researchers working on non-insect genomes who want to know the best way of putting a genome together. Here is part of the legend from a very informative table in the paper:

Table 1: De novo genome assembly strategies.
Assembly software is designed for a specific sequencing and assembly strategy. Thus sequence must be generated with the assembly software and algorithm in mind, choosing a sequence strategy designed for a different assembly algorithm, or sequencing without thinking about assembly is usually a recipe for poor un-publishable assemblies. Here we survey different assembly strategies, with different sequence and library construction requirements.