BLAST bug (or feature?) in NCBI BLAST v2.2.30+
Something changed in the latest version of NCBI BLAST+ that breaks our CEGMA software. Compare the behavior of this simple TBLASTN command in v2.2.29+ and in v2.2.30+ (released October 2014):
v2.2.29+
tblastn -db sample.dna -query sequence.prot -word_size 5
TBLASTN 2.2.29+
Database: sample.dna
1 sequences; 2,499,950 total letters
Query= 7292122___KOG0292
Length=1234
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value
CHROMOSOME_I 1 15072418                                               38.9    0.002
v2.2.30+
tblastn -db sample.dna -query sequence.aa -word_size 5
BLAST query/options error: Compressed alphabet lookup table requires word size 6 or 7
Please refer to the BLAST+ user manual.
One step in the CEGMA pipeline involves running TBLASTN with a word size of 5. This no longer works in the latest version, and the error message says that only a word size of 6 or 7 is permitted. I can confirm that this is the case by looking at the latest source code, in the file blast_options.c:
else if (options->lut_type == eCompressedAaLookupTable &&
         options->word_size != 6 && options->word_size != 7) {
    Blast_MessageWrite(blast_msg, eBlastSevError, kBlastMessageNoContext,
                       "Compressed alphabet lookup table requires "
                       "word size 6 or 7");
    return BLASTERR_OPTION_VALUE_INVALID;
}
The error message suggests I look at the BLAST+ user manual. I did this, and according to Table C5:
tblastn application options:
option = word_size
type = integer
default value = 3
description and notes = "Valid word sizes are 2-7."
There also seems to be no mention of this change in the release notes, all of which makes me think that this is a bug. So I will report this to the NCBI, but any CEGMA users out there may wish to hold off updating to v2.2.30+.
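If you want a pipeline to check for this before it fails mid-run, something like the following rough sketch might help. It assumes the version string can be read from the output of `tblastn -version` (or from a report header such as `TBLASTN 2.2.29+`). The function names and the `KNOWN_AFFECTED` set are my own illustration, not part of CEGMA or BLAST+. Only v2.2.30+ is listed because that is the only release I have seen this in; I make no claim about later releases.

```python
import re
import subprocess

# Releases in which `tblastn -word_size 5` was observed to be rejected.
# Assumption: only 2.2.30 is listed, because that is the only release
# described in this post. Later releases are not judged either way.
KNOWN_AFFECTED = {(2, 2, 30)}


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from text such as 'tblastn: 2.2.30+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no BLAST+ version found in: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def word_size_5_rejected(version_output):
    """Return True if this BLAST+ release is known to reject word size 5."""
    return parse_blast_version(version_output) in KNOWN_AFFECTED


def installed_tblastn_version_output():
    """Run `tblastn -version` and return its standard output as text."""
    result = subprocess.run(
        ["tblastn", "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A pipeline could call `word_size_5_rejected(installed_tblastn_version_output())` before the TBLASTN step and stop with a clear message, rather than passing `-word_size 5` to a release that will refuse it.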