Blast Filter
The purpose of this tool is:
- to parse a standard blast output (text format from NCBI or Wu Blast),
- to filter the HSPs which fulfill given constraints,
- to display extracted or computed HSP data in a tabular format.
Data extracted from each HSP are the values you can normally read in the blast output (query accession or name, e-value, score, identities, etc.).
Some minimum and maximum values are computed for each HSP from the extracted data listed above.
These min and max values can then be used to select HSPs whose own values are close to them.
Other values are computed for what we call a Global HSP (GHSP).
A GHSP gathers all the HSPs available for a given triplet (Query, Subject, +/- strand):
- Qidentities is the number of identities for the GHSP
- Qpositives is the same, but for positives
- Qstart and Qend are coordinates of the GHSP on the Query
- Qhsp_length is the length of the GHSP
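The grouping described above can be sketched as follows. This is only an illustration in Python, not the tool's Perl code. The record keys (query, subject, strand, q_start, q_end) are hypothetical. Summing identities and positives over the member HSPs, and taking the outermost query coordinates, are assumptions about how the GHSP values are derived. Qhsp_length is left out because the text does not say how it is computed.

```python
from collections import defaultdict

def group_ghsps(hsps):
    """Group HSP records by (query, subject, strand) and aggregate them.

    Assumption: Qidentities/Qpositives are plain sums over member HSPs,
    and Qstart/Qend are the outermost coordinates on the query.
    """
    groups = defaultdict(list)
    for hsp in hsps:
        groups[(hsp["query"], hsp["subject"], hsp["strand"])].append(hsp)

    ghsps = {}
    for key, members in groups.items():
        ghsps[key] = {
            "Qidentities": sum(h["identities"] for h in members),
            "Qpositives": sum(h["positives"] for h in members),
            "Qstart": min(h["q_start"] for h in members),
            "Qend": max(h["q_end"] for h in members),
        }
    return ghsps

# Made-up sample data: two HSPs on the + strand, one on the - strand.
hsps = [
    {"query": "q1", "subject": "s1", "strand": "+",
     "identities": 40, "positives": 45, "q_start": 1, "q_end": 50},
    {"query": "q1", "subject": "s1", "strand": "+",
     "identities": 30, "positives": 38, "q_start": 80, "q_end": 120},
    {"query": "q1", "subject": "s1", "strand": "-",
     "identities": 20, "positives": 22, "q_start": 10, "q_end": 35},
]
ghsps = group_ghsps(hsps)
```

With this sample, the two + strand HSPs end up in one GHSP and the - strand HSP in another, which is why a constraint on Qidentities can succeed where the same constraint on identities fails.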
Special rli function
rli is a computed boolean that indicates whether a conserved segment of length W with a percentage of identities I exists between the query and the subject sequences.
The syntax of the rli function is: rli($alignment, W, I), where W is the window size and I the percentage of identities.
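One way to read this definition is as a sliding-window test over the aligned sequences. The sketch below is written in Python for illustration and is not the tool's implementation. It assumes the alignment is given as two gapped strings of equal length, that a gap never counts as an identity, and that a window qualifies when its identity percentage is at least I. None of these details is stated in the text.

```python
def rli(query_aln, subject_aln, window, min_pct_identity):
    """Return True if some window of `window` alignment columns has
    at least `min_pct_identity` percent identical positions.

    Assumptions: gap characters ('-') never count as identities and
    the threshold comparison is inclusive (>=).
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    n = len(query_aln)
    if window <= 0 or window > n:
        return False

    # 1 where the column is an identity, 0 otherwise.
    matches = [1 if q == s and q != "-" else 0
               for q, s in zip(query_aln, subject_aln)]

    needed = min_pct_identity * window / 100.0
    count = sum(matches[:window])          # identities in the first window
    if count >= needed:
        return True
    for i in range(window, n):             # slide the window one column at a time
        count += matches[i] - matches[i - window]
        if count >= needed:
            return True
    return False
```

For example, rli("ACGTACGTAC", "ACGTTTTTAC", 4, 100) is true because the first four columns are identical, while the same call with a window of 6 is false.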
Filtering with constraints
Each data item can be filtered by specifying a constraint.
A constraint is a valid perl expression in which the variable names are the names of the extracted or computed data (except for rli). Thus, if you want to select HSPs whose e-value is greater than or equal to 1e-3, just type:
$e_value >= 1e-3
All entered constraints are combined with a logical AND operator (&& in perl). If you want to use more sophisticated expressions, you can use the free constraint text box.
Operators for numeric comparison are: < <= == != >= >
Operators for text comparison are: lt le eq ne ge gt (le stands for less or equal)
Logical operators: ! (not) && (and) || (or)
Pattern matching: =~ (looks like) !~ (doesn't look like) (nb: you need to read a little bit more about regular expressions).
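The way separate constraints end up as a single perl expression can be pictured with the short Python sketch below. The helper name combine_constraints is hypothetical, and wrapping each constraint in parentheses is an assumption made here so that a constraint containing || keeps its meaning once joined. The tool's actual code is not shown in this page.

```python
def combine_constraints(constraints):
    """Join the non-empty constraint strings with Perl's && operator."""
    kept = [c.strip() for c in constraints if c and c.strip()]
    # Parentheses (an assumption) protect constraints that contain ||.
    return " && ".join(f"({c})" for c in kept)

# Empty fields are skipped, as if no constraint had been typed for that data.
combined = combine_constraints(["$e_value <= 1e-3", "", "$identities >= 50"])
print(combined)
```

Anything that cannot be expressed as a pure AND of per-data constraints, such as an alternative between two conditions, belongs in the free constraint text box.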
- Looking for a conserved query sequence
- The purpose is to find sequences that are more or less conserved, i.e. an HSP whose number of identities is close to the length of the query sequence (say more than 75%). The constraint could then be:
$identities >= ($query_length * 0.75)
This works if the alignment is not split over several HSPs for the same subject sequence.
- Looking for conserved query sequence, even if the alignment is spread over several HSPs.
- Just use Qidentities instead of identities: $Qidentities >= ($query_length * 0.75)
- Looking for conserved sequences
- Here we want the number of identities to be close to the length of the shorter of the query and subject sequences.
$identities >= (($query_length < $subject_length ? $query_length : $subject_length) * 0.75)
The expression (Cond ? X : Y) returns X if Cond is fulfilled and Y otherwise; here it gives the shorter of the two lengths.
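For readers less familiar with the ternary operator, the same test can be written out in Python as a rough equivalent. The function name is_conserved and the fraction parameter are introduced here for illustration only. The perl expression above is what the tool actually expects.

```python
def is_conserved(identities, query_length, subject_length, fraction=0.75):
    """Mirror of the perl constraint: compare identities with a fraction
    of the shorter of the two sequence lengths."""
    # Same role as ($query_length < $subject_length ? $query_length : $subject_length)
    shorter = query_length if query_length < subject_length else subject_length
    return identities >= shorter * fraction
```

With a query of length 100 and a subject of length 200, the threshold is 75 identities, whichever of the two sequences is the longer one.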
- Looking for best nr hit from blastx output
- Hits will have to fulfill:
- at least 50% of the length of the query sequence is conserved
- the subject sequence comes from RefSeq
- the subject sequence is neither predicted nor hypothetical
- the E-value is at most 1e-3
$Qpositives >= ($query_length * 0.5) && $subject_name =~ /^ref/ && $subject_def !~ /predicted/i && $subject_def !~ /hypothetical/i && $e_value <= 1e-3
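To make the role of each clause explicit, here is the same filter transcribed into Python. It is only a reading aid, under the assumption that each hit is available as a dictionary keyed by the data names used in the constraint. The tool itself evaluates the perl expression.

```python
import re

def is_best_nr_hit(hsp):
    """Python transcription of the perl constraint, clause by clause."""
    return (
        # at least 50% of the query length is covered by positives
        hsp["Qpositives"] >= hsp["query_length"] * 0.5
        # subject name starts with "ref" (RefSeq entry)
        and re.search(r"^ref", hsp["subject_name"]) is not None
        # definition line mentions neither "predicted" nor "hypothetical"
        and re.search(r"predicted", hsp["subject_def"], re.IGNORECASE) is None
        and re.search(r"hypothetical", hsp["subject_def"], re.IGNORECASE) is None
        # E-value no larger than 1e-3
        and hsp["e_value"] <= 1e-3
    )

# Made-up record used only to exercise the function.
sample_hit = {
    "Qpositives": 70,
    "query_length": 100,
    "subject_name": "ref|XX_000000.0|",
    "subject_def": "some kinase [Some organism]",
    "e_value": 1e-20,
}
```

Note that the patterns are case-insensitive for the definition line (the /i flag in perl) but not for the "ref" prefix.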
Instead of using the checkboxes to specify data to be displayed, you can type the display command in the "custom display" textarea.
The syntax of this command is: display("\t",display_data_list)
where display_data_list is a comma-delimited list of display_data (one at least)
and display_data is a valid expression using extracted or computed data from Blast output together with operators as seen before.
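As an illustration, a command such as display("\t", $subject_name, $e_value, $identities / $query_length) would presumably produce one tab-separated line per selected HSP. The Python function below only imitates that behaviour on a made-up record. It is a hypothetical stand-in, not the tool's display implementation, and the exact number formatting of the real output may differ.

```python
def display(separator, *values):
    """Join the evaluated display_data values with the given separator."""
    return separator.join(str(v) for v in values)

# Made-up HSP record keyed by data names that appear in this page.
hsp = {"subject_name": "ref|XX_000000.0|", "e_value": 2e-10,
       "identities": 90, "query_length": 120}

# Third column is a computed expression, as allowed for display_data.
row = display("\t", hsp["subject_name"], hsp["e_value"],
              hsp["identities"] / hsp["query_length"])
print(row)
```

The point of the custom command is the last column: a checkbox can only show a data item as is, whereas display_data can be any expression built from them.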
The output is a table. Data whose checkbox is checked are displayed, whether or not a constraint has been specified for them.
Each line in the table corresponds to a selected HSP.
Detailed info on data
Could be found here.
Blast Filter Results
This CGI is based on two perl scripts: the first uses BioPerl to parse the Blast output, and the second filters and reformats the parsed output.
Both scripts are "free" and can be obtained on demand at sigenasupport(at)jouy.inra.fr