Oligo Set Annotation
The links above give access to oligo set annotations generated by sigReannot, the annotation pipeline built by the Sigenae team.
Annotations can be searched and extracted using BioMart, or versioned files can be downloaded directly.
The pipeline is made of four steps:
Alignment with Ensembl transcripts, Ensembl ncRNA and the genome assembly.
Specificity analysis: this tells us how specific a probe is, i.e. whether it is at risk of cross-hybridization or hybridizes to several transcripts. A probe hitting the transcript set with more than 85% identity is considered a “good hit”, and a stretch of complementary sequence > 20 bp is considered to give noise.
We then split the probes into 7 categories:
1: 1 good hit and no noise
2: 1 good hit with noise
3: no hit, 1 noise >= 30bp
4: no hit, 1 noise >= 20bp & < 30bp
5: no hit, many noise
6: no good hit, no noise
7: many good hits.
Annotation of probes related to one and only one gene: to get a relevant annotation we keep categories 1, 2, 3 and 4.
First, using the Ensembl API and the genes related to the probes, we fetch orthologous genes, GO terms and cross-reference gene identifiers.
Then, using human orthologous HGNC identifiers and KEGG data, we find which KO and pathway are related to each probe.
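As an illustration of the specificity step, the category assignment listed above can be sketched in Python. This is not the sigReannot code: the function names, the inputs (a precomputed count of good hits and a list of noise stretch lengths in bp) and the reading of "no hit" as "no good hit" are assumptions made for this example.

```python
from typing import List

# Categories kept for annotation, as stated in the text.
KEPT_CATEGORIES = {1, 2, 3, 4}


def classify_probe(good_hits: int, noise_lengths: List[int]) -> int:
    """Return the specificity category (1 to 7) of a probe.

    good_hits: number of hits with more than 85% identity (assumed precomputed).
    noise_lengths: lengths in bp of the complementary stretches counted as
    noise (assumed to be already filtered on the noise threshold).
    """
    if good_hits > 1:
        return 7  # many good hits
    if good_hits == 1:
        # one good hit, without (1) or with (2) noise
        return 1 if not noise_lengths else 2
    # From here on the probe has no good hit.
    if not noise_lengths:
        return 6  # no good hit, no noise
    if len(noise_lengths) > 1:
        return 5  # many noise stretches
    # Exactly one noise stretch: split on its length.
    return 3 if noise_lengths[0] >= 30 else 4


def is_kept_for_annotation(category: int) -> bool:
    """True if probes of this category go on to the annotation step."""
    return category in KEPT_CATEGORIES
```

For example, a probe with one good hit and one 25 bp noise stretch falls into category 2 and is kept, while a probe with two good hits falls into category 7 and is not.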
Here is a short description of the annotation files:
Oligo Set Description file (in PDF format) is a two-page description giving general information about the oligo set, a short report on the procedure and figures on the overall annotation results.
General Annotation file is a table which gives generic annotation data like genomic position, gene name, orthologous genes in Human, Mouse, Rat ...
GO Annotation file is a table which gives relationships between oligos and Gene Ontology terms through oligo - gene name correspondences.
GO Matrices files are "matrices of correspondence" between oligos and GO categories for each ontology (Biological Process, Cellular Component and Molecular Function).
KEGG Pathway file is a table which gives relationships between oligos and KEGG pathways through oligo - human orthologous HGNC identifiers correspondences.
KEGG Gene file is a table which gives relationships between oligos and KEGG genes, again through oligo - human orthologous HGNC identifiers correspondences.
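To make the idea of a "matrix of correspondence" concrete, here is a minimal Python sketch that turns (oligo, GO term) pairs into a 0/1 matrix. The actual layout of the GO Matrices files is not described here, so the 0/1 encoding, the function name and the oligo identifiers used below are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def build_go_matrix(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a 0/1 oligo x GO term matrix from (oligo, GO term) pairs.

    Returns the sorted oligo identifiers (rows), the sorted GO terms
    (columns) and the matrix itself as a list of rows.
    """
    unique_pairs = set(pairs)
    oligos = sorted({oligo for oligo, _ in unique_pairs})
    terms = sorted({term for _, term in unique_pairs})
    row_index = {oligo: i for i, oligo in enumerate(oligos)}
    col_index = {term: j for j, term in enumerate(terms)}

    matrix = [[0] * len(terms) for _ in oligos]
    for oligo, term in unique_pairs:
        matrix[row_index[oligo]][col_index[term]] = 1
    return oligos, terms, matrix


# Hypothetical oligo identifiers; the GO identifiers are the root terms
# of the Biological Process and Cellular Component ontologies.
example_pairs = [
    ("oligo_A", "GO:0008150"),
    ("oligo_A", "GO:0005575"),
    ("oligo_B", "GO:0008150"),
]
oligos, terms, matrix = build_go_matrix(example_pairs)
```

One such matrix would be built per ontology, from the rows of the GO Annotation table restricted to that ontology.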
Non-obvious column descriptors in the General Annotation file:
origin: by default, the origin column is set to 'exon', which means the annotation comes from an Ensembl transcript. Some extra annotations are fetched from the genomic sequence (origin = gene) or from ESTs (origin = est). These recover hits located in a UTR or in an intron according to Ensembl.
reverse_transcript: potential hybridization on the opposite strand is reported in the "reverse_transcript" column. An empty or NULL field means the good hit is on the transcript strand, '1' means the only good hit is on the reverse transcript strand, and '2' means there is a good hit on the transcript strand but also on the opposite one. We chose not to treat this latter case ('2') as a multi (category 7); the gene given is the one from the transcript strand.
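When parsing the General Annotation file, the reverse_transcript values described above can be decoded with a small helper such as the following Python sketch. The function name and the returned labels are hypothetical; only the meaning of the empty/NULL, '1' and '2' values comes from the description above.

```python
from typing import Optional


def describe_reverse_transcript(value: Optional[str]) -> str:
    """Translate a reverse_transcript field into a readable label."""
    cleaned = (value or "").strip()
    if cleaned == "" or cleaned.upper() == "NULL":
        return "good hit on the transcript strand"
    if cleaned == "1":
        return "only good hit on the reverse transcript strand"
    if cleaned == "2":
        # Not treated as a multi (category 7): the reported gene is the
        # one from the transcript strand.
        return "good hit on both strands"
    raise ValueError(f"unexpected reverse_transcript value: {value!r}")
```

Raising an error on any other value is a design choice for this sketch, so that unexpected content in the column is noticed rather than silently ignored.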