cDNA Set Annotation Files
The links above give access to the cDNA set annotation files. These files are produced by the following procedure (each step is applied only to the accessions that the previous step could not annotate):
find the accession numbers linked to an Ensembl Gene ID in the Ensembl database (the accession is located in a genome region containing a gene; the location can be found in the Ensembl or UCSC database),
run blastn with the FASTA sequences against the species-specific Ensembl transcripts (downloaded from the Ensembl FTP server),
run blastn with the FASTA sequences against RefSeq RNA.
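The cascade described above, where each step only sees the accessions left over by the previous one, could be sketched as follows. This is a minimal illustration and not the actual pipeline: the function name annotate_in_cascade, the step names, and the toy lookup tables (ACC_A, GENE_1, and so on) are all hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A step takes an accession and returns a target ID, or None when it
# cannot annotate that accession. (Hypothetical interface.)
Step = Tuple[str, Callable[[str], Optional[str]]]


def annotate_in_cascade(
    accessions: Iterable[str], steps: List[Step]
) -> Dict[str, Tuple[str, str]]:
    """Apply each step only to accessions not annotated by earlier steps."""
    annotated: Dict[str, Tuple[str, str]] = {}
    remaining = list(accessions)
    for step_name, annotate in steps:
        not_yet_annotated = []
        for accession in remaining:
            target_id = annotate(accession)
            if target_id is None:
                not_yet_annotated.append(accession)
            else:
                annotated[accession] = (step_name, target_id)
        remaining = not_yet_annotated
    return annotated


# Toy lookup tables standing in for the three real steps (placeholder IDs).
ensembl_location = {"ACC_A": "GENE_1"}
ensembl_blastn = {"ACC_A": "GENE_1", "ACC_B": "GENE_2"}
refseq_blastn = {"ACC_C": "RNA_3"}

steps = [
    ("ensembl_location", ensembl_location.get),
    ("ensembl_blastn", ensembl_blastn.get),
    ("refseq_blastn", refseq_blastn.get),
]
result = annotate_in_cascade(["ACC_A", "ACC_B", "ACC_C", "ACC_D"], steps)
for accession, (step_name, target_id) in result.items():
    print(accession, step_name, target_id)
```

In this toy run ACC_A is annotated by the first step and never reaches blastn, while ACC_D matches nothing and is simply absent from the result.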
Two annotation files are generated. The first one presents annotation data from the Ensembl database. The second one presents annotation data from RefSeq RNA.
In the annotation files, column 2 describes the link between the accession number and the Ensembl Gene ID or RefSeq ID (column 3):
LINKED_TO means that the association comes from the Ensembl database (the accession is located in the corresponding gene region),
xx:xx:xx gives the blastn statistics for the alignment between the accession and the Ensembl Gene or RefSeq sequence. The first number is the query (accession) percent coverage, i.e. the percentage of the query length covered by the high-scoring segment pair(s) (HSPs); the second number is the hit (Ensembl Gene or RefSeq sequence) percent coverage, i.e. the percentage of the hit length covered by the HSP(s); the third number is the percent identity. For example:
AJ816402 100:50:98.4 ENSBTAG00000000072
means that the similarity region(s), or HSP(s), between AJ816402 and ENSBTAG00000000072 cover 100% of AJ816402 and 50% of ENSBTAG00000000072, and that the HSP(s) show an overall identity of 98.4%.
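A line in this format could be read with a small parser such as the one below. It is only a sketch: it assumes the first three columns are separated by whitespace, and the names Link and parse_annotation_line are illustrative, not part of any distributed tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    accession: str
    target_id: str                    # Ensembl Gene ID or RefSeq ID (column 3)
    linked_by_location: bool          # True when column 2 is LINKED_TO
    query_coverage: Optional[float] = None   # percent of the accession covered
    hit_coverage: Optional[float] = None     # percent of the hit covered
    identity: Optional[float] = None         # percent identity


def parse_annotation_line(line: str) -> Link:
    """Parse the first three columns of one annotation line."""
    # Assumption: columns are whitespace-separated; extra columns are ignored.
    accession, link, target_id = line.split()[:3]
    if link == "LINKED_TO":
        return Link(accession, target_id, linked_by_location=True)
    query_cov, hit_cov, identity = (float(x) for x in link.split(":"))
    return Link(accession, target_id, False, query_cov, hit_cov, identity)


parsed = parse_annotation_line("AJ816402 100:50:98.4 ENSBTAG00000000072")
print(parsed.query_coverage, parsed.hit_coverage, parsed.identity)
```

For LINKED_TO lines the three numeric fields stay at None, which keeps location-based links distinguishable from blastn-based ones.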
IDs absent from the annotation lists are possibly not accession numbers (in which case no FASTA sequence could be found to run blastn), or they have no significant match among the Ensembl transcripts or RefSeq RNA entries.
Blastn searches against Ensembl transcripts are performed with an e-value cutoff of 1e-10.
Blastn searches against RefSeq are performed with an e-value cutoff of 1e-5.
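As an illustration of how these two cutoffs might be applied, the sketch below builds a blastn command line for each target. It assumes the NCBI BLAST+ blastn program and its -query, -db, -evalue and -outfmt options; the page does not say which BLAST distribution or which other options were actually used, and the file and database names are hypothetical.

```python
from typing import List

# E-value cutoffs stated in the text, keyed by target database.
E_VALUE_CUTOFFS = {"ensembl": "1e-10", "refseq": "1e-5"}


def build_blastn_command(query_fasta: str, database: str, target: str) -> List[str]:
    """Return a BLAST+ blastn argument list for the given target.

    Assumption: BLAST+ is used; tabular output (-outfmt 6) is chosen here
    only for convenience and is not stated in the source.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", E_VALUE_CUTOFFS[target],
        "-outfmt", "6",
    ]


# Hypothetical file and database names.
ensembl_cmd = build_blastn_command("accessions.fasta", "ensembl_transcripts", "ensembl")
refseq_cmd = build_blastn_command("accessions.fasta", "refseq_rna", "refseq")
print(" ".join(ensembl_cmd))
print(" ".join(refseq_cmd))
```

The argument lists can be passed to subprocess.run once BLAST+ is installed and the databases have been built.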
All other data (Entrez ID, HGNC, Localization ...) are extracted from the Ensembl database using BioMart, with the Ensembl Gene ID as the entry point.
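One way to use the Ensembl Gene ID as an entry point is a BioMart XML query filtered on the gene ID. The sketch below only builds the query text and does not contact any server. The dataset name and the attribute names are assumptions that should be checked against the BioMart instance in use, since they vary between Ensembl releases; the page does not say how BioMart was actually queried.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_biomart_query(gene_ids: List[str], dataset: str, attributes: List[str]) -> str:
    """Build a BioMart XML query that filters on Ensembl Gene IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # The Ensembl Gene ID is the entry point: it is the only filter.
    ET.SubElement(ds, "Filter", {"name": "ensembl_gene_id", "value": ",".join(gene_ids)})
    for attribute in attributes:
        ET.SubElement(ds, "Attribute", {"name": attribute})
    return ET.tostring(query, encoding="unicode")


# Assumed dataset and attribute names; verify them in the BioMart interface.
query_xml = build_biomart_query(
    ["ENSBTAG00000000072"],
    dataset="btaurus_gene_ensembl",
    attributes=["ensembl_gene_id", "entrezgene_id", "hgnc_symbol", "chromosome_name"],
)
print(query_xml)
```

The resulting string is what would be submitted as the query parameter to a BioMart service endpoint.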