The aim of the assembly process is to produce contigs from the partial cDNA sequences. Because of the large number of sequences to be assembled, this is done in a two-step process. The first step builds clusters of sequences sharing at least 75 bp at an identity rate of 96% (tool used: MegaBlast). The second step constructs coherent contigs from these clusters (tool used: CAP3). This last step also creates the consensus sequences of the pseudo-genes.
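The clustering step can be illustrated with a minimal Python sketch. It assumes that pairwise similarity hits have already been computed and parsed into tuples of (query ID, subject ID, percent identity, alignment length), and that two sequences connected by any qualifying hit end up in the same cluster (single linkage). Both the input layout and the linkage rule are assumptions made for illustration; the function name `cluster_sequences` is hypothetical and this is not the pipeline's actual code.

```python
from collections import defaultdict

# Thresholds taken from the text: at least 75 bp shared at 96% identity.
MIN_OVERLAP_BP = 75
MIN_IDENTITY_PCT = 96.0


def cluster_sequences(seq_ids, hits):
    """Group sequences linked by qualifying hits (single linkage, assumed).

    seq_ids: list of all sequence identifiers.
    hits: iterable of (query_id, subject_id, identity_pct, alignment_length);
          every ID used in a hit must also be present in seq_ids.
    Returns a sorted list of clusters, each a sorted list of IDs.
    """
    # Union-find structure: each sequence starts as its own cluster.
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, length in hits:
        if query == subject:
            continue  # ignore self-hits
        if length >= MIN_OVERLAP_BP and identity >= MIN_IDENTITY_PCT:
            parent[find(query)] = find(subject)  # merge the two clusters

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical example data.
hits = [
    ("est1", "est2", 98.5, 120),
    ("est2", "est3", 96.0, 75),
    ("est3", "est4", 99.0, 60),   # rejected: overlap shorter than 75 bp
    ("est4", "est5", 95.9, 200),  # rejected: identity below 96%
]
clusters = cluster_sequences(["est1", "est2", "est3", "est4", "est5"], hits)
print(clusters)
```

In this example est1, est2 and est3 form one cluster, while est4 and est5 remain singletons. Each resulting cluster would then be passed to the second step for contig construction.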
To provide as much information as possible about the contigs, similarity searches are performed by comparing the contig sequences against several databases. The databases used are:
Once the contigs are built and their annotations computed, all data are loaded into a locally adapted Ensembl database. A graphical user interface lets you visualize the data through different views:
Graphical view - This view gives a graphical overview of the contig structure and similarity annotations. Each sequence (EST) or similarity feature is represented as a line (Ensembl name: GlyphSet). The color of the line indicates the type of sequence. The lines are grouped by families and are described on the left of the panels. When you move your mouse pointer over a line, a title appears. If you click on a line, a menu appears; move your pointer over the menu to access its items.
Features view - The graphical view of the contig is very helpful for getting a general idea, but going over every element with the pointer to learn more about each one is tedious. We have therefore built this view, which gives all the information about the different similarity features on one page. The features are presented in separate tables that currently show the ID and the definition. Some of the IDs are clickable and link to the page of the given element. The title of each feature table is located at its upper left.
Assembly view - The graphical view of the contig gives no information about the quality of the alignment or about the mismatches. Therefore a new view has been added. This view presents the alignment of the sequences on the consensus, and clicking the 'Hide' button shows you all the mismatches between the sequences and the consensus.
Polymorphism view - This is what you get once you have clicked on the 'Hide' button of the Assembly view. Bases shared by the sequences and the consensus are hidden so that only mismatches appear.
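The idea behind hiding shared bases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequence is taken to be already aligned to the consensus and of the same length, matching bases are replaced with a dot, and the function names `mask_matches` and `mismatch_positions` are hypothetical, not part of the browser.

```python
def mask_matches(consensus, read, match_char="."):
    """Return the aligned read with every base equal to the consensus hidden."""
    if len(consensus) != len(read):
        raise ValueError("read must be aligned to the consensus (same length)")
    return "".join(
        match_char if r.upper() == c.upper() else r
        for c, r in zip(consensus, read)
    )


def mismatch_positions(consensus, read):
    """Return the 1-based alignment positions where read and consensus differ."""
    return [
        i + 1
        for i, (c, r) in enumerate(zip(consensus, read))
        if c.upper() != r.upper()
    ]


# Hypothetical example data.
consensus = "ACGTACGT"
read = "ACGAACTT"
masked = mask_matches(consensus, read)
positions = mismatch_positions(consensus, read)
print(masked)     # ...A..T.
print(positions)  # [4, 7]
```

Positions where several sequences show the same mismatch against the consensus are the natural candidates to inspect as possible polymorphisms.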
Blast / Blat - These tools let you align contigs against different nucleic acid and protein databases or genomes located on our server.
The Sigenae Contig Browser project provides several data sets. We also aim to provide several routes of access to these data sets.
Entry Points - Data sets can be searched by contig name or SNP name using the Search bar at the upper right.
Large-Scale Exports - Use BioMart, the large-scale data mining tool, for more advanced data queries. The tool allows you to export the data in HTML, Microsoft Excel and various text formats suitable for import into other databases or analysis systems.
Small-Scale Exports - For each species the Sigenae Contig Browser provides a dedicated ExportView page. This data display is especially designed for small-scale data exports in HTML, text or zipped format.
All consensus sequences generated by the Sigenae Contig Browser project are freely available to download from the "Download data" page.