The Bloomsbury Centre for Bioinformatics

Director: Professor David Jones

This course covers the core Ensembl databases and their Application Programming Interfaces. It is suitable for bioinformaticians and computer science researchers with experience in Perl and object-oriented programming.

Course tutors: Patrick Meidl (Core) and Javier Herrero (Compara), EBI

Ensembl uses MySQL relational databases to store its information. A comprehensive set of Application Programming Interfaces (APIs) serves as a middle layer between the underlying database schemas and more specific application programs. The APIs aim to encapsulate the database layout by providing efficient high-level access to the data tables, and to isolate applications from changes to the data layout.
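
The middle-layer idea described above can be illustrated with a minimal sketch. This is not the Ensembl Perl API: it is written in Python against an in-memory SQLite database so that it runs anywhere, and the names `Gene`, `GeneAdaptor`, `gene_table` and its columns are all hypothetical, chosen only for illustration. The point is that application code asks an adaptor for objects and never refers to table or column names itself.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Gene:
    """Plain object handed to application code (hypothetical, for illustration)."""
    stable_id: str
    name: str
    start: int
    end: int


class GeneAdaptor:
    """Hypothetical middle layer: the only code that knows the table layout."""

    def __init__(self, connection):
        self._connection = connection

    def fetch_by_stable_id(self, stable_id):
        # If the schema changes, only this query needs to be updated.
        row = self._connection.execute(
            "SELECT stable_id, display_name, seq_start, seq_end "
            "FROM gene_table WHERE stable_id = ?",
            (stable_id,),
        ).fetchone()
        return Gene(*row) if row else None


def build_demo_database():
    """Create a throwaway in-memory database with one made-up record."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE gene_table ("
        "stable_id TEXT PRIMARY KEY, display_name TEXT, "
        "seq_start INTEGER, seq_end INTEGER)"
    )
    connection.execute(
        "INSERT INTO gene_table VALUES (?, ?, ?, ?)",
        ("GENE0001", "demo_gene", 100, 250),
    )
    return connection


# Application code: works with objects and never mentions the schema.
adaptor = GeneAdaptor(build_demo_database())
gene = adaptor.fetch_by_stable_id("GENE0001")
print(gene.name, gene.end - gene.start + 1)
```

If a column such as `display_name` were renamed in a later schema version, only the query inside `GeneAdaptor` would change; the application lines at the bottom would stay as they are.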

This two-day workshop is aimed at developers interested in exploring Ensembl beyond the website. Participants are expected to have experience in writing Perl programs and a background in object-oriented programming techniques. Familiarity with databases (MySQL) would be an advantage.

The workshop covers various Ensembl databases and APIs. For each of them, the database schema, the API design, and the most important objects and their methods will be presented. This will be followed by practical sessions in which participants can put what they have learned into practice by writing their own Perl scripts.

Ensembl Core databases and API
The set of species-specific Ensembl Core databases stores genome sequences and most of the annotation information. This includes the gene, transcript and protein models annotated by the Ensembl automated genome analysis and annotation pipeline. Ensembl Core databases also store assembly information, cDNA and protein alignments, external references, markers and repeat regions data sets.

Ensembl Compara database and API
The Ensembl Compara multi-species database stores the results of genome-wide species comparisons recalculated for each release. The comparative genomics data set includes whole genome alignments and synteny regions. The comparative proteomics data set contains orthologue predictions, paralogue predictions and protein family clusters.

Note that the Variation database course will be held at a later date.

Lecture Slides
Slides for the course held on 8 and 9 November are available as a PDF download (1.1 MB).

Registration

Anyone interested in attending the course should contact BCB admin (admin@bcb.ac.uk).
