1. Orientation in data resources: sequence databases, searching, downloading. Genome sites and other "added value" resources. Tools for data access and manipulation.
2. Basic handling of sequence data. Identification of relevant sequence portions, file formats, reformatting utilities. Restriction site analysis and translation of DNA sequences.
3. Sequence similarity searches. Methods: BLAST, FASTA. Theoretical principles. Scoring matrices (PAM, BLOSUM). Special implementations of BLAST.
4. Motif searches and domain structure analysis. SMART, PROSITE and similar resources. Searching for specific signals (protein: localisation and degradation; DNA: binding sites). Pattern searches, pattern development.
5. Gene identification and gene building: algorithmic searching for coding sequences and intron/exon structures in chromosomal DNA.
6. Construction and interpretation of sequence alignments. Automated vs. manual methods (CLUSTAL vs. MACAW). Use of EST alignments for verifying gene structure predictions. Construction of protein sequence alignments and derived sequence patterns or profiles.
7. Phylogenetic trees: construction and critical interpretation. Problem of meaningful data selection. The PHYLIP package and its use.
The course aims to provide the basic skills required for searching and analysing biological sequence data with commonly available (web-based or freely downloadable) tools.
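
As a rough illustration of the kind of task covered in topic 2 (restriction site analysis and translation of DNA sequences), a minimal Python sketch might look like the following. The function names `translate` and `find_sites` and the example sequences are hypothetical, chosen for illustration only; the course itself relies on existing web-based or downloadable tools. The sketch assumes the standard genetic code and scans the forward strand only.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG codon order.
# Assumption: nuclear standard code; other codes (e.g. mitochondrial) differ.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA sequence in reading frame 1 (hypothetical helper).

    Stop codons are shown as '*', codons with ambiguous bases as 'X',
    and a trailing incomplete codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


def find_sites(dna, site):
    """Return 0-based start positions of a recognition site (hypothetical helper).

    Only the given strand is scanned; overlapping matches are reported.
    """
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    # GAATTC is the EcoRI recognition sequence.
    print(find_sites("AAGAATTCGGGAATTC", "GAATTC"))
```

A real analysis would also need to consider the reverse strand, all six reading frames, and degenerate recognition sequences, which dedicated tools handle.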