Genome-oriented applications in plant evolution

Class at Faculty of Science | MB120C54

Syllabus

* Preliminary schedule:

Week 1 Introduction to Unix

Week 3 Common Data Types & Tools Part 1: common genomic data types and tools.

Week 5 Common Data Types & Tools Part 2: HPC computing, package managers, containers, and pipelines

Week 7 Python Basics

Week 9 In-class Presentations and assessment

Week 11 Code and pipeline reproducibility

(2-3 hour blocks every two weeks for better immersion.)

Annotation

This is a practical computational course based on plant genomics. It takes students from beginner-level command-line scripting and basic HPC cluster computing to pipeline reproducibility and container management, using real data from current research.

Participants will develop basic command-line skills in Unix to manipulate data types commonly used in genomic studies. They will learn how to run scripts written by others as well as how to write and develop their own. The module will cover Unix, HPC cluster computing, reproducibility, and basic Python scripting, and will provide students with the tools to develop their own code and pipelines. Participants will work within interactive coding environments to maintain reproducibility in code and practice. By the end of the module, students will be equipped with the practical skills to write simple code to address diverse biological problems.

Learning Aims:

This module aims to introduce fundamental concepts of programming and data manipulation to students without prior experience of the command line. Participants will learn how to write useful, simple code in an interactive environment. Sessions will introduce fundamental programming concepts and approaches. Participants will then have sessions of directed learning to practice and develop the skills required to complete a series of programming challenges.

Learning Outcomes:

• Competence in the Unix environment, command line operations and scripting.

• Knowledge of information sources and guidance for solving common problems in computational biology.

• The ability to install and run scripts within an appropriate environment, including installing the necessary dependencies.

• Knowledge of how to use packages to solve common problems in programming.

• Understanding how to use environments (e.g. Anaconda-based) and package managers for code portability.

• Knowledge of typical biological data file formats and ability to write new files or edit existing files in place.

• The ability to write simple scripts, employ the concepts of class-oriented programming and develop simple applications to reproducibly solve common problems.

The core of the work is thus hands-on practical experience with the analysis of empirical data, supervised by Professor Yant. The format approximates real work on data that students generate during their own independent research (e.g. during a Masters or PhD project). Sample datasets will be provided; analysing one's own data in the project work is possible and welcome, but not required. Previous experience with scripting is welcome but not required.

Every student selects a topic and runs a project, i.e. processes a provided sample dataset or their own dataset using the presented tools and shares the results with others in a short presentation. Credits will be given for the presentations and for simple programming challenges.

This course will be held in English.