Lecture Course: "Discrete Algorithms for Computational Biology"

Instructors: · Gene Myers · Michael Schroeder · Michael Hiller

Synopsis:
This course will survey the range of computational problems arising in the analysis of genomic data and the principal algorithmic techniques used to solve them. The course is aimed at computer scientists that have had a course in data structures and elementary algorithmic concepts (e.g. order notation, divide-and-conquer, priority queue, etc.)

At the end of the course, one should in principle, (a) be able to design a dynamic programming solution to an appropriate problem, (b) have a solid understanding of sequence comparison and Hidden Markov Model methods, and (c) use this knowledge to rapidly be able to understanding and effectively use all the existing tools for sequence-based bioinformatics and to program the customized parts often needed in any given bioinformatic investigation.

Prerequisites:

  • Einführung in die Mathematik für Informatiker [INF-D/B-110]
  • Mathematische Methoden für Informatiker [INF-D/B-120]
  • Algorithmen und Datenstrukturen [INF-D/B-210]
  • Programmierung [INF-D-230, INF-B-240]

Format:
4 SWS (2V+2U)

Exam:
written

Special remarks:
The course is part of the Computational Biology minor program.

Syllabus:

  • Week 1 : A Quick Primer on Molecular Biology and Techniques ( (Myers)
    The central dogma, recombinant DNA technology, hybridization, probes, PCR, etc.
  • Week 2-5 : Sequence Comparison, theory and practice (Schroeder)
    The basic dynamic programming algorithm, gap cost variations, extension to patterns.
    Acceleration: indexing, filtration methods, FASTA and BLAST as examples.
    Multi-sequence alignment: scoring schemes, greedy/DCA/MSA/round-robin heuristics.
  • Week 6-9 : Gene Finding (Myers)
    Approaches: statistical, homology-based, Bayesian via Hidden Markov Models.
    Hidden Markov Models (HMMs): Viterbi and forward/backward algorithms
  • Week 10-13 : Phylogeny (Hiller)
    Jukes-Cantor model, maximum-likelihood method, distance-based methods, neighbor-joining, HMMs. Genome rearrangements
  • Week 14 : Optional Topics (per instructor and time permitting)
    RNA Secondary Structure: Definitions, Scoring schemes, dynamic programming approaches.
    Motif Finding: Repeat finding. Promoter and enhancer recognition. Signal peptide recognition.
    Genotyping: Basic genetics, haplotype determination, haplotype blocks, forensic identification.
    Genome Sequence Assembly: Technology overview. Overlap-layout-consensus paradigm. Approaches.