Lecture Course: "Data analysis: Statistical principles and computational methods"

Instructors: · Ingo Roeder · Lars Kaderali · Carsten Rother

When and Where:
Summer Semester 2015

Start date: 15 April 2015

Lecture: Wednesday, 9:20 - 10:50 (2. DS) - room APB-E001
Exercise: Wednesday, 11:10 - 12:40 (3. DS) - room APB-E006 (bring your laptop)

Synopsis:
This course gives an introduction to the key principles of statistical data analysis, complemented by an overview of different computational methods. The aim is to provide solid basic knowledge that will allow you to quickly learn and apply more sophisticated statistical and bioinformatics data analysis techniques at later stages of your education.

Specifically, the course will address general statistical principles (e.g. statistical significance, maximum-likelihood/least-squares principles, correlation/causality concepts, Bayesian calculus) as well as the basics of statistical modeling (e.g. multivariate linear regression or generalized linear models). Building upon these principles, you will get to know different machine learning strategies. Here, the course will combine a presentation of numerical and optimization algorithms with a number of application examples from biology and medicine.

Prerequisites:

  • Calculus,
  • Vector algebra,
  • Probability (Mathematische Methoden für Informatiker) [NF-B-120, Modul 110]

Format:
4 SWS (2V+2U), i.e. 4 semester hours per week: 2 lecture, 2 exercise

Exam:
written

Special remarks:
The course is part of the Computational Biology minor program.

Lecture Materials:
See here and here.

Syllabus:

  • Statistical principles and paradigms
    • Concept of statistical significance
    • Frequentist versus Bayesian statistics
    • Maximum likelihood and least squares principles
    • Correlation and causality concepts
  • Statistical models
    • General linear model and its application (e.g. multivariate linear regression, ANOVA)
    • Generalized linear model and its application (e.g. logistic regression, linear mixed-effects models)
  • Numerical methods
    • Gradient descent, simulated annealing, evolutionary algorithms
  • Unsupervised methods
    • Principal component analysis, independent component analysis
    • Clustering (e.g. kNN, hierarchical)
  • Supervised methods
    • Linear separation, the perceptron, support vector machines, decision trees/CART, regression
  • Optional topics (if time permits)
    • Expectation-maximization algorithm, Bayesian decision theory, regularization, bias-variance tradeoff, cross-validation, bagging, boosting, jackknife, bootstrap