Program/Course

Computational Systems Biology

Institution: EDN – Ecole Du Numérique

Language: English

Program(s) in which the course appears:

Period: S4

Prerequisites

– Basic concepts of statistics, including probability theory and generalized linear models.
– Basic knowledge of multi-omics data, including transcriptomics, genetics and metagenomics.
– Basic knowledge of machine learning methods, including the different learning tasks and the core concepts of predictive modelling (bias-variance trade-off, cross-validation, feature selection and hyperparameter tuning).
– Advanced programming skills in either Python or R.

Objective(s)

1- Understanding the core concepts of probability and statistics for the analysis of biological data. The student is expected to understand the traditional approaches to parameter estimation and the related inferential concepts across a broad range of common statistical distributions used in the analysis of biological data.

2- Understanding the traditional inferential methods and their implementation in bioinformatics tools for the analysis of transcriptomics, genetics and metagenomics data. The student is expected to identify which method is most appropriate for a specific question and to interpret the results adequately.

3- Understanding the standard approaches and methods for integrating multi-omics data within predictive models. The student is expected to understand the pros and cons of each method or strategy and to apply them properly depending on the question.

Content

Core concepts of Statistics:
– Usual probability laws

o Normal distribution

o Binomial distribution

o Poisson

o Negative Binomial

– Inferential vs Predictive statistics
– Parametric vs non-parametric statistics
– Frequentist Statistics

o Central limit theorem, OLS and ML estimation

o P-values, confidence intervals, and multiple comparisons

o Examples

– Bayesian Statistics

o Bayes Theorem, prior, likelihood and posterior

o Gibbs sampling, (Hamiltonian) MCMC, and variational Bayes (VB)

o Bayes Factor and Credible Intervals

o Examples

– Validation Quiz
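
As an illustration of the "multiple comparisons" item above, here is a minimal sketch of one common correction, the Benjamini-Hochberg false discovery rate procedure. The syllabus does not say which correction is taught, so the choice of method, the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative helper (hypothetical name), not taken from the course material.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, e.g. from five independent tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
# Tests declared significant at a 5% false discovery rate
significant = [q <= 0.05 for q in q_values]
```

In practice a library routine would normally be used instead of a hand-written loop; the sketch only shows what the adjustment computes.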

Standard Statistical Models (4 hours of lectures, 2 + 2 hours of tutorials/practicals)

– Linear Regression

o Ordinary Least Squares and Maximum Likelihood for parameter estimation

o Hypothesis testing

o Model assumptions

o Examples in R and Python

– Generalized Linear Regression

o Logistic Regression

o Poisson Regression

o Negative Binomial Regression

o Newton-Raphson for parameter estimation

o Examples in R and Python

– Linear Mixed Models

o ML and REML for parameter estimation

o Model interpretation

o Hypothesis testing

o Examples in R and Python

– Validation Quiz

Specific aim: The student should understand the difference between common statistical approaches, identify standard techniques for estimating model parameters and properly interpret models commonly encountered in practice.
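
To make the "Newton-Raphson for parameter estimation" item concrete, the following is a minimal sketch for a Poisson regression with a log link, an intercept and a single predictor. The function name `fit_poisson_newton`, the starting values and the toy data are assumptions for illustration; this is not the course's reference implementation. With the canonical log link the observed and expected information coincide, so the same update is also Fisher scoring.

```python
import math


def fit_poisson_newton(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson (hypothetical helper).

    Assumes x contains at least two distinct values, otherwise the
    information matrix is singular.
    """
    # Start from the intercept-only fit
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix: minus the Hessian of the log-likelihood
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        # Solve the 2x2 system information * delta = score
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Made-up count data for illustration
x = [0, 1, 2, 3, 4]
y = [1, 3, 4, 9, 15]
b0_hat, b1_hat = fit_poisson_newton(x, y)
```

At convergence the score vector is numerically zero, which is a simple way to check that the iterations reached the maximum likelihood estimate.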

Analysis of RNA-Seq data
– Count Data, overdispersion and heteroskedasticity
– Data Normalizations and batch effects
– Standard Analytical Workflows
– Differential Analysis

o DESeq2

▪ Model Characteristics
▪ Model Interpretation & Visualization
▪ Extensions to longitudinal designs

o Examples

– Validation Quiz
Specific aim: the student should understand and implement the DESeq2 model and interpret the results properly.
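
As a companion to the "Data Normalizations" item above, here is a simplified sketch of the median-of-ratios idea used for size-factor normalization of count data. It is written in plain Python for readability and is not DESeq2's own code; the function name `size_factors` and the toy count matrix are assumptions for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (simplified, hypothetical helper).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero count have no finite log geometric mean: skip them
        if min(gene) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor of a sample: median ratio to the per-gene geometric mean
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)
# Dividing each column by its size factor puts the samples on a common scale
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
```
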
Analysis of Genetics data
– Structures of Genetics data

o Whole Genome vs Whole Exome sequencing data

▪ Coding vs non-coding
▪ Epigenetic mechanisms of gene transcription

o Mutations, SNPs and allele dosage

o Hardy-Weinberg Equilibrium

o Linkage (Dis)equilibrium

o Population Structures: ancestry and cryptic relatedness

– GWAS

o Core concepts: Heritability, Genetic Variance, and Multiple Comparisons

o Continuous phenotypes

o Dichotomous phenotypes

o Examples in R and Python

– Rare-Variant Association tests

o Burden test

o SKAT

o SKAT-O

o P-value combinations

o Examples

– Advanced concepts: Family-based studies, Functional annotations, Fine-mapping
– Validation Quiz
Specific aim: the student should understand the core concepts of genetics and be able to distinguish, implement and interpret methods for common and rare variants.
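
The "Hardy-Weinberg Equilibrium" item above can be illustrated with a minimal sketch of the one-degree-of-freedom chi-square test on genotype counts for a biallelic variant. The function name `hwe_chi_square` and the example counts are assumptions for illustration. The chi-square approximation is known to be unreliable when expected counts are small, in which case an exact test is usually preferred.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test of Hardy-Weinberg equilibrium (hypothetical helper).

    Returns the test statistic and its p-value (1 degree of freedom).
    """
    n = n_hom_ref + n_het + n_hom_alt
    # Reference allele frequency estimated from the genotype counts
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic variant: the HWE test is undefined")
    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(Z^2 > chi2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up genotype counts with a deficit of heterozygotes
chi2_stat, p_val = hwe_chi_square(40, 20, 40)
```
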

Metagenomics:

– Compositional Data

o Data normalizations

o Statistical models for compositional data

o Differential Analysis

▪ ANCOM-BC

o Co-occurrence networks of species

▪ SPIEC-EASI

o Differential co-occurrence networks of species

▪ MDiNE

o Examples

Specific aim: the student will become more familiar with metagenomics data and should be able to distinguish between the different models for differential analysis and for (differential) co-occurrence networks of species.
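
As one example of a data normalization for compositional data, the sketch below applies the centered log-ratio (CLR) transform to a vector of taxon counts. The syllabus does not state which transforms are covered, so the choice of CLR, the function name `clr` and the pseudocount of 0.5 used to handle zeros are assumptions for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample (hypothetical helper).

    A pseudocount is added so that zero counts have a finite logarithm;
    with pseudocount=0 every count must be strictly positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    # Subtracting the mean log is the same as dividing by the geometric mean
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Made-up taxon counts for a single sample, including a zero
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

CLR values sum to zero within a sample and, without a pseudocount, do not depend on the sequencing depth, which is why the transform is often applied before fitting standard multivariate models.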
Reminder