
Introduction to Machine Learning with R

Class at Faculty of Mathematics and Physics |
NPFL054

Syllabus

Machine learning - basic concepts, examples of practical applications, theoretical foundations. Supervised and unsupervised learning. Classification and regression tasks. Classification into two or more classes. Training and test examples. Feature vectors. Target variable and prediction function. Machine learning development process. Curse of dimensionality. Clustering.
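
To make the basic notions concrete, here is a minimal R sketch of a train/test split on the built-in iris data (the dataset and the 70/30 split are purely illustrative, not prescribed by the course):

    # Feature vectors: the four measurements; target variable: Species
    set.seed(42)
    idx   <- sample(nrow(iris), 0.7 * nrow(iris))  # indices of training examples
    train <- iris[idx, ]                           # training set
    test  <- iris[-idx, ]                          # held-out test set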

Decision tree learning. Learning algorithm, splitting criteria and pruning. Random forests.
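
A possible illustration with the rpart and randomForest packages (assumed here; the course does not prescribe specific libraries):

    library(rpart)
    library(randomForest)
    tree <- rpart(Species ~ ., data = iris, method = "class")  # greedy splits, Gini impurity by default
    pruned <- prune(tree, cp = 0.05)                           # cost-complexity pruning
    rf <- randomForest(Species ~ ., data = iris, ntree = 500)  # ensemble of randomized trees
    rf$confusion                                               # out-of-bag confusion matrix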

Linear and logistic regression. Least squares methods. Discriminative classifiers.
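
A brief sketch in base R, fitting a least-squares linear model and a two-class logistic model on illustrative iris features:

    # Linear regression fitted by least squares
    lin <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
    coef(lin)

    # Logistic regression as a discriminative two-class classifier
    iris2 <- droplevels(subset(iris, Species != "setosa"))
    logit <- glm(Species ~ Petal.Length + Petal.Width, data = iris2, family = binomial)
    head(predict(logit, type = "response"))   # estimated class probabilities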

Instance-based learning. The k-NN algorithm.
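
A minimal example with the class package (one of several R implementations of k-NN):

    library(class)
    set.seed(1)
    idx  <- sample(nrow(iris), 100)
    pred <- knn(train = iris[idx, 1:4], test = iris[-idx, 1:4],
                cl = iris$Species[idx], k = 5)   # majority vote among the 5 nearest neighbours
    mean(pred == iris$Species[-idx])             # accuracy on the held-out examples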

Naive Bayes classifier. Bayesian belief networks.
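
A short sketch using the naiveBayes() function from the e1071 package (an assumed choice of library):

    library(e1071)
    nb <- naiveBayes(Species ~ ., data = iris)        # per-class Gaussian estimates for each feature
    predict(nb, iris[c(1, 51, 101), ], type = "raw")  # posterior class probabilities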

Support Vector Machines. Large-margin and soft-margin classifiers. Kernel functions.
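
An illustrative fit with e1071::svm(); the cost parameter controls how soft the margin is, and the kernel argument selects the kernel function:

    library(e1071)
    svm_lin <- svm(Species ~ ., data = iris, kernel = "linear", cost = 10)  # larger cost ~ harder margin
    svm_rbf <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)   # soft margin with an RBF kernel
    mean(predict(svm_rbf, iris) == iris$Species)                            # training-set accuracy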

Ensemble methods. Unstable learning algorithms. Bagging and boosting. AdaBoost algorithm.
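
Bagging can be sketched in a few lines of base R by aggregating trees trained on bootstrap resamples (boosting, e.g. AdaBoost, is available in packages such as adabag or gbm):

    library(rpart)
    set.seed(7)
    preds <- replicate(25, {                         # 25 bootstrap resamples
      boot <- iris[sample(nrow(iris), replace = TRUE), ]
      as.character(predict(rpart(Species ~ ., data = boot), iris, type = "class"))
    })
    vote <- apply(preds, 1, function(p) names(which.max(table(p))))  # majority vote
    mean(vote == iris$Species)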

Parameters in machine learning. Hyperparameter tuning. Searching the parameter space. Gradient descent algorithm. Maximum likelihood estimation.
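
As an illustration of gradient-based optimization, a small sketch that maximizes the log-likelihood of a two-class logistic model by gradient ascent (in practice one would use glm() or optim()):

    X <- cbind(1, as.matrix(iris[iris$Species != "setosa", 3:4]))         # intercept + two features
    y <- as.numeric(iris$Species[iris$Species != "setosa"] == "virginica")
    w <- rep(0, ncol(X))                             # initial parameters
    for (i in 1:5000) {
      p <- 1 / (1 + exp(-X %*% w))                   # current predicted probabilities
      w <- w + 0.1 * t(X) %*% (y - p) / length(y)    # small fixed step along the likelihood gradient
    }
    drop(w)                                          # compare with coefficients from glm()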

Experiment evaluation. Working with development and test data. Sample error, generalization error. Cross-validation, leave-one-out method. Bootstrap method. Performance measures. Evaluation of binary classifiers. ROC curve.
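
A compact sketch of 10-fold cross-validation for a binary classifier, with the ROC curve computed by the pROC package (an assumed choice):

    library(pROC)
    iris2 <- droplevels(subset(iris, Species != "setosa"))
    folds <- sample(rep(1:10, length.out = nrow(iris2)))   # random fold assignment
    prob  <- numeric(nrow(iris2))
    for (k in 1:10) {
      fit <- glm(Species ~ Petal.Length + Petal.Width,
                 data = iris2[folds != k, ], family = binomial)
      prob[folds == k] <- predict(fit, iris2[folds == k, ], type = "response")
    }
    auc(roc(iris2$Species, prob))   # area under the cross-validated ROC curve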

Statistical tests. Statistical hypotheses, one-sample and two-sample t-tests, chi-square tests. Significance level, p-value. Using statistical tests for classifier evaluation. Confidence intervals.
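
Base R covers the tests mentioned above; for example, a paired t-test can compare per-fold accuracies of two classifiers (the numbers below are purely illustrative):

    acc_a <- c(0.91, 0.93, 0.90, 0.94, 0.92)   # hypothetical per-fold accuracies, classifier A
    acc_b <- c(0.88, 0.90, 0.89, 0.91, 0.90)   # hypothetical per-fold accuracies, classifier B
    t.test(acc_a, acc_b, paired = TRUE)        # p-value and 95% confidence interval
    chisq.test(table(mtcars$am, mtcars$vs))    # chi-square test of independence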

Overfitting. How to recognize and avoid it. Regularization. Bias-variance decomposition.
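
Penalized models are one common regularization technique; a minimal sketch with the glmnet package (assumed here), fitting a ridge regression whose penalty strength is chosen by cross-validation:

    library(glmnet)
    x <- as.matrix(iris[, 1:3])
    y <- iris$Petal.Width
    cvfit <- cv.glmnet(x, y, alpha = 0)   # alpha = 0: ridge penalty; lambda picked by CV
    coef(cvfit, s = "lambda.min")         # shrunken coefficients at the selected lambda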

General principles of feature selection. Feature selection using information gain, greedy algorithms.
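
Information gain can be computed directly in base R; a small sketch for a discretized feature (the binning into three intervals is arbitrary):

    entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
    info_gain <- function(feature, class) {
      h_class <- entropy(prop.table(table(class)))           # entropy of the class distribution
      h_cond  <- sum(prop.table(table(feature)) *
                     tapply(class, feature,
                            function(cl) entropy(prop.table(table(cl)))))
      h_class - h_cond                                       # reduction in entropy
    }
    info_gain(cut(iris$Petal.Length, 3), iris$Species)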

Dimensionality reduction, Principal Component Analysis.
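
PCA is available in base R via prcomp(); a three-line illustration on the standardized iris features:

    pca <- prcomp(iris[, 1:4], scale. = TRUE)   # principal components of the standardized features
    summary(pca)                                # proportion of variance explained per component
    head(pca$x[, 1:2])                          # data projected onto the first two components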

Foundations of Neural Networks. Single Perceptron, Single-Layer Perceptron. The architecture of multi-layer feed-forward models and the idea of back-propagation training. Remarks on deep learning.
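
A single-hidden-layer feed-forward network can be sketched with the nnet package (an assumed choice; the course does not prescribe a library):

    library(nnet)
    set.seed(3)
    net <- nnet(Species ~ ., data = iris, size = 5,       # 5 hidden units
                decay = 5e-4, maxit = 200, trace = FALSE) # weights fitted by gradient-based optimization
    mean(predict(net, iris, type = "class") == iris$Species)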

Annotation

Lectures cover both the theoretical background and practical algorithms of Machine Learning (ML). Emphasis is placed on a comprehensive understanding of the ML process, which includes data analysis, choice of an ML algorithm, parameter tuning, statistical evaluation, and model assessment.

Lab sessions provide practical experience with ML tasks using existing R libraries. Homework assignments are practical exercises in R.

The last assignment is the most extensive: it involves the complete processing of a typical, not very demanding problem and writing a report on the solution variants and their evaluation.