Statistical Methods in Data Mining Systems

Class at Faculty of Mathematics and Physics | NDBI031

Syllabus

Data mining, which has existed as a separate area at the intersection of mathematics and computer science since the early nineties, relies methodologically on machine learning, statistics, and the theory of databases. Whereas machine learning and database methods are covered by other lectures, the present lecture is the first of two dealing with the connection between data mining and statistics. It reviews statistical methods implemented in key examples of three main kinds of commercial data mining systems, as well as in one academic system used in teaching data mining at several Czech universities, including ours. This lecture is loosely followed by the summer term lecture DBI029: Statistical aspects of data mining.

- Data mining and its connection to statistics

- Main types of data mining systems

- Statistical methods in Clementine, an example of a general data mining system

- Statistical methods in DecisionSite, an example of a system for on-line decision support by means of data mining

- Matlab as an example of a more universal system including data mining methods

- Descriptive statistics in Matlab

- Linear regression and its generalizations in Matlab

- Multivariate statistical analysis in Matlab

- Hypothesis testing in Matlab

- 4FT-Miner - an academic data mining system combining observational logic and the analysis of four-fold tables

- Quantifiers of observational logic based on parameter estimation

- Quantifiers of observational logic based on hypothesis testing

Annotation

Data mining relies methodologically on machine learning, statistics, and the theory of databases. This is the first of two lectures dealing with its connection to statistics.

It reviews statistical methods implemented in key examples of three main kinds of commercial data mining systems, as well as in one academic system used in teaching data mining at several Czech universities, including ours. This lecture is loosely followed by the summer term lecture NAIL105 Internet and Classification Methods.