Charles Explorer logo
🇬🇧

Streaming algorithms for Big Data

Class at Faculty of Mathematics and Physics |
NTIN114

Syllabus

- approximate counting with less than log(n) bits

- counting distinct elements and frequency moments

- estimating quantiles and distribution functions

- random sampling based on items' frequencies

- lower bounds using communication complexity

- streaming algorithms for graphs: matching, counting triangles, etc.

- geometric streaming algorithms: estimating the diameter, clustering using coresets

- streaming algorithms for vectors and matrices

- adversarially robust streaming algorithms

Annotation

This course focuses on streaming algorithms that are used to sequentially process massive datasets using low memory. As a result, they produce a concise summary of the dataset that approximately preserves relevant data properties.

We will cover basic problems such as counting distinct elements, estimating quantiles and random sampling, as well as algorithms for graph or geometric streams. We will also discuss how to prove lower bounds in this model using communication complexity.

Since streaming algorithms typically employ randomness, we will assume knowledge on the level of the course