Data Science in R for Students of Humanities

Class at Faculty of Mathematics and Physics | NPFL112

Syllabus

1. Basic concepts of R, advantages of R in data analysis as a subdiscipline of programming

2. Tables, vectors, loading a table file, vector as a table column, variable types as vector classes, selection (subsetting) of elements, rows and columns in base R

3. ggplot2 graphics library, mapping variables to aesthetic scales, types of graphs and scales (geom_, scale_ functions)

4. Data wrangling - dplyr library: selection and manipulation of rows (filter, slice, arrange) and columns (select, rename, mutate, if_else, case_when)

5. Data wrangling - groups (group_by, across, rowwise), aggregation (count, summarize)

6. Table joins (SQL-like)

7. "tidy data" concept, conversion between "wider" and "longer" table format for use with dplyr and ggplot2, tidyr (pivot_longer, pivot_wider, unite and separate)

8. Operations on strings, regular expressions incl. "look-around"

9. The concept of iteration in R: vectorization, loop, apply family functions and map family functions from the purrr library in common user situations

10. Text mining with the help of automatic syntactic annotation, interaction with the API of the UDPipe syntactic parser

Favorite datasets: gapminder (https://www.gapminder.org/data/), built-in datasets iris, diamonds, corpora
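To give a flavor of topics 2 and 9, here is a minimal sketch in base R using the iris dataset, which ships with R. It is an illustration only, not course material: the variable names (sepal_length, setosa_long, sepal_length_mm, column_means) and the 5.5 cm threshold are assumptions chosen for the example.

```r
# Base R only; iris is part of the datasets package that ships with R.
data(iris)

# Topic 2: a table column is a vector, and its class is the variable type.
sepal_length <- iris$Sepal.Length
class(sepal_length)   # "numeric"
class(iris$Species)   # "factor"

# Topic 2: subsetting with [rows, columns].
# Rows: setosa flowers with sepals longer than 5.5 cm (arbitrary threshold).
# Columns: selected by name.
setosa_long <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5,
                    c("Sepal.Length", "Species")]

# Topic 9: vectorization. The multiplication applies to every element
# of the column at once, so no explicit loop is needed (cm -> mm).
sepal_length_mm <- iris$Sepal.Length * 10

# Topic 9: an apply-family function. sapply() calls mean() on each of
# the four numeric columns and returns a named numeric vector.
column_means <- sapply(iris[, 1:4], mean)
print(column_means)
```

The same steps are written later in the course with dplyr (filter, select, mutate, summarize); the base R version is shown here because it needs no extra packages.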

Annotation

The humanities have seen an irreversible paradigm shift towards Digital Humanities, based on automatic quantitative analysis of (big) data.

We will teach you:

- to clean and structure data into neat tables;

- to discover trends, recurring patterns, and outliers;

- the basics of modern data visualization.

We use the open-source programming language R together with the RStudio IDE and the tidyverse, a widely used collection of R packages for data science.