
Basic R

Class at Faculty of Arts | APS300426E

Syllabus

Introduction
a. R framework and available software for using it; installing packages and solving problems (different OS, missing libraries, R versions, Stack Overflow)
b. data types, base R functions
c. saving and loading data, RData files, work environment
d. R documentation and CRAN, creating a project and its structure
e. DataCamp courses

Introduction II
a. R syntax, loops, conditions, apply-family functions, writing functions, "OOP" in R
b. Git, installing packages from GitHub
c. best practices, defensive programming

Tidyverse packages – ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats

Statistics in R – correlation, regression, t-test, ANOVA, chi-square, probability and distributions

Data visualization in R (ggplot2, plotly, lattice, gganimate)

Visualization best practices

R Markdown – slides, HTML pages, PDF files, DOCX documents, LaTeX, BibTeX and CSS basics

R Shiny

Psychometrics in R – lavaan, psych, psychometrics, mirt, mirtCAT

Missing data, types of missing data, consequences for parametric statistics, imputation methods, multiple imputation

Text analysis and text mining – quanteda, word2vec; basic steps (); text statistics and summaries, readability indices, word frequency, similarity

Basics of unsupervised and supervised machine learning

Preparation for the exam: selecting suitable methods of analysis and visualization for different types of data, communicating your findings

Annotation

This course introduces students to data science methods as applied in the R language. It builds on the knowledge of statistical methods that students acquired during their bachelor's studies or through self-study.

Data science combines several fields, including mathematics, statistics, computer science, information science, machine learning and artificial intelligence. An article in the Harvard Business Review calls data scientist "The Sexiest Job of the 21st Century" (Davenport & Patil, 2012). The most commonly used tools in this area are Python, SQL and R.

R is a programming language and environment designed for the statistical analysis of data and its graphical display. It is an implementation of the S programming language under a free license. Because it is free, R has already outpaced commercial software such as SPSS in number of users. At the same time, it offers a number of features beyond those of other free software, such as JASP or jamovi. The functionality of the R environment can be extended using libraries called packages, of which more than 15,000 are available in the CRAN repository. R is thus very versatile and can be used for many different tasks.

Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.

Rodriguez Salgado, J. J. (2021, December 9). What does a data scientist do? Breaking down the responsibilities of data scientists. DataCamp Community. Retrieved December 19, 2021, from https://www.datacamp.com/community/blog/what-does-a-data-scientist-do