The course will focus on practical aspects of data processing and preparation in Digital Humanities and is aimed at students of the humanities in general, although most of the techniques and example material will reflect the usual needs of linguists or historians. Prospective students should have basic IT skills, but no previous experience with the methods or applications used in the course is required.
Classes will typically start with a short introduction to a data source (usually online and publicly available) and the method of mining it, and continue with the tools and techniques that allow users to exploit the mined data for further analysis. A class may end with examples of such analyses, but those are not the focus of the course. Students looking for data analysis courses may consider, e.g., Statistics (not only with R) for corpus and quantitative linguistics (AMLV00046), English Diachronic Corpora (AAA500147), etc.
The tools students will train with may include (but are not limited to): text editors with advanced regular-expression capabilities (e.g. jEdit, EditPad), XML editors and processors (jEdit or oXygen), spreadsheet processors (Microsoft Excel), relational databases (MS Excel PowerPivot, MySQL), programming languages for text processing (Perl, Python, R), etc. The choice of the actual tools and techniques depends on the class composition and on student interests and needs (feel free to bring in your own projects).