This course is intended for undergraduate and graduate students who have little or no experience in computational text processing and want to learn the basics of quantitatively processing large amounts of textual data.
The course will be taught in a hybrid format, alternating between online and offline classes. It focuses in particular on text processing for the Ukrainian language, but it also introduces the main ideas of natural language processing, which can be applied to other Slavic and non-Slavic languages.
The course is going to tackle the following topics:
-Installing Python and setting up the programming environment
-Basics of programming in Python: loops, functions, libraries, syntax
-Creating your own corpus: web scraping, tools for corpus creation, Twitter API
-Text preprocessing: cleaning, named entity recognition, tokenisation and lemmatisation
-Storing and managing big data: working with pandas
-Basics of quantitative text analysis: word frequency, building n-grams, topic modelling
-Visualizing quantitative data: matplotlib and seaborn
-Vectorization: word2vec, fastText, GloVe
-Introduction to machine learning: basic principles and ideas
-Using machine learning methods for text classification: fine-tuning existing models
-Application of existing annotated corpora and developing your own annotation guidelines
-Introduction to transformers: sentiment analysis with transformer models
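
To give a sense of the quantitative text analysis topics listed above, here is a minimal sketch of word-frequency counting and n-gram building in Python, using only the standard library. The sample sentence and the helper names `tokenize` and `ngrams` are made up for illustration and are not taken from the course materials. The regex tokenizer is a deliberate simplification; real Ukrainian text (for example, words containing an apostrophe) needs a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    # \w matches Cyrillic letters as well as Latin ones in Python 3.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams over a token list as a list of tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, invented for this illustration.
sample = "Я читаю книгу. Я читаю газету."

tokens = tokenize(sample)
word_freq = Counter(tokens)

# Note: this simple approach ignores sentence boundaries,
# so some bigrams span two sentences.
bigram_freq = Counter(ngrams(tokens, 2))

print(word_freq.most_common(2))
print(bigram_freq.most_common(1))
```

The same counts could later be loaded into a pandas DataFrame or plotted with matplotlib, both of which appear in the topic list.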