TAPI is led by JSTOR Labs and developed alongside Constellate, a text analytics platform.

Courses

Week 1 (6/7-6/11)


All times are Eastern Time (EDT)

Time | Monday (6/7) | Wednesday (6/9) | Friday (6/11)
9-10:30am | Python Basics | Python Basics | Python Basics
10:30am-12pm | Intro to R | Intro to R | Intro to R

Python Basics

Instructor: Nathan Kelber

If you've never programmed before, this course is a great introduction. Taught from a humanist perspective, this course will help you start writing your first code and unlock the potential of text analysis.

Introduction to R Programming

Instructor: Jacalyn Huband

This course is a gentle introduction to R programming. With an emphasis on text analysis, this course will help you begin your adventures in programming.




Week 2 (6/14-6/18)


All times are Eastern Time (EDT)

Time | Monday (6/14) | Wednesday (6/16) | Friday (6/18)
9-10:30am | Images to Text: Intro to OCR | Images to Text: Intro to OCR | Images to Text: Intro to OCR
10:30am-12pm | How to Do Things with Topic Models | How to Do Things with Topic Models | How to Do Things with Topic Models
1:00-2:30pm | Data Analysis with Pandas | Data Analysis with Pandas | Data Analysis with Pandas

Images to Text: A Gentle Introduction to Optical Character Recognition with PyTesseract

Instructor: Hannah Jacobs

This course will introduce the concept of “Optical Character Recognition” (OCR), various tools available for performing OCR, and important considerations for successfully OCRing digitized text. Using Tesseract in Python, we’ll walk through the entire process using a variety of examples to show the range of challenges scholars can face when performing OCR. By the end of the course, participants should be able to use the course’s Jupyter Notebooks to perform OCR on their own; they should be able to identify possible technical challenges presented by specific texts and propose potential solutions; and they should be able to assess the degree of accuracy they have achieved in performing OCR.
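
One common way to quantify OCR accuracy (our choice for illustration; the course may use other measures) is the character error rate (CER): the edit distance between the OCR output and a hand-corrected transcription, divided by the transcription's length. The minimal sketch below uses only the Python standard library; the sample strings are invented, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the edit distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """CER = edit distance / length of the reference (ground-truth) transcription."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Invented example: long s misread as "f" and "m" misread as "rn", two classic OCR errors
truth = "the best of times"
ocr = "the beft of tirnes"
print(f"CER: {character_error_rate(ocr, truth):.3f}")  # prints "CER: 0.176" (3 edits / 17 chars)
```

A CER of 0 means a perfect transcription; comparing CER before and after an image-preprocessing step is one way to tell whether that step actually helped.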

How to Do Things with Topic Models

Instructor: Rafael Alvarado

This workshop will introduce students to the concept of topic models and how they have been used to advance humanistic research. Topics to be covered include topic models as a general task in text analytics, creating topic models from scratch using Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), visualizing their results, evaluating their performance, and interpreting their results. In addition, students will be exposed to examples of how topic models have been used in humanistic and social science research. Work will be conducted using Python 3 and Jupyter Notebooks.
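
As a rough illustration of NMF topic modeling, the sketch below uses scikit-learn (an assumption; the workshop may use other libraries) on a made-up four-document corpus. NMF factors the TF-IDF document-term matrix X into a document-topic matrix W and a topic-term matrix H, so that X ≈ WH with all entries non-negative.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented toy corpus with two rough themes: seafaring and farming
docs = [
    "the ship sailed across the stormy sea",
    "sailors feared the sea and the storm",
    "the farmer planted wheat in the field",
    "harvest time in the wheat field for the farmer",
]

# Document-term matrix of TF-IDF weights, dropping common English stopwords
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Factor X (docs x terms) into W (docs x topics) and H (topics x terms)
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

# Recover column-index -> word order from vocabulary_ (which maps word -> column index)
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
for k, weights in enumerate(H):
    top = [terms[i] for i in weights.argsort()[::-1][:3]]
    print(f"Topic {k}: {', '.join(top)}")

# Dominant topic for each document
print(W.argmax(axis=1))
```

For LDA, scikit-learn also provides `LatentDirichletAllocation`, which is usually fit on raw term counts (`CountVectorizer`) rather than TF-IDF weights.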

Data Analysis with Pandas

Instructor: Melanie Walsh

This workshop will introduce students to a popular Python package known as Pandas, a tool for data analysis and manipulation that is widely used among data scientists. Participants will learn how to work with CSV files and JSON files, how to filter and aggregate data, how to make bar charts and time series plots, how to merge datasets with common values, and more. All case studies and examples will feature data relevant to the humanities; possible datasets include library circulation data, screenplay data, and social media data.
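
A minimal sketch of the filter, aggregate, and merge steps described above, using invented library circulation figures (the column names and values are hypothetical). `io.StringIO` stands in for a CSV file on disk so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical circulation records; io.StringIO stands in for a CSV file on disk
circulation_csv = io.StringIO(
    """title,year,checkouts
Beloved,2019,12
Beloved,2020,30
Frankenstein,2019,8
Frankenstein,2020,5
Middlemarch,2020,2
"""
)
checkouts = pd.read_csv(circulation_csv)

# A second (hypothetical) table sharing the "title" column
authors = pd.DataFrame({
    "title": ["Beloved", "Frankenstein", "Middlemarch"],
    "author": ["Toni Morrison", "Mary Shelley", "George Eliot"],
})

# Filter: keep only the rows from 2020
recent = checkouts[checkouts["year"] == 2020]

# Aggregate: total checkouts per title across all years
totals = checkouts.groupby("title", as_index=False)["checkouts"].sum()

# Merge: attach each title's author using the shared "title" column
merged = totals.merge(authors, on="title", how="left")
print(merged.sort_values("checkouts", ascending=False))
```

`how="left"` keeps every title from `totals` even if it has no matching author row; such rows would get `NaN` in the `author` column.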




Week 3 (6/21-6/25)


All times are Eastern Time (EDT)

Time | Monday (6/21) | Wednesday (6/23) | Friday (6/25)
9-10:30am | Machine Learning | Machine Learning | Machine Learning
10:30am-12pm | Visualizing Humanities Data | Visualizing Humanities Data | Visualizing Humanities Data

Machine Learning

Instructor: William Mattingly

This workshop will introduce students to machine learning (ML), from its early beginnings to its modern applications; students will also be introduced to deep learning, a branch of ML. We will focus specifically on how ML can be used to solve text-based problems. Day 1 will cover the basics of ML: the key concepts and terms that practitioners must know. Day 2 will be dedicated to a common ML problem, text classification, and Day 3 to an adjacent problem, topic modeling. On each of these two days, students will be given a workflow for solving the problem. Students will leave this workshop with a firm conceptual understanding of ML and a basic understanding of how to apply ML in Python. Finally, students will be provided with resources for further learning.
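
A minimal text-classification sketch, assuming scikit-learn; the TF-IDF plus multinomial Naive Bayes pipeline is our illustrative choice, not necessarily the workflow taught on Day 2, and the labeled passages are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training examples: short passages labeled by genre
train_texts = [
    "the detective examined the body and the murder weapon",
    "a clue led the inspector to the killer",
    "the knight rode to the castle to slay the dragon",
    "a wizard cast a spell over the enchanted forest",
]
train_labels = ["mystery", "mystery", "fantasy", "fantasy"]

# Pipeline: turn raw text into TF-IDF features, then fit a multinomial Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify passages the model has not seen; words absent from training (e.g. "found") are ignored
new_texts = ["the inspector found the murder weapon", "the dragon guarded the castle"]
print(model.predict(new_texts))
```

With only four training passages this is a toy; a real workflow would hold out a test set and report accuracy or a similar metric.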

Visualizing Humanities Data

Instructor: Zoe LeBlanc

This course will introduce participants to some of the foundations and horizons of visualizing humanities data. To help us generate datasets, we will lightly explore some text analysis methods, and then focus on some of the possibilities and pitfalls of visualizing data derived from these methods. In particular, this course will introduce participants to the principles of the grammar of graphics and exploratory data analysis using the Python library Altair and Jupyter Notebooks. The goal of this course is to help participants learn how to incorporate visualizing humanities data into their research workflows, both for sharing aggregated information and for making arguments.




Week 4 (6/28-7/2)


All times are Eastern Time (EDT)

Time | Monday (6/28) | Wednesday (6/30) | Friday (7/2)
9-10:30am | Text Analysis in Ancient/Medieval Languages | Text Analysis in Ancient/Medieval Languages | Text Analysis in Ancient/Medieval Languages
10:30am-12pm | Machine Learning | Machine Learning | Machine Learning
1:00-2:30pm | Named Entity Recognition | Named Entity Recognition | Named Entity Recognition

Text Analysis in Ancient/Medieval Languages (A Case Study with Latin)

Instructor: William Mattingly

This workshop will introduce students to natural language processing (NLP) and text analysis in ancient and medieval languages, using Latin as a case study. Day 1 will focus on the basics of NLP and spaCy, one of the leading NLP libraries for Python. Day 2 will address the textual problems of working with ancient and medieval languages, including how to handle highly inflected languages, how to lemmatize without a lemmatizer, and how to account for textual, geographical, and temporal variation in the language. Day 3 will address a single text analysis problem: named entity recognition (NER) in Latin. On this final day, we will develop a workflow for solving this problem. Students will leave this workshop with a strong understanding of NLP and NER, as well as an understanding of how to solve text analysis problems in highly inflected or dead languages. Students will be provided with resources for further learning. Finally, students will leave the workshop with a working NER model that they can use and improve in the future.
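
One simple way to "lemmatize without a lemmatizer" is a hand-built lookup table from inflected forms to lemmas, combined with spelling normalization (here, collapsing the common Latin orthographic variants j/i and v/u). This is an illustrative sketch, not the workshop's method; the table, its entries, and the function names are hypothetical, and a real project would need a far larger table.

```python
import re

# Hypothetical hand-built table mapping inflected Latin forms to lemmas.
# Keys are stored in normalized spelling (see normalize below).
LEMMA_TABLE = {
    "rosa": "rosa", "rosae": "rosa", "rosam": "rosa", "rosarum": "rosa",
    "amo": "amo", "amat": "amo", "amauit": "amo", "amant": "amo",
    "puella": "puella", "puellae": "puella", "puellam": "puella",
    "iulius": "iulius", "iulium": "iulius",
}


def normalize(token: str) -> str:
    """Lowercase and collapse common orthographic variants (j -> i, v -> u)."""
    return token.lower().replace("j", "i").replace("v", "u")


def lemmatize(text: str) -> list[str]:
    """Look each token up in the table; fall back to its normalized form if unknown."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [LEMMA_TABLE.get(normalize(t), normalize(t)) for t in tokens]


print(lemmatize("Puella rosam amat. Julium amavit."))
# ['puella', 'rosa', 'amo', 'iulius', 'amo']
```

A plain lookup cannot disambiguate forms that belong to more than one lemma; handling those requires context, which is where trained models come in.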

Machine Learning

Instructor: Grant Glass

This course will introduce you to a range of Machine Learning techniques for analyzing textual data in Python. You will be introduced to the theory and methods of Machine Learning and gain practical skills in writing and executing Machine Learning code in Python. Some basic experience with Python is required to participate in the class coding projects, but feel free to join us if you want a better understanding of what Machine Learning techniques can do for humanists. Generally speaking, this class will help you think about humanities problems through the lens of Machine Learning.

Named Entity Recognition

Instructor: Zoe LeBlanc

This course will introduce participants to one of the core areas of natural language processing: named entity recognition. While annotating datasets according to set standards is one of the oldest areas of DH research (particularly with the Text Encoding Initiative), this course will focus on some of the newer approaches for identifying and annotating objects of interest in any given text. The course will use the Python library spaCy, covering both its built-in functionality and how to extend it for more specific uses. While this course is taught in English, participants are encouraged to bring sources in multiple languages. Ultimately, participants will learn both how to leverage NER in their research and how to tailor NER to their specific textual sources.
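
One way spaCy can be extended for specific sources is its rule-based `EntityRuler` component. The sketch below assumes spaCy v3 and uses a blank English pipeline, so no trained model needs downloading; the patterns and the custom `PERIODICAL` label are hypothetical examples, not part of the course materials.

```python
import spacy

# Blank English pipeline: tokenizer only, no statistical components
nlp = spacy.blank("en")

# Add spaCy's rule-based EntityRuler and give it domain-specific patterns
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase patterns: match this exact token sequence
    {"label": "PERSON", "pattern": "Frederick Douglass"},
    {"label": "GPE", "pattern": "Rochester"},
    # Token pattern with a hypothetical custom label; LOWER makes it case-insensitive
    {"label": "PERIODICAL", "pattern": [{"LOWER": "the"}, {"LOWER": "north"}, {"LOWER": "star"}]},
])

doc = nlp("Frederick Douglass published The North Star in Rochester.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

In practice, an `EntityRuler` is often added alongside a trained pipeline's statistical NER component, so hand-written rules can catch domain terms the model misses.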
