Understanding Shakespeare is a collaborative project between JSTOR Labs and the Folger Shakespeare Library. It’s a research tool that allows students, educators and scholars to use the text of Shakespeare’s plays to navigate quickly, line by line, into the scholarship written about them. Users simply click next to any line of text in a play, and relevant articles from the JSTOR archive load immediately.

The links from each line of a play to articles on JSTOR were created by applying a suite of textual analysis tools to the JSTOR archive. Learn more about Understanding Shakespeare in the FAQs below, or try it for yourself.

Understanding Shakespeare FAQs

Who can use Understanding Shakespeare?

Understanding Shakespeare is a free resource that is open to the public. The Folger Digital Texts are free for all non-commercial uses. Each article title links to the full text on the main JSTOR website, which is available to people at participating institutions or with individual access to JSTOR. Many articles in Understanding Shakespeare are also available for online reading with a free Register and Read account.

The site is expected to be especially useful to upper-level undergraduate and graduate students. It may also be useful to educators teaching these plays, scholars writing about them, and even actors preparing their line-readings.

How did you create the links between the play and the scholarly articles?

The links between the Folger play texts and the articles on JSTOR were created with a two-step process. First, for each play, a candidate set of articles was selected by running a full-text search on JSTOR for the play’s title and “shakespeare*.” Second, we ran a fuzzy text-matching algorithm to compare the play’s text against any passages in those articles that appear either in block quotes or between quotation marks.

Is it possible that this methodology might miss some quoted passages?

Yes. We limited the selection to content for which we had page scans, which means that neither books nor current issues of journals were included. It is also possible (but, we believe, not especially likely) that an article quotes a play without mentioning both the play’s title and Shakespeare’s name. The analysis was performed on a snapshot of our content, so content added after the latest snapshot is not automatically included. Lastly, we used fuzzy text matching to overcome some problems caused either by errors in JSTOR’s OCR-created full text or by alternate spellings of the play’s text. Even so, it is likely that this methodology missed some matches. For example, if a scholarly article quotes only the original early modern spellings rather than the modern ones, or if it quotes a passage not included in the Folger digital text, the quote would be missed.

I want to geek out. Can you go into more detail about how you created the links?

You asked for it. The play-to-JSTOR article links in the Labs Shakespeare prototype are generated by matching quoted text in articles on JSTOR to text in the Folger digital text edition of the target play. Candidate articles were selected from the JSTOR corpus using full-text queries on the JSTOR search index. The queries included keywords designed to find articles with references to specific plays, plus additional search filters restricting the result set to digitized-text content only (journal articles and pamphlets). These filters explicitly remove books and current journal issues from consideration. The search keywords were deliberately broad so as to identify as many candidate articles as possible from the archive corpus. The full-text queries used for some of the plays are listed below:

Macbeth: shakespeare* AND macbeth

Romeo and Juliet: shakespeare* AND romeo AND juliet

Hamlet: shakespeare* AND hamlet

Julius Caesar: shakespeare* AND julius AND caesar

Twelfth Night: shakespeare* AND (twelfth OR 12th) AND night

Henry V: shakespeare* AND henry AND v

The query is based on the premise that an article quoting a Shakespeare play will include the word “Shakespeare” (including the possessive form) and the primary word(s) from the play title. Any articles containing Shakespeare quotes that do not satisfy this condition would not be included in the text analysis.
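To make the selection logic concrete, here is a minimal Python sketch of what a query like `shakespeare* AND macbeth` keeps when applied to a single article’s text. It is an approximation under stated assumptions, not the JSTOR search API: the hypothetical `matches_query` helper, its simple tokenizer, and the encoding of title terms as AND-ed groups of OR-ed alternatives are all illustrative choices, and a real search index tokenizes text differently.

```python
import re


def matches_query(text: str, title_groups: list[list[str]]) -> bool:
    """Approximate `shakespeare* AND <title terms>` for one article (hypothetical helper).

    title_groups holds AND-ed groups of OR-ed alternatives, so the Twelfth Night
    query becomes [["twelfth", "12th"], ["night"]].
    """
    # Assumed tokenizer: lowercase runs of letters and digits.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # shakespeare* is a prefix wildcard: Shakespeare, Shakespeare's, Shakespearean...
    if not any(word.startswith("shakespeare") for word in words):
        return False
    # Every group must contribute at least one of its alternatives.
    return all(any(term in words for term in group) for group in title_groups)


print(matches_query("Shakespeare's Macbeth begins with three witches.", [["macbeth"]]))  # True
print(matches_query("Verdi's Macbeth premiered in 1847.", [["macbeth"]]))               # False
```

The second call shows the premise at work: an article about Verdi’s opera that never names Shakespeare is excluded, along with any article that quotes the play without naming its author.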

Using this approach, the number of articles analyzed for each of those plays was:

Macbeth: 14,581
Romeo and Juliet: 9,385
Hamlet: 27,974
Julius Caesar: 15,610
Twelfth Night: 10,425
Henry V: 40,805

Once we had a set of candidate articles for each play, we compared the text of the play to the text of the articles. Both block and inline quotes were considered in the matching process. Block quotes were identified by using OCR word and line coordinates to find text passages indented from the surrounding text. Inline quotes were identified as text bounded by quotation (“) characters.

The quotes identified in the JSTOR articles and the Folger play text were normalized by removing all punctuation (including line breaks) and rejoining any words split by end-of-line hyphenation. All text was then converted to lowercase.

Once the text was normalized, candidate matches were found using a fuzzy text-matching process based on the Levenshtein distance. The Levenshtein distance between two texts is the minimum number of single-character operations (inserting a character, deleting a character, or substituting one character for another) required to transform one text into the other. Using this approach, for each quote we find the substring of the play text with the smallest Levenshtein distance to it. For that best match, a similarity score is computed from the ratio of the Levenshtein distance to the length of the quoted text.
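The normalization and matching steps above can be sketched in Python as follows. This is a minimal illustration, not the prototype’s code, and several details are assumptions: the helper names are hypothetical, punctuation is simply deleted, only inline quotes are extracted (block-quote detection from OCR coordinates is not shown), and the similarity score is taken to be `1 - distance / len(quote)` so that higher means closer, which fits the minimum threshold of 0.8 described in the next paragraph. The matcher is the standard approximate-substring variant of the Levenshtein dynamic program, in which a match may start and end anywhere in the play text.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, rejoin end-of-line hyphenation, drop punctuation, collapse whitespace."""
    text = re.sub(r"-\s*\n\s*", "", text)  # "tomor-\nrow" -> "tomorrow"
    text = re.sub(r"[^\w\s]", "", text)    # assumption: punctuation is deleted, not spaced
    text = re.sub(r"\s+", " ", text)       # line breaks and repeated spaces -> one space
    return text.lower().strip()


def inline_quotes(article: str) -> list[str]:
    """Return passages bounded by curly quotation marks."""
    return re.findall(r"“([^”]+)”", article)


def best_substring_match(quote: str, play: str) -> tuple[int, int, int]:
    """Find the play substring with the smallest Levenshtein distance to quote.

    Returns (distance, start, end) with play[start:end] as the best match.
    Row 0 of the DP table is all zeros, so a match may begin anywhere in play.
    """
    n = len(play)
    prev = [0] * (n + 1)             # distances for the previous quote prefix
    prev_start = list(range(n + 1))  # where each best substring begins
    for i in range(1, len(quote) + 1):
        cur = [i] + [0] * n
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (quote[i - 1] != play[j - 1])
            delete = prev[j] + 1     # quote character absent from the play
            insert = cur[j - 1] + 1  # extra character in the play
            cur[j] = min(substitute, delete, insert)
            if cur[j] == substitute:
                cur_start[j] = prev_start[j - 1]
            elif cur[j] == delete:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev, prev_start = cur, cur_start
    end = min(range(n + 1), key=lambda j: prev[j])  # the match may end anywhere too
    return prev[end], prev_start[end], end


def similarity(distance: int, quote: str) -> float:
    """Assumed score: 1 - distance / quote length (1.0 means an exact match)."""
    return 1 - distance / len(quote) if quote else 0.0


play = normalize("When shall we three meet again\nIn thunder, lightning, or in rain?")
quote = normalize(inline_quotes("The witches’ “thunder, lightnmg, or in rain” recurs.")[0])
distance, start, end = best_substring_match(quote, play)
print(play[start:end], distance, round(similarity(distance, quote), 3))
# thunder lightning or in rain 2 0.926
```

Here the OCR-style error “lightnmg” costs two edits, so the 27-character quote still scores about 0.93 against the correct line. The dynamic program takes time proportional to the quote length times the play length for every quote, so a run over tens of thousands of quotes per play would likely want a faster implementation or an index to narrow the search; the sketch favours readability.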

In the case of Macbeth, a total of 14,581 articles were analyzed, and 88,987 separate quotes from these articles were matched against the play text. After filtering these candidate matches, we ended up with 6,071 matches that met our thresholds, found in 1,155 articles. The filtering thresholds are a similarity score (calculated from the Levenshtein distance and the match length) and a minimum match size. This approach works well for longer quotes but tends to let through a good number of false hits on shorter passages (generally of 15-20 characters or fewer) when the quote consists of words or phrases in common use today. In this version of the prototype we’re using fairly conservative values of 0.8 for the similarity score and 15 for the minimum match length, which minimizes the number of false hits but has the unfortunate consequence of filtering out some good matches too. A future refinement of the filtering process will likely include a measure of how common a phrase is in modern usage. This would let us keep an 11-character quote like “hurly burly” while rejecting something like “is not true”.
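A minimal Python sketch of the two-threshold filter, using the values quoted above (0.8 and 15). The function name is hypothetical, and two details are assumptions: length is measured on the normalized quote, and the similarity score is taken to be `1 - distance / length`.

```python
MIN_SIMILARITY = 0.8  # similarity threshold quoted above
MIN_LENGTH = 15       # minimum match length in characters, quoted above


def keep_match(quote: str, distance: int) -> bool:
    """Hypothetical filter: drop short quotes, then require enough similarity."""
    if len(quote) < MIN_LENGTH:
        return False  # short, common phrases produce too many false hits
    return 1 - distance / len(quote) >= MIN_SIMILARITY


print(keep_match("hurly burly", 0))                    # False: only 11 characters
print(keep_match("fair is foul and foul is fair", 5))  # True: 1 - 5/29 ≈ 0.83
print(keep_match("fair is foul and foul is fair", 6))  # False: 1 - 6/29 ≈ 0.79
```

The first call shows the trade-off described above: an exact 11-character quote like “hurly burly” is discarded on length alone, which is the kind of good match a phrase-commonness measure would be meant to rescue.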

Can I give feedback on Understanding Shakespeare?

We’d love to hear from you! Please contact us at labs@ithaka.org.
