Learn and perform text analysis, build datasets, and share analytics course materials. Open the black box of text analysis with Constellate, from JSTOR and Portico.


Text analytics, the process of deriving new information from pattern and trend analysis of the written word, has the potential to revolutionize research across disciplines. But a major hurdle faces those eager to use it: the coding skills and statistical knowledge that text mining requires can take years to develop. All too often, researchers learn about the promise of text mining only to discover that it can be realized solely by the select few with the necessary technical skills. Ted Underwood, Professor of English at the University of Illinois, likens this scenario to researchers being presented with a “deceptively gentle welcome mat, followed by a trapdoor.”


ITHAKA has addressed this problem by building Constellate, a text analytics platform aimed at teaching and enabling a generation of researchers to text mine. Two of ITHAKA’s services, JSTOR and Portico, are the initial sources of content for the platform, which now also includes Chronicling America, collections from Documenting the American South, the South Asia Open Archives, and Independent Voices from Reveal Digital.

Constellate provides value to users in three core areas. They can teach and learn text analytics, build datasets from across multiple content sources, and visualize and analyze their datasets:

Learn & Teach

  • Template and Tutorial Code: Work with template Jupyter Notebooks to analyze your dataset and learn about text analytics (additional environments, such as RStudio, are forthcoming).
  • Lessons and Documentation: Lessons and educational materials created by a community of experts, including those from the NEH-funded Text Analysis Pedagogy Institutes.
  • Collaborative Teaching Materials Creation: Users may create, edit, reuse, and collaborate on tutorials, code, documentation, and other educational resources for text analysis (our tutorial notebooks are all available on GitHub, in addition to being accessible in our Analytics Lab).


Build Datasets

  • Multiple Collections: Anchor collections from JSTOR and Portico, with additional content sources continually added (such as the Library of Congress’s Chronicling America). Further details about the collections are available.
  • Data Download in JSON
    • All content - bibliographic metadata, unigrams, bigrams, trigrams
    • Open content - bibliographic metadata, full-text, unigrams, bigrams, trigrams
  • Dataset Dashboard: Easily view datasets you have built or accessed.
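
To make the download format above more concrete, here is a minimal sketch of how per-document n-gram counts in a JSON download might be combined into corpus-wide totals. It is illustrative only: the one-object-per-line layout, the field names ("id", "title", "unigramCount"), and the sample records are assumptions made for this example, not a description of the actual Constellate schema. The small ngrams helper shows what unigrams, bigrams, and trigrams are.

```python
import json
from collections import Counter

# Hypothetical sample data: one JSON object per line, one object per document.
# The field names and values here are assumptions made for illustration.
SAMPLE_JSONL = """\
{"id": "doc-1", "title": "First sample", "unigramCount": {"text": 3, "mining": 2}}
{"id": "doc-2", "title": "Second sample", "unigramCount": {"text": 1, "analysis": 4}}
"""


def total_ngram_counts(lines, field="unigramCount"):
    """Sum the per-document n-gram counts stored under `field`."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Counter.update adds the counts from a mapping to the running totals.
        totals.update(doc.get(field, {}))
    return totals


def ngrams(tokens, n):
    """Return the n-grams (as space-joined strings) of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    totals = total_ngram_counts(SAMPLE_JSONL.splitlines())
    print(totals.most_common(3))
    print(ngrams("open the black box".split(), 2))
```

Because only counts are needed, a sketch like this works on metadata-and-n-gram downloads as well as on open content that includes full text.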


Visualize & Analyze

  • Analytics Lab: Integrated computational environment, powered by BinderHub, that will allow users to seamlessly analyze text content using the provided template Jupyter Notebooks and tutorials.
  • Visualize: Built-in visualizations for your datasets.
  • Work with Rights-Restricted Full Text: We are investigating the best way to meet this need. Please contact us if you need rights-restricted full text or just want to talk about your research.

Interested in Participating?

Reach out to us to participate in our beta program and get access to larger datasets and text analytics classes.