Overview
The Summary displays seven categories of information, formatted as a bulleted list.
The first bullet provides an overview of the corpus, including the number of documents in the workset, the number of words in the workset, and the number of unique words in the workset.
The second point lists the longest and the shortest documents in the corpus, measured by number of words. The word count follows each title in brackets. A small thumbnail graphic just to the right of the point’s keyword also shows the distribution of document length across the corpus.
The third point lists the documents with the highest and the lowest vocabulary densities. The vocabulary density follows each title in brackets. A small thumbnail graphic just to the right of the point’s keyword also shows the distribution of vocabulary density across the corpus.
The fifth point provides an estimate of the readability of each document.
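The figures described in the points above can be approximated in a few lines of plain Python. The following is a minimal sketch, not the widget's actual code: the `corpus` dictionary, the `tokenize`, `coleman_liau`, and `summarize` names, and the simple regex tokenizer are all hypothetical, and no HTRC API calls are shown. It assumes vocabulary density means unique words divided by total words, and it uses the Coleman-Liau index as a stand-in readability measure, since the text above does not say which one the Summary uses.

```python
import re

# Hypothetical stand-in for a workset: document title -> full text.
corpus = {
    "Doc A": "The cat sat on the mat. The cat slept.",
    "Doc B": "A quick brown fox jumps over the lazy dog.",
    "Doc C": "To be or not to be.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def coleman_liau(text):
    """Coleman-Liau index (an assumed readability measure, for illustration)."""
    words = tokenize(text)
    if not words:
        return 0.0
    letters = sum(ch.isalpha() for word in words for ch in word)
    # Count runs of sentence-ending punctuation; assume at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def summarize(corpus):
    """Compute corpus totals plus per-document length, density, readability."""
    tokens = {title: tokenize(text) for title, text in corpus.items()}
    all_tokens = [tok for toks in tokens.values() for tok in toks]

    lengths = {title: len(toks) for title, toks in tokens.items()}
    # Assumed definition: vocabulary density = unique words / total words.
    density = {
        title: (len(set(toks)) / len(toks) if toks else 0.0)
        for title, toks in tokens.items()
    }

    def ranked(values):
        # Highest value first, so the top and bottom of the list are easy to read.
        return sorted(values.items(), key=lambda item: item[1], reverse=True)

    return {
        "documents": len(corpus),
        "words": len(all_tokens),
        "unique_words": len(set(all_tokens)),
        "by_length": ranked(lengths),
        "by_density": ranked(density),
        "readability": {t: coleman_liau(text) for t, text in corpus.items()},
    }


summary = summarize(corpus)
print(summary["documents"], summary["words"], summary["unique_words"])
for title, n_words in summary["by_length"]:
    print(f"{title} ({n_words})")
```

The first entries of `by_length` and `by_density` correspond to the "top" documents in each point and the last entries to the shortest or least dense ones. A real workset would need a more careful tokenizer than this regex.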
Exploring Textual Data with the Torchlite Summary Widget
In our hackathon, we're excited to introduce participants to the "Torchlite Summary" widget, a tool designed to support textual data analysis.
Developed using Python in a Jupyter Notebook environment and leveraging the HathiTrust Research Center (HTRC) APIs, this widget offers a streamlined approach to summarizing and exploring large datasets. Whether you're analyzing literary works, historical documents, or any extensive text corpus, "Torchlite Summary" provides key insights through word frequencies, trends, and thematic overviews.
We encourage all participants to utilize this tool to uncover hidden patterns, compare textual features, and drive their projects to new depths of analysis. A step-by-step guide within the notebook will help you get started, offering insights into API integration, data visualization, and customization options to tailor the tool to your specific research questions.
Code Base of Torchlite Summary