HTRC Extracted Features
Public domain, downloadable
Structured data consisting of human-supplied (catalog) metadata and algorithmically-derived features
From 17.1 million volumes (i.e., not fully in sync with the current HathiTrust Digital Library, HTDL)
Linked-data compliant (JSON-LD)
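Because EF files are JSON-LD, which is ordinary JSON plus reserved keys beginning with `@` (such as `@context`), any JSON parser can read them. A minimal Python sketch that separates those linked-data keywords from the data fields; the helper `split_jsonld` is hypothetical, and the sample record is trimmed, with placeholder URL and ID values that are not real EF output.

```python
import json


def split_jsonld(record_text):
    """Parse a JSON-LD document and split its top-level keys in two.

    Keys starting with '@' (e.g. '@context', '@id') are JSON-LD keywords;
    all other keys are ordinary data fields.
    """
    record = json.loads(record_text)
    keywords = {k: v for k, v in record.items() if k.startswith("@")}
    fields = {k: v for k, v in record.items() if not k.startswith("@")}
    return keywords, fields


# Hypothetical, trimmed record: placeholder URL and ID, not real EF output.
sample = """{
  "@context": "https://example.org/placeholder-context.jsonld",
  "@id": "https://example.org/volume/placeholder",
  "htid": "placeholder.0001"
}"""

keywords, fields = split_jsonld(sample)
print(sorted(keywords))  # ['@context', '@id']
print(fields)            # {'htid': 'placeholder.0001'}
```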
HTRC Extracted Features (EF)
The features are:
Volume- and page-level
Selected data and metadata
Extracted from raw text
Ready for the researcher to begin analysis
Some standard natural-language and statistical preprocessing (e.g., tokenization, part-of-speech tagging) is already done
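Because tokenization and part-of-speech tagging are already done page by page, rolling page-level features up to a volume-level bag of words is a simple sum. A minimal Python sketch, assuming each page is a dict with a `body` section whose `tokenPosCount` maps each token to a {POS tag: count} dict (field names modeled on EF 2.0 files; verify them against real data). The function name and the three-page sample are hypothetical.

```python
from collections import Counter


def volume_token_counts(pages, section="body", ignore_case=True):
    """Sum per-page token counts into one volume-level Counter.

    Assumed page shape:
        {"seq": "00000001", "body": {"tokenPosCount": {"whale": {"NN": 2}}}}
    POS tags are collapsed: counts for one token under different tags are added.
    """
    totals = Counter()
    for page in pages:
        sec = page.get(section) or {}
        for token, pos_counts in (sec.get("tokenPosCount") or {}).items():
            key = token.lower() if ignore_case else token
            totals[key] += sum(pos_counts.values())
    return totals


# Hypothetical three-page volume, for illustration only.
pages = [
    {"seq": "00000001", "body": {"tokenPosCount": {"Whale": {"NN": 1}, "the": {"DT": 3}}}},
    {"seq": "00000002", "body": {"tokenPosCount": {"whale": {"NN": 2}, "run": {"VB": 1, "NN": 1}}}},
    {"seq": "00000003", "body": {"tokenPosCount": {}}},  # a page with no tokens
]
counts = volume_token_counts(pages)
print(counts)
```

Lower-casing by default merges "Whale" and "whale"; pass `ignore_case=False` to keep the case distinctions present in the page-level counts.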
Per-volume features
Excerpted from catalog metadata, including:
Title
Author
Language
Publication data
Identifiers
[Subjects]
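The per-volume catalog fields can be flattened into one record per volume, e.g. as a row for a spreadsheet or data frame. A minimal Python sketch; the function name and the key names (`title`, `contributor`, `language`, `pubDate`, `htid`, `subjects`) are illustrative assumptions, not a confirmed schema, so check them against the metadata the API actually returns. Subjects are treated as optional here.

```python
def flatten_metadata(meta):
    """Flatten selected catalog fields from an EF metadata dict.

    The key names used here ('title', 'contributor', 'language', 'pubDate',
    'htid', 'subjects') are illustrative assumptions, not a confirmed schema.
    """

    def names(value):
        # A field may be missing, a single string/object, or a list of them.
        items = value if isinstance(value, list) else [value]
        out = []
        for item in items:
            if isinstance(item, dict):
                out.append(item.get("name") or "")
            elif item is not None:
                out.append(str(item))
        return [n for n in out if n]

    return {
        "title": meta.get("title"),
        "authors": names(meta.get("contributor")),
        "language": meta.get("language"),
        "pub_date": meta.get("pubDate"),
        "htid": meta.get("htid"),
        "subjects": names(meta.get("subjects")),  # optional: [] when absent
    }


# Hypothetical metadata record, for illustration only.
meta = {
    "title": "Moby-Dick; or, The Whale",
    "contributor": {"name": "Melville, Herman"},
    "language": "eng",
    "pubDate": 1851,
    "htid": "placeholder.0002",
}
print(flatten_metadata(meta))
```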
HTRC Extracted Features API documentation is available here: https://htrc.stoplight.io/docs/ef-api/06db4dc572b49-ef
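Several of the volume endpoints take a `{clean-htid}`, i.e. a HathiTrust volume ID rewritten so it is safe to use in a file or URL path. A minimal Python sketch, assuming the pairtree-style cleaning used by other HTRC tools: in the part after the library prefix, `:` becomes `+`, `/` becomes `=`, and `.` becomes `,`. The helper names are hypothetical; confirm the exact rule in the API documentation before relying on it.

```python
def clean_htid(htid):
    """Convert a HathiTrust ID such as 'uc2.ark:/13960/t7jp0qc8b' to its clean form.

    Assumed rule (pairtree-style): only the part after the first '.' changes,
    with ':' -> '+', '/' -> '=', and '.' -> ','.
    """
    lib, sep, vol = htid.partition(".")
    if not sep or not vol:
        raise ValueError(f"not a HathiTrust ID of the form lib.volume: {htid!r}")
    vol = vol.replace(":", "+").replace("/", "=").replace(".", ",")
    return f"{lib}.{vol}"


def unclean_htid(clean):
    """Inverse of clean_htid under the same assumed rule."""
    lib, _, vol = clean.partition(".")
    return f"{lib}." + vol.replace("+", ":").replace("=", "/").replace(",", ".")


print(clean_htid("uc2.ark:/13960/t7jp0qc8b"))  # uc2.ark+=13960=t7jp0qc8b
```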
API CALLS
GET EF data for a volume by volume id
https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}
Check if a volume exists (HEAD)
https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}
GET volume metadata by volume id
https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}/metadata
GET a subset of a volume's pages by volume id
https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}/pages
Create workset (POST)
https://tools.htrc.illinois.edu/ef-api/worksets
DELETE workset by workset id
https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}
GET workset
https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}
GET workset volumes by workset id
https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}/volumes
GET workset volumes' metadata by workset id
https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}/metadata
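The calls above map onto standard HTTP methods (GET, HEAD, POST, DELETE). A minimal standard-library Python sketch, assuming the base URL shown in the list and JSON response bodies. The helper names `endpoint`, `call`, and `volume_exists` are hypothetical, the POST payload for creating a workset is deliberately left to the API documentation, and volume IDs must already be in clean form.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

BASE = "https://tools.htrc.illinois.edu/ef-api"


def endpoint(kind, ident=None, suffix=""):
    """Build an EF API URL, e.g. endpoint("volumes", clean_id, "metadata")."""
    if kind not in ("volumes", "worksets"):
        raise ValueError(f"unknown resource: {kind!r}")
    if ident is None:
        return f"{BASE}/{kind}"  # e.g. POST here to create a workset
    # Keep the clean-htid characters '+', '=', ',' literal; escape anything else.
    url = f"{BASE}/{kind}/{quote(ident, safe='+=,')}"
    return f"{url}/{suffix}" if suffix else url


def call(method, url, body=None, timeout=30):
    """Send one request and return (HTTP status, parsed JSON body or None)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            raw = resp.read()  # empty for HEAD and any other body-less response
            return resp.status, (json.loads(raw) if raw else None)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, e.g. presumably 404 for an unknown volume
        return err.code, None


def volume_exists(clean_id):
    """HEAD request: True if the API answers with a 2xx status."""
    status, _ = call("HEAD", endpoint("volumes", clean_id))
    return 200 <= status < 300


# Usage (network required; the ID is the clean form of an example htid):
# status, ef = call("GET", endpoint("volumes", "uc2.ark+=13960=t7jp0qc8b"))
# status, meta = call("GET", endpoint("volumes", "uc2.ark+=13960=t7jp0qc8b", "metadata"))
# status, ws = call("POST", endpoint("worksets"), body=payload)  # payload: see API docs
# status, _ = call("DELETE", endpoint("worksets", workset_id))
```

Keeping URL construction separate from sending requests makes the paths easy to check offline and lets one `call` helper serve every method in the list.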
OBSERVABLE Notebooks
OBSERVABLE DOCUMENTATION (comprehensive)
OBSERVABLE DOCUMENTATION (shorter)