HTRC Extracted Features

  • Public domain, downloadable

  • Structured data consisting of human-supplied (catalog) metadata and algorithmically-derived features

  • Drawn from 17.1 million volumes (so not quite in sync with the HathiTrust Digital Library, HTDL)

  • Linked-data compliant (JSON-LD)

HTRC Extracted Features (EF)

The features are:

  • Volume- and page-level

  • Selected data and metadata

  • Extracted from raw text

The features position the researcher to begin analysis

  • Some standard natural language & statistical preprocessing is already done

Per-volume features

Excerpted from catalog metadata, including:

  • Title

  • Author

  • Language

  • Publication data

  • Identifiers

  • [Subjects]

HTRC Extracted Features API documentation is available here: https://htrc.stoplight.io/docs/ef-api/06db4dc572b49-ef

API CALLS

  • GET EF data for a volume by volume id

    • https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}

  • Check if a volume exists (HEAD)

    • https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}

  • GET volume metadata by volume id

    • https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}/metadata

  • GET subset of pages of volume by volume id

    • https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}/pages

  • Create workset (POST)

    • https://tools.htrc.illinois.edu/ef-api/worksets

  • DELETE workset by workset id

    • https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}

  • GET workset

    • https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}

  • GET workset volumes by workset id

    • https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}/volumes

  • GET workset volumes metadata by workset id

    • https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}/metadata
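
The endpoints listed above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client. The helper names (clean_htid, volume_url, workset_url, volume_exists, get_json) are hypothetical. The cleaning rule assumed here is the pairtree-style substitution applied to the part of the HathiTrust id after the first ".": ":" becomes "+", "/" becomes "=", and "." becomes ",". Treating a 404 response as "volume not found" is also an assumption; check the API documentation for the actual status codes and response shapes.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://tools.htrc.illinois.edu/ef-api"


def clean_htid(htid: str) -> str:
    """Return a 'clean' id (assumed rule: pairtree-style substitution
    on everything after the first '.', leaving the library prefix alone)."""
    prefix, sep, local = htid.partition(".")
    local = local.replace(":", "+").replace("/", "=").replace(".", ",")
    return prefix + sep + local


def volume_url(htid: str, part: str = "") -> str:
    """Build a volume URL; part may be '', 'metadata', or 'pages'."""
    url = f"{BASE_URL}/volumes/{clean_htid(htid)}"
    return f"{url}/{part}" if part else url


def workset_url(workset_id: str, part: str = "") -> str:
    """Build a workset URL; part may be '', 'volumes', or 'metadata'."""
    url = f"{BASE_URL}/worksets/{workset_id}"
    return f"{url}/{part}" if part else url


def volume_exists(htid: str, timeout: float = 30.0) -> bool:
    """Send a HEAD request for the volume.
    Assumption: the service answers 404 when the volume is absent."""
    request = urllib.request.Request(volume_url(htid), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def get_json(url: str, timeout: float = 30.0):
    """GET a URL and parse the body as JSON (EF data is JSON-LD,
    which is ordinary JSON as far as parsing is concerned)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as get_json(volume_url("mdp.39015012345678", "metadata")) would then request the metadata for a volume; the id shown is a made-up placeholder, not a real volume. Creating or deleting a workset is left out of the sketch because the request body and any authentication requirements are not described here.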

OBSERVABLE Notebooks

OBSERVABLE DOCUMENTATION (comprehensive)

OBSERVABLE DOCUMENTATION (shorter)
