# HTRC Extracted Features

### [HTRC Extracted Features dataset](https://analytics.hathitrust.org/datasets#ef) <a href="#htrc-extracted-features-dataset" id="htrc-extracted-features-dataset"></a>

* Public domain, downloadable
* Structured data consisting of human-supplied (catalog) metadata and algorithmically-derived features
* Drawn from 17.1 million volumes, so not fully in sync with the HathiTrust Digital Library (HTDL)
* Linked-data compliant (JSON-LD)

### HTRC Extracted Features (EF) <a href="#htrc-extracted-features-ef" id="htrc-extracted-features-ef"></a>

The features are:

* Volume- and page-level
* Selected data and metadata
* Extracted from raw text

The features position the researcher to begin analysis:

* Some standard natural language & statistical preprocessing is already done

### Per-volume features <a href="#per-volume-features" id="per-volume-features"></a>

Excerpted from catalog metadata, including:

* Title
* Author
* Language
* Publication data
* Identifiers
* \[Subjects]

![](https://lh7-us.googleusercontent.com/C-h_DI4CA6WZ2jZBswkOy7RMC4RMv5bnCejYOBm-Ii-W_z_h3r5hbGWjqaFLVfzJfoUvqMvWk05s9eoOnls6zvt5nZR5YVn0ulLwmfFMgM30I8qI3Hr_6pO-QXbsX2MMX-vmdoVb8pUF0BEhMi_NrA=s2048)

> #### **HTRC Extracted Features API documentation is available here:** [*https://htrc.stoplight.io/docs/ef-api/06db4dc572b49-ef*](https://htrc.stoplight.io/docs/ef-api/06db4dc572b49-ef) <a href="#htrc-extracted-features-api-documentation-is-available-here-https-htrc.stoplight.io-docs-ef-api-06db" id="htrc-extracted-features-api-documentation-is-available-here-https-htrc.stoplight.io-docs-ef-api-06db"></a>

## API CALLS

* **GET** EF data for a volume by volume id
  * <https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}>
* Check if a volume exists (**HEAD**)
  * <https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}>
* **GET** volume metadata by volume id
  * <https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}/metadata>
* **GET** subset of pages of volume by volume id
  * <https://tools.htrc.illinois.edu/ef-api/volumes/{clean-htid}/pages>
* Create workset (**POST**)
  * <https://tools.htrc.illinois.edu/ef-api/worksets>
* **DELETE** workset by workset id
  * <https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}>
* **GET** workset by workset id
  * <https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}>
* **GET** workset volumes by workset id
  * <https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}/volumes>
* **GET** workset volumes metadata by workset id
  * <https://tools.htrc.illinois.edu/ef-api/worksets/{workset-id}/metadata>
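The endpoints listed above can be called from any HTTP client. The following is a minimal Python sketch, not an official client: it uses only the standard library, and the helper names (`clean_htid`, `volume_url`, `workset_url`, `volume_exists`, `get_json`) are hypothetical. It assumes that `{clean-htid}` follows the pairtree-style substitution commonly used for HathiTrust IDs (`:` to `+`, `/` to `=`, `.` to `,` in the part after the first dot); confirm this, along with request parameters and response shapes, against the API documentation linked above.

```python
import json
import urllib.error
import urllib.request

# Base URL taken from the endpoint list in this page.
BASE_URL = "https://tools.htrc.illinois.edu/ef-api"


def clean_htid(htid: str) -> str:
    """Return a 'clean' HathiTrust ID.

    ASSUMPTION: the clean form applies the pairtree-style substitution
    (':' -> '+', '/' -> '=', '.' -> ',') to the part after the first dot.
    """
    prefix, sep, local_id = htid.partition(".")
    if not sep:
        # No library prefix found; leave the ID unchanged.
        return htid
    table = str.maketrans({":": "+", "/": "=", ".": ","})
    return f"{prefix}.{local_id.translate(table)}"


def volume_url(htid: str, resource: str = "") -> str:
    """Build a volume URL; resource is '', 'metadata', or 'pages'."""
    url = f"{BASE_URL}/volumes/{clean_htid(htid)}"
    return f"{url}/{resource}" if resource else url


def workset_url(workset_id: str, resource: str = "") -> str:
    """Build a workset URL; resource is '', 'volumes', or 'metadata'."""
    url = f"{BASE_URL}/worksets/{workset_id}"
    return f"{url}/{resource}" if resource else url


def volume_exists(htid: str, timeout: float = 30.0) -> bool:
    """Send a HEAD request; treat HTTP 404 as 'volume not found'."""
    request = urllib.request.Request(volume_url(htid), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def get_json(url: str, timeout: float = 30.0):
    """GET a URL and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With these helpers, `get_json(volume_url(htid, "metadata"))` would request the metadata for a volume, and `get_json(workset_url(workset_id, "volumes"))` the volumes of a workset. The sketch deliberately leaves out the **POST** and **DELETE** workset calls, since their request bodies are not described here.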

## OBSERVABLE Notebooks

[`OBSERVABLE DOCUMENTATION (comprehensive)`](https://observablehq.com/@observablehq/documentation)

[`OBSERVABLE DOCUMENTATION (shorter)`](https://htrc.gitbook.io/torchlite/broken-reference)

* [Exploring API](https://observablehq.com/@jswatsch/torchlite-ef-api)
* [Word Cloud](https://observablehq.com/@jswatsch/torchlite-workset-word-cloud)
* [Contributor Map](https://observablehq.com/d/e69a3c5185393caa)
