Tagged in Text Mining

Sense and Sentences
Data mining the past, by Peter Organisciak

A Dataset of Term Stats in Literature

Following up on Term Weighting for Humanists, I’m sharing data and code to apply term weighting to literature in the HTRC’s Extracted Features dataset.

Prepared from 235,000 Language and Literature (i.e. LCC Class P) volumes, I’ve…
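Term weighting rescales raw counts so that words common to every volume count for less than words that distinguish a volume. The sketch below uses the classic tf-idf scheme (term frequency × log(N / document frequency)) as one illustrative choice. It is not necessarily the weighting used in the shared dataset. The function name `tf_idf` and the input layout (a dict of per-volume `Counter`s) are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Illustrative tf-idf: weight = tf * log(N / df). Not the dataset's own method."""
    n_docs = len(docs)
    # Document frequency: in how many volumes does each term occur at least once?
    df = Counter()
    for counts in docs.values():
        df.update(term for term, c in counts.items() if c > 0)
    weights = {}
    for doc_id, counts in docs.items():
        weights[doc_id] = {
            term: tf * math.log(n_docs / df[term])
            for term, tf in counts.items()
            if tf > 0
        }
    return weights


# Toy usage with two hypothetical "volumes".
docs = {
    "moby_dick": Counter({"whale": 3, "the": 5}),
    "pride": Counter({"the": 4, "love": 2}),
}
weights = tf_idf(docs)
# "the" appears in every volume, so log(2/2) = 0 zeroes it out.
```

A term present in every document gets a weight of zero, which is the intended effect: ubiquitous function words stop dominating comparisons between volumes.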


HTRC Feature Reader 2.0

I’ve released an overhaul of the HTRC Feature Reader, a Python library that makes it easy to work with the Extracted Features (EF) dataset from the HathiTrust. EF provides page-level feature counts for 4.8 million volumes, including part-of-speech-tagged term counts, line and sentence…
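Because EF counts are per page and split by part-of-speech tag, a common first step is to collapse them into one bag of words per volume. The sketch below assumes a hypothetical in-memory layout (a list of pages, each mapping token → {POS tag → count}) that mirrors that description. It does not use the Feature Reader's actual API, and the names `Page` and `volume_counts` are illustrative only.

```python
from collections import Counter

# Hypothetical page layout: token -> {POS tag -> count}. Not the Feature Reader API.
Page = dict[str, dict[str, int]]


def volume_counts(pages: list[Page], lowercase: bool = True) -> Counter:
    """Sum page-level, POS-tagged counts into volume-level token counts."""
    totals = Counter()
    for page in pages:
        for token, pos_counts in page.items():
            # Optionally fold case so "The" and "the" are counted together.
            key = token.lower() if lowercase else token
            # Ignore the POS distinction by summing across all tags.
            totals[key] += sum(pos_counts.values())
    return totals


pages = [
    {"The": {"DT": 2}, "sea": {"NN": 1}},
    {"the": {"DT": 1}, "Sea": {"NNP": 1}},
]
totals = volume_counts(pages)
```

Folding case and POS tags is a design choice that trades detail for simpler, denser counts; passing `lowercase=False` keeps the original casing.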