Collection 38: Reddit on the Humanities

(Reddits mentioning "humanities," "liberal arts," or "the arts" from 2006-2019)

Collection 38 (C-38) includes 124,340 Reddit comments longer than 225 words from 2006 to 2019 containing the terms humanities, liberal arts, or the arts.

Explanation

WE1S's rationale and methodology for collecting Reddit posts to study public discourse about the humanities (especially by students) are explained in a blog post by WE1S's lead Reddit researcher, Raymond Steding: "A Digital Humanities Study of Reddit Student Discourse about the Humanities" (2019).

As Steding notes, he initially collected 3.3 terabytes of Reddit data from 2006 to 2018 (approximately five billion comments) by downloading it from pushshift.io in JSON format. Data for 2019 was later added.

This data was filtered to retain only comments containing at least one of the terms humanities, liberal arts, or the arts. Comments under 225 words were then removed to improve the coherence of WE1S's topic models of the collection.
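
The two filtering steps described above can be sketched roughly as follows. This is an illustrative sketch only, not WE1S's actual code: it assumes the downloaded data is newline-delimited JSON with the comment text in a "body" field, that terms are matched case-insensitively as whole words, and that words are counted by splitting on whitespace. The names TERM_PATTERN, MIN_WORDS, word_count, keep_comment, and filter_comments are hypothetical.

```python
import json
import re
from typing import Iterable, Iterator

# Assumption: case-insensitive, whole-word matching of the three terms.
TERM_PATTERN = re.compile(
    r"\b(humanities|liberal\s+arts|the\s+arts)\b", re.IGNORECASE
)

# Comments under this many words are dropped (threshold from the text above).
MIN_WORDS = 225


def word_count(text: str) -> int:
    """Count words by splitting on whitespace (an assumed definition)."""
    return len(text.split())


def keep_comment(body: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the comment mentions a term and is long enough."""
    return bool(TERM_PATTERN.search(body)) and word_count(body) >= min_words


def filter_comments(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed comments that pass both filters.

    Assumes one JSON object per line with the text in a "body" field.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        comment = json.loads(line)
        if keep_comment(comment.get("body", "")):
            yield comment
```

Under these assumptions, a file could be processed by passing an open file handle to filter_comments, which reads it line by line without loading the whole file into memory.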

WE1S did not remove duplicates from this collection, nor did it filter posts and comments based on Reddit user "karma" scores.

Suggested Citation

WE1S Project, Collection 38: Reddit on the Humanities (Reddits mentioning "humanities," "liberal arts," or "the arts" from 2006-2019), 2020, doi:[TBD].


Collection Metadata


Topic Models of This Collection

Model Family 1 (created July 18, 2019): models for 100 and 200 topics

25 topics 50 topics 100 topics 150 topics 200 topics 250 topics
Dfr-browser
TopicBubbles
pyLDAvis
Metadata7D
GeoD
DendrogramViewer
Diagnostics
