Collection 28: Tweets containing keyword "humanities," c. 2014-2017.

Collection 28 (C-28) of the WE1S Twitter corpus consists of 799,744 tweets containing the keyword "humanities" from authors who tweeted the term "humanities" more than once between Jan. 1, 2014, and Dec. 31, 2017. (See also C-29, which aggregates tweets by author.)

Explanation: Our WE1S Twitter corpus originally consisted of 1,589,462 individual tweets containing the keyword "humanities," posted between January 1, 2014, and June 30, 2019. However, that many individual documents proved too large to model practically, so we reduced the corpus. We included only tweets from 2014-2017 because Twitter raised its character limit (from 140 to 280 characters) in Nov. 2017, and we wanted to work with the older format.

Even with these parameters, further reduction was necessary. Thus, in this collection we omitted tweets by authors who tweeted about the humanities only once. This step limited the corpus to 799,744 individual tweets, each of which is treated as an individual document.
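The two reduction steps described above (restricting to 2014-2017, then dropping authors with only one matching tweet) can be illustrated with a minimal sketch. This is not the WE1S project's actual code. The function name `filter_repeat_authors` and the record fields `author` and `date` are hypothetical, and the sketch assumes the input already contains only tweets with the keyword "humanities."

```python
from collections import Counter
from datetime import date


def filter_repeat_authors(tweets, start=date(2014, 1, 1), end=date(2017, 12, 31)):
    """Keep tweets inside the date window whose author has more than one
    tweet inside that same window.

    `tweets` is assumed to be a list of dicts with hypothetical keys
    "author" (str) and "date" (datetime.date).
    """
    # Step 1: restrict to the 2014-2017 window (inclusive on both ends).
    in_window = [t for t in tweets if start <= t["date"] <= end]

    # Step 2: count tweets per author within the window.
    counts = Counter(t["author"] for t in in_window)

    # Step 3: drop tweets by authors who appear only once.
    return [t for t in in_window if counts[t["author"]] > 1]
```

Each tweet that survives the filter remains its own record, which matches the collection's treatment of every tweet as an individual document.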

The advantage of this method is that it retains tweets as individual documents, maintaining the ability to link directly to the tweets in the model. However, due to the large number of documents, visualizations can still become unstable and/or unresponsive. (But see C-29 for a variant corpus that aggregates authors' tweets. See also M-8 for an evaluation of trade-offs between C-28 and C-29.)

Suggested Citation

WhatEvery1Says (WE1S) Project. (July 20, 2020). Collection 28: Tweets containing keyword "humanities," c. 2014-2017. Zenodo. DOI 10.5281/zenodo.4940253.

Collection Metadata

Topic Models of This Collection

Model Family 1 (created June 20, 2019): models for 50, 100, 150, 200, 250 topics

Visualizations for this model family:

50 topics, 100 topics, 150 topics, 200 topics, 250 topics

WE1S Developers Only

This start page for the collection was last revised: June 12, 2021