Collection 32: U.S. Top Newspapers (sample of all articles)

A collection of word-frequency and other data representing 204,617 unique articles (no duplicates or close variants) published during 2012-2018 in 15 top U.S. newspapers and their associated online blogs. WE1S and other researchers use this data to look for broad patterns and help guide closer study.

The data reflect an approximately 1:40 proportional balance between articles mentioning “humanities” (about 5,000) and a sample of articles on everything else (about 200,000 more or less “random” documents found by searching on common English words). In essence, the collection is a sampled representation of “everything” in these sources for these years. It is limited by the fact that it is not feasible to know how many articles were actually published in these publications, to determine how completely they were collected in available database repositories, or to harvest everything from such databases.

In this collection, the word “humanities” occurs 7,226 times in 4,976 documents. “Science” and “sciences” combined occur 25,693 times (“science”: 22,811 times in 12,319 documents; “sciences”: 2,882 times in 2,277 documents). Mentions of “humanities” thus amount to about 28% of the number of mentions of “science(s).”
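The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the WE1S tooling: the variable names are illustrative, and the 1:40 check assumes that the 4,976 documents containing “humanities” are the “humanities” side of the balance and that all remaining articles form the “everything else” sample.

```python
# Counts copied from the collection description.
humanities_mentions = 7_226   # occurrences of "humanities"
humanities_docs = 4_976       # documents containing "humanities"
science_mentions = 22_811     # occurrences of "science"
sciences_mentions = 2_882     # occurrences of "sciences"
total_articles = 204_617      # unique articles in the collection

# Combined mentions of "science" and "sciences".
science_combined = science_mentions + sciences_mentions

# Mentions of "humanities" as a share of mentions of "science(s)".
mention_ratio = humanities_mentions / science_combined

# Assumption: every article without "humanities" belongs to the
# "everything else" sample.
other_articles = total_articles - humanities_docs
others_per_humanities_doc = other_articles / humanities_docs

print(science_combined)                      # 25693
print(f"{mention_ratio:.0%}")                # 28%
print(f"1:{others_per_humanities_doc:.0f}")  # 1:40
```

Under these assumptions, the computed values agree with the stated totals: 25,693 combined mentions, a 28% ratio, and a balance of roughly 1:40.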

News sources in this collection (in order of number of articles for each) are: New York Times, Chicago Tribune, Los Angeles Times, Newsday, New York Post, Houston Chronicle, Daily News, USA Today, Dallas Morning News, Denver Post, Washington Post, Boston Globe, Star Tribune (Minneapolis), Seattle Times, Tampa Bay Times.

Suggested Citation

WhatEvery1Says (WE1S) Project. (November 14, 2019). Collection 32: U.S. Top Newspapers (sample of all articles). Zenodo. DOI 10.5281/zenodo.4940326.


Collection Metadata

  • Created by: Lindsay Thomas
  • Created on: November 14th, 2019, 12:00:00 am
  • WE1S Collection Registry ID: 20191114_1518_us-humanities-comparison-top-newspapers-unsampled
  • Data sources: LexisNexis (via LN Web Services Kit).
  • Collection dataset ("non-consumptive use" data derived from the original texts): DOI 10.5281/zenodo.4940326

Topic Models of This Collection

Model Family 1 (created November 14, 2019): models for 25, 50, 100, 150, 200, 250 topics

Visualizations for this model family, by number of topics (25, 50, 100, 150, 200, 250):

  • Dfr-browser
  • TopicBubbles
  • pyLDAvis
  • DendrogramViewer
  • Diagnostics

This start page for the collection last revised: June 13, 2021