Collection 37

Articles containing the words “science” or “sciences” but that have been classified as not being about science, c. 1998-2018

A collection of word-frequency and other data representing 87,278 unique articles (no duplicate or close-variant documents) published from 1998 to 2018 in 610 U.S. top-circulating and student newspapers and their associated blogs. The articles contain the words “science” or “sciences” but have been classified as not being about science. The collection includes 13,628 articles from U.S. top-circulating newspapers and 73,650 articles from student newspapers. Supervised classification models have classified these articles as not being about science; this collection therefore helps WE1S understand the character of articles that mention “science” or “sciences” without being about science per se.

News sources in Collection 37 include 15 top-circulation U.S. newspapers: Boston Globe, Chicago Tribune, Daily News (New York), Dallas Morning News, Denver Post, Houston Chronicle, Los Angeles Times, New York Post, New York Times (and its blogs), Newsday (New York), Seattle Times, Star Tribune (Minneapolis, MN), Tampa Bay Times, USA Today, and Washington Post. Also included are documents from 599 U.S. campus newspapers, among which the top 10 sources in the collection are: The Stanford Daily (Stanford University), The Marquette Tribune (Marquette University), The Daily Bruin (UC Los Angeles), The California Aggie (UC Davis), The Daily Cardinal (UW Madison), The Badger Herald (UW Madison), Cornell Daily Sun (Cornell University), Daily Californian (UC Berkeley), The Harvard Crimson (Harvard), and The Daily Universe (Brigham Young University).

Kinds of Sources (by Tags)

Sources in Collection 37 are associated with the following non-exclusive metadata categories, which describe the kinds of sources in the collection. Of the 87,278 total documents: 51,429 are from publications located at doctoral institutions; 48,031 are from publications located at US public colleges or universities; 26,325 are from publications located in the North East; 25,057 are from publications located at US private colleges or universities; 20,800 are from publications located in the South; 20,045 are from publications located in the Midwest; 13,642 are from US (non-campus) newspapers; 13,628 are from US top-circulating newspapers; 12,524 are from publications located on the West Coast; 8,567 are from publications located at Hispanic-serving institutions; 6,287 are from publications located in the Rocky Mountain region or the Southwest; 5,841 are from publications located at liberal arts institutions; 4,341 are from Catholic colleges or universities; 4,078 are from publications located at institutions within the UC system; 3,779 are from publications located at institutions within the Ivy League; 2,724 are from publications located at science/tech and/or agricultural schools; 2,660 are from publications located at institutions within the Cal State system; 2,045 are from publications located at community colleges; 1,918 are from publications located at (Protestant) Christian colleges or universities; 1,100 are from publications located at colleges or universities affiliated with the Church of Jesus Christ of Latter-Day Saints; 518 are from publications located at Historically Black Colleges and Universities; 450 are from US publications located outside the contiguous US; 331 are from non-US colleges; 324 are from publications located in Europe; 313 are from publications located at Jewish colleges or universities; 287 are from publications categorized as multiregional or as having national reach; 152 are from publications located at women's colleges; 44 are from websites; 41 are from publications that self-identify as centering on issues of race, ethnicity, and/or cultural heritage; 14 are from news wires or aggregators; 7 are from publications associated with the England Labour Party in the UK; 7 are from publications identified as non-US newspapers; 7 are from publications identified as multiregional UK publications; 7 are from publications located in Canada; 3 are from radio programs; 2 are from academic sources. Sources are assigned to categories based solely on explicit publication information and/or self-identification.
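
Because the categories are non-exclusive, their counts are best read as shares of the 87,278 documents rather than as parts that add up to the total. The following is a minimal sketch of that arithmetic, using only figures quoted on this page; the variable and function names are illustrative assumptions and are not part of any WE1S tool or data format.

```python
# Figures copied from the collection description on this page.
TOTAL_DOCUMENTS = 87_278

# The two source groups are exclusive and should add up to the total.
SOURCE_COUNTS = {
    "US top-circulating newspapers": 13_628,
    "student newspapers": 73_650,
}

# An illustrative subset of the non-exclusive tag counts (not the full list).
TAG_COUNTS = {
    "doctoral institutions": 51_429,
    "US public colleges or universities": 48_031,
    "US private colleges or universities": 25_057,
    "community colleges": 2_045,
}


def share(count, total=TOTAL_DOCUMENTS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Exclusive groups: the sum should equal the total number of documents.
sources_sum = sum(SOURCE_COUNTS.values())

# Non-exclusive tags: a document can carry several tags, so the sum of
# tag counts may exceed the total and should not be treated as a partition.
tags_sum = sum(TAG_COUNTS.values())
tag_shares = {tag: share(count) for tag, count in TAG_COUNTS.items()}

if __name__ == "__main__":
    print("Source groups add up to total:", sources_sum == TOTAL_DOCUMENTS)
    print("Sum of selected tag counts:", tags_sum)
    for tag, pct in tag_shares.items():
        print(f"{tag}: {pct}% of documents")
```

Even this four-tag subset sums to more than the total, since a single student newspaper can be tagged, for example, as both a doctoral institution and a public university.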

Suggested Citation

WhatEvery1Says (WE1S) Project. (June 7, 2020). Collection 37: Articles containing the words “science” or “sciences” but that have been classified as not being about science, c. 1998-2018. Zenodo. DOI 10.5281/zenodo.4958256.


Collection Metadata


Topic Models of This Collection

Model Family 1 (created June 7, 2020): models for 25, 50, 100, 150, 200, 250 topics

Visualizations for this model family:

25 topics 50 topics 100 topics 150 topics 200 topics 250 topics
Dfr-browser
TopicBubbles
pyLDAvis
DendrogramViewer
Diagnostics

This start page for the collection last revised: June 14, 2021