Data Sets for Mind on Statistics, 3rd edition (Utts and Heckard)
Click here for an index of Datasets available.
Datasets are available in each of these formats:
A few data sets have some missing values due to nonresponse, and these are represented by either an asterisk or a blank entry, depending on the file format and type of variable.
Descriptions of the Data Sets
The links in the table immediately below can be used to jump to the description of a data set. Within each description, there is a link that can be used to view a text version of the data set.
beans Fifteen students each put as many beans into a cup as they could in 15 seconds. All students did the task twice, once with their dominant hand and once with their nondominant hand. (Source: The authors) There are three columns of data: Column Name
Description
Age and physical measurement data for n=19 female wild bears. (Source: Minitab, Inc.) There are five columns of data: Column Name
Description
Data for n = 1,560 plays of the Pennsylvania lottery game
called Cash 5. In the game the state randomly picks 5 numbers
between 1 and 39. The data are used in an example in Chapter
9. This game is called Fantasy 5 in California and some data
for that state are in the dataset Fantasy5. There are six columns of data: Column Name
Description
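To make the drawing mechanism concrete, here is a minimal sketch of one play as described above: five numbers picked at random between 1 and 39. It assumes the numbers are drawn without replacement and that both endpoints are included; neither detail is stated in the description. The function name `draw_numbers` and the seed are illustrative only.

```python
import random


def draw_numbers(rng, count=5, low=1, high=39):
    """Simulate one play: pick `count` distinct numbers in [low, high].

    Assumption: numbers are drawn without replacement and the range is
    inclusive at both ends.
    """
    return sorted(rng.sample(range(low, high + 1), count))


# A seeded generator keeps the sketch reproducible; the seed is arbitrary.
rng = random.Random(0)
draw = draw_numbers(rng)
print(draw)
```

Simulated draws like this can be compared with the recorded plays, for example by tabulating how often each number appears.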
Ages and annual salaries for the CEOs of the 60 top-ranked
small companies in America in 1993. (Source: Forbes,
Nov. 8, 1993, "America's Best Small Companies.")
There are two columns of data: Column Name
Description
The data used for Examples 16.1, 16.12, and 16.13. There are three columns of data: Column Name
Description
Heights (inches) for 55 females and 45 males. The data are used by the Sampling applet in Section 3.7. (Source: The authors.) There are two columns of data: Column Name
Description
cholest Cholesterol levels for 28 heart attack patients measured 2, 4, and 14 days after the heart attack, and also cholesterol levels for 30 other patients who did not have a heart attack. (Source: Minitab, Inc.) There are four columns of data: Column Name
Description
Body weights and the time it takes to chug a 12-ounce beverage for n=13 college students. The data were submitted by a student for a class project. (Source: William Harkness, Pennsylvania State University) There are two columns of data: Column Name
Description
Amount of sleep per night and whether the person feels sleep deprived for n=86 college students. (Source: Laura Simon, Pennsylvania State University) There are two columns of data: Column Name
Description
Data for n = 2,318 plays of the California lottery game called Fantasy 5. In the game, the state randomly picks 5 numbers between 1 and 39. This game is called Cash 5 in Pennsylvania and some data for that state are in the dataset Cash5. (Source: California Lottery website, www.calottery.com) There are six columns of data: Column Name
Description
Data for the n = 365 daily plays in 1999 of the California lottery game called Fantasy 5. In the game, the state randomly picks 5 numbers between 1 and 39. The data in this data set are a subset of the data set Fantasy5. (Source: California Lottery website, www.calottery.com) There are six columns of data: Column Name
Description
firstladies Ages at death for First Ladies of the United States. Data
are given only for those who had died as of Oct. 19, 2005. Column Name
Description
GSS02 Data are for thirteen variables and n = 2,765 respondents in the 2002 General Social Survey, a national survey done by the National Opinion Research Center at the University of Chicago. Some questions are only asked of a portion of the survey participants, so there is quite a bit of missing data. GSS93 gives 1993 data for the first eleven variables in this dataset. (Source: SDA archive at UC Berkeley website, http://csa.berkeley.edu:7502) There are eleven columns of data: Column Name
Description
Data are for eleven variables and n = 1,606 respondents in the 1993 General Social Survey, a national survey done by the National Opinion Research Center at the University of Chicago. Some questions are only asked of about two-thirds of the survey participants, so there is quite a bit of missing data. (Source: SDA archive at UC Berkeley website, http://csa.berkeley.edu:7502) There are eleven columns of data: Column Name
Description
A random sample of n=500 observations from the GSS93 data set. The random selection was done by the authors using Minitab. (Source: SDA archive at UC Berkeley website, http://csa.berkeley.edu:7502) There are eleven columns of data: Column Name
Description
Heights and stretched handspans for n=167 college students. Each student decided which of their hands to measure. The data are used for examples in Chapters 5 and 14. (Source: The authors) There are three columns of data: Column Name
Description
Tip percents on restaurant bills for male and female servers who either draw a happy face on the bill or do not in a randomized experiment. The data are used for Example 16.6 and an example in Supplemental Topic S4. (Source: Professor Bruce Rind of Temple University) There are three columns of data: Column Name
Description
heightfoot Heights and foot lengths for n=33 college students. (Source: William Harkness, Pennsylvania State University) There are two columns of data: Column Name
Description
Actual weight and ideal weight for n=182 college students. Actual weights were reported by the students. The "ideal" weight is the response to, "What is your ideal weight?" The data were collected in the Fall semester of 1999. (Source: William Harkness, Pennsylvania State University) There are four columns of data: Column Name
Description
Actual and ideal weights for the n=63 males in the idealwt data set. (Source: A subset of the idealwt data set) There are four columns of data: Column Name
Description
Actual and ideal weights for the n=119 females in the idealwt data set. (Source: A subset of the idealwt data set) There are four columns of data: Column Name
Description
A sample of 63 students wrote as many letters of the alphabet, in order as capital letters, as they could in 15 seconds using their dominant hand, and then repeated this task using their nondominant hand. The variables dom and nondom contain the raw data for the results. (Source: Laura Simon, Pennsylvania State University) There are three columns of data: Column Name
Description
Responses by n = 24 college students to the question, "About how many music CDs do you own?" Data are used for Example 2.7. (Source: The authors.) There are two columns of data: Column Name
Description
Data for n = 299 eruptions of the Old Faithful geyser. (Source: Hand et al., 1994, A Handbook of Small Data Sets) There are two columns of data: Column Name
Description
Data from n=190 college students in a statistical literacy class. The survey is described in Section 2.1 of the text, and the data are the basis for several examples in Chapter 2. (Source: The authors) There are nine columns of data: Column Name
Description
Data from the n=103 females in the pennstate1 data set. (Source: The data are a subset of the pennstate1 data set.) There are nine columns of data: Column Name
Description
Data from the n=87 males in the pennstate1 data set. (Source: The data are a subset of the pennstate1 data set.) There are nine columns of data: Column Name
Description
Data from n=205 students in a statistics class for students in the social and behavioral sciences. The survey was done in the Spring semester of 2000. (Source: The authors) There are eight columns of data: Column Name
Description
Data from n=227 students in a statistics class for students in the social and behavioral sciences. The survey was done during the Spring semester of 1997. (Source: William Harkness, Pennsylvania State University) There are seven columns of data: Column Name
Description
Data from n=75 students in a statistics class for biology and premed students. The survey was done during the Spring semester of 1999. (Source: Laura Simon, Pennsylvania State University) There are four columns of data: Column Name
Description
Data from n=175 students in a statistics class for liberal arts and social science students. The survey was done during the Spring semester of 2000. (Source: The authors.) There are three columns of data: Column Name
Description
Persons per household in the United States in Census years from 1850 to 2000. (Source: The World Almanac and Book of Facts, 1999, p. 383 and the U.S. Census Bureau) There are two columns of data: Column Name
Description
Physical measurements for n=55 college students. Measurements were made by the students during a class activity. (Source: William Harkness, Pennsylvania State University.) There are ten columns of data: Column Name
Description
poverty Poverty rates, teen birth rates, and violent crime rates in the states of the U.S. and in the District of Columbia. Data are for the year 2000. (Source: U.S. Census Bureau, www.census.gov) There are three columns of data: Column Name
Description
ProfBooks Number of pages, price, and whether the book is hardcover or softcover for n=15 books on a professor's bookshelf. (Source: The authors) There are three columns of data: Column Name
Description
Students in a statistics class measured their pulse rates before and after marching in place for one minute. The sample size is n = 40. (Source: Laura Simon, Pennsylvania State University) There are three columns of data: Column Name
Description
Annual rainfall in inches in Davis, California from 1951 to 1997. The data are used for Example 2.9. There are two columns of data: Column Name
Description
For the 50 states and the District of Columbia, the statewide mean verbal and math SAT scores in 1998. The data set also includes the estimated percent of high school graduates who took the SAT in each state. (Source: The World Almanac and Book of Facts, 1999, p. 245) There are four columns of data: Column Name
Description
For n=30 drivers, the driver's age and the maximum distance at which the driver can read a highway sign at night. The data are used in examples in Chapters 5 and 14. (Source: Based on data collected by Last Resource, Inc., Bellefonte, PA) There are two columns of data: Column Name
Description
Hours of sleep the previous night and hours of studying the previous day for n=116 students. The data are used for a correlation example in Chapter 5. (Source: The authors) There are two columns of data: Column Name
Description
Highway death rates (per 100 million vehicle miles of travel) and maximum speed limits in 10 countries. (Source: "Fifty-five mph speed limit is no safety guarantee," D. J. Rivkin, New York Times (letters to the editor), Nov. 25, 1986, p. 26.) There are three columns of data: Column Name
Description
Data collected during 2004 and 2005 from students in statistics classes at a large state university in the northeastern United States. (Source: The authors.) There are three columns of data and n = 690: Column Name
Description
Geographic latitude and mean temperature in January, April, and August for 20 cities in the United States. (Source: The World Almanac and Book of Facts, 1999, p. 220 and p. 456) There are five columns of data: Column Name
Description
Data from an experiment done to examine the "anchoring" effect, which is that a person's estimate of a quantity is influenced by numbers given in prior questions. At a conference for teachers of business statistics, n = 45 participants were randomized to two experimental groups. One group answered the question "Do you think the population of Turkey is more than 10 million?" The other group answered the question "Do you think the population of Turkey is more than 70 million?" All participants were then asked to estimate the population of Turkey, to the nearest million. If the anchoring effect holds, participants asked if the population is more than 70 million would tend to give higher estimates than those who were asked if it is more than 10 million. (Source: The authors) There are two columns of data: Column Name
Description
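The comparison described above (do participants given the 70 million anchor tend to give higher estimates than those given the 10 million anchor?) can be sketched as a difference in group means. This is only an illustration: the group labels `"low"` and `"high"`, the function name `mean_by_group`, and the numbers are invented for the sketch and are not taken from the data set.

```python
from statistics import mean


def mean_by_group(records):
    """Return the mean estimate for each anchor group.

    `records` is an iterable of (group_label, estimate) pairs.
    """
    groups = {}
    for group, estimate in records:
        groups.setdefault(group, []).append(estimate)
    return {group: mean(values) for group, values in groups.items()}


# Invented values used only to exercise the function; NOT real survey data.
made_up = [("low", 20), ("low", 30), ("high", 60), ("high", 80)]
means = mean_by_group(made_up)

# A positive difference is the direction the anchoring effect predicts.
difference = means["high"] - means["low"]
print(means, difference)
```

A formal analysis would go on to judge whether an observed difference is larger than chance alone would explain, for instance with a two-sample procedure.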
Data from n=173 college students in two different statistics classes. One class was for students not in the liberal arts (n=148), while the other class was for students in the liberal arts (n=25). The data were collected in the Spring quarter of 2000. (Source: The authors) There are twelve columns of data: Column Name
Description
Data for the n=94 females in the UCDavis1 data set. (Source: This is a subset of the UCDavis1 data set.) There are twelve columns of data: Column Name
Description
Data for the n=79 males in the UCDavis1 data set. (Source: This is a subset of the UCDavis1 data set.) There are twelve columns of data: Column Name
Description
Data from n=239 college students. The data were collected in the Fall quarter of 2000. (Source: The authors) There are fourteen columns of data: Column Name
Description
Self-reported heights and parents' heights for n=18 college students. (Source: A subset of the dataset UCDavis2) There are five columns of data: Column Name
Description
Self-reported heights and parents' heights for n=94 college women. (Source: A subset of the dataset UCDavis1) There are four columns of data: Column Name
Description
wineratings Quality ratings of Pinot Noir wines made in three different regions. (Source: Minitab, Inc.) There are four columns of data: Column Name
Description
Weight and height data for n=43 college men. The data are used in examples in Chapter 14. (Source: Laura Simon, Pennsylvania State University) There are three columns of data: Column Name
Description
Data from the 2001 Youth Risk Behavior Surveillance System survey of U.S. high school students. These data are for 12th graders, aged 17 and 18 years old, who say they drive. (Source: http://www.cdc.gov/nccdphp/dash/yrbs/) There are five columns of data: Column Name
Description
YouthRisk03 Data from the 2003 Youth Risk Behavior Surveillance System survey of U.S. high school students. These data are for 12th graders, aged 17 and 18 years old, who say they drive. (Source: http://www.cdc.gov/nccdphp/dash/yrbs/) There are five columns of data: Column Name
Description
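Because missing values in these data sets are represented by an asterisk or a blank entry, anyone loading the text versions needs to convert those markers before analysis. The following is a minimal sketch, assuming a tab-delimited file with a header row; the actual layout of the text files is not specified here, and the function name `read_dataset` and the sample contents are hypothetical.

```python
import csv
import io

# Markers treated as missing: an asterisk or a blank entry.
MISSING_MARKERS = {"*", ""}


def read_dataset(text, delimiter="\t"):
    """Parse delimited text into a list of dicts, mapping missing markers to None.

    Assumption: the first row holds column names and fields are separated
    by `delimiter` (tab by default).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for raw in reader:
        row = {}
        for name, value in raw.items():
            if name is None:
                continue  # ignore extra fields beyond the header
            value = (value or "").strip()
            row[name] = None if value in MISSING_MARKERS else value
        rows.append(row)
    return rows


# Made-up sample in the assumed layout; not taken from any data set listed here.
sample = "id\tscore\tgroup\n1\t10\tA\n2\t*\tB\n3\t7\t\n"
rows = read_dataset(sample)
print(rows)
```

Values are left as strings so that numeric and categorical columns can be converted separately once the missing entries have been identified.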

