Overview
Learning Objectives
Evidence Statements
Students learn about threats to validity, such as small sample sizes and confounding variables.
Product Outcomes
Materials
Preparation
As good Data Scientists, the staff at the animal shelter is constantly gathering data about their animals, their volunteers, and the people who come to visit. But just because they have data doesn’t mean the conclusions they draw from it are correct! For example: suppose they surveyed 1,000 cat-owners and found that 95% of them thought cats were the best pet. Could they really claim that people generally prefer cats to dogs?
Have students share back what they think. The issue here is that cat-owners are not a representative sample of the population, so the claim is invalid.
There’s more to data analysis than simply collecting data and crunching numbers. In the cat-owner survey, the claim that "people prefer cats to dogs" is invalid because the sample wasn’t representative of the whole population (of course cat-owners are partial to cats!). This is just one example of what are called Threats to Validity.
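For classes that want to see the cat-owner problem in numbers, here is a minimal Python sketch. It builds a small made-up town and compares the answer from surveying only cat-owners with the answer from asking everyone. All of the counts (5,000 residents, 1,000 cat-owners, and how preferences split within each group) and the helper names `make_group` and `share_preferring_cats` are invented for illustration; none of this is data from the shelter.

```python
def make_group(owns_cat, n_prefer_cats, n_prefer_dogs):
    """Build a list of hypothetical residents, one dict per person."""
    cats = [{"owns_cat": owns_cat, "favorite": "cat"} for _ in range(n_prefer_cats)]
    dogs = [{"owns_cat": owns_cat, "favorite": "dog"} for _ in range(n_prefer_dogs)]
    return cats + dogs


def share_preferring_cats(people):
    """Return the fraction of people whose favorite pet is a cat."""
    return sum(1 for p in people if p["favorite"] == "cat") / len(people)


# Invented town (assumption): 1,000 cat-owners and 4,000 other residents.
population = make_group(True, 950, 50) + make_group(False, 1200, 2800)

# The flawed survey: only cat-owners are asked.
cat_owner_sample = [p for p in population if p["owns_cat"]]

biased_estimate = share_preferring_cats(cat_owner_sample)
true_share = share_preferring_cats(population)

print(f"Cat-owners only: {biased_estimate:.0%} prefer cats")  # 95%
print(f"Whole town:      {true_share:.0%} prefer cats")       # 43%
```

In this invented town, the cat-owner survey reports 95% even though fewer than half of all residents prefer cats. The size of the sample (1,000 people) does not fix the problem, because everyone in the sample was chosen in a way that is tied to the answer.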
Give students time to discuss and share back. Answers: the dog-park survey is not a random sample; the dogs are friendlier towards whoever is giving them food; etc.
Life is messy, and there are always threats to validity. Data Science is about doing the best you can to minimize those threats, and to be up front about what they are whenever you publish a finding. When you do your own analysis, make sure you include a discussion of the threats to validity!