instagram

(Also available in Pyret)

Students consider possible threats to the validity of their analysis.

Lesson Goals

Students will be able to…​

  • Define several types of Threats to Validity

  • Identify those threats by reading the description of an analysis

  • Identify those threats in their own analysis

Student-facing Lesson Goals

  • Let’s identify issues that could affect our data analysis.

Materials

Supplemental Materials

Classroom Visual:

Preparation

🔗Threats to Validity 20 minutes

Overview

Students are introduced to the concept of validity, and a number of possible threats that might make an analysis invalid.

Launch

As good Data Scientists, the staff at the animal shelter are constantly gathering data about their animals, their volunteers, and the people who come to visit.

But just because they have data doesn’t mean the conclusions they draw from it are correct!

Suppose the shelter staff surveyed 1,000 cat-owners and found that 95% of them thought cats were the best pet.

Could they really claim that people generally prefer cats to dogs?

There’s more to data analysis than simply collecting data and crunching numbers.

In the example of the cat-owning survey, the claim that “people prefer cats to dogs” is invalid because the data itself wasn’t representative of the whole population (of course cat-owners are partial to cats!).

There are several major Threats to Validity you should be on guard against:

  • Selection bias - Data was gathered from a sample that is not representative of the population.

    • This is the problem with surveying cat owners to find out which animal is most loved!

  • Bias in the study design - Data was gathered in such a way that it influenced the results, for example, researchers might:

    • Use a “loaded” question like “Since annual vet care comes to about $300 for dogs and only about half of that for cats, would you say that owning a cat is less of a burden than owning a dog?” This could easily lead to a misrepresentation of people’s true opinions.

    • Ask questions in an order that primes respondents by previous questions when responding to a later question. That priming could include providing information, points of comparison that influence rating on a scale, elicit emotional response, etc.

    • Judge other cultures on the standards of their own culture rather than the standards of the culture being studied. This is known as "culture bias".

  • Poor choice of summary data - Even if the selection is unbiased, sometimes outliers are so extreme that they make the mean completely useless at best - and misleading at worst. For example:

    • It doesn’t make sense to average in the wealth of billionaires when you’re trying to get a sense of the wealth of a typical family in a community.

  • Confounding variables - Sometimes there’s an unaccounted for variable that is lurking in the background, influencing both of the variables we are studying and confusing the relationship between them. For example:

    • A study might find that cat owners are more likely to use public transportation than dog owners. But it’s not that owning a cat means you drive less: people who live in big cities are more likely to use public transportation, and also more likely to own cats.

More examples of confounding variables can be found in the correlations lesson: Correlation Does Not Imply Causation!.

And there are many other threats to validity out there!

Investigate

Optional Project: When Data Science Goes Bad

In this Project: When Data Science Goes Bad, students pretend to be terrible data scientists who develop and support claims based on faulty sampling techniques (selection bias, bias in the study design, poor choice of summary data, and confounding variables). This is a fun opportunity for your students to demonstrate their understanding of the impact of various threats to validity.

Life is messy, and there are always threats to validity.

Data Science is about doing the best you can to minimize those threats, and to be up front about what they are whenever you publish a finding.

When you do your own analysis, make sure you include a discussion of the threats to validity!

Synthesize

Why is it important to consider potential threats to validity?

Want to check student mastery of the content you’ve just taught? Administer Threats to Validity Checkpoint 1 (Desmos) to get a snapshot of your students' current level of mastery. Make sure you have created a link or code for your class to the assessment.

If you’d prefer to wait until your students have completed the entire lesson to check mastery, we also offer a cumulative assessment at the end of Fake News!, below.

🔗Fake News! 20 minutes

Overview

Students are asked to consider the ways in which statistics are misused in popular culture, and become critical consumers of some statistical claims. Finally, they are given the opportunity to misuse their own statistics, to better understand how someone might distort data for their own ends.

Launch

You have already seen a number of ways that statistics can be misused:

  • Using the mean instead of the median with heavily-skewed data

  • Using a correlation to imply causation

  • Incorrectly explaining the r-value from Linear Regression as corresponding to something happening "some percentage of the time" instead of describing "the percentage of the variation that is explained by the explanatory variable"

There are many other ways to mislead the audience, including:

  • Intentionally using the wrong chart - Suppose someone was asked to prepare a report on the demographics of the people holding positions of power in their city government. If the city had a significant Black population, and no Black elected officials, it should be cause for further investigation. But, if someone were trying to avoid addressing the issue, they might opt to display a pie chart (hiding that lack of representation) instead of displaying a bar chart (that would show an empty bar) in hopes that nobody would even notice the issue! Note: Pie charts could be used responsibly for this same scenario if a pie chart displaying the demographics of the city’s population was presented alongside a pie chart of the demographics of the city’s elected officials!

  • Changing the scale of a chart - Changing the y-axis of a scatter plot can make the slope of the regression line seem smaller ("look, that line is basically flat anyway!") or larger ("look how quickly things have changed!").

With all the news being shared through newspapers, television, radio, and social media, it’s important to be critical consumers of information!

Investigate

  • On Fake News, you’ll find some deliberately misleading claims made by slimy Data Scientists.

    • Identify why each of these claims should not be trusted.

  • Once you’ve finished, turn to Lies, Darned Lies, and Statistics.

    • Come up with four misleading claims based on data or displays from your dataset.

  • Trade papers with another group, and see if you can figure out why each other’s claims are not to be trusted!

  • What "lies" did you tell?

  • Was anyone able to stump the other group?

Synthesize

  • Where have you seen statistics misused in the real world?

  • Over the next several weeks, keep your eyes peeled for misused statistics and bring the examples you find to class to share!

Want to check student mastery of the content you’ve just taught? Administer Threats to Validity Checkpoint 2 (Desmos) to get a snapshot of your students' current level of mastery. Make sure you have created a link or code for your class to the assessment.

Alternatively, we offer a compilation of all four Checkpoints in Threats to Validity Cumulative Assessment (Desmos).

These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927). CCbadge Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.