Checking Your Work


Students consider the concepts of trust and testing: how do we know if a particular analysis is trustworthy?

Prerequisites

None

Relevant Standards


CSTA Standards

2-AP-17: Systematically test and refine programs using a range of test cases
3B-AP-21: Develop and use a series of test cases to verify that a program performs according to its design specifications.

K-12CS Standards

6-8.Computing Systems.Troubleshooting: Comprehensive troubleshooting requires knowledge of how computing devices and components work and interact. A systematic process will identify the source of a problem, whether within a device or in a larger system of connected devices.
9-12.Computing Systems.Troubleshooting: Troubleshooting complex problems involves the use of multiple sources when researching, evaluating, and implementing potential solutions. Troubleshooting also relies on experience, such as when people recognize that a problem is similar to one they have seen before or adapt solutions that have worked in the past.
P6: Testing and Refining Computational Artifacts

Next-Gen Science Standards

HS-SEP5-4: Use simple limit cases to test mathematical expressions, computer programs, algorithms, or simulations of a process or system to see if a model “makes sense” by comparing the outcomes with what is known about the real world.

Oklahoma Standards

OK.L1.IC.C.02: Test and refine computational artifacts to reduce bias and equity deficits.

Lesson Goals

Students will be able to…

Create a subset of data to verify that a given transformation works as advertised, using attributes of the transformation and the dataset.

Student-facing Lesson Goals

Let’s learn how to test the trustworthiness of a data analysis.

Materials

Preparation

Make sure all materials have been gathered.
Decide how students will be grouped in pairs.
Computer for each student (or pair), with access to the internet.
Student workbook, and something to write with.
Make sure all students can access the Trust-but-Verify Starter File.

Supplemental Resources

Language Table

Types | Functions | Values
Number | num-sqrt, num-sqr, mean, median, modes | 4, -1.2, 2/3
String | string-repeat, string-contains | "hello", "91"
Boolean | ==, <, <=, >=, string-equal | true, false
Image | triangle, circle, star, rectangle, ellipse, square, text, overlay, bar-chart, pie-chart, bar-chart-summarized, pie-chart-summarized, histogram | 🔵🔺🔶
Table | count, .row-n, .order-by, .filter, .build-column |

Confirming Analysis (30 minutes)

Overview

Students learn how to create a Testing Table, which is small enough to reason about and can be used to test whether code does the right thing.

Launch

Data Science and Computer Programming take samples for two different reasons. In Data Science, a main goal is to take a representative sample from a larger population, and use information from the sample to infer what’s true about the whole population. In programming, we often extract a smaller Table from a larger one to test that our code does what it’s supposed to. In this lesson, we focus on the tasks of programmers, and consider best practices for setting up a Testing Table that helps us check our code.

Uber and Google are making self-driving cars, which use artificial intelligence to interpret sensor data and make predictions about whether a car should speed up, slow down, or slam on the brakes. This AI is trained on a lot of sample data, which it learns from. What might be the problem if the sample data only included roads in California?
Law enforcement in many towns has started using facial-recognition software to automatically detect whether someone has a warrant out for their arrest. A lot of facial-recognition software, however, has been trained on sample data containing mostly white faces. As a result, it has gotten really good at telling white people apart, but often can’t tell the difference between people who aren’t white. Why might this be a problem?
Why might it be a bad thing to test medicines only on men (or only on women) before prescribing them to the general public?

Testing Matters!

A good Testing Table should be representative of the population, and relevant to what’s being analyzed. A good Testing Table should have…

At least the columns that matter: the ones we’ll be ordering or filtering by.
Enough rows to include different circumstances that are relevant to the task at hand. For instance, if our code is supposed to extract certain cats from the animals table, our Testing Table should include at least one animal that’s not a cat.
Rows that aren’t already sorted, if our analysis is supposed to sort for us.
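
The checklist above can be sketched in code. This is a minimal illustration in Python rather than Pyret, using a list of dictionaries as a stand-in for a Table. The rows, the column names (species, age, pounds), and the helper function names are all made up for this example and are not taken from the animals dataset or the Starter File.

```python
# Hypothetical Testing Table: each row is a dict. All rows are invented.
testing_table = [
    {"name": "Miso",    "species": "cat",    "age": 1, "pounds": 6.5},
    {"name": "Rex",     "species": "dog",    "age": 5, "pounds": 48.0},
    {"name": "Pepper",  "species": "cat",    "age": 9, "pounds": 11.0},
    {"name": "Clover",  "species": "rabbit", "age": 1, "pounds": 4.0},
    {"name": "Biscuit", "species": "cat",    "age": 1, "pounds": 3.0},
]


def has_required_columns(table, columns):
    """True if every row contains every column we plan to filter or order by."""
    return all(col in row for row in table for col in columns)


def has_matching_and_nonmatching_rows(table, predicate):
    """True if at least one row passes the filter and at least one row fails it."""
    results = [predicate(row) for row in table]
    return any(results) and not all(results)


def is_already_sorted(table, column):
    """True if the rows are already in ascending order by the given column."""
    values = [row[column] for row in table]
    return values == sorted(values)


def is_cat(row):
    return row["species"] == "cat"


print(has_required_columns(testing_table, ["species", "pounds"]))
print(has_matching_and_nonmatching_rows(testing_table, is_cat))
print(not is_already_sorted(testing_table, "pounds"))
```

Each printed line corresponds to one item in the checklist: the columns that matter are present, the table contains both cats and non-cats, and the rows are not already sorted by weight.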

Data scientists usually think in terms of samples that best serve the purpose of performing inference: Samples should be representative of the entire population, and large enough to get us fairly close to the truth about that population. Computer programmers need to think in terms of Testing Tables that best serve the purpose of verifying that their code does what it’s supposed to: The Tables should be designed to call attention to any imperfections in the code’s instructions.

Investigate

Testing Tables can also be used to verify that a certain analysis is correct. Code that filters a table to show only cats can’t be verified with a Testing Table that already has only cats. (Why not?)

Code that shows only the kittens, sorted in ascending order by weight, must be verified with a Table containing cats, non-cats, old and young cats, and rows that aren’t already sorted!
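
To see why the choice of Testing Table matters, here is a small sketch in Python (not Pyret). It defines one function that behaves as intended and three deliberately broken variants, then runs all four on a weak table and on a stronger one. Everything here is hypothetical: the function names, the rows, and the assumption that a "kitten" is a cat younger than 2 years. These are not the versions of fixed-cats found in the Starter File.

```python
# Assumption for this sketch: a kitten is a cat with age < 2.

def kittens_by_weight(table):
    """Intended behavior: keep only kittens, sorted by ascending weight."""
    kittens = [r for r in table if r["species"] == "cat" and r["age"] < 2]
    return sorted(kittens, key=lambda r: r["pounds"])


def broken_no_species_check(table):
    """Bug: keeps every young animal, not just cats."""
    young = [r for r in table if r["age"] < 2]
    return sorted(young, key=lambda r: r["pounds"])


def broken_no_age_check(table):
    """Bug: keeps cats of every age."""
    cats = [r for r in table if r["species"] == "cat"]
    return sorted(cats, key=lambda r: r["pounds"])


def broken_no_sort(table):
    """Bug: filters correctly but never sorts."""
    return [r for r in table if r["species"] == "cat" and r["age"] < 2]


# Weak table (invented rows): only kittens, already sorted by weight.
weak_table = [
    {"name": "Biscuit", "species": "cat", "age": 1, "pounds": 3.0},
    {"name": "Miso",    "species": "cat", "age": 1, "pounds": 6.5},
]

# Stronger table (invented rows): cats and non-cats, old and young, unsorted.
good_table = [
    {"name": "Miso",    "species": "cat",    "age": 1, "pounds": 6.5},
    {"name": "Rex",     "species": "dog",    "age": 5, "pounds": 48.0},
    {"name": "Pepper",  "species": "cat",    "age": 9, "pounds": 11.0},
    {"name": "Clover",  "species": "rabbit", "age": 1, "pounds": 4.0},
    {"name": "Biscuit", "species": "cat",    "age": 1, "pounds": 3.0},
]


def names(rows):
    return [r["name"] for r in rows]


expected = ["Biscuit", "Miso"]  # the kittens, lightest first

for fn in (kittens_by_weight, broken_no_species_check,
           broken_no_age_check, broken_no_sort):
    passes_weak = names(fn(weak_table)) == expected
    passes_good = names(fn(good_table)) == expected
    print(fn.__name__, "weak:", passes_weak, "good:", passes_good)
```

All four functions give the expected answer on the weak table, so that table cannot tell them apart. On the stronger table, only kittens_by_weight produces the expected rows, because each extra kind of row (a young non-cat, an old cat, unsorted weights) exposes one of the bugs.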

Turn to “Trust, but verify …” (Page 67) in your student workbook.
You’ve been given a function called fixed-cats and a description of what it claims to do.
List the names of the animals that you would use in a Testing Table to verify whether the function works as advertised.
When you’ve finished, open the Trust-but-Verify Starter File. There are three versions of fixed-cats here. Are they all correct? If not, which ones are broken?
Turn to “Trust, but verify…” (Page 68). Using the same Starter File, construct a Testing Table and figure out which (if any) of the functions are correct!

Synthesize

Complex analysis has more room for mistakes, so it’s critical to think about a Testing Table that allows us to trust that our code really does what it’s supposed to!

How would you check whether or not a facial recognition system was equally accurate for everyone?
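
One possible approach to the question above is to test the system on examples from every group and compare the accuracy for each group separately. The Python sketch below shows the idea. The records, the group labels "A" and "B", and the field names are all invented for illustration and do not describe any real system or dataset.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """Return the fraction of correct predictions for each group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["group"]] += 1
        if r["predicted"] == r["actual"]:
            correct[r["group"]] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Invented test results: each record is one attempt to identify a person.
results = [
    {"group": "A", "actual": "p1", "predicted": "p1"},
    {"group": "A", "actual": "p2", "predicted": "p2"},
    {"group": "A", "actual": "p3", "predicted": "p3"},
    {"group": "A", "actual": "p4", "predicted": "p4"},
    {"group": "B", "actual": "p5", "predicted": "p5"},
    {"group": "B", "actual": "p6", "predicted": "p5"},
    {"group": "B", "actual": "p7", "predicted": "p7"},
    {"group": "B", "actual": "p8", "predicted": "p7"},
]

print(accuracy_by_group(results))
```

A single overall accuracy number would hide the gap between the two groups in this made-up data, which is why the test set needs rows from every group, just as a Testing Table for a cat filter needs rows that aren’t cats.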

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap:Data Science by Emmanuel Schanzer, Nancy Pfenning, Emma Youndtsmith, Jennifer Poole, Shriram Krishnamurthi, Joe Politz, Ben Lerner, Flannery Denny, and Dorai Sitaram with help from Eric Allatta and Joy Straub is licensed under a Creative Commons 4.0 Unported License. Based on a work at www.BootstrapWorld.org. Permissions beyond the scope of this license may be available by contacting schanzer@BootstrapWorld.org.