(Also available in CODAP)
Students are introduced to the Data Cycle, a four-step scaffold for answering questions from a dataset… and then generating the next question! Students learn to identify (and ask) statistical questions by comparing and contrasting them with other kinds of questions.
Lesson Goals
Students will be able to…
Student-facing Lesson Goals
Materials
Supplemental Materials
Introducing the Data Cycle (10 minutes)
Overview
Students learn about the Data Cycle, which is a scaffold to support them in asking questions, thinking about how those questions relate to the data in front of them, analyzing that data, interpreting the results, and, ultimately, sharing their Data Story.
*The Data Cycle is from the Mobilizing IDS project and the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report.
Launch
Data Science is all about asking questions of data.
-
Sometimes the answer is easy to compute.
-
Sometimes the answer to a question is already in the dataset - no computation needed.
-
And sometimes the answer just sparks more questions!
-
With your partner, answer the questions below with any of the following:
-
How much does Snuggles weigh?
-
Are more animals male or female?
-
What strategies did you use?
-
Answers will vary! Sample follow-up questions for each response are provided.
-
"We looked at the table and counted"
-
What did you look at? What did you count?
-
-
"We used the starter file and the count function"
-
What inputs did you give the count function? And how did you know which inputs to use?
-
-
"We made a bar or pie chart"
-
How did you know to use that display? How did you know what rows and columns to use?
-
-
Data Scientists ask a ton of questions, and each question adds a chapter to their data story. Even if a question turns out to be a dead-end, it’s valuable to share what the question was and what work you did to answer it!
The Data Cycle is a roadmap that helps guide us through the process of data analysis.
1) We Ask Questions after observing the data.
2) We Consider Data by thinking about which parts of the dataset we need to answer our question. Sometimes we don’t have what we need, so we look for data elsewhere.
3) We Analyze the Data by completing calculations, creating data displays, creating new tables, or filtering existing tables. The results of this step are calculations, patterns, and relationships.
4) We Interpret the Data by answering our original question and summarizing the process we took and the results of our analysis. Sometimes the data cycle ends here, but often these interpretations lead to new questions… and the cycle begins again.
Investigate
Let’s take the Data Cycle for a spin!
(1) Ask Questions: First we have to think of a good question.
-
In the future, you will be coming up with your own questions. But, for demonstration purposes, this round let’s investigate: *Are more animals fixed or unfixed?*
-
Turn to your partner and discuss what ideas you have about how you might answer this question.
(2) Consider Data: To get the computer to answer our question, we’ll need to decide what part of our dataset to focus on!
-
With your partner, turn to the Animals Table or open the Animals Spreadsheet.
-
Do we need to look at all the rows to answer this question, or just some of them?
-
All the rows!
-
-
Do we need all the columns to answer this question, or just some of them?
-
Just the fixed column.
-
(3) Analyze the Data:
Once we know where to look, we can write code to build a table or display.
We could use count, bar-chart, or pie-chart to do this analysis and answer our question.
Pie charts might be the best choice, because we care more about the ratio ("2x as many fixed as unfixed") than the actual count ("20 fixed vs. 10 unfixed").
Once we decide that we want a pie-chart, and that we’re using it to look at the fixed column, the next step is to read the Contract and write the code!
-
Open the Animals Starter File and click "Run".
-
With your partner, build a pie chart to determine whether more animals are fixed or unfixed.
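A pie chart's slices come from tallying each value in the fixed column. Here is a rough analogue of that tally outside Pyret: a minimal Python sketch, not Bootstrap's count or pie-chart functions. It assumes a hypothetical list named fixed_column holding 18 true and 14 false values, which match the totals reported for this sample later in the lesson.

```python
from collections import Counter

# Hypothetical stand-in for the "fixed" column: 18 True and 14 False values,
# matching the totals this lesson reports for the Animals sample.
fixed_column = [True] * 18 + [False] * 14

# Tally how many times each value appears (one row per distinct value),
# which is the information a pie chart of this column is drawn from.
counts = Counter(fixed_column)

for value, n in counts.items():
    print(value, n)
```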
(4) Interpret the Data: Now that we’ve built our display, we can think about what we can learn from it and what else we might want to know. Even the answers to "simple" questions can lead to more interesting questions down the road!
-
What does true mean in the display?
-
"Fixed" is a Boolean column, so true means "yes - the animal is fixed"
-
-
Are more animals fixed or unfixed?
-
fixed
-
-
How could we describe that more specifically?
-
56.3% of the animals are fixed.
-
The ratio of fixed animals to unfixed animals is 18 to 14 or 9 to 7.
-
4 more animals are fixed than unfixed.
-
-
What other questions might come from finding the ratio of fixed to unfixed animals?
-
Sample responses: Is there a higher percentage of fixed dogs or fixed cats? At what age do animals get fixed? Do fixed animals get adopted more quickly than unfixed animals?
-
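The three "more specific" descriptions above are all arithmetic on the same two counts. This minimal Python sketch uses only the 18 and 14 from the lesson. It computes the percentage, reduces the ratio with the greatest common divisor, and takes the difference.

```python
from decimal import Decimal, ROUND_HALF_UP
from math import gcd

fixed, unfixed = 18, 14
total = fixed + unfixed

# Percentage of animals that are fixed, rounded half-up to one decimal place.
percent_fixed = (Decimal(100) * fixed / total).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Reduce the ratio 18:14 by dividing both parts by their greatest common divisor.
d = gcd(fixed, unfixed)
ratio = (fixed // d, unfixed // d)

# How many more animals are fixed than unfixed.
difference = fixed - unfixed

print(f"{percent_fixed}% fixed")
print(f"ratio {fixed}:{unfixed} = {ratio[0]}:{ratio[1]}")
print(f"{difference} more fixed than unfixed")
```

The exact value is 56.25%. Python's built-in round(56.25, 1) gives 56.2 because it rounds exact halves to the nearest even digit. The sketch therefore uses decimal's ROUND_HALF_UP to match the 56.3% above.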
The Data Story describes how each step in the Data Cycle was used to go from a question to an answer, and then to the next question. When analyzing a real dataset, Data Scientists might explore lots of questions, resulting in many different Data Stories to tell.
Let’s take a look at a story that’s been written about the Data Cycle we just completed:
"We wanted to know if more animals at the shelter were fixed or unfixed. To answer this, we made a pie-chart using the "fixed" column of all the animals in the shelter. We found that more animals were fixed (18) than unfixed (14). This made us wonder if that percentage is the same for all species and all ages - and whether fixed animals get adopted faster than unfixed ones."
-
What information did they include in the data story? Did they leave anything out?
-
What steps from the Data Cycle do you see in this story?
-
The story included…
-
The question ("We wanted to know if more animals at the shelter were fixed or unfixed.")
-
The data considered ("…the "fixed" column of all the animals in the shelter.")
-
The analysis ("…we made a pie-chart…")
-
The interpretation ("…more animals were fixed (18) than unfixed (14)")
-
The Wonders those findings generated ("…if that percentage is the same for all species and all ages - and whether fixed animals get adopted faster…")
-
-
Each chapter in the Data Story is valuable, and each turn of the Data Cycle is another chapter to add to your story!
Synthesize
-
What are the four steps of the Data Cycle?
-
Ask Questions
-
Consider Data and decide which rows and columns we need
-
Analyze the Data with calculations and displays
-
Interpret the Data to answer our questions and consider what new questions we have
-
-
What happens when we finish the data cycle?
-
We write our data story.
-
We start a new data cycle to answer our new questions!
-
What Questions Can We Ask? (15 minutes)
Overview
Students consider the range of questions we can ask about data and practice categorizing them as "lookup", "arithmetic", or "statistical" questions, or as questions that simply can’t be answered from the data.
Launch
How do we know what questions to ask? There’s an art to asking the right questions, and good Data Scientists think hard about what kind of questions can and can’t be answered.
Most questions fall into one of four categories:
-
Lookup questions - Answered just by reading the table. No further calculations are necessary: once you find the value, you’re done! Examples of lookup questions might be “How many legs does Felix have?” or “What species is Sheba?”
-
Arithmetic questions - Answered by doing calculations (comparing, averaging, summing, etc.) with the values from a single column. Examples of arithmetic questions might be “How much does the heaviest animal weigh?” or “What is the average age of animals from the shelter?”
-
Statistical questions - These are the most interesting kinds of questions! They are often best asked with "in general" attached, because we expect some variability and the answer isn’t black and white. If we ask "Are dogs heavier than cats?", we know that not every dog is heavier than every cat! We just want to know whether it is generally true or generally false.
-
Questions we can’t answer - We might wonder where the animal shelter is located, or what time of year the data was gathered! But the data in the table won’t help us answer those questions, so as Data Scientists we might need to do some research beyond the data. And if nothing turns up, we simply recognize that there are limits to what we can analyze.
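The difference between the first two categories shows up clearly in code. A lookup reads one cell, while an arithmetic question runs a calculation over one column. Statistical and can't-answer questions have no single fixed computation, so they are not shown. This minimal Python sketch is not the lesson's Pyret. It uses made-up rows with hypothetical names and values, not rows from the Animals Dataset.

```python
# Made-up rows for illustration only; names and values are hypothetical.
animals = [
    {"name": "A", "legs": 4, "age": 2, "pounds": 10.0},
    {"name": "B", "legs": 4, "age": 6, "pounds": 45.0},
    {"name": "C", "legs": 0, "age": 1, "pounds": 0.1},
]

# Lookup question: find one row and read one cell. No calculation needed.
legs_of_b = next(row["legs"] for row in animals if row["name"] == "B")

# Arithmetic questions: a calculation over the values in a single column.
heaviest_weight = max(row["pounds"] for row in animals)
average_age = sum(row["age"] for row in animals) / len(animals)

print(legs_of_b, heaviest_weight, average_age)
```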
-
What kind of question is "Are more animals fixed or unfixed?"? How do you know?
-
It’s an arithmetic question because answering it requires comparing two simple calculations.
-
-
What kind of question is "Are snails or tarantulas taller?"? How do you know?
-
It’s a question we can’t answer because there isn’t any information in this data set about the heights of the animals.
-
-
What kind of question is "How old is Toggle?" How do you know?
-
It’s a lookup question because it can be answered by just looking at the table.
-
-
What kind of question is "Are older animals adopted more quickly than younger animals?" How do you know?
-
It’s a statistical question because we expect some variability in the data and are wondering what is happening in general.
-
Investigate
Find the table at the bottom of the Which Question Type? page.
For now, complete only the "Question Type" column, and ignore the other columns, titled "Which Rows" and "Column(s)".
-
Have students return to the Wonders they wrote on Questions and Column Descriptions in the Introduction to Data Science lesson, and decide whether each one is a Lookup, Arithmetic, Statistical, or Can’t Answer question.
-
For more practice, have students complete Question Types: Animals, by coming up with examples of each type of question for the Animals Dataset.
Common Misconceptions
Students generally struggle to make the leap into asking statistical questions. It’s worth taking time on this to help them come up with better (and more engaging!) questions later.
-
They may think that "What’s the average weight of the animals?" is a statistical question, because "average" is a term that shows up in statistics. But computing the average is just pure arithmetic!
-
A statistical question would be "What’s the typical weight of an animal?", because it does not specify a particular arithmetic process. The answer could be the mean, the median, or even the mode! Figuring out which one to use depends on the distribution of the data, which we discuss in detail in our Measures of Center lesson.
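This minimal Python sketch shows why "typical weight" has no single recipe. It uses the standard statistics module on six made-up weights, not data from the Animals Dataset, and one of them is an unusually heavy animal. The single heavy value pulls the mean up to about 10.3 pounds. The median is 4.5 and the mode is 4, so choosing a measure really does depend on the distribution.

```python
import statistics

# Made-up weights in pounds (NOT from the Animals Dataset), with one very heavy animal.
pounds = [3, 4, 4, 5, 6, 40]

mean_weight = statistics.mean(pounds)      # sum divided by count: pulled up by the 40
median_weight = statistics.median(pounds)  # middle of the sorted values
mode_weight = statistics.mode(pounds)      # most frequent value

print("mean:", mean_weight)
print("median:", median_weight)
print("mode:", mode_weight)
```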
Synthesize
-
How would you explain the difference between Lookup, Arithmetic, and Statistical questions?
-
When you looked back at your Wonders from the Animals Dataset, were they mostly Lookup questions? Arithmetic? Statistical?
-
What are some examples of statistical questions the owner of a sports team might ask? Or a researcher who is trying to see if a cancer drug is effective? Or a principal who wants to know what will help their students the most?
What Data Do We Need? (20 minutes)
Overview
Students bridge from a human-language question into something more formal, by specifying the rows and columns they would need to examine.
Launch
Tables are made of Rows and Columns.
Each Row represents one member of our population.
-
In the Animals Dataset, each row represents a single animal.
-
In a weather forecast, each row might represent the weather at a particular hour.
-
In a dataset of students, each row might represent one of you!
Columns, on the other hand, represent information about each row.
-
Every animal, for example, has columns for its name, species, sex, age, weight, number of legs, whether it is fixed or unfixed, and how long it took to be adopted.
-
Our weather table might have columns for temperature, wind, and whether or not it will rain.
-
Every student could have columns for their name, height, hair color, birthday, favorite food, etc.
When considering data…
-
We first ask: Which Rows do we need?
-
Then we ask: Which Column(s) do we care about?
If we want to know which animal is the heaviest,
-
we are interested in every row of our table,
-
and we’ll focus on the pounds column of our table.
If we want to know which cat is the heaviest, we only care about rows for cats, so
-
first, we’ll need to make a new table of the rows for cats,
-
then, we’ll focus on the pounds column of our new table.
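The two "heaviest" questions differ only in which rows we keep. This minimal Python sketch is a stand-in for the Pyret filtering students will learn later. It assumes a few made-up rows with hypothetical names (not rows from the Animals Dataset). It first uses every row, then builds a new table of just the cats before looking at the pounds column.

```python
# Made-up rows for illustration only; they are not rows from the Animals Dataset.
animals = [
    {"name": "A", "species": "cat", "pounds": 8.0},
    {"name": "B", "species": "dog", "pounds": 50.0},
    {"name": "C", "species": "cat", "pounds": 12.5},
    {"name": "D", "species": "rabbit", "pounds": 3.0},
]

# Which animal is heaviest? Use every row and focus on the pounds column.
heaviest = max(animals, key=lambda row: row["pounds"])

# Which cat is heaviest? First build a new table of only the cat rows...
cats = [row for row in animals if row["species"] == "cat"]
# ...then focus on the pounds column of that new table.
heaviest_cat = max(cats, key=lambda row: row["pounds"])

print(heaviest["name"], heaviest_cat["name"])
```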
Data scientists filter tables to make new tables all the time!
While we haven’t learned how to filter and build tables in Pyret yet, we are ready to start thinking about it…
Which Rows and Columns do we need to answer each of the following questions?
-
How old is Mittens?
-
We only need one row for Mittens, and we just need the age column.
-
-
Are more animals fixed or unfixed?
-
We need to look at all the rows, but the only column we care about is fixed.
-
-
How many fixed animals are rabbits?
-
First, we’ll make a new table of just the rows for fixed animals.
-
Then, we’ll focus only on the species column in our new table.
-
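The rabbit question follows the same two steps: filter the rows, then focus on one column. The minimal Python sketch below is not the lesson's Pyret. It uses made-up rows with hypothetical names and values, not data from the Animals Dataset.

```python
# Made-up rows for illustration only; not from the Animals Dataset.
animals = [
    {"name": "A", "species": "rabbit", "fixed": True},
    {"name": "B", "species": "rabbit", "fixed": False},
    {"name": "C", "species": "cat", "fixed": True},
    {"name": "D", "species": "rabbit", "fixed": True},
]

# Step 1: keep only the rows for fixed animals (a new, smaller table).
fixed_animals = [row for row in animals if row["fixed"]]

# Step 2: look only at the species column of the new table and count rabbits.
species_column = [row["species"] for row in fixed_animals]
fixed_rabbits = species_column.count("rabbit")

print(fixed_rabbits)
```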
Investigate
-
Return to Which Question Type? For each question, determine:
-
Which rows would you look at?
-
Which columns would you look at?
-
Write your answers in the last two columns of the table at the bottom.
-
-
Complete Data Cycle: Consider Data.
Have students share their answers and discuss any questions they have about these pages.
Synthesize
How does asking "Which rows? Which columns?" help us figure out what code to write?
Data Cycle Practice (15 minutes)
Overview
Students are introduced to the Data Cycle Pages they will be working with for the remainder of Bootstrap:Data Science.
If you’d like to start your students with a mini version of the Data Cycle, have them complete Data Cycle: Analyzing with Count and test their code in Pyret.
Launch
Throughout the remainder of Bootstrap:Data Science we will be using Data Cycle pages to help us answer our questions and tell our data stories.
-
Let’s take a moment to Notice and Wonder about how Data Cycle: Distribution of Fixed Animals is formatted.
So far we have always worked with the Animals Starter File, which is a sample taken from a larger data set.
To complete this page we will be working with the Expanded Animals Starter File.
-
What else do you Notice?
-
Be sure to surface the following:
-
the directions at the top tell you what kind of display you are going to make.
-
there is a box in the top right corner where you will circle what kind of question is being asked.
-
in the first data cycle, the rows and columns you’ll need are already filled in, but you’ll have to fill them in yourself for the second data cycle.
-
there is a fill-in-the-blank sentence in the Interpret section, as well as room for you to write some questions.
-
-
What do you Wonder?
Investigate
-
When you’re done, work on Data Cycle: Distribution of Categorical Columns.
-
For this page you will need to come up with your own questions.
-
You might be able to use a question from your first data cycle!
-
Synthesize
How do Contracts and the Data Cycle work together to help us figure out what code to write to answer our questions?
These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.