A while back, we found that a nation’s total GDP explained about 30% of the variation in median life expectancy. In other words, our predictor wasn’t that great. That’s not a surprise: more populous countries will usually have a higher GDP! We’re just comparing big countries to small ones, not rich countries to poor ones. Do you think that the GDP per person - which is a measure of individual wealth - would be a better indicator?
Turn to Page 36 and take two minutes to write down what you think.
We have our countries table, but that only tells us the total GDP. We need to find a way to add a new column to the table, which is calculated based on other columns. By the end of this lesson, you’ll know how to do just that.
Extending Tables
Overview
Learning Objectives
Evidence Statements
Product Outcomes
Materials
Preparation
Extending Tables
(Time 10 minutes)
Turn to Page 37 in your workbook, and complete the exercise.
In this activity, students manually compute entries to new columns, based on values in a table’s columns that area already filled in.
In this workbook exercise, you’ve done something very different from what we’ve learned so far: you computed a new column based on existing data in the table. In the sock exercise, this means creating a new column called price-per-sock, and filling in a new value for each row, based on the price and socks values in that row. Creating a new table with an additional column is called extending a table.
Do these outputs match your answers for the workbook exercise?
extend produces a table with a new column, containing data that is computed by an expression over each row. It is similar sieve in that it applies this expression to each row, but is different because it CREATES data instead of removing it.
Let’s examine the code used to create the basketball-extended table:
The extend keyword comes first, since we might want to use our new columns for sieving, sorting or selecting later.
It is followed by the table being extended, in this case basketball.
Then, the using keyword, in the same place as the sieve commands This tells Pyret which columns we will use to extend the table.
Next are the names of the columns being used, followed by a colon :
The expression on the next line is very special:
We give our new column the name total, followed by a colon :, then an expression to compute the value of each entry in the column. This expression is the sum of a player’s points in each game.s
Finally, the query ends like all other table queries; with the end keyword.
We encourage you to question the class individually on what they think is the purpose of each component of the extend query. Ask "what do the names after the using keyword are", etc.
In your Definitions Area, follow the directions in the code for the first exercise, to extend the stores table in Pyret. You should create a column called price-per-sock.
This could also be a group activity, where you are live coding and prompting the class to help you write the correct definition.
Suppose we want to find out which menu items have the highest amount of sodium per gram. Use the sodium and serving-size columns to write an extend query that will answer this question.
Body Building
Overview
Learning Objectives
Evidence Statements
Product Outcomes
Materials
Preparation
Body Building
(Time 15 minutes)
Now let’s get some practice using extend queries with our Table Plans! Turn to Page 38, and read the word problem carefully.
On a sheet of scrap paper, sketch a sample Start and End Table to make sure you have a clear picture of what you need to do.
Now start your table plan: do you need to add any new columns to the table? Yes! We need to add a protein-per-gram column, which is computed using the protein and serving-size columns. Write the extend query to do this.
Do we need to get rid of any rows? Yes! We want to keep only the rows that have a protein-per-gram higher than .12. Write the sieve query to do this.
Do the rows need to be any particular order? It’s not clear from the word problem, so we can choose: do we skip this step, or should we order them so your aunt can immediately see which food is best?
Do we have any unecessary columns? Yes! All we care about are the name, calories, and protein-per-gram columns. Write the select query to do this.
Once you’re done with your Table Plan, program your queries in the Definitions Area, under "4. Body Building".
Debrief and review with the class.
On page Page 45, write down what an order query is for.
Have some student volunteer what they wrote.
Term-Length
Overview
Learning Objectives
Evidence Statements
Product Outcomes
Materials
Preparation
Term-Length
(Time 15 minutes)
How common is it for a president to serve two terms? We’d like to see a histogram comparing 1-term vs. 2-term presidents. The length of a term is defined by .
Turn to page Page 39, and read the word problem carefully.
Complete the Table Plan to create the table you’ll need.
Extract the columns you’ll need to create the histogram
Define the histogram series, and then plot the histogram. If you’ve forgotten how to make a historgram series, a plot, or how to draw the plot, don’t forget to refer back to your contracts page!
Economic Indicators
Overview
Learning Objectives
Evidence Statements
Product Outcomes
Materials
Preparation
Economic Indicators
(Time 30 minutes)
Economics data scientists use something called a gdp-per-capita to compare countries’ productivity. This is calculated by dividing a country’s GDP by population. This estimates how productive the average citizen of a particular country is:
Have students calculate this by hand for one or two countries in the countries table.
The United Nations has hired our class to come up with a report on Africa’s economy. They want us to investigate whether population size or per-capita GDP is the more significant predictor for median life expectancy.
Turn to page Page 40, and read the word problem carefully.
Complete the Table Plan to create the table you’ll need.
Extract the columns you’ll need to create the scatterplots and predictor functions for both relationships: "population v. life expectancy" and "per-capita-GDP v. life expectancy".
Define the two scatterplot series, and then plot them both.
Define your predictor functions and r-squared values. Plot your predictor functions on top of your scatterplots, and use the r-squared values to determine which predictor better explains median life expectancy.
The UN also wants following information:
A bar chart of the gdp-per-capita for every country in Africa.
The mean and median gdp-per-capita values for countries in Africa.
Add code to your definitions window to answer these questions.
Closing
Overview
Learning Objectives
Evidence Statements
Product Outcomes
Materials
Preparation
Closing
(Time 15 minutes)
Open the Extend Syntax Errors file, and see if you can fix all the bugs you find. Once you’re done, uncomment each query by removing the hash sign (#) and click Run.
Take a few minutes and record your findings on Page 36. Do your findings match your hypothesis? What new questions does this raise?
You now have all the tools you need to complete your analysis for your final project! Open your Project File and Report, and finish the remaining questions. Be sure to explain why you chose to remove any rows with sieve or add any columns with extend, and to add the r-squared value for any correlations you find through linear regression.