Exercises

If you ever get really stuck on an exercise, skip it and ask for help!

Exercise 4.1: Plotting

a) Simulate 10000 throws of two dice, and record in a list the highest of the two dice at each throw.

  • Plot the frequency of the numbers 1 through 6.

  • In your simulation, how many throws have 1 as the highest of the two dice?

  • In your simulation, how many throws have 6 as the highest of the two dice?
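As a starting point for part a), the simulation and counting steps might look like the minimal sketch below. It assumes only the standard library; the names n_throws, highest and counts are illustrative, the seed is fixed purely so the run is repeatable, and the number of throws is smaller than the exercise asks for. The plot itself is left to you.

```python
import random
from collections import Counter

random.seed(0)  # assumption: fixed seed only so the sketch is repeatable

n_throws = 1000  # illustrative; the exercise asks for 10000

# For each throw, roll two dice and keep the higher of the two values.
highest = [max(random.randint(1, 6), random.randint(1, 6)) for _ in range(n_throws)]

# Count how often each value occurs as the highest die.
counts = Counter(highest)
for face in range(1, 7):
    print(face, counts[face])
```

For comparison with your simulated counts: the probability that the highest of two fair dice equals k is (2k - 1)/36, so roughly 1 throw in 36 should give 1 and roughly 11 in 36 should give 6.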

b) Create a list with 20 elements [1, 2, 4, 8, 16, ...], each element being the double of the preceding one. Plot the growth of the elements in the list.

c) Start with the following list.

guestInfo = ['23', '3', '18', '2', '21', '3', '75', '19', '68', '12', '34', '9', '34', '7', '36', '7', '28', '21', '21', '4', '55', '14', '55', '18', '45', '21', '52', '16', '54', '12', '21', '6', '21', '6']

The list guestInfo contains strings representing information about different guests of a resort. The first element represents the age of a guest, while the second element represents how many days this guest stayed at the resort (the "length of stay"). The same holds for every subsequent pair of elements. So, for example, guestInfo[4] represents the age of a guest and guestInfo[5] represents how many days this guest stayed; guestInfo[6] represents the age of a different guest, while guestInfo[7] represents how many days this different guest stayed. Perform the tasks described below.

c1) Print the number of guests whose information is represented by the list guestInfo.

c2) Rearrange the list so that the ages are in ascending order, keeping each age together with its length of stay. So, after the change, guestInfo[0] will be the age of the youngest guest and guestInfo[1] will be the length of this guest’s stay.
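One pitfall in c2) is that the elements are strings, and strings compare character by character, so '9' sorts after '40'. The sketch below illustrates one possible approach on a small hypothetical list (flat, pairs and flat_sorted are illustrative names, not part of the exercise): group the elements into pairs, sort the pairs by the numeric value of the age, then flatten again.

```python
# Hypothetical data in the same layout as guestInfo: age, stay, age, stay, ...
flat = ['40', '3', '9', '10', '25', '1']

# Group consecutive elements into (age, stay) pairs.
pairs = list(zip(flat[0::2], flat[1::2]))

# Sort by the age converted to an integer, not by the string itself.
pairs.sort(key=lambda pair: int(pair[0]))

# Flatten the sorted pairs back into a single list.
flat_sorted = [item for pair in pairs for item in pair]
print(flat_sorted)  # ['9', '10', '25', '1', '40', '3']
```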

c3) Convert to an integer every element of the list.

c4) The list guestInfo contains multiple guests of the same age. We are interested in the average length of stay for each age. So we derive the list below, in which each age is followed by the expected value of the length of stay of guests of that age.

guestInfoAverage = [18, 2.0, 21, 4.75, 36, 7.0, 45, 21.0, 52, 16.0, 54, 12.0, 55, 16.0, 68, 12.0, 75, 19]

From guestInfoAverage and respecting order, produce two lists: guestInfoAge and guestInfoAverageDays. The first includes only the ages in guestInfoAverage (even positions). The second includes only the expected values of the lengths of stays (odd positions).
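Taking the elements at even and odd positions can be done with slicing, where the third slice value is the step. The sketch below uses a small hypothetical list rather than guestInfoAverage; the names pairs, keys and values are illustrative.

```python
# Hypothetical data: a key at each even position, its value at the next odd position.
pairs = [10, 1.5, 20, 2.5, 30, 3.5]

keys = pairs[0::2]    # positions 0, 2, 4, ...
values = pairs[1::2]  # positions 1, 3, 5, ...

print(keys)    # [10, 20, 30]
print(values)  # [1.5, 2.5, 3.5]
```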

c5) Produce a line plot of guestInfoAge and guestInfoAverageDays to see if there is any trend relating age and the expected length of stay. (Add useful text, such as a title and axis labels, to make your chart user-friendly.)

c6) Think about how you could have proceeded if the initial list guestInfo contained information gaps marked as 'n/a'.
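One possible way to handle such gaps, among several, is to drop every guest whose age or length of stay is missing before converting to integers. The sketch below assumes this policy and uses a small hypothetical list (raw and clean are illustrative names); other choices, such as keeping the guest and filling in a default value, may suit your analysis better.

```python
# Hypothetical data with gaps, in the same layout as guestInfo.
raw = ['30', '5', 'n/a', '2', '41', 'n/a', '19', '7']

clean = []
for age, days in zip(raw[0::2], raw[1::2]):
    # Assumed policy: skip the whole guest if either value is missing.
    if 'n/a' in (age, days):
        continue
    clean.extend([int(age), int(days)])

print(clean)  # [30, 5, 19, 7]
```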

Exercise 4.2: Correlation

a) Without using the function pearsonr, calculate the Pearson correlation coefficient of the following two lists, which contain ages and the life expectancy (expected remaining years) at each age. (You may use functions from numpy.)

ages = [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
expected_years = [21.58, 20.0, 20.08, 19.35, 18.0, 17.0, 17.18, 16.47, 15.0, 15.07, 14.39, 13.0, 13.05, 12.4, 11.0, 11.14, 10.53, 9.94, 9.37, 8.0, 8.28, 7.0, 7.26, 6.0, 6.33, 5.0, 5.48, 5.08, 4.0, 4.37]

(Data from https://www.health.ny.gov/health_care/medicaid/publications/docs/gis/20ma08_att_i.pdf male column)
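As a reminder, the Pearson correlation coefficient of two equally long lists x and y is the sum of (x_i - mean_x)(y_i - mean_y) divided by the square root of the product of the sum of (x_i - mean_x)^2 and the sum of (y_i - mean_y)^2. The sketch below implements this formula in plain Python on a small hypothetical dataset; the function name pearson and the lists xs and ys are illustrative, and a numpy version would follow the same steps.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: ys decreases exactly linearly as xs increases.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
print(pearson(xs, ys))  # -1.0
```

A value close to -1 indicates a strong decreasing linear relationship, a value close to 1 a strong increasing one, and a value close to 0 little or no linear relationship.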

b) Create a list with the 82 integers from 18 to 99 in ascending order (i.e. [18, 19, …, 99]). Create another list of the same length with random integers between 0 and 20.

  • What value for the Pearson correlation coefficient do you expect for these sets of data? Why?

  • Calculate the Pearson correlation coefficient of the two lists using the function pearsonr.