{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Processing and Statistics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning goals:\n",
    "\n",
    "After this week, the student will be able to:\n",
    "- Read and write files using Python\n",
    "- Calculate the expected value, variance and standard deviation of a set of numbers\n",
    "- Write programs using user input\n",
    "- Solve decision problems using expected values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statistical measures\n",
    "\n",
    "Before the lecture you should read these excerpts covering <a href=\"https://uio.instructure.com/courses/33071/files/1339652\">An Introduction to Expected Value and Decision</a>. \n",
    "\n",
    "The lecture will focus on how we estimate expected value from measurements, so you should first have a grasp on how we find the expected value exactly when we know the underlying distribution, and how we can reason using expected values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading and Writing Text Files "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The standard way to access text files in Python is using the built-in function `open`. This function takes two arguments; the first argument is the name of the file, and the second argument specifies whether the file shall be read or written to. If the second arguments is `\"r\"`, it means we want to open the file for reading, whereas giving `\"w\"` means we want to open the file for writing. If some text is to be appended to the contents of an already existing file, we can use `\"a\"` as an argument. Giving `\"w+\"` as the second argument to `open` will tell the machine that the file is to be both written to and read. The default option for the second arguments is `\"r\"`, so if we don't provide a second argument to `open`, Python will assume that we want to be in read-mode.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Writing Text to a File"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Writing to a text file in Python can be done by using the built-in function `open`, and giving the string `\"w\"` as the second argument to the `open`-function. In the following example, a file with the name `textfile.txt` is created. The file name extension `.txt` is customary for text files, but we could in principle give the file any extension we want. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "ofile = open(\"textfile.txt\", \"w\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The command above creates a file on the computer, called `textfile.txt`, and creates a `File` object that we call `ofile`. `File` objects is something we use in Python to handle files. We want to proceed by filling this file with some information. Below we fill the file with the sentence \"This is line number\" and then the number of the line. This is done by creating a string constaining the quote, and writing this string to the file `textfile.txt` through the `write`-function on the `File`-object, as shown below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range(10):\n",
    "    ofile.write(f\"This is line number {i+1:.0f}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we are finished writing to the file, we close it using the `close`-function on the `File` object. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "ofile.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This makes sure that everything we wanted to write to the file is actually written there, and it can prevent slowdown when very large files which are opened are no longer needed. This is similar to ejecting a USB stick in the operating system before actually detaching it from the computer physically. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading Text from a File"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want to analyze texts, we usually want to read files, rather than writing them. To read files, we use the built-in Python function `open`, with `\"r\"` as the second argument, to create a `File`-object that reads a particular file. We can use the `read`-function on the `File`-object to get the entire contents of the file returned as a single string. This is shown in the example below, which opens the file that we just created in the last section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is line number 1\n",
      "This is line number 2\n",
      "This is line number 3\n",
      "This is line number 4\n",
      "This is line number 5\n",
      "This is line number 6\n",
      "This is line number 7\n",
      "This is line number 8\n",
      "This is line number 9\n",
      "This is line number 10\n",
      "\n"
     ]
    }
   ],
   "source": [
    "ifile = open(\"textfile.txt\", \"r\")\n",
    "content = ifile.read()\n",
    "print(content)\n",
    "\n",
    "ifile.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you only want to read a single line, you can use the `readline` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is line number 1\n",
      "\n"
     ]
    }
   ],
   "source": [
    "ifile = open(\"textfile.txt\", \"r\")\n",
    "content = ifile.readline()\n",
    "print(content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `File`-object reads from the beginning, and cannot re-read something it has read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is line number 2\n",
      "\n",
      "This is line number 3\n",
      "This is line number 4\n",
      "This is line number 5\n",
      "This is line number 6\n",
      "This is line number 7\n",
      "This is line number 8\n",
      "This is line number 9\n",
      "This is line number 10\n",
      "\n",
      "The file has reached the end, so nothing will be printed below\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(ifile.readline())\n",
    "print(ifile.read())\n",
    "print(\"The file has reached the end, so nothing will be printed below\")\n",
    "print(ifile.read())\n",
    "\n",
    "ifile.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we ask Python to open the file `textfile.txt`, Python looks for this file in the same folder as where we are running Python from. We typically run from the same path as where the `.py` file is saved. Therefore, we usually want to have the text file in the same folder as the Python program reading it. If your text file is close to your python script, but not in the same folder, you can use the relative path to the file to open it. For instance: `Poems/textfile.txt`.\n",
    "\n",
    "If the text file is somewhere completely different, you can tell the computer explicitly where to look for it. If the file `\"textfile.txt\"` lies in the `Documents`-folder on your computer, you can (on unix-based operating systems) use the absolute path `\"/Users/username/Documents/textfile.txt\"`, as the first argument of the `open`-function. The username is the name of the user at the computer you are using."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "### In-class exercises \n",
    "\n",
    "**a)** Read the file `zarathustra.txt` from the course webpage. Use the `split` function to divide the text into a list of words, and then count the number of occurences of the name \"Zarathustra\". Note that \"Zarathustra\" and \"zarathustra\" are regarded as different words by the computer.\n",
    "\n",
    "**b)** Read the file `prince.txt` from the course webpage. Use various string operations to clean the text in the following way\n",
    "  1. Make all letters lowercase\n",
    "  2. Remove special signs such as `.,:;` so that they don't occur at the end of words. \n",
    "  3. Split into a list of words\n",
    "  4. Count the number of occurences of the following words\n",
    "    1. \"prince\"\n",
    "    2. \"dante\"\n",
    "    3. \"pope\"\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have now learned how to read text from a file into python. Next we need to actually do something useful with it.\n",
    "\n",
    "Given a text, we might be interested in how long the words are, if it is different for books written for adults and children. Are words typically longer in norwegian than in english? Has it changed over time? How diverse are the words? How long are the sentences?\n",
    "\n",
    "In this section, we will go through how to calculate the average of a list of numbers, and later the standard deviation.\n",
    "\n",
    "First, we get hold of a list of words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['chapter', '1', 'it', 'is', 'a', 'truth', 'universally', 'acknowledged', 'that', 'a', 'single', 'man', 'in', 'possession', 'of', 'a', 'good', 'fortune', 'must', 'be', 'in', 'want', 'of', 'a', 'wife']\n"
     ]
    }
   ],
   "source": [
    "ifile = open(\"assets/prideandprejudice.txt\", \"r\", encoding=\"utf-8\") # the text file is found on the course website under \"Extra Resources\"\n",
    "text = ifile.read()\n",
    "text = text[text.find(\"#START\") + 6:] # removing all text before the actual contents of the book\n",
    "# making the text easy to work with\n",
    "for sign in [\"?\", \"!\", \".\", \"-\", \",\", \"“\", \"”\", \"_\", \":\", \"(\", \")\"]:\n",
    "    text = text.replace(sign, \" \") # removing unwanted characters\n",
    "text = text.lower() # we do not care about the case\n",
    "words = text.split() # making a list of words\n",
    "\n",
    "print(words[:25])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "ifile.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we create a list containting the lengths of all the words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.421409125200905\n"
     ]
    }
   ],
   "source": [
    "lengths = []\n",
    "for word in words:\n",
    "    lengths.append(len(word))\n",
    "from numpy import mean    \n",
    "print(mean(lengths))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Printing the first 25 numbers to make sure it worked:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[7, 1, 2, 2, 1, 5, 11, 12, 4, 1, 6, 3, 2, 10, 2, 1, 4, 7, 4, 2, 2, 4, 2, 1, 4]\n"
     ]
    }
   ],
   "source": [
    "print(lengths[:25])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "````{admonition} The length of the first 1000 words\n",
    ":class: toggle\n",
    "Here are the lenghts of the first 1000 words if you need a list of numbers to do some statistics on.\n",
    "\n",
    "```\n",
    "lengths = [7, 1, 2, 2, 1, 5, 11, 12, 4, 1, 6, 3, 2, 10, 2, 1, 4, 7, 4, 2, 2, 4, 2, 1, 4, 7, 6, 5, 3, 8, 2, 5, 2, 4, 1, 3, 3, 2, 2, 3, 5, 8, 1, 13, 4, 5, 2, 2, 4, 5, 2, 3, 5, 2, 3, 11, 8, 4, 2, 2, 10, 3, 8, 8, 2, 4, 3, 2, 5, 2, 5, 9, 2, 4, 2, 6, 4, 3, 4, 2, 3, 3, 3, 4, 3, 5, 4, 11, 4, 2, 3, 2, 4, 2, 6, 7, 4, 2, 3, 3, 3, 2, 2, 8, 4, 3, 3, 4, 3, 4, 4, 4, 3, 3, 4, 2, 3, 5, 2, 2, 6, 4, 2, 6, 2, 3, 3, 4, 2, 4, 3, 3, 5, 2, 5, 3, 4, 11, 3, 4, 2, 4, 2, 3, 1, 4, 2, 9, 2, 7, 2, 4, 3, 10, 6, 3, 2, 4, 3, 4, 4, 3, 4, 4, 4, 11, 2, 5, 2, 1, 5, 3, 2, 5, 7, 4, 3, 5, 2, 8, 4, 2, 4, 4, 2, 6, 2, 1, 6, 3, 4, 2, 3, 3, 5, 3, 3, 2, 4, 9, 4, 2, 4, 2, 6, 4, 2, 6, 12, 4, 2, 2, 2, 4, 10, 6, 10, 3, 4, 2, 3, 8, 3, 2, 2, 2, 3, 5, 2, 3, 3, 2, 4, 4, 4, 2, 3, 4, 7, 2, 2, 7, 2, 6, 2, 6, 2, 4, 2, 2, 4, 1, 6, 3, 2, 5, 8, 4, 2, 4, 8, 1, 4, 4, 1, 4, 5, 3, 3, 5, 3, 2, 3, 3, 2, 6, 4, 2, 4, 2, 6, 7, 3, 4, 3, 3, 3, 2, 2, 8, 3, 4, 4, 4, 1, 2, 8, 2, 3, 8, 3, 2, 4, 2, 4, 3, 6, 2, 8, 4, 6, 8, 3, 3, 3, 4, 2, 3, 2, 2, 4, 6, 4, 2, 3, 4, 2, 4, 4, 3, 2, 4, 3, 9, 3, 4, 5, 3, 2, 4, 2, 2, 5, 1, 3, 2, 8, 3, 4, 3, 3, 3, 5, 3, 2, 2, 3, 3, 4, 4, 2, 10, 5, 7, 4, 2, 5, 6, 3, 2, 3, 3, 2, 8, 2, 3, 2, 4, 2, 7, 3, 4, 3, 3, 4, 2, 3, 5, 2, 4, 3, 7, 2, 1, 9, 4, 3, 2, 5, 2, 6, 3, 1, 2, 3, 7, 2, 2, 8, 13, 3, 4, 1, 5, 3, 4, 5, 2, 9, 3, 5, 2, 4, 4, 8, 2, 3, 3, 6, 2, 4, 5, 1, 5, 3, 3, 5, 4, 6, 2, 5, 2, 3, 2, 4, 3, 4, 6, 2, 3, 3, 2, 7, 4, 2, 5, 4, 3, 13, 2, 2, 4, 4, 1, 6, 3, 1, 6, 3, 3, 8, 4, 9, 4, 5, 4, 2, 13, 2, 5, 2, 3, 3, 2, 4, 3, 7, 3, 4, 5, 3, 10, 2, 2, 6, 2, 4, 7, 3, 2, 7, 3, 4, 4, 5, 2, 9, 6, 3, 4, 2, 3, 2, 4, 2, 10, 3, 2, 2, 5, 3, 2, 3, 2, 3, 3, 3, 4, 10, 6, 1, 4, 3, 2, 7, 4, 2, 4, 4, 2, 3, 4, 3, 1, 4, 4, 1, 3, 5, 2, 3, 2, 6, 3, 2, 2, 6, 7, 2, 3, 8, 9, 2, 7, 2, 3, 6, 6, 1, 4, 5, 2, 1, 4, 4, 3, 2, 6, 5, 1, 6, 3, 4, 2, 2, 4, 5, 5, 2, 3, 1, 3, 6, 4, 3, 7, 3, 1, 2, 4, 3, 2, 3, 4, 2, 8, 2, 4, 3, 4, 2, 4, 8, 2, 5, 3, 3, 3, 6, 6, 3, 3, 10, 4, 4, 4, 2, 4, 4, 2, 9, 4, 7, 3, 4, 3, 3, 5, 3, 8, 4, 5, 6, 3, 5, 3, 9, 4, 2, 9, 4, 3, 7, 2, 6, 3, 3, 3, 5, 4, 3, 8, 2, 4, 1, 3, 3, 4, 7, 2, 6, 2, 3, 4, 2, 10, 3, 2, 4, 6, 3, 7, 2, 2, 4, 1, 4, 1, 4, 7, 3, 4, 6, 4, 3, 2, 3, 7, 1, 4, 5, 3, 7, 4, 4, 13, 5, 4, 6, 5, 2, 5, 2, 3, 2, 3, 4, 4, 1, 6, 3, 1, 4, 3, 4, 3, 4, 2, 3, 4, 2, 3, 4, 5, 3, 2, 4, 8, 1, 4, 4, 4, 3, 13, 2, 4, 2, 2, 3, 2, 2, 2, 6, 4, 6, 4, 5, 3, 4, 3, 5, 4, 6, 4, 2, 2, 4, 4, 4, 5, 3, 6, 1, 4, 5, 4, 3, 2, 6, 3, 2, 3, 1, 7, 2, 5, 5, 9, 6, 7, 3, 7, 4, 3, 10, 2, 5, 3, 6, 5, 3, 4, 12, 2, 4, 3, 4, 10, 3, 9, 3, 4, 3, 4, 9, 2, 7, 3, 3, 1, 5, 2, 4, 13, 6, 11, 3, 9, 6, 4, 3, 3, 12, 3, 7, 7, 7, 3, 8, 2, 3, 4, 3, 2, 3, 3, 9, 8, 3, 6, 3, 8, 3, 4, 7, 1, 2, 6, 3, 5, 3, 8, 2, 5, 3, 6, 2, 2, 7, 2, 3, 6, 8, 2, 5, 3, 6, 2, 3, 4, 6, 8, 3, 4, 4, 2, 6, 3, 3, 3, 4, 3, 7, 5, 3, 5, 3, 4, 3, 3, 2, 9, 2, 2, 2, 3, 4, 9, 2, 3, 9, 6, 9, 3, 6, 8, 8, 2, 8, 1, 3, 2, 8, 9, 3, 4, 1, 4, 2, 7, 4, 4, 2, 5, 2, 3, 3, 2, 1, 3, 2, 4, 4, 2, 7, 5, 4, 3, 6, 11, 5, 2, 3, 3, 2, 5, 3, 3, 6, 5, 4, 9, 4, 2, 5, 4, 3, 2, 3, 10, 3, 4, 3, 4, 8, 2, 9, 3, 1, 2, 3, 7, 3, 4, 4, 2, 3, 4, 5, 3, 3, 3, 6, 2, 3, 3, 3, 2, 1, 7]\n",
    "\n",
    "\n",
    "```\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculating Expected Value\n",
    "\n",
    "Before we analyze the lengths of words, we will look at an example which better illustrates the statistical properties expected value and standard deviation calculated on a sample of a larger dataset.\n",
    "\n",
    "One of the things we can infer from looking at data is the expected value of an event, in this case the expected age of death of a newborn.\n",
    "\n",
    "To find the expected value of an event from data, we take the average of the values observed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The expected age is 48.0 years\n"
     ]
    }
   ],
   "source": [
    "ages = [2, 11, 23, 64, 89, 91, 10, 20, 95, 8, 36, 100, 84, 62, 6, 16, 7, 89, 19, 62, 19, 27, 94, 80, 4, 21, 68, 97, 64, 72]\n",
    "\n",
    "n = len(ages) #The number of datapoints\n",
    "\n",
    "total = 0 #We tally up the total to calculate the average at the end\n",
    "\n",
    "for age in ages:\n",
    "    total = total + age\n",
    "    \n",
    "EV = total / n\n",
    "\n",
    "print(f\"The expected age is {EV:.1f} years\") #round rounds a number to a specified number of decimals, here 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "The data shows that the life expectancy of this population is 48 years. This does not mean that a person will most likely become 48 years old however, only that the average age of many people will be around 48 years. If you look at the data, you can actually see that not a single person died in their 40's or 50's, showing that expected value and most likely value are two very different things.\n",
    "\n",
    "This behaviour is somewhat typical of life expectancy data. Due to high child mortality rates in developing countries, the life expectancy might be very low, while adults becoming old is still quite common. You see this in Sierra Leone where the life expectancy at birth is 52.5 years, while at 5 years old it is 59.7, a much larger increase than you see in Norway, where it goes from 80.6 to 80.8. (Source: <a href=\"https://www.worldlifeexpectancy.com/country-health-profile/sierra-leone\">worldlifeexpectancy.com</a>)\n",
    "\n",
    "Keep in mind that we only found the expected value of our sample. The expected value of the actual population can be slightly different. The larger the sample, the more certain we can be that the values we calculate are actually close to the actual values for the population. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### In-Class Expected Value Exercises\n",
    "\n",
    "Alice and Bob both offer you what they call a great oppurtunity to get rich quick. They say that by giving them 10 dollars, you have a good chance of making large gains. Below are example gains from people taking them up on their offers in the past:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "alice = [-10, -10, -10, -10, -10, -10, -10, 1000, -10]\n",
    "bob = [-10, 200, -10, 200, -10, 200, 200, 50, 50]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**a)** Whose offer has the highest expected gains?\n",
    "\n",
    "**b)** How would you describe what you can expect when taking them up on their offer? Does anything but the expected value matter?\n",
    "\n",
    "**c)** Calculate the average word length of Pride and Prejudice. Calculate the average word length of The Three Little Pigs text. The text files can be found on the course website under \"Extra Resources\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculating Variance and Standard Deviation\n",
    "\n",
    "To describe this large spread in ages of death we can calculate the variance of the data. To do this we take the average of the squared differences from the expected value(not as complicated as it sounds!). We could have just looked at the average difference from the expected value, but it is often of more interest to make large deviations from the expected value result in very large changes in the variance.\n",
    "\n",
    "When the data we have is only a sample of the total population, the variance tends to be too small compared to the actual variance of the total population. To account for this, we divide by $n - 1$ instead of $n$ when taking the average. The standard deviation is the square root of the variance, and is the value most often used to describe the amount of dispersion in the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{figure} assets/VarianceFigure.png\n",
    "---\n",
    "name: my-figure\n",
    "---\n",
    "Example of calculation of sample variance and standard deviation\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The variance is 1247.6 years squared\n"
     ]
    }
   ],
   "source": [
    "total = 0\n",
    "\n",
    "for age in ages:\n",
    "    squaredDifference = (age - EV)**2\n",
    "    total = total + squaredDifference\n",
    "    \n",
    "variance = total / (n - 1)\n",
    "\n",
    "print(f\"The variance is {variance:.1f} years squared\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This number is very large and not very intuitive (It has units years squared!). The standard deviation is the square root of the variance, and is often much more useful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The standard deviation is 35.3 years\n"
     ]
    }
   ],
   "source": [
    "std = variance**0.5\n",
    "\n",
    "print(f\"The standard deviation is {std:.1f} years\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This number describes how spread out our data is. If all of the ages were in the range 60-80, the standard deviation would be much smaller."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### In-Class Variance and Standard Deviation Exercises\n",
    "\n",
    "\n",
    "You have contracted a deadly disease and have 12 months left to live. There is an operation which has a chance to greatly increase your remaining time here on earth.\n",
    "\n",
    "This is how long people in your exact position lived after taking the operation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "monthsLeft = [0, 30, 2, 0 , 1, 0, 412, 0, 2, 1, 0, 5, 0, 12, 44, 0, 0, 0, 5, 0, 1, 0, 0, 0, 203, 40, 6, 0, 0, 3, 2, 0, 2, 0, 0, 5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**a)** Calculate the expected value of remaining months if you take the operation.\n",
    "- Should you take the operation?\n",
    "\n",
    "**b)** Calculate the sample variance of the remaining months if you take the operation.\n",
    "\n",
    "**c)** Calculate the standard deviation of the remaining months if you take the operation.\n",
    "- Is the operation risky?\n",
    "\n",
    "**d)** How large a percentage died within 0 months of taking the operation?\n",
    "\n",
    "**e)** Calculate the variance of the expected word length of a text.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## User Input - Communicating with your program"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python has a built in funtion called `input()` which prints a message to the terminal and then returns whatever the user writes in the terminal as a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "What is your name?:  Karl Henrik\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Karl Henrik\n"
     ]
    }
   ],
   "source": [
    "name = input(\"What is your name?: \")\n",
    "print(\"Hello \" + name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to input a number and then use it for calculations, you need to turn the string into an integer or float first."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Type conversion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can turn one type of variable into another type like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "myString = \"15\"\n",
    "myInt = int(myString) #The string has to contain a whole number! \"15.2\" would not work\n",
    "myString2 = str(myInt)\n",
    "myFloat = float(myInt)\n",
    "myFloat2 = float(myString) #The string has to contain only a number! \"Hey 24.4\" would not work, \"24.4\" would"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Turning a float into an integer can often be useful, because some functions will not accept a float as an input, even if it is a whole number.\n",
    "\n",
    "To input a number and save it as an integer, you could do something like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "How old are you? 23\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In five years, you will be 28 years old\n"
     ]
    }
   ],
   "source": [
    "numberAsString = input(\"How old are you?\")\n",
    "\n",
    "number = int(numberAsString)\n",
    "number = number + 5\n",
    "\n",
    "print(f\"In five years, you will be {number} years old\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "User input can quickly make a program take much longer to write and test. If you want to include it, you should wait until everything else works."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### In-Class User Input Exercises\n",
    "\n",
    "**a)** Write a program that asks you for your name, and prints it out with a greeting.\n",
    "\n",
    "**b)** Write a program that takes a number as a user input, converts it to an integer, and prints out twice the number.\n",
    "\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}