Exercises
Exercise 6.1: Rhetoric of US Presidential Inaugural Addresses
In the following exercises, the task is to study the US Presidential Inaugural Addresses introduced in the lecture notes and to say something about the rhetoric of different American presidents. The addresses of interest are the speeches of Lincoln (1861), Roosevelt (1937) and Trump (2017).
a) Find the most frequently occurring words in the speeches of Lincoln (1861), Roosevelt (1937) and Trump (2017).
Hint: You may want to store the vocabulary of each speech in a dictionary whose keys are the vocabulary items and whose values are the number of times each item appears in the text. The built-in Python function sorted can then be used to sort the dictionary items. The following line sorts the items of a dictionary called “wi” by value in decreasing order, returning a list of (word, count) tuples with the most frequent word first:
sorted_tuple = sorted(wi.items(), key = lambda x : x[1], reverse=True)
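As an illustration of the hint, the sketch below builds such a dictionary for a short made-up sentence and then applies the sorting line. The helper name word_counts and the regular-expression tokenization are assumptions made for this example; the lecture notes may split the text into words differently.

```python
import re

def word_counts(text):
    """Return a dict mapping each lowercased word to its number of occurrences."""
    # Assumed tokenization: runs of letters and apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    wi = {}
    for word in words:
        wi[word] = wi.get(word, 0) + 1
    return wi

sample = "We the people, we the citizens, we hope."  # made-up text, not a real speech
wi = word_counts(sample)
sorted_tuple = sorted(wi.items(), key=lambda x: x[1], reverse=True)
print(sorted_tuple[:2])  # [('we', 3), ('the', 2)]
```

Because sorted is stable, words with equal counts keep the order in which they first appeared in the text.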
b) Find the hapaxes of the same three speeches. A hapax is a word that occurs only once in the given context.
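Given a word-count dictionary, hapaxes are simply the keys whose value is 1. A minimal sketch, where the function name hapaxes and the example counts are made up for illustration:

```python
def hapaxes(wi):
    """Return the words that occur exactly once, in alphabetical order."""
    return sorted(word for word, count in wi.items() if count == 1)

# Hypothetical counts, not taken from any of the speeches.
wi_example = {"we": 3, "the": 2, "people": 1, "citizens": 1, "hope": 1}
hapax_list = hapaxes(wi_example)
print(hapax_list)  # ['citizens', 'hope', 'people']
```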
c) Find all long words, consisting of 15 or more characters.
d) Find all medium-long words of relatively high frequency. That is, find all words of 6 or more characters that each make up more than 0.3% of the words in a speech.
e) From the results found above, do you see any differences or similarities between the presidents? If so, is it possible to conclude something about, for example, each president’s character, rhetoric and supporters?
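One possible way to combine a length condition with a relative-frequency condition is sketched below. It assumes that the frequency of a word is its count divided by the total number of words in the speech; the function name, its parameters, and the counts are hypothetical.

```python
def frequent_long_words(wi, min_length=6, min_share=0.003):
    """Return words with at least min_length characters whose share of all
    word occurrences exceeds min_share (0.003 corresponds to 0.3%)."""
    total = sum(wi.values())
    return sorted(
        word
        for word, count in wi.items()
        if len(word) >= min_length and count / total > min_share
    )

# Made-up counts that sum to 1000 words.
wi_example = {"america": 5, "people": 4, "government": 1, "the": 90, "of": 900}
result = frequent_long_words(wi_example)
print(result)  # ['america', 'people']
```

Setting min_share to 0 and min_length to 15 turns the same function into a filter for long words only.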
Exercise 6.2: Evolution of the Language of The Bible
In this exercise we will study the evolution of the English language through two versions of The Bible: the King James Bible (KJB) from 1611 (http://www.gutenberg.org/cache/epub/10900/pg10900.txt) and the World English Bible (WEB) from 2000 (http://www.gutenberg.org/cache/epub/8294/pg8294.txt).
a) Looking at the two versions of The Bible, find all words that both consist of more than 6 characters and make up more than 0.2% of the respective text. Comment on the result.
b) Tabulate the total number of words used in the two versions of The Bible, and find the average sentence length and average word length for each text. Comment on the result.
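The quantities in b) could be computed along the following lines. This is only a sketch: it assumes that sentences end at ".", "!" or "?" and that words are runs of letters and apostrophes, which is a simplification for the real Bible texts (verse numbers and the Project Gutenberg header would need separate handling). The function name text_statistics is made up for the example.

```python
import re

def text_statistics(text):
    """Return (number of words, average sentence length in words,
    average word length in characters) for the given text."""
    # Assumed sentence boundaries: one or more of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumed word definition: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_length = len(words) / len(sentences)
    avg_word_length = sum(len(w) for w in words) / len(words)
    return len(words), avg_sentence_length, avg_word_length

sample = ("In the beginning God created the heaven and the earth. "
          "And God said, Let there be light.")
n_words, avg_sentence, avg_word = text_statistics(sample)
print(n_words, avg_sentence, round(avg_word, 2))  # 17 8.5 4.06
```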
c) Find the lexical diversity score of the two texts. The lexical diversity of a text is defined as the range of word types relative to the size of the text, i.e. the number of distinct words divided by the total number of words. Comment on the result.
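Reading the definition as the number of distinct words divided by the total number of words (an assumption about the intended formula), the score can be sketched as follows. The function name and the example sentence are made up.

```python
def lexical_diversity(words):
    """Return the number of distinct words divided by the total number of words."""
    return len(set(words)) / len(words)

# Made-up example: 9 words in total, 6 of them distinct.
words = "the lord is my shepherd the lord is good".split()
score = lexical_diversity(words)
print(round(score, 3))  # 0.667
```

A score close to 1 means that few words are repeated, while a score close to 0 means that the text reuses a small vocabulary many times.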
d) Plot the cumulative word length distribution for the two versions of The Bible, i.e. plot the percentage of words of length 1, length 2, length 3, etc., as a function of word length. What do you observe?
Hint: You may want to make use of the built-in Python function set, which returns a set of the items of a list. A set is a collection of distinct objects.
Another useful tool is the list method count, which counts the number of times a given object occurs in a list. It can be used as follows:
number = chosen_list.count(chosen_object)
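The sketch below shows how set and count might be combined to obtain the data for a word length distribution. It assumes that "cumulative" means the percentage of words of length at most n; leaving out the running sum gives the percentage for each individual length instead. The function name and the example words are made up.

```python
def cumulative_length_distribution(words):
    """Return a list of (length, percentage of words with at most that length)."""
    lengths = [len(w) for w in words]
    total = len(lengths)
    cumulative = []
    running = 0
    # set() gives the distinct lengths; sorted() puts them in increasing order.
    for length in sorted(set(lengths)):
        # count() returns how many words have exactly this length.
        running += lengths.count(length)
        cumulative.append((length, 100 * running / total))
    return cumulative

words = "in the beginning god created the heaven and the earth".split()
distribution = cumulative_length_distribution(words)
print(distribution)
# [(2, 10.0), (3, 60.0), (5, 70.0), (6, 80.0), (7, 90.0), (9, 100.0)]
```

The resulting pairs can be passed to a plotting library as x and y values. Calling count once per distinct length is simple but rescans the whole list each time; for texts the size of The Bible, a dictionary of counts or collections.Counter would likely be faster.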