Text Files
Here are some text files you can use to practice reading and analyzing texts.
Books from Project Gutenberg:
These texts include a license and a table of contents, which you should remove before analyzing them.
The following code snippet reads a file and removes the parts of the resulting string that are not wanted for text analysis. We added the #START marker to the texts to make this easier.
with open("prideandprejudice.txt", "r", encoding="utf-8") as file:  # the file is closed automatically
    text = file.read()
marker = "#START"
text = text[text.find(marker) + len(marker):]  # remove all text up to and including the marker
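If a file does not contain the marker, str.find returns -1 and the slice above would silently drop the first five characters of the text. The following is a minimal sketch of a more defensive variant. The helper name strip_before_marker is hypothetical and not part of the original material, and returning the text unchanged when the marker is missing is an assumption about the desired behavior.

```python
def strip_before_marker(text: str, marker: str = "#START") -> str:
    """Return the part of `text` that follows the first occurrence of `marker`.

    Assumption: if the marker is absent, the text is returned unchanged
    instead of being cut at an arbitrary position.
    """
    index = text.find(marker)  # -1 when the marker does not occur
    if index == -1:
        return text
    return text[index + len(marker):]


# Usage with a small in-memory sample instead of a real book file
sample = "License and table of contents\n#START\nChapter 1"
book = strip_before_marker(sample)
```

Here book starts right after the marker, so you may still want to call book.lstrip() to drop the leading line break.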