Text Files

Here are some text files you can use to practice reading and analyzing texts.

Books from Project Gutenberg:

These texts include a license and a table of contents, which you should remove before analyzing them.

The following code snippet reads a file and removes everything that comes before the actual contents of the book, since that part is unwanted for text analysis. We added the #START marker to the texts to make this easier.

with open("prideandprejudice.txt", "r", encoding="utf-8") as file:  # the with-block closes the file automatically
    text = file.read()
# keep only what follows the marker; adding len("#START") skips the marker itself
text = text[text.find("#START") + len("#START"):]
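Note that str.find returns -1 when the marker is missing, in which case the slice above would silently cut off the first few characters of the text. A slightly more defensive variant could look like the sketch below. The function name strip_front_matter and the sample string are made up for illustration; they are not part of the provided files.

```python
def strip_front_matter(text, marker="#START"):
    """Return the text after the first marker, or the text unchanged if the marker is absent."""
    position = text.find(marker)
    if position == -1:
        # No marker found: leave the text as it is instead of slicing at a wrong index.
        return text
    return text[position + len(marker):]


# Hypothetical sample string standing in for the contents of one of the book files.
sample = "License and table of contents\n#START\nChapter 1"
print(strip_front_matter(sample).strip())
```

With one of the book files, you would pass the string returned by file.read() instead of the sample string.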

prince.txt

prideandprejudice.txt

zarathustra.txt

threelittlepigs.txt

The Nunavut Hansard

This file contains the proceedings of the Nunavut parliament in both Inuktitut and English.

SentenceAligned.v3.txt