INFX 141 / CS 121 • DAVID G. KAY • UC IRVINE • WINTER 2015
Text Processing Functions
This assignment is to be done individually; you may not use code written by your classmates. Use code found on the Internet at your own peril: it may not do exactly what the assignment requests. If you do use code you find on the Internet, you must disclose its origin. As stated in the collaboration guidelines, concealing the origin of a piece of code is plagiarism. Use Piazza for general questions whose answers can benefit everyone in the class.
Project Skeleton: http://www.ics.uci.edu/~kay/courses/i141/hw/Assignment2.zip
Part A: Utilities (20 points)
Write a method that reads a text file and returns a list of the tokens in that file. Write a method that prints frequency results.
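A minimal Python sketch of the two Part A utilities follows. The handout does not define "token", so this sketch assumes a token is a maximal run of ASCII letters and digits, lowercased. The function names (tokenize_text, tokenize_file, format_frequencies, print_frequencies) and the report layout (totals first, then one tab-separated line per item, most frequent first) are hypothetical and not taken from the project skeleton.

```python
import re

# Assumption: a token is a maximal run of ASCII letters or digits, lowercased.
# The handout does not define "token"; check the skeleton before relying on this rule.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")


def tokenize_text(text: str) -> list[str]:
    """Split a string into lowercase alphanumeric tokens, discarding punctuation and whitespace."""
    return [match.lower() for match in TOKEN_PATTERN.findall(text)]


def tokenize_file(path: str) -> list[str]:
    """Read a text file line by line and return all of its tokens in order."""
    tokens: list[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens.extend(tokenize_text(line))
    return tokens


def format_frequencies(counts: dict[str, int]) -> list[str]:
    """Build report lines: totals first, then items by descending count, ties broken alphabetically."""
    lines = [
        f"Total item count: {sum(counts.values())}",
        f"Unique item count: {len(counts)}",
    ]
    for item, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{item}\t{count}")
    return lines


def print_frequencies(counts: dict[str, int]) -> None:
    """Print the report built by format_frequencies to standard output."""
    for line in format_frequencies(counts):
        print(line)
```

Keeping tokenize_text separate from tokenize_file isolates the file I/O, so the tokenizing rule can be checked on plain strings before it is run on real files.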
Part B: Word Frequencies (20 points)
Count the total number of words and their frequencies in a token list.
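One way to do the Part B count is a single pass over the token list with a dictionary. The sketch below assumes the list comes from a tokenizer like the one in Part A. The name word_frequencies and its (total, mapping) return shape are hypothetical.

```python
def word_frequencies(tokens: list[str]) -> tuple[int, dict[str, int]]:
    """Return (total word count, mapping from each distinct word to its count)."""
    counts: dict[str, int] = {}
    for token in tokens:
        # One pass over the tokens; each dictionary update is O(1) on average.
        counts[token] = counts.get(token, 0) + 1
    return len(tokens), counts
```

Python's collections.Counter(tokens) builds the same mapping in one call. The explicit loop just makes the counting step visible.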
Part C: 2-grams (30 points)
A 2-gram is a pair of words that occur consecutively in a file. For example, "two words", "words that", and "that occur" are all 2-grams from the previous sentence.
Count the total number of 2-grams and their frequencies in a token list.
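The 2-gram count can be sketched by pairing each token with its successor. This sketch assumes a 2-gram is represented as the two tokens joined by a single space, which is a hypothetical choice. Because pairs come straight from the token list, a 2-gram may span a line break or sentence boundary, and the handout does not say whether that should be allowed.

```python
from collections import Counter


def two_gram_frequencies(tokens: list[str]) -> tuple[int, Counter]:
    """Return (total 2-gram count, Counter of each distinct 2-gram)."""
    # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
    # A list of n tokens (n >= 1) therefore yields n - 1 two-grams.
    pairs = zip(tokens, tokens[1:])
    counts = Counter(f"{first} {second}" for first, second in pairs)
    return sum(counts.values()), counts
```

For the tokens of "you think you think", this sketch reports 3 two-grams: "you think" twice and "think you" once.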
Part D: Palindromes (30 points)
A palindrome is a word or phrase that reads the same in both directions when spaces, punctuation, and capitalization are ignored. For example, these are all palindromes: "kayak", "Do geese see god", "A man, a plan, a canal--Panama". Count the total number of palindromes and their frequencies in a text file.
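The handout leaves two questions open: whether a palindrome can span several words, and whether single letters count. The sketch below shows one interpretation. It treats every run of consecutive tokens as a candidate phrase, so the example phrases above are found, and so is a single-letter word such as "a". The names is_palindrome and palindrome_frequencies and the max_words parameter (a cap on run length, with None meaning no cap) are hypothetical.

```python
from collections import Counter
from typing import Optional


def is_palindrome(text: str) -> bool:
    """True if text reads the same in both directions, ignoring case, spaces, and punctuation."""
    chars = [c.lower() for c in text if c.isalnum()]
    return len(chars) > 0 and chars == chars[::-1]


def palindrome_frequencies(
    tokens: list[str], max_words: Optional[int] = None
) -> tuple[int, Counter]:
    """Count every run of consecutive tokens whose joined text is a palindrome.

    max_words is a hypothetical cap on run length; None means no cap.
    """
    n = len(tokens)
    limit = n if max_words is None else max_words
    counts: Counter = Counter()
    for start in range(n):
        # Try runs tokens[start:end] of 1, 2, ..., limit tokens, stopping at the end of the list.
        for end in range(start + 1, min(start + limit, n) + 1):
            phrase = " ".join(tokens[start:end])
            if is_palindrome(phrase):
                counts[phrase] += 1
    return sum(counts.values()), counts
```

Whether to cap the run length, and at what value, is a design decision that directly affects the runtime analysis this part asks for. With max_words=5, for instance, the seven-word "a man a plan a canal panama" would not be found.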
Once you have implemented your palindrome counting algorithm, please perform a short analysis of its runtime complexity: does it run in linear, polynomial, or exponential time relative to the size of the input? Put this analysis in the analysis.txt file included in this package.
Submitting Your Assignment
Submit your assignment via Checkmate (
Your assignment will be graded on the following four criteria: