Extending Functionality

Phase 3


Overview

In this phase, we enhance the basic functionality of FabFlixs developed in Phases 1 and 2.


References


What to Turn In

  1. A WAR file, labeled with your team name, containing the Web application portion of FabFlixs, including your LEDA source code, including a ReadMe file explaining how to compile, load, and otherwise prepare your system for use, and how to run it. Make sure your archive's extention is WAR. The root directory of the WAR hierarchy should be your team's name.

    Use these conventions for the database you submit:

    Be sure to include all source code files in your WAR, in the directory sources within the WEB-INF subdirectory.

  2. In one ZIP file, labeled with your team name, your updated employee access system and a ReadME file explaining how to compile, load, and otherwise prepare it for use, and how to run it. Make sure the ZIP file's extension is ZIP.


Enhancing FabFlixs with Stored Procedures

Fuzzy Searching

Currently, when a FabFlixs customer is searching for a movie by its title, director or (one of) its stars (that is, searching via a "key" value), the user must spell that key correctly; otherwise, the desired movie will not be retrieved. For example, some stars' names, such as "Schwarzenegger," are sufficiently difficult to spell that customers are likely to spell them incorrectly—and perhaps come to think that we do not stock the Terminator movies when, in fact, we do!

To make it more likely that customers find the movies they desire—and thus increase our likelihood of making sales—the exact string matching in FabFlixs for key searches is to be replaced with a fuzzy matching, one that returns a movie whose value is close to the key's value.

The designers have decided that using the built-in SQL pattern matching commands help in this endeavor, but they are not sufficient; however, combining their capabilities with the Levenshtein Edit Distance Algorithm (LEDA) does a much better job. (See References, above, for a description of these algorithms.) You must use LEDA; you may use the built-in SQL commands as much or as little as you see fit consistent with the goal of quickly finding the movies that are likely to be the ones for which the customer is searching.

The LEDA algorithm is to be implemented as a dynamic function in C (a "PostgreSQL ready" version of it in C is available via a reference link above). To add the LEDA function to FabFlixs, follow the instructions in the "Writing Dynamic C Language Functions..." reference above, substituting the location and name of the LEDA program for the program in the instructions.

Access LEDA via the interface ISCLOSE. ISCLOSE takes three parameters: the first is the key as a string, the second the string against which the key is being compared, the third the maximum distance (as computed in the LEDA algorithm) the two strings can differ and still be considered similar to each other. ISCLOSE returns true if the strings are within or equal to the given difference and false if they are further apart than the given distance. OR the results from LEDA and the built-in SQL pattern matching methods you use to look up keys; for example:

lastName ILIKE '%Schwarseneger%' OR ISCLOSE(lastName, 'Schwarseneger', 2)

The LEDA distance is up to you; you may well find that you get more reasonable results when you change the distance based on the strings being searched; for instance, small LEDA distances work well on short strings, and larger ones tend to produce better results when used with long strings. One good way to get ideas on what distances to use is to employ fuzzy search capabilities on existing web pages; for instance, entering misspelled strings into Google and seeing what string it suggests is one way to gauge how it uses edit distances.

The user interface should not be changed; the customer continues to enter in a key value as before, but now FabFlixs returns results that are similar to, not just the same as, the one requested.

Creating a Movie's Record Set

Adding a movie to a database using the employee "back-end" system developed in Phase 1 is a tricky thing. Because a movie might have several actors and several genres, the employeee has to create a record in the movies table and perhaps records in the stars and genres tables; depending on the order in which these things are done, s/he might then have to create several records in the stars_in_movies and genres_in_movies tables in order to link records properly. Clearly, it would be advantageous to have a function that would take as arguments movie attributes, stars and genres, and correctly create the movie's entire set of records. (We realize that the employee access system may already do this to some extent; but if so, the actions are hidden in various "add records to the database" functions. Adding this function makes this task explicitly available and visible.)

So, required is the stored procedure add_movie. It prompts for all of a movie's information, all of the information for each star in the movie and all of the information for each genre with which the movie is to be associated. If a record corresponding to the star or genre exists, it is linked to the movie; if not, it is created and then linked to the movie. All necessary stars_in_movies and genres_in_movies records are also created. The procedure should print informative status messsages to the user as it performs its task. If a movie's record set cannot be correctly made, the procedure so informs the user, and no changes to the database are made. add_movie is to be written in procedural SQL (known as PL/pgSQL in postgreSQL).

Add this procedure to the back-end system developed in Phase 1. Be sure to have a menu selection for it!

Cleaning the FabFlixs Database

It is an axiom that, in a production database system such as FabFlixs, errors in or with records occur; we need a way to find these errors so they can be addressed.

Required is a way to check the database for errors that also produces a report of those errors. The report is to be an HTML file, such that one can open it in a browser and see a well-organized, easily-read report of the problems discovered, with each problem sufficiently well-described so that an employee (using the FabFlixs back-end system) can go into the database and fix the problems. Any reasonable approach is acceptable, as long as it is implemented as a menu option in the employee system developed in Phase 1: the user issues the command; as the report is generated, status messages are sent to the screen (so the user is assured all is going well); when done, the complete path to the report is provided, and the program returns to the menu.

[Hint: several approaches are available to implement this report; for instance, it could all be done via a dynamic C function (the back-end program accessing it via a callable procedure) that produces the HTML file directly. Perhaps an easier approach (though also perhaps taking longer to execute) is to use a PL/pgSQL procedure to find the errors and to populate a table with information about those errors. The back end then uses JDBC methods to extract the table's messages as a ResultSet and generates the report file from that ResultSet's information. A combination of C and pgSQL routines could also be used.]

This function should check for every database integrity issue is not already automatically checked by the database itself. It should also report the values of data fields that look suspicious. Findings to report include, but are not limited to

Be as thorough as you can. Do note that FabFlixs management has decided that it is legal for

Desired, but not required for full credit (which means that if you do it you can earn credit that can offset points lost for other reasons) is to check each movie banner and trailer URL to determine if it points to an actual, active location at which resides displayable graphical content (e.g., a JPG file); if it does not, that fact is noted in the report.

Desired, but not required for full credit, is that the back-end program, in addition to creating and storing the report, automatically launches a browser (if it is not already running), opens a new window, displays the report (page) in that window and then returns to the menu.


Demonstration

To prepare for the demo, have the WAR file you turned in stored on your demonstration machine, and also have it deployed via Tomcat so that the FabFlixs web site is ready to use. Have the postgreSQL server up and running; have your updated employee database ready to go. Have your ZIP file on your demonstration machine and unzip it when asked. When the demo begins, we will ask you to demonstate the three procedures discussed above, using the customer web site (from the WAR file) and the employee back-end program (from your ZIP file). As before, your task is to convince us, in no more than 15 minutes, that your work is well and completely done and the FabFlixs project is ready to move on to phase 4. Be prepared to tell us the algorithm you used for fuzzy title matching, and to convince us that it was the best approach to take.

Written by Norman Jacobson, May 2005, with some sections adapted from "Project 3," an ICS185
  Winter 2005 exercise written by Chen Li.
References updated, some substantially, to reflect postgresql 8.1.4, and some text clarified,
  by Norman Jacobson, April 2007.
Removed section about additional postgreSQL files needed (installed now in Phase 0); SQL LEDA function name
   changed to ISCLOSE to avoid confusion with LEDA C-code function; minor updates for clarity by
  Norman Jacobson, September 2007 & November 2007.
Clarified that LEDA distance is up to implementor, and can vary with strings being used; added suggestion
  to look on web for ideas on best distance algorithm to employ, by Norman Jacobson, December 2007.
Updated to reflect postgreSQL 8.3 and fixed minor typos, by Norman Jacobson, September, 2008.
Revised to include READ.ME requirement, by Norman Jacobson, December 2008.
Minor polishing by Norman Jacobson, March 2009.