Setting Up the FabFlixs Database; Allowing Employee Access

Phase 1


Overview

In phase 1, we set up the schema and related metadata for the FabFlixs database and populate the database with initial information gleaned from a previous system. Testing is included to help insure the database has been built and populated properly.


References


What to Turn In

In one ZIP file, labeled with your team name


Creating the Fabflixs database

Creating the database is straightforward:


Creating the database tables

The following chart gives the FabFlixs database schema; given for each table is its name, its attributes, and constraints on the attributes.

Each varchar() field for which there is no data (i.e., the field contents are missing or unknown) is the empty string (''); other non-required fields which have no data are null. Required fields have the constraint that they are not null, not empty and do not consist solely of whitespace (blanks, tabs, etc.). An attribute or group of attributes designated as a primary key should be implemented as such. (See PostgreSQL 8.3.7 Documentation, Chapter 5.3, Constraints for more information about constraints.)

SQL statements to insert the information from the previous system into their appropriate tables are in the script file FabFlixsTestData.txt. The tables from which this data came sometimes do not have the same structure as the corresponding FabFlixs tables, as FabFlixs is a major upgrade; obviously, you will need to read the data out of these records and place it into the correct FabFlixs fields, in the correct format. The affected data, and what must be done to make it FabFlixs compatible, is discussed below.

Credit card data and ids

Note that the previous system stored credit card numbers as numbers in the credit card file, but as strings in the customer file (yet another good reason to move to a new system)! In FabFlixs, the credit cards numbers are stored as strings (see the table definitions, below); be sure to convert the data as necessary.

Typical (though perhaps unexpected) credit card account and use rules that affect the FabFlixs schema include the following:

For security reasons, and to avoid an update of the customer file if credit card info is changed, we specifiy the use of a numeric, automatically generated key in the customer file that references (is a foreign key of) the credit card file. But there is no such key in the data to be imported into the FabFlixs database; the previous system did not use this identifier, but instead used the credit card number as the key. An attempt to insert existing records into the customer table will have credit card numbers treated as keys (a violation of requirements); an attempt to insert a credit card record would result in data populating the wrong fields! Yet the specs must still be met; what to do?

This issue can be handled a few ways; here's one straight-ahead solution:

The imported data may have errors; we must deal with them:

Director names

  • The provided data has each director's name as one field, with the name beginning with the first name and ending with the last. Thus, searching and sorting by director, required in a later phase, is rather problematic. Required, then, is to have the director's name broken into last name and "rest of name." A reasonable approach to accomplish this is to

    Use your best efforts to extract the last name from the directors's name; management knows that it's impossible to be perfectly accurate.

    Stars and Genres

    It is legal for a movie(e.g. a documentary about a museum) to have no affiliated stars.

    In this (new) FabFlics system, every movie will have at least one genre. If, when importing data, it is discovered that their is no genre associated with the movie, we still import it, knowing that it will not appear in any "by genre" search (See Phase 2). However, we can find such movies via the data cleaning report (see Phase 3); once this report is available, it is management's intent to correct the database so that all movies have at least one genre.

    Sales

    The previous sytem did not track the number of items sold--each time a DVD was sold, it was entered as a separate line item (e.g., if a single customer bought four copies of Chicken Run in one transaction, four separate records would be added to the sales file). It also did not store the shipping address (it was used for the shipping label, then disgarded). So the test data does not have quantity nor shipping address values, but the new tables expect them; what to do?

    Here is one way to solve the problem:

    One-Step Conversion

    Since the INSERTs to place into the tables is all in one file, and not grouped at all for easy subdivision or modification, best is to effect all the table changes at once; that is

    1. Set up FabFlixs with the 'temporary' table structures, as discussed above, needed for the imported data to be inserted in the correct fields.
    2. Run the script to insert the data into the database.
    3. Update the tables as discussed above, converting data as needed.

    You may find the use of BAT files heplful in this process; note that the "<" operator allows bat files to accept information from a specified input file and ">" sends the results of BAT processing to a text file.

    FabFlixs schema

    Table Name Attributes Notes
    movies id:integer -- primary key
    title:varchar(100)
    year:integer
    director_rest_of_name:varchar(50)
    director_last_name:varchar(50)
    banner_url:varchar(200)
    trailer_url:varchar(200)
    required
    required
    required
    required
    required; if director has one name, put it in director_last_name & director_rest_of_name = 'none'
    URL of movie's "poster"; not required
    URL of trailer; not required
    stars id:integer -- primary key
    first_name:varchar(50)
    last_name:varchar(50)
    dob:date
    photo_url:varchar(200)
    required
    required; for a person with one name, put it in last_name & make first_name 'none'
    required
    not required
    not required
    stars_in_movies star_id:integer, references stars.id
    movie_id:integer, references movies.id
    all attributes required
    genres id:integer -- primary key
    name:varchar(32)
    all attributes required
    genres_in_movies genre_id:integer, references genres.id
    movie_id:integer, references movies.id
    all attributes required
    each movie must be associated with at least 1 genre
    customers id:integer -- primary key
    first_name:varchar(50)
    last_name:varchar(50)
    cc_id:integer, references
       creditcards.id
    address:varchar(200)
    email:varchar(50)
    password:varchar(20)
    all attributes required

    for a person with one name,
       put it in last_name
       & make first_name 'none'
    sales
    customer_id:integer, references customers.id
    movie_id:integer, references movies.id
    quantity_ordered: integer
    shipping_address: varchar(200)
    time:timestamp
    all attributes required
    creditcards id:integer -- primary key
    credit_card_number:varchar(16)
    name_on_card:varchar(100)
    expiration:date
    all attributes required
    if imported credit card number is invalid,
    credit_card_number is set to "invalid"

    You may add additional tables to the database as needed as FabFlixs is developed, but you may not change these tables' attributes, as other company departments' software rely on their current specification.

    Be sure to maintain database integrity; for example, ensure the database does not allow a movie record to be created without its associated stars_in_movie and genres_in_movie records also being created, that "orphan records" (e.g., a credit card record without a corresponding customer record) can't exist, and so on.

    The imported movie data has not been cleaned; that is, it may have errors. Phase 3 will implement data cleaning. In particular, the database may have records that appear to be duplicates; that is, they are (almost) the same except for their ID number. That's ok, in that they will cause no database errors. It's not OK if the records really contain duplicate information, but that will be addressed in Phase 3.

    Once you have populated the database, modify or add to these records to ensure all types of legal records are represented. For instance, the provided set has no records for people with one name; add records that do. We need a robust test set for the remaining phases to ensure specifications are met for all legal data.

    Finally, once the database is populated, further restrict fields as feasible to prevent errors. (Additional error checking, of course, may be done by programs that accept data and place it into the database.)

    For all user interactions with the database's tables, such as with the employee access program below, make very sure the user only is allowed to view, change or look things up in fields that are appropriate to the task. For example, when new records are entered into the database, their ids should be automatically generated: There is no need to force the user to come up with some id that does not duplicate an id already in the table. (See the SERIAL data type, section 8.3.7 of the postgreSQL manual for a good way to generate ids auotmatically.) Yet, when correcting or deleting ani item, such as a star or movie, an employee can often do so more quickly by entering its unique id then first searching by its name and then, say, choosing the particular item from a list (though the program might also allow that method).


    Allowing Employee Access to the Database

    Authorized employees will be allowed to directly access the FabFlixs database; we are to provide a simple interface for them to issue common commands, as well as a way to enter an arbitrary SQL statement. (We do not allow them direct access to psql or pgAdmin because these tools contain powerful —and dangerous— features that FabFlixs employees (except the DB developers) do not require.) For this phase, we get the application l; in a later phase, we add appropriate security so employees can access only functions for which they are authorized.

    In an actual development effort, the l and look-and-feel requirements for these functions would have been specified in every particular. We've purposely left these requirements high-level because we want you to use your skills, experience and reasoned judgment to refine them. Do feel free to ask the instructor or TA for advice! With your project manager's permission, you can also enhance these specifications for potential additional credit; for example, you can use a GUI or Web-based user interface rather than a console interface, or access these functions via a secure web page. But be careful: including additional features that do not work will detract from your score.

    Write a Java program that, using JDBC to interface with postgreSQL, provides for the following lity. It can be console based (or GUI or web-based, as discussed above). You may use a different language and ODBC library, provided you have received approval from your project manager. In any event, you must use posegreSQL.

    When this program is run, the user is asked for the the user name and the user password for the fabflixs database. If all is well, the employee is granted access (and a message to that effect appears on the screen); if access is not allowed, it says why (e.g., the database is not present, the password is wrong). Allow a way for the employee to exit easily. Note that an incorrect password generates an SQLException, so catch this exception to deal nicely with wrong passwords.

    Provide a menu that allows the employee to

    In cases where the requested tasks cannot be accomplished, handle the error gracefully: Do not just pass along some Java exception!

    Whenever feasible, allow the user to cancel a command in progress (in case s/he made an error). For example, when selecting which star to delete from a menu of three choices, allow the user to simply not select any of them. Or, when deleting by ID, ask for confirmation; if the user does not confirm, do not delete the item.

    For those functions where many lines could print, display lines in groups of 20 or so, so that the employee can read the contents before they scroll off the top of the screen. For example, after printing 20 lines, say "Hit any key to continue" and when a key is hit, display the next 20 lines. (Obviously, this will be somewhat different if you use a GUI or Web interface; nonetheless, nicely handle the situation when there is more data than will fit on a screen.)

    Upon completion of any command except the exit commands, the user is returned to the menu.


    Demonstration

    Create a running system from your ZIP file, quickly. Show the correctness, completeness, lity and performance of your database during your demonstration by using the employee interface developed here. Your task is to convince us in no more than 15 minutes that your work is well and completely done and the FabFlixs project is ready to move to phase 2. In particular, be sure to demonstrate to us that your system works as specified for directors, stars and customers with only one name.

    Written for ICS185 Spring 2005 by Norman Jacobson, March 2005.
       Most sections adapted from ICS185 Winter 2005 exercises Project 1 and Project 2 written by Chen Li.
    Updated to include employee information (added ICS 185 Winter 2006 and documented in its Project 3 document by Chen Li and Shengyue Ji) and to reflect newer versions of software, for CS122B Spring 2007 by Norman Jacobson, March 2007.
    Revised to clean up customer-credit card relation by Norman Jacobson, April 2007.
    Revised to include a major section on updating inserted data to fit FabFlixs'schema, including instructions for generating credit card ids, dealing with sales quantity and shipping address, breaking director names into last and first name, and for ensuring ids are automatically generated and fields are properly constrained, by Norman Jacobson, September 2007.
    Revised to explain meaning of "required" in schema tables, fix minor typos, make clear a movie does not need to have any associated stars but must have at least one associated genre and provided instructions for dealing with imported movies that have no genre, by Norman Jacobson, November and December 2007.
    Revised to more fully describe the role and handling of ID numbers in the employee access system, and other minor editing for clarity, by Norman Jacobson, September 2008.
    Minor revisions to reflect postgreSQL 8.3.4 and to make clear that both parts of directors' and stars' names are required, by Norman Jacobson, October 2008.
    Minor revisions to clarify credit card numbers being stored as strings, by Norman Jacobson, November, 2008.
    Minor revisions to require single-name handling during demonstration, by Norman Jacobson, December 2008.
    Minor revisions to reflect postgreSQL 8.3.7; a few typos fixed, by Norman Jacobson, March & April 2009.