ICS 184 / EECS 116 Summer 2006
Assignment #1

Due date and time: Wednesday, July 12, 11:59pm


Introduction

A database is simply defined as a collection of information relevant to some person, some program, or some organization. The collection can be as straightforward as a homogeneous list of employees and their job titles, or as complex as the complete set of interrelated information maintained by a bank about its customers, employees, assets, and investments.

In applications of a realistic scale, merely understanding the application's data requirements can be a challenge. The other half of the battle, turning these requirements into a software system that manages the data, is even more difficult if done from scratch. Fortunately, there are well-understood techniques for performing what's called data modeling, a conceptualization of what data needs to be stored, and how data relates to other data. Furthermore, there are existing software systems called database management systems that are adept at storing and manipulating data modeled using such techniques. The existence of data modeling techniques and database management systems greatly reduces the work required to build an application that requires a database.

There are multiple data modeling techniques in use, but by far the most prevalent is called the relational model, which we've been discussing in lecture. In the relational model, data is organized into relations with one or more attributes, with each relation being a set of tuples, and each tuple containing a unique grouping of values for the attributes. For example, a database for a hospital might contain a relation called patient, with attributes like patient_id, name, address, and admitted_date. Each tuple in the relation would contain information about a particular patient. Two tuples could have the same value for some of the attributes — such as two patients with the same name and/or address — but not for all of them.

Relational databases excel at managing the interrelationships between data in different relations. In our hospital example, a relation called visit might consist of attributes like patient_id and doctor_id, along with other information such as reason and visit_date. The presence of the patient_id and doctor_id attributes relates information about each instance of a patient visiting a doctor with identifying information about the patient and the doctor, without requiring that identifying information to be duplicated in the visit relation. Later this quarter, we'll explore issues, such as duplication of information, that arise in the design of a relational database.

For this assignment, I'd like you to get some practice formulating queries for a relational database that has already been designed. I'm asking you to use relational algebra to formulate your queries, so that you can focus on the task of selecting just the right information, joining together related data from multiple relations when necessary, without worrying about the syntactic and semantic details of programmatic query languages like SQL, which can sometimes be unintuitive. Subsequent assignments will encourage you to explore other parts of the broader task of data management, such as expressing and testing queries and data definitions programmatically using SQL, modeling data for the purposes of designing a database, and connecting to a database from another program.


The database

For this assignment, we'll work with a database underlying the operations of a hypothetical company that manages multiple automobile dealerships throughout the United States.

The following are the data requirements of our enterprise.

Keeping in mind the data requirements described above, the database has been designed to consist of the following relations, with the attributes making up the primary key of each relation underlined. Spend some time looking over these definitions, and make sure you understand why the primary key was chosen as it was. (For example, why is social_security_number not included in the primary key of sale?)

car(vin, make, model, year, mileage, asking_price, invoice_price)
dealership(dealership_id, dealership_name, dealership_street_address, dealership_city, dealership_state)
salesperson(salesperson_id, salesperson_name)
customer(social_security_number, customer_name, customer_street_address, customer_city, customer_state)
reports_to(salesperson_id, managing_salesperson_id)
works_at(salesperson_id, dealership_id, month_worked, base_salary_for_month)
inventory(vin, dealership_id)
sale(vin, social_security_number, salesperson_id, dealership_id, sale_price, sale_date)

The domain of the month_worked attribute of the works_at relation is unique months, such as March 2006 or September 2008.


The queries

Write a relational algebra expression for each of the following queries, which are intended to operate on a database designed as described in the previous section of this assignment. For each query, any legal relational algebra expression that yields the correct result is acceptable.

  1. Find the names of all salespeople who have ever worked for the company at any dealership.
  2. Find the names of all salespeople who are managed by a salesperson named "Joe Jones".
  3. List the VIN, make, model, year, and mileage of all cars in the inventory of the dealership named "Irvine Toyota Sales".
  4. List the VIN, year, and mileage of all Toyota Camrys in the inventory of the dealership named "Irvine Toyota Sales". (Note that a Toyota Camry is indicated by the make being "Toyota" and the model being "Camry".)
  5. Find the name and Social Security Number of all customers who bought a car at a dealership located in a state other than the state in which they live.
  6. Find the names of all salespeople who do not have a manager.
  7. Find the name of the salesperson that made the largest base salary working at the dealership named "Irvine Toyota Sales" during January 2006.
  8. Find the salesperson ID and name of all salespeople who have worked at each one of the company's dealerships at some point in time.
  9. List the name, salesperson ID, and total sales amount for each salesperson who has ever sold at least one car. The total sales amount for a salesperson is the sum of the sale prices of all cars ever sold by that salesperson.
  10. Find the name and salesperson ID of the salesperson who sold the most cars for the company at dealerships located in California between March 1, 2006 and March 31, 2006.
  11. Calculate the payroll for the month of March 2006.
    • The payroll consists of the name, salesperson ID, and gross pay for each salesperson who worked that month.
    • The gross pay is calculated as the base salary at each dealership employing the salesperson that month, along with the total commission for the salesperson that month.
    • The total commission for a salesperson in a month is calculated as 7% of the profit made on all cars sold by the salesperson that month.
    • The profit made on a car is the difference between the sale price and the invoice price of the car. (Assume, for simplicity, that cars are never sold for less than the invoice price.)

You are permitted to use the assignment operation to create temporary relations for the purposes of simplifying your queries, but it is not required.


Deliverables

Write all of your queries in a single file in a Microsoft Word (.doc), Rich Text Format (.rtf), or PDF document. Please be sure to use the appropriate Greek characters and mathematical symbols, rather than making up your own equivalents; this will help us greatly in grading the assignment, especially since there are multiple correct solutions to most of these problems. Submit the one file containing your queries and nothing else.

Follow this link for a discussion of how to submit your assignment. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the assignment that you want graded. We won't regrade your assignment simply because you submitted the wrong version by accident.