Dialysis Database

Assignment 5

In this exercise, you (begin to) develop a database of kidney dialysis patient data. You prepare a data entry screen to accept information about patients, calculate some results based on the entered data, and place selected results in a summary report. In doing these tasks, you become familiar with data processing chores common to scientific inquiry, learn to use a database as a reporting and investigative tool, and apply a number of the information representation and presentation concepts we’ve discussed.


What to Turn In

• An electronic copy of the database you created


Grading

You will be graded on the correctness, completeness, and presentation of your layouts, on how well you have structured your database (its representation), whether your database contains the required fields with correct attributes, and how well it checks for input errors. Be prepared to show your database and layout in action, should the TA so request.

Background

A person’s kidneys can stop working, usually because of trauma (injury) or disease. Sometimes they will recover their function, but if they fail for good, the body can’t remove the (poisonous) by-products that come from its own metabolic processes and, in a few days, the person dies. Rather grim. However, these days there are three major treatments that can compensate for the lethal effects of “end stage renal disease” (ESRD, for short).

The first is “hemodialysis.” Two or more times a week, a person is “hooked up” to a dialysis machine. The patient’s blood is routed through a complex system of filters and chemicals that remove the toxins. The treatment can take a few hours; it’s mind-bogglingly expensive (hundreds of dollars per treatment); there’s some danger of infection and side effects. It’s also typically not quite as effective as the kidneys, so, over time, a patient often has to increase the number or duration of treatments. It’s so psychologically taxing (imagine having your life literally and completely dependent upon a machine) that some patients need ongoing psychological counseling.

The second is a kidney transplant. A healthy kidney, from a “matched” living or dead donor, is surgically placed into the ailing patient. This is major surgery, with its attendant dangers and costs, both to the dialysis patient and the live donor. The transplant patient is often on (expensive) drugs indefinitely to prevent the body from rejecting the kidney. The patient usually doesn’t need dialysis for some time, but, eventually, often does (but not as often, typically, as before the transplant). Sometimes the new kidney stops functioning, and the patient must get another transplant or go back on full-blown dialysis.

The third is a class of dialysis treatments where the patient wears a filtration system that’s partially in the abdominal wall and partially external; the external “bag” is replaced every so often. It’s fairly effective, but often not as effective as hemodialysis. Surgery, with its risks, is required to install the “interior portion” of the system. Because of the external/internal connection and bag changing, major infection is a common problem, often resulting in hospitalization, and sometimes (at least temporary) discontinuance of this approach and a turning to hemodialysis or transplantation. The most common form of this dialysis technique is called CAPD. (That abbreviation often stands for the entire class of these techniques; so it will in this exercise).

Because all these alternatives are so expensive, the federal government picks up the tab. The government mandates that dialysis treatment facilities and transplant hospitals keep detailed records on patients with ESRD: It is quite interested in tracking how its money is being spent, in detecting and stopping fraud, and in learning which treatments are the most cost-effective. The collected data is forwarded to the ESRD Networking Coordinating Council—“the Network,” for short–a national group of offices that insures the accuracy of and prepares reports about the data (among many other duties).

Each Network office also typically prepares several monthly reports. Some provide profiles of the patient population or the facilities at which they were treated; others point up missing or inconsistent data, so the office can call the appropriate facility and clear up the problem. Each office sends some of these data to CMS to add to its national dialysis information bank.

The Network and CMS databases are also available to qualified researchers. There’s much about treating ESRD that is still unknown, especially about which treatments are most effective in which patient populations, an issue you can tackle, if you like, as optional work.

For more information about the ESRD Networks and the data they collect, see www.esrdnetworks.org; for more about CMS, see www.cms.hhs.gov.

About databases, briefly

A modern database contains the data itself, command sets (often called macros) that manipulate the data, forms that describe how data should appear on the screen, and report layouts that describe how data should appear on printed reports. The data is stored in one or more tables. Each row of a table is one record or instance of the data (in our case, a patient); each column is a field that contains an attribute that applies to all records (our fields include birth date and date of first dialysis, among others). Along with each field is information about its characteristics; for example, for birth date we might store that the field’s name is DOB, that it is a date and that the date is stored in a particular format. Databases are designed so that it is straight-forward to do computations (say, grouping of records according to some criterion or counting the number of records that have a gender of female), retrieve subsets of information, sort records, and prepare reports.

The codebook

To maintain and analyze data properly, one must know its attributes. These include its data type, its size, and, if the data is a code for something, what each value of the code means. This information is commonly placed in a codebook. The table below shows the codebook for the dialysis information that will be stored in the database for this exercise. dx is an abbreviation for dialysis, tx for transplant. By the way, designing codebooks is a craft all its own, one with its own set of representational and presentational issues.

Attribute Name

Type

Size

Description

NetID Numeric 5 identification number; unique to each patient; assigned by the database as a new patient’s data is entered
MissDeth Character 1 X  if a patient has died but the official death notice form
   has not been received,
blank otherwise
DOB Text 8 Date of birth in MMDDCCYY format
MM = 99 when month is unknown
DD = 99 when day is unknown
CC = century; can only be 18, 19, 99 or 20;
99 = year of birth unknown
YY = remainder of year (blank if CC = 99)
Gender Character 1 X - unknown


F - female


M- male
ICDA5 Numeric 5 Standard, national disease codes; used to indicate disease that lead to dx
DeathDate Text 8 Date of death in MMDDCCYY format
MM = 99 when month is unknown
DD = 99 when day is unknown
CC = century; can only be 19, 99 or 20;
99 = year of birth unknown
YY = rest of year (is blank if CC = 99)
entire date is blank if patient is alive
DeathPri Numeric

2

Primary cause of death
0 - patient alive
The information about each event is stored as a triplet: the event itself (as a code), the date it occurred, and the facility at which it occurred. Three triplets are stored. The first set is Event1, EvDate1 and EvFac1; the second is Event2, EvDate2 and EvFac2; the third is Event3, EvDate3, and EvFac3.
Eventx Numeric 2 Event
0: no event
1: hemodialysis at a facility
4: hemodialysis at home
11 - 13: CAPD
16: recovered function
17: discontinued dx, patient’s choice
31-39: tx
97: changed facility (only; no dx change)
99: death
EvDatex Text 8 Date event occurred, MMDDYY
MM = 99 when month is unknown
DD = 99 when day is unknown
CC = century; can only be 19, 20 or 99;
  :99 = year unknown
YY = 99 when year is unknown
if year is unknown, entire date is considered missing,
  code 99999999.
entire date is blank if event not yet occurred
EvFacx Text 6 Code for facility at which event occurred
999999 = unknown

Note that dates are stored as text, rather than as a date type, because date types cannot handle a date with a missing month, day or year, or one that is blank. In real-world data, missing values pretty much always occur, and are important to consider in data analysis.


Getting Help from Access

We will be using Microsoft Access 2003 as our database tool.

Before diving into Access, it is worthwhile to take a moment to introduce you to its extensive help features. This lab will guide you through the process of developing your database, but there will likely be times where you want more information. And learning to use help features is an important general computing skill.

Most features in Access print out help messages as you use them. For example, when you enter the properties for database fields (see below), Access will print out a helpful message whenever you go to set a particular property. In many places in Access, hitting the F1 key will bring up more help.

Perhaps the easiest way to get general help is to use the “Office Assistant,” a context-sensitive help feature that is very easy to use, and really quite good at providing information. To activate it, just select Show the Office Assistant from the Help menu. Click on the paperclick animation, then type in your question and click on Search.

You can also use the other features under the Help menu; in particular, the first option, Microsoft Office Access Help, activates the full-blown Access help feature. There are other help features available; do experiment!


Part I: Data Entry and Validation

Preparing for the data

The first task when creating a database to hold new information is to create an empty database with the appropriate fields and associated attributes.

• Launch Access. To do so, choose Programs from theStart menu, then Microsoft Office, then Microsoft Office Access 2003.

• In the right-hand Getting Started box that appears, select New...; it’s under Open.

The right column will change to show a number of options under New.

• Choose Blank database...

The File New Database dialog box appears, which allows you to give a name to your database and store it. To match the instructions here

• Name the database file kinfo.mdb (type kinfo.mdb into the File name: box) and store it in the \Temp folder (using the Save in: menu at the top of the dialog box to position the file there). Click on Create to save the database file.

The Database dialog box will appear; it has a choice down its left-hand size for each kind of information that (can be) stored in a database (which we’ll use later) and a main menu that shows three ways to create the database.

• Click on Create table in Design view; click on Open.

You’ll now see a grid with a place to enter in each field (Field Name), the kind of data it holds (Field Type) and a Description (helpful comments about the field). As you enter the field name, a menu will appear in the Data Type next to it; you will choose the most appropriate type for the field. When you choose a type, a properties list will appear in the General tab at the lower left of the table box; these properties can be changed as needed. The lower right of the table box always shows informational help as you do your work. Hitting the F1 function key will call up additional help— do use it as needed.

• Using the codebook as your guide, enter a Field Name, a Data Type and a Description. (You can also adjust Field Properties for each field at this time, or wait until you’ve entered all the fields. We discuss more about field properties below.)

Make the field names meaningful, and ones that can easily be matched to the correct field information in the codebook. The data type is just that, an indicator of the kind of data to be stored in that field. Having type information helps Access know what symbols are legal in a field’s value (e.g., numbers can’t include punctuation) and what operations involving that field make sense (e.g., text fields cannot meaningfully be multiplied together).

Text, Number and Yes/No field types are just that. A Memo field is essentially a huge text block; it allows for entering a large amount of text data. Date/Time lets you store date and time information. Currency lets you store (monetary amounts, such as U.S. dollars. AutoNumber has Access automatically put a number into this field as the record is created. Hyperlink stores a link to a Web page; OLE Object stores a link to data the database needs access to that is not stored in the database itself. The Lookup Wizard helps you set up a table that lists the legal values for the field; during the entry of a patient's data, the user chooses a value from that list. You can learn more about these types by hitting the F1 key.

Note that you can’t use a Yes/No field if the possible answers are “Yes,” “No,” and such things as “refused to answer” or “question not asked”: Yes/No fields can only store two values, Yes and No. Note too that Date/Time does not allow for any missing parts (e.g., if you had a date with a missing day, there would be no way to store it as a date without “making up” a day)—so that means, unfortunately, you can’t use the Date/Time type for any field that may contain values representing missing data.

If you enter a field in error, you can delete it by highlighting all of its information (by clicking on the gray box at the far left of the field line), and hitting the delete key.

Make sure you get the data types right before proceeding on to other tasks! Changing them later is doable, but often very messy and time-consuming!

Some examples

To help you get a feel for all this, here’s some fields and their data types; each field uses a different kind of data type. You should not need to use data types other than those shown here (hint, hint!).

NetID: The appropriate field type is AutoNumber, as the codebook says the database should automatically provide this ID.

DOB: We cannot use a Date/Time type, as that will not allow for the invalid dates we use to denote missing data. The best type for it is Numeric, since it has to be all numbers.

Gender: This is limited to three text values, F, M, and X. So we use a lookup table with these three values. We choose Lookup Wizard... then I will type in the values that I want from the wizard’s dialog box, then click Next. On the next box, I tell the wizard I only want 1 column and enter into that column the three values for the list, that is F, M and X,, then click Next. On the following screen I use the name Gender for the list and then Finish. If you now click on the Lookup tab at the lower left of the table screen, you’ll see the information from the wizard, confirming that your list has been created.

You may have noted that the data types chosen don’t necessarily match those given in the codebook. That’s ok because the codebook describes how the data is to appear when it is transferred from one party to another; as long as we abide by its formats when we share data, we can store the data any way we want in the database (and convert it to the types in the codebook when we prepare it for sharing with others).

Ensuring data accuracy

A paramount rule of database design is to have the database catch every possible error at the time of data entry. Incorrect data in the database leads to incorrect reports, analyses, and conclusions, and they can lead to bad policy and terrible consequences. An example—this is a true story...

A water district that supplies the Los Angeles area had to change some of the chemicals it used to purify the water (the reasons are interesting, but too long to go into here). Trouble was, anyone on hemodialysis treated with this changed water would die: Compounds formed in the water would react with chemicals in the dialysis machine, and the resulting poisonous substance would enter the patient’s blood during treatment. So, the L.A. ESRD network went to its database, and pulled up contact information for dialysis facilities and for people being treated at home. It notified every one of them about the situation and what to do about it (which was to install a special filter between the water supply and the dialysis machine). The Network was the only location that was required to have on file all dialysis patients in the area. Although the media was broadcasting the change, and other organizations were doing their best to notify the “interested parties,” it fell to the Network to make sure all were indeed notified. Imagine the consequences of error: for instance, a whole facility of poisoned patients because the facility’s code in the database was incorrect, or a home dialysis patient dying because a data error indicated the patient was dead—so of course she did not need to be contacted!

Properties in fields interact, sometimes in unexpected ways, with properties in input screens or reports. For instance, if a restriction on the range of numbers that can appear in a numeric field is given in the table, and a different restriction is placed on the field when its value is entered from a screen, difficult-to-fix contradictions can occur. So, a general rule: if a property is going to apply to a field on all screens that use it, set the property in the table. If not, set the properties in the screen(s) as appropriate. It is critical in a real-world system that the properties be set, wherever they are set, so that any incorrect data values entered on a screen that can be caught are caught. (Some errors cannot be caught: if you meant to enter a birth date of 05062004 and instead entered 05072004, there is no way the database can catch that, as both birth dates are perfectly legal and reasonable values. We catch all of what we can catch.)

Some examples, revisted

To illustrate setting field properties, here we set them for the fields we discussed above. In general, to change a property, click on its box.

NetID: The existing field size of long integer will allow for very large numbers, which in turn allows for lots of patients— just what we want. New Values set to Increment does the automatic numbering we desire. The user is not going to enter this field, so we do not need a Format or Caption. (Remember, these properties apply to the table; we can have captions and formatting on a screen or report if we want by defining them there.) Indexed is currently allowing duplicate values, not what the codebook calls for. So click on the Indexed property box to get a list of available choices and pick Yes(No Duplicates), which is what we want; each network ID is to be unique.

DOB:. The largest value this field can be is 99999999. That's pretty big for a number, so we leave the Field Size set at Long Integer. Formatting is best left to screens and forms, so we leave it blank here. Decimal places will always be 0, so we change the property to 0. Input masks are best left to input screens. Captions, too are best left to screens. We leave the Default Value at 0; during data entry, it will be changed to a real birthday (or the value for a missing one) so the 0 should not remain in a real patient—and if we see a 0 somewhere in the database, we know we have an error that needs to be fixed, not something we would notice if we used a default value that corresponded to a legal birth date. We leave the validation rules and messages to the data entry screens. We do, however, change this field’'s Required property to Yes; every patient will have a birthday, even if it is the phony missing value one.

Gender: This is limited to three text values, F, M, and X, so the Field Size of 50 is inappropriate; we change it to 1. We change Required to Yes and Allow Zero Length to No, as we always want a code from the table present. We leave the other properties as they are.

• Adjust the field properties of each field as necessary, keeping the above rules in mind. Read the help messages and use the F1 key as needed to get help on specific properties.

• To protect your work against loss, every now and again save it via the Save command (under the File menu). The first time you save the table, you will be asked to provide a name; call it KINFO to have its name match the one we use here. Of course, also save your table when you are done creating it.

When you save the table, you may be warned that there is no primary key defined. It turns out we do not need one; we would if we had multiple tables in the database, but we will be only using one. So, if this message comes up, click on No.

If you are working on the table in more than one session, you can get to it easily by double-clicking on the kinfo.mdb database (which, of course, launches Access and open the kinfo database). You may get a security warning; click Open if you do. When the database window appears, click on KINFO, then Design, to continue your work.

We strongly encourage you to have the TA review your table before you continue with this lab. As we said above, it’s much easier to fix any mistakes in the table now (while nothing else depends on it) than later, when changing the table will often necessitate major changes in other parts of your work.

The data entry screen

Adding new data to a database, deleting incorrect information, and keeping its data up-to-date and correct is a common, ongoing chore. To make the addition, deletion, and changing of data (often called “data maintenance”) easier and less error-prone, we want our screen layouts to group together related data, have meaningful labels for the fields, use good typography and judicious use of color to make them easy to read, and to have the database prevent the user, as much as it can, from entering invalid data values. So your next task is to prepare a “data entry screen” for this database.

• Go to the Database window. Click on Forms; Click on Create form by using wizard; be sure the KINO table appears in the Table/Queries box, and follow the instructions!

There are several ways to create a form, but start with the form wizard; it does a lot of work for you but still gives you a nice set of options to customize your form. Do feel free—even encouraged—to create screens using the other available options.

The last screen of the form wizard asks if you wish to modify the form, or view it. For now, tell Access you want to view your form, so you can review it. You can (and probably will) want to modify your form at some point to make it easier to find related fields, add screen or subgroup headings, change colors, and so on. To do so, just click on Forms, click on the name of your form and select Design. You’ll be placed into the design tool. It looks complex, but is actually quite straightforward to use. Again, use the various help features and experiment; you’ll be surprised how much you can do, and do easily, to improve your screen. (Try, for instance, typing “validation” into the Help Assistant’s query box and select Validate or restrict data in a form. you’ll get lots of information on how to set up input masks and validation rules for your input form).

Checking your work

• Enter some data to test your screen organization and field definitions. There are several ways to enter data into the database. Perhaps the easiest is to select Forms and double-click on the data entry form. The screen will appear; start entering data. Experiment with the record positioning buttons (the bottom of the form) until you are comfortable with using them. You enter values for fields by typing; you can move from one field to the next by clicking next to the field name (or wherever you placed the “box” in which that field’s contents should go, if you changed its position in your layout).

You can change the values of existing fields by simply clicking on their data entry boxes and changing the contents in them or, if the values come from a lookup table, by clicking on the arrow next to the field. Access will not leave a field (once entered) that fails to meet its data validation checks; instead, it will present an error message. When you dismiss the message, you will still be in the field. (Note that when you first create your form, virtually no validation checks are in place—about all that it will check is that there is no text in numeric fields.)

You can use the record positioning buttons on the bottom of the form to move from one record to another. Access will not let you leave a record if any data field in it is still in error. Be observant: if you try to leave an in-error record, say from field X, an error message will appear that may have nothing to do with field X, but is related to another field that has an error.

Don’t just pick obviously correct data values; pick ones known to be wrong and see if your database catches the mistakes. If it doesn’t, modify the data entry form, if possible, so they are caught. You won’t be able to catch every possible mistake, since someone might enter a legal, but wrong, data value in a field, and because Access’ data validation tools can’t check everything you might want. But do catch as many errors as you can. If the check is complex, start with doing part of the checking, make sure it works, then add more to it, check that, and so on, until you have the complete check in place—it’s a lot easier to find and correct mistakes this way then trying to figure out what’s wrong in a complex check.

• Test your data entry checking rigorously. When you are done, make sure your database has at least 10 “clean” records of data in it. (You’ll need them for the remainder of this exercise.)

Improving the screen layout

Now that you’ve had some practice using your screen, you may have thought of some ways to improve its appearance, perhaps even change it so it is more likely a data entry person will make fewer mistakes.

• Modify your screen as appropriate; you goal is to have the easiest to read and use screen that you can.


Part II: Computing New Fields

Calculated fields

It is often very hard to discern patterns in raw data, so we use formulae to compute from the data new values that are more meaningful to us. Deciding what data to collect is, in fact, often predicated upon the kinds of “higher level” information desired.

For example, the “event timeline” in this database reflects reality well. A patient begins dialysis (or gets a transplant) on a given date at a certain facility. That patient is considered to be in that “mode” until some other “event” occurs to change it; moving to CAPD, for example. The new mode, the date it occurred, and the facility now responsible for the patient (which might be the same one as before, or might not) is recorded. The old mode ends on the date the new one begins. Mode changes are tracked until the patient dies or leaves the local network, both of which are modes themselves. A special mode code is used to indicate that the patient changed facility, but that the mode itself did not change. (A patient who left the network and comes back is also considered to have had a facility change event).

But the timeline (and other fields) don’t directly provide information of daily interest to a Network. For instance, a patient’s current dialysis facility—the place where dialysis was last done—is often needed. Searching the event timeline for it would work, but it’s slow and error-prone compared to just looking on the screen for the current facility. So we want to compute this field, have it updated every time data used to compute it changes, and have it stand out on the screen.

We’ll step you through this variable’s computation; it will serve as an example of how computations are done in Access (and many other modern databases).

We need a place to store the results this new field contains. Since our purpose in creating it was to allow the person viewing the dialysis data a quick way of seeing the current dialysis facility, we store this new field on the data entry form. (There are other places one could store the field, such as in a report separate from the data entry form; doing so would make the results available to a different “user class,” one that cared only about summary results. We’ll get a feel for that class’ needs in Part III of this assignment.)

• Go to the form you created for data entry and open it in design view. Expand the form’s window so you have access to more of the form layout. Place the mouse just above the Form Footer bar so that mouse turns into a cross and drag the moust down so that the detail area of the form is enlarged. Then, from the Toolbox, click the Text Box icon (second row, second column). Then click on the wizard icon directly above it. (Activating the wizard will do related tasks for you, such as creating a label field to go with this text field). The mouse should turn into an arrow with ab in a box next to it.) Place the mouse in the detail area where you want the upper-left corner of the text box to be on the screen, and drag the mouse to make the text box; let go of the mouse button when the box is the size you want. (Note that a companion label is created.)

This box can now contain text of any kind, including a formula.

About formulas

A formula is an algebraic-like expression that produces a result; it computes new values from existing ones. It can be quite simple or very complex.

When entering complex formulas, don’t enter them all at once. Start with a simple component of it; check it. When it works right, add another component, and check that part, and so on, building up the formula piece by piece until you have it all.

For example, we use the IIf statement in several formulas we ask you to write for this assignment. IIf looks like this:

IIf(condition, what-to-do-if-true, what-to-do-if-false)

and it can be nested, a powerful mechanism for computing results, but one that results in a complex expression.

Suppose you want to enter the following formula. (Don’t worry right now about what it does; we’ll get to that presently.)

=IIf([Event3] <> 0 And [Event3] < 16, [EvFac3], IIf([Event2] <> 0
And [Event2] < 16, [EvFac2], IIf([Event1] <> 0 And
[Event1] < 16, [Event1], 0)))

It’s best to start with

IIf([Event1] <> 0 And [Event1] < 16, [Event1], 0)

and check that it works right for event 1. Then expand the formula to

IIf([Event2] <> 0 And [Event2] < 16, [EvFac2],
IIf([Event1]<> 0 And [Event1] < 16, [EvFac1], 0))

You know the formula works for Event1, so you only have to check it for Event2.

Continue by adding Event3. It may not seem like it, but incremental construction, with tests done after each step, is much faster and more effective at finding and fixing mistakes than trying to check the entire formula at once.

• Place the cursor in the text box; right click and choose Properties. Click on the Data tab and, in the Control Source box, right click and select Zoom.... (This action gives a large typing space). Now enter the above formula, doing it piecemeal, as illustrated above. Don't hit the Enter key when the formulas get long enough to “overfill” a line: the formula will wrap to a new line automatically. Enter OK to store formulas. If Access detects something wrong with the a formula,it will give you an error message when you try to leave this field; if that happens, return to the formula and fix it.

Interpreting formulas

The formula will indeed appear much like shown above—Access doesn’t employ good rules of presentation here, does it? (This ugliness illustrates why we should do our best to make our work readable; text is much tougher to understand when it isn’t presented well.) Here's the same formula, formatted a bit, to make it easier for you to reference while we discuss it:

=IIf([Event3] <> 0 And [Event3] < 16, [EvFac3],
  IIf([Event2] <> 0 And [Event2] < 16, [EvFac2],
  IIf([Event1] <> 0 And [Event1] < 16, [EvFac1], 0)))

Fields in formulas are referred to by name and enclosed in square brackets. Numeric constants (0 above) are just written down; text (none is used above) is enclosed in double quotes ("). An operator is written between its operands (or in front of it when there is only one operand, as in -2). <> is “not equal to,” < is “less than,” And is a logical operator that takes on the value true when both of its operands are true, and takes on the value false otherwise.

Functions calculate results. You provide the function with data to work with, called arguments or parameters; it computes a result and returns it to you. We use the IIf function in the formula above. The IIf function has three arguments. IIf first evaluates the first argument. If it is true, it returns the value of the second argument; if false, it returns the value of the third argument. We can nest functions to have the result of one function used as an argument of another. Seems strange, perhaps, but it does the job. If you interpret the IIf statement above, it would go something like this:

If event 3 is a hemodialysis (“hemo”) event, return its facility, otherwise
  If event 2 is a hemo event, return its facility, otherwise
    If event 1 is a hemo event, return its facility, otherwise
      return 0 (to mean “there is no current dialysis facility”)

Since events are (supposed to be) given in chronological order, this formula returns the facility at which the patient had the most recent dialysis event—just what we want.

Another way to enter formulas is by clicking on Build... when you are in the text box. This action lets you call up one of three different tools that can let you build expressions. Expression Builder. It’s quite straightforward to use (with a little—and we do mean a little—practice), and you may find it an easier approach to entering formulas then typing them. (The Expression Builder will help you avoid spelling mistakes, misplaced commas, and unbalanced parentheses, all of which plague anyone trying to enter formulas correctly.) Use the other builders only if you have had at least the equivalent of a course in introductory programming; they are powerful but complex tools.

Cutting parts of the formula and pasting them to a different place in the same or a different formula also helps reduce error, but be sure to make any needed changes in the pasted section.

There are many more operations and functions, of course; you can learn about them via good questions to the Assistant, or by using features available under the Help menu.

• Change the text in the label box (for the formula-holding field) to something meaningful (just click on the box and drag the mouse to highlight the existing text and start typing). Resize the box as needed to hold the text nicely (click on the box, move the cursor until it turns into a double arrow, and then drag the edge of the box).

• Save the form. Check that the new field is working properly by opening the data form and comparing the computed result against the dialysis events and facilities data you have (for each patient). Enter new data values as needed to insure that the Current Dialysis Facility variable is being computed properly.

• Now, add the following calculated fields to your form. Name them appropriately; display them nicely. (For instance, on the Yes/No fields, you could display the words “Yes” or “No,” as appropriate, or a check box that’s marked when the answer is “Yes”—but don’t put a 1 for Yes and a 2 for No; that’s much too hard to decipher! Remember: Presentation matters.)

Patient Alive? (As of today’s date) “Yes” if so, “No” if not
Current Mode: the event code of the most recent event (0 = no events)
Ever aTransplant? “Yes” if the patient ever received a transplant, “No” otherwise

Check your work carefully: Remember the potential consequences of wrong data!


Part III: Summary Results and Report

Preparing a summary report

Sifting through several (actually, at the Networks, it’s thousands) of data records to discern patterns is tough, if not impossible. So we prepare summary reports where the data for all or a group of records is combined in well-thought-out ways to make evident the data’s patterns and properties. A report has titles, column headings, labeling, summary lines, and so forth as necessary to present the data in the clearest, neatest way possible for a reader who might never see, or care about, the data entry process. In Access, a report obtains its data from tables or queries (more on the latter in a minute), and formats it by commands that appear in the report itself.

A query is where computed information, based on fields stored in the database’s tables, is kept; the query often includes totals and averages for a subgroup of records fulfilling specified criteria. We’ll lead you through a query that computes and stores a couple of “summary variables,” and then use those variables in a report.

• In the Database dialog box, click on Queries and click on Create a query in design view and then Open to create a blank query. Now select your table name from the Show Table window, and click on Add; this action makes your table’s fields available to the query. Now click on Close.

You now have a blank query. To provide you an example of how summary variables can be computed (as usual in Access, there is more than one way to do this), we’ll step you through creating a couple of variables: 1) the number of females in the database, and 2) the number of male patients who are currently on dialysis.

You’ll note a table at the bottom of the query screen; each column of the table stores information about one computed variable. We will be using predefined Access functions, called “totals,” for this work, so

• Click Totals under the View menu so that a Total: row is added to the query table.

We’ll be working in the first column for the number of women.

• In the Field: box, enter the expression FemaleCount:Count(*). This tells the query that you want to compute a new variable FemaleCount that is a count of records.

The Table: box tells the query the table in which the selected field resides (it’s possible that two tables each have a field with the same name). For this calculated variable, leave Table: blank—the variable is not part of any table.

• Change the value in the Total: box to Expression. This indicates that FemaleCount’s will be based on a subgroup (rather than all) of the records in the table.

Show being checked means that, when the query is opened (executed), the result of this computation will display (which is what we want).

Now we need to modify FemaleCount so it contains a count of women only; right now, it’s a count of everyone in the database.

• In the next column (to the right), enter Gender in the Field: box.

Access will fill in the table name (since it knows that’s the only place Gender can come from), check the Show: box, and place Group By in the Total box

• Change Group By to Where. Access will now select records “where” certain criteria are met

Note that Show has been unchecked; selection fields, since they only help us set up the results for other fields, are typically not displayed.

• In the Criteria: box, enter =F.

This selects out the gender group that has a value equal to F—females—and ignores all other groups. So, in this query, any time a variable (like FemaleCount) is computed, the computation will include only those records where Gender is F. Since FemaleCount is a count, that count will now only be of females.

• Save the query; give it a meaningful name (we called it Number of Women).

• Now double-click Number of Women in the Database window; if all has gone well, the (correct) count will appear in a spreadsheet-like form in a window.

• As always, test your work: enter a number of different records into the database (you can delete them later, if need be) so you can be sure that FemaleCount is correctly counting female patients. Be particularly sure to check unusual situations (for instance, try the case where the database has no records where Gender equals F, and make sure FemaleCount is 0 in that case).

Now we build a query to count the number of male patients who are currently on dialysis. A look at the codebook tells us that “currently on dialysis” means that the patient’s most recent mode is between 1 and 13. What’s the most recent event? It’s the highest-numbered event that has a non-zero code—0 doesn’t count as a mode change, since it means “not yet used.” We check the most recent event to see if it is between 1 and 13. If so, the patient is on dialysis. If the gender code for this patient indicates the patient is male, then this patient is added to the count of male patients who are currently on dialysis.

• Create a new query, via the design view

• Create an expression CountMenCntDx:Count(*); this new variable is just another (more complicated) count.

Unfortunately, Access does not allow one to use computed results on forms (such as the current dialysis facility we computed above) in queries. So, to select the records to appear in this count, we can’t use the current dialysis mode variable from the form: we have to compute a similar variable here.

• Create a logical expression that computes the current dialysis mode. Make its value parallel the current dialysis form variable. (Hint: the expression begins CntDxMode: IIf([Event3] <> 0, [Event3], IIf([Event2] <> 0, [Event2], ... and ends with Event1 and a few closing parentheses.) Make sure Table: is blank (this expression is certainly not in the table!), Show: is unchecked, and Total: is Where.

• Set Criteria: to >0 And < 14.

• Save your query as # Males Now on Dx.

Access will, when the query is opened, compute the current mode for each patient, and select out those patients current mode is between 1 and 13. That’s just the group we want to count.

• Check your work!

We can create as many queries as we wish on the database, and they can get pretty much as complicated as we want. (For instance, suppose we wanted to count all the females currently on dialysis. We would have three columns in the query: the count variable, a “group by” female in the next column, and then another “group by,” for current dialysis in the third. Access includes in a computed query variable only those records that meet all the “group by” criteria.) Queries (and thus their variables) can also be made available to other queries, so one can build up complex queries a bit at a time, if desired.

• Create the following summary variables (you’ll note some are just the ones we created above, or very similar to them):

    # of people in database {No selection criteria}
    #of women in database {1 selection criterion}
    # of men currently on dialysis {2 selection criteria}
    # of men who are current transplants {2 selection criteria} Again, check your work! Be sure to save your results.

As you’ve noticed, the query viewing is primitive, and only allows you to look at the variables in that query. We want to present a summary report that contains all our summary variables, and is in an easy-to-read format.

Unfortunately, Access is very picky about the properties that queries and tables must have in order for all of them to appear in the same report. For instance, if you ask Access to create a report that uses variables from the table itself, and from a query based on that table, it often refuses! It also can give incorrect results when you try to use two (or more) queries in a report that are based on the same table.

So, in order to get all the summary variables into one report, we “fake” Access out (not really: we just follow its rules to “work around” its limitations). The trick: copy all the summary variables into one, new query, and then use that new query as the basis of the report:

• Create a new query, using the design view

• In the Show Table window, select the Queries tab and, in turn, Add every query which contains a variable you want to appear on the report. Close the window when you’re done. (If you add a table you decide you don’t need, right-click on it and select Remove Table. If you later want to add a table, click on Show Table in the Query menu.

• For each variable you want to appear in the report, click on a Field: box and select that variable. You’ll note certain settings “come along” with this selection; leave them be: they are just what we want.

• Save your query as something like All Summary Measures.

• Check your selections by opening (running) this query, and note if all the needed variables are present. If so, save the query and exit; if not, go back into design mode and fix things.

You’re now ready to create the summary report.

• In the Database window, click on Reports then on Create report by using wizard.

The Report Wizard will do a lot of the initial report creation work for you; it’s much faster than building a report from the ground up.

• Where you are asked for the table or query to use, select All Summary Measures, then click on the right arrow to move each of the variables from the query that you want in the report into the Selected Fields area. (This screen does let you choose variables from other queries and tables, but, if you do, you’ll probably just get error messages!)

• Just click Next on the following screen: sorting orders makes no sense for summary variables. (Choosing a sorting order would make sense if we were printing a value for each patient, instead of printing one value summarizing all patients.)

• Choose a layout and orientation that suits the intended audience; that is, they are not computer or Access experts; they are business types who uderstand dialysis reporting. Click Next. Choose a style that you think your audience will like; click Next. Give a title for your report and click Finish. A preview of the report will appear.

You now have a basic report. But it is a bare-bones report: the titles of the variables, for instance, are not very meaningful to the report’s reader. So, go into the design mode and spiff up the report:

• In the Database window , click the name of your report and select Design.

Access treats a layout as having sections. The Report Header contains information that appears once on the report, at the beginning. The Page Header is information that appears at the top of each page. The Detail section is information that is repeated for each “entity” that appears in the report. (Since the variables appearing in this report are summarizing information across all patients, they only appear once; right now, this report has one entity). The Page Footer controls information that appears at the bottom of every page. The Report Footer contains information that appears once, at the end of the report.

• Move the variables from the Detail section to the Report Footer (just select and drag them).

These variables and titles only appear once on this report, so they should really be in the Report Footing section; it is misleading to leave them where they are.

• Make this report as easy to read and attractive as can be managed inside Access. You can easily check how the report will look by clicking Print Preview in the File menu, and then clicking it again to have the design view reappear. Try not to print copy after copy of the report to check its appearance—look at it on the screen. That approach will be faster, save paper, and save you money.

The report format options found in the Toolbox are quite similar to those used in form design. Try out various ideas; don’t be afraid to experiment. (Make a copy of your database. If your experiments go so awry that you don’t want to try to recover from them, just throw away your current database and use the copy.) It’s so easy to create a report that you might make several versions of the report, trying out different approaches; when you make the one you think is best, delete the others.

A note about context

In the introduction to this exercise, we gave you quite a bit of information about the data’s meaning. Yet, in all the work you’ve done, you didn’t need to know what the data meant—all you had to do was follow the instructions of others of what to do with it. This is a useful phenomenon, in that one doesn’t need to be an expert in a given field to do computer work for it.

So why did we bother with the lengthy background? Because you do need to know about the data’s context and meaning if you wish to know if your results are reasonable; it’s often called a “reality check.” You need to know the context of the data’s use to devise a presentation appropriate to your audience. Just because a database produces a result doesn’t mean it is correct or useful!

A note about saving your database

Access tries to figure out how big your database is going to get and reserves space for it. Sometimes this approach makes the database file quite large, even though there is little information in the database itself. (This typically happens if you repeatedly add, then delete, large amounts of information). If you find that your database doesn’t fit onto your diskette or memory stick, compact it: go to the Tools menu, select Database Utilities, then, from its submenu, select Compact and Repair Database. That should do the trick.

If, after compaction, your database is still too big to fit on your backup media (this is quite rare), back up your database to your H: drive.


Optional work

This exercise’s required work just scratches the surface of table, form and report design and of Access’ features. You can add do much more; some ideas follow. We encourage you to do other things instead of or in addition to those suggested here—but check with the TA first to make sure what you have in mind is reasonable given Access’ capabilities and the time you have to devote to bringing your idea to fruition.

Hint: You’ll want to make extensive use of Access’ help facilities, and perhaps get guidance from the TA, before undertaking most of these activities.

There are many more useful fields that can be computed for patients (and are, in the actual database):

• Compute the patient’s age, to the nearest year. The age is the number of years from birth to today, if the patient is alive, or birth to death, if the patient is deceased. If the birth or death year is missing, set the age to 999. If the birth or death month is missing, assume a month of 06; if the birth or death day is missing, assume 15. (These are standard assumptions; previous research has shown them to be reasonable.)

The easiest way to proceed here is to look at the Patient Alive? field. If it is “Yes”, use an “ending date” of today’s date. If not, use the date of death as the ending date. Using the month and day assumptions above as needed, make beginning (birth) and ending (death or today’s) dates and, using date arithmetic, compute the difference between the two days. Divide the result by 365.25 (the number of days in a year, on average), and round to the nearest year. If the calculation can’t be done, store a value of 777. You will probably need some temporary fields, and perhaps another function or two, to complete the calculation. (Exactly what will be needed depends on exactly how you do the computation; there are several approaches that work.) This calculation will probably require a bit of thought to get right. Do use Access’ information tools discussed above to discover how to do the necessary calculations.

• Compute “Current Transplant?” The field contains “Yes” if the patient’s most recent dialysis event was a transplant, “No” otherwise.

• Compute the patient’s age at the first event, rounded to the nearest year.

• Compute the patient’s years on dialysis, which is the time from first event until today, if the patient is alive, or to death. Use the same rules regarding missing data as we used for the patient’s age. (If you computed the patient’s age properly, all you need to do is change a date or two in the computation to get these fields’ values.)

• Compute “grouped versions” of the previous two fields, as follows:

    If the field’s value is between 0 and 19, make the grouped version’s value a 1;
    If it is between 20 and 40, make the grouped value a 2;
    If between 41 and 60, make it a 3;
    If 61 or greater, make it a 4;
    If can’t be calculated (999), make it a 9.

Add them to the layout; label, format and position them well.

• Add one or more of these following summary fields to your report. Note you may have to create some of the computed fields discussed above. The new fields are

    # of people in age group 1
    # of people in age group 2
    # of people in age group 3
    # of people in age group 4

    # of women in age group 1
    # of women in age group 2
    # of women in age group 3
    # of women in age group 4

    # of men in age group 1
    # of men in age group 2
    # of men in age group 3
    # of men in age group 4

    # of dx patients in age group 1
    # of dx patients in age group 2
    # of dx patients in age group 3
    # of dx patients in age group 4

    # of current tx patients in age group 1
    # of current tx patients in age group 2
    # of current tx patients in age group 3
    # of current tx patients in age group 4

    #of patients whose first event occurred when they were in age group 1
    #of patients whose first event occurred when they were in age group 2
    #of patients whose first event occurred when they were in age group 3
    #of patients whose first event occurred when they were in age group 4

Another common database activity is to pull out selected records and/or fields and send them to another site (as the Networks do, sending their data to CMS).

Let’s assume you are sending a copy of the database to a researcher.

• Make a copy of your database. Remove from it the facility fields (this researcher doesn’t care about them); also remove the summary fields and reports (these don’t transfer to a different database well, since they are not really part of the patient table). Save the revised database with the name For Dr Spock, as a tab delimited text file. Turn it in electronically with your other work.

There is much more that can be learned from this data. Further summary statistics can be computed, both grand and by group, and using fields we haven’t employed; research questions can be investigated. Data can be graphed and charted, making it even easier to see patterns in it. Data can be copied to other tools within Office and analyzed or processed there (for instance, copying data to a spreadsheet lets you use statistics not available in Access.) Or pictures and other data can be copied from elsewhere in Office and copied into your database layouts to “spiff them up.”

And there are many other areas of inquiry; here are some you might consider:

Do patients who have ever had a transplant live longer than patients who have never had one? (If not, then maybe the government shouldn’t fund transplants—but perhaps it’s more subtle than this...)

Are there differences in longevity based on when ESRD began (for instance, do youngsters who start on ESRD live as long as elders who do)?

How long does a typical transplanted kidney function?

How long does a patient typically stay on CAPD? To what mode do they go when they change?

Which facilities have the highest fatality rate? Transplant failure rate? Leaving-CAPD rate?

Are there major differences in any of these findings because of demographic factors (e.g., age, gender, ethnicity)?

So, if you’d like, “massage the data” further. You’ll need to think carefully about what questions you want to try to answer, and how best to go about analyzing the data to do so.

Your grade for this part of the assignment will depend one how much additional work you undertake, how well you do it, and how complicated it is to execute. Remember, though, you cannot get optional credit unless you have completed the required parts of this assignment; we want you to focus on the required material first.

Written by Norman Jacobson, December 1995; inspired by his work for ESRD Network #18
Rewritten by Norman Jacobson (to change some tasks and split into smaller exercises), December 1996
Rewritten for Windows NT, Microsoft Access and ICS 1A by Norman Jacobson, September 1997
Revised by Norman Jacobson, April 1998
Revised by Norman Jacobson, September 1998
Minor revisions by Norman Jacobson, August 1999<
Revised for ICS10A by Norman Jacobson, December 2000<
Revised for the Winter 2005 offering of ICS10A, including updating to reflect Access 2003, by Norman Jacobson, December 2004
Minor revisions and some polishing for ICS10A Spring 2006 by Norman Jacobson, March and April 2006