Homework 1: Simple UNIX programs

This assignment will make you more familiar with how to build simple Unix programs, create Makefiles, look at assembly code, and use the GDB debugger. The things you learn will prepare you to handle a more complex environment of the xv6 kernel required for the future homework. You can do this assignment on any operating system that supports the Unix API (Linux Openlab machines, your laptop that runs Linux or Linux VM, and even MacOS, etc.). You don't need to set up xv6 for this assignment Submit your programs and the shell through Gradescope (see instructions at the bottom of this page). For Mac / OSX> users. The support of 32 bit applications is depricated in the latest version of your system. So if you already updated your system to macOS Catalina or have updated your XCode then we recommend you to do the homework at the Openlab machines.

Part 1: Simple UNIX programs

Download the main.c, and look it over. This is a skeleton for a simple UNIX program.

To compile main.c, you need a C compiler, such as gcc. On Openlab machines, you can compile the skeleton with the following command:

 $ gcc main.c 
This will produce an a.out file, which you can run:
 $ ./a.out 

Alternatively you can pass an additional option to gcc to give a more meaningful name to the compiled binary, like

 $gcc main.c -o hello

Here gcc will compile your program as hello. In the rest of this part of the assignment you will explore how to automate program development with Makefiles, learn how debug your code with GDB, and disassemble the program to verify your understanding of assembly language.

Part 2: Simple Makefiles

This part of the homework is adapted from https://opensource.com/article/18/8/what-how-makefile and http://mrbook.org/blog/tutorials/make/ It aims to introduce you to basics of Makefiles and the make tool that provides a way to compile complex software projects like xv6 and Linux kernel.

If you want to run or update a task when certain files are updated, the make utility can come in handy. The make utility requires a file, Makefile (or makefile), which defines set of tasks to be executed. You may have used make to compile a program from source code. Most open source projects use make to compile a final executable binary, which can then be installed using make install.

We'll explore make and Makefile using basic and advanced examples. Before you start, ensure that make is installed in your system. Note: we will create three different makefiles (Makefile1, Makefile2, and Makefile3) in this part.

Let's start by printing the classic "Hello World" on the terminal. Create a empty directory myproject containing a file Makefile with this content:

say_hello:
        echo "Hello World"

Now run the file by typing make inside the directory myproject. The output will be:

$ make
echo "Hello World"
Hello World

In the example above, say_hello behaves like a function name, as in any programming language. This is called the target. The prerequisites or dependencies follow the target. For the sake of simplicity, we have not defined any prerequisites in this example. The command echo "Hello World" is called the recipe. The recipe uses prerequisites to make a target. The target, prerequisites, and recipes together make a rule.

To summarize, below is the syntax of a typical rule:

target: prerequisites
<TAB> recipe

As an example, a target might be a binary file that depends on prerequisites (source files). On the other hand, a prerequisite can also be a target that depends on other dependencies:

final_target: sub_target final_target.c
        Recipe_to_create_final_target

sub_target: sub_target.c
        Recipe_to_create_sub_target

It is not necessary for the target to be a file; it could be just a name for the recipe, as in our example. We call these "phony targets."

Going back to the example above, when make was executed, the entire command echo "Hello World" was displayed, followed by actual command output. We often don't want that. To suppress echoing the actual command, we need to start echo with @:

say_hello:
        @echo "Hello World"

Now try to run make again. The output should display only this:

$ make
Hello World

Let's add a few more phony targets: generate and clean to the Makefile:

say_hello:
        @echo "Hello World"

generate:
        @echo "Creating empty text files..."
        touch file-{1..10}.txt

clean:
        @echo "Cleaning up..."
        rm *.txt

If we try to run make after the changes, only the target say_hello will be executed. That's because only the first target in the makefile is the default target. Often called the default goal, this is the reason you will see all as the first target in most projects. It is the responsibility of all to call other targets. We can override this behavior using a special phony target called .DEFAULT_GOAL.

Let's include that at the beginning of our makefile:

.DEFAULT_GOAL := generate
This will run the target generate as the default:
$ make
Creating empty text files...
touch file-{1..10}.txt

As the name suggests, the phony target .DEFAULT_GOAL can run only one target at a time. This is why most makefiles include all as a target that can call as many targets as needed.

Let's include the phony target all and remove .DEFAULT_GOAL:

all: say_hello generate

say_hello:
        @echo "Hello World"

generate:
        @echo "Creating empty text files..."
        touch file-{1..10}.txt

clean:
        @echo "Cleaning up..."
        rm *.txt

Before running make, let's include another special phony target, .PHONY, where we define all the targets that are not files. make will run its recipe regardless of whether a file with that name exists or what its last modification time is. Here is the complete makefile:

.PHONY: all say_hello generate clean

all: say_hello generate

say_hello:
        @echo "Hello World"

generate:
        @echo "Creating empty text files..."
        touch file-{1..10}.txt

clean:
        @echo "Cleaning up..."
        rm *.txt

The make should call say_hello and generate

$ make
Hello World
Creating empty text files...
touch file-{1..10}.txt

It is a good practice not to call clean in all or put it as the first target. clean should be called manually when cleaning is needed as a first argument to make:

$ make clean
Cleaning up...
rm *.txt

Now that you have an idea of how a basic makefile works and how to write a simple makefile, let's look at some more advanced examples.

Something more real

Now lets try to create a simple Makefile that we can use to compile our programs:

all:
    gcc main.c  -o hello

Now you can run:

make
Instead of running GCC manually, the Makefile lets you compile main.c into the hello program.

In our example the only target in the Makefile is all. The make utility will try to resolve this target if no other targets are specified.
We also see that there are no dependencies for target all, so make safely executes the system commands specified.

Finally, make compiles the program according to the command line we gave it.

When you submit your work, rename this makefile into Makefile1. You can always pass a custom name to make with the -f

make -f Makefile1

Using dependencies

Sometimes it is useful to use different targets. It makes the Makefile more modular and allows assembling a complex project from multiple pieces.
Here is an example (in your submission name this makefile as Makefile2:

all: hello

hello: main.o 
    gcc main.o -o hello

main.o: main.c
    gcc -c main.c

clean:
    rm *.o hello

Now we see that the target all has only one dependency (i.e., hello), but no system commands. In order for make to execute correctly, it has to meet all the dependencies of the called target.

Each of the dependencies are searched through all the targets available and executed if found.

In this example we see the target called clean. It is useful to have such target if you want to have a fast way to get rid of all the object files and executables.

make clean

Using variables and comments

You can also use variables when writing Makefiles. It comes in handy in situations where you want to change the compiler, or compiler options (create this makefile and submit it as Makefile3).

# This is how you write comments
# Use gcc as a compiler
CC=gcc
# CFLAGS will be the options we'll pass to the compiler
CFLAGS=-Wall

all: hello

hello: main.o
    $(CC) $(CFLAGS) main.o -o hello

main.o: main.c
    $(CC) -c $(CFLAGS) main.c

clean:
    rm *.o hello

Variables can be very useful. To use them, just assign a value to a variable before you start writing your targets. After that, you can just use them with the dereference operator $(VAR).

If you want to know more...

With this brief introduction to Makefiles, you can create some very sophisticated mechanisms for compiling your projects. However, this is just the tip of the iceberg. More documentation is available here: Make documentation.

Here is an example of a more automated Makefile that you might use in one of your projects (not required for this homework).

CC=gcc
# Compiler flags
CFLAGS=-Wall
# Linker flags
LDFLAGS=
# You can add multiple source files here separated with spaces
SOURCES=main.c

# Replace .c with .o creating a list of object files
OBJECTS=$(SOURCES:.c=.o)

# Name executable 
EXECUTABLE=hello

all: $(SOURCES) $(EXECUTABLE)
    
$(EXECUTABLE): $(OBJECTS) 
    $(CC) $(LDFLAGS) $(OBJECTS) -o $@

# Rule that tells make how to make an object file out of a .c file 
.c.o:
    $(CC) -c $(CFLAGS) $< -o $@

clean:
    rm *.o $(EXECUTABLE)

What to submit

Three makefiles (Makefile1, Makefile2, and Makefile3) you created to compile your hello example

Part 3: Debugging programs with GDB

On UNIX systems the main debugger is GDB (GNU debugger). To be able to comfortably debug your code compile it with the -g option which will instruct the compiler to generate debug symbols (variable names, source lines, etc.) for the program. For example, change your Makefile to have
 
CFLAGS=-Wall -g -m32 -fno-pic
This will compile your hello program with debugging symbols (-g flag), as a 32bit x86 executable (-m32 flag), and for simplicity avoid generating position independent code ( -fno-pic flag). Then you can start you program under control of gdb
gdb hello
This starts gdb ready to execute your hello program. To get it running type the run command in the GDB command prompt (or r -- short for run):
gdb\> run
Now the program runs and finished printing "Hello world".

GDB is a feature-rich debugger, and it will take you some time to learn all the features. Here is a couple of starting points: GDB tutorial, GDB intro and GDB Cheat Sheet.

Probably, the best resource for this homework is Operating Systems from 0 to 1. Chapter 6. Runtime Inspection and Debug (it is strongly recommended to read this chapter).

At a high level you need only two main things: 1) breakpoints and 2) ability to examine data. Breakpoints can be set with the "b" command inside gdb.

Breakpoints and single stepping

Just to make debugging a bit more realistic lets add another function to our simple program. Lets change it to compute a sum of numbers from 0 to n. You can do this by implementing the following function
unsigned long sum(int n) {
    int i;
    unsigned long sum = 0;

    for (i = 0; i < n; i++) {
        sum = sum + i;
    }

    return sum;
}
and calling it from
main()
int main(void) {

    unsigned long s;

    s = sum(100);
    printf("Hello world, the sum:%ld\n", s);
    return 0;
}

Running the programs on its own is not that useful. Lets try setting a breakpoint on the "main" function to examine what the program is actually doing. For that type break in the GDB command prompt or b and then run the program with r

(gdb) break main
Breakpoint 1 at 0x56b: file main.c, line 26.
(gdb) r
Starting program: ...  

Breakpoint 1, main () at main.c:26
26          s = sum(100);
(gdb) 

The debugger stopped at the beginning of the main function (line 26 of main.c. You can examine the source code of the program by typing list or l

(gdb) list
21
22      int main(void) {
23
24          unsigned long s;
25
26          s = sum(100);
27          printf("Hello world, the sum:%ld\n", s);
28          return 0;
29      }
30

Now you can execute the program line by line by typing next (execute next line, or n for short), and step (s) to step into a function. Try stepping into the sum function by running step

(gdb) s
sum (n=100) at main.c:13
13          unsigned long sum = 0;

Here I'm inside the sum function, where I type l to list the source code, and then type n to execute it line by line (I type n once, and then simply hit "Enter" asking GDB to execute the last command for me

(gdb) l
8       #include 
9       #include 
10
11      unsigned long sum(int n) {
12          int i;
13          unsigned long sum = 0;
14
15          for (i = 0; i < n; i++) {
16              sum = sum + i;
17          }
(gdb) n
15          for (i = 0; i < n; i++) {
(gdb)
16              sum = sum + i;
(gdb)
15          for (i = 0; i < n; i++) {
(gdb)
16              sum = sum + i;

TUI: Graphical User Interface

The second most useful feature is the TUI mode that turns GDB into a real modern debugger. Here is a couple of links to TUI.

You can switch into TUI by pressing Ctrl-X and then "1", or start gdb in TUI mode right away

  gdb hello -tui
Or by typing this in the gdb command prompt (this command doesn't work on Openlab, so you'll have to do Ctrl-X and then 1, but normally it works)
(gdb) tui enable

Start the program from the begginging and single step it with n and s. The source code of the program will be scrolling in the TUI window in the top part of the screen.

Examining data

You can print values of variables with "print", e.g., print the values of i and sum

  gdb\> p i
  gdb\> p sum

Conditional breakpoints

While debugging programs it's often useful to see what the program is doing right before it crashes. One way to do this is to step through, one at a time, every statement of the program until we get to the point of execution where we want to see what is happening. This works, but sometimes you may want to just run to a particular section of code and stop execution at that point so you can examine data at that location.

GDB allows you to set conditional breakpoints. For example, lets break inside the loop of the sum function when the index i is equal to 50. I first list the source code to get the exact source lines and then set a brakpoint inside the main.c file at line 16 with break main.c:16

(gdb) l
11      unsigned long sum(int n) {
12          int i;
13          unsigned long sum = 0; 
14
15          for (i = 0; i < n; i++) {
16              sum = sum + i; 
17          }
18
19          return sum; 
20      }
(gdb) break main.c:16
Breakpoint 2 at 0x56555543: file main.c, line 16.
This breakpoint will trigger for every iteration of the loop. So if we want it to fire only when i is 50 we add the following condition. I use 2 as this is the number of the breakpoint that I set at line 16.
(gdb) condition 2 i==50

I now continue execution of the program with the continue or c command.

(gdb) c
Continuing.

Breakpoint 2, sum (n=100) at main.c:16
16              sum = sum + i; 

When the breakpoint is hit I check that the value of i is really 50

(gdb) p i
$1 = 50
(gdb) 

Exploring crashes

Now, lets take a look at how you can use GDB to debug your crashing programs. First, lets generate a program that crashes. Add a global variable a[32] to your program (it's an array of 32 integers).
int a[32];
Lets then add a function that makes an out of bounds array access
unsigned long crash_array(int n) {
    int i;
    unsigned long sum = 0;

    for (i = 0; i < n; i++) {
        sum = sum + a[i];
    }

    return sum;
}
If you invoke this function with n larger than 31 it will crash (Note, you might get lucky and it will not crash, i.e., not all out of bounds accesses cause a crash in C programs). To be safe lets invoke it with n equal to 10,000.
s = crash_array(100000);
printf("crash array sum:%ld\n", s);    
If you make this program and run it, it will crash
./hello
Hello world, the sum:4950
Segmentation fault (core dumped)
Now, to understand the crash you can run it under gdb:
(gdb) r
Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello
Hello world, the sum:4950

Program received signal SIGSEGV, Segmentation fault.
0x56555566 in crash_array (n=100000) at main-full.c:18
18	        sum = sum + a[i];
You can use the backtrace (bt) command to look at the backtrace (a chain of function invocations leading to the crash):
(gdb) bt
#0  0x56555566 in crash_array (n=100000) at main-full.c:18
#1  0x565555ec in main () at main-full.c:45
Here, the GDB tells you that crash_array got a segfault at line 18 in main-full.c. You see that there are two stack frames available (0 for main and 1 for crash_array). You can use the frame (f) command to choose any of the frames and inspect it. For example, lets choose frame #0 and list the crashing code with the list command
(gdb) f 0
#0  0x56555566 in crash_array (n=100000) at main-full.c:18
18	        sum = sum + a[i];
(gdb) l
13	unsigned long crash_array(int n) {
14	    int i;
15	    unsigned long sum = 0;
16
17	    for (i = 0; i < n; i++) {
18	        sum = sum + a[i];
19	    }
20
21	    return sum;
22	}
We know that line 18 is the crashing line. We can print the values of the local variable i
(gdb) p i
$1 = 35824
It is equal to 35824. This should give you enough information for why you crashed.

Now fix the crash_array function to prevent the crash.

What to submit

The Makefile and the updated main.c program.

Part 4: Exploring assembly code

In this part of the homework you will explore assembly code of the sum function.

Re-start GDB and set the breakpoint on the sum function

(gdb) b sum
Breakpoint 1 at 0x533: file main.c, line 13.
(gdb) r
Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello 

Breakpoint 1, sum (n=100) at main.c:13
13          unsigned long sum = 0; 
GDB can disassemble the code of the program with the disassemble (or disas) command
(gdb) disas
Dump of assembler code for function sum:
   0x5655552d <+0>:     push   %ebp
   0x5655552e <+1>:     mov    %esp,%ebp
   0x56555530 <+3>:     sub    $0x10,%esp
=> 0x56555533 <+6>:     movl   $0x0,-0x4(%ebp)
   0x5655553a <+13>:    movl   $0x0,-0x8(%ebp)
   0x56555541 <+20>:    jmp    0x5655554d 
   0x56555543 <+22>:    mov    -0x8(%ebp),%eax
   0x56555546 <+25>:    add    %eax,-0x4(%ebp)
   0x56555549 <+28>:    addl   $0x1,-0x8(%ebp)
   0x5655554d <+32>:    mov    -0x8(%ebp),%eax
   0x56555550 <+35>:    cmp    0x8(%ebp),%eax
   0x56555553 <+38>:    jl     0x56555543 
   0x56555555 <+40>:    mov    -0x4(%ebp),%eax
   0x56555558 <+43>:    leave  
   0x56555559 <+44>:    ret    
End of assembler dump.
(gdb) 

Unfortunately, the default syntax for the disassembly is AT&T. So the first thing you do is switch to Intel syntax we were using in class. Invoke the set disassembly-flavor intel command and disassembly the function again

(gdb) set disassembly-flavor intel
(gdb) disas
Dump of assembler code for function sum:
   0x5655552d <+0>:     push   ebp
   0x5655552e <+1>:     mov    ebp,esp
   0x56555530 <+3>:     sub    esp,0x10
=> 0x56555533 <+6>:     mov    DWORD PTR [ebp-0x4],0x0
   0x5655553a <+13>:    mov    DWORD PTR [ebp-0x8],0x0
   0x56555541 <+20>:    jmp    0x5655554d 
   0x56555543 <+22>:    mov    eax,DWORD PTR [ebp-0x8]
   0x56555546 <+25>:    add    DWORD PTR [ebp-0x4],eax
   0x56555549 <+28>:    add    DWORD PTR [ebp-0x8],0x1
   0x5655554d <+32>:    mov    eax,DWORD PTR [ebp-0x8]
   0x56555550 <+35>:    cmp    eax,DWORD PTR [ebp+0x8]
   0x56555553 <+38>:    jl     0x56555543 
   0x56555555 <+40>:    mov    eax,DWORD PTR [ebp-0x4]
   0x56555558 <+43>:    leave  
   0x56555559 <+44>:    ret    
End of assembler dump.
(gdb) 
Your assignment is to explain every line of the assembly dump and submit it as a text file. Note, while you can understand what the assembly code is doing, you can also use GDB to step through the assembly code of the program. You can simply type
(gdb) layout asm
to switch TUI to the assmebly mode. Then you can use two new GDB commands: nexti (ni) and stepi (si) for stepping through the assembly instructions (i stands for "instruction", e.g., next instruction and step instrucion).
(gdb) si

You can do the split layout to show both assembly and the C source code windows at the same time.

(gdb) layout split
You can also show the registers on the screen (don't forget the list of TUI commands --- they can be very handy).
(gdb) layout regs

What to submit

A text file that explains every line of the assembly dump.

Part 5: Exploring calling conventions

In this final part of the homework you will explore the calling conventions for passing the arguments on the stack.

Re-start GDB and set the breakpoint on the sum function, and run the program.

(gdb) b sum 
Breakpoint 1 at 0x533: file main.c, line 13.
(gdb) r
Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello 

Breakpoint 1, sum (n=100) at main.c:13
13          unsigned long sum = 0; 
(gdb)
Use the info command to see the content of the registers
(gdb) info regs
Undefined info command: "regs".  Try "help info".
(gdb) info reg
eax            0xf7fb0dd8       -134541864
ecx            0xffffc8c0       -14144
edx            0xffffc8e4       -14108
ebx            0x0      0
esp            0xffffc874       0xffffc874
ebp            0xffffc884       0xffffc884
esi            0xf7faf000       -134549504
edi            0x0      0
eip            0x56555533       0x56555533 
eflags         0x286    [ PF SF IF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99
Use the x command to inspect the stack
(gdb)  x/24x $esp
0xffffc874:     0xf7faf000      0x00000000      0xf7e0760b      0xf7faf3fc
0xffffc884:     0xffffc8a8      0x56555572      0x00000064      0x00000001
0xffffc894:     0xffffc954      0xffffc95c      0x565555c1      0xf7fe59b0
0xffffc8a4:     0xffffc8c0      0x00000000      0xf7defe81      0xf7faf000
0xffffc8b4:     0xf7faf000      0x00000000      0xf7defe81      0x00000001
0xffffc8c4:     0xffffc954      0xffffc95c      0xffffc8e4      0x00000001

Explain every value from the dump that you get (similar to the one above but generated on your machine), i.e., which one is a local variable inside sum, which is the frame pointer, what are the arguments to sum and so on. The more values you can exaplain the better.

Hints: start from main, i.e., set a breakpoint on main and see how the values on the stack are changing, step through main and see what gets pushed on the stack.

Note: GCC generates the code that keeps the stack aligned at 16 bytes, hence instead of allocating 4 bytes for the local variable in sum it allocates 16.

What to submit

Submit a text file explaining every value on the stack.

Submit your work

Submit your solution through Gradescope Gradescope CS143A Principles of Operating Systems. Place each part of the assignment into folders with name part2, part3, part4 or part5, then pack it into a zip archive and submit it. Please name makefiles as Makefile1, Makefile2, Makefile3 for the part 2. Please submit .txt files for parts 4 and 5. You can resubmit as many times as you wish. If you have any problems with the structure the autograder will tell you. Part 4 and 5 of the assignment will be graded manually after the deadline. The structure of the zip file should be the following:

/
  - /part2
    - Makefile1
    - Makefile2
    - Makefile3
    - main.c
  - /part3
    - Makfile
    - main.c
  - /part4
    - explanation.txt
  - /part5
    - explanation.txt
Updated: October, 2019