143A Principles of Operating Systems


Home
Homework 1: Simple UNIX programs This assignment will make you more familiar with how to build simple Unix programs, create Makefiles, look at assembly code, and use the GDB debugger. The things you learn will prepare you to handle a more complex environment of the xv6 kernel required for the future homework. You can do this assignment on any operating system that supports the Unix API (Linux Openlab machines, your laptop that runs Linux or Linux VM, and even MacOS, etc.). You don't need to set up xv6 for this assignment Submit your programs and the shell through Gradescope (see instructions at the bottom of this page). For Mac / OSX> users. The support of 32 bit applications is depricated in the latest version of your system. So if you already updated your system to macOS Catalina or have updated your XCode then we recommend you to do the homework at the Openlab machines. Part 1: Simple UNIX programs Download the main.c, and look it over. This is a skeleton for a simple UNIX program. To compile `main.c`, you need a C compiler, such as gcc. On Openlab machines, you can compile the skeleton with the following command: $ gcc main.c This will produce an `a.out` file, which you can run: $ ./a.out Alternatively you can pass an additional option to gcc to give a more meaningful name to the compiled binary, like $gcc main.c -o hello Here gcc will compile your program as `hello`. In the rest of this part of the assignment you will explore how to automate program development with Makefiles, learn how debug your code with GDB, and disassemble the program to verify your understanding of assembly language. Part 2: Simple Makefiles This part of the homework is adapted from https://opensource.com/article/18/8/what-how-makefile and http://mrbook.org/blog/tutorials/make/ It aims to introduce you to basics of Makefiles and the `make` tool that provides a way to compile complex software projects like xv6 and Linux kernel. If you want to run or update a task when certain files are updated, the make utility can come in handy. The make utility requires a file, Makefile (or makefile), which defines set of tasks to be executed. You may have used make to compile a program from source code. Most open source projects use make to compile a final executable binary, which can then be installed using make install. We'll explore make and Makefile using basic and advanced examples. Before you start, ensure that make is installed in your system. Note: we will create three different makefiles (Makefile1, Makefile2, and Makefile3) in this part. Let's start by printing the classic "Hello World" on the terminal. Create a empty directory myproject containing a file Makefile with this content: say_hello: echo "Hello World" Now run the file by typing make inside the directory myproject. The output will be: $ make echo "Hello World" Hello World In the example above, say_hello behaves like a function name, as in any programming language. This is called the target. The prerequisites or dependencies follow the target. For the sake of simplicity, we have not defined any prerequisites in this example. The command echo "Hello World" is called the recipe. The recipe uses prerequisites to make a target. The target, prerequisites, and recipes together make a rule. To summarize, below is the syntax of a typical rule: target: prerequisites <TAB> recipe As an example, a target might be a binary file that depends on prerequisites (source files). On the other hand, a prerequisite can also be a target that depends on other dependencies: final_target: sub_target final_target.c Recipe_to_create_final_target sub_target: sub_target.c Recipe_to_create_sub_target It is not necessary for the target to be a file; it could be just a name for the recipe, as in our example. We call these "phony targets." Going back to the example above, when make was executed, the entire command echo "Hello World" was displayed, followed by actual command output. We often don't want that. To suppress echoing the actual command, we need to start echo with `@`: say_hello: @echo "Hello World" Now try to run make again. The output should display only this: $ make Hello World Let's add a few more phony targets: generate and clean to the Makefile: say_hello: @echo "Hello World" generate: @echo "Creating empty text files..." touch file-{1..10}.txt clean: @echo "Cleaning up..." rm .txt If we try to run make after the changes, only the target say_hello will be executed. That's because only the first target in the makefile is the default target. Often called the default goal, this is the reason you will see all as the first target in most projects. It is the responsibility of all to call other targets. We can override this behavior using a special phony target called `.DEFAULT_GOAL`. Let's include that at the beginning of our makefile: .DEFAULT_GOAL := generate This will run the target generate as the default: $ make Creating empty text files... touch file-{1..10}.txt As the name suggests, the phony target `.DEFAULT_GOAL` can run only one target at a time. This is why most makefiles include all as a target that can call as many targets as needed. Let's include the phony target all and remove `.DEFAULT_GOAL`: all: say_hello generate say_hello: @echo "Hello World" generate: @echo "Creating empty text files..." touch file-{1..10}.txt clean: @echo "Cleaning up..." rm .txt Before running make, let's include another special phony target, `.PHONY`, where we define all the targets that are not files. make will run its recipe regardless of whether a file with that name exists or what its last modification time is. Here is the complete makefile: .PHONY: all say_hello generate clean all: say_hello generate say_hello: @echo "Hello World" generate: @echo "Creating empty text files..." touch file-{1..10}.txt clean: @echo "Cleaning up..." rm .txt The make should call `say_hello` and `generate` $ make Hello World Creating empty text files... touch file-{1..10}.txt It is a good practice not to call clean in all or put it as the first target. clean should be called manually when cleaning is needed as a first argument to make: $ make clean Cleaning up... rm .txt Now that you have an idea of how a basic makefile works and how to write a simple makefile, let's look at some more advanced examples. Something more real Now lets try to create a simple Makefile that we can use to compile our programs: all: gcc main.c -o hello Now you can run: make Instead of running GCC manually, the Makefile lets you compile `main.c` into the `hello` program. In our example the only target in the Makefile is `all`. The `make` utility will try to resolve this target if no other targets are specified. We also see that there are no dependencies for target `all`, so `make` safely executes the system commands specified. Finally, make compiles the program according to the command line we gave it. When you submit your work, rename this makefile into `Makefile1`. You can always pass a custom name to make with the `-f` make -f Makefile1 Using dependencies Sometimes it is useful to use different targets. It makes the Makefile more modular and allows assembling a complex project from multiple pieces. Here is an example (in your submission name this makefile as `Makefile2`: all: hello hello: main.o gcc main.o -o hello main.o: main.c gcc -c main.c clean: rm .o hello Now we see that the target `all` has only one dependency (i.e., `hello`), but no system commands. In order for `make` to execute correctly, it has to meet all the dependencies of the called target. Each of the dependencies are searched through all the targets available and executed if found. In this example we see the target called `clean`. It is useful to have such target if you want to have a fast way to get rid of all the object files and executables. make clean Using variables and comments You can also use variables when writing Makefiles. It comes in handy in situations where you want to change the compiler, or compiler options (create this makefile and submit it as `Makefile3`). # This is how you write comments # Use gcc as a compiler CC=gcc # CFLAGS will be the options we'll pass to the compiler CFLAGS=-Wall all: hello hello: main.o $(CC) $(CFLAGS) main.o -o hello main.o: main.c $(CC) -c $(CFLAGS) main.c clean: rm .o hello Variables can be very useful. To use them, just assign a value to a variable before you start writing your targets. After that, you can just use them with the dereference operator $(VAR). If you want to know more... With this brief introduction to Makefiles, you can create some very sophisticated mechanisms for compiling your projects. However, this is just the tip of the iceberg. More documentation is available here: Make documentation. Here is an example of a more automated Makefile that you might use in one of your projects (not required for this homework). CC=gcc # Compiler flags CFLAGS=-Wall # Linker flags LDFLAGS= # You can add multiple source files here separated with spaces SOURCES=main.c # Replace .c with .o creating a list of object files OBJECTS=$(SOURCES:.c=.o) # Name executable EXECUTABLE=hello all: $(SOURCES) $(EXECUTABLE) $(EXECUTABLE): $(OBJECTS) $(CC) $(LDFLAGS) $(OBJECTS) -o $@ # Rule that tells make how to make an object file out of a .c file .c.o: $(CC) -c $(CFLAGS) $< -o $@ clean: rm .o $(EXECUTABLE) What to submit Three makefiles (Makefile1, Makefile2, and Makefile3) you created to compile your hello example Part 3: Debugging programs with GDB On UNIX systems the main debugger is GDB (GNU debugger). To be able to comfortably debug your code compile it with the -g option which will instruct the compiler to generate debug symbols (variable names, source lines, etc.) for the program. For example, change your Makefile to have CFLAGS=-Wall -g -m32 -fno-pic This will compile your hello program with debugging symbols (`-g` flag), as a 32bit x86 executable (`-m32` flag), and for simplicity avoid generating position independent code ( `-fno-pic` flag). Then you can start you program under control of gdb gdb hello This starts gdb ready to execute your `hello` program. To get it running type the `run` command in the GDB command prompt (or `r` -- short for run): gdb\> run Now the program runs and finished printing "Hello world". GDB is a feature-rich debugger, and it will take you some time to learn all the features. Here is a couple of starting points: GDB tutorial, GDB intro and GDB Cheat Sheet. Probably, the best resource for this homework is* Operating Systems from 0 to 1. Chapter 6. Runtime Inspection and Debug (it is strongly recommended to read this chapter). At a high level you need only two main things: 1) breakpoints and 2) ability to examine data. Breakpoints can be set with the "b" command inside gdb. Breakpoints and single stepping Just to make debugging a bit more realistic lets add another function to our simple program. Lets change it to compute a sum of numbers from 0 to n. You can do this by implementing the following function unsigned long sum(int n) { int i; unsigned long sum = 0; for (i = 0; i < n; i++) { sum = sum + i; } return sum; } and calling it from main() int main(void) { unsigned long s; s = sum(100); printf("Hello world, the sum:%ld\n", s); return 0; } Running the programs on its own is not that useful. Lets try setting a breakpoint on the "main" function to examine what the program is actually doing. For that type `break` in the GDB command prompt or `b` and then run the program with `r` (gdb) break main Breakpoint 1 at 0x56b: file main.c, line 26. (gdb) r Starting program: ... Breakpoint 1, main () at main.c:26 26 s = sum(100); (gdb) The debugger stopped at the beginning of the `main` function (line 26 of `main.c`. You can examine the source code of the program by typing `list` or `l` (gdb) list 21 22 int main(void) { 23 24 unsigned long s; 25 26 s = sum(100); 27 printf("Hello world, the sum:%ld\n", s); 28 return 0; 29 } 30 Now you can execute the program line by line by typing `next` (execute next line, or `n` for short), and `step` (`s`) to step into a function. Try stepping into the `sum` function by running `step` (gdb) s sum (n=100) at main.c:13 13 unsigned long sum = 0; Here I'm inside the `sum` function, where I type `l` to list the source code, and then type `n` to execute it line by line (I type `n` once, and then simply hit "Enter" asking GDB to execute the last command for me (gdb) l 8 #include 9 #include 10 11 unsigned long sum(int n) { 12 int i; 13 unsigned long sum = 0; 14 15 for (i = 0; i < n; i++) { 16 sum = sum + i; 17 } (gdb) n 15 for (i = 0; i < n; i++) { (gdb) 16 sum = sum + i; (gdb) 15 for (i = 0; i < n; i++) { (gdb) 16 sum = sum + i; TUI: Graphical User Interface The second most useful feature is the TUI mode that turns GDB into a real modern debugger. Here is a couple of links to TUI. You can switch into TUI by pressing Ctrl-X and then "1", or start gdb in TUI mode right away gdb hello -tui Or by typing this in the gdb command prompt (this command doesn't work on Openlab, so you'll have to do `Ctrl-X` and then 1, but normally it works) (gdb) tui enable Start the program from the begginging and single step it with `n` and `s`. The source code of the program will be scrolling in the TUI window in the top part of the screen. Examining data You can print values of variables with "print", e.g., print the values of `i` and `sum` gdb\> p i gdb\> p sum Conditional breakpoints While debugging programs it's often useful to see what the program is doing right before it crashes. One way to do this is to step through, one at a time, every statement of the program until we get to the point of execution where we want to see what is happening. This works, but sometimes you may want to just run to a particular section of code and stop execution at that point so you can examine data at that location. GDB allows you to set conditional breakpoints. For example, lets break inside the loop of the `sum` function when the index `i` is equal to 50. I first list the source code to get the exact source lines and then set a brakpoint inside the `main.c` file at line 16 with `break main.c:16` (gdb) l 11 unsigned long sum(int n) { 12 int i; 13 unsigned long sum = 0; 14 15 for (i = 0; i < n; i++) { 16 sum = sum + i; 17 } 18 19 return sum; 20 } (gdb) break main.c:16 Breakpoint 2 at 0x56555543: file main.c, line 16. This breakpoint will trigger for every iteration of the loop. So if we want it to fire only when `i` is 50 we add the following condition. I use `2` as this is the number of the breakpoint that I set at line 16. (gdb) condition 2 i==50 I now continue execution of the program with the `continue` or `c` command. (gdb) c Continuing. Breakpoint 2, sum (n=100) at main.c:16 16 sum = sum + i; When the breakpoint is hit I check that the value of i is really 50 (gdb) p i $1 = 50 (gdb) Exploring crashes Now, lets take a look at how you can use GDB to debug your crashing programs. First, lets generate a program that crashes. Add a global variable `a[32]` to your program (it's an array of 32 integers). int a[32]; Lets then add a function that makes an out of bounds array access unsigned long crash_array(int n) { int i; unsigned long sum = 0; for (i = 0; i < n; i++) { sum = sum + a[i]; } return sum; } If you invoke this function with `n` larger than 31 it will crash (Note, you might get lucky and it will not crash, i.e., not all out of bounds accesses cause a crash in C programs). To be safe lets invoke it with n equal to 10,000. s = crash_array(100000); printf("crash array sum:%ld\n", s); If you make this program and run it, it will crash ./hello Hello world, the sum:4950 Segmentation fault (core dumped) Now, to understand the crash you can run it under gdb: (gdb) r Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello Hello world, the sum:4950 Program received signal SIGSEGV, Segmentation fault. 0x56555566 in crash_array (n=100000) at main-full.c:18 18 sum = sum + a[i]; You can use the `backtrace` (`bt`) command to look at the backtrace (a chain of function invocations leading to the crash): (gdb) bt #0 0x56555566 in crash_array (n=100000) at main-full.c:18 #1 0x565555ec in main () at main-full.c:45 Here, the GDB tells you that `crash_array` got a segfault at line 18 in `main-full.c`. You see that there are two stack frames available (0 for `main` and 1 for `crash_array`). You can use the `frame` (`f`) command to choose any of the frames and inspect it. For example, lets choose frame `#0` and list the crashing code with the `list` command (gdb) f 0 #0 0x56555566 in crash_array (n=100000) at main-full.c:18 18 sum = sum + a[i]; (gdb) l 13 unsigned long crash_array(int n) { 14 int i; 15 unsigned long sum = 0; 16 17 for (i = 0; i < n; i++) { 18 sum = sum + a[i]; 19 } 20 21 return sum; 22 } We know that line 18 is the crashing line. We can print the values of the local variable `i` (gdb) p i $1 = 35824 It is equal to 35824. This should give you enough information for why you crashed. Now fix the `crash_array` function to prevent the crash. What to submit The Makefile and the updated main.c program. Part 4: Exploring assembly code In this part of the homework you will explore assembly code of the `sum` function. Re-start GDB and set the breakpoint on the `sum` function (gdb) b sum Breakpoint 1 at 0x533: file main.c, line 13. (gdb) r Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello Breakpoint 1, sum (n=100) at main.c:13 13 unsigned long sum = 0; GDB can disassemble the code of the program with the `disassemble` (or `disas`) command (gdb) disas Dump of assembler code for function sum: 0x5655552d <+0>: push %ebp 0x5655552e <+1>: mov %esp,%ebp 0x56555530 <+3>: sub $0x10,%esp => 0x56555533 <+6>: movl $0x0,-0x4(%ebp) 0x5655553a <+13>: movl $0x0,-0x8(%ebp) 0x56555541 <+20>: jmp 0x5655554d 0x56555543 <+22>: mov -0x8(%ebp),%eax 0x56555546 <+25>: add %eax,-0x4(%ebp) 0x56555549 <+28>: addl $0x1,-0x8(%ebp) 0x5655554d <+32>: mov -0x8(%ebp),%eax 0x56555550 <+35>: cmp 0x8(%ebp),%eax 0x56555553 <+38>: jl 0x56555543 0x56555555 <+40>: mov -0x4(%ebp),%eax 0x56555558 <+43>: leave 0x56555559 <+44>: ret End of assembler dump. (gdb) Unfortunately, the default syntax for the disassembly is AT&T. So the first thing you do is switch to Intel syntax we were using in class. Invoke the `set disassembly-flavor intel` command and disassembly the function again (gdb) set disassembly-flavor intel (gdb) disas Dump of assembler code for function sum: 0x5655552d <+0>: push ebp 0x5655552e <+1>: mov ebp,esp 0x56555530 <+3>: sub esp,0x10 => 0x56555533 <+6>: mov DWORD PTR [ebp-0x4],0x0 0x5655553a <+13>: mov DWORD PTR [ebp-0x8],0x0 0x56555541 <+20>: jmp 0x5655554d 0x56555543 <+22>: mov eax,DWORD PTR [ebp-0x8] 0x56555546 <+25>: add DWORD PTR [ebp-0x4],eax 0x56555549 <+28>: add DWORD PTR [ebp-0x8],0x1 0x5655554d <+32>: mov eax,DWORD PTR [ebp-0x8] 0x56555550 <+35>: cmp eax,DWORD PTR [ebp+0x8] 0x56555553 <+38>: jl 0x56555543 0x56555555 <+40>: mov eax,DWORD PTR [ebp-0x4] 0x56555558 <+43>: leave 0x56555559 <+44>: ret End of assembler dump. (gdb) Your assignment is to explain every line of the assembly dump and submit it as a text file. Note, while you can understand what the assembly code is doing, you can also use GDB to step through the assembly code of the program. You can simply type (gdb) layout asm to switch TUI to the assmebly mode. Then you can use two new GDB commands: `nexti` (`ni`) and `stepi` (`si`) for stepping through the assembly instructions (i stands for "instruction", e.g., next instruction and step instrucion). (gdb) si You can do the split layout to show both assembly and the C source code windows at the same time. (gdb) layout split You can also show the registers on the screen (don't forget the list of TUI commands --- they can be very handy). (gdb) layout regs What to submit A text file that explains every line of the assembly dump. Part 5: Exploring calling conventions In this final part of the homework you will explore the calling conventions for passing the arguments on the stack. Re-start GDB and set the breakpoint on the `sum` function, and run the program. (gdb) b sum Breakpoint 1 at 0x533: file main.c, line 13. (gdb) r Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello Breakpoint 1, sum (n=100) at main.c:13 13 unsigned long sum = 0; (gdb) Use the `info` command to see the content of the registers (gdb) info regs Undefined info command: "regs". Try "help info". (gdb) info reg eax 0xf7fb0dd8 -134541864 ecx 0xffffc8c0 -14144 edx 0xffffc8e4 -14108 ebx 0x0 0 esp 0xffffc874 0xffffc874 ebp 0xffffc884 0xffffc884 esi 0xf7faf000 -134549504 edi 0x0 0 eip 0x56555533 0x56555533 eflags 0x286 [ PF SF IF ] cs 0x23 35 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x0 0 gs 0x63 99 Use the `x` command to inspect the stack (gdb) x/24x $esp 0xffffc874: 0xf7faf000 0x00000000 0xf7e0760b 0xf7faf3fc 0xffffc884: 0xffffc8a8 0x56555572 0x00000064 0x00000001 0xffffc894: 0xffffc954 0xffffc95c 0x565555c1 0xf7fe59b0 0xffffc8a4: 0xffffc8c0 0x00000000 0xf7defe81 0xf7faf000 0xffffc8b4: 0xf7faf000 0x00000000 0xf7defe81 0x00000001 0xffffc8c4: 0xffffc954 0xffffc95c 0xffffc8e4 0x00000001 Explain every value from the dump that you get (similar to the one above but generated on your machine), i.e., which one is a local variable inside `sum`, which is the frame pointer, what are the arguments to `sum` and so on. The more values you can exaplain the better. Hints: start from `main`, i.e., set a breakpoint on `main` and see how the values on the stack are changing, step through `main` and see what gets pushed on the stack. Note: GCC generates the code that keeps the stack aligned at 16 bytes, hence instead of allocating 4 bytes for the local variable in `sum` it allocates 16. What to submit Submit a text file explaining every value on the stack. Submit your work Submit your solution through Gradescope Gradescope CS143A Principles of Operating Systems. Place each part of the assignment into folders with name part2, part3, part4 or part5, then pack it into a zip archive and submit it. Please name makefiles as Makefile1, Makefile2, Makefile3 for the part 2. Please submit .txt files for parts 4 and 5. You can resubmit as many times as you wish. If you have any problems with the structure the autograder will tell you. Part 4 and 5 of the assignment will be graded manually after the deadline. The structure of the zip file should be the following: / - /part2 - Makefile1 - Makefile2 - Makefile3 - main.c - /part3 - Makfile - main.c - /part4 - explanation.txt - /part5 - explanation.txt

Updated: October, 2019