143A Operating Systems


Home
Homework 2: Shell This assignment will teach you how to use the Unix system call interface and the shell by implementing a small shell, which we will refer to as the 143A shell. You can do this assignment on any operating system that supports the Unix API (Linux Openlab machines, your laptop that runs Linux or Linux VM, and even MacOS, etc.). You don't need to set up xv6 for this assignment Submit your programs and the shell through Gradescope (see instructions at the bottom of this page). NOTE: YOU CANNOT PUBLICLY RELEASE SOLUTIONS TO THIS HOMEWORK. It's ok to show your work to your future employer as a private Git repo, however any public release is prohibited. For Mac / OSX users. The support of 32 bit applications is depricated in the latest version of your system. So if you already updated your system to macOS Catalina or have updated your XCode then we recommend you to do the homework at the Openlab machines. NOTE: We are aware that there are several tutorials on writing shells online. This assignment itself borrows heavily from Stephen Brennan's blog post. We strongly encourage you to do this assignment without referring to the actual code in those implementations. You are welcome to look at broad concepts (which we also try to explain here), but the actual implementation should be your work. NOTE: We recently were made aware of the GNU readline library. Bash (and other shells) rely heavily on it for auto-complete, moving the cursor around when entering input, and even reverse-search. For those interested, this is a really interesting read on the history of `readline`. For the purposes of this assignment, using `readline` is not allowed, as it would make several implementation details entirely trivial. We want you to learn by implementing a shell, including it's intricacies. TIP: While building this assignment, several parts, like adding support for I/O redirection and pipes might not be immediately obvious, and are quite involved. We encourage you to take a look at xv6's shell to get design clues (sh.c). Note however, that you cannot take the xv6 implementation and submit it (or any other submissions from previous years). You might pass all the test cases, but you will receive a 0 on this assignment if you don't submit what is entirely your work. We will build shell in the following parts: 1) Reading and parsing a command, 2) Executing programs, 3) Implementing support for I/O redirection, and 4) Implementing support for pipes. To start, this is a skeleton of a simple UNIX shell. #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <fcntl.h> #include <string.h> #include <assert.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <errno.h> int main(int argc, char argv) { sh_loop(); return EXIT_SUCCESS; } As we will see, the shell is a program that essentially waits for user input, executes commands, and repeats. We will keep our shell simple, by just calling a function `sh_loop`, that loops indefinitely, reading, interpreting and executing commands. Typically a shell does much more (steps related to initialization, configuration, termination, shutdown and so on). If you put the above snippet into a file `sh.c`, you can compile it with a C compiler, such as gcc. On Openlab machines you can compile it with the following command: $gcc sh.c -o sh143a Here gcc will compile your program as `sh143a`. (Note that the above file won't compile, as we have not definted `sh_loop` yet). In the rest of this part of the assignment you will convert `sh.c` into a shell. The basics The main job of a shell is to execute commands. One way to break this down is: Read commands from the standard input. Parse the command string by separating it into a program string and its argument string. Execute** the program, passing to it the appropriate arguments. The `sh_loop()` function, hence can look something like the following. void sh_loop(void) { char line; char args; int status; do { printf("143A$ "); line = sh_read_line(); args = sh_split_line(line); status = sh_execute(args); free(line); free(args); } while (status); } It runs in a loop, and it provides a prompt to the user every time the loop executes: 143A$ Once the user enters a command, it calls `sh_read_line` to read the command, `sh_split_line` to parse it, and finally `sh_execute` to execute the command. It then loops back, trying to do the same thing all over again. Note here that the termination of the loop is dependant on the `status` variable, which you will have to set appropriately when you write the `sh_execute` function. Reading a line We do not want to test you on your skills with reading and parsing lines in C, which can be quite involved if one wants to handle several possible error situations. Hence, we provide you with a template for `sh_loop()` below. The shell has to read characters from stdin into a buffer to parse it. The thing to note is that you cannot know before hand, how much text a user is going to input as a command, and hence, you cannot know how much buffer to allocate. One strategy is to start with an allocation of small size using `malloc`, and then reallocate if we run out of memory in the buffer. We can use `getchar()` to read character by character from stdin in a `while` loop, until we see a newline character, or an EOF character. In case of the former, return the buffer which has been filled by command characters until this point, after null-terminating the buffer. In case of an EOF it is customary to exit the shell, which we do. Note that an EOF can be sent using `CTRL_D`. We encourage you to try out writing your `sh_read_line` function using `getchar()` as mentioned above, which is a good learning opportunity. More recently however, the `getline` function was added as a GNU extension to the C library, which makes our work a lot easier. char sh_read_line(void) { char line = NULL; size_t bufsize = 0; // have getline allocate a buffer for us if (getline(&line, &bufsize, stdin) == -1) { if (feof(stdin)) // EOF { fprintf(stderr, "EOF\n"); exit(EXIT_SUCCESS); } else { fprintf(stderr, "Value of errno: %d\n", errno); exit(EXIT_FAILURE); } } return line; } We have given an implementation of the parser for you, but make sure you understand what `getline` is doing. Parsing the line Now that we have the line inputted by the user, we need to parse it into a list of arguments. We won't be supporting backslash or quoting in our command line arguments. The list of arguments will be simply be separated by whitespace. What this means is a command like `echo "hello world"`, will be parsed into 3 tokens: `echo`, `"hello`, and `world"` (and not into 2 tokens `echo`, and `hello world` as it should be ideally). That being said, the parser, `sh_split_line`, should split the string into tokens, using whitespace as the delimiter. `strtok` comes to our rescue: #define SH_TOK_BUFSIZE 64 #define SH_TOK_DELIM " \t\r\n\a" char sh_split_line(char line) { int bufsize = SH_TOK_BUFSIZE; int position = 0; char *tokens = malloc(bufsize sizeof(char )); char token, *tokens_backup; if (!tokens) { fprintf(stderr, "sh: allocation error\n"); exit(EXIT_FAILURE); } token = strtok(line, SH_TOK_DELIM); while (token != NULL) { tokens[position] = token; position++; if (position >= bufsize) { bufsize += SH_TOK_BUFSIZE; tokens_backup = tokens; tokens = realloc(tokens, bufsize sizeof(char )); if (!tokens) { free(tokens_backup); fprintf(stderr, "sh: allocation error\n"); exit(EXIT_FAILURE); } } token = strtok(NULL, SH_TOK_DELIM); } tokens[position] = NULL; return tokens; } At the start of the function, we begin tokenizing by calling `strtok()` which returns a pointer to the first "token". What `strtok()` actually does is return pointers to within the string you give it (we call that pointer `token`), and places a null terminator `\0` at the end of each token. We store each pointer in an array (buffer) of character pointers called `tokens`. Finally, we reallocate the array of pointers if necessary. The process repeats until no token is returned by `strtok()`, at which point we null-terminate the list of tokens. Part 1: Executing programs (30 Points) NOTE:* For the rest of this assignment, you will be doing all the implementation. You are free to modify any functions that we provide, including their signatures. What we provide is a template which we encourage you to use, as we expect it will make things easier for you. Now, finally we can come to the part where we make our tiny shell do what it was created for: starting to execute programs! By now, our shell should start and offer the user a prompt: 143A$ In this part of the assignment, you have to extend the shell to allow simple execution of external programs, for instance `ls` : 143A$ ls bar.txt foo.txt sh143 sh.c 143A$ The execution of programs, of course, is handled by the `sh_execute` function. int sh_execute(char args) { if (args[0] == NULL) { return 1; // An empty command was entered. } return sh_launch(args); // launch } You should do this by implementing the `sh_launch` function. Use the UNIX interface that we've discussed in class (the functions to cone processes, i.e., `fork()`, executing new processes, i.e., `exec()`, working with file descriptors i.e., `close(), dup(), open(), wait()`, etc. to implement the various shell features. Remember to return an appropriate return value from `sh_launch` as the main loop `sh_loop` depends on it. Feel free to modify how you use the `status` variable in `sh_loop`. Print an error message when `exec` fails. You might find it useful to look at the manual page for `exec`, for example, type $man 3 exec and read about `execv`. NOTE: When you type `ls` your shell may print an error message (unless there is a program named `ls` in your working directory or you are using a version of `exec` that searches `PATH`, i.e., `execlp(), execvp(), or execvpe())`. Now type the following: 143A$ /bin/ls This should execute the program `/bin/ls`, which should print out the file names in your working directory. You can stop the 143A shell by inputting `CTRL_D`, which should put you back in your computer's shell. You may want to change the 143A shell to always try `/bin`, if the program doesn't exist in the current working directory, so that below you don't have to type "/bin" for each program, or (which is better) use one of the exec functions that search the `PATH` variable. Your shell should handle arguments to the called program , i.e. this should work 143A$ ls /home aburtsev 143A$ TIP: In GDB, if you want to debug child processes, `set follow-fork-mode child` is sometimes useful. This is a good reference . Part 2: I/O redirection (30 Points) Now that you can execute commands, let us extend the features our shell provides. Now you have to implement I/O redirection commands so that you can run: 143A$ echo "143A is cool" > x.txt 143A$ cat < x.txt 143A is cool 143A$ You should extend `sh_execute` to recognize "`>`" and "`<`"characters. Remember to take a look at xv6's shell to get design clues. You might find the man pages for `open` and `close` useful. Make sure you print an error message if one of the system calls you are using fails. Part 3: Pipes (40 Points) Finally, you have to implement support for pipes so that you can run command pipelines such as: 143A$ ls \| sort \| uniq \| wc 11 11 85 143A$ You have to extend `sh_execute` to recognize "`\|`". You might find the man pages for `pipe`, `fork`, `close`, and `dup` useful. Test that you can run the above pipeline. The `sort` program may be in the directory `/usr/bin/` and in that case you can type the absolute pathname `/usr/bin/sort` to run sort. (In your computer's shell you can type `which sort` to find out which directory in the shell's search path has an executable named "sort".) From one of the Openlab machines you should be able to run the following command correctly (here a.out is your 143A shell): $ a.out < t.sh Submit your work Submit your solution through Gradescope Gradescope CS143A Operating Systems. Pack your shell, `sh.c` into a zip archive and submit it. Please name the C file `sh.c`. You can resubmit as many times as you wish. If you have any problems with the structure the autograder will tell you. The structure of the zip file should be the following: / - sh.c Challenge exercises (total extra 50%, 10% each) The shell we have built is very simple. It does not support built-in commands, like `cd`, `history`, etc. It does not support providing a list of commands, or running jobs in the background. There is no support for globbing, quoting or backslash escaping, to name a few important features typical in shells. You can add any** feature of your choice to your shell. But, you may want to consider the following as a start: Support for `cd`. It is a useful exercise to figure out how why `cd` doesn't work when provided as a command line argument to our shell, and make it work. 143A$ pwd /home/harishankarv/cs143a/hw2/ 143A$ cd ../hw1 143A$ pwd /home/harishankarv/cs143a/hw1/ Support for command history. `history` is another built-in shell command which displays a history of the commands entered in the current session of shell invocation. Note that using the GNU `readline` library is not allowed. 143A$ perl 143A$ dos2unix 143A$ history 1 perl 2 dos2unix 3 history Support for globbing. Shells typically support globbing, which looks for the `` and `?`, etc. pattern matchers in the command and perform a pathname expansion and replace the glob with matching filenames when it invokes the program. cp .jpg /some/other/location will copy all files with .jpg in the current directory to `some/other/location` Support for a list of commands separated by a `;`. You can usually run a list of commands in one line in most of the popular shells around, by separating the commands by a `;` : cmd1 ; cmd2 ; cmd3 Support for runing commands in the background using `&`. One can typically ask the shell to run a command in the "background" by appending a `&` at the end. The command is then run as a job, asynchronously. cmd arg1 arg2 &

Updated: April, 2020