Due: Friday, October 12Monday, October 15
This assignment can be completed in teams of two or three. Before you start work, please fill out the group signup form, even if you intend to work alone. This is necessary to initialize a group repository with the starter code for the assignment.
You can find your group number on the group info file, once your repository has been initialized.
Introduction
In this assignment, you will extend OS/161's thread (process) system and will implement several system calls to allow OS/161 to support more interesting user programs. These extensions modify shared OS data structures, so you will need to add appropriate synchronization to keep the OS's state consistent.
To complete this assignment, you will need to be familiar with the OS/161 thread code. The thread system provides interrupts, control functions, and synchronization (semaphores, locks, and condition variables). You will be extending the thread system to provide additional, higher-level functions found in most thread libraries and will use that functionality to implement the process management system calls getpid, waitpid, and kill.
Of course, these system calls are not very interesting without a way to create new processes! To help with that, we have given you the implementation of fork.
Beginning the Assignment
Documenting Design Decisions
We recommend documenting how you will manage the project -- how you will identify jobs which have been completed (or which remain), specific problems you have encountered, design decisions you have made, and discoveries/insights you've encountered. A simple text file in SVN may suffice, but we encourage you to use DrProject's wiki and ticketing system. We strongly recommend that you collect your notes in your wiki (or a text file) as you go. Then, summarize your notes for the final submission.
In your final submission, you must provide a well-organized design document discussing your solution to each part of this assignment and answering the questions raised at the end of each section. Name your document "design.txt" and place it in the a2 directory. You are welcome to submit a pdf file instead of a text file. You may include diagrams / drawings if it helps to clarify your design.
If you are unable to complete the assignment, you should add an extra section to your design document that clearly describe the remaining problems for any part of your system that does not meet the requirements.
Setting Up Starter Code
Your repository should contain a new a2 folder containing a src subdirectory with the complete OS/161 source tree to use for this assignment. You will need to re-run "./configure --ostree=`echo <path-to-root>` --werror" in the OS/161 src directory, since you're using a different source tree. The "--werror" flag configures the Makefiles to tell the compiler to treat warnings as errors, stopping the compilation as soon as a warning occurs. Compiler warnings are your friends, pay attention to them and fix them immediately!
You will also need to reconfigure your kernel before you begin work on this assignment. The procedure for configuring a kernel is the same as in the first assignment, except you will use the ASST2 configuration file. If you have set up a .gdbinit file in your OS/161 installation root directory with a "dir" command, remember to update it to point to the new source code.
Speeding up Testing
The simulator allows you to pass arguments to the kernel from the command line. These arguments are treated as commands entered at the menu prompt. You may find it convenient to use this feature to automate some tests. To pass arguments to the kernel from the command line, pass a string of commands (with a semicolon separating each command) as the second parameter to sys161. For example, to have the kernel run the array test and then quit, type:
sys161 kernel "at;q"
Code Reading
Place the answers to the code reading questions in a text file called code-reading.txt in your a2 directory. Please answer the questions using full sentences, and refer to the code (file and function) when appropriate.
- What happens to a thread when it exits (i.e., calls thread_exit() )? What about when it sleeps?
- What function(s) handle(s) a context switch?
- How many thread states are there? What are they?
- What does it mean to turn interrupts off? How is this accomplished? Why is it important to turn off interrupts in the thread subsystem code?
- What happens when a thread wakes up another thread? How does a sleeping thread get to run again?
- Semaphores are implemented using spinlocks. Where are spinlocks implemented? And what does it mean to get the data for a spinlock?
- Are OS/161 semaphores "strong" or "weak"?
- Why does the lock API in OS/161 provide lock_do_i_hold(), but not lock_get_holder() ?
- What are the ELF magic numbers?
- What is the difference between UIO_USERISPACE and UIO_USERSPACE? When should one use UIO_SYSSPACE instead?
- Why can the struct uio that is used to read in a segment be allocated on the stack in load_segment() (i.e., where does the memory read actually go)?
- In runprogram(), why is it important to call vfs_close() before going to usermode?
- What function forces the processor to switch into usermode? Is this function machine dependent?
- The mips_trap function handles all exceptions. Is it possible to determine whether the exception being handled was triggered during user execution or kernel execution?
Extending the Thread System
For this part of the assignment, you will extend the kernel thread subsystem to allow one thread to wait for another to exit. The starter code includes a simple thread id management system (see pid.c in src/kern/thread and pid.h in src/kern/include). We have added code that uses these functions to allocate an identifier when a new thread is created in thread_fork.
Adding Synchronization to the PID System
Currently, this thread id management code contains no synchronization. Thus, if two threads were executing thread_fork concurrently, it would be possible for both new threads to be assigned the same thread id. This is obviously undesirable, so your first task is to add appropriate synchronization to pid.c.
You should think of pid.c as implementing a monitor that controls access to the shared pid data structures. All access to the pid data structures (including querying or modifying any fields) must be done through a function provided by pid.c. The functions that start with pi_ are internal; they should only be called from within pid.c. The functions that start with pid_ can be called externally and are entry points for the monitor; they are also declared in pid.h. You must ensure that at most one thread can be active in the pid monitor at any time.
We have included a lock for the pid monitor already, and all the implemented pid_ functions (the entry points for the monitor) correctly acquire this lock at the start of the function and release it before returning.
Later, as you extend the thread system, you will need to add additional external functions to pid.c to respect the monitor requirement. Make sure that these functions are also appropriately synchronized.
Adding Synchronization to the Thread System
You must implement or modify the thread functions below. The implementations of these four functions must all work together, so read and think about all of them before starting to work on any of them. Since you are extending the thread subsystem, place any new code in kern/thread/thread.c or kern/thread/pid.c and remember to add prototypes to kern/include/thread.h or kern/include/pid.h.
- int thread_join(pid_t pid, int *status)
thread_join suspends the execution of the calling thread until the thread identified by pid terminates by calling thread_exit.
If status is not NULL, the exit status of thread pid is stored in the location pointed to by status. The thread pid must be in the joinable state; it must not have been detached using thread_detach.
When a joinable thread terminates, its thread descriptor (thread id) and exit status must be retained until another thread performs thread_join (or thread_detach) on it.
Only the creator (parent) of a thread is allowed to call thread_join on it. Anything else returns an error.
Return Value: On success the exit status of thread pid is stored in the location pointed to by status, and 0 is returned. On error, a non-zero error code is returned.
Errors:- ESRCH: No thread could be found corresponding to that specified by pid.
- EINVAL: The thread has been detached.
- EINVAL: The caller of thread_join is not the parent of the thread specified by pid.
- EDEADLK: The pid argument refers to the calling thread. (You will need to add this error to errno.h, and a corresponding message to errmsg.h)
Hint: To decide whether it is necessary to wait or not, and to retrieve the exit status of the specified thread, you will need to examine values that are internal to the pid monitor. - int thread_detach(pid_t pid)
thread_detach puts the thread pid in the detached state. This means that the thread cannot be joined, and the thread descriptor and exit status can be discarded immediately when pid terminates. If pid has already exited when thread_detach is called, the thread descriptor should be reclaimed as part of handling thread_detach.
When a thread exits, thread_detach should be called on all of its children.
Return Value: On success, 0 is returned. On error, a non-zero error code is returned.
Errors:- ESRCH: No thread could be found corresponding to that specified by pid.
- EINVAL: The thread pid is already in the detached state.
- EINVAL: The caller is not the parent of pid
- thread_exit
Modify thread_exit so that it stores the new exit status argument in its pid struct, and synchronizes correctly with the joining thread (if any). The exiting thread should also call thread_detach for all of its children. - thread_fork
We have changed the prototype and implementation of thread_fork so that it returns (via pass by reference) a thread id (pid) rather than a pointer to the thread structure itself. If the parent does not want to know the thread id (i.e., the last argument to thread_fork is NULL), you should place the new thread in the detached state immediately. (If the parent does not know the thread id, then it will not be able to call thread_join for the new thread).
You will probably find it convenient to implement most of these functions by calling a suitable function of the pid monitor (which you may need to write). As you are implementing these functions, consider which module should have responsibility for the functionality.
Using thread_join and thread_detach
Last assignment, several of you noticed that the output of printchartest was displayed after the prompt. The OS/161 menu thread forks new kernel threads to handle certain commands entered at the prompt. Use thread_join to make the menu thread wait until the child thread has finished. In addition, add support for "&"; any command that is followed by a "&" should be detached so that the menu thread does not wait for it.
Documentation
In your design document, describe your solution to the join/detach/exit synchronization problem. Describe what will happen if (a) the join happens before the exit, (b) the detach happens before exit, (c) exit happens before join, and (d) exit happens before detach. Are there other possibilities?
System Calls
In this section, you will implement getpid, waitpid, and kill.
It's crucial that your syscalls handle all error conditions gracefully (i.e., without crashing OS/161) and return the correct value (in case of success) or error code (in case of failure). Check the man pages linked above to get a description of the system calls and the errors they need to handle. If there are conditions that can happen that are not listed in the man page, return the most appropriate error code from kern/errno.h. If none seem particularly appropriate, consider adding a new one.
If you're adding an error code for a condition for which Unix has a standard error code symbol, use the same symbol if possible. If not, feel free to make up your own, but note that error codes should always begin with E, should not be EOF, etc. Consult Unix man pages to learn about Unix error codes; on Linux systems "man errno" will do the trick.
Also, be aware that the OS/161 kernel is preemptive, meaning that it is possible for two different user-level processes to be executing a system call at the same time. This means that any system resources need to be protected by appropriate synchronization.
Recall that the file usr/include/unistd.h contains the user-level interface definition of the system calls that you will be writing for OS/161 (including ones you will implement in later assignments). This interface is different from that of the kernel functions that you will define to implement these calls. Note that the user-level interface is already defined for the system calls you will write in this assignment, but you might want to read this file to see the type and number of arguments that will be passed to the OS.
getpid(), waitpid()
A pid, or process ID, is a unique number that identifies a process. The implementation of getpid() should be straightforward, since we have already taken care of allocating thread ids when a new thread is created (since every user-level process executes in the context of a single kernel thread). Similarly, your waitpid() system call should map nicely to the thread_join function in the kernel thread system.
kill()
The kill system call gives a process a way to send a signal to another process. It is often used to terminate another process (hence the name), but it is also the system call that is used to send any signal. For this assignment, we allow any process to issue a kill system call for any other process, rather than restricting it to parent/child interactions. (For example, the parent of a runaway process may have already exited without waiting for it, but a user would still like a mechanism for stopping it!) In real systems, a process must have suitable permissions to kill another process (i.e., they belong to the same user, or the process issuing the kill belongs to a super-user). Since OS/161 does not support multiple users, it would not make sense to try to implement these restrictions.
kill takes two parameters. The first is the pid of the process that is the target of the signal, and the second is the signal to send. 31 signals are defined, but only a few are implemented in OS/161. See the table of signals in the kill man page to see which signals are implemented. The implemented signals can terminate a process, be ignored, cause it to stop executing (to be removed from the queue of runnable processes), or cause it to be eligible for execution again. (Hint: Take a look at the thread_switch function.)
The main challenge in implementing this call is that you cannot simply stop or destroy the specified thread immediately. To understand why, consider the possibility that the child process may itself be in the middle of a system call when the parent issues the kill. This means that the child may be holding system resources, such as kernel locks. If we destroy the child's thread structure and never run it again, then those resources will never be released. Locating all resources held by the child is extremely difficult, and even if you could identify them all, just making them available again may leave things in an inconsistent state.
The alternative used in real systems (and which you will implement for OS/161) is for the kill() system call to set a flag in the target thread indicating that it has been sent a "kill" or "stop" signal. Every thread checks this flag before returning to userspace, and if the flag has been set, it exits instead of returning.
Hint:You will find it useful to support the setting and checking of this flag through additional functions added to pid.c. Recall also that all exceptions go through mips_trap in kern/arch/mips/locore/trap.c, making this a good place to check things before returning to userspace. Recall also that an exception (like a timer interrupt) may occur in the middle of a system call causing a recursive call to mips_trap. Hence, not every return from mips_trap is a return to userspace.
We have supplied new user-level test programs named "killtest" and "continuetest". Read the code in testbin/killtest/killtest.c and testbin/continuetest/continuetest.c to understand what these tests do. You will need to use good DEBUG messages and to have those messages on to know that the tests are working correctly.
Documentation
Once again, discuss how you handle each of these system calls in your design document. Include answers to the following questions about the kill system call (the first two depend on your implementation, which you should describe in answering the questions):
- What happens if you kill a zombie?
- What exit code is used by a thread that exits because it was killed?
- What happens if a process waits (using the waitpid() system call) for a process that was the target of a kill() system call?
- Can a kernel thread use the same mechanism as the kill() system call to terminate another kernel thread?
- The killtest program forks a user-level process that executes an infinite loop (i.e the code contains "while (1) {}"), and its parent issues a kill() system call to end it. Why does this work? That is, why does the process return to userspace if it contains no code that enters the operating system?
Make sure you identify where to find your code modifications in the OS/161 source. To make life a bit easier for the TAs, please put the system calls (sys_getpid, sys_waitpid, and sys_kill)in src/kern/syscall/proc_syscalls.c.
Marking Guidelines
Your submission will be marked based on correctness, style and efficiency, and your ability to explain your design decisions:
- Correctness (70%): All of your thread system extensions and system calls should perform the specified work, return the correct return values and error codes, and handle error cases gracefully.
- Style and Efficiency (15%): Your code should be readable and well structured. Functionality should be placed in the correct module and should not be duplicated.
- Design Document and Code Reading (15%): The design document should discuss all of the required topics in a clear, professional style. Avoid unnecessary technical detail; your audience is a manager who does not need specific information about your implementation -- just a high-level idea of the trade-offs you have made. Grammar and composition count.
Submission
Submit your assignment by checking in a final version to your SVN repository. Make sure to:
- Review your code for good style and readability.
- Verify that your code compiles without warnings. (Compiler warnings are your friends!)
- Proofread your design document and confirm that it addresses the requested topics.
- Confirm that thread_join, thread_detach, sys_getpid, sys_waitpid, and sys_kill are implemented and that the menu, thread_exit, and thread_fork have been augmented with the appropriate functionality.
- Check that all files are named correctly and that your submission is completely contained in the a2 directory of your group's repository.
- Add the message "Assignment 2 Complete" to your final check-in to inform us that your assignment is ready for marking. (If you don't include this message, we'll mark the last check-in, but adding this message will help us complete the marking faster, since we will be able to start before your grace days are exhausted.) The commit time of your submission determines how many grace days are used.