Maintainer: admin
Review for quiz #2, to be held during class on Tuesday, October 16. Sections covered: up until the end of chapter 10 (approximately), omitting chapter 5.
Under construction
Chapter 3: Processes
Starting from slide 3.37
- inter-process communication
- mailbox sharing solutions (if several processes share a mailbox, who receives a given message?)
- allow a link to be associated with at most two processes
- allow only one process at a time to execute a receive operation
- round-robin scheme: the system picks a receiver arbitrarily (e.g., round robin) and notifies the sender of who got the message
- message passing: blocking or not, on part of sender or receiver
- queue of messages attached to link
- zero capacity: sender waits for receiver (rendez-vous)
- bounded (finite) capacity: sender waits if queue full
- unbounded (infinite) capacity: sender never waits, is this even possible?
- Examples
- POSIX shared memory
- process creates shared memory segment, returns an ID
- processes wanting access to that segment must attach to it by ID (call returns pointer)
- can write to that memory segment (using the pointer)
- can detach it from its address space when done
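The create/attach/write/detach flow above maps onto the System V-style shared memory calls (`shmget`, `shmat`, `shmdt`); a minimal sketch, assuming a throwaway private segment:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void) {
    /* create a shared memory segment; the call returns an ID */
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
    if (id < 0) { perror("shmget"); return 1; }

    /* attach to the segment by ID; the call returns a pointer */
    char *mem = shmat(id, NULL, 0);
    if (mem == (void *) -1) { perror("shmat"); return 1; }

    /* write to the segment through the pointer */
    strcpy(mem, "hello from shared memory");
    printf("%s\n", mem);

    /* detach it from the address space when done, then remove it */
    shmdt(mem);
    shmctl(id, IPC_RMID, NULL);
    return 0;
}
```

A second process would call `shmat` with the same ID to get a pointer to the same bytes.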
- Mach:
- message-based (everything! even system calls)
- all tasks are created with 2 mailboxes (created with `port_allocate()`): kernel, and notify
- three system calls: `msg_send`, `msg_receive`, `msg_rpc` (remote procedure call, to execute something elsewhere)
- Windows XP:
- message-based using "local procedure call" (can only be used by local processes)
- communication channels established using ports (sort of like mailboxes)
- client opens handle to subsystem's connection port, sends a connect request
- server creates two private communication ports, sends one to client, keeps the other one
- corresponding ports are used by server and client to send/receive messages
- INSERT DIAGRAM LATER
- client-server communication
- sockets (endpoints for communication)
- IP address + port
- different services listen on different ports, e.g. ssh on 22, typical web servers on 80, etc
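A minimal sketch of the client side in C, assuming a server is listening on 127.0.0.1:80 (both the address and the port are placeholders):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    /* a socket is one endpoint of the connection: IP address + port */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);                  /* typical web server port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *) &addr, sizeof addr) == 0) {
        const char *req = "GET / HTTP/1.0\r\n\r\n";
        write(fd, req, strlen(req));            /* send over the socket */
        char buf[512];
        ssize_t n = read(fd, buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("%s", buf); }
    }
    close(fd);
    return 0;
}
```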
- remote-procedure calls
- allows clients to invoke a procedure on a remote host as if it were local
- client-side stub locates server, marshalls parameters, sends them over (underlying details abstracted)
- server-side stub receives messages, unmarshalls parameters, performs procedure
- INSERT DIAGRAM LATER (or describe it at least)
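Diagram or not, the "marshalling" the stubs do can be made concrete. A sketch, assuming a made-up `add(a, b)` remote procedure (this is not a real RPC framework, just the byte-packing step):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Client-stub side: pack the parameters into a flat buffer in an
   agreed-upon (big-endian) wire format before sending them over. */
size_t marshall_add_args(uint32_t a, uint32_t b, unsigned char *buf) {
    uint32_t na = htonl(a);   /* host byte order -> network byte order */
    uint32_t nb = htonl(b);
    memcpy(buf, &na, 4);
    memcpy(buf + 4, &nb, 4);
    return 8;                 /* bytes written */
}

/* Server-stub side: unmarshall the parameters, then perform the procedure. */
uint32_t unmarshall_and_add(const unsigned char *buf) {
    uint32_t na, nb;
    memcpy(&na, buf, 4);
    memcpy(&nb, buf + 4, 4);
    return ntohl(na) + ntohl(nb);
}
```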
- remote method invocation (java only)
- java program on one machine can call a method on a remote java object
- basically the same as RPC except there are objects involved
- THERE IS A DIAGRAM HERE. it doesn't really say much though. look it over anyway
Chapter 4: Threads
- Thread: unit of CPU utilisation, forms basis of multithreaded computers
- things unique to each thread: thread ID, program counter, register set, stack
- things threads from the same process share: code, data (statically-created variables), OS resources (open files, signals)
- single-threaded process: code + data + files, then registers + stack
- multi-threaded: code + data + files, then each thread gets its own registers and stack
- Benefits of threading
- Greater responsiveness (if one thread is blocked or busy, can keep going - e.g., webservers)
- Resource sharing (memory, other resources)
- Cheaper than creating new processes in terms of allocating memory and resources (context-switching is easier)
- Scalability: threads can run in parallel on multiple processors, so the application can take advantage of additional cores as they are added (which is presumably what "scalability" means here - making full use of resources)
- In the client-server model: client sends request, server spawns new thread to handle request, server resumes listening for requests
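A minimal pthreads sketch of that last pattern, with the actual listening loop replaced by a fixed array of fake requests:

```c
#include <pthread.h>
#include <stdio.h>

/* stand-in for the real request-handling work */
void *handle_request(void *arg) {
    printf("handling request %d\n", *(int *) arg);
    return NULL;
}

int main(void) {
    static int requests[3] = {1, 2, 3};   /* fake "incoming" requests */
    for (int i = 0; i < 3; i++) {
        pthread_t t;
        /* spawn a thread for the request, then go back to "listening" */
        pthread_create(&t, NULL, handle_request, &requests[i]);
        pthread_detach(t);   /* nobody will join it */
    }
    pthread_exit(NULL);      /* exit main thread; detached threads finish */
}
```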
- Single vs. multicore
- on a single-core system, execution of the threads is interleaved over time: t1, t2, t3, t4, t1, t2, t3, t4, etc. (this interleaving is what "concurrency" means on a single core)
- on a multicore system, the threads can actually execute at the same time (parallelism on top of concurrency). could be t1 t3 t1 t3 on one core, t2 t4 t2 t4 on the other, etc
- challenges of multicore systems:
- how to divide up activities? need to find parts of application that can be divided into smaller tasks that can run in parallel
- how to split data? the data must be divided into chunks that separate cores can work on independently
- data dependency: what if one task depends on the results of another?
- testing/debugging: many different execution paths
- user-level thread libraries:
- POSIX threads
- win32 threads
- Java threads
- Kernel threads: handled by the kernel
- Multithreading models, per process (user-to-kernel):
- many-to-one (many user-level, single kernel)
- examples: solaris green threads library, gnu portable threads
- thread management done by user space thread library (efficient)
- but only one thread can access the kernel at a time, so if any user thread makes a blocking syscall the entire process will block
- one-to-one (one kernel thread per user thread)
- examples: windows nt/xp/2000, linux, solaris 9+
- benefits: more concurrency (syscalls are not blocking for other threads)
- downsides: can result in decreased performance as you have to create many kernel threads, etc
- also, some OSes limit the number of kernel threads possible at any given time
- many-to-many
- examples: solaris before 9, windows nt/2000 with threadfiber package (how cute)
- many user threads mapped to the same number of or fewer kernel threads
- developers can create as many user threads as they need, since the OS doesn't have to create a kernel thread for each one
- true concurrency: corresponding kernel threads can run in parallel on a multiprocessor (the point being that many-to-one can't do this, since it has only one kernel thread)
- two-level model
- examples: IRIX, HP-UX, Tru64 unix, solaris before 9
- like the many-to-many model, except user threads can be bound to kernel threads as well (so a superset of many-to-many)
- thread libraries
- APIs for creating, managing threads
- can be implemented in user space
- calling a function in library results in a local call (not sys call) in user space
- or, using the kernel-level library provided by OS
- code, data structures of library exist in kernel space
- invoking an API function usually results in a sys call
- pthreads: a POSIX standard for APIs for thread creation and sync
- provided as user-level or kernel-level
- the behaviour is specified by this standard, but it can be implemented differently by different OSes
- examples: Solaris, Mac OS X, FreeBSD, Linux (there is even a windows implementation)
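A small `pthread_create`/`pthread_join` example in the classic textbook style (summing 1..n in a second thread; the global is shared, so both threads see it):

```c
#include <pthread.h>
#include <stdio.h>

int sum;   /* shared: threads of one process see the same data */

void *runner(void *param) {
    int upper = *(int *) param;
    sum = 0;
    for (int i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(NULL);
}

int main(void) {
    pthread_t tid;
    int upper = 10;
    pthread_create(&tid, NULL, runner, &upper);  /* start the thread */
    pthread_join(tid, NULL);                     /* wait for it */
    printf("sum = %d\n", sum);                   /* prints 55 */
    return 0;
}
```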
- Java threads
- managed by the JVM, implemented using underlying thread library provided by OS
- these threads can be created by extending the `Thread` class or implementing the `Runnable` interface
- various issues (and non-issues) relating to threads
- behaviour of `fork()`
- should it duplicate only the calling thread or all threads of that process?
- some unix systems thus have two versions of `fork()`
- `exec`, when invoked by any thread, replaces the entire process - every thread in it - with the new program. this is why it's relevant to fork: if the child is going to call exec right away, duplicating all the parent's threads is wasted work, so the duplicate-only-the-calling-thread version of fork is the right choice (see the sketch below)
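The usual fork-then-exec idiom, for reference (running `ls -l` is just an arbitrary example):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();       /* duplicate the process */
    if (pid == 0) {
        /* child: exec replaces the entire process image, so any
           threads fork had duplicated would be thrown away here */
        execlp("ls", "ls", "-l", (char *) NULL);
        perror("execlp");     /* only reached if exec fails */
        return 1;
    }
    wait(NULL);               /* parent waits for the child */
    return 0;
}
```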
- thread cancellation (premature termination)
- how to deal with resource allocation, or if it's updating a data segment shared by other threads?
- asynchronous cancellation: terminates the thread immediately (bit of a misleading name)
- deferred cancellation: thread periodically checks if it should be cancelled, giving it the chance to die nobly
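A deferred-cancellation sketch with pthreads: the target only dies at explicit cancellation points, so it can keep shared data consistent in between (the empty work loop is a placeholder):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *worker(void *arg) {
    (void) arg;
    /* deferred mode (the pthreads default): cancellation requests
       are only acted upon at cancellation points */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    for (;;) {
        /* ... do a chunk of work, leaving shared data consistent ... */
        pthread_testcancel();   /* safe point: die nobly here if asked */
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    sleep(1);
    pthread_cancel(t);          /* request cancellation */
    pthread_join(t, NULL);      /* target exits at its next test point */
    puts("worker cancelled");
    return 0;
}
```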
- aside: signal handling (i think this section should count as an aside, even though it has its own slide)
- signal handlers used to notify processes that specific events (e.g., kill events) have occurred
- event generates signal, signal gets delivered to process, signal is handled
- many options as to whom the signal is delivered to:
- only the relevant thread
- every thread in the process to which the relevant thread belongs
- certain threads in the relevant process
- a specific thread that receives all signals for the process
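For reference, installing a per-process handler with `sigaction`; in a multithreaded program, `pthread_kill` can direct a signal at one specific thread instead:

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* the handler: runs when the signal is delivered and handled */
void on_sigint(int sig) {
    (void) sig;
    write(STDOUT_FILENO, "caught SIGINT\n", 14);  /* async-signal-safe */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, NULL);   /* install the handler */

    pause();   /* wait: event generates signal, signal gets delivered */
    return 0;
}
```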
- thread pools
- create some threads, throw them into a pool in which they wait around for work to be doled out, the poor creatures
- faster to make an existing thread service a request than to create a new thread
- number of threads in application bounded by the size of the pool
- of course, this has the slight disadvantage of having to create all the threads at the very beginning (slower startup maybe)
- thread-specific data
- so that each thread can have its own copy of data
- supported by most threading libraries
- e.g., a transaction system: each thread services one transaction and keeps its transaction ID as thread-specific data
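With pthreads this is done with keys: one key, but each thread gets its own value under it (the "transaction id" here is just an illustrative name):

```c
#include <pthread.h>
#include <stdio.h>

pthread_key_t key;   /* one key, a separate value per thread */

void *worker(void *arg) {
    /* each thread stashes its own "transaction id" under the same key */
    pthread_setspecific(key, arg);
    int id = *(int *) pthread_getspecific(key);
    printf("this thread's own value: %d\n", id);
    return NULL;
}

int main(void) {
    pthread_key_create(&key, NULL);
    int a = 1, b = 2;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_key_delete(key);
    return 0;
}
```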
- scheduler activations
- what does this even meeeeeean
- both the many-to-many and the two-level models require communication between the kernel and the thread library, to keep the number of kernel threads appropriate for the application
- the mechanism for this communication is the upcall: the kernel calls up into the thread library (the inverse of a system call) to notify it of events such as a thread about to block, and the library can then reschedule its user threads. this upcall machinery is what "scheduler activations" provide
- threads in different OSes
- windows XP
- each app runs as a separate process
- each can have one or more threads
- thread library included in win32 api, one-to-one
- fiber library (many-to-many) available too
- each thread contains thread ID, register set, separate user/kernel stacks, private data storage area (last three collectively known as context)
- primary data structures of a thread:
- ethread: executive thread block (thread start address, pointer to parent block, pointer to kthread) - kernel space
- kthread: kernel thread block (scheduling and sync info, kernel stack, pointer to teb) - kernel space
- teb: thread environment block (thread identifier, user stack, thread-local storage) - user space
- linux
- called tasks not threads
- one-to-one
- threads created by calling `clone()` (allows child task to share address space of parent task, or process) - amount of sharing that takes place is controlled by flags:
- CLONE_FS: file-system info shared
- CLONE_VM: same memory space
- CLONE_SIGHAND: share signal handlers
- CLONE_FILES: share the table of open file descriptors (whereas CLONE_FS shares filesystem info like the working directory and umask)
- NPTL (Native POSIX Thread Library) supported by most modern distros, POSIX-compliant
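A Linux-only sketch of `clone()` with explicit flags; with CLONE_VM the child writes straight into the parent's memory (the 64 KB stack size is an arbitrary choice):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int shared = 0;

int child_fn(void *arg) {
    (void) arg;
    shared = 42;   /* visible to the parent because of CLONE_VM */
    return 0;
}

int main(void) {
    char *stack = malloc(64 * 1024);
    /* the flags control how much the child task shares with its parent */
    pid_t pid = clone(child_fn, stack + 64 * 1024,   /* stack grows down */
                      CLONE_VM | CLONE_FILES | SIGCHLD, NULL);
    waitpid(pid, NULL, 0);
    printf("shared = %d\n", shared);   /* prints 42 */
    free(stack);
    return 0;
}
```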
Chapter 10: Filesystem interface
- attributes (kept in dir structure, which is on disk)
- name
- system identifier
- type
- location
- size
- protection (permissions)
- time, date, user identification
- operations
- create: allocate space in system, make new entry in directory
- write: given file name and info to be written, system searches for it, keeps a write pointer
- read: ^ but read pointer
- reposition in file: seek, pointer value changes
- delete: find file, release space, delete dir entry
- truncate: delete contents, keep attributes
- `open()` system call avoids constant searching for dir entry (saves it in memory; `close()` frees)
frees)- takes file name, copies dir entry into open file table, returns pointer
- per-process table (pointer, rights), system-wide (location, access dates, file size)
- to manage open files, need pointer, open count, location, access rights
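These operations map directly onto the Unix file API; a quick sketch (the file name is made up):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* open() searches the directory once and returns a handle;
       later calls use the handle, not the name */
    int fd = open("notes.txt", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "quiz 2\n", 7);               /* advances the write pointer */
    lseek(fd, 0, SEEK_SET);                 /* reposition (seek) to start */

    char buf[16];
    ssize_t n = read(fd, buf, sizeof buf);  /* advances the read pointer */
    if (n > 0) fwrite(buf, 1, (size_t) n, stdout);

    close(fd);   /* entry can be dropped from the open file table */
    return 0;
}
```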
- access methods
- sequential
- direct
- index
- can simulate sequential on direct
- direct on sequential is inefficient
- storing files on disks
- partitions
- raw or formatted
- RAID protection
- volume = entity with filesystem
- directory structure
- collection of nodes pointing to files
- operations
- search for file
- create
- delete
- list
- rename
- traverse
- single-level directory for all users: dumb
- two-level, one for each user: bad for cooperation
- tree-structured: each file has unique path name
- acyclic graph
- hard links?
- multiple abs path names
- reference list: delete a file only when no references to it exist
- self-reference?
- check that there are no cycles
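What hard links look like through the Unix API (file names are made up); the data survives until the last directory entry referencing it is gone:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE *f = fopen("original.txt", "w");
    fputs("same bytes either way\n", f);
    fclose(f);

    link("original.txt", "alias.txt");   /* second dir entry, same file */
    unlink("original.txt");              /* one reference gone... */
    /* ...but the data still exists, reachable via alias.txt */

    unlink("alias.txt");                 /* count hits 0: space released */
    return 0;
}
```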
- mounting
- sharing, etc, nothing special
- failure
- directory corruption, disk-controller failure, etc
- types of access (for right management):
- read
- write
- execute
- append
- delete
- list