Review for quiz 3


Review for quiz #3, to be held during class on Tuesday, November 20. Chapters covered: 8 and 9.

The content on this page includes summaries of the contents of the slides posted by Professor Liu, in addition to user-created supplementary questions and answers.

1 Chapter 8: Main memory

  • background
    • for a program to run (as a process), it must be read from disk into memory
      • recall: the only storage media the CPU can access directly are main memory and registers (with the CPU cache between registers and main memory)
      • accessing registers can be done within one clock cycle
      • accessing main memory can take many clock cycles
    • each process has a separate memory space, defined by base and limit registers
      • base: smallest legal physical memory address
      • limit: the size of the address space
    • processing of a user program:
      • the variables used are symbolic addresses in the source program
      • the source program is then compiled/assembled to an object module
      • the compiler binds the symbolic addresses to relocatable addresses ("x bytes from start of module")
      • loader/linkage editor then binds the relocatable addresses to absolute addresses
      • in fact, final-address-binding can happen at various stages
        • at compile time, if the memory location is known beforehand, absolute code can be generated (although if the starting location changes, code must be recompiled)
        • at load time, if relocatable code is generated by the compiler
        • at run time, if the process can be moved (in memory) during execution (needs hardware - MMU - support)
    • logical vs. physical address spaces
      • logical address: virtual
      • physical address: the one used by the memory unit (like latitude/longitude vs. street names, etc)
    • memory management unit (MMU)
      • hardware device mapping virtual to physical addresses (dynamically)
      • uses TLB
      • allows user programs to deal with virtual rather than physical addresses
      • makes use of a relocation register: given a virtual address, it adds the value of that register to it, then uses that as the physical address
    • dynamic loading
      • routines are not loaded into memory until they are needed (called)
      • results in better memory usage
      • useful for things like error routines (large amounts of code, called infrequently)
      • can be handled by the process itself - the calling routine will call a relocatable linking loader to load the desired routine in memory if it is not already in it
      • no special support from the OS is needed
      • however, it falls on the programmer to make use of dynamic loading in their programs
    • dynamic vs static linking
      • static: system libraries are treated like any other object module and are combined by the loader into the binary
      • dynamic: the linking happens at execution time
      • advantage: more space efficient
      • a code stub handles locating the library if it's in memory already, or loading it if it is not
      • the stub then replaces itself with the address of the routine for the library, which is then executed
      • needs support from the OS: since processes are protected from one another, only the OS can check whether the needed routine is already in another process' memory space, and allow multiple processes to access the same addresses
      • useful for libraries
  • swapping
    • process can be temporarily swapped to disk (known as the backing store), then brought back into memory to continue execution
    • roll out, roll in: lower-priority processes are swapped out in favour of higher-priority processes
    • the time it takes to swap something to disk is proportional to the amount of memory swapped (as the limiting factor is the transfer time)
    • ready queue: holds the list of ready-to-run processes which have memory images on disk
  • contiguous memory allocation
    • main memory is usually divided into two partitions: a "high" partition, and a "low" partition
    • since the interrupt vector is usually in low memory, the OS is usually placed in low memory as well
    • then, user processes are placed in high memory
    • relocation registers (base, limit) used to protect processes
    • if the logical address is less than the limit register, then you add the base register to it to get the physical address
    • otherwise you get a trap (addressing error, maybe segfault)
    • multiple-partition allocation
      • hole: block of available memory
      • holes of various sizes are scattered throughout memory, etc
      • when a process begins execution, it is allocated memory from a large-enough hole
      • OS keeps info on which partitions are free and which have already been allocated
      • ways of satisfying requests:
        • first-fit: allocate the first hole that is big enough
        • best-fit: allocate the smallest hole that is big enough (requires searching the entire list, unless it has been ordered by size)
        • worst-fit: allocates the largest hole that is big enough (same as above)
        • first-fit and best-fit usually make better use of storage than worst-fit
        • first-fit is typically fastest
    • fragmentation
      • external fragmentation: there is enough space in all of memory to satisfy a request, but it is not contiguous (due to processes being loaded and removed from memory, etc)
      • internal fragmentation: when the amount of memory allocated is larger than that requested, so there's some space within partitions that is not being used
      • external frag. can be reduced by compaction (shuffling memory contents to place all the free blocks together)
      • compaction is only possible if relocation is dynamic (done at execution time)
  • paging
    • allows the physical address space of a process to be noncontiguous
    • physical memory is divided into fixed-size blocks called frames (between $2^9$ and $2^{24}$ bytes per frame)
    • virtual memory is divided into fixed-size blocks called pages
    • OS keeps track of all free frames
    • to run a program of $n$ pages, you need to find $n$ free frames
    • then you have a page table to translate virtual to physical addresses
    • the virtual address consists of two parts:
      • page number: used as an index into the page table, contains base address of each page in physical memory
      • page offset: combined with base address, defines the physical memory address
    • if the page size is $2^n$ bytes, the page offset is $n$ bits
    • if there are $2^m$ possible virtual addresses, then the page number is $m-n$ bits
    • so in virtual memory you have pages, and each page corresponds to a frame in physical memory
      • that mapping is defined in the page table (e.g., page 0 might map to frame 1)
      • a byte at a given offset within a page sits at the same offset within its frame
    • from a virtual address to fetching the byte in memory:
      • the page number is used to look up the frame number in the page table
      • the page offset is appended to the frame number to form the physical address
      • that location in physical memory is accessed, and the relevant byte fetched
    • implementation of the page table
      • one scheme: kept in main memory
        • page table base register: starting address of the page table
        • page table length register: size of the page table
        • in this scheme, accessing anything requires two memory accesses: one to find the page table, then another to get the thing itself (which is inefficient)
      • another scheme: special hardware caches to speed up lookup
        • associative memory or translation lookaside buffers (cache for the page table)
        • if the desired page is not in the TLB, then go to memory, etc
    • calculating the effective access time when looking up something
      • let $\epsilon$ be the amount of time it takes to look up a page in the TLB
      • assume it takes 1 unit of time to look up something in memory
      • hit ratio for TLB: $\alpha$
      • time taken if the page is in the TLB: $(1 + \epsilon)$ since you look it up in memory afterwards
      • time taken if it's not: $2 + \epsilon$ since you have to look it up in the TLB anyway, then access memory to find the page table translation, then again to find the thing itself
      • then effective access time is $(1 + \epsilon) \alpha + (2+\epsilon)(1-\alpha) = 2 + \epsilon - \alpha$
    • memory protection
      • protection bit associated with each frame
      • corresponding valid/invalid bit attached to each entry in the page table
      • if the page is in the process' logical address space, then it's valid; otherwise, invalid
      • for example, if that page is not being used by the process, then it would be marked invalid
      • note that page tables are per-process
    • shared pages
      • common code can be shared in a paging scheme
      • read-only (reentrant) code shared among multiple processes (maybe different users)
        • must appear in the same location in the logical address space of all processes
        • i.e., the frames are shared (so the page tables for these different processes have some overlap)
        • reentrant: non-self-modifying, can never change during execution
      • processes can also have their own private code and data, can appear anywhere in virtual memory etc
  • structure of the page table
    • hierarchical paging
      • since modern OSes support very large virtual address spaces ($2^{32}$ or $2^{64}$, etc), the page table itself can become very large
      • solution: use a multiple-level page table scheme
      • you have an outer page table, and each entry points to the location of an inner page table (not standard terminology)
      • then each entry in each inner page table points to a frame in physical memory
      • basically you're paging the page table itself
      • example: 32-bit addresses, page size of $2^{10}$ bytes
        • page number: 22 bits, page offset: 10 bits
        • the 22-bit page number is itself split into a 12-bit outer index and a 10-bit inner index
        • the 12-bit outer index selects an entry in the outer page table, which points to an inner page table; the 10-bit inner index selects the entry within that inner table
      • of course, you can also have three-level paging schemes, etc
      • example: 64-bit addresses, page size of $2^{12}$ bytes (4 KB, hence the 12-bit offset)
        • if 2 levels: 12-bit offset, 10-bit inner page, 42-bit outer page
        • if 3 levels: 12-bit offset, 10-bit inner page, 10-bit middle page, 32-bit outer page
    • hashed page tables
      • common for > 32-bit address spaces
      • the page number is hashed; collisions are handled by using buckets
      • frame found by hashing it, then searching along the bucket chain
    • inverted page tables
      • usually, each process has one page table, which has one entry for each page the process is using
      • con: lots of memory used just to store page tables
      • solution: one entry for each frame of memory
      • each entry contains virtual address of page, and information about the owning process
      • so there's only one page table needed in the system, and only one entry per page (reduces duplication, so, less memory needed overall)
      • however, when a page reference occurs, you need to search through this larger table, which would be slower (since can't index it directly)
        • the inverted page table is sorted by physical address
        • go through the entries one by one until one matches the virtual page number and process ID; the entry's position gives the frame number
        • can use a hash table to limit the search a bit. but each access to the hash table would add a memory reference
      • also, harder to implement shared memory pages
        • since in shared memory, multiple virtual addresses are mapped to one physical address
        • but in this scheme, there is only one virtual page entry for each physical page
  • segmentation
    • memory-management scheme that corresponds to a user's view of memory
    • collection of variable-sized segments (not ordered)
    • program: collection of segments
    • segment: logical unit, like an array, method, function, program, section of variables, etc
    • just a way of grouping together things logically (even though they may not be grouped together physically)
    • a logical address consists of a 2-tuple: segment number, offset
    • segment table: maps the above logical address to physical addresses
      • each entry (one per segment) has a base and a limit
      • hardware view:
        • segment number is translated using the segment table, to get the limit and base for this segment
        • the offset is compared with the limit; if it exceeds it, we get a trap
        • then, the base is added to the offset, and used as the physical address

1.1 Sample questions

1.1.1 CPU and storage

What are the two storage media that the CPU can access directly, and how do their access times compare?

Main memory, and registers. Accessing registers can be done in <1 clock cycle, whereas accessing main memory can take multiple.

1.1.2 Address binding

At what stages can final address binding happen? Give detail, etc.

At compile time, to generate absolute code (code must be recompiled if the starting location changes; MS-DOS .COM programs are an example)

At load time, if the compiler has generated relocatable code

At run time, if the process can be moved in memory during execution (requires MMU support)

1.1.3 MMU

Wtf is an MMU

Memory management unit. Hardware device, translates virtual addresses to physical addresses so that programmers don't have to deal with physical addresses. Uses a TLB, etc.
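
A minimal sketch of the relocation-register translation in Python (the register values here are made up for illustration):

```python
LIMIT = 4096   # limit register: size of the process' address space (made up)
BASE = 14000   # relocation (base) register: where the process sits in physical memory

def mmu_translate(virtual_addr):
    """Check the address against the limit, then relocate it."""
    if virtual_addr >= LIMIT:
        raise MemoryError("trap: addressing error")
    return BASE + virtual_addr

print(mmu_translate(346))  # -> 14346
```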

1.1.4 Loading vs. linking

What is the difference between dynamic loading and linking? What are the advantages of both compared to their static counterparts?

Dynamic loading: parts of a process that are not needed right away are not brought into memory until they are called. Useful for things like error handling routines, which are typically large blocks of code that get called infrequently. Results in a smaller memory footprint. It falls upon the programmer to make use of dynamic loading in their programs by calling a "relocatable linking loader" that will either locate a part in memory (if it is there) or else bring it into memory.

Dynamic linking: system libraries and other commonly-used code are linked at execution time, with the programmer making use of a "code stub" that locates the library in memory and replaces itself (during execution time) with the address of that routine. Needs support from the OS, to check that the desired routine is in the process' memory space. Also results in a smaller memory footprint.

1.1.5 Backing store

Define this.

A large storage device, commonly a disk, to which things in memory can be swapped.

1.1.6 Memory partitions

Main memory is usually divided into two partitions. What are they called and what goes into each?

High: user processes. Low: interrupt vector, OS.

1.1.7 Multiple-partition allocation

Ways of satisfying allocation requests? Which is fastest?

First-fit: first hole that is big enough. Typically the fastest.

Best-fit: smallest hole that is big enough

Worst-fit: largest hole that is big enough (the idea is that the leftover hole stays large enough to remain useful, though in practice it uses storage less effectively).
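
A quick sketch of the three strategies over a list of hole sizes (the hole sizes and request are made up for illustration):

```python
def first_fit(holes, request):
    # first hole that is big enough
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    # smallest hole that is big enough; searches the whole list
    fits = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(fits)[1] if fits else None

def worst_fit(holes, request):
    # largest hole that is big enough; also searches the whole list
    fits = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(fits)[1] if fits else None

holes = [10, 4, 20, 18, 7]
print(first_fit(holes, 6), best_fit(holes, 6), worst_fit(holes, 6))  # 0 4 2
```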

1.1.8 Fragmentation

Difference between internal, external

External fragmentation: small holes scattered throughout memory that can't be used because they're too small individually. For example, say there is a hole of size 10 and three others of size 4 (all non-contiguous), and a process comes along requesting a hole of size 6, which is granted. You're then left with (at best) 4 holes of size 4. If a process comes along needing a hole of size 16, the request can't be satisfied, because the holes are not contiguous.

Internal fragmentation: when you have leftover space (not being used by anything) within the partitions being used, due to the way blocks are allocated. For instance, if a process requests 15 bytes, but it's given 16 because it has to be a power of 2 (etc).

1.1.9 Compaction

What is it, what is a requirement for this to be possible, when can it be done

Compaction is a way to reduce external fragmentation, and consists of shuffling the contents of memory so as to place all the holes together. This is done at execution time and thus is only possible if relocation is dynamic.

1.1.10 Pages and frames

Difference?

Pages: used in the logical (virtual) view of memory. Frames: physical view. One-to-one correspondence.

1.1.11 Virtual address calculations

Explain what the bits of a virtual address mean if a virtual address is 32 bits, and the page size is 16 kilobytes.

16 KB = $2^4 \cdot 2^{10} = 2^{14}$ bytes, so 14 bits are needed for the page offset to address any byte within a page. The remaining 18 bits are the page number, giving $2^{18}$ pages, which is enough to cover all $2^{32}$ bytes of the virtual address space.
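
A quick Python check of the bit arithmetic (the example address is arbitrary):

```python
ADDRESS_BITS = 32
PAGE_SIZE = 16 * 1024                      # 16 KB = 2**14 bytes

offset_bits = PAGE_SIZE.bit_length() - 1   # 14
page_bits = ADDRESS_BITS - offset_bits     # 18

vaddr = 0xDEADBEEF                         # arbitrary 32-bit virtual address
page = vaddr >> offset_bits                # top 18 bits: page number
offset = vaddr & (PAGE_SIZE - 1)           # bottom 14 bits: page offset
print(offset_bits, page_bits, hex(page), hex(offset))
```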

1.1.12 Page table implementations

How is the page table stored?

In main memory, with a page-table base register (starting address) and a page-table length register (size). To speed up lookup, a special hardware cache (TLB) is used to store part of the page table.

1.1.13 Page table EAT calculations

Let's say the hit ratio for the TLB is 0.9, and it takes 100 units of time to look something up in memory. To look something up in the TLB takes 1 unit of time. What's the EAT?

101 for anything in the TLB, so 101 * 0.9. Then, 201 for anything not in the TLB, since you have to look up the page table in memory as well, so 201 * 0.1. Total: 90.9 + 20.1 = 111.
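The same calculation as a small Python sketch (the formula is the one from the notes above):

```python
def eat(hit_ratio, mem_time, tlb_time):
    hit = tlb_time + mem_time        # TLB lookup + one memory access
    miss = tlb_time + 2 * mem_time   # TLB lookup + page table access + actual access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(eat(0.9, 100, 1))  # 111.0
```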

1.1.14 Reentrant code

What is? Use?

Non-self modifying, or read-only, code. Never changes during its execution. Can be shared among processes.

1.1.15 Why pages

Why do we even need pages in the first place

To allow the physical address space of a process to be noncontiguous (which avoids external fragmentation)

1.1.16 Page table structures

What are the different structures, and how do they work

Hierarchical paging: paging the page table.

Hashed page tables: hash the page number, then handle collisions using a bucket chain.

Inverted page tables: one entry per frame (not per-process), sorted by physical address. Less memory used, but slower, and harder to implement sharing.

1.1.17 Hierarchical paging calculations

Taken from stack overflow (actual source: some exam):

A computer has a 64-bit virtual address space and 2048-byte pages. A page table entry takes 4 bytes. A multi-level page table is used because each table must be contained within a page. How many levels are required?

2048 is $2^{11}$, so the page offset takes 11 bits, and (at 4 bytes per entry) each table can hold $2048/4 = 2^9$ entries. That leaves $64 - 11 = 53$ bits of page number to encode, at 9 bits per level. $53/9$ rounded up is 6, so we need 6 levels.
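
The same reasoning as a Python sketch (the function name and parameters are made up):

```python
import math

def paging_levels(address_bits, page_size, entry_size):
    offset_bits = int(math.log2(page_size))       # 11 bits for 2048-byte pages
    entries_per_table = page_size // entry_size   # each table must fit in one page
    bits_per_level = int(math.log2(entries_per_table))  # 9 bits indexed per level
    return math.ceil((address_bits - offset_bits) / bits_per_level)

print(paging_levels(64, 2048, 4))  # 6
```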

Another calculation question:

32-bit virtual addresses, 4 KB pages, 2 levels of paging. What do the bits in an address mean?

4 KB = $2^{12}$ bytes in each page, so 12 bits are required for the page offset. The remaining 20 bits are split evenly into a 10-bit outer index and a 10-bit inner index; with 4-byte entries, each table then holds $2^{10}$ entries and occupies exactly $2^{10} \cdot 4$ bytes = 4 KB, i.e., one page (this is the scheme x86 uses).

1.1.18 Segment tables

What are they

Tables that map logical addresses (consisting of a segment number and an offset) to physical addresses; each entry holds the base and limit of one segment.
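
A sketch of the lookup the hardware does (the segment table contents are made up):

```python
# one (base, limit) pair per segment -- made-up values
segment_table = [(1400, 1000), (6300, 400), (4300, 1100)]

def translate(segment, offset):
    base, limit = segment_table[segment]
    if offset >= limit:  # offset beyond the segment's limit: trap
        raise MemoryError("trap: addressing error")
    return base + offset

print(translate(2, 53))   # -> 4353
try:
    translate(1, 500)     # offset 500 >= limit 400
except MemoryError as e:
    print(e)
```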

2 Chapter 9: Virtual memory

  • background
    • virtual memory allows for the separation of logical (user) memory from physical memory
      • the logical view consists of: a stack that grows downwards, a heap that grows upwards, a data section, and a code section (with shared libraries in the middle between the heap and the stack)
    • for programs, this means that not everything needs to be in memory for execution to begin
    • also, the logical address space can be much larger than physical address space
    • and, address spaces can be shared by different processes
    • process creation is more efficient
    • can be implemented via demand paging, and demand segmentation
  • demand paging
    • a page is brought into memory only when it is needed
    • less unnecessary I/O and less memory needed
    • a page is determined to be "needed" when there is a reference to it
      • if a reference is invalid, abort
      • if a reference is to a page that is not in memory, it will be brought into memory
    • to execute a process, we swap the pages for it into memory (lazily)
    • pager: swapper that deals with pages (does this really need its own term)
    • valid/invalid bits for the page table
      • like those introduced in the memory protection section, but serving another purpose as well
      • valid: legal and in memory
      • invalid: either illegal or not in memory
      • all entries initially set to invalid
      • if we ever need to look up an invalid entry, we get a page fault, so the OS needs to:
        1. look at another table (one stored within the PCB) to decide if it was an invalid memory reference or just a swapped-to-disk page
        2. find an empty frame from the free frame list
        3. read the page from disk and swap it into the frame
        4. modify the table in the PCB
        5. update the validation bit and the entry in the page table
        6. restart the instruction that caused the page fault
    • effective access time:
      • let $p$ be the page fault rate
      • let $t$ be the time it takes to handle a page fault
      • let $m$ be the time it takes to access memory (to read a page, say)
      • EAT: $(1-p) \cdot m + p \cdot t$
    • the time it takes to handle a page fault is determined by:
      • time to service the interrupt
        • save user registers, process state
        • check if the interrupt is a page fault
        • check if the page reference is legal
        • if so, find the location of the page on disk
      • time to read in the page to a free frame (usually slowest)
        • influenced by disk latency, seek time, queueing time on the device
      • then, time to restart the instruction that was interrupted (restore user registers and process state, resume execution)
    • example for calculating EAT:
      • m = 200 nanoseconds = 0.0002 milliseconds
      • t = 8 milliseconds
      • p = 0.001
      • EAT = $(1-0.001) \cdot 0.0002 + 0.001 \cdot 8 = 0.0001998 + 0.008 \approx 0.0082$ milliseconds so 8.2 microseconds or 8200 nanoseconds
      • this shows that even if page faults are relatively rare, they can still introduce huge slowdowns (40x slower in this case)
      • obviously, we want to keep the page-fault rate low
      • if we want to keep the slowdown below, say, 5%, then we need the EAT to be less than 210 nanoseconds
      • the calculation's not hard, just make $p$ the variable: $(1-p) \cdot 200 + p \cdot 8{,}000{,}000 < 210$ gives $p < 1.25 \times 10^{-6}$, i.e., less than one fault per 800,000 memory accesses
  • copy-on-write
    • a benefit of virtual memory realised during process creation
    • normally, the fork() system call duplicates the parent's pages for the child
    • however, many ungrateful children invoke exec() immediately after birth, making this copying unnecessary
    • copy-on-write allows the parent and child to initially share the parent's pages in memory
    • only if either process modifies a shared page is that page copied
    • results in more efficient process creation (so that fewer pages need to be allocated to child processes)
  • page replacement
    • what happens when we need to swap a new page into memory but there are no free frames?
    • in that case, we need some sort of page replacement policy to get rid of pages we don't need
    • such a policy should minimise the number of page faults (obviously)
    • basic algorithm:
      • find location of desired page on disk
      • find a free frame, if one exists
      • if no free frames exist, find a victim frame, write it to disk (if dirty), and change the page and frame tables accordingly
      • bring the desired page into the newly freed frame, update tables
    • makes use of a dirty bit in the page table so that only modified pages are written back to disk when they are evicted from memory
    • evaluating different page replacement policies
      • FIFO: remove the least-recently-added pages first
      • Belady's anomaly: for some page replacement policies, more frames can result in more page faults (which seems paradoxical)
      • OPT/MIN: theoretical algorithm (lol) that has the lowest page-fault rate and will never suffer from Belady's anomaly
        • you evict the page that will not be used for the longest period of time
        • obviously this is impossible to implement in practice unless you know the future
        • however, it's useful as a comparison point for other page replacement policies ("how does this algorithm behave compared to OPT")
      • LRU: use recent history to simulate OPT
        • evict the page that was least recently used
        • implementation: every page entry has a "counter" (really a timestamp for the last use)
        • when we need to evict one, we just replace the page with the smallest timestamp
        • this has the added overhead of needing to search the table to find the LRU page each time, and we also need to write the timestamp to memory during every memory access
        • still, it's pretty efficient as it tends to reduce the number of page faults compared to most other algorithms
  • allocation of frames
    • how do we allocate the fixed amount of free memory among the various processes running on a system?
    • two methods: fixed, priority
    • fixed allocation - doesn't take priorities into account; 2 methods
      • equal allocation:
        • all animals are equal
        • each process gets the same number of frames
      • proportional allocation:
        • some are more equal than others
        • frames allocated according to the size (virtual memory) of a process, which is known from its image when it is loaded
        • if $s_i$ is the size of process $i$, $S = \sum s_i$, and there are $m$ frames, process $i$ gets $a_i = (s_i / S) \cdot m$ frames
        • for example, if there are 2 processes, one of size 5, and one of size 10, and there are 30 frames, the first would get 10 frames, the second would get 20
    • priority allocation - takes priorities into account
      • high-priority processes are given more memory to speed their execution
      • uses a proportional allocation scheme using priority rather than size (or possibly both priority and size)
  • thrashing
    • if a process has too few frames, its page fault rate will be very high
    • result: low CPU usage, so the OS thinks it needs to increase the degree of multiprogramming, so it adds more processes, then it all goes to hell
    • then the OS will spend more time paging than executing processes (i.e., thrashing)
    • to prevent this, we must provide a process with as many frames as it needs
    • how do we know this magic number?
    • working-set strategy: looks at how many frames a process is actually using
      • purpose: to prevent thrashing while keeping the degree of multiprogramming as high as possible
      • makes use of the locality model of process execution
        • locality: set of pages commonly used together
        • for instance, when a function is called, a new locality is defined
        • thrashing occurs when the size of the locality is greater than the total memory size
      • in the working-set model, we have a parameter $\Delta$ (working-set window) which is a fixed number of page references
      • working-set size of a process: number of pages referenced in the most recent $\Delta$
      • need to pick $\Delta$ well: too small and it won't cover an entire locality, too large and it will span several localities
      • the total number of frames we need is the sum of all the working set sizes (for all the processes)
      • if this number > total number of available frames, then we'll have thrashing
      • so, we need to suspend a process
      • if we have extra frames, then we can start a new process
      • for example, if the last few pages referenced were: 12323434343334
        • then, if $\Delta = 10$, we look at the last 10 page references --> pages 3 and 4, so the working-set size is 2
      • the idea: the working set approximates the process' current locality, and it's recomputed as the window slides forward

2.1 Sample questions

2.1.1 Demand paging

What is this? What do valid and invalid bits mean for the page table in this scheme?

A page is only brought into memory when there is a reference to it (lazy loading). Valid = page is in memory. Invalid = page is illegal or not in memory.

2.1.2 Page faults

What are page faults, and what happens when they are encountered?

When we hit a reference to a page marked invalid in the page table (so either illegal or just not in memory). The OS needs to look at another table in the PCB to check if it's illegal, and, if not, finds an empty frame using the free frame list, reads the page from disk into the frame, updates all the tables, then restarts the instruction.

2.1.3 Page fault EAT calculations

Accessing memory takes 0.0005 (ignore the unit). The page fault rate is 0.01, and it takes 2.0 to handle a page fault. What's the EAT?

$(0.0005 \cdot 0.99) + (2.0 \cdot 0.01) = 0.020495$, which is a ~40x slowdown

Same numbers as above, but we want to find the highest page fault rate that will result in a 10x slowdown

$0.0005 \cdot (1-p) + 2.0p = 0.0005 + 1.9995p = 0.005$, so $1.9995p = 0.0045$ and $p \approx 0.0023$
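
Both calculations, sanity-checked in Python (same made-up units as the question):

```python
m, t = 0.0005, 2.0             # memory access time, page-fault service time

def eat(p):
    return (1 - p) * m + p * t

print(eat(0.01) / m)           # ~41: the ~40x slowdown from the first part

# highest p giving at most a 10x slowdown: solve (1-p)*m + p*t = 10*m for p
p_max = (10 * m - m) / (t - m)
print(p_max)                   # ~0.00225
```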

2.1.4 Copy-on-write

What is it? Benefits?

During process creation (with fork), instead of copying all the parent pages over for the child, let the child share the parent's pages and only copy when one process writes to a page. More efficient process creation and less memory wasted.

2.1.5 Page replacement policies

List them

FIFO: replaces the first one in

OPT: theoretical, replaces the page whose next use is farthest in the future

LRU: replaces the least recently used one (smallest timestamp). overhead: searching the entire table, and updating the page table whenever you need to access a page
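
A sketch that simulates all three policies and counts faults (`count_faults` is a made-up helper, and the reference string is just a common textbook example):

```python
def count_faults(refs, frames, policy):
    """Simulate a page replacement policy; return the number of page faults."""
    memory, faults = [], 0
    for i, page in enumerate(refs):
        if page in memory:
            if policy == "LRU":     # on a hit, refresh recency
                memory.remove(page)
                memory.append(page)
            continue
        faults += 1
        if len(memory) == frames:   # no free frame: pick a victim
            if policy == "OPT":     # evict the page used farthest in the future
                next_use = {p: refs.index(p, i) if p in refs[i:] else float("inf")
                            for p in memory}
                memory.remove(max(next_use, key=next_use.get))
            else:                   # FIFO evicts oldest; for LRU the head is the LRU page
                memory.pop(0)
        memory.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for policy in ("FIFO", "LRU", "OPT"):
    print(policy, count_faults(refs, 3, policy))  # FIFO 15, LRU 12, OPT 9
```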

2.1.6 Belady's anomaly

What is Belady's anomaly? Give a page replacement algorithm that does not suffer from it.

When increasing the number of frames increases the number of page faults under a given page replacement policy. OPT never suffers from it; neither does LRU (stack algorithms in general are immune).
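
Using the `count_faults` sketch from the previous answer, FIFO's anomaly can be demonstrated with the classic reference string:

```python
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_faults(refs, 3, "FIFO"))  # 9 faults with 3 frames
print(count_faults(refs, 4, "FIFO"))  # 10 faults with 4 frames (!)
```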

2.1.7 Frame allocation

Methods of allocating frames?

Fixed allocation, equal.

Fixed allocation, proportional to size of memory used by a process.

Priority allocation, based on priority (and possibly size).
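
A sketch of the proportional scheme, $a_i = (s_i / S) \cdot m$ (the function name is made up):

```python
def proportional_allocation(sizes, total_frames):
    S = sum(sizes)
    return [s * total_frames // S for s in sizes]

# the example from the notes: sizes 5 and 10, 30 frames
print(proportional_allocation([5, 10], 30))  # [10, 20]
```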

2.1.8 Thrashing

What is thrashing

OS is constantly paging rather than doing useful work. Page fault rate is high, OS thinks it needs to add more processes to improve CPU utilisation, etc.

2.1.9 Working-set model

How does this work

Looks at the number of frames a process is actually using, to try to prevent thrashing while keeping as many processes running as possible. The working set of a process is the set of pages referenced in the most recent $\Delta$ references (an approximation of its current locality). Make sure the sum of working-set sizes for all processes is smaller than the total number of available frames; if it isn't, suspend a process.
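
A sketch of the working-set computation, using the reference string from the notes above with $\Delta = 10$:

```python
def working_set(refs, delta):
    # pages referenced in the most recent `delta` references
    return set(refs[-delta:])

refs = [1, 2, 3, 2, 3, 4, 3, 4, 3, 4, 3, 3, 3, 4]
ws = working_set(refs, 10)
print(ws, len(ws))  # {3, 4} 2 -> working-set size is 2
```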