CS-343 Assignment 8 - Vickery Solutions

1.  Cost per bit: increases for levels closer to the processor.
    Access time:  decreases for levels closer to the processor.
    Capacity:     decreases for levels closer to the processor.

    Instead of "access time", you could use "speed," which increases for
    levels closer to the processor.

2.  Cost. (There was a typo on the assignment sheet: the first word of
    the question was supposed to be "Which," not "Expand.")

3.  True.

4.  If you use write-back, the processor can update the information in
    cache without updating the information in main memory. This is a
    major issue in multi-core computing where different cores have
    separate caches.

5.  There are two kinds of locality: temporal and spatial. Temporal
    refers to the fact that a progam is likely to re-visit regions of
    memory frequently within a short time period, for example by
    repeatedly executing the instructions in a loop. Spatial refers to
    the fact that a program that accesses one location in memory is also
    likely to access nearby locations with a higher chance than other,
    more distant locations. Examples include instruction execution,
    which most commonly reads the next instruction from the next address
    after the current one.  Also, structured data accesses, such as
    summing the elements of a vector, which are stored in successive
    memory locations.

    The way locality relates to cache performance is that it raises the
    hit ratio from what one would expect by chance.  If every- thing
    were random, and cache contained 0.1% of main memory, the hit ratio
    would be only 0.001. But because of both kinds of locality, a miss
    is typically followed by many more hits because neighboring memory
    locations are brought into cache as the same time as the one needed
    to satisfy the miss.

M-1.  Instead of just one cache between main memory and the processor,
      there are two or three caches. The level nearest the processor is
      called L1, and is typically on the same chip as the processor. In
      multicore systems, L2 might be shared among processors, or each
      processor might have a separate L2. L1 is typically a split cache,
      with separate caches for instructions and data. L1 and L2 (and L3
      if present) adhere to the hierarchy of Question 1 above.

M-2.  Cost. If cost were no object, everything would be L1.

M-3.  Hit rate: the proportion of memory accesses that are satisfied
      using data already in cache.
      Hit time: the number of (processor) clock cycles needed to read a
      value from cache.
      Miss penalty: the number of additional clock cycles needed to get
      a value from main memory into cache when there is a miss.

M-4.  A cache line holds exactly one block of main memory.

M-5.  There is a V bit in cache for each cache line. The V bit indicates
      whether the cache line is valid or not. It's zero (not valid) if
      the data in the cache line is not actually a block of main memory.
      (For example, when there is a context switch, the processor stops
      executing one program and starts executing another one. When this
      happens, the cache is "flushed" to clear the first program's
      information out of the cache so the second program accesses only
      its own memory blocks. This is done be setting all the V bits of
      the cache to 0, not by somehow actually removing information from
      the cache data lines.)

M-6.  The tag field contains the information needed to tell which block
      main memory actually occupies the cache line's data field.

M-7.  The data field of a cache line is the copy of a block of
      information from main memory. You could have gotten picky in
      question M-4 and said that a cache line is bigger than a main
      memory block because the cache line has the V and tag bits in
      addtion to the main memory block. The answer to M-4 given above
      is correct, while side stepping this pickiness.

M-8.  The number of cache lines per set.

M-9.  Direct mapped.

M-10. Fully associative.

M-11. With more options where to place a memory block, it should be
      possible to reduce the miss rate due to a block replacing a
      block that is about to be referenced again.

M-12. The cost and possibly additional propagation delays due to the
      need to compare multiple tags simultaneously.

M-13. With larger lines, more adjacent parts of memory are available at
      once, which the principle of locality says is a Good Thing.

M-14. For a given amount of cache memory (determined by the cost the
      computer maker is willing to spend on cache), larger block sizes
      means there will be fewer cache lines. The less locality there is,
      the bigger a problem this will be.

M-15. Write through: When the processor writes to memory, the
      information is written to cache and immediately to main memory as
      well. This assures cache consistency, but at the expense of
      possibly extra memory write cycles.
      Write back: When the processor writes to memory, the information
      is written to cache, but the cache block is not actually written
      back to main memory until the block is subsequently invalidated
      (replaced or flushed).
      Note that both techniques, as described, imply first reading the
      block into cache, if is not already there, on a write.

M-16. Yes I do. Do you?

A-1.  m bits wide
A-2.  The tag field is whatever is left over after you take out the
      index field (see A-3), the word offset, and the byte offset.
      Byte offset:  w
      Word offset:  b
      Index:        n-a
      Tag:          m - (n-a + b + w)
      [I should have specified the block size as 2^w words and word size
      as 2^b bytes per word so the Byte offest would be b and the Word
      offset would be w. Read the exam questions carefully!]

A-3.  2^n/2^a = 2^(n-a). Thus the index field is n-a bits

A-4.  There are four values to compute, where p(x) means "the
      probablility of x," a number between 0.0 and 1.0. For the two cache
      levels, p(miss) = 1.0 - p(hit). MM means main memory.
        p(L1 hit) * (L1 access time) 
        p(L1 miss) * (L1 miss penalty + L2:MM access time)
        p(L2 hit) * (L2 access time)
        p(L2 miss) * (MM access time)

      Average L2:MM access time is the sum of the bottom two expressions:
        0.85 * 10 + 0.15 * 100
        = 8.5 + 15
        = 23.5 + 5
				= 28.5 clock cycles

      And the overall average access time is the sum of the top two
      expressions:
        0.90 * 1 + 0.1 * 28.5
        = 0.90 + 2.85
        = 3.75 clock cycles

Bonus:  What is the average access time if the clock speed is 2.5 GHz?
        Answer: 1/(2.5 * 10^9) = 0.4 * 10^-9 = 400 psec
                400 * 3.75 = 1500 psec = 1.5 nsec