Answers

  1. If a single-cycle processor design has a clock frequency of F, what would be the optimal speedup if the design is converted to a pipelined design with n stages?

    The optimal speedup would be n-times faster.

  2. Define instruction latency and instruction throughput. Read the next two questions before answering this one.

    Latency is the time that elapses from the moment an instruction is selected for execution (by putting its address in the PC) to the time it is “retired” (finishes execution). Throughput is the rate at which instructions are retired: instructions per second.

  3. What is the relationship between latency and throughput for a single-cycle processor design?

    They are the same.

  4. What is the relationship between latency and throughput for a pipelined processor?

    Latency is n times the clock period; throughput approaches one instruction per clock period.

  5. What is a perfectly balanced pipeline?

    One in which each stage has exactly the same maximum propagation delay.

  6. A ten-stage pipeline has the following maximum propagation delays in each stage (in psec): 50, 55, 45, 57, 49, 54, 48, 47, 2, and 2000. What would be the maximum clock speed at which this processor could operate?

    The slowest stage takes 2 nsec, so the maximum clock frequency is 0.5 GHz (500 MHz).

  7. For the previous question: what would be the instruction latency of the processor? Be sure to indicate both the numerical value and the proper unit of measure.

    Since there are ten stages, and the clock period has to be 2 nsec, the latency would be 20 nsec.

  8. For the above pipeline, what would be the instruction throughput of the processor, assuming there are no hazards?

    500 MIPS (millions of instructions per second)

  9. If it was your job to improve the design of the above pipeline design, what would you try to do?

    Balance the pipeline: reduce the propagation delays in the last (slowest) stage. The last stage might be divided into a number of smaller stages, but that would lead to a deep pipeline, which makes control hazards hard to manage. Maybe there would be ways to move some of the last stage’s logic into some of the earlier stages. Even if that increased the propagation delays for those stages, it would be better for throughput because slowest stage would be come faster, and the clock rate could therefore be increased.

  10. Define structural, control, and data hazards. Read the next question before answering this one.

    Structural: an operation cannot be performed at a certain time because the circuitry needed to implement it is needed for another operation at the same time. Control: conditional branching causes the decision about the next instruction to enter the pipeline to be delayed until the branch instruction processes through enough pipeline stages to make the decision known. Data: an instruction requires, for one or more of its operands, on the result of a previous instruction that has not yet been retired.

  11. Give an example of each of the three types of hazards listed above.

    Structural: The need to computer PC+4 at the same time a previous instruction is using the ALU during the EXE stage. The resolution is to have a separate adder for computing PC+4. Control: Any time a beq or bne instruction occurs. Data: Any time an instruction needs the result of a previous instruction, such as a lw or any R-type instruction, before that result is actually written into the register file.

  12. For each of the following terms, tell which of the three typess of hazard it is designed to deal with, and tell briefly how it does so.
    • delayed branch

      Control: Execute the instruction immediately following the branch instruction unconditionally, which can give the branch instruction enough time to decide whether the branch will be taken or not.

    • delayed load

      Data: The formal specification of the Instruction Set Architecture (ISA) states that an instruction that accesses a register immediately following a lw instruction that loads a value into that register will get the old value of the register, not the value generated by the lw.

    • register forwarding

      Data: The value needed is taken from a pipeline register rather than waiting for the result to be written to the register file.

    • branch prediction

      Control: The CPU keeps track of whether the branch instruction was taken or not the previous time it was executed, and starts to fetch and execute (but not to retire) instructions on the assumption that the same branch decision will be made this time. If the prediction turns out to be wrong, the partially-processed instructins must not affect the state of the registers or memory.

  13. For each of the four pipeline registers in Figure 4.51 on page 362, note that there is a certain number of arrows going into each one: 2 into IF/ID; 9 into ID/Ex; 7 into EX/MEM; and 4 into MEM/WB. For each register:
    • Tell the name of the register
    • Tell how many bits each arrow going into the register represents, in order from top to bottom.
    • Tell what information is represented by each arrow going into the register.

    IF/ID:
    PC + 4: 32 bits;
    An instruction to be executed: 32 bits;
    Total: 64 bits.

    ID/EX:
    WB: the RegWrite and MemToReg control signals (2 bits);
    M: The MemWrite and MemRead control signals (2 bits);
    The branch control signal: 1 bit;
    EX: The ALUSrc and ALUOp control signals (3 bits because ALUOp is two bits);
    PC+4: 32 bits;
    R[rs]: 32 bits;
    R[rt]: 32 bits;
    Sign extended immediate (includes function code bits): 32 bits;
    The rt and rd instruction fields: 10 bits.
    Total: 146 bits.

    EX/MEM:
    WB: the RegWrite and MemToReg control signals (2 bits);
    M: The MemWrite and MemRead control signals (2 bits);
    The branch control signal: 1 bit;
    The Branch Target Address (BTA): 32 bits;
    The Zero (Z) condition code bit: 1 bit;
    Ther ALU Result: 32 bits;
    R[rt] 32 bits;
    The write register number: 5 bits;
    Total: 107 bits.

    Mem/WB:
    WB: the RegWrite and MemToReg control signals (2 bits);
    Data from Memory: 32 bits;
    ALU Result: 32 bits;
    The write register number: 5 bits;
    Total: 71 bits.