# State Machine Implementation of the ARC CPU

## Introduction

The textbook presents two approaches to the design of the control unit (CU) of the ARC CPU, "microprogramming," and "hardwired control." The textbook's approach to developing a hardwired CU is based on use of a hardware description language (HDL) called VHDL. This document shows an alternative approach based on FSM design covered earlier in the course.

# Background

In the Moore model of a FSM, the outputs of the state machine are totally determined by the current state of the machine. This design sometimes requires more states to accomplish its task than a Mealy machine in which the present state and the present input values combine to determine the output values, but Moore machines are somewhat simpler to work with conceptually. In a Moore a circle containing the name of the state and the output values during that state, separated by a horizontal line, represents machine a state:



For the ARC datapath, there are 24 bits that the CU must generate values for on each clock pulse: six bits that go into the A-Bus decoder, six for the B-Bus decoder, six for the C-Bus decoder, four for the ALU function code, and two, Read (RD) and Write (WR), to control the memory. Rather than list all 22 bits and their values below the horizontal line in each state, we can use RTL statements to indicate how these 24 bits are to be set. We will use two forms of RTL statement, one for states where a memory operation takes place, and another for states when ALU operations are performed. To keep things as simple as possible, we will assume that all states fall neatly into exactly one of these two categories.

## **RTL for Memory Operations**

A *read* operation takes the form Data  $\leftarrow$  Memory[Address] and a *write* operation takes the form Memory[Address]  $\leftarrow$  Data. In both cases, the Address is the contents of a register connected to the A Bus. For a read operation the Data is loaded into a

register from the C Bus, whereas for a write operation the Data comes from a register connected to the B Bus.

For memory read operations, the B Bus is ignored by the memory and may contain arbitrary data. We will connect Register number zero (%r0), the pseudo-register that can't be changed and always provides 32-bits of zeros, for all read operations. The output of the ALU is discarded for read operations because the C Bus gets its input from memory. So the ALU function code doesn't matter, except that the condition code bits must not be modified when reading from memory. We will use the ALU's ADD function  $(0101_2)$  for the function code during all read (and write) operations.

For memory write operations, the output of the ALU could in fact be loaded into one of the registers, but we will always load it into %r0, thus discarding it.

Here are some examples of RTL statements for memory operations, and how the 24 bits output by the CU would be set for each one:

| RTL                                           | A Bus  | B Bus                  | C Bus  | ALU  | RD | WR |
|-----------------------------------------------|--------|------------------------|--------|------|----|----|
| R[ir] ← Memory[ R[pc] ]                       | 100000 | 000000                 | 100101 | 0101 | 1  | 0  |
| Memory[ R[temp0] ] ← R[ ir <sub>25:29</sub> ] | 100001 | 0, ir <sub>25:29</sub> | 000000 | 0101 | 0  | 1  |

In this table, the A Bus, B Bus, and C Bus columns give the six bit number the CU sends to the corresponding decoders. The busses actually are connected to the 32-bit contents of the correspondingly numbered registers. The subscript "25:29" means bit numbers 25 through 29, which is the RD field of the instruction register.

# **RTL for ALU Operations**

We will use the following notation for ALU operations:

C-Bus ← ALU\_op( A-Bus, B-Bus )

Here, ALU\_op will be the name of one of the sixteen ALU operations listed in Figure 6-4 on page 194. For those operations that ignore the B-Bus, we will set the B Bus bits to all zeros. Here are some examples:

| RTL                                      | A Bus                 | B Bus                | C Bus                 | ALU  | RD | WR |
|------------------------------------------|-----------------------|----------------------|-----------------------|------|----|----|
| R[temp0] ← Sext13( R[ir] )               | 100101                | 000000               | 100001                | 1100 | 0  | 0  |
| $R[rd] \leftarrow Addcc(R[rs1], R[rs2])$ | 0,ir <sub>14:18</sub> | 0, ir <sub>0:4</sub> | 0,ir <sub>25:29</sub> | 0011 | 0  | 0  |

## Decoders

We will use three decoders connected to the bits of the Instruction Register to help with the design of the CU FSM.

- The *Instruction Format Decoder* decodes IR<sub>30:31</sub> to generate signals named *set-br\_op* (00), *call\_op* (01), *alu\_op* (10), and *mem\_op* (11).
- The *Op2 Decoder* decodes  $IR_{22:24}$  to produce 8 signals. Two of these eight outputs are named *branch* (010), and *sethi* (100).

• The *Op3 Decoder* decodes IR<sub>19:24</sub> to produce 64 signals. Some of these are named *ld* (000000), *st* (000100), *addcc* (010000), etc. See Figure 6-2 on page 192 for the complete list of Op2 and Op3 outputs.

There would be a fourth decoder for IR<sub>25:28</sub> used for branch instructions.

The hardwired control unit presented in the textbook uses "one hot" encoding for the state machine in which there is a separate state flip-flop for every state, with exactly one flip-flop true at all times. We can accomplish the same thing by decoding all the state flip-flops to produce a number of signals that are all false except for one corresponding to the current state. In either event, we will have wires named *State\_0*, *State\_1*, etc. for all the possible states of the CU.

#### State Table

It would be impossible to develop a full state table for the CU FSM because with 40 bits of input from the datapath (32 bits from the Instruction Register plus four bits from the Condition Code) there would be  $2^{40}$  rows. But by using the decoders listed above as shorthand notation for the inputs to the CU and RTL statements as shorthand for the outputs, we can show the some of the State Table as follows:

| External Inputs<br>(40 bits from<br>datapath) | Present<br>State                  | Next State<br>(also, flip-flop<br>inputs) | Outputs<br>(22 bits to datapath plus 2 bits to<br>memory)                                                      |
|-----------------------------------------------|-----------------------------------|-------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| Mem_op                                        | State_0<br>(Instruction<br>Fetch) | State_1                                   | $R[ir] \leftarrow Memory[R[pc]]$                                                                               |
| IR <sub>13</sub>                              | State_1                           | State_2                                   | $R[temp0] \leftarrow Sext13(R[ir])$                                                                            |
| ~IR <sub>13</sub>                             | State_1                           | State_3                                   | $R[temp0] \leftarrow Sext13(R[ir])$                                                                            |
|                                               | State_2                           | State_4                                   | $\begin{array}{l} R[temp[0] \leftarrow \\ Add(R[ir_{14:18}],R[temp0]) \end{array}$                             |
|                                               | State_3                           | State_2                                   | $R[temp0] \leftarrow Add(R[ir_{0:4}], R[r0])$                                                                  |
| ld                                            | State_4<br>(Compute EA)           | State_5                                   | $\begin{array}{c} \text{R[temp0]} \leftarrow \\ \text{Add}(\text{R[ir}_{14:18}], \text{R[temp0]}) \end{array}$ |
| st                                            | State_4                           | State_6                                   | $\begin{array}{l} \text{R[temp0]} \leftarrow \\ \text{Add}(\text{R[ir}_{14:18}], \text{R[temp0]}) \end{array}$ |
|                                               | State_5<br>(Execute <i>ld</i> )   | State_99                                  | $R[ir_{25:29}] \leftarrow Memory[R[temp0]]$                                                                    |
|                                               | State_6<br>(Execute <i>st</i> )   | State_99                                  | $Memory[R[temp0]] \leftarrow R[ir_{25:29}]$                                                                    |
|                                               | State_99                          | State_0                                   | $R[pc] \leftarrow Incpc(R[pc])$                                                                                |

#### **State Diagram**

Finally, here is a State Diagram for the above portion of the State Table, with some indications of how this part of the diagram would relate to other parts, not shown here. To make the diagram more manageable, informal and simplified versions of the RTL statements are used to indicate the outputs.

