## **CE1-R3: ADVANCED COMPUTER ARCHITECTURE**

## NOTE:

1. Answer question 1 and any FOUR questions from 2 to 7.

2. Parts of the same question should be answered together and in the same sequence.

Time: 3 Hours

Total Marks: 100

1.

- a) What are the problem areas which cause a poorly designed instruction set to stall frequently in a pipelined processor?
- b) What is Amdahl's law for vectorization?
- c) Why are shared memory machines and distributed memory machines suited for fine grained and coarse grained problems respectively in case of parallel computing?
- d) What do you understand by spatial & temporal locality? What is clustering? How does locality of reference help in clusters?
- e) What is VLIW and how is it different from RISC or CISC?
- f) What do you understand by control flow & data flow scheduling?
- g) Compare circuit switching, store & forward technique and wormhole routing technique.

(7x4)

2. A four segment pipeline implements a function and has the following delays for each segment (b=0.2):

| Segment # | Maximum Delay * |
|-----------|-----------------|
| 1         | 17 ns           |
| 2         | 15 ns           |
| 3         | 19 ns           |
| 4         | 14 ns           |

<sup>\*</sup> excludes fixed clock overhead C of 2 ns, skew factor k=0.

- a) What is the cycle time  $\Delta t$  that maximizes performance without allocating multiple cycles to a segment?
- b) What is the total cycle time T to execute the function (through all stages)?
- c) What is the cycle time that maximizes performance if each segment can be partitioned into sub-segments? Compare its cycle time and throughput with the worst-case delay and minimum-delay cases. As a first approximation, use

$$S_{opt} = \sqrt{\frac{(1-b)(1+k)T}{bC}}$$

and throughput $G$ (in MIPS) $= \frac{1}{1+(S-1)b} \cdot \frac{1}{(1+k)(T/S)+C}$

(3+3+12)
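The two formulas given in question 2(c) can be evaluated directly. The sketch below is only an illustration of how they are applied, not a model answer. It assumes that T is the sum of the four segment delays (65 ns, excluding clock overhead); the function names `optimal_segments` and `throughput` are hypothetical.

```python
import math

def optimal_segments(T, b, C, k=0.0):
    """S_opt = sqrt((1-b)(1+k)T / (bC)), as given in the question."""
    return math.sqrt((1 - b) * (1 + k) * T / (b * C))

def throughput(S, T, b, C, k=0.0):
    """G = 1/(1+(S-1)b) * 1/((1+k)(T/S)+C).

    With T and C in nanoseconds the result is per nanosecond;
    multiply by 1e3 to express it in millions per second.
    """
    return (1 / (1 + (S - 1) * b)) * (1 / ((1 + k) * (T / S) + C))

# Assumption: T is the sum of the segment delays from the table.
T = 17 + 15 + 19 + 14   # 65 ns
b, C, k = 0.2, 2.0, 0.0

s_opt = optimal_segments(T, b, C, k)
g_four = throughput(4, T, b, C, k)       # the original four segments
g_opt = throughput(s_opt, T, b, C, k)    # at the (non-integer) optimum
```

Under these assumptions `s_opt` comes out at about 11.4, so in practice it would be rounded to a whole number of segments before comparing throughput.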

3. Assume a delay of four cycles for an address interlock and a delay of two cycles on an execution interlock. For the following sequence of code, identify all dependencies and compute the total delay.

| Instruction | Operands   |
|-------------|------------|
| ADD.W       | R7, R7, 4  |
| LD.W        | R1, 0(R7)  |
| MUL.W       | R2, R1, R1 |
| ADD.W       | R3, R3, R2 |
| LD.W        | R4, 2(R7)  |
| SUB.W       | R5, R2, R3 |
| ADD.W       | R5, R5, R4 |

The frequency with which delays occur due to address and execution interlocks is given by the following table:

| Address Interlock | Execution Interlock | Distance             |
|-------------------|---------------------|----------------------|
| 0.099             | 0.403               | Previous Instruction |
| 0.084             | 0.147               | 2 Instructions back  |
| 0.028             | 0.036               | 3 Instructions back  |
| 0.022             | 0.049               | 4 Instructions back  |
| 0.002             | 0.067               | 5 Instructions back  |

(18)

4.

- a) Discuss branch handling strategies for a pipelined processor under hierarchical memory system.
- b) What is a stride? How does it affect the design of vector memory? Suppose we have 8 memory banks with a bank busy time of 6 clocks and a total memory latency of 12 cycles, how long will it take to complete a 64 element vector load with a stride of 1?

  (10+8)
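The timing in question 4(b) can be explored with a small bank-conflict simulation. This is a minimal sketch under stated assumptions, not a definitive memory model: banks are word-interleaved (element i maps to bank `(i * stride) % n_banks`), at most one access is issued per clock, cycles are numbered from 1, and an element arrives `latency` cycles after its access is issued. The function name `vector_load_cycles` is hypothetical.

```python
def vector_load_cycles(n_elements, stride, n_banks=8, busy=6, latency=12):
    """Return the cycle in which the last element of the load arrives."""
    # Earliest cycle at which each bank can accept a new access.
    bank_free_at = [1] * n_banks
    cycle = 0
    for i in range(n_elements):
        bank = (i * stride) % n_banks
        # Issue on the next clock, or later if the bank is still busy.
        cycle = max(cycle + 1, bank_free_at[bank])
        bank_free_at[bank] = cycle + busy
    return cycle + latency

stride_1 = vector_load_cycles(64, 1)
# A stride that sends every access to the same bank, for contrast.
stride_32 = vector_load_cycles(64, 32)
```

With stride 1 and more banks than the bank busy time, no access ever waits on a busy bank, so the model reduces to the latency plus one cycle per element.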

5.

- a) What is the cache coherence problem? What is a snooping cache? Discuss, with examples, the Write Through and Write Once protocols for cache consistency.
- b) Suppose we have the following parameters for a 4 KB L1 cache and a 64 KB L2 cache.

The cache miss rates are:

- 4 KB: 0.10 misses per reference
- 64 KB: 0.02 misses per reference

The remaining parameters are:

- 1 reference per instruction
- 3 cycles for an L1 miss, L2 hit
- 10 cycles total time for an L1 miss, L2 miss

What is the excess CPI due to cache misses?

(10+8)
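One way to set up the arithmetic for question 5(b) is sketched below. It rests on two assumptions that the question leaves open: both miss rates are taken per processor reference (so references that miss L1 but hit L2 occur at a rate of 0.10 - 0.02), and the 10 cycles is the total penalty when both caches miss rather than an addition to the 3-cycle penalty. Under a different reading the result changes. The function name `excess_cpi` is hypothetical.

```python
def excess_cpi(refs_per_instr, l1_miss, l2_miss, l2_hit_cycles, both_miss_cycles):
    """Extra cycles per instruction caused by cache misses.

    Assumes l1_miss and l2_miss are both misses per processor reference,
    and both_miss_cycles is the total penalty when L1 and L2 both miss.
    """
    l1_miss_l2_hit = l1_miss - l2_miss   # references served by L2
    per_reference = l1_miss_l2_hit * l2_hit_cycles + l2_miss * both_miss_cycles
    return refs_per_instr * per_reference

# Parameters from the question, under the assumptions stated above.
cpi = excess_cpi(1, 0.10, 0.02, 3, 10)   # about 0.44
```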

6.

- a) Write short notes on dual bus design for shared memory multiprocessors with special emphasis on clustered architecture.
- b) What do you understand by essential dependency, ordering dependency and output dependency in out-of-order and multiple-instruction execution? For concurrent code execution, what types of dependencies arise in the following code sequence?

```
DIV.F R1, R2, R3
MPY.F R1, R4, R5
ADD.F R4, R5, R6
ADD.F R5, R4, R7
ST.F Alpha, R5
```

(8+10)
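The three dependency types in question 6(b) can be checked mechanically once each instruction is reduced to a destination and a list of sources. The sketch below assumes the common mapping essential = read after write, ordering = write after read, and output = write after write. It also assumes the first operand is the destination, and that `ST.F Alpha, R5` reads R5 and writes only memory. It compares every pair of instructions and does not filter out dependencies hidden by an intervening write. The function name `classify_dependencies` is hypothetical.

```python
def classify_dependencies(program):
    """program: list of (dest_register_or_None, [source_registers]) in order.

    Returns tuples (earlier_index, later_index, kind, register).
    """
    found = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            if dest_i is not None and dest_i in srcs_j:
                found.append((i, j, "essential", dest_i))  # read after write
            if dest_j is not None and dest_j in srcs_i:
                found.append((i, j, "ordering", dest_j))   # write after read
            if dest_j is not None and dest_j == dest_i:
                found.append((i, j, "output", dest_j))     # write after write
    return found

# Assumed encoding of the sequence: destination first, then sources.
sequence = [
    ("R1", ["R2", "R3"]),   # DIV.F R1, R2, R3
    ("R1", ["R4", "R5"]),   # MPY.F R1, R4, R5
    ("R4", ["R5", "R6"]),   # ADD.F R4, R5, R6
    ("R5", ["R4", "R7"]),   # ADD.F R5, R4, R7
    (None, ["R5"]),         # ST.F Alpha, R5 (register read, memory write)
]
deps = classify_dependencies(sequence)
```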

7.

- a) Discuss in detail various Interconnect architectures for MIMD computers.
- b) What if there is a control statement in the loop body? How will the following code be vectorized?

```
for (i=0; i<1024; i++)
{
    if (A[i] > 0)
        C[i] = B[i];
    else
        D[i] = D[i-1];
}
```