























| Why Allow Structural Hazards? |
|---|
| • A processor without structural hazards will always have a lower CPI, if all other factors are equal. So why would a designer allow structural hazards? |
| Answer: Cost! |
| Duplication/separation of the instruction cache (IC) and data cache (DC): |
| a) is costly in itself |
| b) requires twice as much total memory bandwidth, if the processor needs to support IC and DC accesses in the same cycle |
| CS420/520 pipeline.13 · UC. Colorado Springs · Adapted from ©UCB97 & ©UCB03 |
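
The CPI side of this trade-off can be put into numbers. The following is a minimal sketch, not a measurement: it assumes an ideal CPI of 1, a one-cycle stall whenever a data access collides with an instruction fetch on a single memory port, and a load/store fraction of 40%. The function name `effective_cpi` and all numbers are illustrative assumptions, not taken from the slide.

```python
def effective_cpi(ideal_cpi, mem_ref_fraction, stall_cycles=1):
    """CPI when each data memory access stalls the pipeline by stall_cycles.

    Assumption: every load/store conflicts with an instruction fetch,
    which is the case for a single shared memory port.
    """
    return ideal_cpi + mem_ref_fraction * stall_cycles


# Illustrative numbers only: 40% of instructions are loads or stores.
shared_port_cpi = effective_cpi(1.0, 0.40)                 # one memory port
split_cache_cpi = effective_cpi(1.0, 0.40, stall_cycles=0)  # separate IC and DC

print(f"shared port: CPI = {shared_port_cpi:.2f}")
print(f"split IC/DC: CPI = {split_cache_cpi:.2f}")
```

Under these assumptions the shared-port design pays 0.4 extra cycles per instruction. Whether that is worth the cost of the second cache and the doubled bandwidth is exactly the design decision the slide describes.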







































| Situation | Example code sequence | Action |
|---|---|---|
| No dependence | LD <b>R1</b>,45(R2)<br/>DADD R5,R6,R7<br/>DSUB R8,R6,R7<br/>OR R9,R6,R7 | No hazard possible because no dependence exists on R1 in the immediately following three instructions. |
| Dependence requiring stall | LD <b>R1</b>,45(R2)<br/>DADD R5,<b>R1</b>,R7<br/>DSUB R8,R6,R7<br/>OR R9,R6,R7 | Comparators detect the use of R1 in the DADD and stall the DADD (and DSUB and OR) before the DADD begins EX. |
| Dependence overcome by forwarding | LD <b>R1</b>,45(R2)<br/>DADD R5,R6,R7<br/>DSUB R8,<b>R1</b>,R7<br/>OR R9,R6,R7 | Comparators detect use of R1 in DSUB and forward result of load to ALU in time for DSUB to begin EX. |
| Dependence with accesses in order | LD <b>R1</b>,45(R2)<br/>DADD R5,R6,R7<br/>DSUB R8,R6,R7<br/>OR R9,<b>R1</b>,R7 | No action required because the read of R1 by OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half. |
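
The four situations in the table differ only in how far the first use of R1 sits behind the load. A minimal sketch of that classification, assuming the five-stage pipeline of these slides; the tuple layout `(op, dst, src1, src2)` and the function names are hypothetical, chosen for illustration:

```python
def first_use_distance(dest, following):
    """Return how many instructions after the load first read `dest`.

    `following` is a list of (op, dst, src1, src2) tuples (assumed layout).
    Returns None if none of them reads `dest`.
    """
    for distance, (_, _, *sources) in enumerate(following, start=1):
        if dest in sources:
            return distance
    return None


def load_hazard_action(dest, following):
    """Map the distance of the first use to the action from the table."""
    distance = first_use_distance(dest, following)
    if distance == 1:
        return "stall"             # use in the very next instruction
    if distance == 2:
        return "forward"           # loaded value forwarded to the ALU
    if distance == 3:
        return "none (in order)"   # write in first half, read in second half of the cycle
    return "none (no dependence)"


# Second row of the table: LD R1,45(R2) followed by DADD R5,R1,R7 ...
following = [("DADD", "R5", "R1", "R7"),
             ("DSUB", "R8", "R6", "R7"),
             ("OR",   "R9", "R6", "R7")]
print(load_hazard_action("R1", following))
```

For the second row the first use is one instruction behind the load, so the sketch reports a stall.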

| Instruction in IF/ID | Matching operand fields |
|---|---|
| Reg-Reg ALU | ID/EX.IR[rt] == IF/ID.IR[rs] |
| Reg-Reg ALU | ID/EX.IR[rt] == IF/ID.IR[rt] |
| Load, Store, ALU immediate, branch | ID/EX.IR[rt] == IF/ID.IR[rs] |

Detecting the need for a load interlock during the ID stage of an instruction requires three/two comparisons.
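
The comparisons above can be sketched as a single predicate. This is an illustrative model, not the actual control logic: it assumes the instruction in ID/EX is a load (the interlock only applies to loads), and the opcode class names and the function `needs_load_interlock` are hypothetical.

```python
# Instruction classes that read only the rs field (per the table above).
READS_RS_ONLY = {"load", "store", "alu_imm", "branch"}


def needs_load_interlock(id_ex_op, id_ex_rt, if_id_op, if_id_rs, if_id_rt):
    """Return True if the instruction in IF/ID must stall behind a load in ID/EX.

    id_ex_rt is the destination register of the load; if_id_rs and if_id_rt
    are the register fields of the instruction currently being decoded.
    """
    if id_ex_op != "load":
        return False                              # only loads need the interlock
    if if_id_op == "rr_alu":
        # A register-register ALU op reads both rs and rt: two comparisons.
        return id_ex_rt in (if_id_rs, if_id_rt)
    if if_id_op in READS_RS_ONLY:
        # These classes are checked against rs only: one comparison.
        return id_ex_rt == if_id_rs
    return False


# LD R1,45(R2) in ID/EX, DADD R5,R1,R7 in IF/ID (rs = 1, rt = 7).
print(needs_load_interlock("load", 1, "rr_alu", 1, 7))
```

In hardware the three comparisons run in parallel and the opcode fields select which results matter; the sequential `if` chain here only mirrors the rows of the table.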









| Taken Branch vs. Not-Taken Branch | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 | Cycle 8 | Cycle 9 | Cycle 10 | Cycle 11 | Cycle 12 |
|---|---|---|---|---|---|---|---|---|---|
| 12: Beq | Ifetch | Reg/Dec | Exec | Mem | Wr | | | | |
| 16: successor | | Ifetch | Ifetch | Reg/Dec | Exec | Mem | Wr | | |
| 20: successor + 1 | | | stall | Ifetch | Reg/Dec | Exec | Mem | Wr | |
| 24: successor + 2 | | | | | Ifetch | Reg/Dec | Exec | Mem | Wr |

| How can this stall be implemented by "control"? |
|---|
| ° Taken branch: a branch that changes the PC to its target address |
| ° Not-taken (untaken) branch: a branch that falls through to its sequential successor |
| • If the branch above is not taken, the second IF for the branch successor is redundant |
| • How can we take advantage of the fact that the right instruction was already fetched? |
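
The timing above can be reproduced with a few lines of code. This is a minimal sketch under the assumptions of the diagram: the branch is fetched in cycle 4, its successor is fetched once and then fetched again one cycle later, and every later instruction slips by one cycle. The function name `branch_stall_schedule` is hypothetical.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def branch_stall_schedule(labels, first_cycle=4):
    """Return {label: {cycle: stage}} for a branch followed by its successors.

    labels[0] is the branch. The successor is fetched twice (the first
    fetch is repeated), which costs one stall cycle for everything after it.
    """
    schedule = {}
    for k, label in enumerate(labels):
        row = {}
        if k == 0:
            start = first_cycle              # the branch itself is not delayed
        else:
            start = first_cycle + k + 1      # one-cycle slip after the branch
            if k == 1:
                row[start - 1] = "Ifetch"    # first fetch, repeated next cycle
            elif k == 2:
                row[start - 1] = "stall"     # waits while the successor refetches
        for offset, stage in enumerate(STAGES):
            row[start + offset] = stage
        schedule[label] = row
    return schedule


labels = ["12: Beq", "16: successor", "20: successor + 1", "24: successor + 2"]
schedule = branch_stall_schedule(labels)
for label in labels:
    cells = ", ".join(f"{c}:{s}" for c, s in sorted(schedule[label].items()))
    print(f"{label:<20} {cells}")
```

If the branch is not taken, the instruction fetched in cycle 5 is already the right one, so the repeated fetch in cycle 6 is the wasted work the last two bullets refer to.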

| Reducing Pipeline Branch Penalties: |
|---|
| ° Four simple compile-time schemes |
| • STATIC: the scheme is fixed for each branch during the entire execution; software tries to minimize the branch penalty by using knowledge of the hardware and of branch behavior |
| ° More powerful HW and SW techniques for both static and dynamic branch prediction |
| • Instruction Level Parallelism (ILP) |









| ° Execution cycle |
|---|
| branch instruction |
| sequential successor 1 |
| branch target if taken, or sequential successor 2 if not taken |
| • A HW component: branch delay slot (1 for MIPS) |
| - The instruction in the slot is executed whether the branch is taken or not |
| - Thus, what is the job of the compiler? |
| - Make the successor instruction valid and useful! |
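
One way a compiler can make the delay-slot instruction "valid and useful" is to move an instruction from before the branch into the slot. The sketch below shows only this simplest case and is an assumption-laden illustration, not a real scheduler: it looks only at the instruction immediately before the branch, treats the input as one basic block ending in a branch, and uses a hypothetical `(op, dst, srcs)` tuple layout.

```python
NOP = ("NOP", None, ())


def fill_delay_slot(block):
    """Fill the single branch delay slot of a basic block.

    `block` is a list of (op, dst, srcs) tuples whose last entry is the
    branch. The instruction just before the branch is moved into the slot
    if the branch does not read its result; otherwise a NOP is inserted.
    """
    *body, branch = block
    _, _, branch_sources = branch
    if body:
        candidate = body[-1]
        _, dst, _ = candidate
        if dst not in branch_sources:
            # Safe: the candidate executes either way, and the branch
            # condition does not depend on it.
            return body[:-1] + [branch, candidate]
    return body + [branch, NOP]


block = [("DADD", "R1", ("R2", "R3")),
         ("BEQ",  None, ("R2", "R0"))]
print(fill_delay_slot(block))
```

Here the DADD does not produce a value the branch tests, so it lands in the slot and no cycle is wasted. If the branch tested R1 instead, the sketch falls back to a NOP; a real compiler would then look for other candidates, such as an instruction from the branch target or from the fall-through path.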

| Technique 4 Example | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Untaken branch | Ifetch | Reg/Dec | Exec | Mem | Wr | | | | |
| Branch delay instruction (i + 1) | | Ifetch | Reg/Dec | Exec | Mem | Wr | | | |
| Instruction i + 2 | | | Ifetch | Reg/Dec | Exec | Mem | Wr | | |
| Instruction i + 3 | | | | Ifetch | Reg/Dec | Exec | Mem | Wr | |
| Instruction i + 4 | | | | | Ifetch | Reg/Dec | Exec | Mem | Wr |
| Taken branch | Ifetch | Reg/Dec | Exec | Mem | Wr | | | | |
| Branch delay instruction (i + 1) | | Ifetch | Reg/Dec | Exec | Mem | Wr | | | |
| Branch target | | | Ifetch | Reg/Dec | Exec | Mem | Wr | | |
| Branch target + 1 | | | | Ifetch | Reg/Dec | Exec | Mem | Wr | |
| Branch target + 2 | | | | | Ifetch | Reg/Dec | Exec | Mem | Wr |



