1 change: 1 addition & 0 deletions W10D2/menu.md
@@ -0,0 +1 @@
- [Liu YiYu's note](note_liu_yiyu.md)
57 changes: 57 additions & 0 deletions W10D2/note_liu_yiyu.md
@@ -0,0 +1,57 @@
# Note for W10D2

*Authored by Liu Yi-Yu*

Async vs. Sync
- Interrupts (I/O)
- Internet
- ATM/ISDN (Euro)
- TCP/IP (USA)

## Interrupts

External devices interrupt the CPU.

Using a daisy chain for access (see the sketch after the diagram):
```mermaid
graph LR
Core((Core))
A[Device]
B[Device]
C[Device]
Core --> A
A --> Core
A --> B
B --> A
B --> C
C --> B
```
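
A minimal sketch of daisy-chain interrupt acknowledgment (the devices and priority order are hypothetical): the CPU sends one acknowledge into the chain, and the first requesting device claims it, so priority is fixed by position in the chain.

```c++
#include <iostream>
#include <vector>

// Each device either requests service or passes the acknowledge downstream.
struct Device {
    const char* name;
    bool requesting;
};

// The device closest to the CPU that is requesting claims the acknowledge.
const Device* acknowledge(const std::vector<Device>& chain) {
    for (const Device& d : chain) {
        if (d.requesting) return &d;   // claim it, stop propagation
    }
    return nullptr;                    // nobody claimed it (spurious interrupt)
}

int main() {
    std::vector<Device> chain = {{"disk", false}, {"nic", true}, {"timer", true}};
    if (const Device* d = acknowledge(chain))
        std::cout << "servicing " << d->name << "\n";   // prints "nic"
}
```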

## Internet
Using a hypercube topology to reduce the distance between nodes (see the sketch below)
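
A minimal sketch, assuming a standard hypercube: in a $d$-dimensional hypercube with $2^d$ nodes, two nodes are linked iff their labels differ in exactly one bit, so the distance between any two nodes is the Hamming distance of their labels (at most $d$ hops).

```c++
#include <bitset>
#include <cstdint>
#include <iostream>

// Hop count between two hypercube nodes = Hamming distance of their labels.
int hops(std::uint32_t a, std::uint32_t b) {
    return static_cast<int>(std::bitset<32>(a ^ b).count());
}

int main() {
    // 4-dimensional hypercube: 16 nodes, diameter 4.
    std::cout << hops(0b0000, 0b1011) << "\n";  // 3 hops
}
```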

### ATM/ISDN (Euro)

ATM: Asynchronous Transfer Mode

- Virtual circuit (a switched path set up through the network)

- Once the connection is established, the sender has full use of the virtual circuit

- Slices data into small fixed-size cells to avoid data loss (see the sketch below)
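
A minimal sketch of slicing a payload into fixed-size cells (standard ATM uses 53-byte cells: a 5-byte header plus 48 bytes of payload; the cell layout here is simplified and hypothetical):

```c++
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kPayload = 48;   // ATM cell payload size in bytes

struct Cell {
    std::uint16_t vci;                 // virtual circuit identifier (simplified header)
    std::uint8_t  data[kPayload];      // payload, zero-padded in the last cell
};

// Slice a message into fixed-size cells for one virtual circuit.
std::vector<Cell> slice(std::uint16_t vci, const std::vector<std::uint8_t>& msg) {
    std::vector<Cell> cells;
    for (std::size_t off = 0; off < msg.size(); off += kPayload) {
        Cell c{};                      // zero-initialize (padding)
        c.vci = vci;
        std::size_t n = std::min(kPayload, msg.size() - off);
        std::memcpy(c.data, msg.data() + off, n);
        cells.push_back(c);
    }
    return cells;
}
```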

### TCP/IP (USA)

Uses an asynchronous network to implement a synchronous (reliable, connection-oriented) one.

### Internet Service
- SNMP (Simple Network Management Protocol: know the state of every node)
- OSPF (Open Shortest Path First: larger scale)
- DNS (Domain Name System: the largest scale)

Most Internet services are implemented in user space rather than in kernel space.

### CDMA (1989)
- Frequency-Hopping Spread Spectrum
- Direct-Sequence Spread Spectrum
- Orthogonal coding (see the sketch below)
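
A minimal sketch of orthogonal coding with length-4 Walsh codes (the codes and bit values are chosen for illustration): each sender spreads its bit with its own code, and because the codes are orthogonal, each receiver recovers its own bit by correlating the summed channel with its code.

```c++
#include <array>
#include <iostream>

using Code = std::array<int, 4>;                    // chips are +1 / -1

// Two rows of a 4x4 Walsh-Hadamard matrix (orthogonal: their dot product is 0).
constexpr Code kCodeA = {+1, +1, +1, +1};
constexpr Code kCodeB = {+1, -1, +1, -1};

// Spread one data bit (+1 or -1) over the code's chips.
std::array<int, 4> spread(int bit, const Code& c) {
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i) out[i] = bit * c[i];
    return out;
}

// Despread: correlate the received chips with the code and take the sign.
int despread(const std::array<int, 4>& rx, const Code& c) {
    int acc = 0;
    for (int i = 0; i < 4; ++i) acc += rx[i] * c[i];
    return acc > 0 ? +1 : -1;
}

int main() {
    int bitA = +1, bitB = -1;
    std::array<int, 4> channel{};
    auto a = spread(bitA, kCodeA), b = spread(bitB, kCodeB);
    for (int i = 0; i < 4; ++i) channel[i] = a[i] + b[i];  // both transmit at once
    std::cout << despread(channel, kCodeA) << " "           // +1: A's bit recovered
              << despread(channel, kCodeB) << "\n";         // -1: B's bit recovered
}
```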
1 change: 1 addition & 0 deletions W14D1/menu.md
@@ -0,0 +1 @@
- [Liu YiYu's note](note_liu_yiyu.md)
93 changes: 93 additions & 0 deletions W14D1/note_liu_yiyu.md
@@ -0,0 +1,93 @@
# Review

```mermaid
graph LR
arch((Arch))
perf[Performance]
func[Functions]
exam[Examination]
vn(Von Neumann)
principles(Principles)
rl(Reducing Latency)
p1([Small = fast])
p2([Simple: RISC])
p3([Tradeoff/Compromise])
p4([Amdahl's Law])

arch --> func
arch --> perf
arch -.- exam
func --- vn
perf ---- principles
perf --- Parallelism
perf --- Locality
perf --- rl
principles --- p1
principles --- p2
principles --- p3
principles --- p4
```

## Functions
Von Neumann's architecture

## Performance
CPI (cycles per instruction)
### Reduce Latency
- higher frequency
- CLA (carry-lookahead adder)

### Principles
- Small (=fast)
- Simple (RISC)
- Tradeoff/Compromise
- Amdahl's Law: make the common case fast. ($\displaystyle s_p=\frac{1}{(1-n)+n/s}$, where $n$ is the fraction of execution that is enhanced and $s$ is that fraction's speedup; worked example below)
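
A quick worked example with assumed numbers: if the fraction $n = 0.9$ of execution is sped up by $s = 10$, the overall speedup is $\displaystyle s_p = \frac{1}{(1-0.9)+0.9/10} = \frac{1}{0.19} \approx 5.26$, far less than 10, because the remaining 10% limits the gain.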

## Pipeline

### Basic Principle
- Balance the stage delays
- Speedup (ideally close to the number of stages when balanced)

### Hazard
$\approx$ stalls

#### Structural
General remedy: duplicate the conflicting resource.

Example: a memory conflict between

- one instruction's 4th stage (MEM, a data access)
- a later instruction's 1st stage (IF, an instruction fetch)

Solution: I-Cache / D-Cache

(i.e., a Harvard architecture with split instruction and data memories)

#### Data
- True dependence (RAW)
  - small distance: forwarding
  - large distance: out-of-order execution (hardware) / code scheduling (software)
- Pseudo-dependence (WAR/WAW)

#### Control
Caused by jumps and branches.
- Early branch prediction (see the sketch after this list)
- reduce the target-calculation delay (e.g. with a BTB, though returns still cause an issue)
- delay slot
- Kill (flush the wrongly fetched instructions)
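
A minimal sketch of early branch prediction using one 2-bit saturating counter per branch (a common scheme, assumed here for illustration; the table size and indexing are hypothetical):

```c++
#include <array>
#include <cstdint>

// One 2-bit saturating counter per entry:
// 0,1 -> predict not taken; 2,3 -> predict taken.
class BimodalPredictor {
    std::array<std::uint8_t, 1024> table_{};   // all counters start at 0

    static std::size_t index(std::uint64_t pc) { return (pc >> 2) % 1024; }

public:
    bool predict(std::uint64_t pc) const { return table_[index(pc)] >= 2; }

    // After the branch resolves, train the counter toward the actual outcome.
    void update(std::uint64_t pc, bool taken) {
        std::uint8_t& c = table_[index(pc)];
        if (taken  && c < 3) ++c;
        if (!taken && c > 0) --c;
    }
};
```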

## Locality: Cache
$\displaystyle AMAT_{\mathrm{cache}} = T_{\mathrm{hit}} + \eta_{\mathrm{miss}}\times T_{\mathrm{penalty}}$

- $T_{\mathrm{hit}}$: (for the cache) keep it small (direct mapping is fast, and fully associative is slow!)
- $\eta_{\mathrm{miss}}$: higher associativity, smaller miss rate
- $T_{\mathrm{penalty}}$: (for memory / L2 cache) wider bus / multi-bank
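
A quick worked example with assumed numbers: with $T_{\mathrm{hit}} = 1$ cycle, $\eta_{\mathrm{miss}} = 5\%$, and $T_{\mathrm{penalty}} = 100$ cycles, $\displaystyle AMAT_{\mathrm{cache}} = 1 + 0.05 \times 100 = 6$ cycles, so even a 95% hit rate leaves the average access time dominated by the miss penalty.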

### Cache

1. use the index bits to find which line (block/set)
2. check whether the tag matches
3. use the block offset to find which part of the block holds the data (see the sketch below)
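
A minimal sketch of splitting an address into tag / index / block offset; the cache geometry here is an assumption (64-byte blocks, 256 sets), not a fixed design.

```c++
#include <cstdint>
#include <cstdio>

constexpr std::uint64_t kBlockBytes = 64;    // -> 6 offset bits
constexpr std::uint64_t kNumSets    = 256;   // -> 8 index bits

struct CacheAddr {
    std::uint64_t tag, index, offset;
};

// Step 1: index picks the set (line); step 2: tag is compared against the stored tag;
// step 3: offset picks the word inside the block.
CacheAddr split(std::uint64_t addr) {
    CacheAddr a;
    a.offset = addr % kBlockBytes;
    a.index  = (addr / kBlockBytes) % kNumSets;
    a.tag    = addr / (kBlockBytes * kNumSets);
    return a;
}

int main() {
    CacheAddr a = split(0x12345678);
    std::printf("tag=%#llx index=%llu offset=%llu\n",
                (unsigned long long)a.tag, (unsigned long long)a.index,
                (unsigned long long)a.offset);
}
```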
1 change: 1 addition & 0 deletions W7D1/menu.md
@@ -1 +1,2 @@
- [钟逸超 (Zhong Yichao)](Note_W7D1_Yichao_Zhong.md)
- [刘祎禹 (Liu Yiyu)](note_liu_yiyu.md)
119 changes: 119 additions & 0 deletions W7D1/note_liu_yiyu.md
@@ -0,0 +1,119 @@
# Note for W7D1

*Authored by Liu Yi-Yu*

Target: reduce $AMAT = HitTime + MissRate \times MissPenalty$

## 5. Reducing Misses by Hardware Prefetching Data

```mermaid
graph LR
Core((Core))
Cache[Cache]
Mem[Memory]
WB[Write Buffer / Stream Buffer]
Core --> |va| Cache
Cache --> Core
Cache --> |pa| Mem
Cache --- WB
WB --- Mem
```
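
A minimal sketch of a next-line hardware prefetcher feeding a one-entry stream buffer (the policy and sizes are assumptions for illustration): on a miss to block B, the memory side also fetches B+1 into the stream buffer, so a later sequential access can be served from the buffer instead of paying a full miss.

```c++
#include <cstdint>
#include <optional>

// One-entry stream buffer: holds the block predicted to be needed next.
struct StreamBuffer {
    std::optional<std::uint64_t> block;   // block number currently buffered
};

// Returns true if the access is satisfied without a full memory miss.
// On any miss, prefetch the next sequential block into the stream buffer.
bool access(std::uint64_t block, bool cache_hit, StreamBuffer& sb) {
    if (cache_hit) return true;
    if (sb.block == block) {              // hit in the stream buffer
        sb.block = block + 1;             // keep streaming ahead
        return true;
    }
    sb.block = block + 1;                 // normal miss: prefetch the next block
    return false;
}
```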

## 6. Reducing Misses by Software Prefetching Data

### Load to a register

### Touch a memory address (cache)

### Make the access order consistent with the data's order in memory
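
A minimal sketch of the first two ideas using the GCC/Clang builtin `__builtin_prefetch` (the look-ahead distance of 16 elements is an assumption, not a tuned value):

```c++
#include <cstddef>

// Sum an array while prefetching a few iterations ahead, so the data is
// already in the cache ("touch a memory address") by the time we load it.
long long sum(const int* a, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], /*rw=*/0, /*locality=*/3);
        s += a[i];                         // "load to a register"
    }
    return s;
}
```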

## Different Kinds of Memory

- SRAM
  - R-S latch
  - D-latch
- DRAM
  - stores each bit as charge on a capacitor
  - EDO / FPM (fast page mode)
  - SDRAM

## 7. Reducing Misses by Compiler Optimization

### Instruction
- Reorder procedures in memory so as to reduce conflict misses
- Profiling to look at conflicts (using tools they developed)

### Data
- Merging Arrays
- Reducing conflicts between val & key:
```c++
/* Before: 2 sequential arrays */
int val[SIZE];
int key[SIZE];
/* After: 1 array of structures */
struct merge {
    int val;
    int key;
};
struct merge merged_array[SIZE];
```

- Loop Interchange
- Sequential accesses instead of striding through memory every 100 words
- improved spatial locality
```c++
/* Before */
for (k = 0; k < 100; k = k + 1)
    for (j = 0; j < 100; j = j + 1)
        for (i = 0; i < 5000; i = i + 1)
            x[i][j] = 2 * x[i][j];
/* After */
for (k = 0; k < 100; k = k + 1)
    for (i = 0; i < 5000; i = i + 1)
        for (j = 0; j < 100; j = j + 1)
            x[i][j] = 2 * x[i][j];
```

- Loop Fusion

```c++
/* Before */
for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1)
        a[i][j] = 1 / b[i][j] * c[i][j];
for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1)
        d[i][j] = a[i][j] + c[i][j];
/* After */
for (i = 0; i < N; i = i + 1) {
    for (j = 0; j < N; j = j + 1) {
        a[i][j] = 1 / b[i][j] * c[i][j];
        d[i][j] = a[i][j] + c[i][j];
    }
}
```

- Blocking:

```c++
/* Before */
for (i = 0; i < N; i = i + 1)
    for (j = 0; j < N; j = j + 1) {
        r = 0;
        for (k = 0; k < N; k = k + 1) {
            r = r + y[i][k] * z[k][j];
        }
        x[i][j] = r;
    }
/* After */
for (jj = 0; jj < N; jj = jj + B)
    for (kk = 0; kk < N; kk = kk + B)
        for (i = 0; i < N; i = i + 1)
            for (j = jj; j < min(jj + B - 1, N); j = j + 1) {
                r = 0;
                for (k = kk; k < min(kk + B - 1, N); k = k + 1) {
                    r = r + y[i][k] * z[k][j];
                }
                x[i][j] = x[i][j] + r;
            }
```