diff --git a/W10D2/menu.md b/W10D2/menu.md
index e69de29..03fc318 100644
--- a/W10D2/menu.md
+++ b/W10D2/menu.md
@@ -0,0 +1 @@
+- [Liu YiYu's note](note_liu_yiyu.md)
diff --git a/W10D2/note_liu_yiyu.md b/W10D2/note_liu_yiyu.md
new file mode 100644
index 0000000..998e062
--- /dev/null
+++ b/W10D2/note_liu_yiyu.md
@@ -0,0 +1,57 @@
+# Note for W10D2
+
+*Authored by Liu Yi-Yu*
+
+Async vs. Sync
+- Interrupts (I/O)
+- Internet
+  - ATM/ISDN (Euro)
+  - TCP/IP (USA)
+
+## Interrupts
+
+External (peripheral) devices interrupt the CPU
+
+Devices are reached through a daisy chain:
+```mermaid
+graph LR
+Core((Core))
+A[Device]
+B[Device]
+C[Device]
+Core --> A
+A --> Core
+A --> B
+B --> A
+B --> C
+C --> B
+```
+
+## Internet
+Use a hypercube topology to reduce the distance (hop count) between nodes
+
+### ATM/ISDN (Euro)
+
+ATM: Asynchronous Transfer Mode
+
+- Virtual circuit (a circuit set up through switches)
+
+- Once the connection is established, the sender has full access to the virtual circuit
+
+- Slices the data into small fixed-size pieces (cells) to avoid data loss
+
+### TCP/IP (USA)
+
+Implements a synchronous network on top of an asynchronous one
+
+### Internet Service
+- SNMP (monitors the state of every node)
+- OSPF (larger scale)
+- DNS (the largest scale)
+
+Most Internet services are implemented in user space rather than kernel space
+
+### CDMA (1989)
+- Frequency Hopping Spread Spectrum
+- Direct Sequence Spread Spectrum
+- Orthogonal coding
diff --git a/W14D1/menu.md b/W14D1/menu.md
index e69de29..03fc318 100644
--- a/W14D1/menu.md
+++ b/W14D1/menu.md
@@ -0,0 +1 @@
+- [Liu YiYu's note](note_liu_yiyu.md)
diff --git a/W14D1/note_liu_yiyu.md b/W14D1/note_liu_yiyu.md
new file mode 100644
index 0000000..92f00a2
--- /dev/null
+++ b/W14D1/note_liu_yiyu.md
@@ -0,0 +1,93 @@
+# Review
+
+```mermaid
+graph LR
+arch((Arch))
+perf[Performance]
+func[Functions]
+exam[Examination]
+vn(Von Neumann)
+principles(principles)
+rl(Reducing Latency)
+p1([Small: =fast])
+p2([Simple: RISC])
+p3([Tradeoff/Compromise])
+p4([Amdahl's Law])
+
+arch --> func
+arch --> perf
+arch -.- exam
+func --- vn
+perf ---- principles
+perf --- Parallelism
+perf --- Locality
+perf --- rl
+principles --- p1
+principles --- p2
+principles --- p3
+principles --- p4
+```
+
+## Functions
+Von Neumann architecture
+
+## Performance
+CPI (clock cycles per instruction)
+### Reduce Latency
+- Higher clock frequency
+- CLA (carry-lookahead adder)
+
+### Principles
+- Small (=fast)
+- Simple (RISC)
+- Tradeoff/Compromise
+- Amdahl's Law: make the common case fast. ($\displaystyle s_p=\frac{1}{(1-n)+n/s}$, where $n$ is the fraction of execution time that is improved and $s$ is the speedup of that fraction)
+
+## Pipeline
+
+### Basic Principle
+- Balance
+- Speed up
+
+### Hazard
+$\approx$ stalls
+
+#### Structural
+Resolved by duplicating the contended resource
+
+Example:
+
+memory conflict
+
+- ID's 4th stage
+- Ii's 1st stage
+
+Solution: I-Cache / D-Cache
+
+(or Harvard Structure)
+
+#### Data
+- True dependence (RAW)
+  - small distance: forwarding
+  - large distance: out-of-order execution (hardware) / code motion (software)
+- Pseudo-dependence (WAR, WAW)
+
+#### Control
+Caused by jumps and branches
+- Early branch prediction
+- Reduce the target-calculation delay (e.g. BTB, but return instructions cause an issue)
+- Delay slot
+- Kill (squash the wrongly fetched instructions)
+
+## Locality: Cache
+$\displaystyle AMAT_{\mathrm{cache}} = T_{\mathrm{hit}} + \eta_{\mathrm{miss}}\times T_{\mathrm{penalty}}$
+
+- $T_{\mathrm{hit}}$: (for the cache) keep the cache small; direct-mapped is fast, fully associative is slow!
+- $\eta_{\mathrm{miss}}$: higher associativity gives a smaller miss rate
+- $T_{\mathrm{penalty}}$: (for memory / L2 cache) wider bus / multi-bank
+
+### Cache
+
+1. Use the index to select the line (block)
+2. Check whether the tag matches
+3. Use the block offset to find which part of the block holds the data
diff --git a/W7D1/menu.md b/W7D1/menu.md
index 033ea6f..153cd61 100644
--- a/W7D1/menu.md
+++ b/W7D1/menu.md
@@ -1 +1,2 @@
 - [钟逸超](Note_W7D1_Yichao_Zhong.md)
+- [刘祎禹](note_liu_yiyu.md)
diff --git a/W7D1/note_liu_yiyu.md b/W7D1/note_liu_yiyu.md
new file mode 100644
index 0000000..d424c80
--- /dev/null
+++ b/W7D1/note_liu_yiyu.md
@@ -0,0 +1,119 @@
+# Note for W7D1
+
+*Authored by Liu Yi-Yu*
+
+Target: reduce $AMAT = HitTime + MissRate \times MissPenalty$
+
+## 5. Reducing Misses by Hardware Prefetching Data
+
+```mermaid
+graph LR
+Core((Core))
+Cache[Cache]
+Mem[Memory]
+WB[Write Buffer / Stream Buffer]
+Core --> |va| Cache
+Cache --> Core
+Cache --> |pa| Mem
+Cache --- WB
+WB --- Mem
+```
+
+## 6. Reducing Misses by Software Prefetching Data
+
+### Load to a register
+
+### Touch a memory address (cache)
+
+### Make the access order consistent with the layout in memory
+
+## Different Kinds of Memory
+
+- SRAM
+  - R-S latch
+  - D-latch
+- DRAM
+  - capacitor
+- EDO / FPM (fast page mode) DRAM
+- SDRAM
+
+## 7. Reducing Misses by Compiler Optimization
+
+### Instruction
+- Reorder procedures in memory so as to reduce conflict misses
+- Profile to find conflicts (using tools they developed)
+
+### Data
+- Merging Arrays
+  - Reducing conflicts between val & key:
+```c++
+/* Before: 2 sequential arrays */
+int val[SIZE];
+int key[SIZE];
+/* After: 1 array of structures */
+struct merge {
+  int val;
+  int key;
+};
+struct merge merged_array[SIZE];
+```
+
+- Loop Interchange
+  - Sequential accesses instead of striding through memory every 100 words
+  - Improved spatial locality
+```c++
+/* Before */
+for (k = 0; k < 100; k = k + 1)
+  for (j = 0; j < 100; j = j + 1)
+    for (i = 0; i < 5000; i = i + 1)
+      x[i][j] = 2 * x[i][j];
+/* After */
+for (k = 0; k < 100; k = k + 1)
+  for (i = 0; i < 5000; i = i + 1)
+    for (j = 0; j < 100; j = j + 1)
+      x[i][j] = 2 * x[i][j];
+```
+
+- Loop Fusion
+
+```c++
+/* Before */
+for (i = 0; i < N; i = i + 1)
+  for (j = 0; j < N; j = j + 1)
+    a[i][j] = 1 / b[i][j] * c[i][j];
+for (i = 0; i < N; i = i + 1)
+  for (j = 0; j < N; j = j + 1)
+    d[i][j] = a[i][j] + c[i][j];
+/* After */
+for (i = 0; i < N; i = i+1) {
+  for (j = 0; j < N; j = j+1) {
+    a[i][j] = 1 / b[i][j] * c[i][j];
+    d[i][j] = a[i][j] + c[i][j];
+  }
+}
+```
+
+- Blocking:
+
+```c++
+/* Before */
+for (i = 0; i < N; i = i+1) {
+  for (j = 0; j < N; j = j+1) {
+    r = 0;
+    for (k = 0; k < N; k = k+1)
+      r = r + y[i][k]*z[k][j];
+    x[i][j] = r;
+  }
+}
+/* After (B is the blocking factor; x must be zero-initialized) */
+for (jj = 0; jj < N; jj = jj+B)
+  for (kk = 0; kk < N; kk = kk+B)
+    for (i = 0; i < N; i = i+1)
+      for (j = jj; j < min(jj+B,N); j = j+1) {
+        r = 0;
+        for (k = kk; k < min(kk+B,N); k = k+1) {
+          r = r + y[i][k]*z[k][j];
+        }
+        x[i][j] = x[i][j] + r;
+      }
+```
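
The blocking transformation can be checked against the plain triple loop with a small self-contained sketch. This is only an illustration under stated assumptions: the names `Matrix`, `multiply_naive`, `multiply_blocked` and the parameter `block` are hypothetical and do not come from the note. The tile bounds are written as `min(jj + block, n)` and `min(kk + block, n)` so that, with a strict `<` comparison, every column and every `k` index of a tile is visited, and the result matrix starts at zero because each `(jj, kk)` tile only contributes a partial sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch: a square matrix of doubles.
using Matrix = std::vector<std::vector<double>>;

// Reference version: plain triple loop computing x = y * z (n-by-n).
Matrix multiply_naive(const Matrix& y, const Matrix& z) {
    const std::size_t n = y.size();
    Matrix x(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double r = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
    return x;
}

// Blocked version: for each (jj, kk) tile of z, sweep all rows i and
// accumulate the partial dot products into x, so the tile of z is reused
// across all rows before the loops move on to the next tile.
Matrix multiply_blocked(const Matrix& y, const Matrix& z, std::size_t block) {
    assert(block > 0);  // a zero blocking factor would never advance jj/kk
    const std::size_t n = y.size();
    Matrix x(n, std::vector<double>(n, 0.0));  // zero start: partial sums accumulate
    for (std::size_t jj = 0; jj < n; jj += block)
        for (std::size_t kk = 0; kk < n; kk += block)
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t j = jj; j < std::min(jj + block, n); ++j) {
                    double r = 0.0;
                    for (std::size_t k = kk; k < std::min(kk + block, n); ++k)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;  // add this tile's contribution
                }
    return x;
}
```

Calling `multiply_blocked(y, z, B)` with any positive `B` should give the same matrix as `multiply_naive(y, z)` for small integer-valued inputs; the benefit of blocking is in cache behaviour, not in the result. The best blocking factor depends on the cache size and is not something this sketch tries to determine.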