Binary file added W10D2/RAID0.png
Binary file added W10D2/RAID1.png
Binary file added W10D2/RAID3.png
Binary file added W10D2/RAID4.png
Binary file added W10D2/RAID5.png
52 changes: 52 additions & 0 deletions W10D2/W10D1 李磊智.md
@@ -0,0 +1,52 @@
# Topic: RAID
RAID (Redundant Array of Independent Disks) is a disk-array scheme designed for large capacity, good performance, and high reliability.

### **Key Tech**
1. Data Striping
Data striping is analogous to interleaved memory banks: both spread data over multiple units so that consecutive I/O operations can be served by different disks in parallel, improving I/O speed.
2. Data Validation
Data validation guarantees the correctness of data and can recover damaged data in certain circumstances. In practice it is usually implemented in one of two ways: Block Check Character (BCC)$^1$ or Hamming code$^2$. Both store additional validation information on disk and therefore require extra storage. Whenever data is written, the corresponding validation information must be updated as well.

### **RAID Levels**
1. RAID0
RAID0 is a simple structure with data striping but without data validation. It provides high speed but poor data protection: once data is lost, it cannot be recovered.
![RAID0](./RAID0.png)
2. RAID1
RAID1 is also called mirroring. It writes identical data to a working disk and a mirror disk; when data on the working disk is damaged, it can be retrieved from the mirror disk. RAID1 provides a high level of data security but is expensive, since only half of the total disk capacity is usable.
![RAID1](./RAID1.png)


3. RAID3 / RAID4
RAID3 and RAID4 use the Block Check Character technique: the validation information is kept on a separate parity disk, which records the XOR of the corresponding blocks on the data disks.
![RAID3](./RAID3.png) ![RAID4](./RAID4.png)

4. RAID5
RAID5 is an optimization of RAID4. In RAID3 and RAID4 all validation data is stored on a single disk, so every write must update the parity disk as well as the target data disk; writes to different data disks are therefore serialized by the shared parity disk. To overcome this bottleneck, RAID5 distributes the parity across all disks, as shown in the illustration below. When writes target different disks, the parity blocks for the two stripes may lie on different disks, so the two writes can be executed at the same time.
![RAID5](./RAID5.png)
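The XOR-parity idea underlying RAID3/4/5 can be sketched in a few lines. This assumes a toy 4-disk stripe with 2-byte blocks (all values made up): XOR-ing the surviving blocks with the parity block reconstructs any single lost block.

```python
from functools import reduce

# Toy stripe: three 2-byte data blocks plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]

def xor_blocks(blocks):
    # bytewise XOR of equal-length blocks
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

parity = xor_blocks(data)

# Simulate losing disk 1 and rebuilding its block from the survivors.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered == data[1])   # -> True
```

The same property is why a single failed disk in RAID5 can be rebuilt from the remaining disks.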

<br/>


*1. <https://blog.csdn.net/qq_45080064/article/details/118793849>*
*2. <https://blog.csdn.net/interestANd/article/details/115606013>*

<br/>
# Topic: Little's Law
Little’s Law is a theorem that determines the average number of items in a stationary queuing system from the average rate at which items arrive and the average time an item spends in the system:
$$L=\lambda W$$
where $L$ is the average number of items in the system, $\lambda$ the average arrival rate, and $W$ the average time an item spends in the system.
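A minimal sanity check of the law, assuming deterministic arrivals (the rate and sojourn time are made-up example values): items arrive every `1/lam` time units and each stays exactly `w` time units, so the average number in the system should be `lam * w`.

```python
lam, w = 2.0, 5.0
arrivals = [i / lam for i in range(400)]   # arrivals over [0, 200)

def n_in_system(t):
    # an item arriving at time a is present during [a, a + w)
    return sum(1 for a in arrivals if a <= t < a + w)

# sample well inside the window to avoid edge effects
avg = sum(n_in_system(t) for t in range(20, 180)) / 160
print(avg)   # -> 10.0, matching L = lam * w
```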
<br/>
# Topic: Interconnection (NoC)
### Basic concepts in topology
Node degree: the number of links connecting a node to its neighbors.
Network diameter: the maximum, over all node pairs, of the shortest-path length between the two nodes; it bounds the worst-case communication delay of the network.
Average shortest distance: the sum of the minimum distances between all pairs of nodes in the network divided by the number of such pairs.
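These metrics can be computed by breadth-first search. The sketch below assumes a 4x4 2-D mesh (one of the models listed later); the diameter of such a mesh is the corner-to-corner hop count, 6.

```python
from collections import deque
from itertools import product

n = 4
nodes = list(product(range(n), range(n)))

def neighbors(p):
    # a mesh node links to its up/down/left/right neighbors (degree 2-4)
    x, y = p
    cand = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in cand if 0 <= a < n and 0 <= b < n]

def bfs_dist(src):
    # shortest hop count from src to every reachable node
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

dists = [d for s in nodes for d in bfs_dist(s).values() if d > 0]
diameter = max(dists)                  # corner to corner: 6
avg_dist = sum(dists) / len(dists)     # average shortest distance
print(diameter, round(avg_dist, 3))
```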
### Possible models
- hypercube
- mesh
- ring
- dynamic network
**...**

For more information, see
<https://blog.csdn.net/weixin_41561691/article/details/104649688>
<https://zhuanlan.zhihu.com/p/93409326>
Binary file added W13D2/ABA.png
19 changes: 19 additions & 0 deletions W13D2/W13D2李磊智.md
@@ -0,0 +1,19 @@
# Topic: Load-linked/store-conditional(LL/SC)
### Definition:
In computer science, load-linked/store-conditional (LL/SC) are a pair of instructions used in multithreading to achieve synchronization. Load-link returns the current value of a memory location, while a subsequent store-conditional to the same memory location will store a new value only if no updates have occurred to that location since the load-link. Together, this implements a lock-free atomic read-modify-write operation.
### Need:
When we update a value, we first read it from memory into a register, modify it there, and then write it back to memory. During that process, the original value in memory may be modified by another thread; when we write back our value, we silently overwrite that modification. We therefore need a protocol to avoid this.
### Example:
![llscExample](./llscExample.png)
The ll/sc operations in the code above are not tied to a concrete implementation; they only convey the general idea. The ll operation uses some kind of tag to track any modification to the memory location it read. If the sc operation detects that the location has been touched, the current result is abandoned and the ll/sc sequence is retried.
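The retry idea can be modeled in software. This is only a sketch of the semantics: a version counter stands in for the hardware's modification tracking, and a lock stands in for hardware atomicity; `LLSCCell` and its methods are illustrative names, not a real API.

```python
import threading

class LLSCCell:
    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()  # emulates hardware atomicity

    def load_link(self):
        with self._lock:
            return self._value, self._version

    def store_conditional(self, new_value, version):
        with self._lock:
            if self._version != version:
                return False           # location touched since ll: fail
            self._value = new_value
            self._version += 1
            return True

def atomic_increment(cell):
    # abandon the result and redo ll/sc whenever sc fails
    while True:
        value, tag = cell.load_link()
        if cell.store_conditional(value + 1, tag):
            return

cell = LLSCCell(0)
atomic_increment(cell)
print(cell.load_link()[0])   # -> 1
```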

### CAS(Compare and Swap)
In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response (this variant is often called compare-and-set), or by returning the value read from the memory location (not the value written to it).
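The variant that returns the value read can be modeled in the same spirit (real CAS is a single hardware instruction; here a lock stands in for its atomicity, and `Cell`, `compare_and_swap`, and `atomic_add` are illustrative names).

```python
import threading

_lock = threading.Lock()

class Cell:
    def __init__(self, value):
        self.value = value

def compare_and_swap(cell, expected, new):
    with _lock:                  # stands in for the instruction's atomicity
        old = cell.value
        if old == expected:
            cell.value = new
        return old               # success iff the returned value == expected

def atomic_add(cell, delta):
    while True:                  # classic CAS retry loop
        old = cell.value
        if compare_and_swap(cell, old, old + delta) == old:
            return

c = Cell(10)
atomic_add(c, 5)
print(c.value)   # -> 15
```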

### ABA Problem
![ABA](./ABA.png)
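The scenario in the figure can be sketched in miniature: a slow thread snapshots A, the value meanwhile changes A → B → A, and a value-only CAS still succeeds because it compares values, not history. A single-threaded model with illustrative names:

```python
class Cell:
    def __init__(self, value):
        self.value = value

def cas(cell, expected, new):
    if cell.value == expected:
        cell.value = new
        return True
    return False

cell = Cell("A")
snapshot = cell.value          # slow thread reads A ...
cas(cell, "A", "B")            # ... another thread: A -> B
cas(cell, "B", "A")            # ... and back:       B -> A
ok = cas(cell, snapshot, "C")  # the slow thread's CAS still succeeds
print(ok, cell.value)          # -> True C
```

An LL/SC pair in the same scenario would fail, since the version tag would have been bumped twice; that is the first point of the comparison below.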

### Comparison: LL/SC versus CAS
- If any updates have occurred, the store-conditional is guaranteed to fail, even if the value read by the load-link has since been restored. As such, an LL/SC pair is stronger than a read followed by a compare-and-swap (CAS), which will not detect updates if the old value has been restored.
- Real implementations of LL/SC do not always succeed even if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail. Older implementations will fail if there are any updates broadcast over the memory bus. This is called weak LL/SC by researchers, as it breaks many theoretical LL/SC algorithms. Weakness is relative, and some weak implementations can be used for some algorithms.

Binary file added W13D2/llscExample.png
Binary file added W2D1/DataPath.png
Binary file added W2D1/VonUnits.jpg
41 changes: 41 additions & 0 deletions W2D1/W2D1李磊智.md
@@ -0,0 +1,41 @@
# Topic: RISC Instructions

### 1. 32-bit fixed-length instructions
- better support for superscalar execution (aligned in memory)
- lower information density in instructions
- easier to decode
### 2. Memory access only via load/store instructions
### 3. 32 32-bit GPRs (general-purpose registers)
- orthogonal (general purpose):
any register can be used in any role; no instruction ties particular registers together
- dual-ported register file:
two instructions can access the same register at the same time
### 4. 3-address, reg-reg arithmetic instructions
- example: add rs1, rs2, rd
puts the result of value(rs1) + value(rs2) into rd
### 5. single addressing mode for load/store: base + displacement
- example: ld rs1, offset, rd
the target memory address is value(rs1) + offset
### 6. simple branch conditions
- Branch condition only in
### 7. delayed branch
- While a branch instruction is being resolved, the CPU stops fetching new instructions.
- This stalls the pipeline; it can be mitigated by branch prediction.
<br/>

# Topic: Von Neumann Structure
## 1. Five Units
![Von](./VonUnits.jpg)
- The CPU also contains registers.
- If a device has several CPUs, each CPU may also be called a core.
- Interconnection in the structure is implemented by a bus; a bus communicates by broadcast.

## 2. Key of Performance
### Locality
- registers
- internal bus (used inside the CPU) and system bus (connecting the CPU to other components)
### Parallelism
<br/>

# Topic: Data Path
![DataPath](./DataPath.png)
33 changes: 33 additions & 0 deletions W6D2/W6D2 李磊智.md
@@ -0,0 +1,33 @@
# Discussion: ways of reducing cache miss

## **Preceding Knowledge**

### Basic classification of miss:
1. Compulsory: the very first access to a block always misses, since the block cannot yet be in the cache (cold-start misses).
2. Capacity: the cache cannot hold all the blocks a program needs, so blocks are discarded and later retrieved during execution.
3. Conflict: in set-associative or direct-mapped caches, a block may be discarded and later retrieved when too many blocks map to the same set.
----

## **Optimization Practice**
### **review**
1. Larger Block Size
The curve of total miss rate versus block size first declines and then rises. The decline comes from a reduction in compulsory misses, since larger blocks exploit spatial locality and fewer fetches are needed to load the same data. The rise is explained by conflict misses: with larger blocks and the same cache size, the number of blocks falls, increasing the chance that different blocks compete for the same position in the cache. This effect is more severe for small caches.
2. Higher Associativity
Higher associativity makes better use of the cache's effective capacity. If a program uses two pieces of data mapped to the same index, a direct-mapped cache moves them back and forth between cache and memory constantly; with higher associativity, such conflict misses are effectively reduced. However, higher associativity comes at the expense of increased hit time, since the way-selection logic is more complex.
### **new**
3. Victim Cache
The victim cache is a small buffer that temporarily holds blocks recently evicted from the cache. The design combines the speed of a direct-mapped cache with some of the flexibility of a fully associative one, and rests on the observation that recently replaced blocks are often the ones needed again soon, which argues for keeping them close at hand.
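A toy model of the idea follows: a direct-mapped cache plus a small FIFO victim buffer. This is a sketch only, not a faithful hardware model, and the class and method names are illustrative.

```python
from collections import deque

class VictimCache:
    def __init__(self, n_sets, victim_entries=4):
        self.n_sets = n_sets
        self.lines = {}                          # set index -> block address
        self.victims = deque(maxlen=victim_entries)

    def access(self, addr):
        idx = addr % self.n_sets
        if self.lines.get(idx) == addr:
            return "hit"
        in_victim = addr in self.victims         # saved by the victim buffer?
        if in_victim:
            self.victims.remove(addr)
        evicted = self.lines.get(idx)
        if evicted is not None:
            self.victims.append(evicted)         # keep the replaced block
        self.lines[idx] = addr
        return "victim hit" if in_victim else "miss"

c = VictimCache(n_sets=4)
# addresses 1 and 5 conflict: both map to set 1 in a direct-mapped cache
print([c.access(a) for a in (1, 5, 1)])   # -> ['miss', 'miss', 'victim hit']
```

Without the victim buffer, the third access would be a full miss and go to memory.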
----
**The Connection between Victim Cache and Store Buffer**
Similarity:
- Both structures keep an up-to-date version of some data in the system.
- Both structures hold information that will eventually be written to memory.
- Both structures help improve performance.

Difference:
- Data in the victim cache was evicted from the cache, while data in the store buffer comes from the commits of store instructions.
- The victim cache improves performance by reducing cache misses; the store buffer improves performance by forwarding values to load instructions and sometimes (with a non-sequential store buffer) by letting loads execute while earlier stores are not yet ready.
- Victim-cache entries are replaced automatically by newly evicted cache lines, while store-buffer entries are released only when their values are ready and have been committed to memory.
-----
4. Pseudo-associative Cache
The idea of a pseudo-associative cache resembles open addressing for hash-table collisions. It is typically used in caches with many small sets. The main idea is that on a cache miss we probe one more block set at a specific alternate location, obtained by flipping the highest bit of the cache index; for example, index 1101 becomes 0101. In this way, even though we don't change the structure of the cache, we gain some of the benefit of higher associativity.
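The index transformation from the example can be written directly (a 4-bit index is assumed, as in the text):

```python
INDEX_BITS = 4

def alternate_index(index):
    # flip the highest bit of the cache index to get the second probe location
    return index ^ (1 << (INDEX_BITS - 1))

print(format(alternate_index(0b1101), "04b"))   # -> 0101
```

Applying the function twice returns the original index, so the two locations pair up with each other.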