
Commit 37bf9bc

Address review comments
1 parent 59c3629 commit 37bf9bc

6 files changed: 92 additions & 91 deletions

CodingConventions.md

Lines changed: 1 addition & 1 deletion
@@ -594,4 +594,4 @@ Coding Conventions for writing Tensor Comprehensions
 
 Please see the following documentation
 [entry](https://facebookresearch.github.io/TensorComprehensions/coding_conventions.html)
-on how to write Tensor Comprehensions in a standard, legible, fashion.
+on how to write Tensor Comprehensions in a standard legible fashion.

docs/doxygen/index.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ Let's start with a simple example is a matrix vector product:
 `A` and `x` are input tensors. `o` is an output tensor.
 The statement `o(r) +=! A(r,r_c) * x(r_c)` introduces two index variables `r` and `r_c`.
 Their range is inferred by their use indexing `A` and `x`. `r = [0,R)`, `r_c = [0,C)`.
-Because `r_c` only appears on the right side,
+Because `r_c` only appears on the righthand side,
 stores into `o` will reduce over `r_c` with the reduction specified for the loop.
 Reductions can occur across multiple variables, but they all share the same kind of associative reduction (e.g. +=)
 to maintain invariant (3). `mv` computes the same thing as this C++ loop:
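The loop itself lies outside this hunk; as a point of reference, a minimal C++ sketch of the loop nest implied by the semantics described above (names and container types are illustrative, not taken from the documentation):

#include <vector>

// mv: o(r) +=! A(r,r_c) * x(r_c), with inferred ranges r in [0,R) and r_c in [0,C)
void mv(const std::vector<std::vector<float>>& A,  // R x C input
        const std::vector<float>& x,               // C-element input
        std::vector<float>& o) {                   // R-element output, preallocated
    const size_t R = A.size();
    const size_t C = x.size();
    for (size_t r = 0; r < R; ++r) {
        o[r] = 0.0f;                               // `+=!` zero-initializes the reduction
        for (size_t r_c = 0; r_c < C; ++r_c) {
            o[r] += A[r][r_c] * x[r_c];            // reduce over r_c with +
        }
    }
}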

docs/source/coding_conventions.rst

Lines changed: 18 additions & 17 deletions
@@ -3,7 +3,7 @@ Coding Conventions
 
 In order to increase readability across Tensor Comprehensions written by
 multiple authors and to reduce the amount of surprising behavior, the
-following conventions should be adopted when writing TC. Generally in TC one
+following conventions should be adopted when writing TC. Generally in TC, one
 should increment nesting by 4 whitespaces at each level and align tensor names
 and indices where appropriate to make memory access patterns emerge. Since
 these two goals can easily be conflicting, use your best judgement to tradeoff
@@ -12,7 +12,7 @@ between the two goals. Such examples are provided below.
 Use indices named after parameters
 ----------------------------------
 
-Use upper-case names for parameters and input/output tensors.
+Use upper-case names for parameters and capital-case names for input/output tensors.
 Use lower-case names for indices to match the name of the parameter
 corresponding to the dimension upon which they iterate.
 In other words, prefer:
@@ -55,46 +55,47 @@ to:
 C(m, n) +=! A(m, k) * B(k, n)
 }
 
-Filter non-rectangular regions with deta-dependencies
+Filter non-rectangular regions with data-dependencies
 -----------------------------------------------------
 
-TC semantics only support (hyper-)rectangular iteration spaces. This is a hard
-requirement to make range inference non-ambiguous. To simulate non-rectangular
-iteration spaces, one can use the following:
+TC semantics are restricted to (hyper-)rectangular iteration spaces.
+This is a hard requirement to ensure range inference is non-ambiguous (see inference_).
+To simulate non-rectangular iteration spaces, one can use the following:
 
 .. code::
 
 def matmul(float(M, K) L, float(K, M) U) -> (LU) {
 LU(m1, m2) +=! (r_k >= m1 and r_k =< m2) ? L(m1, r_k) * U(r_k, m2) : 0
 }
 
-However, the following is incompatible with range inference and will fail
-the semantic checks in the TC compiler:
+However, non-(hyper)-rectangular iteration spaces (e.g. triangular) are
+incompatible with range inference and will fail the semantic checks in the TC
+compiler:
 
 .. code::
 
 def matmul(float(M, K) L, float(K, M) U) -> (LU) {
 LU(m1, m2) +=! L(m1, r_k) * U(r_k, m2) where r_k in m1:M, r_k in 0:m2+1
 }
 
-The reader may remark that this is an inefficient way of performing
+The reader may remark that this is an inefficient way of writing
 matrix-multiplication of triangular matrices.
-Lowering such operations efficient from TC is the subject of future work.
+Lowering such operations efficiently from TC is the subject of future work.
 
-Prefix gradient tensors names with :code:`g_`
+Prefix gradient tensors names with :code:`d_`
 ---------------------------------------------
 
 When implementing backward operations, pass the inputs to the backwards pass
-in the same order as the outputs to the forward passs and use the same tensor
-name prefixed by :code:`g_`. For instance:
+in the same order as the outputs of the forward pass and use the same tensor
+name prefixed by :code:`d_`. For instance:
 
 .. code::
 
 def conv(float(N,C,H,W) I, float(M,C,KH,KW) Wt) -> (O) {
 ...
 }
-def conv_bw(float(N,C,H,W) I, float(M,C,KH,KW) Wt, float(N,M,HO,WO) g_O) -> (g_I) {
+def conv_bw(float(N,C,H,W) I, float(M,C,KH,KW) Wt, float(N,M,HO,WO) d_O) -> (d_I) {
 ...
 }
 
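To make the rectangular-iteration requirement concrete, here is a minimal C++ sketch (illustrative, not part of this commit) of what the guarded `matmul` in the hunk above computes: the reduction still runs over the full rectangular range of `r_k`, and the data-dependent condition merely zeroes the out-of-band terms.

#include <vector>

// Sketch of: LU(m1, m2) +=! (r_k >= m1 and r_k =< m2) ? L(m1, r_k) * U(r_k, m2) : 0
void triangular_matmul(const std::vector<std::vector<float>>& L,   // M x K
                       const std::vector<std::vector<float>>& U,   // K x M
                       std::vector<std::vector<float>>& LU) {      // M x M, preallocated
    const size_t M = L.size();
    const size_t K = M > 0 ? L[0].size() : 0;
    for (size_t m1 = 0; m1 < M; ++m1) {
        for (size_t m2 = 0; m2 < M; ++m2) {
            float acc = 0.0f;                       // `+=!` zero-initializes the reduction
            for (size_t r_k = 0; r_k < K; ++r_k) {  // full rectangular range, not m1..m2
                acc += (r_k >= m1 && r_k <= m2) ? L[m1][r_k] * U[r_k][m2] : 0.0f;
            }
            LU[m1][m2] = acc;
        }
    }
}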
@@ -110,9 +111,9 @@ and the emergence of an antidiagonal pattern in the reduction accesses:
 def matmul(float(M,K) A, float(K,N) B) -> (C) {
 C(m, n) +=! A(m, r_k) * B(r_k, n)
 }
-def matmul_bw(float(M,K) A, float(K,N) B, float(M,N) g_C) -> (g_A, g_B){
-g_A(m, k) +=! g_C( m, r_n) * B( k, r_n)
-g_B(k, n) +=! g_C(r_m, n) * A(r_m, k)
+def matmul_bw(float(M,K) A, float(K,N) B, float(M,N) d_C) -> (d_A, d_B){
+d_A(m, k) +=! d_C( m, r_n) * B( k, r_n)
+d_B(k, n) +=! d_C(r_m, n) * A(r_m, k)
 }
 
 Reasoning on such reduction patterns at the level of TC has already proven
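As a quick check of the backward definition above (a standard chain-rule derivation, not part of this commit): writing d_X for the gradient of a scalar loss with respect to X, the forward definition

C(m, n) = \sum_{r_k} A(m, r_k) \, B(r_k, n)

gives

d_A(m, k) = \sum_{r_n} d_C(m, r_n) \, B(k, r_n), \qquad d_B(k, n) = \sum_{r_m} d_C(r_m, n) \, A(r_m, k),

which is exactly the `d_A` and `d_B` updates written in TC, with `r_n` and `r_m` carrying the reductions.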

docs/source/framework/pytorch_integration/autograd_with_tc.rst

Lines changed: 14 additions & 14 deletions
@@ -29,9 +29,9 @@ Examples
 def convolution(float(N,C,H,W) I, float(M,C,KH,KW) W1) -> (O) {{
 O(n, m, h, w) +=! I(n, r_c, {sh} * h + r_kh, {sw} * w + r_kw) * W1(m, r_c, r_kh, r_kw)
 }}
-def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) g_O) -> (g_I, g_W1) {{
-g_I(n, c, h, w) +=! g_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
-g_W1(m, c, kh, kw) +=! g_O(r_n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
+def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) d_O) -> (d_I, d_W1) {{
+d_I(n, c, h, w) +=! d_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
+d_W1(m, c, kh, kw) +=! d_O(r_n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
 }}
 """
 N, C, H, W, O, kH, kW, sH, sW = 32, 4, 56, 56, 16, 1, 1, 1, 1
@@ -68,9 +68,9 @@ them, the example for that would be:
 def convolution(float(N,C,H,W) I, float(M,C,KH,KW) W1) -> (O) {{
 O(n, m, h, w) +=! I(n, r_c, {sh} * h + r_kh, {sw} * w + r_kw) * W1(m, r_c, r_kh, r_kw)
 }}
-def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) g_O) -> (g_I, g_W1) {{
-g_I(n, c, h, w) +=! g_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
-g_W1(m, c, kh, kw) +=! g_O(r_n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
+def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) d_O) -> (d_I, d_W1) {{
+d_I(n, c, h, w) +=! d_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
+d_W1(m, c, kh, kw) +=! d_O(r_n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
 }}
 """
 N, C, H, W, O, kH, kW, sH, sW = 32, 4, 56, 56, 16, 1, 1, 1, 1
@@ -102,9 +102,9 @@ Let's see how to cache options to file when we tune a training layer.
 def convolution(float(N,C,H,W) I, float(M,C,KH,KW) W1) -> (O) {{
 O(n, m, h, w) +=! I(n, r_c, {sh} * h + r_kh, {sw} * w + r_kw) * W1(m, r_c, r_kh, r_kw)
 }}
-def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) g_O) -> (g_I, g_W1) {{
-g_I(n, c, h, w) +=! g_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
-g_W1(m, c, kh, kw) +=! g_O(r_n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
+def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) d_O) -> (d_I, d_W1) {{
+d_I(n, c, h, w) +=! d_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
+d_W1(m, c, kh, kw) +=! d_O(r_n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
 }}
 """
 N, C, H, W, O, kH, kW, sH, sW = 32, 4, 56, 56, 16, 1, 1, 1, 1
@@ -136,11 +136,11 @@ the example below for how to use it:
 tmp(n, m, h, w) +=! I(n, r_c, h + r_kh, w + r_kw) * W1(m, r_c, r_kh, r_kw)
 O(n, m, h, w) = tmp(n, m, h, w) + B(m)
 }
-def convolution_grad(float(N, C, H, W) I, float(M, C, KH, KW) W1, float(M) B, float(N, M, H, W) g_O)
--> (g_I, g_W1, g_B) {
-g_I(n, c, h, w) +=! g_O( n, r_m, h - r_kh, w - r_kw) * W1(r_m, c, r_kh, r_kw)
-g_W1(m, c, kh, kw) +=! g_O(r_n, m, r_h - kh, r_w - kw) * I(r_n, c, r_h, r_w)
-g_B(m) +=! g_O(n, m, h, w)
+def convolution_grad(float(N, C, H, W) I, float(M, C, KH, KW) W1, float(M) B, float(N, M, H, W) d_O)
+-> (d_I, d_W1, d_B) {
+d_I(n, c, h, w) +=! d_O( n, r_m, h - r_kh, w - r_kw) * W1(r_m, c, r_kh, r_kw)
+d_W1(m, c, kh, kw) +=! d_O(r_n, m, r_h - kh, r_w - kw) * I(r_n, c, r_h, r_w)
+d_B(m) +=! d_O(n, m, h, w)
 }
 """
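As a sanity check on the `convolution_grad` definitions above (a standard stride-1 derivation, not part of this commit): with d_X denoting the gradient of a scalar loss with respect to X, the forward definition

O(n, m, h, w) = \sum_{r_c, r_kh, r_kw} I(n, r_c, h + r_kh, w + r_kw) \, W1(m, r_c, r_kh, r_kw)

yields

d_I(n, c, h, w) = \sum_{r_m, r_kh, r_kw} d_O(n, r_m, h - r_kh, w - r_kw) \, W1(r_m, c, r_kh, r_kw)

d_W1(m, c, kh, kw) = \sum_{r_n, r_h, r_w} d_O(r_n, m, r_h - kh, r_w - kw) \, I(r_n, c, r_h, r_w)

which matches the TC above for sh = sw = 1; the {sh} and {sw} placeholders generalize the index arithmetic to strided convolution.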

docs/source/framework/pytorch_integration/layers_database.rst

Lines changed: 39 additions & 39 deletions
@@ -32,8 +32,8 @@ Average pooling
 
 .. code::
 
-def avgpool(float(B, C, H, W) input) -> (output) {{
-output(b, c, h, w) +=! input(b, c, h * {sH} + r_kh, w * {sW} + r_kw) / ({kH} * {kW})
+def avgpool(float(B, C, H, W) Input) -> (Output) {{
+Output(b, c, h, w) +=! Input(b, c, h * {sH} + r_kh, w * {sW} + r_kw) / ({kH} * {kW})
 where r_kh in 0:{kH}, r_kw in 0:{kW}
 }}
 
@@ -43,8 +43,8 @@ Max pooling
 
 .. code::
 
-def maxpool(float(B, C, H, W) input) -> (output) {{
-output(b, c, h, w) max=! input(b, c, h * {sH} + r_kh, w * {sW} + r_kw)
+def maxpool(float(B, C, H, W) Input) -> (Output) {{
+Output(b, c, h, w) max=! Input(b, c, h * {sH} + r_kh, w * {sW} + r_kw)
 where r_kh in 0:{kH}, r_kw in 0:{kW}
 }}
 
@@ -76,9 +76,9 @@ Strided Convolution Gradient
 
 .. code::
 
-def convolution_grad(float(N, C, H, W) I, float(M, C, KH, KW) W1, float(N, M, H, W) g_O) -> (g_I, g_W1) {{
-g_I(n, c, h, w) +=! g_O(n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
-g_W1(m, c, kh, kw) +=! g_O(n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
+def convolution_grad(float(N, C, H, W) I, float(M, C, KH, KW) W1, float(N, M, H, W) d_O) -> (d_I, d_W1) {{
+d_I(n, c, h, w) +=! d_O(n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
+d_W1(m, c, kh, kw) +=! d_O(n, m, {sh} * r_h - kh, {sw} * r_w - kw) * I(r_n, c, r_h, r_w)
 }}
 
 Simple Group Convolution
@@ -140,11 +140,11 @@ Softmax
 
 .. code::
 
-def softmax(float(N, D) I) -> (O, maxVal, expDistance, expSum) {
-maxVal(n) max=! I(n, d)
-expDistance(n, d) = exp(I(n, d) - maxVal(n))
-expSum(n) +=! expDistance(n, d)
-O(n, d) = expDistance(n, d) / expSum(n)
+def softmax(float(N, D) I) -> (O, MaxVal, ExpDistance, ExpSum) {
+MaxVal(n) max=! I(n, d)
+ExpDistance(n, d) = exp(I(n, d) - MaxVal(n))
+ExpSum(n) +=! ExpDistance(n, d)
+O(n, d) = ExpDistance(n, d) / ExpSum(n)
 }
 
 Tanh
@@ -191,9 +191,9 @@ Matmul Gradient
 
 .. code::
 
-def matmul_bw(float(M,K) A, float(K,N) B, float(M,N) g_C) -> (g_A, g_B){
-g_A(m, k) +=! g_C( m, r_n) * B( k, r_n)
-g_B(k, n) +=! g_C(r_m, n) * A(r_m, k)
+def matmul_bw(float(M,K) A, float(K,N) B, float(M,N) d_C) -> (d_A, d_B){
+d_A(m, k) +=! d_C( m, r_n) * B( k, r_n)
+d_B(k, n) +=! d_C(r_m, n) * A(r_m, k)
 }
 
 Batch Matmul
@@ -219,8 +219,8 @@ Add
 
 .. code::
 
-def add(float(N) A, float(N) B) -> (output) {
-output(n) = A(n) + B(n)
+def add(float(N) A, float(N) B) -> (Output) {
+Output(n) = A(n) + B(n)
 }
 
 Tensor Operations
@@ -231,8 +231,8 @@ Indexing
 
 .. code::
 
-def indexing(float(H, W) input, int32(L) index) -> (output) {{
-output(l, w) = input(index(l), w)
+def indexing(float(H, W) Input, int32(L) Index) -> (Output) {{
+Output(l, w) = Input(Index(l), w)
 }}
 
 Lookup Table
@@ -327,17 +327,17 @@ Batch Normalization
 
 .. code::
 
-def batchnorm(float(N,C,H,W) I, float(C) rMeanIn, float(C) rVarIn)
--> (O, rMeanOut, rVarOut, mean, centered, variance, expectedVariance, normalizedOut)
+def batchnorm(float(N,C,H,W) I, float(C) RMeanIn, float(C) RVarIn)
+-> (O, RMeanOut, RVarOut, Mean, Centered, Variance, ExpectedVariance, normalizedOut)
 {{
-mean(c) +=! I(nn, c, hh, ww)
-mean(c) = mean(c) / (N * H * W)
-rMeanOut(c) = (1 - {momentum}) * rMeanIn(c) + {momentum} * mean(c)
-centered(n, c, h, w) = I(n, c, h, w) - rMeanOut(c)
-variance(n, c, h, w) = centered(n, c, h, w) * centered(n, c, h, w)
-expectedVariance(c) +=! (variance(n, c, h, w) + {eps}) / (N * H * W)
-rVarOut(c) = rsqrt((1 - {momentum}) * rVarIn(c) + {momentum} * expectedVariance(c))
-O(n, c, h, w) = centered(n, c, h, w) * rVarOut(c)
+Mean(c) +=! I(nn, c, hh, ww)
+Mean(c) = Mean(c) / (N * H * W)
+RMeanOut(c) = (1 - {momentum}) * RMeanIn(c) + {momentum} * Mean(c)
+Centered(n, c, h, w) = I(n, c, h, w) - RMeanOut(c)
+Variance(n, c, h, w) = Centered(n, c, h, w) * Centered(n, c, h, w)
+ExpectedVariance(c) +=! (Variance(n, c, h, w) + {eps}) / (N * H * W)
+RVarOut(c) = rsqrt((1 - {momentum}) * RVarIn(c) + {momentum} * ExpectedVariance(c))
+O(n, c, h, w) = Centered(n, c, h, w) * RVarOut(c)
 normalizedOut(n, c, h, w) = O(n, c, h, w)
 }}
 
@@ -346,12 +346,12 @@ Layer Normalization
 
 .. code::
 
-def layernorm(float(T, B, C) I) -> (O, mean, centered, var) {{
-mean(t, b) +=! I(t, b, c) / C
-centered(t, b, c) = I(t, b, c) - mean(t, b)
-var(t, b) +=! centered(t, b, c) * centered(t, b, c)
-var(t, b) = (var(t, b) + {eps}) / C
-O(t, b, c) = centered(t, b, c) / rsqrt(var(t, b))
+def layernorm(float(T, B, C) I) -> (O, Mean, Centered, Var) {{
+Mean(t, b) +=! I(t, b, c) / C
+Centered(t, b, c) = I(t, b, c) - Mean(t, b)
+Var(t, b) +=! Centered(t, b, c) * Centered(t, b, c)
+Var(t, b) = (Var(t, b) + {eps}) / C
+O(t, b, c) = Centered(t, b, c) / rsqrt(Var(t, b))
 }}
 
 Distance Functions
@@ -362,10 +362,10 @@ Cosine Similarity
 
 .. code::
 
-def cosine_similarity(float(M, N) I1, float(M, N) I2) -> (O, sumI1, sumI2) {{
-sumI1(m) +=! I1(m, n) * I1(m, n)
-sumI2(m) +=! I2(m, n) * I2(m, n)
-O(m) +=! (I1(m, n) * I2(m, n)) / fmax(rsqrt(sumI1(m)) * sqrt(sumI2(m)), {eps})
+def cosine_similarity(float(M, N) I1, float(M, N) I2) -> (O, SumI1, SumI2) {{
+SumI1(m) +=! I1(m, n) * I1(m, n)
+SumI2(m) +=! I2(m, n) * I2(m, n)
+O(m) +=! (I1(m, n) * I2(m, n)) / fmax(rsqrt(SumI1(m)) * sqrt(SumI2(m)), {eps})
 }}
 
 What operations can not be expressed
