[TIR] add support for multi-blocking layout and their transformation #9996

Merged: 8 commits merged into apache:main on Feb 21, 2022

Conversation

yangulei
Contributor

Main work in this PR includes:

  • add ceildiv() and shapediv() for the cases that need padding.
  • add boundary checking in layout transform.
  • add support for multi-blocking and tensor padding, including the inference of index and shape.

With these enhancements, a tensor with layout="OIHW" can be transformed to a tensor with layout="OHWI16i64o2i", as sketched below.
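A minimal sketch at the TIR level of what this enables (the shapes below are assumptions for illustration, not code from this PR): the destination shape is inferred from the source shape, with padding wherever the 64o / 16i2i blocking does not divide the original extents evenly.

from tvm import tir

# Bijective mapping between the plain layout and the multi-blocked layout.
bijection = tir.bijective_layout("OIHW", "OHWI16i64o2i")

# O=130 and I=30 are assumed extents that the blocking does not divide evenly,
# so the inferred destination shape is expected to be padded up.
print(bijection.forward_shape([130, 30, 3, 3]))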

@masahi
Member

masahi commented Jan 20, 2022

Nice, is it possible to remove the custom layout transform in

kernel_IHWO = relay.transpose(kernel_expr, axes=(1, 2, 3, 0))
kernel_IHWOo = relay.reshape(kernel_IHWO, (in_channel, kh, kw, out_channel // oc_bn, oc_bn))
kernel_OHWoI = relay.transpose(kernel_IHWOo, axes=(3, 1, 2, 4, 0))
kernel_OHWoIi = relay.reshape(
    kernel_OHWoI, (out_channel // oc_bn, kh, kw, oc_bn, in_channel // ic_bn, ic_bn)
)
kernel_OHWoIie = relay.reshape(
    kernel_OHWoIi,
    (out_channel // oc_bn, kh, kw, oc_bn, in_channel // ic_bn, ic_bn // n_elems, n_elems),
)
kernel_OIHWioe = relay.transpose(kernel_OHWoIie, axes=(0, 4, 1, 2, 5, 3, 6))
now? See also this recent discussion https://discuss.tvm.apache.org/t/do-we-still-need-relay-nn-contrib-conv2d-nchwc-op/11903
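For reference, the chain above produces a kernel blocked as (out_channel // oc_bn, in_channel // ic_bn, kh, kw, ic_bn // n_elems, oc_bn, n_elems). A hedged sketch of what it could collapse into once multi-blocking layouts are handled (assuming oc_bn = 16, ic_bn = 16, n_elems = 4, an illustrative kernel shape, and that relay.layout_transform accepts the blocked destination string):

from tvm import relay

# Plain OIHW kernel; the shape is illustrative only.
kernel_expr = relay.var("kernel", shape=(64, 64, 3, 3), dtype="int8")
# One layout_transform in place of the transpose/reshape chain above.
kernel_packed = relay.layout_transform(kernel_expr, src_layout="OIHW", dst_layout="OIHW4i16o4i")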

@masahi masahi self-assigned this Jan 20, 2022
@yangulei
Contributor Author

@masahi Thanks for your timely comment.
Yes, the code section you referred to is doing a multi-blocking layout transformation, which is exactly what this PR is for.
And I don't think we need relay.nn.contrib_conv2d_NCHWc either, since it's just the result of a simple layout transform from a regular conv2d.

@masahi
Member

masahi commented Jan 20, 2022

I'm more than happy to see those ad-hoc transforms removed; can you do that? I'd also like to remove conv2d_nchwc entirely, but that would be a breaking change and needs further discussion.

@yangulei
Contributor Author

Yeah, it's a pleasure to make the code clean and clear. I'll search the code for workarounds related to multi-blocking and refine them.

@masahi
Member

masahi commented Jan 20, 2022

Sounds great!

I'll search the code for workarounds related to multi-blocking and refine them

I think the one we already discussed is the only instance of such a workaround.

* \note this function does eager constant folding for
* shape types(int32, int64) when possible.
*/
TVM_DLL PrimExpr shapediv(PrimExpr a, PrimExpr b, Span span = Span());
Member


the name shapediv could be confusing given that indexdiv is used for the other case; how about nonneg_ceildiv?

Contributor Author


I think it's a kind of symmetry: indexdiv is an alias of floordiv to prevent out-of-boundary access, and shapediv is an alias of ceildiv to prevent a Tensor from shrinking.
If this is confusing, I would prefer to remove indexdiv and shapediv, since they are just aliases of floordiv and ceildiv now. Or we can keep them, add checks for non-negativity, and rename them to nonneg_floordiv/ceildiv.
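A small numeric illustration of the intended symmetry (values chosen for illustration only): indexdiv/floordiv maps an element index into its block without stepping out of bounds, while shapediv/ceildiv computes how many blocks are needed so the padded tensor never shrinks.

extent, block = 130, 64
num_blocks = -(-extent // block)  # ceildiv / shapediv: 3 blocks, padded extent 192
block_index = 100 // block        # floordiv / indexdiv: element 100 falls in block 1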

@tqchen
Member

tqchen commented Jan 20, 2022

also cc @yzhliu @comaniac @vinx13

@masahi
Member

masahi commented Feb 8, 2022

@yangulei Any update? I'm doing some VNNI stuff, and I want to do a transform like

packedW = te.placeholder((n // 16, 16 * (k // 4), 4), name="packedW", dtype="int8")

@yangulei
Contributor Author

@masahi The workaround we talked about has been removed in 1c7ef99.
The transformation you mentioned doesn't seem to be blocking only. Could this test be written in a way that includes a transformation like "NK" to "NK16n4k"?

@masahi
Member

masahi commented Feb 10, 2022

Yeah I think that's possible. If I understand correctly, the point is to have a 16 x 4 loop in the inner-most axes, so something like (n // 16, k // 4, 16, 4) should be ok as well. Does this answer your question?

@yangulei
Contributor Author

Yes, the transformation from "NK" to "NK16n4k" is well supported, even without this PR.
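A quick sketch of that mapping with illustrative extents; the inferred destination shape matches the (n // 16, k // 4, 16, 4) loop structure mentioned above for the VNNI kernel.

from tvm import tir

bijection = tir.bijective_layout("NK", "NK16n4k")
# [64, 128] is an assumed source shape; the expected result is [4, 32, 16, 4].
print(bijection.forward_shape([64, 128]))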

@yangulei yangulei force-pushed the upstream_layout branch 2 times, most recently from d023ec1 to aab406b on February 17, 2022 09:01

@masahi masahi left a comment


LGTM modulo one clarifying question

@masahi masahi merged commit 8d76075 into apache:main Feb 21, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…pache#9996)

* add ceildiv and shapediv

* add boundary checking in layout_transform

* support multi-blocking and shape padding

* refine the log for shape transform

* add test for multi-blocking layout transform

* delete unwanted comments

* remove workaround

* fix lint errors