[TIR] add support for multi-blocking layout and their transformation #9996

Merged: 8 commits merged into apache:main on Feb 21, 2022

Conversation

yangulei
Contributor

Main work in this PR includes:

  • add ceildiv() and shapediv() for the cases that need padding.
  • add boundary checking in layout transform.
  • add support for multi-blocking and tensor padding, including the inference of index and shape.

With these enhancements, a tensor with layout="OIHW" can be transformed to a tensor with layout="OHWI16i64o2i", as sketched below.
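A minimal sketch at the TIR level of what this enables (the shapes below are assumptions for illustration, not code from this PR): the destination shape is inferred from the source shape, with padding wherever the 64o / 16i2i blocking does not divide the original extents evenly.

from tvm import tir

# Bijective mapping between the plain layout and the multi-blocked layout.
bijection = tir.bijective_layout("OIHW", "OHWI16i64o2i")

# O=130 and I=30 are assumed extents that the blocking does not divide evenly,
# so the inferred destination shape is expected to be padded up.
print(bijection.forward_shape([130, 30, 3, 3]))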

@masahi
Member

masahi commented Jan 20, 2022

Nice, is it possible to remove the custom layout transform in

kernel_IHWO = relay.transpose(kernel_expr, axes=(1, 2, 3, 0))
kernel_IHWOo = relay.reshape(kernel_IHWO, (in_channel, kh, kw, out_channel // oc_bn, oc_bn))
kernel_OHWoI = relay.transpose(kernel_IHWOo, axes=(3, 1, 2, 4, 0))
kernel_OHWoIi = relay.reshape(
    kernel_OHWoI, (out_channel // oc_bn, kh, kw, oc_bn, in_channel // ic_bn, ic_bn)
)
kernel_OHWoIie = relay.reshape(
    kernel_OHWoIi,
    (out_channel // oc_bn, kh, kw, oc_bn, in_channel // ic_bn, ic_bn // n_elems, n_elems),
)
kernel_OIHWioe = relay.transpose(kernel_OHWoIie, axes=(0, 4, 1, 2, 5, 3, 6))
now? See also this recent discussion https://discuss.tvm.apache.org/t/do-we-still-need-relay-nn-contrib-conv2d-nchwc-op/11903
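For reference, the chain above produces a kernel blocked as (out_channel // oc_bn, in_channel // ic_bn, kh, kw, ic_bn // n_elems, oc_bn, n_elems). A hedged sketch of what it could collapse into once multi-blocking layouts are handled (assuming oc_bn = 16, ic_bn = 16, n_elems = 4, an illustrative kernel shape, and that relay.layout_transform accepts the blocked destination string):

from tvm import relay

# Plain OIHW kernel; the shape is illustrative only.
kernel_expr = relay.var("kernel", shape=(64, 64, 3, 3), dtype="int8")
# One layout_transform in place of the transpose/reshape chain above.
kernel_packed = relay.layout_transform(kernel_expr, src_layout="OIHW", dst_layout="OIHW4i16o4i")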

@masahi masahi self-assigned this Jan 20, 2022
@yangulei
Contributor Author

@masahi Thanks for your timely comment.
Yes, the code section you referred to is doing a multi-blocking layout transformation, which is exactly what this PR is for.
And I don't think we need relay.nn.contrib_conv2d_NCHWc either, since it's just the result of a simple layout transform from a regular conv2d.

@masahi
Member

masahi commented Jan 20, 2022

I'm more than happy to see those ad-hoc transforms removed; can you do that? I'd also like to remove conv2d_nchwc entirely, but that would be a breaking change and needs further discussion.

@yangulei
Contributor Author

Yeah, it's a pleasure to make the code clean and clear. I'll search the code for workarounds related to multi-blocking and refine them.

@masahi
Member

masahi commented Jan 20, 2022

Sounds great!

I'll search the code for workarounds related to multi-blocking and refine them

I think the one we already discussed is the only instance of such a workaround.

* \note this function does eager constant folding for
* shape types(int32, int64) when possible.
*/
TVM_DLL PrimExpr shapediv(PrimExpr a, PrimExpr b, Span span = Span());
Member


the name shapediv could be confusing given that indexdiv is used for the other case; how about nonneg_ceildiv?

Contributor Author


I think it's a kind of symmetry: indexdiv is an alias of floordiv to prevent out-of-boundary access, and shapediv is an alias of ceildiv to prevent a Tensor from shrinking.
If this is confusing, I would prefer to remove indexdiv and shapediv, since they are just aliases of floordiv and ceildiv now. Or we can keep them, add checks for non-negativity, and rename them to nonneg_floordiv/ceildiv.
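A small numeric illustration of the intended symmetry (values chosen for illustration only): indexdiv/floordiv maps an element index into its block without stepping out of bounds, while shapediv/ceildiv computes how many blocks are needed so the padded tensor never shrinks.

extent, block = 130, 64
num_blocks = -(-extent // block)  # ceildiv / shapediv: 3 blocks, padded extent 192
block_index = 100 // block        # floordiv / indexdiv: element 100 falls in block 1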

@tqchen
Member

tqchen commented Jan 20, 2022

also cc @yzhliu @comaniac @vinx13

@masahi
Member

masahi commented Feb 8, 2022

@yangulei Any update? I'm doing some VNNI stuff, and I want to do a transform like

packedW = te.placeholder((n // 16, 16 * (k // 4), 4), name="packedW", dtype="int8")

@yangulei
Contributor Author

@masahi The workaround we talked about has been removed in 1c7ef99.
The transformation you mentioned doesn't seem to be blocking only. Could this test be written in a way that includes a transformation like "NK" to "NK16n4k"?

@masahi
Member

masahi commented Feb 10, 2022

Yeah I think that's possible. If I understand correctly, the point is to have a 16 x 4 loop in the inner-most axes, so something like (n // 16, k // 4, 16, 4) should be ok as well. Does this answer your question?

@yangulei
Contributor Author

Yes, the transformation from "NK" to "NK16n4k" is well supported, even without this PR.
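A quick sketch of that mapping with illustrative extents; the inferred destination shape matches the (n // 16, k // 4, 16, 4) loop structure mentioned above for the VNNI kernel.

from tvm import tir

bijection = tir.bijective_layout("NK", "NK16n4k")
# [64, 128] is an assumed source shape; the expected result is [4, 32, 16, 4].
print(bijection.forward_shape([64, 128]))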

@yangulei yangulei force-pushed the upstream_layout branch 2 times, most recently from d023ec1 to aab406b on February 17, 2022 09:01

@masahi masahi left a comment


LGTM modulo one clarifying question

@masahi masahi merged commit 8d76075 into apache:main Feb 21, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…pache#9996)

* add ceildiv and shapediv

* add boundary checking in layout_transform

* support multi-blocking and shape padding

* refine the log for shape transform

* add test for multi-blocking layout transform

* delete unwanted comments

* remove workaround

* fix lint errors