[Relay] Add conv2d_backward_weight op (without topi) #9954

Merged: 27 commits merged into apache:main on Jan 19, 2022

Conversation

@masahi (Member) commented on Jan 18, 2022

This PR adds a Relay op for the gradient of conv2d with respect to weight (wgrad for short). For now, it is implemented using the existing equivalent expression:

grad = tile(grad, [1, in_channel // attrs.groups, 1, 1])
grad = reshape(grad, [-1, 1, 0, 0])  # batch * oc * ic // groups, 1, oh, ow
data = reshape(data, [1, -1, 0, 0])  # 1, batch * ic, ih, iw
# Note: strides and dilation are intentionally swapped in this trick; the
# original conv2d's dilation becomes the stride of the wgrad conv and vice versa.
backward_weight = _nn.conv2d(
    data,
    grad,
    strides=attrs.dilation,
    padding=attrs.padding,
    dilation=attrs.strides,
    groups=in_channel * batch,
)
# infer shape of backward_weight
padded_weight_grad_h = (
    in_h - (grad_h - 1) * stride_h - 1 + fpad_top + fpad_bottom
) // dilation_h + 1
padded_weight_grad_w = (
    in_w - (grad_w - 1) * stride_w - 1 + fpad_left + fpad_right
) // dilation_w + 1
backward_weight = reshape(
    backward_weight,
    [
        batch,
        in_channel // attrs.groups,
        out_channel,
        padded_weight_grad_h,
        padded_weight_grad_w,
    ],
)
backward_weight = _sum(backward_weight, axis=0)
backward_weight = transpose(backward_weight, [1, 0, 2, 3])
assert padded_weight_grad_h >= filter_h
assert padded_weight_grad_w >= filter_w
# The result can be larger than the filter; crop it back to the filter shape.
if padded_weight_grad_h > filter_h or padded_weight_grad_w > filter_w:
    backward_weight = strided_slice(
        backward_weight,
        begin=[0, 0, 0, 0],
        end=[out_channel, in_channel // attrs.groups, filter_h, filter_w],
    )
This expression is used as the target of conv2d_backward_weight op legalization, so no TOPI op has been added for now.
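To make the shape inference above concrete: with the shapes from the NHWC example in the third bullet below (in_h = 32, grad_h = 32, stride 1, dilation 1, padding 1 on each side), padded_weight_grad_h = (32 - (32 - 1) * 1 - 1 + 1 + 1) // 1 + 1 = 3, recovering the 3x3 kernel. For intuition about what wgrad computes, here is a minimal loop-based NumPy sketch; it is only an illustration (groups == 1 and dilation == 1 assumed), not the reference implementation added in this PR:

import numpy as np

def conv2d_backward_weight_ref(dy, x, kernel_size, stride=(1, 1), padding=(0, 0)):
    # dy: (N, OC, OH, OW) gradient of the conv2d output
    # x:  (N, IC, IH, IW) forward input
    # Returns dw with shape (OC, IC, KH, KW). Assumes groups == 1, dilation == 1.
    kh, kw = kernel_size
    sh, sw = stride
    ph, pw = padding
    _, oc, oh, ow = dy.shape
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    dw = np.zeros((oc, x.shape[1], kh, kw), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Input window seen by kernel tap (i, j) at every output position.
            window = xp[:, :, i : i + (oh - 1) * sh + 1 : sh,
                        j : j + (ow - 1) * sw + 1 : sw]
            # dw[o, c, i, j] = sum over n, h, w of dy[n, o, h, w] * window[n, c, h, w]
            dw[:, :, i, j] = np.einsum("nchw,nohw->oc", window, dy)
    return dw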

The motivation for introducing this op is threefold:

  • Both CUTLASS and cuDNN have a dedicated op for wgrad. Having a Relay op that maps one-to-one makes it easy to offload wgrad to these backends.
  • The existing implementation, as a composition of nn.conv2d and other ops, works, but it is likely inefficient: it involves tile(grad, [1, in_channel // attrs.groups, 1, 1]), a larger grouped conv2d workload, and additional post-processing (sum, transpose, slice, etc.). A direct implementation would likely be much faster.
  • The third reason is more subtle, but it is what necessitated this PR. If I want to use cuDNN or CUTLASS wgrad with the NHWC layout, I'd run the convert_layout pass on a backward graph, resulting in:
  %1 = tile(%dy, reps=[1, 4, 1, 1]) /* ty=Tensor[(2, 32, 32, 32), float32] */;
  %2 = reshape(%1, newshape=[-1, 1, 0, 0]) /* ty=Tensor[(64, 1, 32, 32), float32] */;
  %3 = layout_transform(%0, src_layout="NCHW", dst_layout="NHWC") /* ty=Tensor[(1, 32, 32, 8), float32] */;
  %4 = layout_transform(%2, src_layout="OIHW", dst_layout="OHWI") /* ty=Tensor[(64, 32, 32, 1), float32] */;
  %5 = nn.conv2d(%3, %4, padding=[1, 1, 1, 1], groups=8, data_layout="NHWC", kernel_layout="OHWI") /* ty=Tensor[(1, 3, 3, 64), float32] */;
  %6 = layout_transform(%5, src_layout="NHWC", dst_layout="NCHW") /* ty=Tensor[(1, 64, 3, 3), float32] */;
  %7 = reshape(%6, newshape=[2, 4, 8, 3, 3]) /* ty=Tensor[(2, 4, 8, 3, 3), float32] */;
  %8 = sum(%7, axis=[0]) /* ty=Tensor[(4, 8, 3, 3), float32] */;
  transpose(%8, axes=[1, 0, 2, 3]) /* ty=Tensor[(8, 4, 3, 3), float32] */

I cannot pattern match this graph and extract an NHWC conv2d wgrad, since the layout_transform ops sit "too close" to nn.conv2d: I need them to happen before tile and after transpose. This behavior is the strongest reason to want a dedicated wgrad op representation in Relay, as sketched below.
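For contrast, here is a hypothetical sketch of what convert_layout could produce once wgrad is a single op; the argument order and attribute names are assumptions for illustration, not the op's confirmed signature:

  %1 = layout_transform(%dy, src_layout="NCHW", dst_layout="NHWC");
  %2 = layout_transform(%data, src_layout="NCHW", dst_layout="NHWC");
  %3 = nn.conv2d_backward_weight(%1, %2, padding=[1, 1, 1, 1], grad_layout="NHWC", data_layout="NHWC", kernel_layout="OHWI");
  layout_transform(%3, src_layout="OHWI", dst_layout="OIHW")

A pattern that matches nn.conv2d_backward_weight together with its surrounding layout_transform ops would then be straightforward to write.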

cc @vinx13 @tkonolige @comaniac @YuchenJin

@comaniac (Contributor) left a comment:

LGTM

@tkonolige (Contributor) left a comment:

Thanks @masahi!

@masahi merged commit fd5915a into apache:main on Jan 19, 2022
yuanfz98 pushed a commit to yuanfz98/tvm that referenced this pull request Jan 24, 2022
* python plumbing

* add cpp def

* legalize worked

* clean up

* layout conversion doesnt work

* extract wgrad body

* fix convert layout

* black

* fix kernel size

* revert irrelevant change

* add doc, clarify the meanings of parameters

* update layout convert

* test passed

* fixed layout conversion

* update convert layout

* remove print

* remove layout convert for now

* minor fix

* removed unused import

* add wgrad python reference

* add test stub

* add doc

* test other stride and pad

* tweak

* more pylint filter

* fix typo in doc

* swap arg order (data, grad) to be consistent with conv2d_transpose(dgrad)
crazydemo pushed a commit to crazydemo/tvm that referenced this pull request Jan 27, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022