
[MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA #12059

Merged

Conversation

@vinx13 (Member) commented Jul 11, 2022

@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch 5 times, most recently from 2c85011 to 14bd9ee on July 11, 2022 21:39
…tion on CUDA

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 14bd9ee to 8dd2de5 on July 11, 2022 22:19
@masahi (Member) left a comment:

I haven't started looking at multi_level_tiling_tensor_core.cc yet.

How about providing an integration test to demonstrate that auto-tensorization on cuda works now?

include/tvm/meta_schedule/schedule_rule.h (outdated review thread, resolved)
python/tvm/meta_schedule/testing/schedule_rule.py (outdated review thread, resolved)
LOG(WARNING) << "Tensorize failed with error " << e.what();
}
});
} else if (block_name.find("init") && vectorize_init_loop) {
Member commented:

Do we ever hit this condition after your change in rewrite_reduction_block.cc?

To vectorize init loop, should we switch to using tir::attr::meta_schedule_auto_tensorize_init?

@vinx13 (Member, Author) replied Jul 12, 2022:

In rewrite_reduction_block, tir::attr::meta_schedule_auto_tensorize will be removed from the init block by default, unless the original reduction block is annotated with tir::attr::meta_schedule_auto_tensorize_init. In that case, tir::attr::meta_schedule_auto_tensorize_init is renamed to tir::attr::meta_schedule_auto_tensorize, so that rewrite_tensorize only needs to check a single annotation. However, I hit another issue: block_name.find("init") is not safe. I changed the logic here a bit; let me know if that makes sense to you.
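For readers outside the thread, a minimal standalone C++ snippet (not code from this PR) shows why a bare block_name.find("init") check is unsafe: std::string::find returns the match position on success and std::string::npos on failure, and npos is a large non-zero value that converts to true, so the condition is true whether or not "init" is present (and would even be false for a match at position 0).

#include <cassert>
#include <string>

int main() {
  std::string init_block = "matmul_init";    // contains "init" at a non-zero position
  std::string main_block = "matmul_update";  // does not contain "init" at all

  // Both conditions hold: a hit at a non-zero position is non-zero,
  // and a miss returns npos, which is also non-zero.
  assert(init_block.find("init"));
  assert(main_block.find("init"));

  // The intended membership test must compare against npos explicitly.
  assert(init_block.find("init") != std::string::npos);
  assert(main_block.find("init") == std::string::npos);
  return 0;
}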

src/meta_schedule/schedule_rule/multi_level_tiling.h (two outdated review threads, resolved)
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 8bd7a05 to 806c890 on July 12, 2022 19:19
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 806c890 to 826a3fe on July 12, 2022 19:20
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch 2 times, most recently from a3472de to b2faca6 on July 12, 2022 22:52
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from b2faca6 to d27dd7b on July 12, 2022 23:53
python/tvm/meta_schedule/testing/schedule_rule.py (outdated review thread, resolved)
@@ -110,6 +112,32 @@ def multi_level_tiling(target: Target) -> ScheduleRule:
    raise NotImplementedError(f"{target.kind.name} is not supported")


def multi_level_tiling_tensor_core(
    target: Target, scope="shared", in_dtype="float16", out_dtype="float32", trans_b=False
Member commented:

Needs doc on what scope is. Or just rename it to reuse_scope or something.

Do read and write always use the same scope?

@vinx13 (Member, Author) replied:

It's the write scope here, but I think we also need a read scope param to support different read scopes.

python/tvm/tir/tensor_intrin/cuda.py (outdated review thread, resolved)
};

f_tensorize_load(0, "wmma.matrix_a", intrin_group.load_a_intrin);
f_tensorize_load(1, "wmma.matrix_b", intrin_group.load_b_intrin);
Member commented:

Can we infer the scope from the provided intrinsic? Otherwise I think we need to associate scope information to intrinsics somehow.

Member commented:

This can be left for future work as long as there is a clear solution. I imagine we can traverse and examine the intrinsic PrimFunc to extract scope information, at worst.
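To make that idea concrete, here is a rough sketch assuming TVM's C++ TIR API (the helper name ExtractBufferScopes is made up and not part of this PR): walk the intrinsic PrimFunc's buffer_map and read each bound buffer's storage scope.

#include <tvm/tir/buffer.h>
#include <tvm/tir/function.h>

#include <vector>

// Hypothetical helper: collect the storage scope of every buffer bound in the
// intrinsic PrimFunc's buffer_map, in parameter order.
std::vector<tvm::String> ExtractBufferScopes(const tvm::tir::PrimFunc& intrin_desc) {
  std::vector<tvm::String> scopes;
  for (const tvm::tir::Var& param : intrin_desc->params) {
    if (intrin_desc->buffer_map.count(param)) {
      tvm::tir::Buffer buffer = intrin_desc->buffer_map.at(param);
      scopes.push_back(buffer.scope());  // e.g. "shared", "wmma.matrix_a"
    }
  }
  return scopes;
}

The scopes recovered this way could then be matched against the load intrinsics in the group instead of hard-coding "wmma.matrix_a" and "wmma.matrix_b" at the f_tensorize_load call sites.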

@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from cde325a to 65bbba6 on July 13, 2022 18:04
src/tir/schedule/analysis.h (outdated review thread, resolved)
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch 2 times, most recently from f4b585e to 5ad0386 on July 13, 2022 21:09
@vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 5ad0386 to c4269e7 on July 13, 2022 22:28
@vinx13 merged commit e084791 into apache:main on Jul 14, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…tion on CUDA (apache#12059)

* [MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>

* address comments

* update intrin registrations

* fix tests

* address comments

* add warning when storage align doesn't work

* remove print

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023