
[TIR][VM] Revert a change to lower_tvm_builtin.cc from #6126 #8274

Merged: 2 commits merged into apache:main on Sep 9, 2021

Conversation

@mbrookhart (Contributor) opened this pull request:

This change is causing a regression in tests/python/frontend/onnx/test_forward.py:test_loop, and it makes the generated IR for the shape function very different.

I'm really, really confused about why this didn't fail in CI; it fails in every local setup and CI Docker image we've tried outside of CI.

cc @jwfromm @tmoreau89 @tqchen @zhanghaohit

Shape function IR generated with this PR:

allocate(v_copy_shape_func: Pointer(int64), int64, [1]) {
  attr [0] "extern_scope" = 0 {
    v_expand_dim_shape_func: Pointer(int64)[0] = 1i64
    v_expand_dim_shape_func[1] = (int64*)v_copy_shape_func[0]
  }
  attr [0] "extern_scope" = 0 {
    v_concatenate_shape_func: Pointer(int64)[0] = 0i64
    v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)placeholder: Pointer(int64)[0])
    v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)v_expand_dim_shape_func[0])
    v_concatenate_shape_func[1] = (int64*)placeholder[1]
    assert(((int64*)v_concatenate_shape_func[1] == (int64*)v_expand_dim_shape_func[1]), "Dims mismatch in the inputs of concatenate.")
    0
  }
}

Shape function IR on current main:

attr [v_expand_dim_shape_func: Pointer(int64)] "storage_alignment" = 128 {
  let v_expand_dim_shape_func = @tir.TVMBackendAllocWorkspace(1, dev_id: int32, 16u64, 0, 64, dtype=handle)
   {
    if @tir.isnullptr(v_expand_dim_shape_func, dtype=bool) {
      @tir.tvm_throw_last_error(, dtype=int32)
    }
    attr [v_copy_shape_func: Pointer(int64)] "storage_scope" = "global";
    attr [v_copy_shape_func] "storage_alignment" = 128 {
      let v_copy_shape_func = @tir.TVMBackendAllocWorkspace(1, dev_id, 8u64, 0, 64, dtype=handle)
       {
        if @tir.isnullptr(v_copy_shape_func, dtype=bool) {
          @tir.tvm_throw_last_error(, dtype=int32)
        }
         {
          attr [0] "extern_scope" = 0 {
            v_expand_dim_shape_func[0] = 1i64
            v_expand_dim_shape_func[1] = (int64*)v_copy_shape_func[0]
          }
          attr [0] "extern_scope" = 0 {
            v_concatenate_shape_func: Pointer(int64)[0] = 0i64
            v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)placeholder: Pointer(int64)[0])
            v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)v_expand_dim_shape_func[0])
            v_concatenate_shape_func[1] = (int64*)placeholder[1]
            assert(((int64*)v_concatenate_shape_func[1] == (int64*)v_expand_dim_shape_func[1]), "Dims mismatch in the inputs of concatenate.")
            0
          }
        }
      }
      if (@tir.TVMBackendFreeWorkspace(1, dev_id, v_copy_shape_func, dtype=int32) != 0) {
        @tir.tvm_throw_last_error(, dtype=int32)
      }
    }
  }
  if (@tir.TVMBackendFreeWorkspace(1, dev_id, v_expand_dim_shape_func, dtype=int32) != 0) {
    @tir.tvm_throw_last_error(, dtype=int32)
  }
}

mbrookhart changed the title from "revert a change to lower_tvm_builtin.cc from #6126" to "[TIR][VM] Revert a change to lower_tvm_builtin.cc from #6126" on Jun 17, 2021
@jwfromm (Contributor) commented Jun 17, 2021

I guess, unsurprisingly, this revert causes the error that led to the code being removed in #6126. We'll have to either fix it a different way or temporarily disable that portion of VTA testing.

@tmoreau89 (Contributor) commented Jun 17, 2021

We'll want to avoid disabling the VTA tests, because that backend will quickly bitrot otherwise.

At this point, we can either:
(1) revert #6126, which is less than desirable, or
(2) identify a fix that resolves both tests/python/frontend/onnx/test_forward.py:test_loop and the VTA unit test, though this could take a couple of days.

@tmoreau89 (Contributor) commented:

I tried to reproduce the failing VTA test with the changes applied by @mbrookhart, but I wasn't able to reproduce the error: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8274/1/pipeline#step-458-log-1910. To be continued.

@zhanghaohit (Contributor) commented Jun 18, 2021

In my opinion, even though this revert makes tests/python/frontend/onnx/test_forward.py:test_loop pass, there may still be a hidden bug in the VM runtime (or elsewhere in the implementation). If the condition if (constant_size > 0 && constant_size * nbytes < runtime::kMaxStackAlloca) is not true, the pass will generate the same IR as current main.

I admit that the removal of the following optimization:

    if (device_type_.defined()) {
      if (const auto* dev_type = device_type_.as<IntImmNode>()) {
        if (dev_type->value == kDLCPU) {
          int32_t constant_size = op->constant_allocation_size();
          if (constant_size > 0 && constant_size * nbytes < runtime::kMaxStackAlloca) {
            return stmt;
          }
        }
      }
    }

may cause some performance regressions. But this optimization raises LLVM function-signature errors when there are multiple targets (for now, I cannot find an alternative fix to make this work).
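
For orientation, here is a minimal standalone C++ sketch (not the actual TVM pass) of the decision the restored code makes when visiting an AllocateNode: small, constant-sized CPU allocations keep their Allocate node (which codegen turns into stack storage, as in the "this PR" dump above), while everything else is lowered to TVMBackendAllocWorkspace / TVMBackendFreeWorkspace calls (as in the "current main" dump). The 1024-byte threshold stands in for runtime::kMaxStackAlloca and, like all the names below, is an assumption of this sketch.

    #include <cstdint>

    // Illustrative result type; the real pass returns rewritten TIR instead.
    enum class AllocLowering { kStackAlloca, kBackendWorkspace };

    // device_is_cpu stands in for the device_type_ == kDLCPU check above;
    // max_stack_alloca stands in for runtime::kMaxStackAlloca.
    AllocLowering ChooseLowering(bool device_is_cpu, int64_t constant_size,
                                 int64_t nbytes, int64_t max_stack_alloca = 1024) {
      if (device_is_cpu && constant_size > 0 &&
          constant_size * nbytes < max_stack_alloca) {
        // Keep the Allocate: codegen emits a local (stack) buffer.
        return AllocLowering::kStackAlloca;
      }
      // Otherwise rewrite to TVMBackendAllocWorkspace / TVMBackendFreeWorkspace,
      // which is the shape of the "current main" IR above.
      return AllocLowering::kBackendWorkspace;
    }

Under that reading, an allocation whose constant_size * nbytes reaches the threshold (for example 128 int64 values, i.e. 1024 bytes under the assumed limit) still takes the workspace path even with this revert, which is exactly the case the first hidden bug below is about.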

So there may be two bugs in the current codebase:

  • the VM runtime (and maybe other parts) cannot handle the IR generated by current main (e.g., when constant_size * nbytes exceeds runtime::kMaxStackAlloca);
  • when there are multiple targets, the IR / LLVM code generated with this PR is not correct.

If this problem blocks something urgent to merge, I think we can do a quick fix first (choose either one of the two) and open a bug issue for a proper fix later.

As for the two hidden bugs, I think it will take some time to find and fix them.

What do you think?

@mbrookhart (Contributor, Author) commented:

Ping @tmoreau89 @zhanghaohit, any update on this?

@mbrookhart (Contributor, Author) commented:

@tmoreau89 @zhanghaohit

I took this suggestion, in order to disable the tutorial:

> to fix the VTA unit test, we can remove the multiple-targets test in deploy_classification.py. Just remove the "sim" in these three lines: deploy_classification.py#L194, deploy_classification.py#L206, deploy_classification.py#L224

mbrookhart force-pushed the revert_lower_tvm_builtin_change branch from d902c54 to b0f8c02 on July 29, 2021, 21:31
@mbrookhart (Contributor, Author) commented:

@tmoreau89 @zhanghaohit those comments did not, in fact, prevent the tutorial from being run, so this still failed. I'm fixing the CI issues with the ONNX tests today, and this is now blocking that. Any thoughts on how to proceed?

@zhanghaohit (Contributor) commented:

See this: #8643

> @tmoreau89 @zhanghaohit those comments did not, in fact, prevent the tutorial from being run, so this still failed. I'm fixing the CI issues with the ONNX tests today, and this is now blocking that. Any thoughts on how to proceed?

@zhanghaohit (Contributor) commented:

> @tmoreau89 @zhanghaohit those comments did not, in fact, prevent the tutorial from being run, so this still failed. I'm fixing the CI issues with the ONNX tests today, and this is now blocking that. Any thoughts on how to proceed?

@mbrookhart I think maybe you can apply the changes from that PR (#8643) to see whether it works, since we have already discussed a lot here. I will delete #8643 later.

mbrookhart force-pushed the revert_lower_tvm_builtin_change branch 4 times, most recently from 7fed7a3 to ba3d6e8 on August 19, 2021, 21:34
mbrookhart force-pushed the revert_lower_tvm_builtin_change branch 4 times, most recently from 1a60541 to 060c72e on August 31, 2021, 19:46
mbrookhart force-pushed the revert_lower_tvm_builtin_change branch 2 times, most recently from b102df8 to 2172b82 on September 3, 2021, 19:59
mbrookhart force-pushed the revert_lower_tvm_builtin_change branch from 2172b82 to 56331a4 on September 7, 2021, 16:49
tmoreau89 merged commit 90676c4 into apache:main on Sep 9, 2021
mbrookhart deleted the revert_lower_tvm_builtin_change branch on September 9, 2021, 15:50
@tmoreau89 (Contributor) commented:

Thanks @mbrookhart and @zhanghaohit; the fix has been merged.

@tmoreau89 (Contributor) commented:

@zhanghaohit would you be able to create an issue that tracks the two hidden bugs that require deeper investigation?

@zhanghaohit (Contributor) commented:

> @zhanghaohit would you be able to create an issue that tracks the two hidden bugs that require deeper investigation?

Sure. I created two issues: #8977 and #8978.

@@ -113,6 +113,16 @@ class BuiltinLower : public StmtExprMutator {
op = stmt.as<AllocateNode>();
// Get constant allocation bound.
int64_t nbytes = GetVectorBytes(op->dtype);
if (device_type_.defined()) {
A Contributor left a review comment on the hunk above:

Hi, @mbrookhart

Can you explain the rationale for not generating TVMBAW (TVMBackendAllocWorkspace) calls for static-size allocates on DLCPU?

cc: @grant-arm @Mousius @mbaret

@manupak (Contributor) commented Sep 14, 2021

@tmoreau89 @areusch @mbrookhart,

IIUC, this is problematic for micro because the constant-sized allocates are now forced onto the stack (bypassing the TVMPlatformAllocate abstraction), since that is how codegen_c lowers the Allocate:

void CodeGenC::VisitStmt_(const AllocateNode* op) {
  ICHECK(!is_zero(op->condition));
  std::string vid = AllocVarID(op->buffer_var.get());
  this->PrintIndent();
  int32_t constant_size = op->constant_allocation_size();
  ICHECK_GT(constant_size, 0) << "Can only handle constant size stack allocation for now";
  auto scope = GetPtrStorageScope(op->buffer_var);
  alloc_storage_scope_[op->buffer_var.get()] = scope;
  PrintStorageScope(scope, stream);
  PrintType(op->dtype, stream);
  stream << ' ' << vid << '[' << constant_size << "];\n";
  RegisterHandleType(op->buffer_var.get(), op->dtype);
  this->PrintStmt(op->body);
}

I don't think this is desired.
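
To make this concrete, here is a rough illustrative sketch of the two shapes the generated C can take for a small constant-sized allocate. Only TVMBackendAllocWorkspace / TVMBackendFreeWorkspace (from tvm/runtime/c_backend_api.h) are real API names; the buffer size, variable names, and functions are made up for the example.

    #include <tvm/runtime/c_backend_api.h>

    // (a) Allocate left in place: codegen_c prints a plain local array, so the
    // memory never passes through the platform allocation abstraction.
    void stack_variant() {
      float shape_buf[4];
      (void)shape_buf;  // ... use shape_buf ...
    }

    // (b) Allocate lowered by LowerTVMBuiltin: a workspace call that a micro
    // platform can service through its own allocator.
    int workspace_variant(int dev_id) {
      float* shape_buf = static_cast<float*>(TVMBackendAllocWorkspace(
          /*device_type=*/1 /*kDLCPU*/, dev_id, 4 * sizeof(float),
          /*dtype_code_hint=*/2, /*dtype_bits_hint=*/32));
      if (shape_buf == nullptr) return -1;
      // ... use shape_buf ...
      return TVMBackendFreeWorkspace(/*device_type=*/1, dev_id, shape_buf);
    }

Variant (b) is the path that the platform allocation abstraction mentioned above can intercept, which is why forcing every small constant-sized allocate into variant (a) is a behavioral change for micro.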

@mbrookhart (Contributor, Author) commented:

Hi @manupa-arm, I think we have some conflict between what the VM needs to run on CPU and what microprocessors need. Can we talk about this on the issues @zhanghaohit created, #8977 and #8978?

I think @jroesch might have some thoughts on separating the needs.

mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Sep 15, 2021
jroesch pushed a commit that referenced this pull request Sep 16, 2021
* enable the onnx tests after PR #8274 merged

* fix lint
AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Sep 16, 2021
* main: (102 commits)
  Implementation of relay_to_tir target hook (apache#8423)
  [Onnx] Fix NLL Loss tests (apache#8971)
  [Bugfix] Fix other div zero errors also in rewrite_simplify (apache#8983)
  [ONNX] enable the onnx tests after PR apache#8274 merged (apache#9019)
  [Hexagon] Disable `thread_local` on Hexagon (apache#9025)
  [Hexagon] Allow undefined symbols in libtvm_runtime.so on Hexagon (apache#9024)
  [Onnx] Add momentum (apache#9000)
  fix (apache#9021)
  [Community] @AndrewZhaoLuo -> Reviewer (apache#9020)
  [Hexagon] Implement model launcher (apache#8986)
  [Relay][Pass] Add ExtractOperators pass (apache#8996)
  [BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration (apache#8808)
  [ONNX] Add Einsum converter (apache#8985)
  Add standalone_crt/ to be part of the wheel package, when available. (apache#9005)
  [Relay] Remove memory planing from LowerTEPass  (apache#8974)
  [Hexagon] Treat floats as float32 when passing args to offloaded kernels (apache#9010)
  [Runtime] Pipeline Executor Initial patch. (apache#8702)
  [Hexagon] `llvm-options` attribute is an array of strings (apache#9011)
  disable cuda int8 schedule for non-cuda gpu target (apache#9014)
  [Torch] Add an option to make imported models compatible with the Relay text parser (apache#9015)
  ...
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…pache#8274)

* revert a change to lower_tvm_builtin.cc from apache#6126

* disable sim target on VTA tutorial

fix bad refactor

try again
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
* enable the onnx tests after PR apache#8274 merged

* fix lint
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…pache#8274)

* revert a change to lower_tvm_builtin.cc from apache#6126

* disable sim target on VTA tutorial

fix bad refactor

try again
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* enable the onnx tests after PR apache#8274 merged

* fix lint