Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime][Hexagon] AOTExecutor implementation for C Codegen #10311

Merged
merged 2 commits into from
Mar 4, 2022

Conversation

mehrdadh
Copy link
Member

@mehrdadh mehrdadh commented Feb 18, 2022

This is a follow up PR to #10283 which adds hexagon implementation of AOTExecutor.

cc @kparzysz-quic @areusch

Copy link
Contributor

@kparzysz-quic kparzysz-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

@@ -82,7 +82,11 @@ void* HexagonDeviceAPIv2::AllocDataSpace(Device dev, size_t nbytes, size_t align
}

void HexagonDeviceAPIv2::FreeDataSpace(Device dev, void* ptr) {
CHECK(TVMDeviceExtType(dev.device_type) == kDLHexagon);
bool device = false;
if ((TVMDeviceExtType(dev.device_type) == kDLHexagon) || (DLDeviceType(dev.device_type) == kDLCPU)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is kDLCPU allowed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on AOT we currently require device_type == kDLCPU. this hack seems consistent with the previous approach to patch the CPU device with Hexagon. we could add a comment here to explain.

imo we should take up the issue of what device type to actually use here when we address the compilation path for Hexagon code.


#define TVM_METADATA_VERSION 1
static const constexpr int64_t kMetadataVersion = TVM_METADATA_VERSION;
#ifdef __cplusplus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary. This file will not compile as anything other than C++.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in #10283, I moved the #ifdef to exclude any c++ syntax and test_cpp_aot now proves that this file compiles under C. it is my intent that this file can be compiled without the need for C++.

int64_t num_shape;
DLDataType dtype;
};
#ifdef __cplusplus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's discuss above

include/tvm/runtime/metadata.h Outdated Show resolved Hide resolved
include/tvm/runtime/metadata_base.h Outdated Show resolved Hide resolved
include/tvm/runtime/metadata_base.h Outdated Show resolved Hide resolved
python/tvm/contrib/hexagon/hexagon.py Outdated Show resolved Hide resolved
src/relay/backend/aot_executor_codegen.cc Outdated Show resolved Hide resolved
@areusch
Copy link
Contributor

areusch commented Feb 23, 2022

@kparzysz-quic some of these are from the underlying PRs #10283 and #10282 , i'll attempt to address them there and respond here when done.

@mehrdadh mehrdadh force-pushed the hexagon/hexagon_aot_executor branch 5 times, most recently from c3abc27 to f373d8a Compare March 1, 2022 17:11
@mehrdadh mehrdadh force-pushed the hexagon/hexagon_aot_executor branch from f373d8a to a01b389 Compare March 3, 2022 19:02
@mehrdadh mehrdadh changed the title [Draft][runtime][Hexagon] AOTExecutor implementation for C Codegen [runtime][Hexagon] AOTExecutor implementation for C Codegen Mar 3, 2022
@mehrdadh mehrdadh marked this pull request as ready for review March 3, 2022 19:03
@mehrdadh
Copy link
Member Author

mehrdadh commented Mar 3, 2022

@kparzysz-quic PTAL

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, a minor nit and one question. could address these in a follow-on. i defer to @kparzysz-quic for final approval

src/runtime/hexagon/hexagon/hexagon_device_api_v2.cc Outdated Show resolved Hide resolved
python/tvm/contrib/hexagon/hexagon.py Outdated Show resolved Hide resolved
@mehrdadh mehrdadh requested a review from manupak as a code owner March 3, 2022 21:48
@github-actions github-actions bot requested a review from areusch March 3, 2022 21:48
Copy link
Contributor

@kparzysz-quic kparzysz-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments. Looks good in general.

hexagon_output = graph_mod.get_output(0).numpy()
launcher.stop_server()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea about the with indent is that it starts a session, and leaving the indent should stop the session. This isn't really happening right now because there is no way (that I could see) to close the connection by doing something like rpc.close() in the __exit__ function in Session.
According to that, the stop_server() should happen after the session has been closed, i.e. outside of the indent. That is,

with launcher.start_session() as sess:
    # run, etc.
# getting here should terminate sess
launcher.stop_server()

I noticed this in several places in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. I will fix this in a follow up PR if CI stays green. I have another PR which fixes few things to be able to run the tests on hardware nightly

"rpc_server_port" : 7070,
"rpc_tracker_host": tvm_tracker_host,
"rpc_tracker_port": tvm_tracker_port,
"rpc_server_port": 7070,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem I just realized with the RPC server: there is no way that I know of to gracefully shut it down. When stop_server() terminates the server, this leaves the socket (7070) in a TIME_WAIT state. As a consequence, in the next test, start_server() would fail to bind to 7070, which would still be considered "in use", and the test as a whole would fail.

My workaround for that was to use a different server port in each test, i.e. 7070, 7071, etc., at least in the short term, until the connection shutdown can be improved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we still have that issue. I think we should try to fix it before merging your PR on adding the simulator, otherwise it would become flaky and people complain. Have you seen this issue with running on simulator?

Copy link
Contributor

@kparzysz-quic kparzysz-quic Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen it outside of docker. I don't know how to fix it, other than adding a sleep that will wait long enough for the socket to become available again... Using different port numbers is guaranteed to work, but it requires paying attention when writing tests---it would be easy to make a mistake.

Edit: I don't know why I didn't see this issue when I ran it directly (i.e. not in docker). ( ゚o゚)

@masahi masahi merged commit 83f8e54 into apache:main Mar 4, 2022
@mehrdadh mehrdadh deleted the hexagon/hexagon_aot_executor branch March 4, 2022 17:07
ziqiangxu8457 pushed a commit to ziqiangxu8457/tvm that referenced this pull request Mar 6, 2022
…0311)

* Hexagon AOT tests work

* fix and address comments
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…0311)

* Hexagon AOT tests work

* fix and address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants