-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[runtime][Hexagon] AOTExecutor implementation for C Codegen #10311
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
@@ -82,7 +82,11 @@ void* HexagonDeviceAPIv2::AllocDataSpace(Device dev, size_t nbytes, size_t align | |||
} | |||
|
|||
void HexagonDeviceAPIv2::FreeDataSpace(Device dev, void* ptr) { | |||
CHECK(TVMDeviceExtType(dev.device_type) == kDLHexagon); | |||
bool device = false; | |||
if ((TVMDeviceExtType(dev.device_type) == kDLHexagon) || (DLDeviceType(dev.device_type) == kDLCPU)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is kDLCPU
allowed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on AOT we currently require device_type == kDLCPU. this hack seems consistent with the previous approach to patch the CPU device with Hexagon. we could add a comment here to explain.
imo we should take up the issue of what device type to actually use here when we address the compilation path for Hexagon code.
include/tvm/runtime/metadata.h
Outdated
|
||
#define TVM_METADATA_VERSION 1 | ||
static const constexpr int64_t kMetadataVersion = TVM_METADATA_VERSION; | ||
#ifdef __cplusplus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary. This file will not compile as anything other than C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in #10283, I moved the #ifdef
to exclude any c++ syntax and test_cpp_aot now proves that this file compiles under C. it is my intent that this file can be compiled without the need for C++.
include/tvm/runtime/metadata.h
Outdated
int64_t num_shape; | ||
DLDataType dtype; | ||
}; | ||
#ifdef __cplusplus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's discuss above
@kparzysz-quic some of these are from the underlying PRs #10283 and #10282 , i'll attempt to address them there and respond here when done. |
743a6bf
to
c4d9c77
Compare
c3abc27
to
f373d8a
Compare
f373d8a
to
a01b389
Compare
@kparzysz-quic PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM, a minor nit and one question. could address these in a follow-on. i defer to @kparzysz-quic for final approval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments. Looks good in general.
hexagon_output = graph_mod.get_output(0).numpy() | ||
launcher.stop_server() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea about the with
indent is that it starts a session, and leaving the indent should stop the session. This isn't really happening right now because there is no way (that I could see) to close the connection by doing something like rpc.close()
in the __exit__
function in Session
.
According to that, the stop_server()
should happen after the session has been closed, i.e. outside of the indent. That is,
with launcher.start_session() as sess:
# run, etc.
# getting here should terminate sess
launcher.stop_server()
I noticed this in several places in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. I will fix this in a follow up PR if CI stays green. I have another PR which fixes few things to be able to run the tests on hardware nightly
"rpc_server_port" : 7070, | ||
"rpc_tracker_host": tvm_tracker_host, | ||
"rpc_tracker_port": tvm_tracker_port, | ||
"rpc_server_port": 7070, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a problem I just realized with the RPC server: there is no way that I know of to gracefully shut it down. When stop_server()
terminates the server, this leaves the socket (7070) in a TIME_WAIT
state. As a consequence, in the next test, start_server()
would fail to bind to 7070, which would still be considered "in use", and the test as a whole would fail.
My workaround for that was to use a different server port in each test, i.e. 7070, 7071, etc., at least in the short term, until the connection shutdown can be improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we still have that issue. I think we should try to fix it before merging your PR on adding the simulator, otherwise it would become flaky and people complain. Have you seen this issue with running on simulator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen it outside of docker. I don't know how to fix it, other than adding a sleep that will wait long enough for the socket to become available again... Using different port numbers is guaranteed to work, but it requires paying attention when writing tests---it would be easy to make a mistake.
Edit: I don't know why I didn't see this issue when I ran it directly (i.e. not in docker). ( ゚o゚)
…0311) * Hexagon AOT tests work * fix and address comments
…0311) * Hexagon AOT tests work * fix and address comments
This is a follow up PR to #10283 which adds hexagon implementation of AOTExecutor.
cc @kparzysz-quic @areusch