[MetaSchedule][M3c] XGB-based Cost Model #9859

Merged
merged 4 commits into from
Jan 7, 2022

Conversation

junrushao
Member

@junrushao junrushao commented Jan 6, 2022

This PR is part of stage M3c of the MetaSchedule project (#8473).

The architecture was redesigned by Junru and Xiyou. This PR introduces an XGB-based cost model built on MetaSchedule's cost model interface. Unit tests are included.

Thanks to all co-authors for contributing!

Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>

@junrushao
Member Author

CC: @comaniac @yzhliu @merrymercy

Contributor

@comaniac comaniac left a comment

Thanks for the clarification; that makes a lot of sense. LGTM.

One potential issue Ansor faced before is that when training data gets bigger and bigger, training the XGBoost cost model takes longer and longer even though the accuracy no longer improves. What Ansor has done is simply reduce the re-training frequency (e.g., re-train per 2 rounds) when training data size is larger than a threshold. Other than that, we can also refer to the accuracy between the predicted cost and new measured latencies to determine whether to re-train the model in the next round. These are just my two cents, and we could probably revisit this issue in the future.
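To make the frequency heuristic concrete, here is a minimal sketch of a size-based re-training throttle. It is not Ansor's or MetaSchedule's actual code: the function name `should_retrain`, the parameters `size_threshold` and `slow_period`, and their default values are all hypothetical and chosen only for illustration.

```python
def should_retrain(
    round_idx: int,
    num_samples: int,
    size_threshold: int = 10_000,  # hypothetical cutoff, not a value from Ansor
    slow_period: int = 2,          # "re-train per 2 rounds" once past the cutoff
) -> bool:
    """Decide whether to re-train the cost model in this tuning round."""
    if num_samples <= size_threshold:
        # Small dataset: training is cheap, so re-train every round.
        return True
    # Large dataset: only re-train once every `slow_period` rounds.
    return round_idx % slow_period == 0


# Illustration: 64 new measurements arrive per round, with a toy threshold of 200.
schedule = [should_retrain(r, 64 * (r + 1), size_threshold=200) for r in range(8)]
print(schedule)
```

With these toy numbers the model is re-trained in each of the first three rounds and then only in every other round once the dataset passes the threshold.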

@junrushao
Member Author

@comaniac Thanks for the extremely valuable feedback!

when training data gets bigger and bigger, training the XGBoost cost model takes longer and longer even though the accuracy no longer improves

That's exactly what I'm observing too! In this particular case, the XGB hyperparameters might no longer be suitable, which limits the model capacity, and we might have to tune them to find the best setting.

What Ansor has done is simply reduce the re-training frequency (e.g., re-train per 2 rounds) when training data size is larger than a threshold.

This is how Ansor deals with this right now. We might consider better heuristics in the future, including switching models, tweaking model capacity with AutoML techniques, etc.

we can also refer to the accuracy between the predicted cost and new measured latencies to determine whether to re-train the model in the next round

With our current interface, this is pretty simple to do. We have a validate method that reports the RMSE of the cost model's predictions, and I used this method quite frequently when debugging the model too.
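As a rough illustration of the accuracy-based idea, the sketch below computes the RMSE between predicted and measured values and uses it to decide whether to re-train. It is a standalone example, not the cost model's validate method: the helper names `rmse` and `needs_retrain` and the `tolerance` value are assumptions made for this sketch.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between predictions and measurements."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared_error / len(predicted))


def needs_retrain(
    predicted: Sequence[float],
    measured: Sequence[float],
    tolerance: float = 0.1,  # hypothetical threshold; would need tuning in practice
) -> bool:
    """Re-train only when the model's error on new measurements is too large."""
    return rmse(predicted, measured) > tolerance


# Predictions that match the new measurements: keep the current model.
accurate = needs_retrain([0.5, 0.8, 0.3], [0.5, 0.8, 0.3])
# Predictions that are far off: schedule a re-train for the next round.
stale = needs_retrain([0.9, 0.1], [0.1, 0.9])
print(accurate, stale)
```

Both sequences are assumed to be in the same units as whatever the model predicts; a tolerance that works for one scoring scheme will not necessarily transfer to another.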

Anyway, I think we are pretty aligned on the methodology and the path to improvement. Let's work together to improve it in the future.

Member

@zxybazh zxybazh left a comment

LGTM.

@junrushao junrushao merged commit 256d170 into apache:main Jan 7, 2022
@junrushao junrushao changed the title [MetaSchedule] XGB-based Cost Model [MetaSchedule][M3c] XGB-based Cost Model Jan 26, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022
* [MetaSchedule] XGB-based Cost Model

* Fix lint

* fix doc

* fix mypy