This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

[ci] Add weights job #13316

Closed · wants to merge 3 commits

Conversation

@alvicsam (Contributor) commented Feb 6, 2023

This PR adds a job that runs the benchmarks and creates an artifact with the resulting git diff. The job runs on the new GCP runners; the goal is to deprecate the bm* machines. Weights generation is currently in progress.

https://github.com/paritytech/ci_cd/issues/697
https://github.com/paritytech/ci_cd/issues/733
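
For illustration, a minimal sketch of what such a GitLab CI job could look like; the job name, runner tag, script path, and artifact settings are assumptions, not the PR's actual `.gitlab-ci.yml` contents:

```yaml
# Hypothetical sketch of the kind of job this PR adds; the job name, runner
# tag, script path, and artifact settings are assumptions, not the actual
# .gitlab-ci.yml contents.
bench-all-weights:
  stage: build
  when: manual                          # run by hand for now, via command-bot later
  tags:
    - weights                           # dynamically provisioned GCP runner pool
  script:
    - ./scripts/run_all_benchmarks.sh   # regenerate the weights.rs files
    - git diff > weights.patch          # capture the changes for review
  artifacts:
    paths:
      - weights.patch
    expire_in: 7 days
```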

@mordamax @athei

@alvicsam requested a review from a team as a code owner on February 6, 2023 10:50
@github-actions bot added the A0-please_review label on Feb 6, 2023
@mordamax (Contributor) commented Feb 6, 2023

@alvicsam worth re-generating the weights in the same PR?

@alvicsam (Contributor, Author) commented Feb 6, 2023

Sure, I ran the new job, so now I'm waiting for the new weights.

@alvicsam added the B0-silent and A3-in_progress labels and removed A0-please_review on Feb 6, 2023
@athei (Member) commented Feb 6, 2023

How would that pipeline be triggered? Is it manual?

@mordamax (Contributor) commented Feb 6, 2023

> How would that pipeline be triggered? Is it manual?

Very soon, I hope, you'll be able to run them through command-bot too, with something like `bot bench $ all`.
But for now, yes: manually, through GitLab.

@athei (Member) commented Feb 6, 2023

That is fine! Just wanted to make sure it is not run on every commit or something 😄

@athei (Member) commented Feb 7, 2023

I pushed the weight results. Is it expected that the weights get worse by that much?

This also doesn't look promising:
[Screenshot 2023-02-06 at 21:58:49]

@alvicsam (Contributor, Author) commented Feb 7, 2023

> How would that pipeline be triggered? Is it manual?

Yes, the job is manual, and it is available to run on every PR and commit.

> I pushed the weight results. Is it expected that the weights get worse by that much?

It can be; afaiu it's consistency that matters. We can run the benchmarks one more time and compare the results of the runs on the new runners. However, the results on polkadot and cumulus were consistent.

@mordamax can we run only some of the benchmarks with the bot in substrate? And where will the bot run the benchmark (I mean, on which runner)?

> This also doesn't look promising

@oleg-plakida can tell you more about this benchmark. AFAIR the results weren't consistent, and a different glibc version could change the output.

@athei (Member) commented Feb 7, 2023

> It can be; afaiu it's consistency that matters.

So how do we continue here? This doesn't sound very confident.

> We can run the benchmarks one more time and compare the results of the runs on the new runners.

What good can come from this?

If it is the same numbers: Why does a supposedly faster machine perform 20% worse in many benchmarks?

If it is completely different numbers: The numbers are not consistent.

> However, the results on polkadot and cumulus were consistent.

That doesn't help us here in substrate. I suggest reverting the bot to the old runners until this is figured out. This is blocking all of my PRs.

@alvicsam added the A0-please_review label and removed A3-in_progress on Feb 7, 2023
@alvicsam (Contributor, Author) commented Feb 7, 2023

> So how do we continue here? This doesn't sound very confident.

The results are consistent on polkadot and cumulus; I see no reason why they wouldn't be consistent here. If you have doubts, I suggested options for confirming the stability of the results.

> What good can come from this?

The machines are created dynamically, so several people can run benchmarks at the same time in different PRs. It will also be possible to parallelise the benchmarks in the future.

> If it is the same numbers: Why does a supposedly faster machine perform 20% worse in many benchmarks?

Because the benchmarks run on only one core, and the core frequency on the new runners is lower than on the old ones.

@athei (Member) commented Feb 7, 2023

> If you have doubts, I suggested options for confirming the stability of the results.

Okay, let's re-run then. I tried to trigger it; I hope I did it the right way:
https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2358232

> What good can come from this?

> The machines are created dynamically, so several people can run benchmarks at the same time in different PRs. It will also be possible to parallelise the benchmarks in the future.

Okay, this might be a misunderstanding: I didn't mean the whole endeavour. I understand that bare-metal machines are annoying for you to manage. I meant re-running the benchmarks.

@mateo-moon (Contributor) commented Feb 7, 2023

> I pushed the weight results. Is it expected that the weights get worse by that much?
>
> This also doesn't look promising: [Screenshot 2023-02-06 at 21:58:49]

I've run tests with `benchmark machine` on different setups and machines. The results I got show that this particular benchmarking approach isn't correct and says nothing about the real runtime performance of pallets. There are two separate pieces of evidence for this:

1. I got better results from `benchmark machine` but worse results from `benchmark pallet` on "AMD Milan", and vice versa on "Intel Ice Lake". So we could switch the machine CPU to "AMD Milan" and you would get results like in the screenshot, but the performance of the real test would be worse. In my opinion, this is enough to say that the `benchmark machine` results say literally nothing about the final performance of the runtime node. But there is also a second point:
2. The result is tightly coupled to the libc version of the particular environment where the node is run. I got different results with different libc libraries. That means the results will differ from host to host unless the hosts are completely identical. But even that doesn't matter: if a measurement depends on an external parameter whose consistency you cannot guarantee, the result cannot be trusted either.

[Screenshot 2022-12-08 at 16:57:37]
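
For context, the two commands being contrasted here are Substrate's hardware check and the per-pallet benchmarks. Roughly, as a sketch with illustrative flags rather than the exact CI invocations:

```bash
# Sketches of the two commands being compared; flags are illustrative, not
# the exact CI invocations.

# Hardware check: scores the host's CPU, memory, and disk against reference values.
cargo run --release -p node-cli --features=runtime-benchmarks -- \
    benchmark machine --chain=dev

# Pallet benchmarks: measure extrinsic weights, which end up in weights.rs.
cargo run --release -p node-cli --features=runtime-benchmarks -- \
    benchmark pallet \
    --chain=dev \
    --pallet=pallet_balances \
    --extrinsic='*' \
    --steps=50 --repeat=20 \
    --output=frame/balances/src/weights.rs
```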

@athei (Member) commented Feb 7, 2023

@oleg-plakida So I guess we should just remove this then?

@mateo-moon (Contributor) commented

> @oleg-plakida So I guess we should just remove this then?

The command or our concern?

@athei (Member) commented Feb 7, 2023

This micro benchmark, which is not reflective of the actual performance we are interested in.

@mateo-moon (Contributor) commented

> This micro benchmark, which is not reflective of the actual performance we are interested in.

I would say it would be nice to have a benchmark like this, and I suppose that was the idea at the beginning, but we shouldn't trust it much right now, at least until we make it consistent. And I assume that's a hard challenge. But your question is really interesting: does anyone actually use this benchmark for node testing!?

@athei (Member) commented Feb 7, 2023

Once the benchmarks are finished I will make a new PR into this one with them. This way we can use the weight UI to make a comparison. Please don't commit them to this PR.

@mateo-moon (Contributor) commented

But I suppose consistency is the only thing that matters for us: the setup used for measuring, not the performance of the setup. As long as we compare results produced on the same reference setup, we can measure code performance.

@athei (Member) commented Feb 8, 2023

Committed the new weights here: #13336

pallet-contracts just went up by 50% for some benchmarks. This never happened on the old machines.

@ggwpez (Member) commented Feb 8, 2023

The benchmark machine is not yet updated for the new specs, so it will report as failing: #13317

The single-threaded CPU speed is expected to be slower than on the old reference hardware because we are using cloud VMs, which have server CPUs.
With faster disk speed we should get somewhat cheaper reads and writes to make up for it.

Our current goal is indeed to have consistent results, so let's re-run them a few times.

@alvicsam mentioned this pull request on Feb 8, 2023
@athei (Member) commented Feb 8, 2023

@ggwpez Have you checked the link above? We ran gitlab-update_substrate_weights twice. We should get consistent results, right?

@ggwpez (Member) commented Feb 8, 2023

> @ggwpez Have you checked the link above? We ran gitlab-update_substrate_weights twice. We should get consistent results, right?

Ah, I thought you compared against master 🤦‍♂️.
Going to double-check the consistency on the old bm2.

@ggwpez mentioned this pull request on Feb 8, 2023
@ggwpez (Member) commented Feb 8, 2023

I re-ran the worst offender, instr_i64add, a dozen times on bm2; the base always fluctuated between 0.9 and 1.1 µs, with a consistent 756 ns component.
The change probably came from some Rust version update or other change, so the new weights for that one are good.
We could also re-run everything on bm* with the env override.
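
(A single offender can be re-run in isolation like this; a sketch with illustrative flags, not the exact invocation used on bm2:)

```bash
# Sketch (flags illustrative): isolate the one extrinsic and repeat it to see
# how much the base weight fluctuates between runs.
for i in 1 2 3; do
  cargo run --release -p node-cli --features=runtime-benchmarks -- \
      benchmark pallet \
      --chain=dev \
      --pallet=pallet_contracts \
      --extrinsic=instr_i64add \
      --steps=50 --repeat=20
done
```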

@athei (Member) commented Feb 8, 2023

> I re-ran the worst offender, instr_i64add, a dozen times on bm2; the base always fluctuated between 0.9 and 1.1 µs, with a consistent 756 ns component.
> The change probably came from some Rust version update or other change, so the new weights for that one are good.
> We could also re-run everything on bm* with the env override.

Sorry, I am confused now. There was no version update between the two runs we did in this PR. Do you think we should merge this PR as-is?

@ggwpez (Member) commented Feb 8, 2023

> Sorry, I am confused now. There was no version update between the two runs we did in this PR. Do you think we should merge this PR as-is?

I tested it on master, so there has been an update since then which now pollutes those results. But that won't explain the inconsistencies here, yeah.
Can you use #13336 (comment) in the meantime?
Then we can still re-run it a few times just to be sure.

@athei (Member) commented Feb 8, 2023

> I tested it on master, so there has been an update since then which now pollutes those results.

An update to rustc? Wasmi is sometimes really sensitive to those. It relies on certain things being compiled as tail calls, and Rust gives no guarantee for that :(. @Robbepop is that still the case? I faintly remember that you found a workaround for this.

> Can you use #13336 (comment) in the meantime?

Yeah. @alvicsam is trying to get it running here: #13268

@Robbepop (Contributor) commented Feb 8, 2023

> An update to rustc? Wasmi is sometimes really sensitive to those. It relies on certain things being compiled as tail calls, and Rust gives no guarantee for that :(. @Robbepop is that still the case? I faintly remember that you found a workaround for this.

Yes, wasmi is still super sensitive to changes such as the rustc and LLVM versions used. Unfortunately it is not possible to fix this reliably without support for guaranteed tail calls, which are not a thing in Rust yet. Until then, the best we can hope for is that future updates won't degrade the performance without a way to fix it. Fingers crossed.

The good thing is that the Rust devs take performance regressions very seriously. The bad thing is that, so far, I have always had to dig us out myself when they popped up. The rustc issues are still open with nobody working on them, despite having high priority.
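
A minimal Rust sketch (not wasmi's actual code; all names are illustrative) of why this matters: each opcode handler ends with a call in tail position, and whether LLVM turns that into a jump or a real call changes the hot path, with no language-level guarantee either way:

```rust
// Minimal sketch, NOT wasmi's actual code: an interpreter whose opcode
// handlers end by invoking the dispatcher again. Whether LLVM compiles that
// trailing call as a tail jump or a real call changes stack usage and branch
// behaviour, so a rustc/LLVM update can silently flip the performance.

struct Vm {
    code: Vec<u8>,
    acc: i64,
    halted: bool,
}

fn dispatch(vm: &mut Vm, pc: usize) {
    if vm.halted || pc >= vm.code.len() {
        return;
    }
    let handler: fn(&mut Vm, usize) = match vm.code[pc] {
        0 => op_add,
        _ => op_halt,
    };
    // In tail position: ideally compiled as a jump, but that is an
    // optimisation, not a language guarantee.
    handler(vm, pc + 1)
}

fn op_add(vm: &mut Vm, pc: usize) {
    vm.acc += 1;
    dispatch(vm, pc) // hot path: executed once per interpreted instruction
}

fn op_halt(vm: &mut Vm, _pc: usize) {
    vm.halted = true;
}

fn main() {
    let mut vm = Vm { code: vec![0, 0, 0, 1], acc: 0, halted: false };
    dispatch(&mut vm, 0);
    assert_eq!(vm.acc, 3);
    println!("acc = {}", vm.acc);
}
```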

@alvicsam (Contributor, Author) commented

`bot help`

@command-bot (bot) commented Feb 10, 2023

Here's a link to the docs

@the-right-joyce added the A0-please_review and C1-low labels and removed A0-please_review on Feb 13, 2023
@alvicsam closed this on Feb 16, 2023
@athei deleted the as-weights-gcp branch on March 10, 2023 11:57
Labels: A0-please_review, B0-silent, C1-low

7 participants