
migrations on arrow-cpp-feedstock run into timeout while rerendering #5815

Open
jdblischak opened this issue Apr 26, 2024 · 21 comments · Fixed by conda-forge/arrow-cpp-feedstock#1375

@jdblischak
Member

Comment:

The pin for aws-crt-cpp was recently updated to 0.26.8 (#5762), which started the current migration aws_crt_cpp0268. It was successful for tiledb-feedstock (conda-forge/tiledb-feedstock#279), but unfortunately it failed with a rerendering error for arrow-cpp-feedstock. This is causing a problem because tiledb-py-feedstock requires tiledb and pyarrow, which currently generates a solver error (TileDB-Inc/conda-forge-nightly-controller#92 (comment)).

I've never attempted to troubleshoot a migration error for a feedstock that I don't maintain. As a first step, could someone with sufficient access try to restart the migration? Is there anything I can do to help?

@xhochy
Member

xhochy commented Apr 26, 2024

The problem is that the rerendering takes longer than 900s. This is a newly added timeout in the bot. We need to make an exception for arrow-cpp in the bot (or wait for the next conda release).
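(Editor's note: a hard subprocess timeout of this kind typically behaves roughly like the sketch below. Only the 900-second limit comes from this thread; the command and surrounding structure are assumptions, not the bot's actual code.)

```python
# Minimal sketch of a hard rerender timeout; not the bot's actual code.
import subprocess

RERENDER_TIMEOUT_S = 900  # the 15-minute limit mentioned in this thread

try:
    subprocess.run(
        ["conda", "smithy", "rerender"],
        check=True,
        timeout=RERENDER_TIMEOUT_S,
    )
except subprocess.TimeoutExpired:
    # A feedstock like arrow-cpp whose rerender is slower than the limit
    # lands here, and no migration PR gets opened for it.
    print(f"rerender exceeded {RERENDER_TIMEOUT_S}s; skipping this feedstock")
```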

@jdblischak
Member Author

The problem is that the rerendering takes longer than 900s. This is a newly added timeout in the bot.

Can I manually apply the migration locally and send a PR to arrow-cpp-feedstock? Or would that interfere with the migration?

@xhochy
Member

xhochy commented Apr 26, 2024

It should solve your problem, but long-term the bot needs the timeout fix.

@jdblischak
Member Author

It should solve your problem, but long-term the bot needs the timeout fix.

Done in conda-forge/arrow-cpp-feedstock#1375
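(Editor's note: for anyone else needing to do this, applying a migration by hand generally amounts to dropping the migrator's YAML into the feedstock's `.ci_support/migrations/` directory and rerendering. A minimal sketch follows, assuming the migration file is named `aws_crt_cpp0268.yaml`, a name inferred from the migration mentioned above, and has already been downloaded locally.)

```python
# Sketch of manually applying a migration to a local feedstock checkout.
import shutil
import subprocess
from pathlib import Path

feedstock = Path("arrow-cpp-feedstock")  # local clone of the feedstock
migrations_dir = feedstock / ".ci_support" / "migrations"
migrations_dir.mkdir(parents=True, exist_ok=True)

# Copy the migrator YAML into the feedstock so the rerender picks it up.
# The filename is an assumption inferred from the migration name above.
shutil.copy("aws_crt_cpp0268.yaml", migrations_dir)

# Rerender locally; this is the step that times out in the bot.
subprocess.run(["conda", "smithy", "rerender"], cwd=feedstock, check=True)
```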

@h-vetinari
Member

The problem is that the rerendering takes longer than 900s. This is a newly added timeout in the bot.

When did this get added? It's too tight for several feedstocks - IMO it needs to be bumped ASAP. CC @beckermr

Fortunately, a bunch of speedups landed in the upcoming conda-build 24.5, but we don't have that one available yet.

@beckermr
Member

That timeout has been in place for several years. I did some refactoring but idk what would have changed to make a difference like this.

@h-vetinari
Member

It will effectively break all migrations from working on the arrow-feedstock, which has a disproportionate impact because we have so many deps there that need migrating, as well as 4 maintenance branches. Basically, that feedstock is not workable without the bot opening these PRs, so we need to fix this real soon, as it's going to cause logjams in several places otherwise. A timeout exception for arrow is probably the easiest. Or we could let the bot run on an unreleased conda-build (that we could build into a label). Or investigate what recent refactors regressed...

@beckermr
Member

The timeout has been there for several years and nothing was broken. I honestly don't get what's suddenly changed or regressed.

@beckermr
Member

I just checked the pull requests for arrow-feedstock and I cannot find a PR from the bot for several years at least. Maybe I messed up checking?

Otherwise, this matches the fact that the bot has been enforcing a timeout for many years on long rerenders.

What is suddenly going to break for arrow-feedstock now when it never got PRs anyway?

@h-vetinari
Member

I just checked the pull requests for arrow-feedstock and I cannot find a PR from the bot for several years at least.

I don't know how you searched, but we've been getting 10+ PRs per month. Last one was April 23rd 02:55 UTC.

I honestly don't get what's suddenly changed or regressed.

Arrow has been getting long rerender times for a while (and once @mbargull became aware of this, he opened conda/conda-build#5224 and tackled some bottlenecks; the result of that is going to show up in conda-build 24.5). It's possible that we were close to the timeout already, and that some minor regression pushed us from, say, ~850s to over 900s.

@beckermr
Member

Those PRs are for arrow-cpp, not arrow.

You mentioned arrow above, right?

@beckermr
Member

It will effectively break all migrations from working on the arrow-feedstock, which has a disproportionate impact because we have so many deps there that need migrating, as well as 4 maintenance branches. Basically, that feedstock is not workable without the bot opening these PRs,

^ This is what I was responding to.

@h-vetinari
Member

Yes, sorry, that was an inaccuracy on my part. There is a (IMO) much less important arrow library for timestamps, but I was referring to apache/arrow, the "composable data stack" behemoth that's eating the world (in a nice way), which for us is built on https://github.com/conda-forge/arrow-cpp-feedstock (various libs & python bindings).

@beckermr
Member

Ahhhhh that makes more sense.

On the timeout itself, the bot enforces a timeout because very long rerenders make it even slower than it already is. The 15-minute (=900 second) timeout is an order of magnitude or more longer than the typical feedstock rerendering time. I get that this is frustrating.

However, looking at the arrow-cpp recipe (which I have not spent much time on at all), it appears to be nearly a dozen packages in one. Splitting that feedstock up into more than one would potentially have other benefits in addition to easing the burden on our already overstressed tools.

@jdblischak
Member Author

The timeout has been there for several years and nothing was broken. I honestly don't get what's suddenly changed or regressed.

To share my anecdotal experience, I just manually applied this migration to the 14.x, 13.x, and 12.x branches. Rerendering the 12.x and 13.x branches was relatively quick (they finished before 14.x even though I started them afterwards). 14.x took a lot longer, and obviously main took the longest (15+ min). So I don't think the recent large increase in rerendering time is solely due to changes to conda-smithy. I assume that something about the structure of the feedstock changed between 13.x and 14.x, and that rerendering has only gotten slower over time. Note that I am using the latest conda-smithy release (3.34.1).
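(Editor's note: a rough sketch of the kind of per-branch comparison described above. The branch names and checkout path are illustrative, not the actual commands used.)

```python
# Rough timing harness for comparing rerender duration across branches.
import subprocess
import time

FEEDSTOCK = "arrow-cpp-feedstock"            # path to a local clone (assumed)
BRANCHES = ["12.x", "13.x", "14.x", "main"]  # branch names are illustrative

for branch in BRANCHES:
    subprocess.run(["git", "checkout", branch], cwd=FEEDSTOCK, check=True)
    start = time.monotonic()
    subprocess.run(["conda", "smithy", "rerender"], cwd=FEEDSTOCK, check=True)
    elapsed = time.monotonic() - start
    print(f"{branch}: rerender took {elapsed / 60:.1f} min")
```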

@h-vetinari
Member

I didn't notice that you had marked this issue as closed in conda-forge/arrow-cpp-feedstock#1375; I'm going to reopen it, as the underlying problem hasn't been fixed.

@h-vetinari h-vetinari reopened this May 3, 2024
@h-vetinari h-vetinari changed the title aws-crt-cpp 0.26.8 migration has stalled with rerendering error for arrow-cpp-feedstock migrations on arrow-cpp-feedstock run into timeout while rerendering May 3, 2024
@h-vetinari
Member

Splitting that feedstock up into more than one would potentially have other benefits in addition to easing the burden on our already overstressed tools.

I opened an RFC for that: conda-forge/arrow-cpp-feedstock#1381

@xhochy
Member

xhochy commented May 11, 2024

Migrations have been coming again for arrow-cpp, so closing this.

@xhochy xhochy closed this as completed May 11, 2024
@h-vetinari
Member

h-vetinari commented May 11, 2024

Really? I haven't seen a bot PR yet. All the recent migrations were done by hand.

@h-vetinari h-vetinari reopened this May 11, 2024
@xhochy
Member

xhochy commented May 11, 2024

Ah! I had only seen that migrations were coming in and missed that you wanted to see them happen automatically. I'll look into this next week (when I fix the aws-sdk-cpp release).

@h-vetinari
Member

h-vetinari commented May 14, 2024

It's possible that the performance improvements in conda-build 24.5 unblocked this. There are new PRs for main (much reduced after splitting off the pyarrow bits) and for v15.x / v14.x and below (still big to rerender):
