Explore getting the pipeline completing again by adjusting settings and runtime parameters #349

Open
kltm opened this issue Dec 12, 2023 · 3 comments

Comments

Member

kltm commented Dec 12, 2023

We are currently having trouble getting full-data runs out of the pipeline quickly and reliably. This is seriously affecting snapshot and release.

As a stopgap until longer-term solutions are in place (like pipeline refactoring and hardware purchasing), we're going to briefly experiment with limiting pipeline bandwidth (the number of "workers") and increasing runtime resources for various parts.

This is a partial response to #316

Sending notice to @mugitty @sierra-moxon @dustine32

Tagging @pgaudet

Member Author

kltm commented Dec 12, 2023

Reviewing the console output, I'm noticing a lot of errors late in the run, like:

    03:27:00  + rsync -avz -e ssh -o StrictHostKeyChecking=no -o IdentitiesOnly=true -o IdentityFile=**** /opt/go-site/pipeline/target/blazegraph-production.jnl.gz skyhook@skyhook.berkeleybop.org:/home/skyhook/snapshot/products/blazegraph/
    03:27:00  sending incremental file list
    03:27:15  blazegraph-production.jnl.gz
    03:27:15  deflate on token returned 0 (21379 bytes left)
    03:27:15  rsync error: error in rsync protocol data stream (code 12) at token.c(481) [sender=3.2.7]

I'm going to try scp instead of rsync here for a bit.

(Noting that the internet says things like "need rsync on target", "need full path on target", and "need full path on ssh bin"; none really explain why it is intermittent.)
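
A rough sketch of one way to wrap the transfer so that an intermittent rsync failure is retried a few times before falling back to scp. The function names `retry` and `copy_artifact`, the retry count of 3, and the `RSYNC_BIN`/`SCP_BIN` override variables are all made up for illustration; this is not what the pipeline actually runs, and the extra ssh options seen in the log above are left out for brevity.

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
# Note: plain POSIX sh, so the variables here are global rather than local.
retry() {
  attempts="$1"
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    i=$((i + 1))
  done
  return 1
}

# Hypothetical wrapper: try rsync (same -avz flags as in the log) up to 3 times,
# then fall back to a single scp attempt.
# RSYNC_BIN and SCP_BIN are assumed override hooks so the logic can be tried
# out without touching the network.
copy_artifact() {
  src="$1"
  dest="$2"
  retry 3 "${RSYNC_BIN:-rsync}" -avz -e ssh "$src" "$dest" ||
    "${SCP_BIN:-scp}" "$src" "$dest"
}

# Example call, using the paths from the log above:
# copy_artifact /opt/go-site/pipeline/target/blazegraph-production.jnl.gz \
#   skyhook@skyhook.berkeleybop.org:/home/skyhook/snapshot/products/blazegraph/
```

This only papers over the intermittent failure; it does not explain the `code 12` error itself.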

Member Author

kltm commented Dec 14, 2023

Shockingly got a pass here; not sure whether that was due to the changes or just luck. Will try again with the same settings on release.

kltm added a commit that referenced this issue Dec 14, 2023
kltm added a commit that referenced this issue Dec 16, 2023
kltm added a commit that referenced this issue Dec 16, 2023
Member Author

kltm commented Dec 23, 2023

The stop issue seems to be continuing.
Reduced executors to 5.
