Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken nodejs.org webhook since December 17th #2123

Closed
nschonni opened this issue Jan 3, 2020 · 18 comments · Fixed by #2639
Closed

Broken nodejs.org webhook since December 17th #2123

nschonni opened this issue Jan 3, 2020 · 18 comments · Fixed by #2639

Comments

@nschonni
Copy link
Member

nschonni commented Jan 3, 2020

The last deploy in https://nodejs.org/github-webhook.log was for nodejs/nodejs.org@a1196dd
The next commit in this repo was 9df1fc4 but it doesn't look like it should affect the hook so maybe something manual happened.

/cc @Trott

@Trott
Copy link
Member

Trott commented Jan 3, 2020

I don't know much about the webhook, although I can learn I suppose? But hopefully someone else on @nodejs/build knows what to do?

@XhmikosR
Copy link
Contributor

XhmikosR commented Jan 3, 2020

May I also add that it might be better to rotate the log more frequently since it gets quite huge very fast?

@phillipj
Copy link
Member

phillipj commented Jan 5, 2020

Here's a couple of pointers I've learned after working with the @nodejs-github-bot which has a similar GitHub webhook setup:

  • github-webhook is a web server that gets installed globally on the server where nodejs.org is hosted
  • github-webhook systemd service ensures the above web server is running and restarted it accordingly when needed
  • github-webhook.json configures the webhook server to listen for certain GitHub webhook events and execute certain commands
  • build-site.sh nodejs gets executed by github-webhook when commits lands on master

Hopefully the above gives some hints about what might be wrong or where to look 🤷‍♂

@rvagg
Copy link
Member

rvagg commented Jan 6, 2020

weird error messages in the system logs for github-webhook:

Jan 14 14:10:48 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2647]: Emitted 'error' event at:
Jan 14 14:10:48 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2647]:     at lazyFs.open (internal/fs/streams.js:273:12)
Jan 14 14:10:48 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2647]:     at FSReqWrap.oncomplete (fs.js:141:20)
Jan 14 14:11:19 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2956]: events.js:167
Jan 14 14:11:19 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2956]:       throw er; // Unhandled 'error' event
Jan 14 14:11:19 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2956]:       ^
Jan 14 14:11:19 infra-digitalocean-ubuntu1604-x64-1 github-webhook[2956]: Error: ENOENT: no such file or directory, open '/home/nodejs/github-webhook.log'

but the log file does exist; perhaps something to do with a file rotation, although we don't have an automatic one yet.

I've restarted the service but don't have access to nodejs/nodejs.org to trigger a new build to see if it works. Can someone with perms push a new commit or go to the Settings in GitHub and re-trigger a master push event?

@nschonni
Copy link
Member Author

nschonni commented Jan 6, 2020

I just merged nodejs/nodejs.org#2872 so it should trigger now

@rvagg
Copy link
Member

rvagg commented Jan 6, 2020

Mon Jan 06 2020 03:10:57 GMT+0000 (Coordinated Universal Time)
Took 226139 ms
stdout: HEAD is now at b8f965d build(deps-dev): bump nock from 11.7.0 to 11.7.1

looks like it's working again

for the record, this just required a sudo systemctl restart github-webhook on the web server for those (few) with access.. as for the cause, I really don't know.

I'll manually rotate that log not but it does need to be automated by logrotate at some point.

@rvagg rvagg closed this as completed Jan 6, 2020
@XhmikosR
Copy link
Contributor

XhmikosR commented Jan 6, 2020

FYI https://nodejs.org/github-webhook.log seems to be empty for me

@rvagg
Copy link
Member

rvagg commented Jan 6, 2020

I emptied it, push to master and it should fill 🤞

@nschonni
Copy link
Member Author

nschonni commented Jan 6, 2020

Merged nodejs/nodejs.org#2875 so it should trigger now

@nschonni
Copy link
Member Author

nschonni commented Jan 6, 2020

Looks like it logged the deployment now

@BethGriggs
Copy link
Member

Posted in IRC, but this seems to have happened again for the v12.18.2 release post - nodejs/nodejs.org#3289

@rvagg
Copy link
Member

rvagg commented Jul 1, 2020

/cc @codebytere

Here's the output of the bottom of https://nodejs.org/github-webhook.log:

...
event="push", match="ref == "refs/heads/master" && repository.full_name == "nodejs/nodejs.org"", exec="/home/nodejs/build-site.sh nodejs"
Thu Jun 25 2020 13:54:58 GMT+0000 (Coordinated Universal Time)
Took 125670 ms
stdout: build-site.sh already running
event="push", match="ref == "refs/heads/master" && repository.full_name == "nodejs/nodejs.org"", exec="/home/nodejs/build-site.sh nodejs"
Tue Jun 30 2020 13:46:00 GMT+0000 (Coordinated Universal Time)
Took 27 ms
stdout: build-site.sh already running
event="push", match="ref == "refs/heads/master" && repository.full_name == "nodejs/nodejs.org"", exec="/home/nodejs/build-site.sh nodejs"
Tue Jun 30 2020 13:46:00 GMT+0000 (Coordinated Universal Time)
Took 12 ms

The bit above that is a successful build, which took 125670 ms.

The next two I believe are yesterday's 12.x and 14.x but says build-site.sh already running. That comes from here:

pidof -s -o '%PPID' -x $(basename $0) > /dev/null 2>&1 && \
echo "$(basename $0) already running" && \
exit 1
- that's pretty solid I think, I'm not sure how that would fail if build-site.sh wasn't running.

So, the webhook isn't the only thing that can trigger a website build, it also gets triggered by changes to the downloads/release/index.json when there's a new release! We have a crontab entry in that which checks every 5 minutes, so it can be fairly quick after you've promoted your releases. So here's my theory: you both did your releases, the auto-build ran after detecting the changes, then you did your blog posts and pushed to nodejs.org but when the webhook triggered it found that build-site.sh was already running so it bailed.

We don't have a queueing mechanism built into any of this, that would be the logical thing to add, except it adds a significant amount of complication. This is probably going to require some kind of queueing mechanism to fix it, but that's going to add a significant amount of complexity. I did build one for unofficial-builds which works quite nicely so maybe I could borrow from that. An alternative might be to just remove that crontab job entirely and rely on the blog posts to update the website. Each release gets a blog post so maybe that's reasonable? If you ever need to force an update then just force push something to nodejs/nodejs.org.

For now I've just run it manually. In future just remember that pushing any new commit to nodejs/nodejs.org should send a trigger again. Alternatively if you have access to the webhook settings in that repo you can replay the last one to the same effect.

@MylesBorins
Copy link
Contributor

This seems to be happening again

https://nodejs.org/github-webhook.log is showing a commit from 3 weeks ago and the site is not updating

@MylesBorins MylesBorins reopened this Apr 29, 2021
@richardlau
Copy link
Member

I'm seeing this is https://nodejs.org/github-webhook.log for nodejs/nodejs.org@ef6cf1f:

event="push", match="ref == "refs/heads/master" && repository.full_name == "nodejs/nodejs.org"", exec="/home/nodejs/build-site.sh nodejs"
Thu Apr 29 2021 19:25:59 GMT+0000 (Coordinated Universal Time)
Took 21368 ms
stdout: HEAD is now at ef6cf1f Add graceful-fs. (#3700)
stdout: Removing node_modules/
stderr: HEAD is now at ef6cf1f... Add graceful-fs. (#3700)
stdout: latest: Pulling from library/node
stdout: Digest: sha256:25516f3de85ebf588e29d81052495d2e1177b55cddbd7ddab2f5ff2c4496dd5e
stdout: Status: Image is up to date for node:latest
stdout: docker.io/library/node:latest
stdout: Get:1 http://deb.debian.org/debian buster InRelease [121 kB]
stdout: Get:2 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
stdout: Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
stdout: Get:4 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
stdout: Get:5 http://security.debian.org/debian-security buster/updates/main amd64 Packages [285 kB]
stdout: Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [10.9 kB]
stdout: Fetched 8441 kB in 2s (5058 kB/s)
stdout: Reading package lists...
stdout: Reading package lists...
stdout: Building dependency tree...
stdout: Reading state information...
stdout: The following additional packages will be installed:
stdout:   libpopt0
stdout: Suggested packages:
stdout:   openssh-server
stdout: The following NEW packages will be installed:
stdout:   libpopt0 rsync
stdout: 0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
stdout: Need to get 446 kB of archives.
stdout: After this operation, 974 kB of additional disk space will be used.
stdout: Get:1 http://deb.debian.org/debian buster/main amd64 libpopt0 amd64 1.16-12 [49.4 kB]
stdout: Get:2 http://deb.debian.org/debian buster/main amd64 rsync amd64 3.1.3-6 [397 kB]
stderr: debconf: delaying package configuration, since apt-utils is not installed
stdout: Fetched 446 kB in 0s (24.9 MB/s)
stdout: Selecting previously unselected package libpopt0:amd64.
stdout: (Reading database ... 
(Reading database ... 5%
(Reading database ... 10%
(Reading database ... 15%
(Reading database ... 20%
(Reading database ... 25%
(Reading database ... 30%
(Reading database ... 35%
(Reading database ... 40%
(Reading database ... 45%
(Reading database ... 50%
(Reading database ... 55%
(Reading database ... 60%
(Reading database ... 65%
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 100%
(Reading database ... 23985 files and directories currently installed.)
stdout: Preparing to unpack .../libpopt0_1.16-12_amd64.deb ...
stdout: Unpacking libpopt0:amd64 (1.16-12) ...
stdout: Selecting previously unselected package rsync.
stdout: Preparing to unpack .../rsync_3.1.3-6_amd64.deb ...
stdout: Unpacking rsync (3.1.3-6) ...
stdout: Setting up libpopt0:amd64 (1.16-12) ...
stdout: Setting up rsync (3.1.3-6) ...
stdout: invoke-rc.d: could not determine current runlevel
stdout: invoke-rc.d: policy-rc.d denied execution of restart.
stdout: Processing triggers for libc-bin (2.28-10) ...
stdout: Adding group `nodejs' (GID 1001) ...
stdout: Done.
stdout: Adding user `nodejs' ...
stdout: Adding new user `nodejs' (1001) with group `nodejs' ...
stdout: Creating home directory `/home/nodejs' ...
stdout: Copying files from `/etc/skel' ...
stderr: npm notice 
stderr: npm notice New minor version of npm available! 7.10.0 -> 7.11.1
stderr: npm notice Changelog: <https://github.com/npm/cli/releases/tag/v7.11.1>
stderr: npm notice Run `npm install -g npm@7.11.1` to update!
stderr: npm notice 
stderr: npm notice 
stderr: npm notice New minor version of npm available! 7.10.0 -> 7.11.1
stderr: npm notice Changelog: <https://github.com/npm/cli/releases/tag/v7.11.1>
stderr: npm notice Run `npm install -g npm@7.11.1` to update!
stderr: npm notice 
stderr: npm http fetch POST 200 https://registry.npmjs.org/-/npm/v1/security/advisories/bulk 1025ms
stdout: 
stdout: added 255 packages, and audited 256 packages in 7s
stdout: 
stdout: found 0 vulnerabilities
stdout: 
stdout: > deploy
stdout: > npm-run-all load-schedule build:deploy external:survey
stdout: 
stdout: 
stdout: > load-schedule
stdout: > curl -sS https://raw.githubusercontent.com/nodejs/Release/master/schedule.json -o source/schedule.json
stdout: 
stdout: 
stdout: > build:deploy
stdout: > node build.js --preserveLocale
stdout: 
stderr: node: ../src/coroutine.cc:134: void* find_thread_id_key(void*): Assertion `thread_id_key != 0x7777' failed.
stderr: Aborted (core dumped)
stderr: ERROR: "build:deploy" exited with 134.
event="push", match="ref == "refs/heads/master" && repository.full_name == "nodejs/nodejs.org"", exec="/home/nodejs/build-site.sh nodejs"
Thu Apr 29 2021 19:26:16 GMT+0000 (Coordinated Universal Time)
Took 17125 ms

in particular it looks like coroutine has failed, which I presume is a dependency for the website?

@richardlau
Copy link
Member

src/coroutine.cc is from fibers and the assertion looks like laverdet/node-fibers#451. I'm not sure what version of Node.js the webhook is running but the mention of npm 7.10 would suggest Node.js 16, which fibers is not compatible with.

@richardlau
Copy link
Member

@MylesBorins
Copy link
Contributor

MylesBorins commented Apr 29, 2021

@richardlau seems like we should maybe switch to Node.js 14 node:lts? (you suggested this in the other thread)

@richardlau
Copy link
Member

@richardlau seems like we should maybe switch to Node.js 14 node:lts? (you suggested this in the other thread)

Indeed I did suggest that and have done. PR to sync the copy of the script to what is deployed on the website: #2639
I am able to successfully run the build-site.sh script manually on the www server. I think testing the webhook itself requires pushing to the nodejs/nodejs.org repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants