
x/telemetry/internal/upload: TestSimpleServer serving 503 and deadlocking on multiple builders #62137

Closed
bcmills opened this issue Aug 18, 2023 · 9 comments
Labels: NeedsInvestigation, telemetry
Milestone: Unreleased

Comments

@bcmills
Contributor

bcmills commented Aug 18, 2023

TestSimpleServer is logging a 503 response and then deadlocking, consistently on both the linux-riscv64-unmatched and linux-amd64-wsl builders:

https://build.golang.org/log/62c59c3e9c4426454662218f8aba4048231990f3:

first_test.go:30: 503
panic: test timed out after 40m0s
running tests:
	TestSimpleServer (40m0s)

goroutine 36 [running]:
testing.(*M).startAlarm.func1()
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:2259 +0x300
created by time.goFunc
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/time/sleep.go:177 +0x50

goroutine 1 [chan receive]:
testing.(*T).Run(0x3f940016c0, {0x337e60?, 0x31b98d7c837a4?}, 0x352698)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:1649 +0x380
testing.runTests.func1(0x1a0?)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:2054 +0x4c
testing.tRunner(0x3f940016c0, 0x3f94046c80)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:1595 +0x104
testing.runTests(0x3f94070960?, {0x58bc00, 0x2, 0x2}, {0x1cd0c?, 0x3f940745e0?, 0x5c8d40?})
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:2052 +0x404
testing.(*M).Run(0x3f94070960)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:1925 +0x564
main.main()
	_testmain.go:49 +0x1a8

goroutine 3 [chan receive]:
golang.org/x/telemetry/internal/upload.TestSimpleServer(0x3f94001860)
	/tmp/workdir-host-linux-riscv64-unmatched/gopath/src/golang.org/x/telemetry/internal/upload/first_test.go:32 +0x1b8
testing.tRunner(0x3f94001860, 0x352698)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:1595 +0x104
created by testing.(*T).Run in goroutine 1
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/testing/testing.go:1648 +0x36c

goroutine 4 [IO wait]:
internal/poll.runtime_pollWait(0x3f98269f40, 0x72)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/runtime/netpoll.go:345 +0xc4
internal/poll.(*pollDesc).wait(0x3f9411c000?, 0x25490?, 0x0)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/internal/poll/fd_poll_runtime.go:84 +0x44
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x3f9411c000)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/internal/poll/fd_unix.go:611 +0x260
net.(*netFD).accept(0x3f9411c000)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/net/fd_unix.go:172 +0x34
net.(*TCPListener).accept(0x3f940220c0)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/net/tcpsock_posix.go:152 +0x34
net.(*TCPListener).Accept(0x3f940220c0)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/net/tcpsock.go:315 +0x38
net/http.(*Server).Serve(0x3f9411a0f0, {0x3b0c10, 0x3f940220c0})
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/net/http/server.go:3053 +0x2f0
net/http.Serve(...)
	/tmp/workdir-host-linux-riscv64-unmatched/go/src/net/http/server.go:2592
golang.org/x/telemetry/internal/upload.testServer(0x0?)
	/tmp/workdir-host-linux-riscv64-unmatched/gopath/src/golang.org/x/telemetry/internal/upload/utils_test.go:89 +0x144
created by golang.org/x/telemetry/internal/upload.setup in goroutine 3
	/tmp/workdir-host-linux-riscv64-unmatched/gopath/src/golang.org/x/telemetry/internal/upload/utils_test.go:26 +0xb0
FAIL	golang.org/x/telemetry/internal/upload	2400.140s

(attn @pjweinb, @jamalc, @hyangah)

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 18, 2023
@gopherbot gopherbot added the telemetry x/telemetry issues label Aug 18, 2023
@gopherbot gopherbot added this to the Unreleased milestone Aug 18, 2023
@pjweinb

pjweinb commented Aug 18, 2023 via email

@bcmills
Contributor Author

bcmills commented Aug 18, 2023

The linux-amd64-wsl builders in particular run normal linux binaries, so it isn't at all obvious to me that this failure mode is platform-specific as opposed to just timing-sensitive or sensitive to certain details of network configuration.

If it is timing-sensitive or configuration-sensitive, then this failure mode could potentially also affect users on first-class ports, depending on what hardware they are using.

@bcmills
Contributor Author

bcmills commented Aug 18, 2023

Perhaps a good starting point would be to add logging to the test to get more useful output, so that we better understand the nature of the failure?

@pjweinb

pjweinb commented Aug 18, 2023 via email

@bcmills
Copy link
Contributor Author

bcmills commented Aug 18, 2023

Probably the easiest way is to upload a CL to Gerrit and add the builder as a SlowBot in the same comment where you set the TryBot+1 vote, like:

TRY=linux-amd64-wsl

(You could also use a gomote, but those are pretty awkward for changes in x repos. 🤷‍♂️)

@bcmills
Contributor Author

bcmills commented Aug 21, 2023

The two builders that are affected are, IIRC, both located in China.

Is it possible that something about this test is failing due to the time zone being on the other side of UTC?
(This may be related to #62192.)

If so, that could be an issue for Go users in other time zones independent of platform.

@pjweinb

pjweinb commented Aug 21, 2023 via email

@bcmills
Contributor Author

bcmills commented Aug 21, 2023

That's probably #62192, I'm guessing? 🙃

@findleyr
Contributor

findleyr commented Nov 6, 2023

This may also be resolved by https://go.dev/cl/538297.
