Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock when ryuk does not acknowledge filters #843

Merged
merged 6 commits into from
Sep 5, 2018

Conversation

thammerl
Copy link
Contributor

@thammerl thammerl commented Aug 21, 2018

This PR should fix the issues #581 and #780.

In case ryuk cannot parse the request, no acknowledgement is sent resulting in a deadlock as ResourceReaper ignores the end of the stream and will continue waiting for one. This can happen under heavy load.

See relevant code in ryuk. From the documentation of reader.readString:

If ReadString encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF)

The thread in ResourceReaper needs to check for the return value null when calling BufferedReader#readLine in order to check for the closed stream. From the official documentation of that method:

A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached

I could reproduce the issue by building a patched version of testcontainers 1.8.3 release with extended logging resulting in the following log output (I truncated the file):

ryuk_endless_loop.txt

As you can see, the thread gets stuck trying to read from the ryuk socket waiting for an acknowledgement.

Can someone think of a way on how to implement an automated test for that?

@bsideup
Copy link
Member

bsideup commented Sep 3, 2018

@thammerl the change looks ok in general, thanks!
Since I'm really not sure if we can do an integration test for this, maybe you can write a unit test instead?

@bsideup bsideup added this to the next milestone Sep 3, 2018
@thammerl
Copy link
Contributor Author

thammerl commented Sep 4, 2018

@bsideup I added unit tests for the filter registration logic in 8e71ac6. In order to be able to implement unit tests, I had to extract a FilterRegistry encapsulating the communication with Ryuk.

With 8e71ac6 the Travis CI build began to fail due to a failing Couchbase integration test. I'm not sure whether this is really related to my change. The gradle task couchbase:test is successful on my local machine.

UPDATE: The build became stable again with 7881d05. So I guess the tests are just flaky.

@bsideup bsideup merged commit 7ea1483 into testcontainers:master Sep 5, 2018
@bsideup
Copy link
Member

bsideup commented Sep 5, 2018

@thammerl the failure was unrelated to the change and got fixed after I re-triggered the tests :)
Merged, thanks! 👍

@rnorth
Copy link
Member

rnorth commented Sep 10, 2018

We have this out in a Release Candidate build (1.9.0-rc1) for anyone who is keen to try it!

Release notes

@kszalontai
Copy link

I've tried it on Codeship, but the error is still present

2018-09-10 08:59:37 build-env 08:59:37.755 [Test worker] INFO  org.testcontainers.dockerclient.DockerClientProviderStrategy - Will use 'okhttp' transport
2018-09-10 08:59:38 build-env 08:59:38.634 [Test worker] INFO  org.testcontainers.dockerclient.EnvironmentAndSystemPropertyClientProviderStrategy - Found docker client settings from environment
2018-09-10 08:59:38 build-env 08:59:38.648 [Test worker] INFO  org.testcontainers.dockerclient.DockerClientProviderStrategy - Found Docker environment with Environment variables, system properties and defaults. Resolved: 
2018-09-10 08:59:38 build-env dockerHost=tcp://18.211.213.220:2376
2018-09-10 08:59:38 build-env apiVersion='{UNKNOWN_VERSION}'
2018-09-10 08:59:38 build-env registryUrl='https://index.docker.io/v1/'
2018-09-10 08:59:38 build-env registryUsername='root'
2018-09-10 08:59:38 build-env registryPassword='null'
2018-09-10 08:59:38 build-env registryEmail='null'
2018-09-10 08:59:38 build-env dockerConfig='DefaultDockerClientConfig[dockerHost=tcp://18.211.213.220:2376,registryUsername=root,registryPassword=<null>,registryEmail=<null>,registryUrl=https://index.docker.io/v1/,dockerConfigPath=/root/.docker,sslConfig=LocalDirectorySSLConfig{dockerCertPath=/home/ubuntu/.docker},apiVersion={UNKNOWN_VERSION},dockerConfig=<null>]'
2018-09-10 08:59:38 build-env
2018-09-10 08:59:38 build-env 08:59:38.650 [Test worker] INFO  org.testcontainers.DockerClientFactory - Docker host IP address is 18.211.213.220
2018-09-10 08:59:38 build-env 08:59:38.836 [Test worker] INFO  org.testcontainers.DockerClientFactory - Connected to docker: 
2018-09-10 08:59:38 build-env Server Version: 18.03.1-ce
2018-09-10 08:59:38 build-env API Version: 1.37
2018-09-10 08:59:38 build-env Operating System: Ubuntu 16.04.4 LTS
2018-09-10 08:59:38 build-env Total Memory: 3762 MB
2018-09-10 09:01:40 build-env 
2018-09-10 09:01:40 build-env xxx.HBaseTemplateStoreTest > classMethod FAILED
2018-09-10 09:01:40 build-env java.lang.ExceptionInInitializerError
2018-09-10 09:01:40 build-env
2018-09-10 09:01:40 build-env Caused by:
2018-09-10 09:01:40 build-env java.lang.IllegalStateException: Can not connect to Ryuk

@toadzky
Copy link
Contributor

toadzky commented Sep 11, 2018

I'm still seeing the error locally when I run tests too. It's intermittent (sometimes it does work) but it's failing probably more than 50% of the time. I'm running Ubuntu 18.04 and when it fails it just spams

08:41:10.506 [testcontainers-ryuk] WARN  o.t.utility.ResourceReaper - Can not connect to Ryuk at localhost:32773
java.net.SocketException: Connection reset

and a stacktrace in the output for a while before shutting down.

The ryuk container lasts a while longer (up to a minute) and the only logs it outputs are

2018/09/11 14:44:51 Starting on port 8080...
panic: Timed out waiting for the initial connection

goroutine 1 [running]:
main.main()
	/go/src/github.com/bsideup/moby-ryuk/main.go:92 +0xe1e

@cmbernard333
Copy link

I am seeing the same issue as @toadzky with this error repeating over and over again when running a postgres container with testcontainers 1.14.3

2020-09-30 16:48:56.309  INFO 9874 --- [           main] o.t.utility.RegistryAuthLocator          : Credential helper/store (docker-credential-desktop) does not have credentials for quay.io
2020-09-30 16:48:56.905  WARN 9874 --- [containers-ryuk] o.testcontainers.utility.ResourceReaper  : Can not connect to Ryuk at localhost:32770

java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
	at java.base/java.net.Socket.connect(Socket.java:609) ~[na:na]
	at java.base/java.net.Socket.connect(Socket.java:558) ~[na:na]
	at java.base/java.net.Socket.<init>(Socket.java:454) ~[na:na]
	at java.base/java.net.Socket.<init>(Socket.java:231) ~[na:na]
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:121) ~[testcontainers-1.13.0.jar:na]
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27) ~[duct-tape-1.0.8.jar:na]
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:119) ~[testcontainers-1.13.0.jar:na]
	at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]```

@rnorth
Copy link
Member

rnorth commented Oct 1, 2020

@cmbernard333 That's an unrelated issue to this one. Please see #3166

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants