Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multidim-interop test spec #117

Merged
merged 3 commits into from
Jan 31, 2023
Merged

Add multidim-interop test spec #117

merged 3 commits into from
Jan 31, 2023

Conversation

MarcoPolo
Copy link
Contributor

This defines what the test should do.

Note that none of the tests are currently conforming to this. Go prints to stdout; all the tests have the dialer also RPUSH; and none of them emit the structured JSON result.

Once we merge this, I'll update the existing tests.

@MarcoPolo
Copy link
Contributor Author

@marten-seemann I think this will give us what we need to measure handshake RTTs. Am I missing something?

multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
Comment on lines 53 to 55
4. Sleep for the duration of test_timeout. The test runner will kill this
process when the dialer finishes.
5. If the timeout is hit, exit with a non-zero error code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can’t the server just run indefinitely, until it’s killed by the test runner?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would allow us to get rid of the test timeout env. The timeout would be enforced by the runner, which would simplify the test implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't generally like having things running indefinitely. And it's not like this would work past the first try anyways since it only publishes the address once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would allow us to get rid of the test timeout env.

It wouldn't. You still want a timeout in the redis BLPOP command on the dialer side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would allow us to get rid of the test timeout env.

It wouldn't. You still want a timeout in the redis BLPOP command on the dialer side.

That timeout doesn’t need to be configurable, does it? There’s no way the redis lookup will take significant time, unless the other host fails to boot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the BLPOP command. This one depends on the other host starting up, listening, and then broadcasting their address. Originally this was hardcoded as 10 seconds, because that seemed like a large enough time frame. But in CI with a NodeJS listener that startup can take more than 10 seconds. This is the increase the spurred this change.

Now we can say, let's just set some arbitrarily high timeout here (e.g. 1 hour) then let the test runner kill things after N seconds. But in my experience, it's been helpful to see where the tests fails with a timeout. Did the dialer fail to get the listener's address? Or did the dialer fail to connect and ping?

Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this Marco, I can update #114 and the Ping test on rust-libp2p when this gets merged.

multidim-interop/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@BigLep BigLep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting. I'll leave to other to approve.

multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
multidim-interop/README.md Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

One thought but perhaps it is obvious to others: I haven't used redis before but it seems to work really well for such synchronisation requirements. However, if our test runner is now specific to the test, why doesn't it perform the synchronisation by reading and writing to stdin/stdout? We would probably not use compose then but just start the docker containers directly via JavaScript.

Something like:

  • Start all nodes
  • Wait for listener to print address (in JSON) on stdout
  • Tell dialer to dial address
  • Wait for dialer to exit
  • Kill listener

I am not sure it is overall simpler but it would remove the dependency on redis and compose. One downside is that running the binaries / containers yourself is a little more involved because you have to type into stdin now.

@MarcoPolo
Copy link
Contributor Author

The nice thing about using compose is the test runner doesn't have to have much logic about the test itself. You can run npm run test -- --name-filter="<some test filter>" --emit-only and get the exact compose yaml that you can run manually to run the test. You can replace the image with another one and tweak it.

It's also much easier to debug and develop this with a redis instance in the background that deals with passing information around. There are Redis clients for many languages (even prolog!).

Finally, it's nice that the runner runs compose and that's it. Sure it's looking at stdout once, but that's easy. The annoying part is dealing with lifetimes of containers.

@MarcoPolo
Copy link
Contributor Author

Thanks for all the comments all! Merging this and creating an issue to update the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants