Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: grpc reconnection #1521

Merged
merged 6 commits into from
Feb 17, 2021
Merged

Conversation

kstasik
Copy link
Contributor

@kstasik kstasik commented Feb 10, 2021

I have successfully reproduced this issue on 0.15.0 but i think it also applies to recent version. After disconnection it tries to reconnect only once. With this fix it should try reconnect periodically until successful connection.

Please let me know what do you think. Maybe there is some other better way to fix it.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Feb 10, 2021

CLA Signed

The committers are authorized under a signed CLA.

@codecov
Copy link

codecov bot commented Feb 10, 2021

Codecov Report

Merging #1521 (0597d42) into main (3bce9c9) will increase coverage by 0.0%.
The diff coverage is 100.0%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #1521   +/-   ##
=====================================
  Coverage   78.0%   78.0%           
=====================================
  Files        127     127           
  Lines       6600    6600           
=====================================
+ Hits        5153    5154    +1     
+ Misses      1203    1202    -1     
  Partials     244     244           
Impacted Files Coverage Δ
exporters/otlp/otlpgrpc/connection.go 86.9% <100.0%> (+0.9%) ⬆️

@kstasik kstasik changed the title [WIP] fix: grpc reconnection fixed fix: grpc reconnection Feb 10, 2021
@MrAlias
Copy link
Contributor

MrAlias commented Feb 10, 2021

Is there an issue tracking this? If not, can we start there? It would be nice to better understand the problem this is attempting to solve.

@kstasik
Copy link
Contributor Author

kstasik commented Feb 10, 2021

I hope it helps a little bit:

#1524

@jmacd
Copy link
Contributor

jmacd commented Feb 10, 2021

It would be great if we had a test that would prevent regressions, since this bug has been a major issue of this nature. I am not very familiar with the code, but this certainly looks correct.

Copy link
Member

@XSAM XSAM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reproduce this bug, and the test TestNewExporter_collectorConnectionDiesThenReconnects works fine.

@@ -13,6 +13,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- Rename project default branch from `master` to `main`.
- Reverse order in which `Resource` attributes are merged, per change in spec. (#1501)

## Fixed

- Fixed otlpgrpc reconnection issue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Fixed otlpgrpc reconnection issue.
- Fixed otlpgrpc reconnection issue. (#1521)

@kstasik
Copy link
Contributor Author

kstasik commented Feb 11, 2021

thank you - i didn't notice integration tests. I'm not sure about my solution but I think this test doesn't cover this line:

https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlpgrpc/connection.go#L158

Which i think causes goroutine to hang. I will try to write new test or extend this.

@kstasik
Copy link
Contributor Author

kstasik commented Feb 11, 2021

I think changing disconnectedCh channel to buffered may be better option but I will change this PR once i have time to write integration tests.

Copy link
Member

@punya punya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may also fix the test flake I ran into elsewhere (TestNewExporter_collectorConnectionDiesThenReconnects).

@punya
Copy link
Member

punya commented Feb 11, 2021

I can't reproduce this bug, and the test TestNewExporter_collectorConnectionDiesThenReconnects works fine.

Actually, I saw it flake several times in CI yesterday, and locally I see it fail about 5% of the time (https://gist.githubusercontent.com/punya/1db2a5fc7aede3baa06685997e145f3b/raw/bc52d8e2ba68772539a36c87f8ff9d1492e2ecf6/runs.txt).

@kstasik
Copy link
Contributor Author

kstasik commented Feb 12, 2021

btw. in this case connected() function was very misleading for me (grpc exporter). It doesn't mean exporter is connected and thanks to this reconnection goroutine it almost always returns true.

First reconnection always successfully creates new "connection" which is really not established.
https://pkg.go.dev/google.golang.org/grpc#DialContext

I changed fix to buffered channel, please review and let me know what do you think?

Copy link
Member

@XSAM XSAM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@Aneurysm9 Aneurysm9 merged commit 8da5299 into open-telemetry:main Feb 17, 2021
ldelossa pushed a commit to ldelossa/opentelemetry-go that referenced this pull request Mar 5, 2021
* fix: grpc reconnection fixed

* chore: changelog update

* fix: grpc reconnection issue - red test

* fix: grpc reconnection open-telemetry#1524

* fix: grpc reconnection issue cleanup
chenzhihao added a commit to hstan/opentelemetry-go that referenced this pull request May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants