-
Notifications
You must be signed in to change notification settings - Fork 1.3k
chore/otel-collector: upgrade to v0.103.0, remove jaegerexporter #63171
Conversation
Caution License checking failed, please read: how to deal with third parties licensing. |
1 similar comment
Caution License checking failed, please read: how to deal with third parties licensing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR updates GRPC from 1.61 to 1.64 which deprecates the .Dial*
functions. This is caught by the CI, and we most likely want to update it.
Looking at jaegertracing/jaeger#5330 it seems that there are a few little semantics differences to take in account, making this potentially not trivial.
I'll ping the source team as @ggilmore has been leading this for a while.
I think I am fine with keeping the deprecated method for now, given the complexity laid out in the thread JH linked. The release notes say:
So LGTM. |
- MPL-2.0 | ||
- :who: | ||
:why: | ||
:versions: [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be true for all things in here, and is unrelated to the PR, but I am confused why versions is empty :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's optional fields you can record when tracking a dependency change, e.g. "this code was MPL for vX to vY"
go.mod
Outdated
@@ -40,8 +40,6 @@ replace ( | |||
github.com/gomodule/redigo => github.com/gomodule/redigo v1.8.9 | |||
// Pending: Renamed to github.com/google/gnostic. Transitive deps still use the old name (kubernetes/kubernetes). | |||
github.com/googleapis/gnostic => github.com/googleapis/gnostic v0.5.5 | |||
// Pending a release cut of https://github.com/prometheus/alertmanager/pull/3010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!!
go.mod
Outdated
cloud.google.com/go/bigquery v1.61.0 | ||
cloud.google.com/go/kms v1.17.1 | ||
cloud.google.com/go/monitoring v1.19.0 | ||
cloud.google.com/go/profiler v0.4.0 | ||
cloud.google.com/go/pubsub v1.37.0 | ||
cloud.google.com/go/secretmanager v1.11.5 | ||
cloud.google.com/go/storage v1.38.0 | ||
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.45.0 | ||
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.21.0 | ||
cloud.google.com/go/pubsub v1.38.0 | ||
cloud.google.com/go/secretmanager v1.13.1 | ||
cloud.google.com/go/storage v1.40.0 | ||
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.48.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sad that these all had to upgrade for this otel bump, makes it harder to track down regressions :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the dependency tree is pretty wild. I did upgrade aggressively but we've ran into more issues with not upgrading in lockstep, than we have with upgrading everything OTEL depends on in one go
32336fe
to
75327cc
Compare
COLLECTOR_OTLP_ENABLED: 'true' | ||
JAEGER_OTLP_GRPC_PORT: 4320 | ||
JAEGER_OTLP_HTTP_PORT: 4321 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to enable this on deploy-*
@@ -47,15 +47,15 @@ func TestNewExhaustive(t *testing.T) { | |||
(useFullDeadline . true) | |||
(patternInfo . TextPatternInfo{"content",case,nopath,filematchlimit:1000000,lang:cpp,F:"(?i)(?:\\.cpp$)|(?:\\.c\\+\\+$)|(?:\\.cc$)|(?:\\.cp$)|(?:\\.cppm$)|(?:\\.cxx$)|(?:\\.h$)|(?:\\.h\\+\\+$)|(?:\\.hh$)|(?:\\.hpp$)|(?:\\.hxx$)|(?:\\.inc$)|(?:\\.inl$)|(?:\\.ino$)|(?:\\.ipp$)|(?:\\.ixx$)|(?:\\.re$)|(?:\\.tcc$)|(?:\\.tpp$)|(?:\\.txx$)"}) | |||
(numRepos . 0) | |||
(pathRegexps . [(?i)(?:\.cpp$)|(?:\.c\+\+$)|(?:\.cc$)|(?:\.cp$)|(?:\.cppm$)|(?:\.cxx$)|(?:\.h$)|(?:\.h\+\+$)|(?:\.hh$)|(?:\.hpp$)|(?:\.hxx$)|(?:\.inc$)|(?:\.inl$)|(?:\.ino$)|(?:\.ipp$)|(?:\.ixx$)|(?:\.re$)|(?:\.tcc$)|(?:\.tpp$)|(?:\.txx$)]) | |||
(pathRegexps . ["(?i)(?:\\.cpp$)|(?:\\.c\\+\\+$)|(?:\\.cc$)|(?:\\.cp$)|(?:\\.cppm$)|(?:\\.cxx$)|(?:\\.h$)|(?:\\.h\\+\\+$)|(?:\\.hh$)|(?:\\.hpp$)|(?:\\.hxx$)|(?:\\.inc$)|(?:\\.inl$)|(?:\\.ino$)|(?:\\.ipp$)|(?:\\.ixx$)|(?:\\.re$)|(?:\\.tcc$)|(?:\\.tpp$)|(?:\\.txx$)"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know where this is coming from? cc @varungandhi-src perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhchabran sorry, I don't know about this. I've flagged this in the search platform team channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently we use go.opentelemetry.io/otel
to encode values in the s-exp printer for the tests (see here).
Without looking at the source of go.opentelemetry.io/otel
, My best bet is they changed the way they process arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @stefanhengl. Do you think this is creating a problem here? (I don't have context on this, was flagged as a reviewer and spotted this, which prompted me to reach out as this could break something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. As far as I can see it is only used in tests.
) Thread: https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1717797870638299 One problem caused by this upgrade is that the [deprecated `jaegerexporter`](open-telemetry/opentelemetry-specification#2858) no longer builds at all with the last published version, so for the upgrade to go through it must be removed. I've updated local `sg start` to work with this change, but some Release team support is needed for deployment configuration + customer-facing docs changes: https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1718143249191349?thread_ts=1717797870.638299&cid=C1JH2BEHZ, since current guidance asks customers to configure `jaegerexporter`. Part of https://linear.app/sourcegraph/issue/SEC-1680 Closes https://linear.app/sourcegraph/issue/CORE-177 ## Test plan Followed steps shared in https://sourcegraph.slack.com/archives/C04MYFW01NV/p1718136211292469 to run locally, since `sg run jaeger otel-collector` alone is insufficient to get updated images: ```diff diff --git a/wolfi-images/opentelemetry-collector.yaml b/wolfi-images/opentelemetry-collector.yaml index b995a3d531b..76de80d4f84 100644 --- a/wolfi-images/opentelemetry-collector.yaml +++ b/wolfi-images/opentelemetry-collector.yaml @@ -7,7 +7,11 @@ contents: - mailcap ## opentelemetry-collector packages - - opentelemetry-collector@sourcegraph + - opentelemetry-collector@branch + keyring: + - https://packages.sgdev.org/sourcegraph-melange-dev.rsa.pub + repositories: + - '@Branch https://packages.sgdev.org/branches/otel-collector-upgrade' paths: - path: /otel-collector ``` plus `sg wolfi lock opentelemetry-collector` will make `sg run otel-collector` use the correct image. **The above diffs SHOULD NOT be committed. The lock should happen post-merge.** Spot-checked some traces locally with: ``` sg run jaeger otel-collector sg start ``` ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/dff88d5a-db9a-4039-b7ba-682e120bdc41) ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/7e553894-0026-47de-ae38-ba5502b94c63) ## Changelog - OpenTelemetry Collector: Upgraded OpenTelemetry Collector to v0.103.0 - OpenTelemetry Collector: The [deprecated `jaegerexporter` has been removed](open-telemetry/opentelemetry-specification#2858). Users of `exporter: { jaeger: ... }` in the OpenTelemetry Collector should use `exporter: { otlp/jaeger: ... }` to send traces directly to Jaeger via its OTLP receiver. - Users of the default Jaeger configurations now need to provide `JAEGER_OTLP_GRPC_PORT` as well as the existing `JAEGER_HOST` configuration. --------- Co-authored-by: Anish Lakhwara <anish+github@lakhwara.com> Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>
<!-- PR description tips: https://www.notion.so/sourcegraph/Write-a-good-pull-request-description-610a7fd3e613496eb76f450db5a49b6e --> We need to update the wolfi lock image for https://github.com/sourcegraph/sourcegraph/pull/63171 in order for `sg run` to work We've made all the changes to the deployment repos for this to be pushed out in the release today. ## Test plan <!-- REQUIRED; info at https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles --> Manually tested ## Changelog <!-- OPTIONAL; info at https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c --> - fix(build): update wolfi lock for otel-collector
…#63790) The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171 bumps the `prometheus/common` package too far via transitive deps, causing us to generate configuration for alertmanager that altertmanager doesn't accept, at least until the alertmanager project cuts a new release with a newer version of `promethues/common`. For now we forcibly downgrade with a replace. Everything still builds, so we should be good to go. ## Test plan `sg start` and `sg run prometheus`. On `main`, editing `observability.alerts` will cause Alertmanager to refuse to accept the generated configuration. With this patch, all is well it seems - config changes go through as expected. This is a similar test plan for https://github.com/sourcegraph/sourcegraph/pull/63329 ## Changelog - Fix Prometheus Alertmanager configuration failing to apply `observability.alerts` from site config
…#63790) The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171 bumps the `prometheus/common` package too far via transitive deps, causing us to generate configuration for alertmanager that altertmanager doesn't accept, at least until the alertmanager project cuts a new release with a newer version of `promethues/common`. For now we forcibly downgrade with a replace. Everything still builds, so we should be good to go. ## Test plan `sg start` and `sg run prometheus`. On `main`, editing `observability.alerts` will cause Alertmanager to refuse to accept the generated configuration. With this patch, all is well it seems - config changes go through as expected. This is a similar test plan for https://github.com/sourcegraph/sourcegraph/pull/63329 ## Changelog - Fix Prometheus Alertmanager configuration failing to apply `observability.alerts` from site config (cherry picked from commit ffa873f)
… generated config (#63793) The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171 bumps the `prometheus/common` package too far via transitive deps, causing us to generate configuration for alertmanager that altertmanager doesn't accept, at least until the alertmanager project cuts a new release with a newer version of `promethues/common`. For now we forcibly downgrade with a replace. Everything still builds, so we should be good to go. ## Test plan `sg start` and `sg run prometheus`. On `main`, editing `observability.alerts` will cause Alertmanager to refuse to accept the generated configuration. With this patch, all is well it seems - config changes go through as expected. This is a similar test plan for https://github.com/sourcegraph/sourcegraph/pull/63329 ## Changelog - Fix Prometheus Alertmanager configuration failing to apply `observability.alerts` from site config <br> Backport ffa873f from #63790 Co-authored-by: Robert Lin <robert@bobheadxi.dev>
Thread: https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1717797870638299
One problem caused by this upgrade is that the deprecated
jaegerexporter
no longer builds at all with the last published version, so for the upgrade to go through it must be removed. I've updated localsg start
to work with this change, but some Release team support is needed for deployment configuration + customer-facing docs changes: https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1718143249191349?thread_ts=1717797870.638299&cid=C1JH2BEHZ, since current guidance asks customers to configurejaegerexporter
.Part of https://linear.app/sourcegraph/issue/SEC-1680
Closes https://linear.app/sourcegraph/issue/CORE-177
Test plan
Followed steps shared in https://sourcegraph.slack.com/archives/C04MYFW01NV/p1718136211292469 to run locally, since
sg run jaeger otel-collector
alone is insufficient to get updated images:plus
sg wolfi lock opentelemetry-collector
will makesg run otel-collector
use the correct image.The above diffs SHOULD NOT be committed. The lock should happen post-merge.
Spot-checked some traces locally with:
Changelog
jaegerexporter
has been removed. Users ofexporter: { jaeger: ... }
in the OpenTelemetry Collector should useexporter: { otlp/jaeger: ... }
to send traces directly to Jaeger via its OTLP receiver.JAEGER_OTLP_GRPC_PORT
as well as the existingJAEGER_HOST
configuration.