
Migrate from gRPC to armeria for testing agent in memory exporter #5314

Closed
anuraaga wants to merge 14 commits

Conversation

anuraaga (Contributor) commented Feb 7, 2022

There is currently a big gap between the published agent and our tests because, for historical reasons, we include gRPC in the testing exporter. This PR switches it to Armeria to avoid the gRPC dependency, which automatically moves the tests onto the okhttp export codepath.

anuraaga requested a review from a team February 7, 2022 03:19
anuraaga marked this pull request as draft February 7, 2022 05:12
    List<CompletableResultCode> results =
        Arrays.asList(
            AgentTestingTracingCustomizer.spanProcessor.forceFlush(),
            AgentTestingLogsCustomizer.logProcessor.forceFlush());
    CompletableResultCode.ofAll(results).join(10, TimeUnit.SECONDS);
Review comment from a maintainer (Member):
WDYT about using OpenTelemetrySdkAccess instead? It flushes the meter provider too
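For context, a call through that helper might look like the line below; the exact package and signature of OpenTelemetrySdkAccess.forceFlush are assumptions based on this discussion, not verified against the source.

    // Hedged sketch: flush all signal providers (including the meter provider) through the
    // agent's SDK access helper instead of flushing individual processors.
    OpenTelemetrySdkAccess.forceFlush(10, TimeUnit.SECONDS);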

anuraaga (Contributor, Author) replied Feb 8, 2022:

Oops, realized that we don't want to flush metrics here anyway (there's no such thing as pending metric exports, really; all exports happen at arbitrary times and are valid).

trask (Member) left a review comment:

grpc without grpc 🤯

trask (Member) commented Feb 8, 2022

> This PR seems to cause a core dump on Java 15 OpenJ9 reliably, anyone have an idea what could cause it?

Ya, this is weird. It seems to consistently segfault on the dubbo and camel tests, and none of the others.

And there's no consistency in the "current thread" in the core dumps between runs, which is normally what I'd use to search on to see if it's a known bug.

And there are a lot of OpenJ9 segfault issues.

And even if we could run integration tests on Java 17 today, I don't think we could run them on OpenJ9 17 due to #5051 (comment) and https://adoptopenjdk.net/archive.html?variant=openjdk8&jvmVariant=openj9

Maybe it's best to go with the @SuppressWarnings route you mentioned yesterday for updating to OTel SDK 1.11.0, and postpone this PR for a bit?

I'm also OK with skipping those two tests on OpenJ9 15 if we want to move forward with this PR.

laurit (Contributor) commented Feb 9, 2022

I played around with this a bit. Firstly, reporting this to OpenJ9 is probably futile. AFAIK they build all their runtimes from the same code base, and if the bug were still there, there is a good chance that 8 and 11 would also fail similarly.
The bug goes away (tested with :instrumentation:apache-dubbo-2.7:javaagent:test) when executor instrumentation is disabled, or when this block in AgentTestingCustomizer

    autoConfigurationCustomizer.addMeterProviderCustomizer(
        (meterProvider, config) ->
            meterProvider.registerMetricReader(
                PeriodicMetricReader.builder(AgentTestingExporterFactory.metricExporter)
                    .setInterval(Duration.ofMillis(300))
                    .newMetricReaderFactory()));

is commented out, or when this block in AgentInstaller

    // If noop OpenTelemetry is enabled, autoConfiguredSdk will be null and AgentListeners are not
    // called
    AutoConfiguredOpenTelemetrySdk autoConfiguredSdk = null;
    if (config.getBoolean(JAVAAGENT_NOOP_CONFIG, false)) {
      logger.info("Tracing and metrics are disabled because noop is enabled.");
      GlobalOpenTelemetry.set(NoopOpenTelemetry.getInstance());
    } else {
      autoConfiguredSdk = installOpenTelemetrySdk(config);
    }

    if (autoConfiguredSdk != null) {
      runBeforeAgentListeners(agentListeners, config, autoConfiguredSdk);
    }

is moved after the byte-buddy instrumenter is set up, or when java.util.concurrent.ForkJoinPool is removed from the included executors list in AbstractExecutorInstrumentation.
The key to this crash probably has something to do with java.util.concurrent.ForkJoinPool. While commenting out addMeterProviderCustomizer seems somewhat random, I think it is still indirectly related to ForkJoinPool: metrics are sent through armeria, which uses caffeine's bounded cache, which in turn uses ForkJoinPool. Changing the interval to Duration.ofSeconds(3) also seems to avoid the crash (or maybe it just makes it not reproduce on every run).
Similarly, moving code around in AgentInstaller doesn't have an obvious relation to ForkJoinPool. ForkJoinPool is loaded early in AgentInstaller.installBytebuddyAgent by the call to List<AgentListener> agentListeners = loadOrdered(AgentListener.class);. It seems that moving this block ensures that the first metrics are sent only after the byte-buddy transformer is installed and byte-buddy has redefined ForkJoinPool.
The least intrusive fix for this crash that I could come up with is to modify OtlpInMemoryMetricExporter so that it fails export attempts until AgentListener.afterAgent is called.
To avoid similar issues in the future, I'd urge maintainers to consider altering the agent initialization sequence so that the byte-buddy transformer is installed before the SDK is initialized.
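For illustration, a minimal sketch of that fix idea: a delegating exporter that refuses to export until the agent signals startup. The class and method names here are hypothetical, and MetricExporter methods that vary across SDK versions (e.g. aggregation temporality) are omitted.

    import io.opentelemetry.sdk.common.CompletableResultCode;
    import io.opentelemetry.sdk.metrics.data.MetricData;
    import io.opentelemetry.sdk.metrics.export.MetricExporter;
    import java.util.Collection;
    import java.util.concurrent.atomic.AtomicBoolean;

    // Illustrative wrapper: fail export attempts until AgentListener.afterAgent has run, so no
    // export (and no ForkJoinPool use via armeria/caffeine) happens before byte-buddy has
    // retransformed java.util.concurrent classes.
    final class GatedMetricExporter implements MetricExporter {

      private final MetricExporter delegate;
      private final AtomicBoolean started = new AtomicBoolean();

      GatedMetricExporter(MetricExporter delegate) {
        this.delegate = delegate;
      }

      // Intended to be called from AgentListener.afterAgent.
      void markStarted() {
        started.set(true);
      }

      @Override
      public CompletableResultCode export(Collection<MetricData> metrics) {
        if (!started.get()) {
          // Drop this batch; the periodic reader will simply try again on the next interval.
          return CompletableResultCode.ofFailure();
        }
        return delegate.export(metrics);
      }

      @Override
      public CompletableResultCode flush() {
        return delegate.flush();
      }

      @Override
      public CompletableResultCode shutdown() {
        return delegate.shutdown();
      }
    }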

mateuszrzeszutek (Member) commented:

> To avoid similar issues in the future, I'd urge maintainers to consider altering the agent initialization sequence so that the byte-buddy transformer is installed before the SDK is initialized.

I don't think we can do that either - that could cause some instrumentations to initialize with the no-op OpenTelemetry implementation (since the global is set after the SDK is initialized)

anuraaga (Contributor, Author) commented:

Thanks @laurit for the great digging! I've delayed the metric export until after the agent initializes.

    }

    @SuppressWarnings("ImmutableEnumChecker")
    private enum StartableMetricReader implements MetricReaderFactory, MetricReader {
Review comment from a maintainer (Member):

can you add a brief comment explaining why this is needed (or just pointing to the PR discussion)?
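Illustrative only: the kind of note being asked for could summarize the crash discussion above (the wording below is a suggestion, not the comment that was actually added).

    // Metric export must not start before AgentListener.afterAgent has run: export goes through
    // armeria/caffeine, which touches ForkJoinPool, and doing that before byte-buddy has
    // retransformed java.util.concurrent classes reliably crashed Java 15 OpenJ9 (see the
    // discussion in this PR).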

anuraaga (Contributor, Author) commented:

For reference, I was expecting a test slowdown with this PR, since it goes from an in-memory transport to localhost network traffic. After getting some builds through, I'm not sure I like it. Instead I sent #5332.

anuraaga marked this pull request as draft February 10, 2022 06:54
laurit (Contributor) commented Feb 10, 2022

> To avoid similar issues in the future, I'd urge maintainers to consider altering the agent initialization sequence so that the byte-buddy transformer is installed before the SDK is initialized.

> I don't think we can do that either - that could cause some instrumentations to initialize with the no-op OpenTelemetry implementation (since the global is set after the SDK is initialized)

I didn't say it would be easy :) There wouldn't be too many affected instrumentations; these could use an extra "agent started" check and bail out when the SDK isn't ready yet. Perhaps this check could even be baked into the Instrumenter API somehow (see the sketch below)? For another case where running the initial retransformation concurrently with SDK code started on a background thread produces strangeness, see #4697. Basically it boils down to replacing one set of problems with another, hopefully more manageable, set of problems.
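A minimal sketch of that "agent started" check; every name here is hypothetical, and nothing like this is claimed to exist in the agent today.

    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical flag flipped at the end of agent startup; instrumentation that cannot work
    // against the no-op OpenTelemetry could consult it and bail out early.
    public final class AgentStartedFlag {

      private static final AtomicBoolean started = new AtomicBoolean();

      private AgentStartedFlag() {}

      // Called once the SDK is installed and the global OpenTelemetry has been set.
      public static void markStarted() {
        started.set(true);
      }

      // Advice (or the Instrumenter API) could check this and skip work until the SDK is ready.
      public static boolean isStarted() {
        return started.get();
      }
    }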

anuraaga closed this Feb 11, 2022