Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protobuf changes test plan #11049

Closed
Tracked by #11039
kruskall opened this issue Jun 22, 2023 · 5 comments
Closed
Tracked by #11039

protobuf changes test plan #11049

kruskall opened this issue Jun 22, 2023 · 5 comments

Comments

@kruskall
Copy link
Member

APM Server version (apm-server version): 8.9

Description of the problem including expected versus actual behavior:

The protobuf work has been split into many PRs. Adding each one of them to the test plan is not ideal as they wouldn't be able to be tested separately.
I'm opening this to track testing. I wrote some of the things to look out for here (elastic/apm-data#52 (comment)) but I'll list them down:

  • models are no longer using map[string]any directly, instead the data from APM Agents is mapped to structpb.Struct and then converted to map[string]any in modeljson. There is an additional point of failure in this approach because the creation of structpb.Struct could fail. If that happens, we lose data.
  • protobuf models are using pointers everywhere. Look out for nil pointer dereference. event.Foo.Bar was safe before but now it's not.
  • IPs are stored as strings as opposed to netip.Addr. Make sure we are not converting an invalid IP (invalidNetip.String() will produce "invalid ip", we probably shouldn't encode that).
  • protobuf models are passing by reference (they are not safe to copy), if we need a copy we should use proto.Clone

What should be tested and what changed from the previous version ?

  • we're still ingesting JSON (input models), we need to make sure the intermediate mapping to protobuf models and modeljson is not causing panics or losing data.
  • tail-based sampling is using protobuf instead of JSON
  • validate performance
@axw
Copy link
Member

axw commented Jul 4, 2023

  • tested with the Python agent, with a simple Flask application + custom context with a mixture of integers, floats, and strings
  • tested with the Go agent, with a fake net/http.Request.RemoteAddr set to both valid and invalid IP addresses
  • tested with canned RUMv2 data (sending to /intake/v2/rum/events)
  • tested with apmsoak, sending a variety of canned data
  • tested with sendotlp

Tail-based sampling is known to be working, as we have system tests that cover this. The codec interface is very straightforward, so nothing really to do here; I opened a minor enhancement PR: #11137.

I have previously run some fuzz testing, and I know @kruskall has too, so I'm going to put the rest of my energy into poring over code.

@kyungeunni
Copy link
Contributor

kyungeunni commented Jul 4, 2023

Created 8.8.2 and 8.9.0-SNAPSHOT deployments and generated events by running:

  • RUM agent with a simple website
  • Go Agent with a simple server running on my machine

Then compared documents indexed in the two deployments. There are no differences(except ids and timestamps) and all the fields seem to be identical, no odd differences were observed.

I haven't tested Tail-based sampling, but as Andrew mentioned above that we have system tests

@axw
Copy link
Member

axw commented Jul 5, 2023

Found a panic related to translating Jaeger spans, fixed by elastic/apm-data#104

@axw
Copy link
Member

axw commented Jul 5, 2023

I've been through apm-data inputs, modeldecoders; tail-based sampling; apm-server aggregation code. The above panic is the only issue I've found. "Absence of Evidence does not mean Evidence of Absence", but I think we can call this done.

@axw
Copy link
Member

axw commented Jul 5, 2023

@carsonip has also done some benchmarking, and preliminary results show both a significant speedup and significant reduction in allocations compared to previous releases. Marking this as done.

@axw axw closed this as completed Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants