-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vparquet4 encoding #3368
Add vparquet4 encoding #3368
Conversation
6ff9743
to
feb53f0
Compare
At this point vParquet3 and vParquet4 are identical
* Count dropped attributes and store them in separate column * Remove current support for arrays and kv-lists
* Store per-trace service statistics in vparquet4 blocks * Update pkg/traceql/combine.go * Update modules/frontend/search_progress.go * Update pkg/traceql/engine.go * Pre-allocate finalSpanset.ServiceStats * drop MaxServiceStats limit --------- Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Co-authored-by: A. Stoewer <adrian@stoewer.me>
* Fix skipping of dropped attributes The attribute dropping introduced in grafana#3123 wans't implemented correctly * Implement single values as arrays and introduce type column * Add support for array attributes * Tests more array values and fix attribute conversion * Replace individual pointer helper functions * Use a combination of snappy and delta for ValueType * Vendor module google/go-cmp/cmp/cmpopts * Use all types of array attributes in TestTraceToParquet * Improve TestFieldsAreCleared
* links and events schema changes * update tests * update test * lint * fixed tests after rebase * lint * addressed comments * lint version * reduce span to just start time for events conversion
* Add attributes to instrumentation scope Attributes are not filled as they are not supported in current proto * Populate instrumentation scope in test data * Use delta encoding for dropped attributes count columns * More consistent naming
72c01b2
to
580d50f
Compare
The test use values are tuned to current blocks from ops in order to have correct matching / non-matching queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding comments, but I'm not 100% done. I'm quite close and only have the second half of schema.go left
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a full pass of the PR now and all concerns are currently in unresolved conversations.
* Fix combining service stats by using max() * update comment Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
@andreasgerstmayr thanks for submitting the PRs |
#3) * Recalculate ServiceStats in combiner by moving it to assignNestedSetModelBounds() * add additional tests Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
* Move serviceStats iterator to second pass * fix nil map assignment Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
a1bbc66
to
9ab5514
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was well reviewed a month or so back. A few questions and thoughts but this is quite close.
|
||
func attrToParquetTypeUnsupported(a *v1.KeyValue, p *Attribute) { | ||
jsonBytes := &bytes.Buffer{} | ||
_ = jsonMarshaler.Marshal(jsonBytes, a.Value) // deliberately marshalling a.Value because of AnyValue logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do proto marshalling here for perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use json for ValueKVList
and ValueArray
in vParquet3. Those columns are mostly empty and therefore I don't expect a big performance impact here.
Using JSON has the advantage that it is a bit easier to inspect those columns with parquet-cli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we swapped span links and event attributes to proto but left others as json?
https://github.com/grafana/tempo/blob/main/tempodb/encoding/vparquet3/schema.go#L558
https://github.com/grafana/tempo/blob/main/tempodb/encoding/vparquet3/schema.go#L671
i'm fine with leaving these as json. likely they will be rare. we should keep an eye on profiles as we roll this out to see if there is significant impact on resource usage.
@stoewer @andreasgerstmayr this is very close. thank you for your work. there are 3 threads left to resolve. two of which are covered by andreas's PR: stoewer#5 if we push on this i think we can get it quite soon. let me know when you're ready for a final review 🙏 |
* make ServiceStats an intrinsic * skip allocating ServiceStats in rebatchIterator if already allocated Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
The regression was introduced by making ServiceStats an intrinsic
Use the same queries in vp3 and vp4
* Copy vparquet3 to vparquet4 folder * Create initial vParquet4 encoding * vParquet4: handle unsupported / dropped attributes (grafana#3123) * Count dropped attributes and store them in separate column * Remove current support for arrays and kv-lists * Store per-trace service statistics in vparquet4 blocks (grafana#2941) * Add support for attributes with array values (grafana#3221) * Implement single values as arrays and introduce type column * Add support for array attributes * Tests more array values and fix attribute conversion * Use a combination of snappy and delta for ValueType * Vendor module google/go-cmp/cmp/cmpopts * Use all types of array attributes in TestTraceToParquet * Improve TestFieldsAreCleared * links and events schema changes (grafana#3163) * Add attributes to instrumentation scope (grafana#3322) * Fix typo in vParquet4 event name enconding (grafana#3336) * Precalculate and reuse the vParquet4 schema before opening blocks * Convert block in vparquet4/test-data directory * CHANGELOG.md * Update test cases in BenchmarkBackendBlockTraceQL * Unsupported attribute values are no longer dropped * Replace ValueType column with IsArray column * Skip service stats map allocation when none are present --------- Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Co-authored-by: Andreas Gerstmayr <agerstmayr@redhat.com> Co-authored-by: Jennie Pham <94262131+ie-pham@users.noreply.github.com> Co-authored-by: Andreas Gerstmayr <andreas@gerstmayr.me>
What this PR does:
This adds a new encoding vParquet4 to Tempo (the default encoding remains unchanged). The new encoding improves support for links, events, and attributes with array values. This is a prerequisite better support of links, events, and arrays in TraceQL as proposed by the TraceQL Extensions design proposal.
The vParquet4 encoding is considered experimental for now.
There are still some things missing from this PR, those will be subject to separate PRs:
Hints for reviewers: This PR was created from the commits to the
vparquet3
directory in the branchmain-vparquet4
(diff). Since this PR adds a whole new directoryvparquet4
, it can be helpful to review the individual PRs in order to see the differences between vParquet3 and vParquet4.Which issue(s) this PR fixes:
Fixes grafana/tempo-squad#384
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]