Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge descriptor type and data points into Data. #202

Merged
merged 1 commit into from
Aug 21, 2020

Conversation

bogdandrutu
Copy link
Member

@bogdandrutu bogdandrutu commented Aug 18, 2020

This PR does not change any semantics, it just re-organize the protos:

  • Removes MetricDescriptor and adds fields to the Metric directly
  • Combines MetricDescriptor.Type with specific DataPoints.

This has the following advantages:

  • Require only one allocation more than the initial version (v0.4.0). When using gogo proto.
  • In-memory representation is smaller than the initial version (v0.4.0).
  • Extensible: allows to add new data types in the Metric.
  • Groups together metadata and points about different aggregation types.
  • Removes possibility for receiving multiple data points types, which was an undefined behavior.

Fixes #189

Signed-off-by: Bogdan Drutu bogdandrutu@gmail.com

@bogdandrutu bogdandrutu requested review from a team August 18, 2020 23:57
@bogdandrutu
Copy link
Member Author

@tigrannajaryan in the Attributes you decided to use oneof instead of the list of all possibilities, I did the same here because most likely we will have a list of more than 5 items as possibilities for the data type, and I remember that you said is ok.

oneof type {
// TODO: Determine if encoding all possible values in a uint64 bitset
// improves performance significantly and propose that if that is the case.
oneof data {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are paying the cost of oneof here I would suggest to split Gauge into 2 separate messages, one per data type and add both to this oneof, i.e.:

oneof data {
    GaugeIn64 gauge_int = 4;
    GaugeDouble gauge_double = 5;
    ...
}

Same for Sum.

By doing so we will eliminate the need to store the empty unused slice for each Gauge and each Sum. It will reduce memory usage.

You can likely also eliminate MeasurementValueType. It is redundant since that information will be encoded in oneof.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts here:

  1. MeasurementValueType cannot be completely removed because still needed in places like Histogram, unless we double that type as well. There may be other aggregations like that that will not have to expose different types based on the measurement value type (e.g. Summary that will most likely be re-added).
  2. I do agree that having a GaugeInt/GaugeDouble is simple enough, for Sum I am a bit worried to duplicate all the fields.

In general case we will have >5 data points per metrics (I remember this from a previous backend that I used), so indeed we may argue that splitting into int/double data points may save some memory for the cost of maintenance of duplicate code.

Another option is to merge #172 and simply don't care about the in-memory extra 8 bytes per point (no effect on the marshal/unmarshal or network size).

Any ideas?

My preference would be to simplify the model and merge #172 and keep the current MeasurementValueType as part of the Sum/Gauge/Histogram (not in the metric because it may not apply to some aggregation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MeasurementValueType cannot be completely removed

Correct. What I wanted to say is to remove the measurement_value_type field from Gauge/GaugeInt, not remove the MeasurementValueType enum definition.

I do agree that having a GaugeInt/GaugeDouble is simple enough, for Sum I am a bit worried to duplicate all the fields.

Sum is currently 5 fields and I suggest to remove 2. Only 3 fields will remain, 2 of which will be the same (duplicate) for SumInt64 and SumDouble. Not big deal unless we think more common fields will be added in the future.

Another option is to merge #172 and simply don't care about the in-memory extra 8 bytes per point (no effect on the marshal/unmarshal or network size).

That works too. Did you do any benchmarking to see if it can hint to a more performant approach? Other than performance I don't have a strong preference for one of the discussed options, we can use either, both have their own benefits and drawbacks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I promise a follow-up immediately when this is merged to resolve this with benchmark numbers. For the moment the main change proposed here is to unify metadata about aggregation with the data so I want to move forward with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #206 to track this.

opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved

// Numerical value of the measurement that was recorded. Only one of these
// two fields is used for the data, based on MeasurementValueType.
double double_value = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to Tigran's question above, should we distinguish raw integer and raw double points?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just moved from the top of the file. But I tend to agree with #172 proposal you had, and just ignore 8bytes in memory overhead in order to simplify and not duplicate all the fields.

Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me. Just one tiny suggestion on a clarification on the excellent diagram.

This PR does not change any semantics, it just re-organize the protos:
* Removes MetricDescriptor and adds fields to the Metric directly
* Combines MetricDescriptor.Type with specific DataPoints.

This has the following advantages:
* Require only one allocation more than the initial version (v0.4.0). When using gogo proto.
* In-memory representation is smaller than the initial version (v0.4.0).
* Extensible: allows to add new data types in the Metric.
* Groups together metadata and points about different aggregation types.

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
@bogdandrutu bogdandrutu merged commit 8b80e3f into open-telemetry:master Aug 21, 2020
@bogdandrutu bogdandrutu deleted the mergetypeanddata branch August 21, 2020 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is performance important enough for the MetricDescriptor Type to encode it as a unit64?
5 participants