[Spike] Research how to persist agg data when switching Vis Types for Vis Builder #2900

Closed
1 task
ashwin-pc opened this issue Nov 18, 2022 · 3 comments

@ashwin-pc
Member

ashwin-pc commented Nov 18, 2022

This task will be broken into smaller tasks:

@ashwin-pc ashwin-pc changed the title [Spike] Research how to persis agg data when switching Vis Types for Vis Builder [Spike] Research how to persist agg data when switching Vis Types for Vis Builder Nov 23, 2022
@ashwin-pc
Member Author

All vis types define the data they can display as schemas. Each schema definition defines the group, the aggFilters, and the min and max number of values that the aggregated field is allowed to contain. Each field that is already selected in the previous visualization has a schema group and aggregation property set. One simple solution is to map these aggregated fields to the closest available schema. If a field cannot be mapped to a schema, it should be dropped.

Open questions:

  • If a field's agg type is not supported but an alternate agg type is, should we persist that field and change the agg type?
  • If we don't drop or alter a field's agg type, should switching back to the original vis type restore the original visualization?

@abbyhu2000 abbyhu2000 added the v2.5.0 'Issues and PRs related to version v2.5.0' label Dec 15, 2022
@BSFishy BSFishy removed the v2.5.0 'Issues and PRs related to version v2.5.0' label Dec 28, 2022
@abbyhu2000
Member

abbyhu2000 commented Jan 12, 2023

Add artifacts (design doc) and timeline @abbyhu2000

@abbyhu2000
Member

abbyhu2000 commented Feb 18, 2023

Aggregation Persistence in Vis Builder Design Doc

Problem Statement

In Vis Builder, when a user creates a visualization, all the field values and aggregations they added are cleared when they switch to a different visualization type. After implementing global query persistence and app persistence, Vis Builder should also be able to persist values across compatible visualization types, and ideally across incompatible visualization types to the degree possible.

Background

Currently, all visualization types define the data they can display as schemas. Each schema definition defines the group, the aggFilters, and the min and max number of values that the field is allowed to contain. Each field that is already selected in the previous visualization has a schema group and aggregation property set.

The current five visualization types with their schemas:

Screen Shot 2023-02-17 at 4 31 03 PM

Here is the schema information for each visualization type, with more details and an example:

  1. Line

Screen Shot 2023-02-17 at 4 37 04 PM

Screen Shot 2023-02-17 at 4 38 51 PM

  2. Histogram

Screen Shot 2023-02-17 at 4 39 49 PM

Screen Shot 2023-02-17 at 4 40 10 PM

  3. Area

Screen Shot 2023-02-17 at 4 40 36 PM

Screen Shot 2023-02-17 at 4 40 53 PM

  4. Metric

Screen Shot 2023-02-17 at 4 41 33 PM

Screen Shot 2023-02-17 at 4 41 51 PM

  5. Table

Screen Shot 2023-02-17 at 4 42 17 PM

Screen Shot 2023-02-17 at 4 42 42 PM

Schema Example

Each schema field is defined with 7 properties:

  1. Group: Metric/Bucket
  2. Name: The name of the field; each vis type schema can define its own names
  3. Title: The title of the field that will be shown on the UI
  4. Min: The minimum number of aggregations that must be added to the field
  5. Max: The maximum number of aggregations that can be added to the field
  6. AggFilter: A list of aggregation types that filters what can be added to the field; a “!” prefix means that aggregation type cannot be used
  7. Defaults: Other default configs

      schemas: new Schemas([
          {
            group: AggGroupNames.Metrics,
            name: 'metric',
            title: i18n.translate('visTypeVislib.line.metricTitle', {
              defaultMessage: 'Y-axis',
            }),
            min: 1,
            max: 3,
            aggFilter: ['!geo_centroid', '!geo_bounds'],
            defaults: { aggTypes: ['median'] },
          },
          {
            group: AggGroupNames.Buckets,
            name: 'segment',
            title: i18n.translate('visTypeVislib.line.segmentTitle', {
              defaultMessage: 'X-axis',
            }),
            min: 0,
            max: 1,
            aggFilter: ['!geohash_grid', '!geotile_grid', '!filter', '!filters'],
            defaults: { aggTypes: ['date_histogram', 'terms'] },
          },
          {
            group: AggGroupNames.Buckets,
            name: 'group',
            title: i18n.translate('visTypeVislib.line.groupTitle', {
              defaultMessage: 'Split series',
            }),
            min: 0,
            max: 3,
            aggFilter: ['!geohash_grid', '!geotile_grid', '!filter'],
            defaults: { aggTypes: ['terms'] },
          },
          {
            group: AggGroupNames.Buckets,
            name: 'split',
            title: i18n.translate('visTypeVislib.line.splitTitle', {
              defaultMessage: 'Split chart',
            }),
            min: 0,
            max: 1,
            aggFilter: ['!geohash_grid', '!geotile_grid', '!filter'],
            defaults: { aggTypes: ['terms'] },
          },
          {
            group: AggGroupNames.Metrics,
            name: 'radius',
            title: i18n.translate('visTypeVislib.line.radiusTitle', {
              defaultMessage: 'Dot size',
            }),
            min: 0,
            max: 1,
            aggFilter: ['count', 'avg', 'sum', 'min', 'max', 'cardinality'],
            defaults: { aggTypes: ['count'] },
          },
        ]),
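
As a rough illustration of how an aggFilter list like the ones above could be evaluated, here is a minimal sketch. The helper name isAggTypeAllowed is hypothetical and not part of the OpenSearch Dashboards API, and the treatment of an empty allow-list is an assumption:

```typescript
// Hypothetical helper (not an OpenSearch Dashboards API).
// Entries prefixed with "!" exclude an aggregation type;
// all other entries form an allow-list.
function isAggTypeAllowed(aggType: string, aggFilter: string[]): boolean {
  const excluded = aggFilter
    .filter((f) => f.startsWith('!'))
    .map((f) => f.slice(1));
  const included = aggFilter.filter((f) => !f.startsWith('!'));

  if (excluded.includes(aggType)) return false;
  // Assumption: with no allow-list entries, anything not excluded is allowed.
  return included.length === 0 || included.includes(aggType);
}

// Filters copied from the 'metric' and 'radius' schema fields above.
const metricFilter = ['!geo_centroid', '!geo_bounds'];
const radiusFilter = ['count', 'avg', 'sum', 'min', 'max', 'cardinality'];

console.log(isAggTypeAllowed('median', metricFilter)); // true
console.log(isAggTypeAllowed('geo_bounds', metricFilter)); // false
console.log(isAggTypeAllowed('median', radiusFilter)); // false
```

A check like this would be needed by any of the approaches below before an aggregation is moved into a field of the new visualization type.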

Visualization schema render flow:

Screen Shot 2023-02-07 at 12 16 07 PM

Possible Approaches

1. Strict Mapping

One simple solution is to define a mapping that maps each aggregated field to the closest available schema for every visualization type combination. If a field cannot be mapped to a schema, it will be dropped. A proposed mapping is shown below:

metric/Y-axis → metric/Y-axis | metric/Metric | bucket/split rows | group/split group
segment/X-axis → segment/X-axis | bucket/split rows | group/split group
group/Split series → group/Split series | bucket/split rows | group/split group
split/Split chart → split/Split chart | split_row/split table in rows | group/split group
radius/dot size → split/Split chart | split_row/split table in rows | group/split group
metric/Metric → metric/Y-axis | segment/X-axis | split/Split chart | split_row/split table in rows | group/split group
group/split group → group/split group | bucket/split rows | split_row/split table in rows
bucket/split rows → group/split group
split_row/split table in rows → split/Split chart | group/split group
split_column/split table in columns → split/Split chart | group/split group

Implementation idea:

A draft PR that implements the mappings among Area, Line and Histogram: #3158
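
Independently of the draft PR, the lookup itself could be sketched roughly as follows. This is not the PR's code: STRICT_MAPPING, SelectedAgg and mapStrict are hypothetical names, only a few rows of the proposed mapping are included, and min/max bounds and aggFilter checks are ignored for brevity:

```typescript
// Hypothetical shape of an already-selected aggregation.
interface SelectedAgg {
  id: string;
  type: string;
  schema: string; // schema name in the previous visualization type
}

// A few rows of the proposed mapping, keyed by source schema name.
// Candidates are tried in order.
const STRICT_MAPPING: Record<string, string[]> = {
  metric: ['metric', 'bucket', 'group'],
  segment: ['segment', 'bucket', 'group'],
  group: ['group', 'bucket'],
  split: ['split', 'split_row', 'group'],
};

// Move each aggregation to the first candidate schema that the target
// visualization type exposes; drop it when no candidate is available.
function mapStrict(aggs: SelectedAgg[], targetSchemaNames: string[]): SelectedAgg[] {
  const result: SelectedAgg[] = [];
  for (const agg of aggs) {
    const candidates = STRICT_MAPPING[agg.schema] || [];
    const target = candidates.find((name) => targetSchemaNames.includes(name));
    if (target !== undefined) {
      result.push({ ...agg, schema: target });
    }
  }
  return result;
}

// Example: a hypothetical target type that only exposes 'metric' and 'segment'.
const mapped = mapStrict(
  [
    { id: '1', type: 'date_histogram', schema: 'segment' },
    { id: '2', type: 'median', schema: 'metric' },
    { id: '3', type: 'terms', schema: 'split' },
  ],
  ['metric', 'segment']
);
console.log(mapped.map((a) => `${a.id}:${a.schema}`)); // [ '1:segment', '2:metric' ]
```

The table has to be extended for every new schema name, which is the scalability concern listed under Cons below.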

Pros:

  • This mapping works well among Area, Line and Histogram since they have very similar schema definitions. They all have a Y-axis, an X-axis, split series and split chart. Users may simply want to switch among these three similar types to see which one fits their needs best, and persisting the aggregation fields saves them a lot of time and effort.

Cons:

  • This approach is not scalable since it defines a specific mapping for each pair of visualization types. It has to be reconfigured every time a new visualization type is added, and the mapping rules become more and more complicated.
  • While it works well among the Area, Line and Histogram visualization types, it is less ideal when switching among non-compatible visualization types. For example, switching between Table and another visualization type may not make sense for users.

2. Update schema to group similar visualization types, then strict mappings within each group

Since approach one works well among similar visualization types but is not scalable, another solution is to update the schema of each visualization type to include a group property. Each group has a unique identifier, and only similar visualization types belong to the same group. When a new visualization type is added, developers determine which group it belongs to and assign it accordingly. Within each group, the schemas of the visualization types should be similar enough that a direct mapping among the aggregations produces a new visualization that makes sense. If a new visualization type is too different from every existing type, it can be placed in a group that does not persist any aggregations.

Implementation idea:

For example, we can assign Area, Line and Histogram to one group, and Metric and Table to another. Their schemas could be updated as follows (schemaGroup and SchemaGroupNames are proposed additions):

createMetricConfig = (): VisualizationTypeOptions<MetricOptionsDefaults> => ({
  name: 'metric',
  ...
  ui: {
    containerConfig: {
      data: {
        // Metric shares a group with Table
        schemaGroup: SchemaGroupNames.Table,
        schemas: new Schemas([
        ...

createAreaConfig = (): VisualizationTypeOptions<AreaOptionsDefaults> => ({
  name: 'area',
  ...
  ui: {
    containerConfig: {
      data: {
        // Area shares a group with Line and Histogram
        schemaGroup: SchemaGroupNames.Graph,
        schemas: new Schemas([
        ...
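
With a group assigned to every visualization type, the decision of whether to persist aggregations on a switch becomes a simple comparison. A minimal sketch, assuming hypothetical group identifiers and a hypothetical lookup table (neither exists in Vis Builder today):

```typescript
// Hypothetical group identifiers; 'none' means "never persist".
type SchemaGroup = 'graph' | 'table' | 'none';

// Hypothetical registry of which group each visualization type belongs to.
const SCHEMA_GROUP_BY_VIS_TYPE: Record<string, SchemaGroup> = {
  area: 'graph',
  line: 'graph',
  histogram: 'graph',
  metric: 'table',
  table: 'table',
};

// Persist aggregations only when both types are in the same, persisting group.
function shouldPersistAggs(fromType: string, toType: string): boolean {
  const fromGroup = SCHEMA_GROUP_BY_VIS_TYPE[fromType] || 'none';
  const toGroup = SCHEMA_GROUP_BY_VIS_TYPE[toType] || 'none';
  return fromGroup !== 'none' && fromGroup === toGroup;
}

console.log(shouldPersistAggs('line', 'area')); // true
console.log(shouldPersistAggs('line', 'table')); // false
console.log(shouldPersistAggs('metric', 'table')); // true
```

Unknown types fall back to 'none' here, so a newly added visualization type would not persist anything until a developer assigns it to a group.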

Pros:

  • Since this approach only applies strict mappings of aggregations between highly similar visualization types, the new visualization after persistence will most likely make sense and look accurate.
  • It is more scalable since developers only need to determine which group a new visualization type belongs to based on its schema features.

Cons:

  • For some visualization types, it could be hard to determine which group they should be assigned to.
  • It might be a confusing user experience since some visualizations would have aggregation persistence while others would not.

3. X axis to X axis, Y axis to Y axis from expression

Since all the visualizations are 2D graphs, another solution is to map X-axis fields to X-axis fields and Y-axis fields to Y-axis fields.

Implementation Idea

The X-axis and Y-axis information is stored in the ‘expression’. The expression contains everything needed to render a specific visualization and is obtained by calling toExpression(). It is then passed into a component that renders it onto the workspace canvas. The expression is a string; an example expression for a visualization with one field on the X-axis and one field on the Y-axis is shown below.

As shown below, the expression contains a lot of information for every field. For example, order_date is the field on the X-axis, and its entry also carries other properties such as the format id, the pattern param, intervalESValue and intervalESUnit, the min and max bounds, the format and the aggType. Since each field can have different aggregation properties, and the expression string becomes deeply nested as more fields are added to the visualization, mapping the X-axis and Y-axis directly from the expression might be complicated.

opensearchDashboards
| opensearchaggs 
  index="ff959d40-b880-11e8-a6d9-e546fe2bba5f" 
  metricsAtAllLevels=false 
  partialRows=false 
  aggConfigs="[{\"id\":\"1\",\"enabled\":true,\"type\":\"date_histogram\",
                \"params\":{\"field\":\"order_date\",
                \"timeRange\":{\"from\":\"now-24h\",\"to\":\"now\"},
                \"useNormalizedOpenSearchInterval\":true,
                \"scaleMetricValues\":false,\"interval\":\"auto\",
                \"drop_partials\":false,\"min_doc_count\":1,
                \"extended_bounds\":{}},\"schema\":\"segment\"},
                {\"id\":\"2\",\"enabled\":true,\"type\":\"median\",
                \"params\":{\"field\":\"products.base_price\",
                \"percents\":[50]},\"schema\":\"metric\"}]" 
  includeFormatHints=false
| vislib 
  type="line" 
  visConfig="{\"addLegend\":true,\"legendPosition\":\"right\",
              \"addTimeMarker\":false,\"addTooltip\":true,
              \"dimensions\":{
              \"x\":{\"accessor\":0,
              \"format\":{\"id\":\"date\",
              \"params\":{\"pattern\":\"HH:mm\"}},
              \"params\":{\"date\":true,\"interval\":\"PT30M\",
              \"intervalESValue\":30,\"intervalESUnit\":\"m\",
              \"format\":\"HH:mm\",\"bounds\":{\"min\":\"2023-02-06T17:29:20.602Z\",
              \"max\":\"2023-02-07T17:29:20.602Z\"}},
              \"label\":\"order_date per 30 minutes\",
              \"aggType\":\"date_histogram\"},
              \"y\":[{\"accessor\":1,
              \"format\":{\"id\":\"number\",
              \"params\":{\"parsedUrl\":{\"origin\":\"http://localhost:5603\",
              \"pathname\":\"/ifz/app/vis-builder/\",\"basePath\":\"/ifz\"}}},
              \"params\":{},
              \"label\":\"Median products.base_price\",
              \"aggType\":\"median\"}]},
              \"valueAxes\":[{\"id\":\"ValueAxis-1\",\"labels\":{\"show\":true},
              \"name\":\"ValueAxis-1\",\"position\":\"left\",
              \"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},
              \"show\":true,\"style\":{},
              \"title\":{\"text\":\"Median products.base_price\"},
              \"type\":\"value\"}]}"
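
To give a feel for the extra parsing step this approach implies, here is a minimal sketch that reads the aggConfigs argument and groups fields by schema. It assumes the argument has already been extracted from the expression string and unescaped into plain JSON, which is itself non-trivial; SerializedAgg and fieldsBySchema are hypothetical names:

```typescript
// Hypothetical shape of one entry in the aggConfigs JSON array.
interface SerializedAgg {
  id: string;
  enabled: boolean;
  type: string;
  params: Record<string, unknown>;
  schema: string;
}

// Trimmed-down version of the aggConfigs value from the expression above,
// assumed to be already unescaped.
const aggConfigsArg = JSON.stringify([
  {
    id: '1',
    enabled: true,
    type: 'date_histogram',
    params: { field: 'order_date', interval: 'auto' },
    schema: 'segment',
  },
  {
    id: '2',
    enabled: true,
    type: 'median',
    params: { field: 'products.base_price', percents: [50] },
    schema: 'metric',
  },
]);

// Group "aggType(field)" labels by the schema they are assigned to.
function fieldsBySchema(aggConfigsJson: string): Record<string, string[]> {
  const aggs = JSON.parse(aggConfigsJson) as SerializedAgg[];
  const out: Record<string, string[]> = {};
  for (const agg of aggs) {
    if (!agg.enabled) continue;
    const field = typeof agg.params.field === 'string' ? agg.params.field : '(no field)';
    if (!out[agg.schema]) out[agg.schema] = [];
    out[agg.schema].push(`${agg.type}(${field})`);
  }
  return out;
}

console.log(fieldsBySchema(aggConfigsArg));
// { segment: [ 'date_histogram(order_date)' ], metric: [ 'median(products.base_price)' ] }
```

Even this simplified version only recovers the schema names; deciding which of them correspond to the X-axis and Y-axis would still require the dimensions section of visConfig.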

Pros:

  • This approach is simple to understand, and it is less likely to confuse end users.
  • It is scalable since all future visualization types will be 2D and will have an X-axis and a Y-axis, so there is no need to introduce new mappings.

Cons:

  • The implementation is more complicated and more involved, as the nested expression above illustrates.

4. Metric data to metric data, bucket data to bucket data

All schemas are divided into two categories: metric and bucket. A metric field holds numerical data, and a bucket field holds categorical data. Since numerical and categorical data tend to serve different purposes in a visualization, another approach is to map all metric fields to metric fields and all bucket fields to bucket fields.

Implementation Idea

Each schema field has a group property, which is either AggGroupNames.Metrics or AggGroupNames.Buckets. We can collect one list of aggregations that belong to the metrics group and another list for the buckets group, and map them to the new visualization type’s metrics group and buckets group.

schemas: new Schemas([
          {
            group: AggGroupNames.Metrics,
            ...
            min: 1,
            max: 3,
          },
          {
            group: AggGroupNames.Buckets,
            ...
            min: 0,
            max: 1,
          },

export const AggGroupNames = Object.freeze({
  Buckets: 'buckets' as 'buckets',
  Metrics: 'metrics' as 'metrics',
  None: 'none' as 'none',
});
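
The collection step could look roughly like the sketch below. The types and the partitionByGroup helper are hypothetical; the group strings mirror the AggGroupNames values shown above:

```typescript
// Group values mirror AggGroupNames ('buckets' | 'metrics' | 'none').
type AggGroup = 'buckets' | 'metrics' | 'none';

// Hypothetical minimal views of a schema field and a selected aggregation.
interface SchemaField {
  name: string;
  group: AggGroup;
}
interface SelectedAgg {
  id: string;
  type: string;
  schema: string; // name of the schema field the aggregation sits in
}

// Split the selected aggregations into metric and bucket lists by looking up
// the group of the schema field each aggregation is assigned to.
function partitionByGroup(
  aggs: SelectedAgg[],
  schemas: SchemaField[]
): { metrics: SelectedAgg[]; buckets: SelectedAgg[] } {
  const groupOf = new Map<string, AggGroup>();
  for (const s of schemas) groupOf.set(s.name, s.group);

  const metrics: SelectedAgg[] = [];
  const buckets: SelectedAgg[] = [];
  for (const agg of aggs) {
    const group = groupOf.get(agg.schema);
    if (group === 'metrics') metrics.push(agg);
    else if (group === 'buckets') buckets.push(agg);
    // Aggregations in unknown or 'none' schemas are ignored in this sketch.
  }
  return { metrics, buckets };
}

// Schema names and groups taken from the Line example earlier in this doc.
const lineSchemas: SchemaField[] = [
  { name: 'metric', group: 'metrics' },
  { name: 'segment', group: 'buckets' },
  { name: 'group', group: 'buckets' },
  { name: 'split', group: 'buckets' },
  { name: 'radius', group: 'metrics' },
];

const parts = partitionByGroup(
  [
    { id: '1', type: 'date_histogram', schema: 'segment' },
    { id: '2', type: 'median', schema: 'metric' },
    { id: '3', type: 'count', schema: 'radius' },
    { id: '4', type: 'terms', schema: 'group' },
  ],
  lineSchemas
);
console.log(parts.metrics.map((a) => a.id)); // [ '2', '3' ]
console.log(parts.buckets.map((a) => a.id)); // [ '1', '4' ]
```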

Pros:

  • The rules are simple to follow, and the approach is scalable since every schema field belongs to one of the two groups.

Cons:

  • Some aggregation mappings might not make sense after switching to a new visualization type, which might be a confusing user experience.
  • Further mapping rules are needed since each schema might have multiple metric fields and multiple bucket fields, and each of them might have different min and max bounds. We need to define the order in which fields are filled, and what happens if there are more aggregations than can be mapped.

5. No Mapping

Another option is to not persist aggregations at all when switching visualization types. The user experience for Vis Builder remains the same as today.

Pros:

  • The user experience stays the same, and we avoid over-engineering that may introduce extra confusion.
  • Some users may expect an empty workspace and prefer not to have any persistence when they switch visualization types.

Cons:

  • Users may lose their existing progress when they accidentally switch visualization types.
  • Users cannot carry their previous aggregations over and view them in another visualization type.

6. Determine the best fit visualization type based on existing aggregations (Out of scope)

The ideal solution is that, based on the aggregations the user has already selected, Vis Builder automatically determines and recommends the visualization types that can best illustrate them. It shows users all the options, and they can simply choose the one that best fits their needs.

Pros:

  • This gives the best user experience, since users no longer need to switch between visualization types to see which one fits best.

Cons:

  • This approach has a high level of technical complexity. It is out of scope for now.

Proposals

To avoid over-engineering and a confusing user flow, I propose that we keep the mapping rule simple and scalable, and give users the option to turn this aggregation persistence feature on or off.

UI/UX proposal:

  • Persist by default, and provide a mechanism/button for users to reset the page.
  • Add a toggle button on the Vis Builder page to let users turn this feature on or off.
  • On the pop-up window (as shown below) that appears after the user switches the visualization type, add another button or toggle labeled “Change type and persist current aggregations”.
  • Add a drop-down with a reminder.

Mapping proposal:

For the mapping rules, I propose that we use approach 4, which simply maps aggregations in the metric group to the metric group and aggregations in the bucket group to the bucket group.

Since metric fields are mostly for displaying numerical data, and bucket fields are mostly for separating data into groups depending on how a visualization can be split up, I think it makes sense to map the aggregations to fields according to their function. For aggregations that were previously in a metric group, the user’s intent is probably to display that data against some type of unit. If we map them into a metric group in another visualization type, the data will still be displayed, just in a different format. For aggregations that were previously in a bucket group, the user’s intent is probably to break the global data into separate groups and observe whether any patterns exist in each group. If we map those into a new bucket group, we are still following the user’s intent of separating the global data into groups.

Here are the mapping rules:

  • Collect a list of aggregations that are in the metric group, and a list of aggregations that are in the bucket group.
  • For aggregations that previously belonged to the metric group, start adding them to the new metric field that has the highest max count allowed, and drop the ones that can no longer be mapped to any metric field.
  • For aggregations that previously belonged to the bucket group, start adding them to the new bucket field that has the highest max count allowed, and drop the ones that can no longer be mapped to any bucket field.
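
The fill-and-drop step in these rules could be sketched as follows. This is one possible reading of the rules, not a settled implementation: assignToFields and its types are hypothetical, and aggFilter compatibility and min bounds are deliberately ignored. The function would be called once with the metric aggregations and the new type's metric fields, and once with the bucket aggregations and bucket fields:

```typescript
// Hypothetical minimal views of a selected aggregation and a target schema field.
interface SelectedAgg {
  id: string;
  type: string;
  schema: string;
}
interface TargetField {
  name: string;
  max: number; // maximum number of aggregations the field accepts
}

// Fill the target fields in descending order of `max`; whatever does not fit
// anywhere is reported as dropped.
function assignToFields(
  aggs: SelectedAgg[],
  fields: TargetField[]
): { kept: SelectedAgg[]; dropped: SelectedAgg[] } {
  const ordered = [...fields].sort((a, b) => b.max - a.max);
  const kept: SelectedAgg[] = [];
  let next = 0; // index of the next aggregation that still needs a field

  for (const field of ordered) {
    const take = aggs.slice(next, next + field.max);
    for (const agg of take) {
      kept.push({ ...agg, schema: field.name });
    }
    next += take.length;
  }
  return { kept, dropped: aggs.slice(next) };
}

// Example: five metric aggregations moving to a type whose metric fields
// accept 3 ('metric') and 1 ('radius') aggregations.
const metricAggs: SelectedAgg[] = ['m1', 'm2', 'm3', 'm4', 'm5'].map((id) => ({
  id,
  type: 'avg',
  schema: 'metric',
}));
const assigned = assignToFields(metricAggs, [
  { name: 'radius', max: 1 },
  { name: 'metric', max: 3 },
]);
console.log(assigned.kept.map((a) => `${a.id}:${a.schema}`));
// [ 'm1:metric', 'm2:metric', 'm3:metric', 'm4:radius' ]
console.log(assigned.dropped.map((a) => a.id)); // [ 'm5' ]
```

Keeping the original order of the aggregations is a design choice made here for predictability; the rules above do not specify an order.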

Questions & Concerns

  • One major concern is that this aggregation persistence feature should not introduce a confusing user experience. As each visualization type has different properties and features, it might not make sense for some visualization types to share aggregations at all.
  • We need to consult with PM and UX further to determine which option gives the best user experience.
