[SL-1235] [Feature] Ambiguous Group-By-Item Resolution #887
Labels
enhancement
New feature or request
High priority
Created by Linear-GitHub Sync
linear
triage
Tasks that need to be triaged
Comments
plypaul added the enhancement and triage labels on Nov 17, 2023
Jstein77 changed the title from "[WIP] [Feature] Ambiguous Group-By-Item Resolution" to "[SL-1235] [WIP] [Feature] Ambiguous Group-By-Item Resolution" on Nov 21, 2023
Jstein77 added the High priority, Metricflow, and Metricflow Gap labels (each created by Linear-GitHub Sync) on Nov 21, 2023
Merged 2 PRs, a few more to cut this week.
plypaul changed the title from "[SL-1235] [WIP] [Feature] Ambiguous Group-By-Item Resolution" to "[SL-1235] [Feature] Ambiguous Group-By-Item Resolution" on Nov 30, 2023
courtneyholcomb pushed a commit that referenced this issue on Nov 30, 2023
These pattern classes are used to model user inputs when group-by items are specified through the query interface or specified in a filter. These patterns allow for ambiguous user inputs of group-by items e.g. a time dimension with an unknown grain. For that case, the ambiguous input makes it easier for the user to author queries as figuring out the time grain requires inspection of the configs. For more details, please see: #887
Jstein77 removed the Metricflow and Metricflow Gap labels (each created by Linear-GitHub Sync) on Feb 8, 2024
Is this your first time submitting a feature request?
Describe the feature
Background
Previously, group-by-items were input by the user in a relatively specific form. For example, the group-by-item:
refers to the created_at time dimension at a month grain that is resolved by joining the measure source to the dimension sources by the guest and listing entities.
We have since migrated from that interface to allow for additional naming formats for specifying group-by-items, and to allow for a more ambiguous specification.
Specification Updates
Additional Naming Format (Object Builder)
The object builder format uses notation similar to the creation of an object in Python (or similar language) using the builder pattern. For example:
specifies the metric_time dimension at the month grain.
Ambiguous Specification of Group-By-Items
Challenges
{{ Dimension('listing__country') }}) is used.
listing__country) is used.
Proposed Approach
To support these changes, we need a query resolver that can map an ambiguous group-by-item specified by the user to a concrete dimension in a semantic model. The proposal for building the query resolver is to:
Model Group-By-Item Inputs as Patterns / Filters
The first part of the proposed solution is to introduce a set of pattern classes that capture the desired request from the user. With the pattern classes, we can map varied user input (which can be strings in different naming formats or interface objects) to a single type of input into the resolver.
This layer of indirection reduces the conditionals required in the query resolver and therefore reduces the cases that need to be written and tested. As the name suggests, these patterns describe how to select a group-by-item from a list of available ones.
For example:
Once the user input is mapped to pattern instances, resolution of ambiguous group-by-items can be handled via a DAG that models the resolution behavior.
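The pattern idea described above can be sketched roughly as follows. This is only an illustration, not the MetricFlow implementation: TimeDimensionSpec, TimeDimensionPattern, pattern_from_string, and the KNOWN_GRAINS set are all hypothetical names introduced here. The sketch shows a string input with or without a grain suffix being mapped to one pattern type, where a missing grain is modeled as None.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed subset of grain names, used only to parse the illustrative string format.
KNOWN_GRAINS = frozenset({"day", "week", "month", "quarter", "year"})


@dataclass(frozen=True)
class TimeDimensionSpec:
    """Hypothetical concrete group-by-item: a time dimension at a known grain."""

    name: str
    grain: str


@dataclass(frozen=True)
class TimeDimensionPattern:
    """Hypothetical pattern; grain=None models an ambiguous (unspecified) grain."""

    name: str
    grain: Optional[str] = None

    def match(self, candidates: Sequence[TimeDimensionSpec]) -> Tuple[TimeDimensionSpec, ...]:
        """Return the candidates selected by this pattern, preserving their order."""
        return tuple(
            c
            for c in candidates
            if c.name == self.name and (self.grain is None or c.grain == self.grain)
        )


def pattern_from_string(text: str) -> TimeDimensionPattern:
    """Map a string such as 'metric_time' or 'metric_time__month' to a pattern."""
    head, sep, tail = text.rpartition("__")
    if sep and tail in KNOWN_GRAINS:
        return TimeDimensionPattern(name=head, grain=tail)
    # No recognizable grain suffix: leave the grain ambiguous.
    return TimeDimensionPattern(name=text)
```

With a single pattern type, the resolver only needs to call match() on whatever candidates are available, regardless of how the user wrote the input.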
Model Group-By-Item Resolution As A DAG
The resolution of available / valid group-by-items for a metric query can be modeled as a DAG. For a node in the DAG, the parent nodes represent the set of objects that provide the valid group-by-items that are then intersected to determine the available group-by-items for that node.
More specifically, the available group-by-items for a query are the intersection of the available group-by-items for the metrics in the query. Likewise, following the recursive definition, the available group-by-items for a derived metric are the intersection of the available group-by-items for its constituent metrics. For a base metric, the available group-by-items are the intersection of the group-by-items available for its constituent measures.
The DAG helps to guide writing the recursive code to handle resolution and aids development debugging by providing a visualization of the call stack.
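The recursive definition above can be sketched in a few lines. The measure names, metric names, and group-by-item strings below are hypothetical, and plain dictionaries stand in for whatever node classes an implementation would actually use.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical root nodes: group-by-items available for each measure.
MEASURE_ITEMS: Dict[str, FrozenSet[str]] = {
    "measure_0": frozenset({"metric_time__day", "metric_time__month", "listing__country"}),
    "measure_1": frozenset({"metric_time__month", "metric_time__year", "listing__country"}),
}

# Hypothetical metric nodes: parents are measures for a base metric and
# other metrics for a derived metric.
METRIC_PARENTS: Dict[str, Tuple[str, ...]] = {
    "simple_metric_0": ("measure_0",),
    "simple_metric_1": ("measure_1",),
    "derived_metric": ("simple_metric_0", "simple_metric_1"),
}


def available_items(node: str) -> FrozenSet[str]:
    """Intersect the items provided by the parents of a node, recursively."""
    if node in MEASURE_ITEMS:
        return MEASURE_ITEMS[node]
    parent_sets = [available_items(parent) for parent in METRIC_PARENTS[node]]
    return frozenset.intersection(*parent_sets)


def available_items_for_query(metric_names: Iterable[str]) -> FrozenSet[str]:
    """Intersect the items of every metric in a query (assumes at least one metric)."""
    return frozenset.intersection(*(available_items(name) for name in metric_names))
```

Each recursive call corresponds to one edge in the resolution DAG, which is why a drawing of the DAG doubles as a picture of the call stack.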
As an example, consider the set of metric definitions below and the associated query:
The query above would result in the resolution DAG:
Resolve Ambiguous Group-By-Items as a Push-Down Process
As alluded to earlier, the group-by-items available for a given node in the resolution DAG are the intersection of the group-by-items available for each of the parent nodes. Resolving ambiguous group-by-items can be modeled as a push-down process where the candidate group-by-items are pushed down from the root nodes to the leaf node, and the candidates are intersected along the way.
During the push-down process, if the intersection of the candidates from the parent nodes produces an empty set, an error can be generated that includes the path from the leaf node to help the user better diagnose the issue.
In the current proposal, the root nodes represent the measures used to compute metrics, and the leaf node is the query containing the metrics requested by the user.
Following this setup, the various conditions for ambiguous resolution can be realized by tweaking the initial set of candidate group-by-items in the root nodes (measures), and the selection behavior at the leaf node (the query).
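A minimal sketch of the push-down process, under stated assumptions: candidates are plain (name, grain) tuples standing in for time dimension specs, the DAG is a dictionary, and NoCommonGroupByItemError is a hypothetical error type. The measure_0 and measure_1 candidates mirror the metric_time walkthrough in this proposal; measure_2 and failing_query are added here only to show the error path.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Spec = Tuple[str, str]  # hypothetical (name, grain) stand-in for a time dimension spec

# Root nodes: initial candidate group-by-items for each measure.
MEASURE_CANDIDATES: Dict[str, FrozenSet[Spec]] = {
    "measure_0": frozenset({("metric_time", "day"), ("metric_time", "month")}),
    "measure_1": frozenset({("metric_time", "month"), ("metric_time", "year")}),
    "measure_2": frozenset({("metric_time", "year")}),
}

# Non-root nodes mapped to their parents; query nodes are the leaves.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "simple_metric_0": ("measure_0",),
    "simple_metric_1": ("measure_1",),
    "simple_metric_2": ("measure_2",),
    "query": ("simple_metric_0", "simple_metric_1"),
    "failing_query": ("simple_metric_0", "simple_metric_2"),
}


class NoCommonGroupByItemError(Exception):
    """Raised when the candidates from the parent nodes have nothing in common."""


def push_down(
    node: str, matches: Callable[[Spec], bool], path: Tuple[str, ...] = ()
) -> FrozenSet[Spec]:
    """Return the candidates that reach a node after intersecting along the way."""
    path = path + (node,)
    if node in MEASURE_CANDIDATES:
        # Root node: filter the initial candidates with the user's pattern.
        return frozenset(spec for spec in MEASURE_CANDIDATES[node] if matches(spec))
    candidate_sets = [push_down(parent, matches, path) for parent in PARENTS[node]]
    result = frozenset.intersection(*candidate_sets)
    if not result:
        # Report the path from the leaf (query) node to the node that failed.
        raise NoCommonGroupByItemError("No common candidates at: " + " -> ".join(path))
    return result
```

For example, push_down("query", lambda spec: spec[0] == "metric_time") leaves only ("metric_time", "month"), while the same call on "failing_query" raises an error whose message names the failing node.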
Ambiguous Group-By-Item Specified in a Query:
A representation of the process for an ambiguous group-by-item named metric_time for metrics ['simple_metric_0', 'simple_metric_1'] in a query is shown below.
measure_0 is matched to the pattern for metric_time. The candidates at this node are [TimeDimension('metric_time', 'day'), TimeDimension('metric_time', 'month')].
measure_1 is matched to the pattern for metric_time. The candidates at this node are [TimeDimension('metric_time', 'month'), TimeDimension('metric_time', 'year')].
The candidates at simple_metric_0 are the same as the parent candidates as there is only 1 parent and no intersection is required.
At the query node for ['simple_metric_0', 'simple_metric_1'], intersecting the candidates from the parent nodes results in TimeDimension('metric_time', 'month'), which is the resolution of the ambiguous group-by-item metric_time for this query.
Ambiguous Group-By-Item Specified in a Where-Filter:
For a query, a where-filter can occur in a few places:
The proposed approach is to collect and resolve all where-filters during the query parsing / query resolution phase, and then pass a lookup object to subsequent stages. The lookup object will ensure that the correct items will be rendered / retrieved.
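One way to picture the lookup object is sketched below. Everything here is an assumption made for illustration: FilterLocation, ResolvedFilterItemLookup, the "kind" values, and the use of strings for resolved items are hypothetical and are not taken from the proposal. The idea shown is only that resolution happens once, keyed by where the filter was defined, and later stages read the result instead of resolving again.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FilterLocation:
    """Hypothetical key describing where a where-filter was defined."""

    kind: str  # placeholder category, e.g. "query" or "metric"
    name: str  # placeholder identifier within that category


class ResolvedFilterItemLookup:
    """Hypothetical lookup filled during query resolution and read by later stages."""

    def __init__(self) -> None:
        self._resolved: Dict[Tuple[FilterLocation, str], str] = {}

    def record(self, location: FilterLocation, input_text: str, resolved_item: str) -> None:
        """Store the concrete item chosen for an ambiguous input at a location."""
        self._resolved[(location, input_text)] = resolved_item

    def resolved_item(self, location: FilterLocation, input_text: str) -> str:
        """Return the recorded resolution; raises KeyError if none was recorded."""
        return self._resolved[(location, input_text)]
```

Keying on the location allows the same ambiguous input to resolve to different concrete items in different filters, should the available candidates differ between them.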
Future Work
Since the resolution DAG represents how metrics are constructed from other items, there are other operations that can be performed using the DAG and that would be easier to implement because the resolution DAG is simpler than the dataflow DAG.
Common Metric Computation Optimization
Since derived metrics are computed from other metrics, it's possible that a given metric appears multiple times in a resolution DAG. It's desirable to compute a metric only once in a query for efficiency. The optimization to re-use common metric computation can be more easily implemented by representing the re-use in the resolution DAG as a common parent instead of implementing the optimization using the dataflow DAG, as is done now.
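As a rough sketch of the re-use idea, a post-order traversal over the resolution DAG that tracks visited nodes schedules each shared parent once. The metric names and the METRIC_INPUTS mapping below are hypothetical, and the sketch only produces a computation order; it does not model how the dataflow plan would consume it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical resolution DAG: each metric mapped to the metrics it is derived from.
METRIC_INPUTS: Dict[str, Tuple[str, ...]] = {
    "bookings": (),
    "bookings_growth": ("bookings",),
    "bookings_share": ("bookings",),
    "summary": ("bookings_growth", "bookings_share"),
}


def computation_order(metric: str) -> List[str]:
    """Return the metrics needed for `metric`, parents first, each listed once."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return  # Shared parent: already scheduled, so it is not repeated.
        seen.add(name)
        for parent in METRIC_INPUTS[name]:
            visit(parent)
        order.append(name)

    visit(metric)
    return order
```

Here "bookings" is a common parent of two derived metrics, yet it appears only once in the order returned for "summary".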
Add ERD Nodes
To better represent the available group-by-items that can be retrieved for a measure, the resolution DAG can be updated to include nodes that model the entity-relationship diagram. This will aid implementation of entity roles and other related features.
Input Metric Alias Generation
There are some cases with derived metrics where using the same metric with different time offsets can produce an ambiguous column in the generated SQL if metric aliases are not provided by the user. The resolution DAG may allow for easier automatic generation of aliases in such cases.
Describe alternatives you've considered
A recursive implementation that does not create the resolution DAG. This ended up being hard to follow.
From SyncLinear.com | SL-1235