Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SL-1292] Cumulative metric validation allows queries which will not render correctly #825

Closed
tlento opened this issue Oct 30, 2023 · 1 comment · Fixed by #857
Closed
Assignees
Labels
bug Something isn't working High priority Created by Linear-GitHub Sync In Progress linear Metricflow Created by Linear-GitHub Sync

Comments

@tlento
Copy link
Contributor

tlento commented Oct 30, 2023

Updated task scope

See sections below for original context.

This task is now to fix the broken validation that allows users to issue cumulative metric queries with time dimensions that our rendering logic does not properly support.

There are three other follow-ups which we need to track:

  1. Allowing queries for cumulative and derived offset metrics where metric_time is not specified OR where metric_time is specified by its underlying agg_time_dimension name
  2. Allowing queries for cumulative and derived offset metrics to be grouped by other time dimensions even if the metric_time equivalent is not requested directly in the query. This is a much larger adjustment than it seems, as it effectively opens up the question about allowing for query-time agg_time_dimension overrides
  3. Providing users with a mechanism for giving us structured information about time ranges in order to enable query optimization via more aggressive initial filtering (for engines which benefit from such things)

Original issue details

Cumulative metrics in MetricFlow have two different time axes that they need to consider:

  1. The metric_time axis, which can be subject to a filter applied by a derived metric or a query-time where input
  2. The cumulative window time axis, which is defined in the cumulative metric spec itself

Today, there are three ways to specify a metric_time filter constraint:

  1. As a raw filter expression assigned to one of the filter parameters available for a derived metric that uses one or more cumulative metrics as input
  2. As a raw where expression at query time
  3. via the deprecated and all but inaccessible start_time and end_time query inputs

Of these, only the third one has any awareness of the relationship between the cumulative metric and the metric_time axis - we will automatically expand the time range filter such that all relevant inputs are included. For the other two approaches - which are the current and future standard - we truncate input data according to the filter range specified. This results in confusing outputs.

Imagine a 7-day cumulative metric adding up events per day for the first week of January, 2023. For simplicity, we will assume that there is 1 event per day, which means the 7 day cumulative value for 2023-01-07 should be 7.

Cumulative metrics using raw filter expressions

Input: mf query --metrics event_count_7d --where "{{TimeDimension('metric_time', 'day')}} = '2023-01-07'"
Output:

metric_time event_count_7d
2023-01-07 1

Cumulative metrics using start-time/end-time

Input: `mf query --metrics event_count_7d --start-time '2023-01-07' --end-time '2023-01-07'
Output:

metric_time event_count_7d
2023-01-01 1
2023-01-02 2
2023-01-03 3
2023-01-04 4
2023-01-05 5
2023-01-06 6
2023-01-07 7

Since this is confusing and not in accordance with any reasonable user expectation, we are treating this as a high priority bug for our 1.7 release. The plan for the fix is:

  1. Ensure correctness in all cases. This will be expensive, but we are going to adjust the query plan to use a post-aggregation filter for cumulative metrics by default. End users should still be able to get the input range expansion and pre-aggregation filter behavior with the start-time/end-time parameters, but derived metric filter expressions will trigger full input data scans.
  2. Update our filter handling (mechanism still TBD) to allow us to do pre-aggregation filtering on expanded ranges for raw filter expressions in a manner that works with derived metrics. This will be tracked in a separate issue.

SL-1292

@tlento tlento added bug Something isn't working High priority Created by Linear-GitHub Sync In Progress labels Oct 30, 2023
@tlento tlento self-assigned this Oct 30, 2023
@tlento tlento added linear Metricflow Created by Linear-GitHub Sync labels Oct 30, 2023
@tlento
Copy link
Contributor Author

tlento commented Nov 7, 2023

Update:

Step 1 is half done, as we already do this if and only if the metric_time dimension is requested using the name metric_time. See #806 for some specific CLI-level repro steps of this issue.

The real bug here, at least at the moment, is that we have an invariant to assert that users include metric_time, but it only triggers if they include NO time dimensions in the query. We will update that validation one way or another.

Step 2 will be tracked independently, as it's a larger change.

@tlento tlento changed the title Cumulative and derived offset metric inputs are left-censored by where filters Cumulative metric validation allows queries which will not render correctly Nov 7, 2023
@Jstein77 Jstein77 added linear and removed linear labels Nov 28, 2023
@Jstein77 Jstein77 changed the title Cumulative metric validation allows queries which will not render correctly [SL-1292] Cumulative metric validation allows queries which will not render correctly Nov 28, 2023
mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this issue Nov 29, 2023
removing `metric_time` limitation as this was resolved in dbt-labs/metricflow#825
mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this issue Nov 29, 2023
removing `metric_time` limitation as this was resolved in
dbt-labs/metricflow#825

## What are you changing in this pull request and why?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working High priority Created by Linear-GitHub Sync In Progress linear Metricflow Created by Linear-GitHub Sync
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants