[Feature] Support TimeZone-aware time dimension types #733
Comments
Thanks for logging an issue to keep track of the progress. 👍
Just to add a bit of context for some of our use cases related to timezones. A distributed organization with multiple geo locations will still store its business metrics data in a data warehouse as a single source of truth, and the data in the warehouse is likely normalized to a single timezone such as GMT. However, when individual teams consume the data from the warehouse from their own geo, they localize the query to their own timezone, such as EST or PST. As a result, when sending a metrics query with a time granularity (for example […]), […]. Hypothetically speaking, if we were able to send a query with […].

If this use case were to be handled directly in the data warehouse and the semantic model, we would have to create multiple time dimensions, one for each timezone that needs to be handled, which would make the semantic model quite complicated. Hopefully this makes sense.
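To illustrate the use case above, here is a minimal Python sketch (not MetricFlow code; the timestamp and zones are made up) showing why a warehouse value normalized to UTC lands in different daily buckets depending on the reporting team's timezone:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A metric event stored in the warehouse, normalized to GMT/UTC.
event_utc = datetime(2023, 3, 1, 2, 30, tzinfo=timezone.utc)

# The same instant as seen by teams in different geos:
event_est = event_utc.astimezone(ZoneInfo("America/New_York"))
event_pst = event_utc.astimezone(ZoneInfo("America/Los_Angeles"))

# A daily-granularity query therefore buckets this event differently
# per timezone, even though it is a single row in the warehouse.
print(event_utc.date())  # 2023-03-01
print(event_est.date())  # 2023-02-28
print(event_pst.date())  # 2023-02-28
```

This is why handling it purely in the model would require one time dimension per reporting timezone.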
Adding some timezone type notes here from investigations into other issues. Most engines support the equivalent of both a timezone-naive (ntz) and a timezone-aware (tz) timestamp type, though the names and defaults vary.
Additional notes:
As a general rule, converting from ntz to tz types changes the value, since some form of time zone adjustment may be implicitly applied: Snowflake uses the local/session time zone, while BigQuery and Databricks assume UTC. Similarly, converting from tz to ntz effectively truncates the time zone. Most engines also provide a mechanism to convert from tz to ntz with an adjustment, but the mechanism varies.
Based on the above:
Note: this assumes all engines support the relevant conversion semantics in the manner we require, which needs further investigation. Snowflake and postgres appear to have particularly funky tz -> ntz conversion semantics, although it's not clear how much of this is a documentation issue.
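The ntz/tz conversion behaviors described above can be sketched with Python's stdlib `datetime`, standing in for the engine types (the specific timestamps and zones here are illustrative, not from any particular engine):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# ntz -> tz: engines implicitly assume some zone when attaching one.
naive = datetime(2023, 6, 1, 12, 0)  # a TIMESTAMP_NTZ-style value

# "Assume UTC" behavior (BigQuery/Databricks style):
as_utc = naive.replace(tzinfo=timezone.utc)

# "Assume local/session time zone" behavior (Snowflake style),
# shown here with an explicit zone instead of the session default:
as_la = naive.replace(tzinfo=ZoneInfo("America/Los_Angeles"))

# The two interpretations denote different instants:
print(as_utc == as_la)  # False

# tz -> ntz: dropping the zone truncates the offset information;
# the wall-clock value is preserved but the instant is lost.
back_to_naive = as_la.replace(tzinfo=None)
print(back_to_naive == naive)  # True

# tz -> ntz with adjustment: convert to a target zone, then drop it.
adjusted = as_utc.astimezone(ZoneInfo("America/Los_Angeles")).replace(tzinfo=None)
print(adjusted)  # 2023-06-01 05:00:00 (PDT offset applied)
```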
@dbeatty10 has spent more time thinking about timezone interoperability than anyone I've ever met - he might be a helpful person to talk to about different platforms' oddities!
Is this your first time submitting a feature request?
Describe the feature
MetricFlow currently uses timezone-agnostic types for all date/time operations, but forcing people to coerce to UTC and then back-convert on render is cumbersome at best and impossible at worst. We have real-world examples of people running into date boundary issues where they have customer data stored in the customer-local timezone, and they want to compute daily customer-specific metrics with the boundaries set for that customer's local day (rather than UTC or whatever).
For situations like that it's natural to store the date/time information in local time with timezone annotations intact, as this makes common query types more natural and also allows for things like audits against local time values.
Currently, the only workaround is to normalize the timestamp in the warehouse (via the underlying dbt model) ahead of time, or else to use the `expr` field to do it on the fly when the semantic model is constructed. This is somewhat limiting, as it does not allow for re-use of the same measure against different time zones. This issue will only become more common when we add hourly granularities, so it's something we'll need to address.
Note: the use case described here requires us to aggregate against local time. If we aggregate against UTC, the daily aggregations will be offset relative to user-local time, so any strategy that does everything in UTC proper and then offsets the result set in its entirety won't work: different rows will need to be offset independently. Whether we provide support for this kind of approach via MetricFlow, or via some other mechanism involving dbt date/time plugins that allows MetricFlow to do simpler aggregations, is an open question.
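A small sketch of why a single global offset can't fix this (hypothetical rows; the per-customer timezone column is an assumption for illustration): two events at the same UTC instant fall in the same UTC-day bucket but in different customer-local-day buckets.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical rows: events stored in UTC alongside each customer's zone.
rows = [
    (datetime(2023, 3, 1, 1, 0, tzinfo=timezone.utc), "America/New_York"),
    (datetime(2023, 3, 1, 1, 0, tzinfo=timezone.utc), "Asia/Tokyo"),
]

# Aggregating by UTC day puts both events in the same daily bucket...
utc_days = Counter(ts.date().isoformat() for ts, _ in rows)

# ...but aggregating by each customer's *local* day does not, so no
# single offset applied to the whole result set can reproduce it.
local_days = Counter(
    ts.astimezone(ZoneInfo(tz)).date().isoformat() for ts, tz in rows
)

print(utc_days)    # Counter({'2023-03-01': 2})
print(local_days)  # Counter({'2023-02-28': 1, '2023-03-01': 1})
```

Each row needs its own timezone adjustment before the daily truncation, which is exactly what timezone-aware time dimensions would allow the generated SQL to express.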
Describe alternatives you've considered
No response
Who will this benefit?
No response
Are you interested in contributing this feature?
No response
Anything else?
No response