This issue was moved to a discussion.

[Feature] standardize the dbt.type_timestamp macro #9729

Closed · 2 tasks done

fivetran-reneeli opened this issue Mar 5, 2024 · 5 comments
Labels
discussion enhancement New feature or request

Comments


fivetran-reneeli commented Mar 5, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Not sure if this is a bug, but I am wondering if there are plans to standardize the dbt.type_timestamp macro. I believe in most warehouses, it defaults to timestamp without timezone, so it would be helpful to confirm that this is the end result across all adapters.

Expected Behavior

Field is a timestamp without time zone

Steps To Reproduce

n/a

Relevant log output

n/a

Environment

- OS: mac
- Python: 3.12.2
- dbt: 1.7.9

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

Using multiple adapters

@fivetran-reneeli fivetran-reneeli added bug Something isn't working triage labels Mar 5, 2024
@dbeatty10 dbeatty10 added enhancement New feature or request and removed bug Something isn't working labels Mar 6, 2024
@dbeatty10 dbeatty10 changed the title [Bug] standardizing type_timestamp [Feature] standardizing type_timestamp Mar 6, 2024
@dbeatty10 dbeatty10 changed the title [Feature] standardizing type_timestamp [Feature] standardize the dbt.type_timestamp macro Mar 6, 2024
@dbeatty10 dbeatty10 self-assigned this Mar 6, 2024
@dbeatty10 (Contributor)

Thanks for reaching out about this @fivetran-reneeli 🧠

What are the use-case(s) that you are trying to solve for?

Is it doing casts like the following or something else?

cast({{ expression }} as {{ dbt.type_timestamp() }})

Short story

Our docs state that type_timestamp:

may or may not match the behavior of TIMESTAMP WITHOUT TIMEZONE from ANSI SQL-92

So even though it is inconvenient that it is not standardized across adapters, this is known and documented behavior rather than a bug.

I am wondering if there are plans to standardize the dbt.type_timestamp macro

We don't currently have plans to standardize this.

Longer story

Here are some other issues/comments that may be related:

Possibility: a new type_timestamp_ntz() macro

If we were to standardize this, one option would be to introduce a new macro in order to maintain backwards compatibility.

Let's suppose there's a new macro called type_timestamp_ntz. Here are the strings it might return on a per-adapter basis to produce naive timestamps, i.e., TIMESTAMP WITHOUT TIME ZONE from the SQL standard (or java.time.LocalDateTime from Java 8):

Database        | Naive timestamp data type
----------------|----------------------------
Starburst/Trino | TIMESTAMP WITHOUT TIME ZONE
Athena          | TIMESTAMP WITHOUT TIME ZONE
Postgres        | TIMESTAMP
Redshift        | TIMESTAMP
AlloyDB         | TIMESTAMP
Dremio          | TIMESTAMP
Oracle          | TIMESTAMP
Snowflake       | TIMESTAMP_NTZ
Spark SQL       | TIMESTAMP_NTZ
Databricks      | TIMESTAMP_NTZ
BigQuery        | DATETIME
Teradata        | N/A

Caveats

  • This is based on reading the documentation for each database; I didn't actually try them out to double-check each.
  • TIMESTAMP_NTZ is not available until Spark 3.4 & Databricks Runtime 13 [1, 2]
  • Teradata has separate TIMESTAMP and TIMESTAMP WITH TIME ZONE data types, but the former is described as "non-ANSI standard. Analytics Database stores a TIMESTAMP value in UTC." [3]

@dbeatty10 dbeatty10 removed their assignment Mar 11, 2024
fivetran-reneeli (Author) commented Mar 12, 2024

Thanks for the thorough response @dbeatty10 ! It was helpful to have the related issues -- I found this one to be especially relevant, particularly your comment here.

It would restore the original literal string values returned by type_timestamp() that existed in dbt-utils previously (e.g., TIMESTAMP_NTZ instead of TIMESTAMP for Snowflake, etc.)

For more context, on our side our models leverage dbt.type_timestamp for the following warehouses:
Snowflake, Postgres, Databricks, Redshift, and BigQuery.

So as you listed, I believe each one defaults to NTZ except for BigQuery, where TIMESTAMP is with time zone and DATETIME is without time zone.

The dbt team may have discussed this in the past, based on comments I found like this one. I also dug around the old dbt_utils changelog and saw that the timestamp macros for Redshift and Postgres were explicitly cast once upon a time.

cc @fivetran-joemarkiewicz

@dbeatty10 (Contributor)

@fivetran-reneeli All those links you included are golden info 🤩

For more context, on our side our models leverage dbt.type_timestamp for the following warehouses:
Snowflake, Postgres, Databricks, Redshift, and BigQuery.

Could you share more about how you are leveraging dbt.type_timestamp? e.g. could you share the end goal that you are trying to accomplish by using it? If you have links to source code, that would be handy.

The reason I ask is there are at least a few different uses I can think of:

  • comparing the data type from an information schema query (less common)
  • casting an expression to a naive timestamp (more common)
  • casting an expression to an aware timestamp (less common?)

@fivetran-reneeli (Author)

Sorry for the delay!

Sure thing: our team develops warehouse-agnostic dbt data model packages (currently the packages work across Postgres, Snowflake, BigQuery, Redshift, and Databricks). Here is an example model with Jira.

cast(created as {{ dbt.type_timestamp() }}) as created_at,

Because we want our models to work with all those mentioned warehouses, ideally the compiled code is the same for all.
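For illustration, here is roughly how that line compiles on different warehouses under the current macro (a sketch based on the discussion in this thread, not verified against each adapter):

```sql
-- Postgres / Redshift / Snowflake (assumed): a naive timestamp type
cast(created as timestamp) as created_at

-- BigQuery (assumed): the compiled text is the same, but BigQuery's
-- TIMESTAMP is an absolute UTC instant (closer to an aware type);
-- its naive type is DATETIME
cast(created as timestamp) as created_at
```

So the compiled SQL can be textually identical while the resulting column semantics differ per warehouse.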

@dbeatty10 (Contributor)

No prob and thanks for that example @fivetran-reneeli !

There are some tricky things to consider here. For example, how would this example behave when the created expression is an aware timestamp vs. a naive timestamp?

We took a look at something similar a year or so ago when considering #5969, and we opted not to tackle it at that time due to the complexity of getting it right and the high probability of accidentally getting it wrong and creating churn for users and maintainers.

See #5935 and #7665 for a small sample of edge cases related to aware vs. naive that we've run into.
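One concrete illustration of the aware-vs-naive ambiguity, sketched on BigQuery (assuming its DATETIME(timestamp_expression, time_zone) function, which converts an aware TIMESTAMP to a naive DATETIME in the given zone):

```sql
-- The same instant yields different naive values depending on the
-- zone chosen, so an aware-to-naive conversion must pick one somehow.
select
    datetime(timestamp '2024-03-05 18:00:00+00', 'UTC')              as naive_utc,
    datetime(timestamp '2024-03-05 18:00:00+00', 'America/New_York') as naive_eastern
```

A standardized macro would have to decide (or let users decide) which zone applies when stripping awareness.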

So I'm going to upgrade this issue to a Discussion so we can further discuss different design aspirations folks are looking for as well as edge cases people have come across.

I'd propose the main topic of the discussion to be this question:

  • How can we best support converting timestamp expressions to naive (and/or aware) timestamps in a cross-database manner?

@dbt-labs dbt-labs locked and limited conversation to collaborators Mar 21, 2024
@dbeatty10 dbeatty10 converted this issue into discussion #9790 Mar 21, 2024