
Support ANSI mode for ToUnixTimestamp, UnixTimestamp, GetTimestamp, DateAddInterval #5316

Merged
3 commits merged on Apr 28, 2022

Conversation

res-life
Collaborator

Contributes to #5119

Supports ANSI mode for:

  • ToUnixTimestamp
  • UnixTimestamp
  • GetTimestamp
  • DateAddInterval

Signed-off-by: Chong Gao <res_life@163.com>

Chong Gao added 2 commits April 26, 2022 16:36
…DateAddInterval`

Signed-off-by: Chong Gao <res_life@163.com>
@res-life
Collaborator Author

ToUnixTimestamp, UnixTimestamp and GetTimestamp

These throw an exception if any value is invalid when parsing strings to timestamps in ANSI mode.
LegacyTimeParserPolicy mode does not support ANSI mode; see DateUtils.tagAndGetCudfFormat:

          // LEGACY support has a number of issues that mean we cannot guarantee
          // compatibility with CPU
          // - we can only support 4 digit years but Spark supports a wider range
          // - we use a proleptic Gregorian calendar but Spark uses a hybrid Julian+Gregorian
          //   calendar in LEGACY mode
          if (SQLConf.get.ansiEnabled) {
            meta.willNotWorkOnGpu("LEGACY format in ANSI mode is not supported on the GPU")
          }

So the ANSI check is only performed when the policy is not LegacyTimeParserPolicy:

      if (getTimeParserPolicy == LegacyTimeParserPolicy) {
           ......
      } else {
        parseStringAsTimestamp(
          lhs,
          sparkFormat,
          strfFormat,
          DType.TIMESTAMP_MICROSECONDS,
          failOnError)     // check ANSI
      }
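
For reference, a minimal sketch of the CPU behavior being matched (a hypothetical spark-shell session; the exact exception class differs across Spark 3.x versions):

    // Assumes a Spark 3.2+ session with ANSI on and a non-LEGACY parser policy
    spark.conf.set("spark.sql.ansi.enabled", "true")
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")

    // Valid input parses normally and returns seconds since the epoch
    spark.sql("SELECT to_unix_timestamp('2022-04-26', 'yyyy-MM-dd')").show()

    // Invalid input: with ANSI mode on, this throws at runtime instead of returning null
    spark.sql("SELECT to_unix_timestamp('not-a-date', 'yyyy-MM-dd')").show()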

More tests will be added as part of fixing #1556.

DateAddInterval
spark.sql.legacy.interval.enabled must be set to test DateAddInterval on Spark 3.2.0 and later;
refer to https://issues.apache.org/jira/browse/SPARK-34896

[SPARK-34896][SQL] Return day-time interval from dates subtraction
1. Add the SQL config `spark.sql.legacy.interval.enabled` which will control when Spark SQL should use `CalendarIntervalType` instead of ANSI intervals.
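
For example, a hypothetical test setup (Spark 3.2+); without the legacy flag, date-plus-interval arithmetic uses ANSI day-time intervals and no longer exercises DateAddInterval:

    // Restore CalendarIntervalType for interval literals so that
    // date + interval resolves to DateAddInterval (behavior switch in Spark 3.2+)
    spark.conf.set("spark.sql.legacy.interval.enabled", "true")
    spark.sql("SELECT date'2022-04-26' + interval 1 day").show()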

The ANSI checking was added in the following PR:
[SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode
Refer to
https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L773-L774

    // DateTimeUtils
    def dateAddInterval(
        start: Int,
        interval: CalendarInterval): Int = {
      require(interval.microseconds == 0,
        "Cannot add hours, minutes or seconds, milliseconds, microseconds to a date")
      ...
    }

https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L1622-L1623

    /**
     * When ansi mode is on, the microseconds part of interval needs to be 0, otherwise a runtime
     * [[IllegalArgumentException]] will be raised.
     */
    case class DateAddInterval
        ......
        if (ansiEnabled || itvl.microseconds == 0) {
          DateTimeUtils.dateAddInterval(start.asInstanceOf[Int], itvl)
        }
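
Concretely, under ANSI mode any interval with a sub-day time part fails on the CPU. A hypothetical illustration of the behavior the plugin must mirror (assumes the legacy-interval flag from above):

    spark.conf.set("spark.sql.ansi.enabled", "true")

    // OK in ANSI mode: day precision only, the microseconds part is 0
    spark.sql("SELECT date'2022-04-26' + interval 1 day").show()

    // Fails in ANSI mode: the interval carries a microseconds component, so the
    // require(...) in DateTimeUtils.dateAddInterval throws IllegalArgumentException
    spark.sql("SELECT date'2022-04-26' + interval 1 hour").show()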

@res-life
Collaborator Author

build

@jlowe jlowe added this to the Apr 18 - Apr 29 milestone Apr 26, 2022
@sameerz sameerz added the feature request label Apr 27, 2022
@res-life
Collaborator Author

build
