Add DayTimeIntervalType/YearMonthIntervalType support #3460

Merged: 6 commits merged into NVIDIA:branch-21.10 from the daytime branch on Sep 15, 2021

Conversation

@wbo4958 (Collaborator) commented Sep 13, 2021:

This PR adds DayTimeIntervalType and YearMonthIntervalType support to TypeSig, fixing part of #3415.

This PR introduces a new TypeSig320 which extends TypeSig and overrides the methods that perform data type checks.

The expressions that support DayTimeIntervalType and YearMonthIntervalType need to be moved into the Spark 320 shims so that TypeSig.DAYTIME and TypeSig.YEARMONTH can be added to their checks.
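
A rough, simplified sketch of that shape (TypeSigBase and the constructor here are illustrative stand-ins, not the plugin's real TypeSig API):

```scala
import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, YearMonthIntervalType}

// Simplified stand-in for the plugin's TypeSig: the base class holds the
// checks that are valid on every supported Spark version.
class TypeSigBase(supportedNames: Set[String]) {
  def isSupported(dt: DataType): Boolean = supportedNames.contains(dt.typeName)
}

// Spark 3.2.0-only subclass: also accepts the interval types introduced in 3.2.0.
final class TypeSig320(supportedNames: Set[String]) extends TypeSigBase(supportedNames) {
  override def isSupported(dt: DataType): Boolean = dt match {
    case _: DayTimeIntervalType | _: YearMonthIntervalType => true
    case _ => super.isSupported(dt)
  }
}
```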

@revans2, Could you give me some comments on this solution?

@wbo4958 wbo4958 linked an issue Sep 13, 2021 that may be closed by this pull request
@wbo4958 wbo4958 marked this pull request as ready for review September 13, 2021 11:07
@wbo4958 wbo4958 marked this pull request as draft September 13, 2021 11:08
import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, YearMonthIntervalType}

/** TypeSig for Spark 3.2.0+ which adds DayTimeIntervalType and YearMonthIntervalType support */
final class TypeSig320(
revans2 (Collaborator) commented:

I see two possibilities for how to do this:

  1. We do something like this, where we have a TypeSigBase in the common code and then an actual TypeSig implementation class in the shim layers. The problem with this is that TypeSig.all and the other Spark helper values will not reflect "all" of the types. To fix that we would need to move all or part of object TypeSig into the shim layer, or have the overrides for everything that touches TypeSig.all, etc., go into the shim layer.
  2. We turn it around and have a static TypeSigUtil object in the shim layer that TypeSig calls into to do checks on the new types (a rough sketch follows below).

I am leaning towards the second approach because it is consistent between the static and non-static code.
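
A minimal sketch of the second approach. TypeSigUtilUntil320 appears later in this PR's diff; the trait and the other names are assumptions, not the merged API:

```scala
import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, YearMonthIntervalType}

// The contract lives in common code; each shim supplies one implementation.
trait TypeSigUtilBase {
  /** Is dataType one of the types that only newer Spark versions add? */
  def isSupportedNewType(dataType: DataType): Boolean
}

// Shim for Spark < 3.2.0: the interval types do not exist, so never match.
object TypeSigUtilUntil320 extends TypeSigUtilBase {
  override def isSupportedNewType(dataType: DataType): Boolean = false
}

// Shim for Spark 3.2.0+: recognizes the two new interval types.
// (In the real build the two objects would live in different source trees.)
object TypeSigUtilFrom320 extends TypeSigUtilBase {
  override def isSupportedNewType(dataType: DataType): Boolean = dataType match {
    case _: DayTimeIntervalType | _: YearMonthIntervalType => true
    case _ => false
  }
}
```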

wbo4958 (Collaborator, Author) replied:

Thx @revans2, I just pushed a new commit for the second approach. Could you take a look at it? Thx

@sameerz sameerz added this to the Sep 13 - Sep 24 milestone Sep 13, 2021
@wbo4958 wbo4958 marked this pull request as ready for review September 14, 2021 11:13
@wbo4958 (Collaborator, Author) commented Sep 14, 2021:

build

@wbo4958 wbo4958 self-assigned this Sep 14, 2021
@@ -93,4 +93,6 @@ trait Spark30XShims extends SparkShims {
}

override def shouldFailDivOverflow(): Boolean = false

override def getTypeSigUtil(): TypeSigUtil = TypeSigUtilUntil320
A collaborator commented:

We don't need anything in the Shim layer to make this happen. You can just have a single class named TypeSigUtil. Then you have different versions in the appropriate directories.
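
To illustrate the suggestion (the path and method name below are hypothetical, not the repo's exact layout): each Spark-version source tree defines an object with the same fully-qualified name, the build compiles whichever tree matches the target Spark version, and common code then references TypeSigUtil directly with no getTypeSigUtil() lookup.

```scala
// Hypothetical pre-3.2.0 source tree, e.g. src/main/pre320/.../TypeSigUtil.scala:
import org.apache.spark.sql.types.DataType

object TypeSigUtil {
  // No version-specific types exist before Spark 3.2.0.
  def isSupportedNewType(dataType: DataType): Boolean = false
}

// The 3.2.0+ tree defines an object with the same fully-qualified name that
// matches DayTimeIntervalType and YearMonthIntervalType instead.
```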

wbo4958 (Collaborator, Author) replied:

Yeah, you're correct. Thx, done.

}
}

override def getCastChecksAndSigs(
A collaborator commented:

I think I would prefer it if we found a way to put daytimeChecks, sparkDaytimeSig, yearmonthChecks, and sparkYearmonthSig inside the CastChecks class. This is just for consistency with where the rest of the checks live and how they are overridden.

wbo4958 (Collaborator, Author) replied:

Done. Mapping the dataType to a TypeEnum and invoking the existing function that accepts a TypeEnum resolves this issue.
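
A minimal sketch of that mapping, assuming a stubbed TypeEnum (the real enumeration covers every supported type) and the Spark 3.2.0 interval types; the object and method names are illustrative:

```scala
import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, YearMonthIntervalType}

// Stub of the plugin's TypeEnum with only the new entries.
object TypeEnum extends Enumeration {
  val DAYTIME, YEARMONTH = Value
}

object CastCheckMapping {
  // Map the 3.2.0-only data types to TypeEnum values so the existing
  // TypeEnum-based check functions can handle them unchanged.
  def dataTypeToEnum(dt: DataType): TypeEnum.Value = dt match {
    case _: DayTimeIntervalType   => TypeEnum.DAYTIME
    case _: YearMonthIntervalType => TypeEnum.YEARMONTH
    case other =>
      throw new IllegalArgumentException(s"no TypeEnum mapping for $other")
  }
}
```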

@wbo4958 (Collaborator, Author) commented Sep 15, 2021:

build

@wbo4958 (Collaborator, Author) commented Sep 15, 2021:

build

@revans2 revans2 merged commit 98427f1 into NVIDIA:branch-21.10 Sep 15, 2021
@@ -115,6 +157,8 @@ object TypeEnum extends Enumeration {
val MAP: Value = Value
val STRUCT: Value = Value
val UDT: Value = Value
val DAYTIME: Value = Value
val YEARMONTH: Value = Value
}
A collaborator commented:

Can we use ShimLoader.getSparkVersion to construct a static version-to-Set(TypeEnum) map? I'm not sure if that's feasible, but if it works we wouldn't need the Spark 320 code.

Another collaborator replied:

Adding more enum values dynamically is not something that Scala supports, as far as I can tell.
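
For context, a small illustration of the limitation (not code from the PR): scala.Enumeration's Value factory is a protected method, so values can only be declared inside the Enumeration subclass itself while the object is being initialized, which is why the 3.2.0-only entries have to live in TypeEnum directly.

```scala
object TypeEnum extends Enumeration {
  // Fine: values are declared inside the object, at definition time.
  val DAYTIME, YEARMONTH = Value
}

object Elsewhere {
  // Does not compile: Value is protected in Enumeration, so new values
  // cannot be registered from outside TypeEnum after the fact.
  // val NEW_TYPE = TypeEnum.Value
}
```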

@wbo4958 wbo4958 deleted the daytime branch September 25, 2021 07:58
Successfully merging this pull request may close these issues:

[BUG] Fix regressions in WindowFunctionSuite with Spark 3.2.0