Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36924][SQL] CAST between ANSI intervals and IntegralType #34494

Closed
wants to merge 4 commits into from

Conversation

Peng-Lei
Copy link
Contributor

@Peng-Lei Peng-Lei commented Nov 5, 2021

What changes were proposed in this pull request?

Add cast AnsiIntervalType to IntegralType

requirement:

  1. YearMonthIntervalType just have one unit
  2. DayTimeIntervalType just have one unit

cast rule:

  1. The value corresponding to the unit of YearMonthIntervalType is the value of the IntegralType after conversion.
  2. The value corresponding to the unit of DayTimeIntervalType is the value of the IntegralType after conversion.

Add cast IntegralType to AnsiIntervalType
requirement:

  1. YearMonthIntervalType just have one unit
  2. DayTimeIntervalType just have one unit

cast rule:

  1. The value of the IntegralTypeis the value of YearMonthIntervalType that with the single unit after conversion.
  2. The value of the IntegralTypeis the value of DayTimeIntervalType that with the single unit after conversion.

Why are the changes needed?

According to 2011 Standards
截图

  1. If TD is an interval and SD is exact numeric, then TD shall contain only a single .
  2. If TD is exact numeric and SD is an interval, then SD shall contain only a single .

Does this PR introduce any user-facing change?

Yes, user can use cast function between YearMonthIntervalType to NumericType

How was this patch tested?

add ut testcase

@github-actions github-actions bot added the SQL label Nov 5, 2021
@Peng-Lei
Copy link
Contributor Author

Peng-Lei commented Nov 5, 2021

@MaxGekk @cloud-fan Can you give me some comments. At now I just done the cast between YearMonthInterval and Numeric. Your guidance will help me during development of casting between DayTimeIntervalType and Numeric.

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49402/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49402/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Test build #144931 has finished for PR 34494 at commit bf9de6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Peng-Lei Peng-Lei force-pushed the SPARK-36924 branch 2 times, most recently from faeffbc to 004cc67 Compare November 11, 2021 07:11
@Peng-Lei
Copy link
Contributor Author

@MaxGekk @cloud-fan If you have time, could you please review the code again?

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49559/

Comment on lines 598 to 599
b => IntervalUtils.longToDayTimeInterval(
x.exactNumeric.asInstanceOf[Numeric[Any]].toLong(b), it.endField)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you handle overflow here like for year-month intervals?

Copy link
Contributor Author

@Peng-Lei Peng-Lei Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No overflow is possible that convert byte/short/int/long to long. It's not like year-month intervals, because there is a case that long to int.

@@ -598,6 +606,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
IntervalUtils.castStringToYMInterval(s, it.startField, it.endField))
case _: YearMonthIntervalType => buildCast[Int](_, s =>
IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField))
case x: IntegralType if it.startField == it.endField =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the error message look like when t.startField != it.endField. It would be nice if we say to users that according to the SQL standard the start and end fields should be the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, Maybe I made a mistake that we no need check it.startField == it.endField, because it has called the canCast and return false if it.startField != it.endField when go here. So I think no need add error message here. Do you think so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At now, if it.startField != it.endField the exception is as follow:
image
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think no need add error message here

+1

buildCast[Int](_, i => yearMonthIntervalToInt(i, e))
}

private[this] def dayTimeIntervalToInt(v: Long, endFiled: Byte): Int = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you embed it as this function is used only once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense.

buildCast[Int](_, i => yearMonthIntervalToShort(i, e))
}

private[this] def dayTimeIntervalToShort(v: Long, endFiled: Byte): Short = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same. Please, embed it.

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49559/

@Peng-Lei Peng-Lei changed the title [SPARK-36924][SQL] CAST between ANSI intervals and numerics [SPARK-36924][SQL] CAST between ANSI intervals and IntegralType Nov 11, 2021
@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49564/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49564/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Test build #145090 has finished for PR 34494 at commit faeffbc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49573/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Test build #145095 has finished for PR 34494 at commit 004cc67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49573/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Test build #145104 has finished for PR 34494 at commit b170f78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


case (_: DayTimeIntervalType, _: DayTimeIntervalType) => true
case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true
case (DayTimeIntervalType(s, e), _: IntegralType) if s == e => true
case (YearMonthIntervalType(s, e), _: IntegralType) if s == e => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can cast INTERVAL DAY TO SECOND to INTERVAL SECOND, I think it's a reasonable and convenient extension to cast INTERVAL DAY TO SECOND to integral type directly. That said, we can allow casting any interval type to integral type. For casting integral to interval, we should still only allow single field.

Copy link
Contributor Author

@Peng-Lei Peng-Lei Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can cast INTERVAL DAY TO SECOND to INTERVAL SECOND, I think it's a reasonable and convenient extension to cast INTERVAL DAY TO SECOND to integral type directly.

So clever. You're absolutely right. I didn't think of this use case.

@@ -598,6 +606,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
IntervalUtils.castStringToYMInterval(s, it.startField, it.endField))
case _: YearMonthIntervalType => buildCast[Int](_, s =>
IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField))
case x: IntegralType if it.startField == it.endField =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think no need add error message here

+1

case x: IntegralType if it.startField == it.endField =>
b =>
val intValue = try {
x.exactNumeric.asInstanceOf[Numeric[Any]].toInt(b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we let intToYearMonthInterval take long? then only intToYearMonthInterval need to do overflow check, not here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. It's a little weird intToYearMonthInterval take a parameter of long type. So I want to change the method name: intToYearMonthInterval => integralToYearMonthInterval and longToDayTimeInterval => integralToDayTimeInterval. Do you agree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@Peng-Lei
Copy link
Contributor Author

Update. @cloud-fan @MaxGekk If you have time, I'd like to ask you to take a look at it again. Thank you very much.

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49627/

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49627/

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Test build #145156 has finished for PR 34494 at commit b624c2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49685/

assert(it.startField == it.endField)
assert(x != LongType)
b => IntervalUtils.intToDayTimeInterval(
x.integral.asInstanceOf[Integral[Any]].toInt(b), it.endField)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge byte/short/int into one case branch to reduce the code and add assert(x != LongType) although I think it won't get here for LongType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems better to unify the code more

case x: IntegralType =>
  assert(it.startField == it.endField)
  if (x == LongType) ... else ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

assert(it.startField == it.endField)
assert(x != LongType)
b => IntervalUtils.intToYearMonthInterval(
x.integral.asInstanceOf[Integral[Any]].toInt(b), it.endField)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

code"""
$evPrim = $util.intToDayTimeInterval($c, (byte)${it.endField});
"""
case LongType =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, we can unify the two cases and write if-else

ctx: CodegenContext,
endFiled: Byte,
integralType: String,
catalogType: String): CastFunction = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ctx: CodegenContext,
endFiled: Byte,
integralType: String,
catalogType: String): CastFunction = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

integralType: String,
catalogType: String): CastFunction = {
val util = IntervalUtils.getClass.getCanonicalName.stripSuffix("$")
integralType match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the caller side should just pass Int, Long, etc. and here we can simply write $util.dayTimeInterval$integralType

val vInt = yearMonthIntervalToInt(v, endFiled)
val vShort = vInt.toShort
if (vInt != vShort) {
throw QueryExecutionErrors.castingCauseOverflowError(v, ShortType.catalogString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker, but it will be great to print v with interval format, not just an integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems more intuitive that print v with interval format. A small burden is the need to add a parameter startField to build the interval format. eg INTERVAL 'v' startField TO endField.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49781/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49781/

case x: DayTimeIntervalType =>
castDayTimeIntervalToIntegralTypeCode(x.endField, "Long")
case x: YearMonthIntervalType =>
castYearMonthIntervalToIntegralTypeCode(x.endField, "Int")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we use Int instead of Long, as we use yearMonthIntervaltoInt().toLong instead of define the function yearMonthIntervaltoLong. also I think no problem that assign int value to long result.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145310 has finished for PR 34494 at commit 20c8e9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49792/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49793/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49792/

@Peng-Lei
Copy link
Contributor Author

not a blocker, but it will be great to print v with interval format, not just an integer.

Although it is not a blocker. but with interval format is friendly and clear. So I update the PR.

intToYearMonthInterval(vInt, endField)
}

def yearMonthIntervalToInt(v: Int, startField: Byte, endField: Byte): Int = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, startField is not used. but in order to unify code in codegen $evPrim = $util.yearMonthIntervalTo$integralType($c, (byte)$startField, (byte)$endField);.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49793/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49798/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49798/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145322 has finished for PR 34494 at commit 2ac9d11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145321 has finished for PR 34494 at commit 276673a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145327 has finished for PR 34494 at commit c974930.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

The GA failure is unrelated. Merging to master, thanks!

@cloud-fan cloud-fan closed this in 9553ed7 Nov 17, 2021
@Peng-Lei
Copy link
Contributor Author

@cloud-fan Thank you very much for your very patient review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants