-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36924][SQL] CAST between ANSI intervals and IntegralType #34494
Conversation
@MaxGekk @cloud-fan Can you give me some comments. At now I just done the cast between |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
Outdated
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #144931 has finished for PR 34494 at commit
|
faeffbc
to
004cc67
Compare
@MaxGekk @cloud-fan If you have time, could you please review the code again? |
Kubernetes integration test starting |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
Show resolved
Hide resolved
b => IntervalUtils.longToDayTimeInterval( | ||
x.exactNumeric.asInstanceOf[Numeric[Any]].toLong(b), it.endField) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you handle overflow here like for year-month intervals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No overflow is possible that convert byte/short/int/long to long. It's not like year-month intervals, because there is a case that long to int.
@@ -598,6 +606,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit | |||
IntervalUtils.castStringToYMInterval(s, it.startField, it.endField)) | |||
case _: YearMonthIntervalType => buildCast[Int](_, s => | |||
IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField)) | |||
case x: IntegralType if it.startField == it.endField => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the error message look like when t.startField != it.endField
. It would be nice if we say to users that according to the SQL standard the start and end fields should be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, Maybe I made a mistake that we no need check it.startField == it.endField
, because it has called the canCast
and return false if it.startField != it.endField
when go here. So I think no need add error message here. Do you think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think no need add error message here
+1
buildCast[Int](_, i => yearMonthIntervalToInt(i, e)) | ||
} | ||
|
||
private[this] def dayTimeIntervalToInt(v: Long, endFiled: Byte): Int = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you embed it as this function is used only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense.
buildCast[Int](_, i => yearMonthIntervalToShort(i, e)) | ||
} | ||
|
||
private[this] def dayTimeIntervalToShort(v: Long, endFiled: Byte): Short = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same. Please, embed it.
Kubernetes integration test status failure |
004cc67
to
b170f78
Compare
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145090 has finished for PR 34494 at commit
|
Kubernetes integration test starting |
Test build #145095 has finished for PR 34494 at commit
|
Kubernetes integration test status failure |
Test build #145104 has finished for PR 34494 at commit
|
|
||
case (_: DayTimeIntervalType, _: DayTimeIntervalType) => true | ||
case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true | ||
case (DayTimeIntervalType(s, e), _: IntegralType) if s == e => true | ||
case (YearMonthIntervalType(s, e), _: IntegralType) if s == e => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can cast INTERVAL DAY TO SECOND
to INTERVAL SECOND
, I think it's a reasonable and convenient extension to cast INTERVAL DAY TO SECOND
to integral type directly. That said, we can allow casting any interval type to integral type. For casting integral to interval, we should still only allow single field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can cast INTERVAL DAY TO SECOND to INTERVAL SECOND, I think it's a reasonable and convenient extension to cast INTERVAL DAY TO SECOND to integral type directly.
So clever. You're absolutely right. I didn't think of this use case.
@@ -598,6 +606,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit | |||
IntervalUtils.castStringToYMInterval(s, it.startField, it.endField)) | |||
case _: YearMonthIntervalType => buildCast[Int](_, s => | |||
IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField)) | |||
case x: IntegralType if it.startField == it.endField => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think no need add error message here
+1
case x: IntegralType if it.startField == it.endField => | ||
b => | ||
val intValue = try { | ||
x.exactNumeric.asInstanceOf[Numeric[Any]].toInt(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we let intToYearMonthInterval
take long? then only intToYearMonthInterval
need to do overflow check, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense. It's a little weird intToYearMonthInterval
take a parameter of long type. So I want to change the method name: intToYearMonthInterval
=> integralToYearMonthInterval
and longToDayTimeInterval
=> integralToDayTimeInterval
. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Update. @cloud-fan @MaxGekk If you have time, I'd like to ask you to take a look at it again. Thank you very much. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145156 has finished for PR 34494 at commit
|
b624c2f
to
514d275
Compare
Kubernetes integration test starting |
assert(it.startField == it.endField) | ||
assert(x != LongType) | ||
b => IntervalUtils.intToDayTimeInterval( | ||
x.integral.asInstanceOf[Integral[Any]].toInt(b), it.endField) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge byte/short/int into one case branch to reduce the code and add assert(x != LongType)
although I think it won't get here for LongType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems better to unify the code more
case x: IntegralType =>
assert(it.startField == it.endField)
if (x == LongType) ... else ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
assert(it.startField == it.endField) | ||
assert(x != LongType) | ||
b => IntervalUtils.intToYearMonthInterval( | ||
x.integral.asInstanceOf[Integral[Any]].toInt(b), it.endField) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
code""" | ||
$evPrim = $util.intToDayTimeInterval($c, (byte)${it.endField}); | ||
""" | ||
case LongType => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, we can unify the two cases and write if-else
ctx: CodegenContext, | ||
endFiled: Byte, | ||
integralType: String, | ||
catalogType: String): CastFunction = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this parameter is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ctx: CodegenContext, | ||
endFiled: Byte, | ||
integralType: String, | ||
catalogType: String): CastFunction = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this parameter is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
integralType: String, | ||
catalogType: String): CastFunction = { | ||
val util = IntervalUtils.getClass.getCanonicalName.stripSuffix("$") | ||
integralType match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the caller side should just pass Int
, Long
, etc. and here we can simply write $util.dayTimeInterval$integralType
val vInt = yearMonthIntervalToInt(v, endFiled) | ||
val vShort = vInt.toShort | ||
if (vInt != vShort) { | ||
throw QueryExecutionErrors.castingCauseOverflowError(v, ShortType.catalogString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a blocker, but it will be great to print v
with interval format, not just an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems more intuitive that print v
with interval format. A small burden is the need to add a parameter startField
to build the interval format. eg INTERVAL 'v' startField TO endField
.
Kubernetes integration test starting |
Kubernetes integration test status failure |
20c8e9f
to
276673a
Compare
case x: DayTimeIntervalType => | ||
castDayTimeIntervalToIntegralTypeCode(x.endField, "Long") | ||
case x: YearMonthIntervalType => | ||
castYearMonthIntervalToIntegralTypeCode(x.endField, "Int") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we use Int
instead of Long
, as we use yearMonthIntervaltoInt().toLong
instead of define the function yearMonthIntervaltoLong
. also I think no problem that assign int value to long result.
276673a
to
2ac9d11
Compare
Test build #145310 has finished for PR 34494 at commit
|
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Although it is not a blocker. but with interval format is friendly and clear. So I update the PR. |
intToYearMonthInterval(vInt, endField) | ||
} | ||
|
||
def yearMonthIntervalToInt(v: Int, startField: Byte, endField: Byte): Int = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, startField is not used. but in order to unify code in codegen $evPrim = $util.yearMonthIntervalTo$integralType($c, (byte)$startField, (byte)$endField);
.
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145322 has finished for PR 34494 at commit
|
Test build #145321 has finished for PR 34494 at commit
|
Test build #145327 has finished for PR 34494 at commit
|
The GA failure is unrelated. Merging to master, thanks! |
@cloud-fan Thank you very much for your very patient review |
What changes were proposed in this pull request?
Add cast
AnsiIntervalType
toIntegralType
requirement:
YearMonthIntervalType
just have one unitDayTimeIntervalType
just have one unitcast rule:
YearMonthIntervalType
is the value of theIntegralType
after conversion.DayTimeIntervalType
is the value of theIntegralType
after conversion.Add cast
IntegralType
toAnsiIntervalType
requirement:
YearMonthIntervalType
just have one unitDayTimeIntervalType
just have one unitcast rule:
YearMonthIntervalType
that with the single unit after conversion.DayTimeIntervalType
that with the single unit after conversion.Why are the changes needed?
According to 2011 Standards
Does this PR introduce any user-facing change?
Yes, user can use cast function between YearMonthIntervalType to NumericType
How was this patch tested?
add ut testcase