-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-44214: [C++] JsonExtensionType equality check ignores storage type #44215
base: main
Are you sure you want to change the base?
Conversation
|
72bb26c
to
4d591e5
Compare
@@ -763,7 +763,7 @@ TEST_F(TestConvertParquetSchema, ParquetSchemaArrowExtensions) { | |||
props.set_arrow_extensions_enabled(true); | |||
auto arrow_schema = ::arrow::schema( | |||
{::arrow::field("json_1", ::arrow::extension::json(), true), | |||
::arrow::field("json_2", ::arrow::extension::json(::arrow::large_utf8()), | |||
::arrow::field("json_2", ::arrow::extension::json(::arrow::utf8()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of making the test easier, can you keep large_utf8
and ensure it passes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reconstruct storage type if it is not stored? In this case we know data is of JSON logical type, but we don't know what was it's storage type at write time. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(There was a test where arrow schema was available and wasn't used at read time. I've addressed that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reconstruct storage type if it is not stored? In this case we know data is of JSON logical type, but we don't know what was it's storage type at write time. Am I missing something?
No, I'm asking you to write the test so that it reflects that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this? e522efb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another assertion to justify changing arrow_schema
https://github.com/apache/arrow/pull/44215/files/94006f92085377fc4b72c2a2da844e36fb4d86d3..77c964b04972fc08f04957190866fb475b453c3c.
Let me know if something else should be changed.
0c265ab
to
1ba3ba7
Compare
cpp/src/parquet/arrow/schema.cc
Outdated
if (ex_type.storage_type()->Equals(*inferred->field->type())) { | ||
if (ex_type.storage_type()->Equals(*inferred->field->type()) || | ||
(ex_type.extension_name() == "arrow.json" && | ||
!ex_type.storage_type()->Equals(*inferred->field->type()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "types are not equal" check is superfluous? Because if they were equal, the first part of the if check would have passed and we wouldn't get here?
I assume that in case of JSON extension type, we either just have to assume that the data will correctly use string, or we check explicitly that the inferred type is string type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is superfluous indeed. Changed as per your suggestion: 3364c0f
cpp/src/arrow/extension/json.cc
Outdated
return std::make_shared<JsonExtensionType>(storage_type); | ||
} | ||
|
||
std::shared_ptr<DataType> json(std::shared_ptr<DataType> storage_type) { | ||
return JsonExtensionType::Make(storage_type).ValueOrDie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return JsonExtensionType::Make(storage_type).ValueOrDie(); | |
return JsonExtensionType::Make(std::move(storage_type)).ValueOrDie(); |
Also, please ensure that JsonExtensionType::Make
takes a std::shared_ptr<DataType>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
cb763e0
to
fdf8fb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remaining nits
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@pitrou added your changes. |
cpp/src/arrow/extension/json.cc
Outdated
return std::make_shared<JsonExtensionType>(storage_type); | ||
Result<std::shared_ptr<DataType>> JsonExtensionType::Make( | ||
std::shared_ptr<DataType> storage_type) { | ||
if (storage_type->id() != Type::STRING && storage_type->id() != Type::STRING_VIEW && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the following static function to JsonExtensionType
to remove repeated code here and in cpp/src/parquet/arrow/schema.cc below?
static bool IsSupportedStorageType(Type::type type_id) {
return type_id == Type::STRING || type_id == Type::STRING_VIEW || type_id == Type::LARGE_STRING;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as suggested: 6dff127
1dd3ec2
to
a415aaf
Compare
a415aaf
to
6dff127
Compare
Rationale for this change
As noted in #13901 (review):
What changes are included in this PR?
This change introduces storage equality check into
JsonExtensionType
equality check.This also fixes a storage type check in
JsonExtensionType::Make
.Are these changes tested?
Yes.
Are there any user-facing changes?
No.