GH-34334: [Go][CSV] Support list fields #34343

Merged: 15 commits, Mar 8, 2023

@yevgenypats yevgenypats commented Feb 24, 2023

This PR only handles:

  • lists (recursively, over all supported value types), for both reading and writing CSVs
  • extensions
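
As a rough illustration of what a list field looks like inside a CSV file, here is a minimal standard-library sketch. It does not use the Arrow API. The `{1,2,3}` cell layout is an assumption inferred from the brace trimming discussed in the review below, and `formatListCell`, `encodeRows`, and `decodeRows` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// formatListCell renders integers as a single cell such as {1,2,3}.
// The brace-and-comma layout is an assumption, not the confirmed Arrow format.
func formatListCell(vals []int64) string {
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = strconv.FormatInt(v, 10)
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// encodeRows writes rows with encoding/csv, which quotes any field that
// contains the delimiter, so a list cell stays a single field.
func encodeRows(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// decodeRows reads CSV text back into fields.
func decodeRows(text string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(text)).ReadAll()
}

func main() {
	text, err := encodeRows([][]string{
		{"id", "scores"},
		{"1", formatListCell([]int64{10, 20, 30})},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(text)

	rows, err := decodeRows(text)
	if err != nil {
		panic(err)
	}
	fmt.Println(rows[1][1])
}
```

The point of the sketch is only that a comma-separated list has to travel as one quoted CSV field, which `encoding/csv` takes care of on both the write and the read side.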

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

@kou kou changed the title from "[GO][CSV] Support lists" to "[Go][CSV] Support lists" Feb 24, 2023
@yevgenypats yevgenypats changed the title from "[Go][CSV] Support lists" to "GH-34334: [Go][CSV] Support lists" Feb 26, 2023
@github-actions

⚠️ GitHub issue #34334 has been automatically assigned in GitHub to PR creator.

@yevgenypats yevgenypats changed the title from "GH-34334: [Go][CSV] Support lists" to "GH-34334: [Go][CSV] Support lists and extensions" Feb 26, 2023
@github-actions github-actions bot added the "awaiting committer review" label Feb 28, 2023
@yevgenypats yevgenypats changed the title from "GH-34334: [Go][CSV] Support lists and extensions" to "GH-34334: [Go][CSV] Support list fields" Mar 7, 2023
@yevgenypats
Contributor Author

yevgenypats commented Mar 7, 2023

@zeroshade I've updated this PR to only include list support (and updated it per review), as I figured it will be easier to do extensions in a follow-up PR, potentially after #34454.

go/arrow/csv/reader.go
Comment on lines +750 to +756
for _, str := range items {
	r.initFieldConverter(valueBldr)(str)
}
Member

A future enhancement might be to cache this somehow, but it's not needed for this PR.
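
One possible reading of "cache this" is to build the converter once and reuse it for every item, instead of building it inside the loop. The sketch below only illustrates that idea with the standard library. `newIntConverter` is a hypothetical stand-in for `r.initFieldConverter(valueBldr)`, and the error-returning `converter` type and the `factoryCalls` counter are assumptions made for the demonstration, not the Arrow implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// converter parses one string and appends the value to a destination.
type converter func(s string) error

// factoryCalls counts how often a converter is built, to make the cost visible.
var factoryCalls int

// newIntConverter is a hypothetical stand-in for r.initFieldConverter(valueBldr).
func newIntConverter(dst *[]int64) converter {
	factoryCalls++
	return func(s string) error {
		v, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return err
		}
		*dst = append(*dst, v)
		return nil
	}
}

// convertPerItem mirrors the reviewed loop: one converter is built per item.
func convertPerItem(items []string, dst *[]int64) error {
	for _, s := range items {
		if err := newIntConverter(dst)(s); err != nil {
			return err
		}
	}
	return nil
}

// convertHoisted builds the converter once and reuses it for every item.
func convertHoisted(items []string, dst *[]int64) error {
	conv := newIntConverter(dst)
	for _, s := range items {
		if err := conv(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	items := []string{"1", "2", "3"}
	var a, b []int64

	factoryCalls = 0
	if err := convertPerItem(items, &a); err != nil {
		panic(err)
	}
	perItem := factoryCalls

	factoryCalls = 0
	if err := convertHoisted(items, &b); err != nil {
		panic(err)
	}
	fmt.Println(perItem, factoryCalls, a, b)
}
```

Both variants produce the same values; the hoisted one calls the factory once per list rather than once per item. Whether that saving is measurable in the real reader is not something the thread establishes.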

@zeroshade
Member

@yevgenypats This looks good to me other than my last two comments. Please add a couple tests for checking error cases and an empty list case. After that this LGTM 😄 thanks again for this!

@yevgenypats
Contributor Author

Thanks @zeroshade! BTW, any idea why the GO / AMD64 macOS 11 Go 1.17/1.18 - CGO jobs fail? They seem unrelated to this PR.

@zeroshade
Member

@yevgenypats It is unrelated to this PR; I'm addressing it in #34488.

Comment on lines 738 to 739
str = strings.TrimPrefix(str, "{")
str = strings.TrimSuffix(str, "}")
Member

I think this can be condensed into a single call to strings.Trim(str, "{}")

Contributor Author

I can change this but I feel the explicit way is clearer and potentially more efficient. But let me know if you want it the other way.

Member

Looking at the implementations from the strings package, I'm pretty sure that calling strings.Trim is more efficient than calling TrimPrefix and TrimSuffix separately. At minimum it reduces the number of code branches that get used. Personally I'd prefer using strings.Trim over the separate calls to prefix/suffix.

Contributor Author

Updated

@zeroshade
Member

If you rebase in from master, that'll pick up the fix for the failing macos tests. Once you add a test for the error case, I'm happy to merge this :)

@yevgenypats
Contributor Author

> If you rebase in from master, that'll pick up the fix for the failing macos tests. Once you add a test for the error case, I'm happy to merge this :)

Awesome. Done!

@zeroshade
Member

@yevgenypats Turns out there was a typo in the go workflow yaml in the fix for the cgo macos tests, please rebase main again to get the Go workflows running again (we fixed the typo) thanks!

@yevgenypats
Contributor Author

> @yevgenypats Turns out there was a typo in the go workflow yaml in the fix for the cgo macos tests, please rebase main again to get the Go workflows running again (we fixed the typo) thanks!

@zeroshade done! I think this should be ready to go 🚢

Member

@zeroshade zeroshade left a comment

LGTM

Thanks much!

@zeroshade zeroshade merged commit 2f3f41f into apache:main Mar 8, 2023
@ursabot

ursabot commented Mar 9, 2023

Benchmark runs are scheduled for baseline = 88d39d5 and contender = 2f3f41f. 2f3f41f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.09% ⬆️0.03%] test-mac-arm
[Finished ⬇️1.79% ⬆️0.0%] ursa-i9-9960x
[Failed ⬇️0.06% ⬆️0.06%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 2f3f41f0 ec2-t3-xlarge-us-east-2
[Finished] 2f3f41f0 test-mac-arm
[Finished] 2f3f41f0 ursa-i9-9960x
[Finished] 2f3f41f0 ursa-thinkcentre-m75q
[Finished] 88d39d5d ec2-t3-xlarge-us-east-2
[Failed] 88d39d5d test-mac-arm
[Finished] 88d39d5d ursa-i9-9960x
[Failed] 88d39d5d ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Successfully merging this pull request may close these issues.

[Go] CSV not handling all extensions
3 participants