Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add arrow support for timestamp and bytea #724

Merged
merged 3 commits into from
Mar 6, 2023

Conversation

yevgenypats
Copy link
Member

@yevgenypats yevgenypats commented Mar 6, 2023

This should unblock cloudquery/filetypes#87

Add supports for timestamp to decode arrow string and for byte to decode base64 (backward compat is saved)

@codecov
Copy link

codecov bot commented Mar 6, 2023

Codecov Report

Patch coverage: 9.09% and project coverage change: -0.06 ⚠️

Comparison is base (e675cf0) 48.81% compared to head (ba1bd56) 48.75%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #724      +/-   ##
==========================================
- Coverage   48.81%   48.75%   -0.06%     
==========================================
  Files          70       70              
  Lines        6887     6895       +8     
==========================================
  Hits         3362     3362              
- Misses       3061     3069       +8     
  Partials      464      464              
Impacted Files Coverage Δ
schema/timestamptz.go 44.91% <0.00%> (-1.99%) ⬇️
schema/bytea.go 34.84% <16.66%> (-1.66%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@github-actions
Copy link

github-actions bot commented Mar 6, 2023

⏱️ Benchmark results

Comparing with e675cf0

  • DefaultConcurrencyDFS-2 resources/s: 10,649 ⬆️ 5.91% increase vs. e675cf0
  • DefaultConcurrencyRoundRobin-2 resources/s: 11,150 ⬇️ 11.82% decrease vs. e675cf0
  • Glob-2 ns/op: 195.7 ⬆️ 7.10% increase vs. e675cf0
  • TablesWithChildrenDFS-2 resources/s: 26,530 ⬇️ 11.39% decrease vs. e675cf0
  • TablesWithChildrenRoundRobin-2 resources/s: 29,373 ⬆️ 8.52% increase vs. e675cf0
  • TablesWithRateLimitingDFS-2 resources/s: 28.27 ⬇️ 0.32% decrease vs. e675cf0
  • TablesWithRateLimitingRoundRobin-2 resources/s: 797.7 ⬆️ 1.58% increase vs. e675cf0
  • BufferedScanner-2 ns/op: 9.397 ⬆️ 0.03% increase vs. e675cf0
  • LogReader-2 ns/op: 30.7 ⬇️ 0.07% decrease vs. e675cf0

Copy link
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, I have a couple of questions on the impact of the changes

@@ -183,7 +183,7 @@ func GenTestData(table *schema.Table) schema.CQTypes {
data[i] = uuidArrayColumn
case schema.TypeInet:
inetColumn := &schema.Inet{}
if err := inetColumn.Set("192.0.2.1/24"); err != nil {
if err := inetColumn.Set("192.0.2.0/24"); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's because the bitmask filters out the final .1, so it amounts to the same, but if Arrow rejects this type of IP we may need to handle it better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 192.0.2.1./24 is basically the same as 192.0.2.0/24 after applying the mask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yevgenypats Right, but was this change necessary for the tests to pass, or was it just some cleanup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe Im confusing something about inet vs cidr . but I did need it for a test to pass because our own arrow inet converted it to 192.0.2.0/24 but I think this happened just from the original IPNet.String conversion so I suspect it should affect anything but maybe I missed something and didn't get to the root cause of this, not sure.

@@ -438,7 +438,7 @@ var calculateCQIDPrimaryKeyTestCases = []struct {
"MacAddressValue": "aa:bb:cc:dd:ee:ff",
"MacAddressArrayValue": []string{"aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66"},
},
ExpectedValue: &UUID{Bytes: [16]uint8{0xa6, 0x76, 0x76, 0xfb, 0x4c, 0x95, 0x51, 0x2d, 0x9e, 0xcd, 0xf4, 0xc4, 0xae, 0xc1, 0x2a, 0xf5}, Status: 0x2},
ExpectedValue: &UUID{Bytes: [16]uint8{0x82, 0xaa, 0xa2, 0xc7, 0x75, 0x10, 0x5c, 0x6d, 0xb8, 0xef, 0x18, 0x49, 0x33, 0x99, 0x4a, 0x9d}, Status: 0x2},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this PR is backwards compatible I would expect existing tests to pass. Is this related to the change in testdata/testdata.go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is backward compat the the binary can read older encoding but the encoding for the binary types moving forward will be base64. I don't think it should impact anything though.

@@ -41,7 +42,7 @@ func (dst *Bytea) Equal(src CQType) bool {

func (dst *Bytea) String() string {
if dst.Status == Present {
return hex.EncodeToString(dst.Bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we use this String method? Is this not a breaking change as we return a different format of the string?
Trying to understand the impact of this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is a breaking change (I did add a way for us to support both conversion in Set) but yes moving forward we will use base64 encoding for bytea to work like arrow. I can add a new type but I don't think we have anything that depends on that.

@erezrokah
Copy link
Member

Unrelated but I've disabled the codecov status checks in https://app.codecov.io/account/gh/cloudquery/yaml/

Copy link
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with the concern that we needed to update test data to make the tests pass. I would expect it to pass with existing data.
If we don't expect to receive data in the old format I guess this is ok

@kodiakhq kodiakhq bot merged commit c2e84c3 into main Mar 6, 2023
@kodiakhq kodiakhq bot deleted the fix/byte_timestamp_nits branch March 6, 2023 12:43
kodiakhq bot pushed a commit that referenced this pull request Mar 6, 2023
🤖 I have created a release *beep* *boop*
---


## [1.42.0](v1.41.0...v1.42.0) (2023-03-06)


### Features

* Add arrow support for timestamp and bytea ([#724](#724)) ([c2e84c3](c2e84c3))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
kodiakhq bot pushed a commit to cloudquery/filetypes that referenced this pull request Mar 6, 2023
Blocked by cloudquery/plugin-sdk#724

but ready for initial review and discussion. I can do a short walkthrough for anyone up for review.

Apache arrow fork is here (We use it until [this](apache/arrow#34454) is merged): https://github.com/cloudquery/arrow/tree/feat_extension_builder.

Some more notes:
- Right now we are just migrating the json writer/reader to use apache arrow so we can roll format by format and see if there are any real world issues before we roll this to everywhere instead of our own type system.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants