Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tod.concatenate): Convert index map to unicode. #201

Merged
merged 1 commit into from
Jun 30, 2022

Conversation

tristpinsm
Copy link
Contributor

No description provided.

@jrs65
Copy link
Contributor

jrs65 commented Jun 29, 2022

@tristpinsm do we really need to deal with the case where the concatenation axis has a string type? I struggle to see when a time like axis that is fine for concatenation would have a string type.

@jrs65
Copy link
Contributor

jrs65 commented Jun 29, 2022

That would obviously simplify this PR as you could remove two of the chunks.

@tristpinsm
Copy link
Contributor Author

@tristpinsm do we really need to deal with the case where the concatenation axis has a string type? I struggle to see when a time like axis that is fine for concatenation would have a string type.

Yeah, I dunno. If you did pass it a dataset that fits that description with convert_dataset_strings=True wouldn't you expect it to do the conversion? I also can't think of an example where that happens but I don't see any reason why it's impossible.

In any case I'm fine with removing that part, just let me know.

Copy link
Contributor

@jrs65 jrs65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to go for me. I think in the end it makes sense to keep the general behaviour for concatenating along string-types axes, there isn't anything which restricts it in the current implementation and it could be nice in the future.

caput/tod.py Outdated
@@ -350,13 +350,21 @@ def dataset_filter(d):
for axis, index_map in first_data.index_map.items():
if axis in concatenation_axes:
# Initialize the dataset.
dtype = index_map.dtype
if convert_dataset_strings and memh5.has_bytestring(index_map.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory you could remove the memh5.has_bytestring clause as the conversion should be a no-op on non-bytestring types. Not sure if it's really worth it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say that at least leaving the clause in would save it some work on non- string types, but given that's not true as .has_bytestring does a very similar recursive crawl through the dtype structure, so you don't actually save anything. So probably remove it to simplify the implementation.

@tristpinsm tristpinsm merged commit 33316b2 into master Jun 30, 2022
@tristpinsm tristpinsm deleted the tpm/unicode_im branch June 30, 2022 18:12
@jrs65
Copy link
Contributor

jrs65 commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants