Subject ID splits can get messed up if subjects are not simple int types. #114

Open
mmcdermott opened this issue Jun 22, 2024 · 4 comments


@mmcdermott
Owner

If your raw subject IDs are, for example, uint64s, downstream processing can run into problems, because subject IDs are implicitly converted to signed ints and back during the subject ID split conversion process.
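
To illustrate the kind of mismatch described above, here is a minimal sketch using only the Python standard library. It is not the library's actual code path (the issue does not say where the conversion happens); the helper names `as_signed_64` and `as_unsigned_64` are hypothetical. It shows that a uint64 value above the int64 maximum looks like a different, negative ID while it is held as a signed integer.

```python
import struct


def as_signed_64(value: int) -> int:
    """Reinterpret the bits of an unsigned 64-bit integer as a signed one."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def as_unsigned_64(value: int) -> int:
    """Reinterpret the bits of a signed 64-bit integer as an unsigned one."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


small_id = 12345       # fits in both uint64 and int64
large_id = 2**63 + 5   # valid uint64, but above the int64 maximum

print(as_signed_64(small_id))  # 12345
print(as_signed_64(large_id))  # -9223372036854775803

# A pure bit-level round trip recovers the original value...
print(as_unsigned_64(as_signed_64(large_id)) == large_id)  # True
# ...but anything that compares, sorts, or joins on the signed form
# in between sees an ID that does not match the raw one.
print(as_signed_64(large_id) == large_id)  # False
```

IDs that fit in the signed range are unaffected, which is consistent with the problem only appearing for less common ID types.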

@juancq
Contributor

juancq commented Aug 5, 2024

Is the mapping between the raw subject IDs and the esgpt subject IDs preserved somewhere? In previous versions it was preserved in subjects_df.parquet, but that no longer seems to be the case in the dev branch.

@mmcdermott
Owner Author

mmcdermott commented Aug 5, 2024 via email

@juancq
Contributor

juancq commented Aug 6, 2024

The ID is an integer type. I thought they were not the same, but after fixing a bug elsewhere, I can now confirm they are the same.

@mmcdermott
Owner Author

Fantastic, glad to hear it. This issue should be pretty rare -- in cases where you have non-standard integral types the subject ID spaces can get misaligned, but for normal ints it should be fine.
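
One way to tell in advance whether a dataset falls into the rare case is to check that every raw subject ID fits in the signed 64-bit range. The sketch below assumes the IDs are available as plain Python integers; `ids_outside_int64` is a hypothetical helper, not part of the library.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def ids_outside_int64(subject_ids):
    """Return the IDs that cannot be represented as a signed 64-bit integer."""
    return [sid for sid in subject_ids if not (INT64_MIN <= sid <= INT64_MAX)]


raw_ids = [1, 42, 2**63, 2**64 - 1]
problem_ids = ids_outside_int64(raw_ids)
if problem_ids:
    print(f"{len(problem_ids)} subject IDs do not fit in int64: {problem_ids}")
```

An empty result means the IDs behave like normal ints with respect to signed conversion.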
