Subject ID splits can get messed up if subjects are not simple int types. #114

mmcdermott · 2024-06-22T01:11:19Z

If your raw subject IDs are, for example, uint64s, there can be some issues in downstream processing as subject IDs are implicitly converted to signed ints and back in the subject ID split conversion process.
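
To illustrate the kind of drift described above, here is a minimal, standard-library-only sketch. It does not reproduce the library's actual split code path; the helper names `to_signed64` and `to_unsigned64` are hypothetical. It only shows that a uint64 value above `2**63 - 1` looks different once its bits are read as a signed 64-bit integer, and that a detour through a 64-bit float (one possible outcome of mixing signed and unsigned columns) cannot be undone.

```python
# Hypothetical helpers for illustration only; not part of any library API.

def to_signed64(value: int) -> int:
    """Reinterpret the 64 bits of an unsigned integer as a two's-complement signed integer."""
    return int.from_bytes(value.to_bytes(8, "little"), "little", signed=True)


def to_unsigned64(value: int) -> int:
    """Reinterpret the 64 bits of a signed integer as an unsigned integer."""
    return int.from_bytes(value.to_bytes(8, "little", signed=True), "little")


# Example raw IDs: one small value and two that do not fit in a signed 64-bit int.
raw_ids = [1, 2**63 + 5, 2**64 - 1]

# In the signed space, the large IDs become negative, so any comparison or join
# against the raw IDs performed at this stage will not line up.
signed_ids = [to_signed64(i) for i in raw_ids]

# A pure bit-level round trip restores the original values...
round_tripped = [to_unsigned64(i) for i in signed_ids]

# ...but a detour through a 64-bit float silently rounds large IDs.
via_float = [int(float(i)) for i in raw_ids]

print(signed_ids)                # [1, -9223372036854775803, -1]
print(round_tripped == raw_ids)  # True
print(via_float == raw_ids)      # False
```

Small IDs pass through every step unchanged, which is consistent with the problem only showing up for unusual ID types.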

juancq · 2024-08-05T04:41:13Z

Is the mapping preserved somewhere between the raw subject ids and the esgpt subject ids? In previous versions this was preserved in subjects_df.parquet, and that no longer seems to be the case in the dev branch.

mmcdermott · 2024-08-05T04:48:12Z

It isn't, no. They are supposed to be the same, but clearly sometimes they aren't if type conversion causes issues. What data type are your subject IDs? You may be able to store a mapping in a hacky way by adding your subject ID as a static measurement of the single class classification modality, though that may throw errors given the column would be used twice; I'm not sure offhand.

juancq · 2024-08-06T02:42:33Z

The id is integer type. I thought they were not the same, but upon fixing a bug somewhere else, I can confirm now they are the same.

mmcdermott · 2024-08-06T03:31:33Z

Fantastic, glad to hear it. This issue should be pretty rare -- in cases where you have non-standard integral types the subject ID spaces can get misaligned, but for normal ints it should be fine.
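
One way to check up front whether a dataset is exposed to this is to test that every raw ID fits in a signed 64-bit integer before building the dataset. This is a sketch under that assumption; `ids_outside_int64` is a hypothetical helper, not something the library provides.

```python
# Bounds of a signed 64-bit integer.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def ids_outside_int64(subject_ids):
    """Return the IDs that cannot be represented as a signed 64-bit integer."""
    return [i for i in subject_ids if not (INT64_MIN <= i <= INT64_MAX)]


# Ordinary integer IDs are all representable, so nothing is flagged.
print(ids_outside_int64([1, 42, 2**63 - 1]))  # []

# IDs drawn from the upper half of the uint64 range are flagged.
print(ids_outside_int64([7, 2**63, 2**64 - 1]))  # [9223372036854775808, 18446744073709551615]
```

If the returned list is non-empty, remapping those IDs to a smaller range (and keeping the mapping yourself) before ingestion avoids the conversion entirely.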
