add sample_id to distributed file splitting #84

oHunewald · 2020-02-04T08:09:58Z

Add a sample_id column to the split files, or a separate sample_id vector on each worker, so that the relation between the training data and its source samples can be reconstructed after splitting.

laurentheirendt · 2020-03-09T14:03:59Z

@oHunewald, still relevant?

exaexa · 2020-03-24T15:19:47Z

I guess this is still relevant although not that pressing. A standalone sample_id vector can be constructed manually using a tiny modification of the current data loading code. Guess we could just add a helper function?
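
To make the idea of a standalone sample_id vector concrete, here is a minimal sketch in Python. It is not the project's loading code. It assumes the input files are concatenated in order and the rows are then cut into contiguous, near-equal slices, one per worker; the real splitting scheme may differ. The function name `split_sample_ids` and its parameters are hypothetical.

```python
from typing import List, Sequence


def split_sample_ids(rows_per_file: Sequence[int], n_workers: int) -> List[List[int]]:
    """Hypothetical helper: for each worker, list the source-file index of every row it holds.

    Assumes files are concatenated in order and rows are split into
    contiguous, near-equal slices across workers (an assumption, not
    necessarily how the actual loader distributes data).
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # Global vector: one entry per row, holding the 0-based index of its file.
    ids = [file_idx for file_idx, n_rows in enumerate(rows_per_file) for _ in range(n_rows)]
    total = len(ids)

    # Slice boundaries for n_workers contiguous chunks covering all rows.
    bounds = [total * w // n_workers for w in range(n_workers + 1)]
    return [ids[bounds[w]:bounds[w + 1]] for w in range(n_workers)]


# Three files with 3, 2 and 4 rows, split over two workers.
per_worker = split_sample_ids([3, 2, 4], 2)
print(per_worker)
```

Each worker's list lines up row by row with its slice of the data, so the file of origin can be looked up by row position without adding a column to the split files.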

laurentheirendt self-assigned this Feb 4, 2020

exaexa mentioned this issue Apr 30, 2020

Implement easy computation of per-file stats #129

Merged

laurentheirendt closed this as completed Apr 30, 2020
