I encountered the following error when finetuning OpenCLIP on my own data:
File "./src/training/data.py", line 281, in group_by_keys_nothrow
fname, value = filesample["fname"], filesample["data"]
KeyError: 'fname'
This issue has previously been reported in Webdataset issue #384. The problem appears to stem from a conflict in how OpenCLIP loads datasets in WebDataset format:
The function tarfile_to_samples_nothrow is added to the pipeline.
Inside tarfile_to_samples_nothrow, tar_file_expander is called to iterate over the opened tar file.
When the data in that file is exhausted, {} (the default value of eof_value) is returned, as implemented in webdataset.
When this {} is passed to group_by_keys_nothrow, it has no "fname" key, and the error is triggered.
As I am new to working with WebDatasets and not fully familiar with the underlying principles, I would appreciate any guidance or solutions you can provide.
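To make the failure mode concrete, here is a minimal sketch that does not import webdataset at all. It only mimics the shapes involved: a stream of `{"fname": ..., "data": ...}` dicts that ends with an empty `{}` marker. The names `EOF_VALUE`, `skip_eof_markers`, and `read_names` are hypothetical helpers for illustration, not functions from webdataset or OpenCLIP.

```python
# Assumption: the end-of-file marker is an empty dict, as described above.
EOF_VALUE = {}


def skip_eof_markers(stream):
    """Drop end-of-file markers so downstream code only sees real file samples."""
    for filesample in stream:
        if filesample == EOF_VALUE:
            continue
        yield filesample


def read_names(stream):
    # Stand-in for the failing access in group_by_keys_nothrow.
    return [filesample["fname"] for filesample in stream]


# Hypothetical stream: two files from one tar file, followed by the marker.
stream = [
    {"fname": "0001.jpg", "data": b"img"},
    {"fname": "0001.txt", "data": b"caption"},
    {},
]

try:
    read_names(stream)
except KeyError as err:
    print("unguarded:", repr(err))

print("guarded:", read_names(skip_eof_markers(stream)))
```

Skipping the marker avoids the crash, but it also discards the information about where one tar file ends, so it may not be the right fix if that boundary matters for grouping.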
This is a breaking change in webdataset, and I'm not sure whether the right solution is to ask OpenCLIP to fix it or to ask webdataset to change the default value.
Hmm, that's a bit of a pain. It looks like webdataset might have been trying to fix an issue that I ran into, which is why I added this extra bit of code (nothrow) in the first place: some webdatasets had colliding filenames across shards, and that would cause a crash. I think this change might prevent that and other possible issues (completing a sample with bits from two different shards)...
To fix I'd have to force using the latest webdataset and remove these hacks. And the painful part is verifying it all works :/
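As an illustration of the cross-shard problem described above, here is a simplified sketch of a grouping function that treats an empty `{}` as a shard boundary and flushes the current sample when it sees one. This is not the webdataset or OpenCLIP implementation; `group_by_keys_flush_on_eof` and `split_key` are hypothetical names, and the key splitting is deliberately naive (everything before the last extension).

```python
import os


def split_key(fname):
    # "dir/0001.jpg" -> ("dir/0001", "jpg"); simplified for illustration.
    prefix, ext = os.path.splitext(fname)
    return prefix, ext.lstrip(".")


def group_by_keys_flush_on_eof(stream):
    """Group consecutive files sharing a key; never merge across a {} marker."""
    current = None
    for filesample in stream:
        if filesample == {}:
            # Shard boundary: emit whatever has been collected so far.
            if current is not None:
                yield current
                current = None
            continue
        fname, value = filesample["fname"], filesample["data"]
        prefix, suffix = split_key(fname)
        if current is None or prefix != current["__key__"]:
            if current is not None:
                yield current
            current = {"__key__": prefix}
        current[suffix] = value
    if current is not None:
        yield current


# Hypothetical case: shard A ends with 0001.jpg, shard B starts with 0001.txt.
stream = [
    {"fname": "0001.jpg", "data": b"a"},
    {},
    {"fname": "0001.txt", "data": b"b"},
    {},
]
samples = list(group_by_keys_flush_on_eof(stream))
print(len(samples))
```

In this sketch the two files with the colliding key end up in separate samples because the marker sits between them; without the marker they would be merged into one sample built from two different shards.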
@rwightman Thank you for the detailed explanation regarding the cause of the issue. I really appreciate your efforts in maintaining the code.
Given the situation, do you think @fartashf's method could be a viable solution to retain your hacks while ensuring compatibility with the latest version of webdataset?