Configurations in yaml not working #6169

Open
tsor13 opened this issue Aug 23, 2023 · 4 comments

tsor13 commented Aug 23, 2023

Dataset configurations cannot be created in YAML/README

Hello! I'm trying to follow the docs to define multiple configurations for my dataset, a feature added in #5331. The documented example is:

```yaml
---
configs:
- config_name: main_data
  data_files: "main_data.csv"
- config_name: additional_data
  data_files: "additional_data.csv"
---
```

I have the exact example in my config file for my data repo:

configs:
- config_name: main_data
  data_files: "main_data.csv"
- config_name: additional_data
  data_files: "additional_data.csv"
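
As a quick local sanity check that the front matter lists both configs, a minimal standard-library sketch along these lines may help. It is not how `datasets` parses the README: the helper names `front_matter` and `config_names` are hypothetical, the regular expressions only handle the simple shape shown above, and the `README` string is a stand-in for the real file contents.

```python
import re

# Stand-in for the contents of the dataset repo's README.md (assumption).
README = """---
configs:
- config_name: main_data
  data_files: "main_data.csv"
- config_name: additional_data
  data_files: "additional_data.csv"
---
"""


def front_matter(text):
    """Return the text between the leading '---' fences, or None if absent."""
    match = re.match(r"^---[ \t]*\n(.*?)\n---[ \t]*(?:\n|$)", text, flags=re.DOTALL)
    return match.group(1) if match else None


def config_names(text):
    """List every 'config_name' value found in the front matter (hypothetical helper)."""
    block = front_matter(text)
    if block is None:
        return []
    pattern = r"^[ \t]*-[ \t]*config_name:[ \t]*(\S+)[ \t]*$"
    return re.findall(pattern, block, flags=re.MULTILINE)


if __name__ == "__main__":
    print(config_names(README))
```

If this lists both names for the real README, the file itself is probably fine and the difference is more likely on the loading side.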

Yet, I'm unable to load different configurations:

from datasets import get_dataset_config_names
get_dataset_config_names('tsor13/test', use_auth_token=True)

This returns a single config name, ['tsor13--test'], instead of the two defined above.

Does anyone have any insights?

@polinaeterna thank you for adding this feature, it is super useful. Do you happen to have any ideas?

Steps to reproduce the bug

from datasets import get_dataset_config_names
get_dataset_config_names('tsor13/test')

Expected behavior

I would expect two configs, main_data and additional_data. However, only ['tsor13--test'] is returned.

Environment info

  • datasets version: 2.14.4
  • Platform: macOS-13.4-arm64-arm-64bit
  • Python version: 3.11.4
  • Huggingface_hub version: 0.16.4
  • PyArrow version: 12.0.1
  • Pandas version: 1.5.1
@mariosasko
Collaborator

Unfortunately, I cannot reproduce this behavior on my machine or Colab - the reproducer returns ['main_data', 'additional_data'] as expected.

@tsor13
Author

tsor13 commented Aug 23, 2023

Thank you for looking into this, Mario. Is this on my repository, or on another one that you have reproduced? Would you mind pointing me to it if so?

@tsor13
Author

tsor13 commented Aug 23, 2023

Whoa, in Colab I get the correct behavior with my dataset. It must have something to do with my local copy of datasets (which just failed again).

I've tried uninstalling and reinstalling, to no avail.
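
Since the same dataset behaves differently in Colab and locally, one way to narrow this down might be to print the installed package versions in both environments and compare them. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper name, and the package list simply mirrors the "Environment info" section above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the current interpreter."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Package names taken from the "Environment info" list in this issue.
    report = installed_versions(["datasets", "huggingface_hub", "pyarrow", "pandas"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both places and comparing the output shows whether the two interpreters really see the same versions, for example if the reinstall landed in a different environment than the one running the script.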

@polinaeterna
Copy link
Contributor

Hi @tsor13, I haven't been able to reproduce your issue on the tsor13/test dataset locally either. Reinstalling doesn't help?
