Add test for deposit of files with unicode names #202

Open
mih opened this issue Mar 10, 2023 · 5 comments

@mih
Member

mih commented Mar 10, 2023

We know that we need directoryLabel mangling. We do not yet know whether file names need it too.

@mih
Member Author

mih commented Mar 13, 2023

Needs a solution to #232 first.

@mih mih added this to the 1.1 release milestone Mar 16, 2023
@christian-monch
Contributor

File names have certain restrictions in dataverse. The restrictions are described in issue #209. PR #211 adds dataverse-compatible encoding of file names.

With PR #240, every file name in a datalad dataset is mapped onto a unique dataverse-compatible name.

A remaining non-technical issue is that file names containing Unicode characters will not be easily readable once encoded. That leads to the idea of using the Unidecode package to create dataverse names that are "kind of readable" but not guaranteed to be unique, i.e. different datalad dataset file names might be mapped onto the same dataverse name.
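
To make the uniqueness concern concrete, here is a minimal sketch. It does not use Unidecode itself (a third-party package whose output differs in detail); instead it uses Python's standard `unicodedata` module as a stand-in for an ASCII transliteration. The function name `asciify` is hypothetical and exists only for this illustration.

```python
import unicodedata


def asciify(name: str) -> str:
    """Hypothetical stand-in for a Unidecode-style transliteration.

    Decomposes accented characters and drops everything non-ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Two distinct dataset file names ...
names = ["café.txt", "cafe.txt"]
# ... end up with the same readable ASCII name.
mapped = [asciify(n) for n in names]
collision = mapped[0] == mapped[1]
```

Both names are readable after the mapping, but they can no longer be told apart, which is the trade-off described above.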

It should also be mentioned that the directory and file name restrictions exclude a large number of ASCII characters. These characters still need to be replaced. That can either be done "properly, but nasty" with PR #211 (which I would opt for), or "visually appealing, but faulty" by replacing all characters that are not allowed in dataverse names with a "placeholder" character, for example with _.
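
The difference between the two options can be sketched as follows. This is not the encoding implemented in PR #211; the `SAFE` alphabet, the `-<hex>-` escape format, and all function names are assumptions made up for this illustration. The point is only that an escape-based mapping can be reversed, while a placeholder mapping cannot.

```python
import re
import string

# Assumption: a deliberately small "allowed" alphabet, for illustration only.
# The escape character "-" is intentionally NOT part of it.
SAFE = set(string.ascii_letters + string.digits + "._")


def encode_name(name: str) -> str:
    """Reversible ("properly, but nasty"): escape each disallowed
    character as -<hex code point>-."""
    return "".join(c if c in SAFE else f"-{ord(c):x}-" for c in name)


def decode_name(encoded: str) -> str:
    """Inverse of encode_name: turn every -<hex>- escape back into its character."""
    return re.sub(r"-([0-9a-f]+)-", lambda m: chr(int(m.group(1), 16)), encoded)


def placeholder_name(name: str) -> str:
    """Lossy ("visually appealing, but faulty"): replace each disallowed
    character with "_"."""
    return "".join(c if c in SAFE else "_" for c in name)
```

With these definitions, `encode_name("a b")` and `encode_name("a?b")` yield different names that decode back to the originals, whereas `placeholder_name` maps both to `a_b`.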

@mih
Member Author

mih commented Mar 17, 2023

All referenced PRs are merged now.

test_dataset.py seems to be the most appropriate layer for such a test.

@mih mih modified the milestones: 1.1 release, 1.0 Release Mar 17, 2023
@mih
Member Author

mih commented Mar 17, 2023

Will not be ready in time. Reassigning milestone.

@pdurbin

pdurbin commented Jun 3, 2024

Hi! We just opened this related issue:

It's not prioritized or anything, but we do want to let you know that we hear you! ❤️
