Auto generate unicode property tests. #67

Merged
merged 1 commit into from
Dec 31, 2021

Conversation

zherczeg
Collaborator

I have created a new test (test26), which is auto-generated by a script in maint. The test should be regenerated whenever the Unicode data is updated to a new version.

@zherczeg
Collaborator Author

I wanted to generate negative tests, but I got unexpected (I could say disturbing) results.

For example Scripts.txt says:
A8F2..A8F7 ; Devanagari # Lo [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVANAGARI SIGN CANDRABINDU AVAGRAHA

And ScriptExtensions.txt says:
A8F3 ; Deva Taml # Lo DEVANAGARI SIGN CANDRABINDU VIRAMA

My question is: if a character is already defined as Devanagari in Scripts.txt, why is it also listed with Devanagari in its script extensions? This is completely illogical and looks like a bug in the data files. Does anybody know the reason? If we cannot trust the Script Extension data, we need to rewrite the generator completely :(
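
For readers following along, here is a minimal Python sketch of how the overlap described above could be detected from the two quoted lines. It is not the generator from this pull request: `parse_line`, `overlapping_scripts`, and the hand-written `ALIASES` table are hypothetical names introduced for illustration (a real script would take the short-to-long name mapping from PropertyValueAliases.txt and read the full data files).

```python
import re

# Sample lines copied from the comment above; a real run would read the full files.
SCRIPTS_SAMPLE = ("A8F2..A8F7 ; Devanagari # Lo [6] DEVANAGARI SIGN SPACING "
                  "CANDRABINDU..DEVANAGARI SIGN CANDRABINDU AVAGRAHA")
SCRIPT_EXT_SAMPLE = "A8F3 ; Deva Taml # Lo DEVANAGARI SIGN CANDRABINDU VIRAMA"

# Assumption: a tiny hand-written alias table, only for the two codes used here.
ALIASES = {"Deva": "Devanagari", "Taml": "Tamil"}

# Matches "XXXX ; values # comment" and "XXXX..YYYY ; values # comment".
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*([^#]+?)\s*(?:#.*)?$")


def parse_line(line):
    """Return (first, last, [values]) for one data line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3).split()


def overlapping_scripts(scripts_line, ext_line, aliases):
    """List (code point, script) pairs where the extension repeats the Script value."""
    s_first, s_last, s_values = parse_line(scripts_line)
    e_first, e_last, e_values = parse_line(ext_line)
    script = s_values[0]
    overlaps = []
    for cp in range(e_first, e_last + 1):
        if s_first <= cp <= s_last:
            for short in e_values:
                if aliases.get(short) == script:
                    overlaps.append((cp, script))
    return overlaps


found = overlapping_scripts(SCRIPTS_SAMPLE, SCRIPT_EXT_SAMPLE, ALIASES)
print([(f"U+{cp:04X}", script) for cp, script in found])
```

With the two sample lines, the only overlap reported is U+A8F3 with Devanagari, which is exactly the case questioned above.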

@zherczeg zherczeg force-pushed the ucp_test branch 2 times, most recently from 14d338e to 32ff5bb Compare December 30, 2021 08:03
@zherczeg
Collaborator Author

I think the script overlap does not cause any practical problem at the moment, but it is good to keep in mind. The generator now parses Scripts.txt and removes the negative tests that cannot work. It does not try to find a replacement negative test, though, so currently there is no negative test for the "Cyrillic" script.
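
The filtering step described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the actual code in maint: `script_of`, `filter_negative_tests`, and the candidate list are hypothetical, and the only script range used is the one quoted earlier in this thread.

```python
# Script ranges as (first, last, script); only the range quoted in this thread.
SCRIPT_RANGES = [(0xA8F2, 0xA8F7, "Devanagari")]


def script_of(cp, ranges):
    """Return the Script value covering cp, or "Unknown" if no listed range covers it."""
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return "Unknown"


def filter_negative_tests(candidates, ranges):
    """Drop candidates whose code point already belongs to the script under test.

    Such a character matches the script, so it cannot serve as a negative test.
    """
    return [(script, cp) for script, cp in candidates
            if script_of(cp, ranges) != script]


# Hypothetical candidates: (script under test, code point expected NOT to match).
candidates = [
    ("Devanagari", 0xA8F2),
    ("Devanagari", 0xA8F4),
    ("Tamil", 0xA8F2),
]

kept = filter_negative_tests(candidates, SCRIPT_RANGES)
still_covered = {script for script, _ in kept}
without_negative = {script for script, _ in candidates} - still_covered

print(kept)
print(sorted(without_negative))
```

In this toy input both Devanagari candidates are removed and nothing replaces them, so Devanagari ends up with no negative test at all. This mirrors the situation reported for "Cyrillic"; a fuller version would search for another candidate code point instead of giving up.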

@zherczeg
Collaborator Author

Reworked the script to also cover the normal scripts. This way we can check that our hand-maintained datasets are correct.


2 participants