Use more ASCII characters for more efficient UTF-8 encoding #2

Open
KeinNiemand opened this issue Aug 26, 2024 · 1 comment

Comments

@KeinNiemand

If you want your Base256 encoding to be as efficient as possible, you should use every available printable ASCII character. In UTF-8, an ASCII character is encoded in 1 byte, while anything beyond ASCII takes at least 2 bytes. So the alphabet should start with all 94 printable ASCII characters (without space) or all 95 (with space), and use Unicode characters only for the remaining positions, 94-255 or 95-255 counting from 0.
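The size of the saving can be estimated with a short Python sketch. Both alphabets below are hypothetical stand-ins chosen for illustration (the 94 printable ASCII characters plus code points from U+0100 upward), not the project's actual alphabet, and the estimate assumes every input byte value is equally likely.

```python
# Hypothetical alphabets for illustration only; not the project's real alphabet.
ASCII_PRINTABLE = [chr(c) for c in range(0x21, 0x7F)]  # '!'..'~', 94 characters
TWO_BYTE = [chr(c) for c in range(0x100, 0x200)]       # U+0100..U+01FF, 2 bytes each in UTF-8


def utf8_cost(alphabet):
    """Average UTF-8 bytes per symbol, assuming uniformly distributed input bytes."""
    if len(alphabet) != 256 or len(set(alphabet)) != 256:
        raise ValueError("a Base256 alphabet needs 256 distinct characters")
    return sum(len(ch.encode("utf-8")) for ch in alphabet) / 256


# Proposal from this issue: ASCII first, Unicode only for the remaining positions.
mixed = ASCII_PRINTABLE + TWO_BYTE[: 256 - len(ASCII_PRINTABLE)]
# Comparison case: every symbol is a 2-byte character.
unicode_only = TWO_BYTE

mixed_cost = utf8_cost(mixed)
unicode_cost = utf8_cost(unicode_only)

print(f"ASCII + Unicode: {mixed_cost:.3f} UTF-8 bytes per input byte")
print(f"Unicode only:    {unicode_cost:.3f} UTF-8 bytes per input byte")
```

Under these assumptions the mixed alphabet averages (94·1 + 162·2) / 256 ≈ 1.63 bytes per input byte instead of 2, a saving of roughly 18%.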

@fleschutz
Owner

fleschutz commented Aug 26, 2024

Yes, using more ASCII characters would be more efficient (1 byte vs 2 bytes). However, I decided against it for 2 reasons:

  1. Base256 is typically used only for small data, e.g. 8/16/32/64/128 bytes for passwords/hashes/etc. Therefore, efficiency is not the top priority.
  2. Please note that only non-terminal characters have been used for Base256, so that double-clicking selects the whole string for copy & paste. With terminal characters (+-/*.,%&=?...) this would be much more complicated and error-prone. Just think of a period or comma at the end of Base256 data.
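The copy & paste concern in the second point can be illustrated with a rough Python sketch. It assumes that double-click selection behaves like matching a run of word characters (`\w+`), which is only an approximation since real terminals and editors differ. The function `double_click_select` and both sample strings are hypothetical and are not taken from the project.

```python
import re


def double_click_select(text, pos):
    """Approximate double-click selection: the run of word characters around pos."""
    for match in re.finditer(r"\w+", text):
        if match.start() <= pos < match.end():
            return match.group()
    return ""


# Hypothetical encoded data made only of letter characters (U+0100, U+0103, ...).
letters_only = "".join(chr(c) for c in (0x100, 0x103, 0x106, 0x111, 0x112))
# Hypothetical encoded data that uses terminal characters such as '+' and '.'.
with_punct = "Ab+c."

sentence1 = f"The hash is {letters_only}."
sentence2 = f"The hash is {with_punct}."

sel1 = double_click_select(sentence1, sentence1.index(letters_only))
sel2 = double_click_select(sentence2, sentence2.index(with_punct))

print(sel1 == letters_only)  # whole string selected, trailing period excluded
print(sel2 == with_punct)    # selection stops at '+', so the data is truncated
```

In this model the letters-only string is selected in full and the sentence's final period is left out, while the string containing `+` and `.` is cut short, and its own trailing period cannot be told apart from the sentence's.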

4 participants
@fleschutz @KeinNiemand and others