Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary size - compress word lists? #31

Open
Samyak2 opened this issue May 7, 2022 · 3 comments
Open

Binary size - compress word lists? #31

Samyak2 opened this issue May 7, 2022 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Samyak2
Copy link
Owner

Samyak2 commented May 7, 2022

What?

As new word lists get added (#17, 7c049c5, #27), the size of the final release binary increases since these are embedded directly into it.

Binary size does not matter much, but it would be nice to have it as small as reasonably possible.

How?

This issue has been opened to discuss possible ways to reduce the binary size further.

Some of my ideas:

Compression

The word lists could be compressed using a lightweight compression algo (TBD) at compile time i.e., the word lists will stay the same as they are, but will be embedded into the binary after compressing them. At run time, they will need to be decompressed before using them.

The compression library needs to be lightweight because I wouldn't want to add more bloat from the library than what it saves in compression xD.

It would also be nice to keep the syntax of including word lists similar to the existing approach - include_str!("word_lists/top250") i.e., through a macro.

Host on the internet

A way to avoid embedding the word lists into the binary is to host them somewhere (probaby GitHub?) and have the app download them as required. This will reduce the baggage on the binary, but it's only useful when there are an extremely large number of large word lists. I'm also opposed to the idea of making what is a fully offline app into an internet connected app.

A custom format

This is a crazy one. What if toipe had its own custom format to store word lists in that is optimized only for storing a list of words. It will be a binary format, but I'm not sure of anything else.

Additional context

A change related to binary size was suggested before - #23 (fixed in #24)

To measure the binary size, use:

cargo build --release
ls -lah target/release/toipe
@Samyak2 Samyak2 added enhancement New feature or request help wanted Extra attention is needed labels May 7, 2022
@roysti10
Copy link

roysti10 commented Jun 9, 2022

How about storing it as a trie or Radix Tree

@Samyak2
Copy link
Owner Author

Samyak2 commented Jun 11, 2022

How about storing it as a trie or Radix Tree

@lucasace that's a good way to compress the data. Although, I'm not sure if it will help in selecting random words in a uniform manner from the list. Do you have any ideas about implementing that in a trie or a radix tree?

@roysti10
Copy link

roysti10 commented Jun 14, 2022

@Samyak2 I think my suggestion is a little wrong, even though a trie or radix tree might be faster in computation of the random word(not entirely sure of this), If im not wrong, what you want to do is find a way to store the word lists by reducing the size of the file

Taking inspiration from the trie maybe you could store words in the following fashion
where the number represents the number of letters taken from the 0th word and the letters represent the letters to be added
A example taken from below would be <5,t> which amounts to applet by taking the first 5 letters from apple
Note:- // is used a comment

0, apple
5, t  // applet
3, ear //appear 
0, ball
3, d //bald
...
...

Now one drawback in storing it this way would be the extra computation of finding the most recent '0'th word in the series, though it might be that heavy it would be nice to avoid it altogether. A variation of the above in storing it could be

0, 0, apple
5, 1, t
3, 2, ear
0, 0, ball
3, 1, d

Where the second integer represents number of steps to the 0th word

And as for random words in tries this might help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants