Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MIR_scan_string_s #341

Open
wants to merge 3 commits into
base: v0_master
Choose a base branch
from
Open

Add MIR_scan_string_s #341

wants to merge 3 commits into from

Conversation

iacore
Copy link
Contributor

@iacore iacore commented Jun 4, 2023

Pass all test.

I'm not sure what the best API would be. At least MIR_scan_string_s should be safer.

More improvement opportunities: Once it is sure that no \0 is in input string, maybe other code can be simplified too?

@vnmakarov
Copy link
Owner

I am not sure about this PR. I am trying not to change API w/o significant advantages. But I'll think more about this.

@rofl0r
Copy link

rofl0r commented Jun 6, 2023

it would appear this PR makes code slower for no good reason other than to appear as "safe" as Annex K

@iacore
Copy link
Contributor Author

iacore commented Jun 7, 2023

I made this PR because I don't want to append the final \0 for mmaped files.

\0 can appear inside a normal file. The current parser stop parsing at first \0. Maybe it should be an error, or ignore the \0?

@snej
Copy link

snej commented Jul 31, 2023

Not all strings are nul-terminated. Raw text files are a common case, as mentioned above. And this comes up a lot in FFI: I’ve worked with several languages whose strings are passed into C as unterminated (pointer, length) pairs. IIRC both Go and Python do this. And then there’s C++’s std::string_view which can hold arbitrary substrings.

If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.)

On the other hand, I see the string API as mostly for debugging, so is it really important to save the overhead of copying the text into a nul-terminated buffer?

@iacore
Copy link
Contributor Author

iacore commented Nov 30, 2023

If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.)

I have applied the suggestion in the last commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants