Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructured how commands like list, help, quit are parsed to implement save #555

Merged
merged 32 commits into from
Sep 24, 2024

Conversation

rben01
Copy link
Contributor

@rben01 rben01 commented Sep 7, 2024

Closes #187

To implement the save command, I implemented a simple command parser. This handles the following commands:

  • help
  • info
  • clear
  • quit / exit
  • list / ls
  • save (new)

These were formerly implemented as a bespoke command matcher, matching on e.g., "list functions" | "ls functions". Now they are fully parsed, which both enables more flexibility such as additional whitespace between arguments, and error reporting in the event that a command is misused. For instance, before, list foo would result in

>>> list foo
error: while type checking
  ┌─ <input:2>:1:1
  │
1 │ list foo
  │ ^^^^ unknown identifier
  │
  = Did you mean 'pint'?

Now it results in

>>> list foo
error: while parsing
  ┌─ <input:1>:1:6
  │
1 │ list foo
  │      ^^^ Invalid command: if provided, the argument to `list` or `ls` must be one of: functions, dimensions, variables, units

And

>>> help sin
error: while type checking
  ┌─ <input:4>:1:1
  │
1 │ help sin
  │ ^^^^ unknown identifier
  │
  = Did you mean 'hex'?

is now

>>> help sin
error: while parsing
  ┌─ <input:6>:1:6
  │
1 │ help sin
  │      ^^^ Invalid command: `help` takes 0 arguments; use `info <item>` for information about an item

Similar error reporting is implemented for the other commands.

But all of this was merely done to provide an easy way to add the save command — I didn't just want to continue down the path of ad-hoc string parsing. That said, currently the argument to save must have no whitespace characters in it, or else it will parse as multiple arguments. Nor does it process backslash-escape characters. I was debating whether to use Tokenizer to parse the argument the same way a String is parsed, but I figured I'd keep this PR small. That can be added later.

The save command now looks like

save hist.nbt

This will unconditionally overwrite the specified file if it exits. If writing fails, the error looks like

>>> save .
error: runtime error
 = Could not write to file: "."

Some improvements that might be desirable in the future:

  • Filenames with spaces, presumably using ordinary string syntax (which would also handle backslash escape sequences)
    • Interpolation? Not sure how much utility this would have but it would be natural if using strings as an argument
  • Confirmation when overwriting an existing file
  • Omitting lines that errored. IIUC this would require some changes to how history is stored, since it only stores strings and not results
  • Save per-session history, not entire history over all sessions ever

rben01 and others added 8 commits September 7, 2024 13:30
…ible parsing (eg tolerant of arbitrary whitespace)

Commands report `ParseError`s if the syntax is invalid (eg too many or not enough args)
This required making `Resolver::add_code_source` public and adding `Context::resolver_mut` to track source code positions of commands
Added some helper functions for commands
@sharkdp
Copy link
Owner

sharkdp commented Sep 8, 2024

Thank you very much for your contribution — great changeset!

I implemented a simple command parser

Thanks, this is something I had planned but never got around to it 😍. One huge benefit of your approach is that command parsing moves from the application (numbat-cli) to the library (numbat). This opens the possibility to offer command parsing / handling to other "frontends". For example, I think we could simplify this JS code as well:

if (input_trimmed == "clear") {
this.clear();
var output = "";
} else if (input_trimmed == "reset") {
numbat = create_numbat_instance();
numbat.interpret("use units::currencies");
combined_input = "";
updateUrlQuery(null);
this.clear();
} else if (input_trimmed == "list" || input_trimmed == "ls") {
output = numbat.print_environment();
} else if (input_trimmed == "list functions" || input_trimmed == "ls functions") {
output = numbat.print_functions();
} else if (input_trimmed == "list dimensions" || input_trimmed == "ls dimensions") {
output = numbat.print_dimensions();
} else if (input_trimmed == "list variables" || input_trimmed == "ls variables") {
output = numbat.print_variables();
} else if (input_trimmed == "list units" || input_trimmed == "ls units") {
output = numbat.print_units();
} else if (input_trimmed == "help" || input_trimmed == "?") {
output = numbat.help();
} else {
var result = { is_error: false };
if (input_trimmed.startsWith("info ")) {
var keyword = input_trimmed.substring(4).trim();
output = numbat.print_info(keyword);
} else {
result = numbat.interpret(input);
output = result.output;
}
if (!result.is_error) {
combined_input += input.trim() + "⏎";
updateUrlQuery(combined_input);
}
}

In order to reduce code duplication even more (among frontends), we could potentially even move part of the command handling to the backend (library) as well. The whole "list xyz" => call ctx.print_xyz() logic, for example.

One complication is the following: not all frontends support the same set of commands. save would not be supported by the Web frontend, for example. And some commands definitely need frontend-specific code. Like a future copy command (#394).

I'm not suggesting that this should happen in this PR, by the way. Only if you are interested.

Now they are fully parsed, which both enables more flexibility such as additional whitespace between arguments, and error reporting in the event that a command is misused.

Very nice

That said, currently the argument to save must have no whitespace characters in it, or else it will parse as multiple arguments. Nor does it process backslash-escape characters. I was debating whether to use Tokenizer to parse the argument the same way a String is parsed, but I figured I'd keep this PR small. That can be added later.

I agree.

This will unconditionally overwrite the specified file if it exits.

I think that's completely fine. And maybe even a better UX than asking (interactively) whether the file should be overwritten or not.

/// or kind of arguments, e.g. `list foobar`
pub fn parse_command<'a>(
input: &'a str,
resolver: &mut Resolver,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing in a Resolver reference, could we also pass in a code_source_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. That is much cleaner.

I realize however that there is a bug in this PR, which is that for non-command inputs we end up running resolver.add_code_source twice, which results in line numbers being off by a factor of ~2 for erroneous non-command inputs. (The precise factor depends on how many commands we've run, since those only trigger resolve.add_code_source once.) I'm running into a bit of a chicken and egg problem: to report errors in commands requires a code source up front, but we don't know if we're even parsing a command until we parse at all. An obvious solution is to double-parse — check if we have a command at all (probably quite cheaply, using split_whitespace to just grab the first word) and then if we do, fall back to a full parse with a code_source_id. This way we only call resolve.add_code_source at most once per input. Does this sound good?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize however that there is a bug in this PR, which is that for non-command inputs we end up running resolver.add_code_source twice, which results in line numbers being off by a factor of ~2 for erroneous non-command inputs. (The precise factor depends on how many commands we've run, since those only trigger resolve.add_code_source once.)

Minor: it's not really the line number that is wrong, but the "code source ID", right? The number in <input:…> angle brackets.

I'm running into a bit of a chicken and egg problem: to report errors in commands requires a code source up front, but we don't know if we're even parsing a command until we parse at all. An obvious solution is to double-parse — check if we have a command at all (probably quite cheaply, using split_whitespace to just grab the first word) and then if we do, fall back to a full parse with a code_source_id. This way we only call resolve.add_code_source at most once per input. Does this sound good?

Hm. That would be a good reason to keep your old solution maybe? Or could you pass in a get_code_source_id callable that would only be executed in the error case?

Copy link
Contributor Author

@rben01 rben01 Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: it's not really the line number that is wrong, but the "code source ID", right? The number in input:… angle brackets.

Yes, that was the issue.

Hm. That would be a good reason to keep your old solution maybe? Or could you pass in a get_code_source_id callable that would only be executed in the error case?

My old solution never actually worked. What I've done in my latest commit is split the parser into a struct SourcelessCommandParser containing just the input, and then the CommandParser wraps that and a code_source_id. SourcelessCommandParser has a fallible initializer that only succeeds if we are in fact looking at a command line. Only if it succeeds do we go ahead and generate the code_source_id, and then we have everything we need to parse the command. The key is that you can't construct a CommandParser in the first place unless it's been proven that we have a valid command line (meaning it starts with a command word, not that all the arguments are ok), so we never unnecessarily generate a code_source_id.

In addition, now that I've stuck everything in CommandParser (thanks for that advice), initialization of the parser is split from parsing, which makes for an overall cleaner API (returning Result instead of Option<Result>)


/// For tracking spans. Contains `(start, start+len)` for each (whitespace-separated)
/// word in the input
fn get_word_boundaries(input: &str) -> Vec<(u32, u32)> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be implemented using str::split_whitespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think split_whitespace keeps indices, which is what I need. I thought of using char_indices and splitting on a predicate, but that would require two Vecs which seemed like a needless allocation.

numbat/src/command.rs Outdated Show resolved Hide resolved
Note that there is a bug where non-command inputs run `add_course_source` twice, which leads to incorrect line numbers
…and input

How: split `CommandParser` into a second struct, `SourcelessCommandParser`, that contains just the input, no `code_source_id`
`SourcelessCommandParser` has a fallible initializer that only returns `Some(Self)` if the line is indeed a command (ie starts with a command word)
Then we simply check if this initializer succeeded; if so then make a new `code_source_id` and construct the full `CommandParser`, then parse the command
…chable!`)

Made "list" report incorrect number of args before reporting invalid arg, in the event that both errors were present
numbat/src/command.rs Outdated Show resolved Hide resolved
numbat/src/command.rs Outdated Show resolved Hide resolved
numbat/src/command.rs Outdated Show resolved Hide resolved
Replaced `ensure_zero_args!` macro with function, now that this is possible (since we can use `format!` instead of `concat!)
…t them from being removed when removing trailing whitespace on save
@rben01
Copy link
Contributor Author

rben01 commented Sep 8, 2024

One complication is the following: not all frontends support the same set of commands. save would not be supported by the Web frontend, for example. And some commands definitely need frontend-specific code. Like a future copy command (#394).

I will think about how to implement frontend-specific code for a separate PR. Where is the frontend we're using stored at runtime?

@rben01
Copy link
Contributor Author

rben01 commented Sep 11, 2024

Also, regarding this:

This will unconditionally overwrite the specified file if it exits.

I think that's completely fine. And maybe even a better UX than asking (interactively) whether the file should be overwritten or not.

I was also considering having save and save!, the former asking for confirmation if overwriting and the latter overwriting without confirmation. If desirable, then I should probably change save to save! in this PR and the come back and implement the save command in the future. This might be a desirable change anyway just to indicate that we're going to overwrite (exclamation points indicate convey that something scary is happening) — users may exercise a tad more caution if it's made clear that they aren't getting a second chance.

@rben01
Copy link
Contributor Author

rben01 commented Sep 11, 2024

In a future PR I'd also like to implement better discoverability of commands. Either help commands or list commands would be a good addition to actually list them all similar to what you'd get from cmd --help on the command line, and shouldn't be hard to do if we're willing to hard-code them, which seems perfectly reasonable. I would lean toward help commands just because list seems to be for runtime-specific values, whereas commands live in a totally separate namespace. I also think it'd make sense to list commands, or at least how to get a list of them, in help’s output. But again, future PR.

@sharkdp
Copy link
Owner

sharkdp commented Sep 11, 2024

I was also considering having save and save!, the former asking for confirmation if overwriting and the latter overwriting without confirmation. If desirable, then I should probably change save to save! in this PR and the come back and implement the save command in the future. This might be a desirable change anyway just to indicate that we're going to overwrite (exclamation points indicate convey that something scary is happening) — users may exercise a tad more caution if it's made clear that they aren't getting a second chance.

I think I would vote for keeping things easy and just using save with a overwrite-default.

In a future PR I'd also like to implement better discoverability of commands. Either help commands or list commands would be a good addition to actually list them all similar to what you'd get from cmd --help on the command line, and shouldn't be hard to do if we're willing to hard-code them, which seems perfectly reasonable. I would lean toward help commands just because list seems to be for runtime-specific values, whereas commands live in a totally separate namespace. I also think it'd make sense to list commands, or at least how to get a list of them, in help’s output. But again, future PR.

Sounds good, but note that the commands depend on the frontend. Speaking of, we should also update the documentation:

https://numbat.dev/doc/cli-usage.html#commands
https://numbat.dev/doc/web-usage.html#commands

see the corresponding markdown files

@sharkdp
Copy link
Owner

sharkdp commented Sep 13, 2024

Save per-session history, not entire history over all sessions ever

Oh, I just found out about this after trying it. I think this is something we should implement before we merge this. Saving the entire history ever (1000 lines for me) is not very helpful.

Omitting lines that errored. IIUC this would require some changes to how history is stored, since it only stores strings and not results

I don't think we need to do that for the global history. But for the local one (for the save command), that would be useful. We do something similar in the web frontend:

if (!result.is_error) {
combined_input += input.trim() + "⏎";
updateUrlQuery(combined_input);
}

@rben01
Copy link
Contributor Author

rben01 commented Sep 14, 2024

Oh, I just found out about this after trying it. I think this is something we should implement before we merge this. Saving the entire history ever (1000 lines for me) is not very helpful.

Shouldn't be hard to implement a per-session history. Two questions:

  1. Should per-session history be a separate PR (to land before this one) or part of this one? This PR is already quite large, I'm thinking separate.
  2. Should the implementation go in numbat or numbat-cli?

@sharkdp
Copy link
Owner

sharkdp commented Sep 15, 2024

Sorry if that wasn't clear. I think ideally, the history as we have it right now on master should stay. This is helpful for Ctrl-R searching for commands from previous sessions.

But the save command should only store the commands of the current session. In order to turn a interactive session into a Numbat script that can be fine tuned in a text editor.

… `run_repl`

Therefore changed `parse_and_evalute` to return a struct of `control_flow, errored` to let session history know whether the command in question errored
Added tests to `session_history.rs`
Code seems overall not terrible now
numbat/src/command.rs Outdated Show resolved Hide resolved
numbat/src/command.rs Outdated Show resolved Hide resolved
numbat/src/command.rs Outdated Show resolved Hide resolved
numbat/src/command.rs Outdated Show resolved Hide resolved
return Err(self.err_through_end_from(2, "`list` takes at most one argument"));
}

let items = match items {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any value to making this case insensitive?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I'd leave it case sensitive for now.

@sharkdp sharkdp merged commit 21b09da into sharkdp:master Sep 24, 2024
15 checks passed
@sharkdp
Copy link
Owner

sharkdp commented Sep 24, 2024

Thank you very much! Great work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'save <filename>' command for the REPL
2 participants