-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop GSM8k generation at double new line #147
Conversation
I'm a little hesitant to merge this without checking if it changes our old evaluations. I wonder if a good solution would be to make the stop sequence configurable to avoid this issue. @yizhongw thoughts? |
Yeah, I agree the backwards compatibility is potentially awkward. We could put it behind a flag which would preserve earlier behavior. I'm not sure how strict you feel about not changing earlier evals. E.g., I noticed a typo in line 56 ( |
Yeah, I think in general this is an awkward thing. I don't want to make changes when papers are in motion to avoid having to re-compute old evals, but also its worth tracking and updating these things, while also trying to avoid making some large number of flags. Really, coming up with some sort of 'release process'/versioning setup for this would be nice... |
I need this PR for evals I'm running now. Instead of changing the existing gsm task, could we just add another one that allows newlines? |
I added an option now so that this shouldn't change existing behavior unless the option
|
Thanks, thats fine! Probably not an ideal solution for all scenarios, but seems like a decent stopgap for now. |
Perfect, thanks! I'll rerun evals with this. |
This loosens the stopping criteria for GSM8k generation, to allow for a single newline which is sometimes generated at the start of an output, but stops at a double newline (as compatible with the prompt).
It's a bit hacky, but should work for tokenizers where "\n\n" is either one or two tokens. Often "\n\n" is indeed a single token, so this should actually be more effective than the previous "\n" stopping condition if the model actual does generate a double new line. FYI @dwadden