Stop GSM8k generation at double new line #147

Merged (2 commits) on Apr 27, 2024

Conversation

@OyvindTafjord (Contributor) commented Apr 12, 2024:

This loosens the stopping criterion for GSM8k generation to allow a single newline, which is sometimes generated at the start of an output, while still stopping at a double newline (consistent with the prompt format).

It's a bit hacky, but should work for tokenizers where "\n\n" is either one or two tokens. Often "\n\n" is indeed a single token, so this should actually be more effective than the previous "\n" stopping condition if the model actually does generate a double newline. FYI @dwadden
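
For illustration, here is a minimal sketch (not the actual diff in this PR) of one way to stop at a double newline that works whether the tokenizer encodes "\n\n" as one token or two, by checking the decoded continuation rather than raw token ids; the class and argument names below are made up:

    import torch
    from transformers import StoppingCriteria

    class StopOnDoubleNewline(StoppingCriteria):
        """Stop once the generated continuation contains a double newline."""

        def __init__(self, tokenizer, prompt_length):
            self.tokenizer = tokenizer
            self.prompt_length = prompt_length  # number of prompt tokens to skip

        def __call__(self, input_ids, scores, **kwargs):
            # Only decode the continuation, so a single leading "\n" in the output
            # (or newlines in the prompt itself) does not trigger an early stop.
            # For simplicity this inspects only the first sequence in the batch.
            generated = self.tokenizer.decode(input_ids[0, self.prompt_length:])
            return "\n\n" in generated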

@hamishivi (Collaborator):

I'm a little hesitant to merge this without checking if it changes our old evaluations. I wonder if a good solution would be to make the stop sequence configurable to avoid this issue. @yizhongw thoughts?

@OyvindTafjord (Contributor, Author):

Yeah, I agree the backwards compatibility is potentially awkward. We could put it behind a flag that preserves the earlier behavior. I'm not sure how strict you want to be about not changing earlier evals. E.g., I noticed a typo on line 56 ("Quesion: " + example["question"]); is that something you also wouldn't want to change now?

@hamishivi (Collaborator):

Yeah, I think this is an awkward thing in general. I don't want to make changes while papers are in motion, to avoid having to re-compute old evals, but it's also worth tracking and updating these things, while trying to avoid accumulating a large number of flags. Really, coming up with some sort of 'release process'/versioning setup for this would be nice...

@dwadden (Contributor) commented Apr 26, 2024:

I need this PR for evals I'm running now. Instead of changing the existing gsm task, could we just add another one that allows newlines?

@OyvindTafjord (Contributor, Author):

I added an option so that this doesn't change existing behavior unless --stop_at_double_newline is passed to the run_eval call. E.g., in submit_eval_jobs.py you would do this to get the new behavior:

    elif experiment_group == "gsm_cot":
        task_spec['arguments'][0] = '''
            python -m eval.gsm.run_eval \
            --data_dir /data/gsm/ \
            --max_num_examples 200 \
            --save_dir /output/ \
            --use_vllm \
            --model_name_or_path /model \
            --tokenizer_name_or_path /model \
            --n_shot 8 \
            --use_chat_format \
            --chat_formatting_function eval.templates.create_prompt_with_tulu_chat_format \
            --stop_at_double_newline
        ''' 
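
For reference, a rough sketch of how such a flag might map onto vLLM sampling parameters inside the eval script (the helper function and its defaults below are illustrative, not the actual run_eval.py code):

    from vllm import SamplingParams

    def make_sampling_params(stop_at_double_newline: bool) -> SamplingParams:
        # Stop at a double newline when the flag is set; otherwise keep the
        # original single-newline stopping behavior.
        stop = ["\n\n"] if stop_at_double_newline else ["\n"]
        return SamplingParams(temperature=0.0, max_tokens=512, stop=stop)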

@hamishivi (Collaborator):

Thanks, that's fine! Probably not an ideal solution for all scenarios, but it seems like a decent stopgap for now.

@hamishivi merged commit f342459 into allenai:main on Apr 27, 2024 (1 check passed)
@dwadden (Contributor) commented Apr 27, 2024:

Perfect, thanks! I'll rerun evals with this.
