
Commit

Add completion options for evaluation with multi-choice problem datasets
kooyunmo committed Aug 11, 2023
1 parent 73e5689 commit 0ac46bd
Showing 3 changed files with 12 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/docs/sdk/api/completion.mdx
@@ -75,6 +75,8 @@ following schema.
| `bad_words` | `Optional[List[str]]` | `None` |
| `bad_word_tokens` | `Optional[List[TokenSequence]]` | `None` |
| `include_output_logits` | `Optional[bool]` | `None` |
| `include_output_logprobs` | `Optional[bool]` | `None` |
| `forced_output_tokens` | `Optional[List[int]]` | `None` |
| `eos_token` | `Optional[List[int]]` | `None` |

The following are descriptions of each field.
@@ -105,6 +107,8 @@ The following are descriptions of each field.
- **bad_words**: Text phrases that should not be generated. For a bad word phrase that contains N tokens, if the first N-1 tokens appear at the end of the generated result, the logit for the last token of the phrase is set to -inf. We recommend using `bad_word_tokens` because it is clearer (see the description of the `stop` field for details). Defaults to an empty list.
- **bad_word_tokens**: Same as the `bad_words` field above, but takes token sequences instead of text phrases. This is similar to Hugging Face's <a href="https://huggingface.co/docs/transformers/v4.26.0/en/main_classes/text_generation#transformers.GenerationConfig.bad_words_ids(List[List[int]]," target="_top">`bad_words_ids`</a> argument.
- **include_output_logits**: Whether to include the output logits in the generation output.
- **include_output_logprobs**: Whether to include the output logprobs in the generation output.
- **forced_output_tokens**: A token sequence that is enforced as the generation output. This option can be used when evaluating the model on datasets with multiple-choice problems (e.g., [HellaSwag](https://huggingface.co/datasets/hellaswag), [MMLU](https://huggingface.co/datasets/cais/mmlu)). When used together with the `include_output_logits` or `include_output_logprobs` option, it lets you easily obtain the logits or logprobs of the forced tokens for evaluation.
- **eos_token**: A list of end-of-sentence (EOS) tokens.

:::note
4 changes: 4 additions & 0 deletions periflow/schema/api/v1/completion.py
@@ -50,6 +50,10 @@ class V1CompletionOptions(BaseModel):
    bad_words: Optional[List[str]] = None  # List of bad words.
    bad_word_tokens: Optional[List[TokenSequence]] = None  # List of bad word tokens.
    include_output_logits: Optional[bool] = None  # Include logits in the output.
    include_output_logprobs: Optional[bool] = None  # Include logprobs in the output.
    forced_output_tokens: Optional[
        List[int]
    ] = None  # List of tokens enforced to be generated.
    eos_token: Optional[List[int]] = None  # List of EOS tokens.


4 changes: 4 additions & 0 deletions periflow/sdk/api/completion.py
@@ -98,6 +98,8 @@ def create(
| `bad_words` | `Optional[List[str]]` | `None` |
| `bad_word_tokens` | `Optional[List[TokenSequence]]` | `None` |
| `include_output_logits` | `Optional[bool]` | `None` |
| `include_output_logprobs` | `Optional[bool]` | `None` |
| `forced_output_tokens` | `Optional[List[int]]` | `None` |
| `eos_token` | `Optional[List[int]]` | `None` |
The following are descriptions of each field.
@@ -128,6 +130,8 @@ def create(
- **bad_words**: Text phrases that should not be generated. For a bad word phrase that contains N tokens, if the first N-1 tokens appear at the end of the generated result, the logit for the last token of the phrase is set to -inf. We recommend using `bad_word_tokens` because it is clearer (see the description of the `stop` field for details). Defaults to an empty list.
- **bad_word_tokens**: Same as the `bad_words` field above, but takes token sequences instead of text phrases. This is similar to Hugging Face's <a href="https://huggingface.co/docs/transformers/v4.26.0/en/main_classes/text_generation#transformers.GenerationConfig.bad_words_ids(List[List[int]]," target="_top">`bad_words_ids`</a> argument.
- **include_output_logits**: Whether to include the output logits in the generation output.
- **include_output_logprobs**: Whether to include the output logprobs in the generation output.
- **forced_output_tokens**: A token sequence that is enforced as the generation output. This option can be used when evaluating the model on datasets with multiple-choice problems (e.g., [HellaSwag](https://huggingface.co/datasets/hellaswag), [MMLU](https://huggingface.co/datasets/cais/mmlu)). When used together with the `include_output_logits` or `include_output_logprobs` option, it lets you easily obtain the logits or logprobs of the forced tokens for evaluation.
- **eos_token**: A list of end-of-sentence (EOS) tokens.
:::note

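Taken together, the two new options support multiple-choice evaluation by forcing each answer candidate's token IDs as the output and reading back their logprobs. Below is a minimal, hypothetical sketch of that flow; the client entry point, the `prompt`/`max_tokens` fields, and the response attribute path are assumptions for illustration and may not match the SDK exactly — only `forced_output_tokens` and `include_output_logprobs` come from this commit.

```python
# Hypothetical sketch: score one multiple-choice candidate (e.g., for HellaSwag/MMLU)
# by forcing its token IDs as the output and summing the returned logprobs.
from periflow.schema.api.v1.completion import V1CompletionOptions
from periflow.sdk.api.completion import Completion  # assumed public entry point

# Token IDs of one answer candidate, produced by the deployed model's tokenizer.
candidate_tokens = [464, 3139, 286, 4881, 318, 6342]  # placeholder IDs

options = V1CompletionOptions(
    prompt="Question text plus the answer stem",  # assumed field name
    max_tokens=len(candidate_tokens),             # assumed field name
    forced_output_tokens=candidate_tokens,        # force the candidate as the output
    include_output_logprobs=True,                 # return a logprob per forced token
)

api = Completion()                    # deployment/auth configuration omitted
result = api.create(options=options)  # assumed keyword argument

# Sum (or length-normalize) the per-token logprobs; running this for every candidate
# and picking the highest score gives the model's answer. Response shape is assumed.
score = sum(result.choices[0].logprobs)
print(score)
```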