fixes infill incorrect tokenization #3508

Merged (15 commits, Oct 10, 2023)
examples/infill/infill.cpp (33 additions, 4 deletions)
@@ -233,10 +233,22 @@ int main(int argc, char ** argv) {
     const bool add_bos = llama_vocab_type(model) == LLAMA_VOCAB_TYPE_SPM;
     LOG("add_bos: %d\n", add_bos);
 
+    bool suff_rm_leading_spc = params.escape;
+    if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) {
+        params.input_suffix.erase(0, 1);
+        suff_rm_leading_spc = false;
+    }
Owner commented on lines +236 to +240:
Hm, I could be missing something, but this whole section should not be needed.

If you keep everything else and just replace this block with:

Suggested change:
-    bool suff_rm_leading_spc = params.escape;
-    if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) {
-        params.input_suffix.erase(0, 1);
-        suff_rm_leading_spc = false;
-    }
+    const bool suff_rm_leading_spc = true;

Can you find a case that does not match the Python version?

@vvhg1 (Contributor, Author) replied on Oct 7, 2023:
Yes, I can find cases where it misbehaves:
With params.escape, if the suffix starts with a space, the tokenizer adds a space and THEN tokenizes the double space to 259, so we can't remove the surplus space anymore; we would have to "translate" the 259 to 29871 to remove the space. The same happens with more than one leading space.
Without params.escape, if the suffix starts with a space, we remove it, so it no longer reflects the original suffix.

Not sure what the Python version does, as I can't test it, but the behaviour above seems wrong.
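To make that failure mode concrete, here is a minimal sketch (illustration only, not part of the PR; the input string and function name are made up, and the token IDs are the SPM pieces discussed in this thread):

#include "common.h"   // for ::llama_tokenize (llama.cpp examples helper)
#include "llama.h"

#include <string>
#include <vector>

// Illustrates the case described above, assuming an SPM vocab where
// 29871 = "▁" (single space) and 259 = "▁▁" (double space).
static void demo_fused_leading_space(llama_context * ctx) {
    std::string suffix = " int y;";  // hypothetical user suffix with one real leading space

    // SPM tokenization prepends a dummy space, so the tokenizer effectively
    // sees "  int y;" and emits the fused double-space piece 259 first.
    std::vector<llama_token> toks = ::llama_tokenize(ctx, suffix, false);

    // Stripping the dummy space after the fact no longer works: toks[0] is
    // 259, not 29871, so this erase never fires and the surplus space stays
    // baked into the suffix tokens.
    if (!toks.empty() && toks[0] == 29871) {
        toks.erase(toks.begin());
    }
}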

Owner replied:
Ok, that makes sense. Let's double-check, just in case:
@kherud, could you please run the Python codellama with a suffix that has two leading spaces and see what tokens are produced for infill?

@kherud (Contributor) replied on Oct 8, 2023:
'<PRE> <SUF>': 1, 32007, 259, 32008, 32009
'<PRE> <SUF> ': 1, 32007, 259, 32008, 29871, 32009
'<PRE> <SUF>  ': 1, 32007, 259, 32008, 259, 32009
'<PRE> <SUF>   ': 1, 32007, 259, 32008, 1678, 32009
'<PRE> <SUF>    ': 1, 32007, 259, 32008, 268, 32009
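(Reading these, with 29871 = "▁" and 259 = "▁▁": a suffix with n leading spaces is encoded as the n-space piece, without the extra leading space that SPM tokenization would normally insert, which appears to confirm that one surplus space has to be dropped, as this PR does.)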

     std::vector<llama_token> embd_inp;
-    std::vector<llama_token> inp_pfx = ::llama_tokenize(ctx, params.input_prefix, add_bos);
-    std::vector<llama_token> inp_sfx = ::llama_tokenize(ctx, params.input_suffix, add_bos);
+    std::vector<llama_token> inp_pfx = ::llama_tokenize(ctx, params.input_prefix, false);
+    std::vector<llama_token> inp_sfx = ::llama_tokenize(ctx, params.input_suffix, false);
+    const int space_token = 29871;
+    if (suff_rm_leading_spc && inp_sfx[0] == space_token) {
+        inp_sfx.erase(inp_sfx.begin());
+    }
     inp_pfx.insert(inp_pfx.begin(), llama_token_prefix(ctx));
+    if (add_bos) {
+        inp_pfx.insert(inp_pfx.begin(), llama_token_bos(ctx));
+    }
     inp_sfx.insert(inp_sfx.begin(), llama_token_suffix(ctx));
     embd_inp = inp_pfx;
     embd_inp.insert(embd_inp.end(), inp_sfx.begin(), inp_sfx.end());
@@ -627,10 +639,27 @@ int main(int argc, char ** argv) {
             buffer.clear();
             // done taking input, reset color
             console::set_display(console::reset);
+
+            if (params.escape) {
+                // process escape sequences; for the initial prompt this is done in common.cpp when we load the params, but for interactive mode we need to do it here
+                process_escapes(params.input_prefix);
+                process_escapes(params.input_suffix);
+            }
+            suff_rm_leading_spc = params.escape;
+            if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) {
+                params.input_suffix.erase(0, 1);
+                suff_rm_leading_spc = false;
+            }
             // tokenize new prefix and suffix
-            std::vector<llama_token> inp_pfx = ::llama_tokenize(ctx, params.input_prefix, add_bos);
-            std::vector<llama_token> inp_sfx = ::llama_tokenize(ctx, params.input_suffix, add_bos);
+            std::vector<llama_token> inp_pfx = ::llama_tokenize(ctx, params.input_prefix, false);
+            std::vector<llama_token> inp_sfx = ::llama_tokenize(ctx, params.input_suffix, false);
+            if (suff_rm_leading_spc && inp_sfx[0] == space_token) {
+                inp_sfx.erase(inp_sfx.begin());
+            }
             inp_pfx.insert(inp_pfx.begin(), llama_token_prefix(ctx));
+            if (add_bos) {
+                inp_pfx.insert(inp_pfx.begin(), llama_token_bos(ctx));
+            }
             inp_sfx.insert(inp_sfx.begin(), llama_token_suffix(ctx));
             embd_inp = inp_pfx;
             embd_inp.insert(embd_inp.end(), inp_sfx.begin(), inp_sfx.end());
examples/server/server.cpp (13 additions, 2 deletions)
@@ -344,9 +344,20 @@ struct llama_server_context
 
     void loadInfill()
     {
-        auto prefix_tokens = tokenize(params.input_prefix, true); // always add BOS
-        auto suffix_tokens = tokenize(params.input_suffix, true); // always add BOS
+        bool suff_rm_leading_spc = true;
+        if (params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) {
+            params.input_suffix.erase(0, 1);
+            suff_rm_leading_spc = false;
+        }
+
+        auto prefix_tokens = tokenize(params.input_prefix, false);
+        auto suffix_tokens = tokenize(params.input_suffix, false);
+        const int space_token = 29871;
+        if (suff_rm_leading_spc && suffix_tokens[0] == space_token) {
+            suffix_tokens.erase(suffix_tokens.begin());
+        }
         prefix_tokens.insert(prefix_tokens.begin(), llama_token_prefix(ctx));
+        prefix_tokens.insert(prefix_tokens.begin(), llama_token_bos(ctx)); // always add BOS
         prefix_tokens.insert(prefix_tokens.end(), llama_token_suffix(ctx));
         prefix_tokens.insert(prefix_tokens.end(), suffix_tokens.begin(), suffix_tokens.end());
         prefix_tokens.push_back(llama_token_middle(ctx));
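For orientation (my reading of the assembled sequence above, not stated explicitly in the thread): loadInfill() now builds the prompt as BOS, <PRE>, prefix tokens, <SUF>, suffix tokens, <MID>, the prefix-suffix-middle infill layout; in kherud's dump above these special tokens appear as 1, 32007, 32008 and 32009.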