Issue Overview
An AssertionError is raised when attempting to integrate the Transformers tokenizer with Llama-CPP in a custom chat model. The objective is to replace the Llama-CPP tokenizer, which has shown limitations, with the Transformers tokenizer for improved token generation.
Environment
Operating System: macOS (Intel architecture)
Python Version: 3.11.5
Relevant Libraries:
guidance: 0.1.6
numpy: 1.26.2
torch: 2.1.1
transformers: 4.35.2
llama-cpp-python: 0.2.20
Error Description
The error occurs when executing the line lm += gen(name='response', max_tokens=256, stop='<|im_end|>'). The full traceback is as follows:
Traceback (most recent call last):
File "/Users/zero/llama/guide.py", line 119, in <module>
lm += gen(name = 'response', max_tokens = 256, stop = '<|im_end|>')
File "/usr/local/lib/python3.11/site-packages/guidance/models/_model.py", line 262, in __add__
out = lm._run_stateless(value)
File "/usr/local/lib/python3.11/site-packages/guidance/models/_model.py", line 401, in _run_stateless
for new_bytes, is_generated, new_bytes_log_prob, capture_groups, capture_group_log_probs, new_token_count in gen_obj:
File "/usr/local/lib/python3.11/site-packages/guidance/models/_model.py", line 552, in __call__
token_ids,token_byte_positions = self._cleanup_tokens(token_ids,token_byte_positions)
File "/usr/local/lib/python3.11/site-packages/guidance/models/_model.py", line 534, in _cleanup_tokens
assert token_byte_positions[-1] == last_pos
AssertionError
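A likely source of the mismatch (an assumption based on the failing assertion, not on a reading of guidance internals): guidance recomputes each token's byte position from decoded token prefixes, and the Hugging Face tokenizer's BOS insertion and SentencePiece leading-space normalization can make the decoded bytes diverge from the original prompt bytes, so the final position no longer equals last_pos. A minimal sketch of that round-trip drift, using the same tokenizer as the snippet below:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('teknium/OpenHermes-2.5-Mistral-7B', use_fast=True)

prompt = '<|im_start|>user\nHello, who are you?<|im_end|>'
ids = tok.encode(prompt, add_special_tokens=True)  # prepends BOS for this model

# decode() keeps special tokens by default, so the BOS piece (and any
# normalization the tokenizer applies) shows up in the decoded text.
decoded = tok.decode(ids)
print(len(prompt.encode('utf-8')), len(decoded.encode('utf-8')))
# If these lengths differ, byte positions derived from decoding token
# prefixes cannot line up with the prompt bytes, matching the assertion above.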
Expected Behavior
The custom tokenizer should work seamlessly with Llama-CPP, so that token generation completes without the assertion failure.
Steps to Reproduce
Set up the environment with the specified library versions.
Implement a custom tokenizer using Transformers' AutoTokenizer.
Integrate this tokenizer with the Llama-CPP model.
Run the script to generate tokens.
Code Snippet
from llama_cpp import Llama
from transformers import AutoTokenizer
from guidance import models, gen, select
from guidance import system, user, assistant

class OpenHermesTokenizer():
    def __init__(self, *args, **kwargs):
        self.tokenizer = AutoTokenizer.from_pretrained(*args, **kwargs)
        self.tokenizer.pad_token_id = self.tokenizer.unk_token_id
        self.tokenizer.padding_side = 'left'

    # llama.cpp tokenize template
    def encode(self, text: str | bytes, add_bos: bool = True, special: bool = True):
        if isinstance(text, bytes):
            text = text.decode('utf-8', errors='ignore')
        return self.tokenizer.encode(text, add_special_tokens=add_bos, padding=True, return_tensors='pt')[0, :].tolist()

class OpenHermes25Mistral(models.LlamaCppChat):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def get_role_start(self, role_name, **kwargs):
        if self._current_prompt().endswith('<|im_end|>'):
            return f'\n<|im_start|>{role_name}\n'
        else:
            return f'<|im_start|>{role_name}\n'

    def get_role_end(self, role_name=None):
        return '<|im_end|>'

llama = Llama(
    model_path='OpenHermes/openhermes-2.5-mistral-7b.Q4_K_M.gguf',
    n_gpu_layers=0,
    use_mlock=True,
    seed=0,
    n_ctx=2048,
    logits_all=True,
    verbose=False
)

# Substituting the native tokenizer of llama with the Transformers tokenizer,
# tailored to function specifically with llama-cpp-python. Notably, this
# configuration does not present any errors when operated solely within the
# llama-cpp-python environment.
tokenizer = OpenHermesTokenizer('teknium/OpenHermes-2.5-Mistral-7B', use_fast=True)
llama._model.tokenize = tokenizer.encode

chat_lm = OpenHermes25Mistral(
    model=llama,
    temperature=0.0,
    top_p=1.0,
    min_p=0.0,
    typical_p=1.0,
    echo=False,
    repeat_penalty=1.0,
    top_k=0,
    seed=0,
    tfs_z=1.0,
    mirostat_mode=0,
    mirostat_tau=0.0,
    mirostat_eta=0.0,
)

with system():
    lm = chat_lm + 'You are "Hermes 2", a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.'

with user():
    lm += "Hello, who are you?"

with assistant():
    lm += gen(name='response', max_tokens=256, stop='<|im_end|>')

response = lm['response']
print(response)
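One way to sanity-check the substituted tokenizer before handing it to guidance is to confirm that llama.cpp can detokenize its output back to the exact input bytes. Below is a minimal sketch assuming the tokenizer and llama objects defined above; roundtrip_ok is a hypothetical helper, not part of either library:

def roundtrip_ok(custom_tokenizer: OpenHermesTokenizer, llama_model: Llama, text: str) -> bool:
    # Encode with the substituted tokenizer (no BOS, so the comparison is
    # byte-for-byte), then detokenize with llama.cpp and compare the bytes.
    ids = custom_tokenizer.encode(text, add_bos=False)
    return llama_model.detokenize(ids) == text.encode('utf-8')

# If this prints False, the byte drift that trips the assertion is already
# visible at this layer, independent of guidance.
print(roundtrip_ok(tokenizer, llama, '<|im_start|>user\nHello<|im_end|>'))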
Additional Context
The code uses the most recent revision available in this repository, so the guidance version listed above is 0.1.6, which has not yet been released.