
llama-cpp-python raises an AttributeError on token_eos for Hermes-Pro-7B when using structured grammar generation #771

Closed
maxtheman opened this issue Mar 27, 2024 · 7 comments

Comments

@maxtheman

Describe the issue as clearly as possible:

When I try to use Hermes-Pro-7B with llama-cpp-python, I cannot use cfg to generate structured output from a grammar.

This is ONLY an issue with structured grammar generation via cfg; generate.json doesn't have this problem.
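
For contrast, here is roughly what the working JSON path looks like for me (a minimal sketch only; the schema below is an illustrative example, not what I actually used):

from outlines import generate
from outlines.models import llamacpp

# Illustrative JSON schema; any valid schema string should do
schema = """{
    "type": "object",
    "properties": {"expression": {"type": "string"}},
    "required": ["expression"]
}"""

model = llamacpp("./Hermes-2-Pro-Mistral-7B.Q8_0.gguf")
generator = generate.json(model, schema)
print(generator("Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:"))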

Steps/code to reproduce the bug:

from outlines.generate import cfg
from outlines.grammars import json as json_lark
from outlines.models import llamacpp

MODEL_PATH = "./Hermes-2-Pro-Mistral-7B.Q8_0.gguf"  # local GGUF file
model = llamacpp(MODEL_PATH)
# fails on the line below
generator = cfg(model, json_lark)
sequence = generator("Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:")
print(sequence)

Expected result:

I'd expect the AttributeError not to occur; instead, the sequence should print to stdout.

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[23], line 22
     21 # import pdb; pdb.set_trace()
---> 22 generator = cfg(model, json_lark)
     23 # generator = generate_json(model, Character)
     24 # generator = extract_json(model)
     25 sequence = generator("Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:")

File .venv/lib/python3.11/site-packages/outlines/generate/cfg.py, line 51
---> 51 logits_processor = CFGLogitsProcessor(cfg_str, model.tokenizer)
     52 generator = LlamaSequenceGenerator(logits_processor, model)
     54 return generator
...
File .venv/lib/python3.11/site-packages/outlines/integrations/llamacpp.py, line 46
---> 46 self.eos_token_id = model.token_eos()
     47 self.pad_token_id = self.eos_token_id
     48 self.special_tokens: Set[int] = set()

AttributeError: 'LlamaCppTokenizer' object has no attribute 'token_eos'
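
Reading the traceback, my best guess (I may be misreading the internals) is that cfg.py hands the already-wrapped LlamaCppTokenizer to CFGLogitsProcessor, whose __init__ then calls token_eos() on it as if it were the raw llama_cpp.Llama. A minimal check of the raw API, assuming the same GGUF file:

from llama_cpp import Llama

raw = Llama(model_path="./Hermes-2-Pro-Mistral-7B.Q8_0.gguf")

# The raw llama-cpp-python model does expose token_eos():
print(raw.token_eos())

# Per the traceback, CFGLogitsProcessor calls model.token_eos() on whatever it is
# given; here it receives outlines' LlamaCppTokenizer wrapper rather than the raw
# Llama, and the wrapper has no such method, hence the AttributeError.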


### Outlines/Python version information:

Version information
outlines version 0.0.37
Python 3.11.3
Managed by rye

### Context for the issue:

Hermes 2 Pro is Nous Research's newest and best model, and I suspect it's quite good at JSON schema creation because of its fine-tuning on tool calling.

I'd like to experiment with that to see if it's the case.
maxtheman added the bug label Mar 27, 2024
@sharanry

sharanry commented Mar 29, 2024

Facing the same issue with Mistral Instruct:

from llama_cpp import Llama
from outlines import models, generate

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""


llm = Llama.from_pretrained(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    filename="mistral-7b-instruct-v0.1.Q4_K_S.gguf",
    verbose=True
)
model = models.LlamaCpp(llm)

generator = generate.cfg(model, arithmetic_grammar)

sequence = generator(
  "Alice had 4 apples and Bob ate 2. "
  + "Write an expression for Alice's apples:"
)

@rlouf
Member

rlouf commented Mar 30, 2024

I am so sorry this is happening! I will investigate early next week.

@maxtheman
Author

@rlouf I was just trying to use outlines again, this time with phi-3, still no luck, but I did make some progress.

It seems like the issue is that LlamaTokenizer in llama-cpp-python is fundamentally a different object from what SequenceGenerator expects.

You can patch on the attributes needed with something like the following:

from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaTokenizer


class HackedLlamaTokenizer(LlamaTokenizer):
    # Bolt the eos_token_id attribute outlines expects onto llama-cpp-python's tokenizer
    def __init__(self, llama: Llama, eos_token_id: int):
        self._model = llama._model
        self.eos_token_id = eos_token_id


if __name__ == "__main__":
    model = Llama(model_path="./phi-3-4k/Phi-3-mini-4k-instruct-q4.gguf")
    # _token_eos is what I found on the Llama object in the version I have installed
    model_tokenizer = HackedLlamaTokenizer(model, model._token_eos)
    test_str = "tresasdfasdf"
    print(model_tokenizer.encode(test_str))
    model.device = 'mps'
    # Patch the tokenizer attribute with the instantiated tokenizer object
    model.tokenizer = model_tokenizer
...

This doesn't work because the tokenizer's encode expects a str, but outlines passes it a list of prompts. So I patched that too:

    def encode(
        self, text: List[str], add_bos: bool = True, special: bool = True
    ) -> List[int]:
        # outlines hands over a list of prompts, so take the first one
        print(text)
        return self.tokenize(
            text[0].encode("utf-8", errors="ignore"), add_bos=add_bos, special=special
        )

But then I run into an error at line 176 of api.py in SequenceGenerator:

prompt_token_ids, attention_masks = self.tokenizer.encode(prompts)

because the LlamaTokenizer doesn't return attention masks.

At this point I'm out of my depth. I don't quite understand what attention masks are, or why you would want to ignore tokens. Why wouldn't this tokenizer return one if you're expecting it?
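
From what I can gather (and I may well be wrong), the mask just marks which positions are real tokens versus padding when prompts of different lengths are batched together, so for a single unpadded prompt nothing actually gets ignored. A toy sketch of what I mean, not outlines' actual code:

prompt_token_ids = [[1, 5, 7], [1, 9]]            # made-up token ids for two prompts
max_len = max(len(ids) for ids in prompt_token_ids)

padded, masks = [], []
for ids in prompt_token_ids:
    pad = max_len - len(ids)
    padded.append(ids + [0] * pad)                # 0 standing in for a pad token id
    masks.append([1] * len(ids) + [0] * pad)      # 1 = real token, 0 = padding

print(padded)  # [[1, 5, 7], [1, 9, 0]]
print(masks)   # [[1, 1, 1], [1, 1, 0]]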

But, I decided to try to just say 'let's not ignore any of them' and patched it in:

    def tokenize(
        self, text: bytes, add_bos: bool = True, special: bool = True
    ) -> Tuple[List[int], np.ndarray]:
        tokens = self._model.tokenize(text, add_bos=add_bos, special=special)
        ones_array = np.ones_like(tokens)  # mark every token as "real" (requires numpy as np)
        return tokens, ones_array

Which resulted in this error:

  File ".venv/lib/python3.12/site-packages/outlines/generate/api.py", line 177, in __call__
    prompt_token_ids = prompt_token_ids.to(self.device)
                       ^^^^^^^^^^^^^^^^^^^
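
If I'm reading api.py right, it then calls .to(self.device) on whatever encode returns, so it seems to expect torch tensors rather than plain lists or NumPy arrays. A hedged sketch of what my patched tokenize would presumably need to return instead (same self._model as above; this is my guess, not a confirmed fix):

import torch

def tokenize(self, text: bytes, add_bos: bool = True, special: bool = True):
    token_ids = self._model.tokenize(text, add_bos=add_bos, special=special)
    token_ids = torch.tensor([token_ids], dtype=torch.long)  # shape (1, seq_len)
    attention_mask = torch.ones_like(token_ids)              # single prompt, no padding
    return token_ids, attention_mask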

I'm pretty sure there's something broken with the llama-cpp integration. Even this, from the examples:

# curl -L -o mistral-7b-instruct-v0.2.Q5_K_M.gguf https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf
model = outlines.models.llamacpp("./mistral-7b-instruct-v0.2.Q5_K_M.gguf")

throws an error, since you need to create the Llama object manually, and fixing that leads again to the error:

AttributeError: 'function' object has no attribute 'eos_token_id'

So something has probably changed in llama-cpp-python since the integration was created, which broke this.

Thanks for taking a look!

lapp0 pushed a commit to lapp0/outlines that referenced this issue May 16, 2024
@lapp0
Collaborator

lapp0 commented May 16, 2024

@maxtheman I couldn't reproduce the error in your script. Is it possible aacc633 fixed it? Please let me know if the issue still occurs on the latest version of outlines.

@rlouf
Member

rlouf commented May 19, 2024

Yes, it would have fixed it. Please update outlines to use the latest version and give it another try.
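
Something like this confirms which version is installed (assuming a standard pip-managed environment; the fix should be in any release newer than the 0.0.37 reported above):

from importlib.metadata import version
print(version("outlines"))  # should be newer than 0.0.37 to pick up the fix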

@maxtheman
Author

maxtheman commented May 19, 2024 via email

@rlouf
Member

rlouf commented May 23, 2024

Appears to be solved. Please reopen if that’s not the case.

@rlouf rlouf closed this as completed May 23, 2024