Length constraint causes infinite looping of generation #690
Comments
This is not the first time this has shown up, but it is the first time we have a report detailed enough that we can work with it. Thank you for putting in the time; I will take a look!
Thanks for your great debugging details. This is an issue that we've run into a lot. It's not that the state machine has an infinite loop; it's that language models suffer from the "Repetition Problem". I ran your reproduction script with no error after changing
to
@remic33 since this keeps coming up, we should consider setting the
Fixes #839 #908 #690 #450

## Problem

A major problem, especially with smaller language models, is the repetition problem. For example, suppose a model is generating JSON and must provide 12 space tokens for indentation in the output. A language model will often assign a high probability to a 13th space token, do the same for a 14th, and then enter an infinite space-generation loop. This problem with NLG has been known for half a decade, but it only has mitigations (mirostat, repetition penalty, using hundreds of billions of weights, etc.), no absolute solutions (except for **structured generation**).

## Solution

For structured JSON generation, we set a sane default whitespace pattern of `r"[ ]?"`. This removes all newlines and indentation: it disallows any syntactic whitespace beyond a single space separator. Users can still set the `whitespace_pattern=` argument if they want different behavior.
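The effect of the restricted default can be illustrated with plain Python regexes. This is a minimal sketch (the `permissive` and `restricted` names are mine, not from the library): an unbounded whitespace class keeps accepting repeated whitespace tokens forever, while the `r"[ ]?"` default rejects anything past a single space.

```python
import re

# A permissive inter-token pattern matches unbounded runs of spaces and
# newlines, so a model stuck emitting whitespace stays "valid" indefinitely.
permissive = re.compile(r"[\n ]*")
assert permissive.fullmatch(" \n" * 500) is not None

# The restricted default r"[ ]?" accepts at most one space, so a run of
# repeated whitespace tokens becomes invalid immediately and generation
# is forced to move on to the next structural token.
restricted = re.compile(r"[ ]?")
assert restricted.fullmatch("") is not None
assert restricted.fullmatch(" ") is not None
assert restricted.fullmatch("  ") is None
```

With the restricted pattern, the state machine simply has no state in which a second consecutive whitespace token is legal, which is what breaks the loop.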
Should be solved by #916. Feel free to re-open if this problem persists.
Describe the issue as clearly as possible:
I saw some generation irregularities similar to #450, so I benchmarked it a bit further. It seems there are certain classes of constraints that cause an infinite loop in the state machine, where the model keeps generating "valid" tokens that don't contribute to the overall string content.
This script uses a pydantic schema that compiles to a valid regex and state machine, with the following length constraints in place:
\{[\n ]*"poem"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.){150,200}"[\n ]*\}
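As a quick sanity check on the constraint itself, the compiled regex can be exercised directly with Python's `re` module (an illustrative sketch; the variable names are mine): a poem body of 150–200 allowed characters matches, and a shorter one is rejected.

```python
import re

# The schema's compiled regex: a JSON object with a "poem" key whose string
# value must contain between 150 and 200 allowed characters.
pattern = re.compile(
    r'\{[\n ]*"poem"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.){150,200}"[\n ]*\}'
)

ok = '{"poem": "' + "a" * 150 + '"}'        # meets the minimum length
too_short = '{"poem": "' + "a" * 10 + '"}'  # below the 150-char minimum

assert pattern.fullmatch(ok) is not None
assert pattern.fullmatch(too_short) is None
```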
Each line of the sequence_generator output is logged separately. Within the loop, runs 0 and 1 complete successfully. On run 2 it stalls with the same maximum-length output (
Once upon a cosmic dance,\nIn the vast expanse of numeric romance,\nWhere digits intertwined in grand design,\nAnd infinity was but a linear line,\n\nNumbers whispered sweet secrets to the stars,\nYielded h
). Even as this generated text stays the same, value.token_ids keeps growing. The tokens evaluate to 28705 (`▁`) and 13 (`<0x0A>`), over and over again. I've included a minimal reproducible case below.
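A minimal sketch of why the state machine never rejects those repeated space/newline tokens (using Python's `re` module on the compiled schema regex quoted earlier): the trailing `[\n ]*` is unbounded, so after the closing quote of the poem, any number of appended whitespace tokens still leaves a prefix that could be completed with `}`.

```python
import re

# Compiled schema regex from the reproduction above.
pattern = re.compile(
    r'\{[\n ]*"poem"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.){150,200}"[\n ]*\}'
)

# Generation has produced the full poem and its closing quote...
body = '{"poem": "' + "a" * 150 + '"'

# ...and now keeps appending space (id 28705) and newline (id 13) tokens.
# At every step the sequence remains completable, so the FSM never
# terminates the loop on its own.
for n in (1, 100, 10_000):
    stalled = body + " \n" * n
    assert pattern.fullmatch(stalled + "}") is not None
```

This is consistent with the observed behavior: token_ids grows without bound while the decoded poem text stays fixed, because all the new tokens are absorbed by the unbounded whitespace class.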
Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
Version information:
Running on an H100 GPU.
Context for the issue:
No response