We can steer the generation process so that we only output floats. I propose implementing a Float subclass of Sequence that uses masking to restrict the generated tokens to floats. The mask is a function of the tokens that have already been generated: if the sequence generated so far contains a period, only integer tokens can follow. We will need to add a create_proposal method to Sequence that applies the mask to the logits generated by the model.
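To make the masking idea concrete, here is a minimal sketch over a toy vocabulary of string tokens. It is an illustration under assumptions, not the library's implementation: the standalone `create_proposal` function, its signature, and the helpers `is_float_token_allowed` and `float_mask` are all hypothetical, and a real version would work on the tokenizer's vocabulary and on tensors of logits.

```python
import math

ALLOWED_CHARS = "0123456789."


def is_float_token_allowed(token: str, generated: str) -> bool:
    """Return True if appending `token` keeps the text a prefix of a float."""
    # Reject empty tokens and tokens with any character other than digits or ".".
    if not token or any(ch not in ALLOWED_CHARS for ch in token):
        return False
    # Allow at most one period across the generated text plus the new token.
    return (generated + token).count(".") <= 1


def float_mask(vocab, generated):
    """Boolean mask over the vocabulary: True means the token may be sampled."""
    return [is_float_token_allowed(token, generated) for token in vocab]


def create_proposal(logits, vocab, generated):
    """Hypothetical stand-in: set the logits of masked tokens to -inf."""
    mask = float_mask(vocab, generated)
    return [
        logit if keep else -math.inf
        for logit, keep in zip(logits, mask)
    ]
```

For example, with the text `"3."` already generated, the tokens `"."` and `"4.2"` are masked while `"42"` stays available. Note that this only checks that the text remains a valid prefix; it does not by itself guarantee that generation stops on a complete float (a lone `"."` is still a valid prefix).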
We can also add constraints on the generated floats:
- Enforce a given precision
- Add an upper or lower bound on the value of the float
We will probably need SMC sampling to implement the constraints.
If we're talking about regex-driven float parsing, we can also use the vocabulary pre-processing approach described in #131 and this Gist. That could possibly turn the process of determining valid next tokens (and/or the respective indices to be masked) into something closer to a simple dict look-up.
Looks like the current float masking occasionally produces strings like ".801.4" in test_hf_transformers.test_type_float: see the CI failure in a run of #172 here.
The FSM-based pre-processing approach mentioned above and utilized in #166 should fix that.
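As a rough sketch of what the pre-processing could look like, the snippet below walks every vocabulary token through a tiny hand-written float FSM once, and stores the result per state so that finding valid next tokens becomes a dict look-up. This is not the code from #131 or #166; the two-state machine, the state names, and the functions `step`, `walk`, and `build_index` are assumptions made for illustration.

```python
INT, FRAC = "int", "frac"  # before / after the period (hypothetical state names)
DIGITS = "0123456789"


def step(state, ch):
    """One character transition of the float FSM; None means the input is rejected."""
    if ch in DIGITS:
        return state
    if ch == "." and state == INT:
        return FRAC
    return None


def walk(state, text):
    """Run the FSM over `text` from `state`; return the end state or None."""
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return state


def build_index(vocab):
    """Map each FSM state to {token_id: next_state} for the tokens it accepts."""
    index = {INT: {}, FRAC: {}}
    for start in index:
        for token_id, token in enumerate(vocab):
            if not token:
                continue  # skip empty tokens, they make no progress
            end = walk(start, token)
            if end is not None:
                index[start][token_id] = end
    return index
```

At generation time the valid token ids are then just `index[state].keys()`, and the next state is `index[state][token_id]`. Because a second period has no transition out of the `frac` state, `walk(INT, ".801.4")` returns `None`, so a string like the one from the CI failure cannot be assembled from accepted tokens.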