# Update the `transformers` integration #806

Closed · rlouf opened this issue Apr 12, 2024 · 0 comments · Fixed by #966 · May be fixed by lapp0/outlines#31
This was referenced Jun 11, 2024. rlouf pushed a commit that referenced this issue on Jun 30, 2024:
….py (#998)

A lot of these fixes were intended for #966, however that's blocked until there's a new `transformers` release. These improvements are general to all models and will enable the PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class which allows a logits processor to be implemented once and used with any `outlines.models` inference library.

To accomplish this we must normalize the input array: it must have a consistent type (`torch.Tensor`) and consistent dimensionality (2). We can normalize both of these simply, and without any copy operations.

`mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html). This standard allows casting between array types without copying. `torch.Tensor` is the only input type which cannot always be cast to any other type, because torch tensors may live in GPU memory. Therefore, we cast all arrays to `torch.Tensor`, implement logits processors using torch methods, and convert back to the original array type in `OutlinesLogitsProcessor`. See the docstring of `OutlinesLogitsProcessor.__call__()` for more details.

# Detailed Changes

- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a 2D batch request with `torch.Tensor` logits and `List` input_ids. Also clean up code to be more readable in `OutlinesLogitsProcessor.__call__()`
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM changes the order of sequences)
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through `benchmark_processors.py`
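The zero-copy cast described above can be sketched with NumPy and PyTorch (a minimal illustration of the DLPack protocol, assuming both libraries are installed; Outlines' actual `OutlinesLogitsProcessor` code is not shown here):

```python
import numpy as np
import torch

# A batch of logits produced by some inference library (NumPy here).
np_logits = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)

# Cast to torch.Tensor via the DLPack protocol -- no copy is made.
torch_logits = torch.from_dlpack(np_logits)

# Process the logits with torch methods, e.g. mask a disallowed token.
torch_logits[0, 1] = float("-inf")

# Because the conversion was zero-copy, the original NumPy array
# observes the in-place modification made through torch.
print(np_logits[0, 1])
```

Converting back to the caller's array type works the same way (e.g. `np.from_dlpack(torch_logits)` for CPU tensors), which is why the round trip adds no copies.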
In the vein of #782 and #772, we should refactor the `transformers` integration to use a logits processor. We will keep the custom sampling loop, but via an Outlines model that wraps `transformers` models. We should be able to remove `torch` and `transformers` as default dependencies as a result.
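For context, a logits processor in the `transformers` calling convention receives the batch of input ids and the current logits and returns modified logits. A minimal sketch of that pattern (the class name and allowed-token set are illustrative, not Outlines' actual implementation):

```python
import torch


class AllowedTokensLogitsProcessor:
    """Illustrative processor: mask every token id not in `allowed_ids`.

    Follows the transformers logits-processor convention:
    __call__(input_ids, scores) -> scores.
    """

    def __init__(self, allowed_ids):
        self.allowed_ids = list(allowed_ids)

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor
    ) -> torch.FloatTensor:
        # Build an additive mask: 0 for allowed tokens, -inf otherwise.
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed_ids] = 0.0
        return scores + mask


# Usage: only token ids 0 and 2 keep their scores.
processor = AllowedTokensLogitsProcessor([0, 2])
scores = torch.tensor([[0.5, 1.5, 2.5, 3.5]])
out = processor(torch.tensor([[1]]), scores)
```

Because the sampling loop only needs to apply such a callable to each step's logits, the same processor can drive constrained generation in any backend that accepts the convention, which is what makes dropping `torch` and `transformers` as hard dependencies feasible.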