Introduce new model: `outlines.models.mlxlm`

Details
- `outlines.models.mlxlm`
- `outlines.processors`: logits processors for `generate.regex` and `generate.text` (only used for `mlxlm` for now, but the same logits processors will be used for `transformers` in "Update the `transformers` integration", dottxt-ai/outlines#806)

Tests:
- `model_mlxlm` tests are skipped if not on Apple Silicon
- `tests/generate/test_generate.py` covers mlxlm generation (parametrized alongside transformers and llama-cpp)

Performance
Using `mlx-community/Qwen1.5-1.8B-Chat-4bit` on a Mac Mini M2, with greedy sampling throughout:
- `outlines.generate.text`: 44.0 tokens/second
- `outlines.generate.regex(model, "a{200}")`: 51.68 tokens/second
- `outlines.generate.regex(model, ".{200}")`: 27.5 tokens/second

The core performance issue with `outlines.generate.regex(model, ".{200}")` is that the logits processor must convert a large list (~150,000 integers) of allowed token IDs into a tensor. Because `.` matches almost any character, nearly every token in the vocabulary is allowed at each step, so this list is close to the full vocabulary size.

To mitigate this, we can open a separate issue to make the FSM index store tensors of token IDs rather than lists. `self.fsm.get_next_instruction(self._fsm_state).tokens` would then already be a tensor of token IDs, and the logits processor would no longer need to convert it.

Misc
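For readers unfamiliar with the logits processors, the core step can be sketched in plain Python: tokens the FSM does not allow in the current state get a logit of `-inf`, and greedy sampling then takes the argmax. This is a minimal sketch, not the `outlines.processors` implementation: plain lists stand in for mlx arrays, and `mask_logits`, `greedy_next_token` and `AllowedTokenCache` are hypothetical names. The cache is a lazy, per-state variant of the mitigation proposed above: each state's allowed-token list is converted once and reused, rather than converted on every step.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def mask_logits(logits: Sequence[float], allowed_ids: Sequence[int]) -> List[float]:
    """Return a copy of `logits` where every token not in `allowed_ids` is -inf.

    In outlines the allowed IDs would come from the FSM's next instruction;
    here they are passed in directly (hypothetical helper).
    """
    masked = [-math.inf] * len(logits)
    for token_id in allowed_ids:
        masked[token_id] = logits[token_id]
    return masked


def greedy_next_token(logits: Sequence[float]) -> int:
    """Greedy sampling: pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


class AllowedTokenCache:
    """Convert each FSM state's allowed-token list once and reuse the result.

    `convert` stands in for the expensive list -> tensor conversion
    (e.g. building an mlx array); it runs at most once per state.
    """

    def __init__(
        self,
        get_allowed: Callable[[int], List[int]],
        convert: Callable[[List[int]], Tuple[int, ...]] = tuple,
    ):
        self._get_allowed = get_allowed
        self._convert = convert
        self._cache: Dict[int, Tuple[int, ...]] = {}
        self.conversions = 0

    def __call__(self, state: int) -> Tuple[int, ...]:
        if state not in self._cache:
            self._cache[state] = self._convert(self._get_allowed(state))
            self.conversions += 1
        return self._cache[state]


if __name__ == "__main__":
    allowed = {0: [1, 3], 1: [2]}                 # toy FSM: state -> allowed token IDs
    next_state = {(0, 1): 1, (0, 3): 1, (1, 2): 1}  # toy transitions
    cache = AllowedTokenCache(allowed.__getitem__)
    logits = [0.5, 0.1, 2.0, 0.3]                 # same logits every step, for simplicity
    state, generated = 0, []
    for _ in range(3):
        token = greedy_next_token(mask_logits(logits, cache(state)))
        generated.append(token)
        state = next_state[(state, token)]
    print(generated, cache.conversions)
```

Running it prints `[3, 2, 2] 2`: three tokens are generated but only two conversions happen, because state 1 is visited twice and its converted token list is reused.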
Smoke test
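A minimal smoke test could load the model used in the benchmarks and check that regex-constrained output fully matches its pattern. This is a hypothetical sketch: it assumes `outlines.models.mlxlm` accepts a Hugging Face repo name as its first argument, and `output_matches` and `smoke_test` are illustrative names. `outlines` is imported inside the function so the file still loads on machines without it.

```python
import re


def output_matches(pattern: str, text: str) -> bool:
    """True if the whole generated string matches the regex, which is what
    regex-constrained generation should guarantee."""
    return re.fullmatch(pattern, text) is not None


def smoke_test(model_name: str = "mlx-community/Qwen1.5-1.8B-Chat-4bit") -> str:
    # Lazy import: only needed when the smoke test actually runs (Apple Silicon).
    import outlines

    # Assumption: the mlxlm constructor takes the model repo name as its first argument.
    model = outlines.models.mlxlm(model_name)

    pattern = "a{200}"
    generator = outlines.generate.regex(model, pattern)
    text = generator("Repeat the letter a:")
    assert output_matches(pattern, text), text
    return text
```

On an Apple Silicon machine with `outlines` and `mlx_lm` installed, `smoke_test()` should return a string of 200 `a` characters if the integration works.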
Testing Without Apple
I don't own any Apple Silicon devices. Here are some instructions in case anyone else wants to test with a cloud Mac Mini:
How to test outlines mlx
- Install Homebrew
- Ensure Python is using OpenSSL
- Install `outlines` and `mlx_lm`
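The checklist above can be verified from Python with a short script. This is a hypothetical helper (`is_apple_silicon`, `uses_openssl` and `missing_packages` are illustrative names). The Apple Silicon check is one plausible condition, not necessarily the one the test suite uses to skip `model_mlxlm` tests, and the OpenSSL check assumes the concern is a Python linked against LibreSSL, as some macOS system Pythons are.

```python
import importlib.util
import platform
import ssl
import sys
from typing import List, Sequence


def is_apple_silicon() -> bool:
    # Native arm64 Python on macOS reports Darwin/arm64; an x86_64 Python
    # running under Rosetta reports x86_64 instead.
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def uses_openssl() -> bool:
    # ssl.OPENSSL_VERSION names the TLS library Python was built against,
    # e.g. "OpenSSL 3.x ..." versus "LibreSSL 2.8.3".
    return ssl.OPENSSL_VERSION.startswith("OpenSSL")


def missing_packages(names: Sequence[str] = ("outlines", "mlx_lm")) -> List[str]:
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Apple Silicon:", is_apple_silicon())
    print("SSL library:", ssl.OPENSSL_VERSION, "(OK)" if uses_openssl() else "(not OpenSSL)")
    print("Missing packages:", missing_packages() or "none")
```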