outlines.models.mlxlm #30

Merged 1 commit into main on Jun 12, 2024

Conversation

@lapp0 (Owner) commented Jun 6, 2024

Introduce new model: outlines.models.mlxlm

Details

Tests:

  • model_mlxlm tests are skipped when not running on Apple Silicon
  • Introduces tests/generate/test_generate.py, which tests mlxlm generation (parametrized alongside transformers and llama-cpp)
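The Apple Silicon skip described above could be expressed with a module-level pytest marker along these lines (a sketch; the helper name `is_apple_silicon` and the exact placement are illustrative, not necessarily what the PR uses):

```python
import platform

import pytest


def is_apple_silicon() -> bool:
    """True only on macOS running natively on an arm64 (M-series) chip."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


# mlx only runs on Apple Silicon, so skip every test in this module elsewhere.
pytestmark = pytest.mark.skipif(
    not is_apple_silicon(),
    reason="mlx-lm requires Apple Silicon",
)
```

A module-level `pytestmark` keeps the condition in one place instead of decorating each test function individually.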

Performance

Using mlx-community/Qwen1.5-1.8B-Chat-4bit on a Mac Mini M2, with greedy sampling throughout:

  • mlx-lm, no outlines: 52.7 tokens/second
  • outlines.generate.text: 44.0 tokens/second
  • outlines.generate.regex(model, "a{200}"): 51.68 tokens/second
  • outlines.generate.regex(model, ".{200}"): 27.5 tokens/second

The core performance issue with outlines.generate.regex(model, ".{200}") is the need to convert a large list (~150,000 integers) into a tensor in the logits processor on every decoding step:

        allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
        allowed_tokens = torch.tensor(allowed_tokens, device=logits.device)

To mitigate this, we can open a separate issue to ensure the FSM index stores tensors of token IDs rather than lists, so that self.fsm.get_next_instruction(self._fsm_state).tokens is already a tensor of token IDs.
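A minimal sketch of the proposed mitigation, using NumPy in place of torch for brevity: the allowed-token IDs are materialized as an integer array once, when the index is built, so the per-step logits processor only indexes into the logits and never pays for a list-to-tensor conversion. The names here (`mask_logits`, the toy vocabulary size) are illustrative, not the PR's actual API.

```python
import numpy as np

VOCAB_SIZE = 8  # toy vocabulary for illustration


def mask_logits(logits: np.ndarray, allowed_tokens: np.ndarray) -> np.ndarray:
    """Set every logit outside `allowed_tokens` to -inf.

    `allowed_tokens` is already an integer array: the expensive
    list -> tensor conversion happened once at index-build time,
    not on every decoding step.
    """
    biased = np.full_like(logits, -np.inf)
    biased[allowed_tokens] = logits[allowed_tokens]
    return biased


# Precomputed once per FSM state (the proposed fix), not per step:
allowed = np.array([1, 3, 5])

logits = np.arange(VOCAB_SIZE, dtype=np.float64)
masked = mask_logits(logits, allowed)
# greedy sampling over the masked logits can only pick an allowed token
assert int(np.argmax(masked)) in allowed.tolist()
```

The same shape applies with torch: holding `allowed_tokens` as a `torch.Tensor` in the index removes the `torch.tensor(allowed_tokens, ...)` call quoted above from the hot loop.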

Misc

Smoke test

>>> import outlines
>>> model = outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 73728.00it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generator = outlines.generate.text(model, outlines.samplers.greedy())
>>> print(generator("hello", max_tokens=100))
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。
>>> from mlx_lm import load, generate
>>> model, tokenizer = load("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 22550.02it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generate(model, tokenizer, prompt="hello", verbose=True)
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。

Testing Without Apple

I don't own any Apple Silicon devices. Here are some instructions in case anyone else wants to test with a cloud Mac Mini:

How to test outlines mlx

install homebrew


/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/m1/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

ensure we're using openssl in python

brew install openssl
brew install python

# BAD
# python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
# LibreSSL 2.8.3

export PATH="/opt/homebrew/opt/openssl/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/openssl/lib"
export CPPFLAGS="-I/opt/homebrew/opt/openssl/include"

python3 -m venv myenv
source myenv/bin/activate

# GOOD
# python -c "import ssl; print(ssl.OPENSSL_VERSION)"
# OpenSSL 3.3.1 4 Jun 2024
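A quick way to check which SSL backend the active interpreter actually links against (a sketch; under the stock LibreSSL build, pip downloads from PyPI can fail, which is the point of the steps above):

```shell
# Print the SSL backend the active python3 was built against.
SSL_VERSION="$(python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)')"

case "$SSL_VERSION" in
  OpenSSL*) echo "ok: $SSL_VERSION" ;;
  *)        echo "warning: $SSL_VERSION (pip may fail against PyPI)" ;;
esac
```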

install outlines and mlx_lm

pip install setuptools
pip install outlines
pip install mlx_lm
pip install torch

@lapp0 force-pushed the fix-918-mlx branch 30 times, most recently from 9e47885 to 1b3f8fe on June 7, 2024
@lapp0 force-pushed the fix-918-mlx branch 13 times, most recently from a569925 to 2a58caf on June 11, 2024
@lapp0 merged commit eadb1c3 into main on Jun 12, 2024
7 checks passed