outlines.models.mlxlm #30

Merged 1 commit into main on Jun 12, 2024

Conversation

@lapp0 (Owner) commented Jun 6, 2024

Introduce new model: outlines.models.mlxlm

Details

Tests:

  • model_mlxlm tests are skipped when not running on Apple Silicon
  • Introduces tests/generate/test_generate.py, which tests mlxlm generation (parametrized alongside transformers and llama-cpp)
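The Apple Silicon skip described above could be expressed with a module-level pytest marker along these lines (a sketch; the helper name `is_apple_silicon` and the exact placement are illustrative, not necessarily what the PR uses):

```python
import platform

import pytest


def is_apple_silicon() -> bool:
    """True only on macOS running natively on an arm64 (M-series) chip."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


# mlx only runs on Apple Silicon, so skip every test in this module elsewhere.
pytestmark = pytest.mark.skipif(
    not is_apple_silicon(),
    reason="mlx-lm requires Apple Silicon",
)
```

A module-level `pytestmark` keeps the condition in one place instead of decorating each test function individually.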

Performance

Using mlx-community/Qwen1.5-1.8B-Chat-4bit on a Mac Mini M2, with greedy sampling throughout:

  • mlx-lm, no outlines: 52.7 tokens/second
  • outlines.generate.text: 44.0 tokens/second
  • outlines.generate.regex(model, "a{200}"): 51.68 tokens/second
  • outlines.generate.regex(model, ".{200}"): 27.5 tokens/second

The core performance issue with outlines.generate.regex(model, ".{200}") is the need to convert a large list (~150,000 integers) into a tensor in the logits processor on every decoding step:

        allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
        allowed_tokens = torch.tensor(allowed_tokens, device=logits.device)

To mitigate this, we can open a separate issue to ensure the FSM index stores tensors of token IDs rather than lists, so that self.fsm.get_next_instruction(self._fsm_state).tokens is already a tensor of token IDs.
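A minimal sketch of the proposed mitigation, using NumPy in place of torch for brevity: the allowed-token IDs are materialized as an integer array once, when the index is built, so the per-step logits processor only indexes into the logits and never pays for a list-to-tensor conversion. The names here (`mask_logits`, the toy vocabulary size) are illustrative, not the PR's actual API.

```python
import numpy as np

VOCAB_SIZE = 8  # toy vocabulary for illustration


def mask_logits(logits: np.ndarray, allowed_tokens: np.ndarray) -> np.ndarray:
    """Set every logit outside `allowed_tokens` to -inf.

    `allowed_tokens` is already an integer array: the expensive
    list -> tensor conversion happened once at index-build time,
    not on every decoding step.
    """
    biased = np.full_like(logits, -np.inf)
    biased[allowed_tokens] = logits[allowed_tokens]
    return biased


# Precomputed once per FSM state (the proposed fix), not per step:
allowed = np.array([1, 3, 5])

logits = np.arange(VOCAB_SIZE, dtype=np.float64)
masked = mask_logits(logits, allowed)
# greedy sampling over the masked logits can only pick an allowed token
assert int(np.argmax(masked)) in allowed.tolist()
```

The same shape applies with torch: holding `allowed_tokens` as a `torch.Tensor` in the index removes the `torch.tensor(allowed_tokens, ...)` call quoted above from the hot loop.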

Misc

Smoke test

>>> import outlines
>>> model = outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 73728.00it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generator = outlines.generate.text(model, outlines.samplers.greedy())
>>> print(generator("hello", max_tokens=100))
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。
>>> from mlx_lm import load, generate
>>> model, tokenizer = load("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 22550.02it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generate(model, tokenizer, prompt="hello", verbose=True)
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。

Testing Without Apple

I don't own any Apple Silicon devices. Here are some instructions in case anyone else wants to test with a cloud Mac Mini:

How to test outlines mlx

install homebrew


/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/m1/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

ensure we're using openssl in python

brew install openssl
brew install python

# BAD
# python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
# LibreSSL 2.8.3

export PATH="/opt/homebrew/opt/openssl/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/openssl/lib"
export CPPFLAGS="-I/opt/homebrew/opt/openssl/include"

python3 -m venv myenv
source myenv/bin/activate

# GOOD
# python -c "import ssl; print(ssl.OPENSSL_VERSION)"
# OpenSSL 3.3.1 4 Jun 2024
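A quick way to check which SSL backend the active interpreter actually links against (a sketch; under the stock LibreSSL build, pip downloads from PyPI can fail, which is the point of the steps above):

```shell
# Print the SSL backend the active python3 was built against.
SSL_VERSION="$(python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)')"

case "$SSL_VERSION" in
  OpenSSL*) echo "ok: $SSL_VERSION" ;;
  *)        echo "warning: $SSL_VERSION (pip may fail against PyPI)" ;;
esac
```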

install outlines and mlx_lm

pip install setuptools
pip install outlines
pip install mlx_lm
pip install torch

@lapp0 force-pushed the fix-918-mlx branch 30 times, most recently from 9e47885 to 1b3f8fe on June 7, 2024
@lapp0 force-pushed the fix-918-mlx branch 13 times, most recently from a569925 to 2a58caf on June 11, 2024
@lapp0 merged commit eadb1c3 into main on Jun 12, 2024
7 checks passed