
perplexity : support using multiple sequences to allow larger batch sizes #5946

Merged · 3 commits into master on Mar 9, 2024

Conversation

slaren (Collaborator) commented Mar 8, 2024

This allows increasing the batch size used by the perplexity example. The batch size must be a multiple of n_ctx.

There is a small performance improvement, since the batching API allows extracting only the logits that are actually used, which reduces the amount of data that needs to be copied back from the GPU. Increasing the batch size can also help with quantized models when using a small context. Mainly, however, the goal is to allow using larger batch sizes with pipeline parallelism when using multiple GPUs.
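A minimal sketch of the idea using the llama.cpp batching API (illustrative, not the PR's actual code; `fill_batch` and `first_scored` are made-up names, and the context is assumed to have been created with `cparams.n_parallel` set to the number of sequences, as noted in the commit messages below):

```cpp
#include "llama.h"

#include <vector>

// Illustrative helper: pack n_seq chunks of n_ctx tokens each into one
// batch, one sequence per chunk. Assumes `batch` was created with
// llama_batch_init(n_seq*n_ctx, 0, n_seq).
static void fill_batch(llama_batch & batch, const std::vector<llama_token> & tokens,
                       int n_ctx, int n_seq, int first_scored) {
    batch.n_tokens = 0;
    for (int s = 0; s < n_seq; ++s) {
        for (int i = 0; i < n_ctx; ++i) {
            const int j = batch.n_tokens++;
            batch.token   [j]    = tokens[s*n_ctx + i];
            batch.pos     [j]    = i;  // positions restart for each sequence
            batch.n_seq_id[j]    = 1;
            batch.seq_id  [j][0] = s;  // each chunk is an independent sequence
            // request logits only where they are consumed - this is the part
            // that reduces the amount of data copied back from the GPU
            batch.logits  [j]    = i >= first_scored;
        }
    }
}
```

After `llama_decode(ctx, batch)`, logits are read back only for the positions where `batch.logits[j]` was set, e.g. with `llama_get_logits_ith(ctx, j)`.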

(Two review comments on examples/perplexity/perplexity.cpp were resolved.)
sorasoras commented

In theory this should apply to imatrix as well.

slaren (Collaborator, Author) commented Mar 9, 2024

Probably won't help with imatrix unless using very small context sizes. As it is, imatrix will also not benefit from pipeline parallelism because reading the activations forces a synchronization.
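For context on that last point, here is a rough sketch of where the synchronization comes from (illustrative, not the actual imatrix code): imatrix collects intermediate activations through a scheduler eval callback, and copying tensor data back to the host is a blocking operation, so the pipeline stalls at every collected tensor.

```cpp
#include "ggml.h"
#include "ggml-backend.h"

#include <vector>

// Illustrative callback in the shape of ggml_backend_sched_eval_callback.
// When `ask` is false the tensor has been computed and its data is read back;
// ggml_backend_tensor_get() is a blocking device->host copy, so it forces the
// backend to synchronize, which defeats pipeline parallelism.
static bool collect_activations(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true; // yes, we want this tensor's data after it is computed
    }
    std::vector<float> data(ggml_nelements(t));            // assuming F32 activations
    ggml_backend_tensor_get(t, data.data(), 0, ggml_nbytes(t)); // blocking copy
    // ... accumulate activation statistics from `data` here ...
    return true;
}
```

In llama.cpp, a callback of this shape is installed through `llama_context_params.cb_eval`, so each graph evaluation hits these blocking reads.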

slaren merged commit d894f35 into master on Mar 9, 2024 (56 of 61 checks passed) and deleted the sl/ppl-batching branch at 18:55.
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request on Mar 10, 2024:

perplexity : support using multiple sequences to allow larger batch sizes (ggerganov#5946)

* perplexity : support using multiple sequences to allow larger batch sizes

ggml-ci

* set cparams.n_parallel to the number of sequences

* print tested n_ctx, add assert
The same commit was also pushed to NeoZhangJianyu/llama.cpp (Mar 12, 2024), jordankanter/llama.cpp (Mar 13, 2024), and hodlen/llama.cpp (Apr 1, 2024).
Labels: none yet · Projects: none yet · 4 participants