
Usage of OpenMP barrier for thread sync to avoid atomic operations #8002

Open · wants to merge 1 commit into base: master
Conversation

abhishek-rn

The current ggml compute threading requires explicit synchronization of all threads after every stage of operation.
With the recent introduction of the OpenMP library into this repo, we can leverage OpenMP's barrier function and effectively remove the atomic operations that introduce delay.

Command line:
./bin/llama-cli -m ../models/7B/ggml-model-q8_0.gguf -n 100 -s 01 -b 7 -t <num_of_threads> -p "AI is going to"
Running on an AWS Graviton 3 machine with 64 cores.

| Threads | Master (tokens/sec) | PR changes (tokens/sec) |
|--------:|--------------------:|------------------------:|
| 4       | 6.2                 | 6.19                    |
| 8       | 11.75               | 11.7                    |
| 16      | 20.94               | 20.95                   |
| 24      | 23.97               | 25.03                   |
| 32      | 25.7                | 27.46                   |
| 48      | 26.26               | 29.71                   |

[Image: PR_Results chart]

We see improvements in tokens/sec of up to ~3.5 tokens/sec over the existing implementation (26.26 → 29.71 at 48 threads); the gains grow with thread count.
This is a small change in line with existing OpenMP library usage.
Requesting review and feedback.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jun 19, 2024
@slaren
Collaborator

slaren commented Jun 19, 2024

It is already being done in #7993

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Jun 19, 2024