
MPT : support GQA for replit-code-v1.5 #3627

Merged: 1 commit merged into ggerganov:master on Oct 15, 2023

Conversation

cebtenzzre (Collaborator)

The new replit-code-v1.5 model uses grouped-query attention with the MPT architecture. Tweak the conversion and loading code to support this model.
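For context, grouped-query attention (GQA) lets several query heads share a single key/value head, so the model stores fewer KV heads (and a smaller KV cache) than query heads. A minimal NumPy sketch of the idea follows; the function and parameter names (`gqa_attention`, `n_head`, `n_head_kv`) are illustrative, not the actual llama.cpp code:

```python
import numpy as np

def gqa_attention(q, k, v, n_head, n_head_kv):
    """Grouped-query attention: n_head query heads share n_head_kv
    key/value heads (n_head must be a multiple of n_head_kv).
    Shapes: q is (n_head, seq, d); k and v are (n_head_kv, seq, d)."""
    assert n_head % n_head_kv == 0
    group = n_head // n_head_kv
    out = np.empty_like(q)
    for h in range(n_head):
        kv = h // group  # each group of query heads maps to one KV head
        scores = q[h] @ k[kv].T / np.sqrt(q.shape[-1])
        # numerically stable softmax over the last axis
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out

# Example: 8 query heads sharing 2 KV heads (groups of 4)
q = np.random.randn(8, 4, 16)
k = np.random.randn(2, 4, 16)
v = np.random.randn(2, 4, 16)
out = gqa_attention(q, k, v, n_head=8, n_head_kv=2)
```

With standard multi-head attention the two head counts are equal; supporting MPT-style GQA essentially means carrying the separate KV head count through conversion and using it when loading the attention tensors.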

@ggerganov ggerganov merged commit 11bff29 into ggerganov:master Oct 15, 2023
33 of 38 checks passed
cebtenzzre added a commit to nomic-ai/llama.cpp that referenced this pull request Oct 16, 2023
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 19, 2023
* 'master' of github.com:ggerganov/llama.cpp:
  fix embeddings when using CUDA (ggerganov#3657)
  llama : avoid fprintf in favor of LLAMA_LOG (ggerganov#3538)
  readme : update hot-topics & models, detail windows release in usage (ggerganov#3615)
  CLBlast: Fix temporary buffer size for f16 conversion (wsize)
  train-text-from-scratch : fix assert failure in ggml-alloc (ggerganov#3618)
  editorconfig : remove trailing spaces
  server : documentation of JSON return value of /completion endpoint (ggerganov#3632)
  save-load-state : fix example + add ci test (ggerganov#3655)
  readme : add Aquila2 links (ggerganov#3610)
  tokenizer : special token handling (ggerganov#3538)
  k-quants : fix quantization ranges (ggerganov#3646)
  llava : fix tokenization to not add bos between image embeddings and user prompt (ggerganov#3645)
  MPT : support GQA for replit-code-v1.5 (ggerganov#3627)
  Honor -ngl option for Cuda offloading in llava (ggerganov#3621)