Update docs
marella committed Aug 5, 2023 · 1 parent ad22ec5 · commit c8c9dcf
Showing 3 changed files with 48 additions and 25 deletions.
README.md — 69 changes: 46 additions & 23 deletions
@@ -10,6 +10,7 @@ Python bindings for the Transformer models implemented in C/C++ using [GGML](htt
- [Hugging Face Hub](#hugging-face-hub)
- [LangChain](#langchain)
- [GPU](#gpu)
+ - [GPTQ](#gptq)
- [Documentation](#documentation)
- [License](#license)

@@ -107,7 +108,7 @@ It is integrated into LangChain. See [LangChain docs](https://python.langchain.c

### GPU

- > **Note:** Currently only LLaMA and Falcon models have GPU support.
+ > **Note:** Currently only LLaMA, MPT and Falcon models have GPU support.

To run some of the model layers on GPU, set the `gpu_layers` parameter:
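A minimal sketch, assuming the `from_pretrained` API used elsewhere in this README (the model repo name is illustrative):

```py
from ctransformers import AutoModelForCausalLM

# Offload 50 of the model's layers to the GPU; the rest stay on the CPU.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    gpu_layers=50,
)
```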

@@ -154,30 +155,52 @@ To enable Metal support, install the `ctransformers` package using:
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```

+ ### GPTQ
+
+ > **Note:** This is an experimental feature and only LLaMA models are supported using [ExLlama](https://github.com/turboderp/exllama).
+
+ Install additional dependencies using:
+
+ ```sh
+ pip install ctransformers[gptq]
+ ```
+
+ Load a GPTQ model using:
+
+ ```py
+ llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
+ ```
+
+ [Run in Google Colab](https://colab.research.google.com/drive/1SzHslJ4CiycMOgrppqecj4VYCWFnyrN0)
+
+ > If the model name or path doesn't contain the word `gptq`, specify `model_type="gptq"`.
+
+ GPTQ models can also be used with LangChain. Low-level APIs are not fully supported for these models.
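A minimal sketch of that fallback; the model path is a hypothetical placeholder:

```py
# Set model_type explicitly when the path doesn't contain "gptq".
llm = AutoModelForCausalLM.from_pretrained("path/to/model-dir", model_type="gptq")
```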

## Documentation

<!-- API_DOCS -->

### Config

- | Parameter | Type | Description | Default |
- | :------------------- | :---------- | :------------------------------------------------------- | :------ |
- | `top_k` | `int` | The top-k value to use for sampling. | `40` |
- | `top_p` | `float` | The top-p value to use for sampling. | `0.95` |
- | `temperature` | `float` | The temperature to use for sampling. | `0.8` |
- | `repetition_penalty` | `float` | The repetition penalty to use for sampling. | `1.1` |
- | `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty. | `64` |
- | `seed` | `int` | The seed value to use for sampling tokens. | `-1` |
- | `max_new_tokens` | `int` | The maximum number of new tokens to generate. | `256` |
- | `stop` | `List[str]` | A list of sequences to stop generation when encountered. | `None` |
- | `stream` | `bool` | Whether to stream the generated text. | `False` |
- | `reset` | `bool` | Whether to reset the model state before generating text. | `True` |
- | `batch_size` | `int` | The batch size to use for evaluating tokens. | `8` |
- | `threads` | `int` | The number of threads to use for evaluating tokens. | `-1` |
- | `context_length` | `int` | The maximum context length to use. | `-1` |
- | `gpu_layers` | `int` | The number of layers to run on GPU. | `0` |
-
- > **Note:** Currently only LLaMA, MPT, Falcon models support the `context_length` parameter and only LLaMA, Falcon models support the `gpu_layers` parameter.
+ | Parameter | Type | Description | Default |
+ | :------------------- | :---------- | :-------------------------------------------------------------- | :------ |
+ | `top_k` | `int` | The top-k value to use for sampling. | `40` |
+ | `top_p` | `float` | The top-p value to use for sampling. | `0.95` |
+ | `temperature` | `float` | The temperature to use for sampling. | `0.8` |
+ | `repetition_penalty` | `float` | The repetition penalty to use for sampling. | `1.1` |
+ | `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty. | `64` |
+ | `seed` | `int` | The seed value to use for sampling tokens. | `-1` |
+ | `max_new_tokens` | `int` | The maximum number of new tokens to generate. | `256` |
+ | `stop` | `List[str]` | A list of sequences to stop generation when encountered. | `None` |
+ | `stream` | `bool` | Whether to stream the generated text. | `False` |
+ | `reset` | `bool` | Whether to reset the model state before generating text. | `True` |
+ | `batch_size` | `int` | The batch size to use for evaluating tokens in a single prompt. | `8` |
+ | `threads` | `int` | The number of threads to use for evaluating tokens. | `-1` |
+ | `context_length` | `int` | The maximum context length to use. | `-1` |
+ | `gpu_layers` | `int` | The number of layers to run on GPU. | `0` |
+
+ > **Note:** Currently only LLaMA, MPT and Falcon models support the `context_length` and `gpu_layers` parameters.
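A usage sketch tying these parameters together, assuming they can be passed as keyword arguments as in the sections above (the model repo name is illustrative):

```py
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    context_length=2048,  # maximum context length to use
    gpu_layers=50,        # number of layers to run on GPU
)

# Sampling and generation parameters can be overridden per call.
print(llm("AI is going to", max_new_tokens=128, temperature=0.8, top_k=40))
```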
### <kbd>class</kbd> `AutoModelForCausalLM`

@@ -318,7 +341,7 @@ Computes embeddings for a text or list of tokens.
**Args:**

- <b>`input`</b>: The input text or list of tokens to get embeddings for.
- - <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
+ - <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`

**Returns:**
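A minimal usage sketch, assuming a model already loaded as `llm`:

```py
# Compute an embedding for a piece of text.
embedding = llm.embed("Hello, world!")
```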
@@ -341,7 +364,7 @@ Evaluates a list of tokens.
**Args:**

- <b>`tokens`</b>: The list of tokens to evaluate.
- - <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
+ - <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
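A minimal sketch pairing it with the `tokenize` method from this reference, assuming a model loaded as `llm`:

```py
# Evaluate a prompt's tokens to update the model state.
tokens = llm.tokenize("Hello, world!")
llm.eval(tokens)
```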

---
@@ -374,7 +397,7 @@ Generates new tokens from a list of tokens.
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- - <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
+ - <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`
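A token-level sketch, assuming a model loaded as `llm` and the `tokenize`/`detokenize` methods from this reference:

```py
# Stream generated tokens and print them as text.
tokens = llm.tokenize("AI is going to")
for token in llm.generate(tokens):
    print(llm.detokenize([token]), end="", flush=True)
```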

@@ -488,7 +511,7 @@ Generates text from a prompt.
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- - <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
+ - <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`stop`</b>: A list of sequences to stop generation when encountered. Default: `None`
- <b>`stream`</b>: Whether to stream the generated text. Default: `False`
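A prompt-level sketch combining the `stop` and `stream` parameters, assuming a model loaded as `llm`:

```py
# Stream text and stop at the first blank line.
for text in llm("AI is going to", stop=["\n\n"], stream=True):
    print(text, end="", flush=True)
```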
ctransformers/llm.py — 2 changes: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ class Config:
stop="A list of sequences to stop generation when encountered.",
stream="Whether to stream the generated text.",
reset="Whether to reset the model state before generating text.",
batch_size="The batch size to use for evaluating tokens.",
batch_size="The batch size to use for evaluating tokens in a single prompt.",
threads="The number of threads to use for evaluating tokens.",
context_length="The maximum context length to use.",
gpu_layers="The number of layers to run on GPU.",
scripts/docs.py — 2 changes: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
default = getattr(Config, param)
docs += f"| `{param}` | `{type_}` | {description} | `{default}` |\n"
docs += """
- > **Note:** Currently only LLaMA, MPT, Falcon models support the `context_length` parameter and only LLaMA, Falcon models support the `gpu_layers` parameter.
+ > **Note:** Currently only LLaMA, MPT and Falcon models support the `context_length` and `gpu_layers` parameters.
"""

# Class Docs
