gguf.md: Add GGUF Naming Convention Section #822

Merged · 9 commits · May 17, 2024

47 changes: 47 additions & 0 deletions docs/gguf.md

GGUF is a format based on the existing GGJT, but makes a few changes to the format.

The key difference between GGJT and GGUF is the use of a key-value structure for the hyperparameters (now referred to as metadata), rather than a list of untyped values. This allows for new metadata to be added without breaking compatibility with existing models, and to annotate the model with additional information that may be useful for inference or for identifying the model.
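
For illustration, the metadata can be thought of as a typed key-value map. A minimal Python sketch of the idea (not the on-disk encoding) follows; the keys shown are examples of standardized keys, not a complete set:

```python
from typing import Any

# A sketch of the concept only: metadata is a set of typed key-value pairs
# rather than a fixed, ordered list of untyped values.
metadata: dict[str, Any] = {
    "general.architecture": "llama",  # string
    "general.name": "LLaMA v2",       # string
    "llama.context_length": 4096,     # stored as an unsigned integer in the file
}

# A reader that understands only some keys can safely ignore the rest,
# which is what makes the format extensible without breaking older readers.
architecture = metadata.get("general.architecture", "unknown")
```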

### GGUF Naming Convention

GGUF follows a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf`.

The components are:
1. **Model**: A descriptive name for the model type or architecture.
2. **Version**: (Optional) Denotes the model version number, formatted as `v<Major>.<Minor>`.
   - If the model has no version number, assume `v0.0` (prerelease).
3. **ExpertsCount**: (Optional) Indicates the number of experts in a Mixture of Experts model.
4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
   - `T`: Trillion parameters.
   - `B`: Billion parameters.
   - `M`: Million parameters.
   - `K`: Thousand parameters.
5. **Quantization**: This part specifies how the model parameters are quantized or compressed.
   - Floating-point representations:
     - `BF16`: [16-bit bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format), the [Google Brain](https://en.wikipedia.org/wiki/Google_Brain) truncated form of 32-bit IEEE 754 (1 sign bit, 8 exponent bits, 7 fractional bits)
     - `F32`: [32-bit IEEE 754](https://en.wikipedia.org/wiki/Single-precision_floating-point_format#IEEE_754_standard:_binary32) floats per weight (1 sign bit, 8 exponent bits, 23 fractional bits)
     - `F16`: [16-bit IEEE 754](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) floats per weight (1 sign bit, 5 exponent bits, 10 fractional bits)
   - Quantization (compression) formats:
     - `Q<X>`: X bits per weight, where `X` could be `4` (for 4 bits), `8` (for 8 bits), etc.
     - Variants provide further detail on how the quantized weights are interpreted:
       - `_K`: k-quant models, with size specifiers `_S`, `_M`, and `_L` for small, medium, and large respectively. If no size specifier is given, medium is assumed.
       - `_<num>`: Distinguishes quantization approaches. Even numbers reconstruct each model weight as a scaling factor multiplied by the quantized weight; odd numbers additionally apply an offset factor (see the sketch after this list):
         - Even number (0 or 2): `<model weight> = <scaling factor> * <quantised weight>`
         - Odd number (1 or 3): `<model weight> = <offset factor> + <scaling factor> * <quantised weight>`
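
As a rough illustration of those two formulas, here is a minimal Python sketch of dequantizing a group of weights. The group size, scale, and offset values are made up for illustration; the actual GGML block layouts differ per quantization type:

```python
import numpy as np

def dequantize_even(scale: float, quants: np.ndarray) -> np.ndarray:
    """Even-numbered variants (e.g. Q4_0, Q8_0): weight = scale * quant."""
    return scale * quants

def dequantize_odd(offset: float, scale: float, quants: np.ndarray) -> np.ndarray:
    """Odd-numbered variants (e.g. Q4_1, Q5_1): weight = offset + scale * quant."""
    return offset + scale * quants

# Illustrative signed 4-bit quants in [-8, 7] with made-up factors:
q = np.array([-8, -1, 0, 3, 7])
print(dequantize_even(0.05, q))      # approximately [-0.4, -0.05, 0.0, 0.15, 0.35]
print(dequantize_odd(0.1, 0.05, q))  # approximately [-0.3,  0.05, 0.1, 0.25, 0.45]
```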

#### Parsing the Naming Convention

To correctly parse a well-formed filename that follows this convention, read from right to left using `-` as the delimiter. This gives the model name the most flexibility (it may itself contain dashes) while still allowing the version string to be optional, and it leaves room to extend the format in the future.
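
A minimal Python sketch of this right-to-left parse (a hypothetical helper for illustration, not part of any published tooling):

```python
import re

def parse_gguf_filename(filename: str) -> dict:
    """Hypothetical right-to-left parser for the naming convention above."""
    stem = filename.removesuffix(".gguf")
    parts = stem.split("-")

    quantization = parts.pop()                  # e.g. "Q2_K" or "F16"

    # <ExpertsCount>x<Parameters> or just <Parameters>, e.g. "8x7B" or "8B"
    m = re.fullmatch(r"(?:(\d+)x)?(\d+(?:\.\d+)?[TBMK])", parts.pop())
    if m is None:
        raise ValueError("parameter field not recognised")
    experts = int(m.group(1)) if m.group(1) else 0
    parameters = m.group(2)

    # Optional version, e.g. "v0.1"; default to v0.0 (prerelease) if absent
    version = "v0.0"
    if parts and re.fullmatch(r"v\d+\.\d+", parts[-1]):
        version = parts.pop()

    model = " ".join(parts)                     # the name may contain dashes
    return {"model": model, "version": version, "experts": experts,
            "parameters": parameters, "quantization": quantization}

print(parse_gguf_filename("mixtral-v0.1-8x7B-Q2_K.gguf"))
print(parse_gguf_filename("Hermes-2-Pro-Llama-3-8B-F16.gguf"))
```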

For example:

* `mixtral-v0.1-8x7B-Q2_K.gguf`:
  - Model Name: Mixtral
  - Version Number: v0.1
  - Expert Count: 8
  - Parameter Count: 7B
  - Quantization: Q2_K

* `Hermes-2-Pro-Llama-3-8B-F16.gguf`:
  - Model Name: Hermes 2 Pro Llama 3
  - Version Number: v0.0 (`<Version>-` missing)
  - Expert Count: 0 (`<ExpertsCount>x` missing)
  - Parameter Count: 8B
  - Quantization: F16

### File Structure

![image](https://github.com/ggerganov/ggml/assets/1991296/c3623641-3a1d-408e-bfaf-1b7c4e16aa63)