Documentation for Binary Quantization Support with KNN Vector Search
Signed-off-by: VIKASH TIWARI <viktari@amazon.com>
Vikasht34 committed Sep 16, 2024
1 parent 76486a4 commit 4ba0fd1
Showing 1 changed file with 146 additions and 1 deletion.
147 changes: 146 additions & 1 deletion _search-plugins/knn/knn-vector-quantization.md
@@ -11,7 +11,7 @@ has_math: true

By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.

OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).
OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, product quantization (PQ), and binary quantization (BQ).

## Lucene byte vector

@@ -310,3 +310,148 @@ For example, assume that you have 1 million vectors with a dimension of 256, `iv
```r
1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB
```
## Binary quantization

Starting with version 2.17, OpenSearch supports binary quantization (BQ) using the binary vector support in the Faiss engine. Binary quantization compresses vectors into a binary format (0s and 1s), making it highly efficient in terms of memory usage. You can choose to represent each vector dimension using 1, 2, or 4 bits, depending on the precision you need. One advantage of BQ is that the training process is handled automatically during indexing, so, unlike with other quantization techniques such as product quantization (PQ), no separate training step is required.

### Using binary quantization

To use binary quantization in your k-NN vector index, you can configure it with minimal effort. The following example defines a k-NN vector field that uses binary quantization with the Faiss engine. This configuration provides an out-of-the-box setup with 1-bit binary quantization and a default value of 100 for both `ef_search` and `ef_construction`:

```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on-disk"
      }
    }
  }
}
```
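After the index is created, you can ingest documents as usual; quantization is applied automatically at indexing time, with no separate training step. The following request is a minimal illustration (the document ID and vector values are arbitrary):

```json
PUT my-vector-index/_doc/1
{
  "my_vector_field": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5]
}
```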
To further optimize the configuration, you can specify additional parameters, such as the compression level, and fine-tune the search parameters. For example, you can override the `ef_construction` value and define the compression level, which corresponds to the number of bits used for quantization:

- **32x compression** for 1-bit quantization
- **16x compression** for 2-bit quantization
- **8x compression** for 4-bit quantization

This allows for greater control over memory usage and recall performance, providing the flexibility to balance precision and storage efficiency, as shown in the following example.
```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on-disk",
        "compression_level": "16x", // Can also be 8x or 32x
        "method": {
          "params": {
            "ef_construction": 16
          }
        }
      }
    }
  }
}
```
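The compression levels follow directly from the arithmetic: each `float` dimension normally occupies 32 bits, so dividing by the number of bits kept per dimension gives the ratios listed previously:

```r
32 bits / 1 bit  = 32x compression
32 bits / 2 bits = 16x compression
32 bits / 4 bits =  8x compression
```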
To fine-tune the configuration further, the following example specifies the `ef_construction` value, the encoder, and the number of bits:

```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "params": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "binary",
              "parameters": {
                "bits": 1 // Can be 1, 2, or 4
              }
            }
          }
        }
      }
    }
  }
}
```
### Basic search with k-NN and binary quantization

You can perform a basic k-NN search on your index using a vector and specifying the number of nearest neighbors (`k`) to return:
```json
GET my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10
      }
    }
  }
}
```
You can also fine-tune the search process by adjusting the `ef_search` and `oversample_factor` parameters, as shown in the example that follows this list:

- **`oversample_factor`**: Controls the factor by which the search oversamples the candidate vectors before ranking them. A higher oversample factor means that more candidates are considered before ranking, which improves accuracy but also increases search time; it is essentially a trade-off between accuracy and efficiency. For example, setting `oversample_factor` to `2.0` doubles the number of candidates considered during the ranking phase, which may produce better results.
```json
GET my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10,
        "method_params": {
          "ef_search": 10
        },
        "rescore": {
          "oversample_factor": 10.0
        }
      }
    }
  }
}
```
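Based on the preceding description of `oversample_factor`, the number of candidates considered during the rescoring phase in this example can be estimated as follows (a rough illustration of the trade-off, not an exact accounting of the algorithm's internals):

```r
candidates ~= k * oversample_factor = 10 * 10.0 = 100
```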

#### HNSW memory estimation

The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension * bits / 8 + 8 * M)` bytes/vector, where `bits` is the number of bits used for quantization and `M` is the maximum number of bidirectional links created for each element during the construction of the graph.

As an example, assume that you have 1 million vectors with a dimension of 256 and an `M` of 16. The memory requirements for each compression level can be estimated as follows:

##### 1-bit quantization (32x compression)

In this case, each dimension is represented using 1 bit, equivalent to a 32x compression factor.

```r
Memory = 1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000
~= 0.176 GB
```
##### 2-bit quantization (16x compression)

```r
Memory = 1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000
~= 0.211 GB
```
##### 4-bit quantization (8x compression)

```r
Memory = 1.1 * ((256 * 4 / 8) + 8 * 16) * 1,000,000
~= 0.282 GB
```
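For comparison, applying the same estimate to uncompressed 32-bit `float` vectors (32 bits per dimension in the vector term) yields the following. Note that the `8 * M` graph-link term is not compressed, which is why the end-to-end memory savings are somewhat less than the raw 32x, 16x, or 8x vector compression:

```r
Memory = 1.1 * ((256 * 32 / 8) + 8 * 16) * 1,000,000
      ~= 1.267 GB
```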
