
Add query_by_tokens option in Neural Sparse Search #7040

Merged
38 changes: 35 additions & 3 deletions _query-dsl/specialized/neural-sparse.md
@@ -10,11 +10,12 @@
Introduced 2.11
{: .label .label-purple }

Use the `neural_sparse` query for vector field search in [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
Use the `neural_sparse` query for vector field search in [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). The query can use either raw text or sparse vector tokens.

## Request fields

Include the following request fields in the `neural_sparse` query:

### Example: Query by raw text

```json
"neural_sparse": {
Expand All @@ -24,16 +25,26 @@
}
}
```
### Example: Query by sparse vector

```json
"neural_sparse": {
"<vector_field>": {
"query_tokens": "<query_tokens>"
}
}
```

The top-level `vector_field` specifies the vector field against which to run a search query. The following table lists the other `neural_sparse` query fields.

Field | Data type | Required/Optional | Description
:--- | :--- | :--- | :---
`query_text` | String | Required | The query text from which to generate vector embeddings.
`model_id` | String | Required | The ID of the sparse encoding model or tokenizer model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in sparse neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
`query_text` | String | Optional | The query text from which to generate sparse vector embeddings.
`model_id` | String | Optional | The ID of the sparse encoding model or tokenizer model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in sparse neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). For information about setting a default model ID in a neural sparse query, see [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) and the pipeline sketch after this table.

`query_tokens` | Map<String, Float> | Optional | The query tokens, sometimes referred to as sparse vector embeddings. Similarly to dense semantic retrieval, you can use raw sparse vectors generated by neural models or tokenizers to perform a semantic search query. Use either the `query_text` option for raw field vectors or the `query_tokens` option for sparse vectors. Must be provided in order for the `neural_sparse` query to operate.
`max_token_score` | Float | Optional | (Deprecated) The theoretical upper bound of the score for all tokens in the vocabulary (required for performance optimization). For OpenSearch-provided [pretrained sparse embedding models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models), we recommend setting `max_token_score` to 2 for `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` and to 3.5 for `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1`. This field has been deprecated as of OpenSearch 2.12.
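
To supply that default model ID, you can create a search pipeline with a `neural_query_enricher` request processor. The following is a minimal sketch, assuming a hypothetical pipeline name and a placeholder model ID:

```json
PUT /_search/pipeline/default_model_pipeline
{
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": "<model_id>"
      }
    }
  ]
}
```

With such a pipeline set as an index's default search pipeline, `neural_sparse` queries against that index can omit `model_id`.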

#### Example request
**Query by raw text**

```json
GET my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "<query_text>",
        "model_id": "<model_id>"
      }
    }
  }
}
```
**Query by sparse vector**

```json
GET my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "hi": 4.338913,
          "planets": 2.7755864,
          "planet": 5.0969057,
          "mars": 1.7405145,
          "earth": 2.6087382,
          "hello": 3.3210192
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
29 changes: 25 additions & 4 deletions _search-plugins/neural-sparse-search.md
@@ -16,7 +16,7 @@ Introduced 2.11
When selecting a model, choose one of the following options:

- Use a sparse encoding model at both ingestion time and search time (high performance, relatively high latency).
- Use a sparse encoding model at ingestion time and a tokenizer model at search time (low performance, relatively low latency).
- Use a sparse encoding model at ingestion time and a tokenizer at search time (low performance, relatively low latency). The tokenizer doesn't conduct model inference, so you can deploy and invoke it using the ML Commons Model API for a more consistent experience, as sketched below.
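
As a sketch of that workflow, you can generate query tokens by calling the ML Commons Predict API on a deployed sparse encoding model or tokenizer. The endpoint format is standard, but the model ID and input text below are placeholders:

```json
POST /_plugins/_ml/models/<model_id>/_predict
{
  "text_docs": ["<query_text>"]
}
```

The token-to-weight map in the response can be passed directly as `query_tokens` in a `neural_sparse` query.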

**PREREQUISITE**<br>
Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
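
As a minimal sketch, registering one of the OpenSearch-provided pretrained sparse encoding models might look like the following; the model version and format shown are assumptions, so check the pretrained models page for current values:

```json
POST /_plugins/_ml/models/_register
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
```

When the register task completes, deploy the model with `POST /_plugins/_ml/models/<model_id>/_deploy`.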
@@ -29,7 +29,7 @@ To use neural sparse search, follow these steps:
1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search the index using neural search](#step-4-search-the-index-using-neural-search).
1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search).

## Step 1: Create an ingest pipeline

@@ -144,11 +144,11 @@ PUT /my-nlp-index/_doc/2

Before the document is ingested into the index, the ingest pipeline runs the `sparse_encoding` processor on the document, generating vector embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings.
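
For reference, a minimal sketch of an ingest pipeline with a `sparse_encoding` processor follows; the pipeline name and model ID are placeholders rather than values from this diff:

```json
PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
{
  "description": "A sparse encoding ingest pipeline (sketch)",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<model_id>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```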

## Step 4: Search the index using neural search
## Step 4: Search the index using neural sparse search

To perform a neural sparse search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries.

The following example request uses a `neural_sparse` query to search for relevant documents:
The following example request uses a `neural_sparse` query to search for relevant documents using a raw text query:

```json
GET my-nlp-index/_search
@@ -241,6 +241,27 @@ The response contains the matching documents:
}
```

You can also use the `neural_sparse` query with sparse vector embeddings:

```json
GET my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "hi": 4.338913,
          "planets": 2.7755864,
          "planet": 5.0969057,
          "mars": 1.7405145,
          "earth": 2.6087382,
          "hello": 3.3210192
        }
      }
    }
  }
}
```

## Setting a default model on an index or field

A [`neural_sparse`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/) query requires a model ID for generating sparse embeddings. To avoid passing the model ID with each `neural_sparse` query request, you can set a default model at the index or field level.
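
For example, assuming a `neural_query_enricher` search pipeline named `default_model_pipeline` (a hypothetical name, as sketched earlier), you can set it as the index default so that `neural_sparse` queries against the index can omit the model ID:

```json
PUT /my-nlp-index/_settings
{
  "index.search.default_pipeline": "default_model_pipeline"
}
```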