From aa1dd8fbc37329ba560c64b37d09607523a8720c Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Tue, 24 Sep 2024 15:48:56 +0000
Subject: [PATCH] [DOC] Normalizers (#8192)

* updating index page with normalisation

Signed-off-by: leanne.laceybyrne@eliatra.com

* Update _analyzers/index.md

Signed-off-by: Melissa Vagi

* Update _analyzers/index.md

Signed-off-by: Melissa Vagi

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>

---------

Signed-off-by: leanne.laceybyrne@eliatra.com
Signed-off-by: Melissa Vagi
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
(cherry picked from commit 53b650f481834e30eafa7cc4b80a7a523dbc562a)
Signed-off-by: github-actions[bot]
---
 _analyzers/index.md | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/_analyzers/index.md b/_analyzers/index.md
index 95f97ec8ce..9b999e5c3d 100644
--- a/_analyzers/index.md
+++ b/_analyzers/index.md
@@ -170,6 +170,28 @@ The response provides information about the analyzers for each field:
 }
 ```
 
+## Normalizers
+Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical.
+
+### Normalization techniques
+
+The following normalization techniques can help address variations in token forms:
+1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello".
+
+2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car", and "running" is normalized to "run".
+
+3. **Synonym handling:** Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run".
+
+### Normalization
+
+A search for `Hello` will match documents containing `hello` because of case normalization.
+
+A search for `cars` will also match documents containing `car` because of stemming.
+
+A query for `running` can retrieve documents containing `jogging` using synonym handling.
+
+Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`.
+
 ## Next steps
 
-- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
\ No newline at end of file
+- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
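
The three techniques described in the Normalizers section (case normalization, stemming, and synonym handling) can be illustrated with a small standalone Python sketch. This is a toy model for intuition only, not how OpenSearch implements analysis: the `stem` rules, the `SYNONYMS` table, and the `normalize` function are all hypothetical names introduced here, and the stemmer handles only the handful of word forms used in the examples.

```python
import re

# Hypothetical synonym table: each key is indexed under its mapped common term.
SYNONYMS = {"jog": "run"}


def stem(token):
    """Toy stemmer covering only simple '-ing' and plural '-s' endings."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    elif token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        token = token[:-1]
    return token


def normalize(text):
    """Tokenize, then apply case normalization, stemming, and synonym mapping."""
    normalized = []
    for token in re.findall(r"\w+", text):
        token = token.lower()               # case normalization
        token = stem(token)                 # stemming
        token = SYNONYMS.get(token, token)  # synonym handling
        normalized.append(token)
    return normalized


if __name__ == "__main__":
    print(normalize("Cars running"))
    print(normalize("jogging"))
```

Under these assumptions, a query and a document match when their normalized token lists share a term, which is why `Cars running` and `car run` are treated as equivalent in the text above.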