Skip to content

Commit

Permalink
[DOC] Normalizers (#8192)
Browse files Browse the repository at this point in the history
* updating index page with normalisation

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* Update _analyzers/index.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _analyzers/index.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>

---------

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
(cherry picked from commit 53b650f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  • Loading branch information
3 people committed Sep 24, 2024
1 parent 2535af9 commit aa1dd8f
Showing 1 changed file with 23 additions and 1 deletion.
24 changes: 23 additions & 1 deletion _analyzers/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,28 @@ The response provides information about the analyzers for each field:
}
```

## Normalizers
Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical.

### Normalization techniques

The following normalization techniques can help address variations in token forms:
1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello".

2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car", and "running" is normalized to "run".

3. **Synonym handling:** Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run".

### Normalization

A search for `Hello` will match documents containing `hello` because of case normalization.

A search for `cars` will also match documents containing `car` because of stemming.

A query for `running` can retrieve documents containing `jogging` using synonym handling.

Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`.

## Next steps

- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).

0 comments on commit aa1dd8f

Please sign in to comment.