[DOC] Normalizers #8192

Merged
24 changes: 23 additions & 1 deletion _analyzers/index.md
@@ -170,6 +170,28 @@ The response provides information about the analyzers for each field:
}
```

## Normalizers
Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization addresses these variations by converting tokens into a standard form so that similar terms match even when they are not identical.

### Normalization techniques

The following normalization techniques can help address variations in token forms:

1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello".

2. **Stemming**: Reduces words to their root form. For example, "cars" is stemmed to "car", and "running" is stemmed to "run".

3. **Synonym handling**: Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run".
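The techniques above roughly correspond to token filters that run after the tokenizer. The following request is a minimal sketch that chains them in the Analyze API, assuming the built-in `lowercase` filter, an inline `synonym` filter with a hypothetical rule (`jogging, running => run`), and an inline `stemmer` filter configured for English. The sample text is illustrative only:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym",
      "synonyms": ["jogging, running => run"]
    },
    {
      "type": "stemmer",
      "language": "english"
    }
  ],
  "text": "Hello cars jogging"
}
```

The synonym filter is placed before the stemmer so that `jogging` is replaced by `run` before stemming is applied. For this text, the response is expected to contain the tokens `hello`, `car`, and `run`.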

### Normalization examples

A search for `Hello` will match documents containing `hello` because of case normalization.

A search for `cars` will also match documents containing `car` because of stemming.

A query for `running` can retrieve documents containing `jogging` because of synonym handling.
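These matches only happen when the same normalization is applied both when documents are indexed and when queries are analyzed. The following request is a sketch of one way to configure this, assuming a hypothetical index named `normalization-example`, a hypothetical custom analyzer named `normalizing_analyzer`, hypothetical filter names `example_synonyms` and `example_english_stemmer`, and a `text` field:

```json
PUT /normalization-example
{
  "settings": {
    "analysis": {
      "filter": {
        "example_synonyms": {
          "type": "synonym",
          "synonyms": ["jogging, running => run"]
        },
        "example_english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "normalizing_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "example_synonyms", "example_english_stemmer"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "normalizing_analyzer"
      }
    }
  }
}
```

Because the `analyzer` mapping parameter is used at both index and search time unless a separate `search_analyzer` is specified, a `match` query for `running` on the `text` field would be reduced to `run` and could match a document containing `jogging`.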

Normalization ensures that searches are not limited to exact term matches, which produces more relevant results. For example, combining case normalization and stemming reduces a search for `Cars running` to the terms `car` and `run`.
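To check how a phrase is normalized, you can run it through the Analyze API. The following sketch assumes the `standard` tokenizer followed by the `lowercase` filter and an inline English `stemmer` filter:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stemmer",
      "language": "english"
    }
  ],
  "text": "Cars running"
}
```

The response is expected to list the tokens `car` and `run`, which are the terms that would be matched against the index.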

## Next steps

- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).