[DOCS] Adds pre-cleaning recommendation to ELSER docs. (#2796) (#2798)
(cherry picked from commit 34a6c7b)

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
mergify[bot] and szabosteve authored Sep 19, 2024
1 parent cc20c5a commit 4490619
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -440,6 +440,17 @@ per document to ingest.
To learn more about ELSER performance, refer to the <<elser-benchmarks>>.


[discrete]
[[pre-cleaning]]
== Pre-cleaning input text

The quality of the input text significantly affects the quality of the embeddings.
To achieve the best results, it's recommended to clean the input text before generating embeddings.
The preprocessing you need depends heavily on your text.
For example, if your text contains HTML tags, use the {ref}/htmlstrip-processor.html[HTML strip processor] in an ingest pipeline to remove unnecessary elements.
Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results.


[discrete]
[[elser-adaptive-allocations]]
== Adaptive allocations
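
For illustration only (this is not part of the commit), the pre-cleaning recommendation above could be sketched as follows. The helper builds a request body for an ingest pipeline that uses the `html_strip` processor, and a small standard-library function gives a rough local preview of tag removal. The field name `content`, the helper names, and the pipeline description are assumptions made for this example; the local preview is only an approximation and will not match the processor's output exactly.

```python
import json
from html.parser import HTMLParser


def build_html_strip_pipeline(field: str) -> dict:
    """Return a body suitable for PUT _ingest/pipeline/<name>.

    The field name is supplied by the caller; "content" below is hypothetical.
    """
    return {
        "description": "Strip HTML tags before generating ELSER embeddings",
        "processors": [
            # html_strip removes HTML tags from the given field in place.
            {"html_strip": {"field": field, "ignore_missing": True}}
        ],
    }


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def preview_strip(html: str) -> str:
    """Rough local preview of tag removal (not the processor's exact output)."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(json.dumps(build_html_strip_pipeline("content"), indent=2))
    print(preview_strip("<p>Hello <b>world</b></p>"))
```

Previewing a handful of documents locally in this way can help decide which cleaning steps are worth adding to the pipeline before ingesting at scale.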