[8.11] Adds ELSER V2 PS and PA benchmarks (backport #2579) #2580

Merged 1 commit on Oct 27, 2023

Binary file removed docs/en/stack/ml/nlp/images/ml-nlp-elser-v1-v2.png
114 changes: 58 additions & 56 deletions docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -386,6 +386,30 @@ hardware and compares the model performance to {es} BM25 and other strong
baselines such as Splade or OpenAI.


[discrete]
[[version-overview]]
=== Version overview

ELSER V2 is available in two variants: a **platform specific** version that
runs only on Linux with an x86-64 CPU architecture, and a **platform agnostic**
(portable) version that can run on any platform.


[discrete]
==== ELSER V2

Besides the performance improvements, the biggest change in ELSER V2 is the
introduction of the first platform specific ELSER model - that is, a model that
runs only on Linux with an x86-64 CPU architecture. The platform specific model
is optimized for newer Intel CPUs, but it works on AMD CPUs as well. New users
of ELSER should use the platform specific Linux-x86-64 model, as it is
significantly faster than the platform agnostic (portable) model. ELSER V2 also
produces significantly higher quality embeddings than ELSER V1. Both ELSER V2
variants (platform specific and platform agnostic) produce exactly the same
embeddings for a given input.
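
For illustration, the two variants are distinguished by their model IDs
(`.elser_model_2` for the portable model, `.elser_model_2_linux-x86_64` for the
platform specific one). A minimal sketch of downloading and starting the
platform specific model through the trained models APIs - the request shapes
below reflect the 8.11-era docs and are illustrative, not part of this PR:

[source,console]
----
PUT _ml/trained_models/.elser_model_2_linux-x86_64
{
  "input": {
    "field_names": ["text_field"]
  }
}

POST _ml/trained_models/.elser_model_2_linux-x86_64/deployment/_start
----

Creating the model configuration triggers the download; the `_start` call then
deploys the model on the {ml} nodes.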


[discrete]
[[elser-qualitative-benchmarks]]
=== Qualitative benchmarks
@@ -395,30 +419,11 @@ Discounted Cumulative Gain (NDCG) which can handle multiple relevant documents
and fine-grained document ratings. The metric is applied to a fixed-sized list
of retrieved documents which, in this case, is the top 10 documents (NDCG@10).
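
As a refresher (not part of this change), a standard formulation of the metric
is:

[latexmath]
++++
\mathrm{DCG@10} = \sum_{i=1}^{10} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}
++++

where latexmath:[rel_i] is the graded relevance of the document at rank
latexmath:[i] and IDCG@10 is the DCG@10 of the ideal ranking, so values lie
between 0 and 1.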

The table below shows the performance of ELSER v2 compared to ELSER v1. ELSER v2
has 10 wins, 1 draw, 1 loss and an average improvement in NDCG@10 of 2.5%.

image::images/ml-nlp-elser-v1-v2.png[alt="ELSER v2 benchmarks compared to ELSER v1",align="center"]
_NDCG@10 for BEIR data sets for ELSER v2 and ELSER v1 (higher values are better)_

The next table shows the performance of ELSER v1 compared to {es} BM25 with an
English analyzer broken down by the 12 data sets used for the evaluation. ELSER
v1 has 10 wins, 1 draw, 1 loss and an average improvement in NDCG@10 of 17%.

image::images/ml-nlp-elser-ndcg10-beir.png[alt="ELSER v1 benchmarks",align="center"]
_NDCG@10 for BEIR data sets for BM25 and ELSER v1 (higher values are better)_

The table below shows the performance of ELSER V2 compared to BM25. ELSER V2
has 10 wins, 1 draw, 1 loss and an average improvement in NDCG@10 of 18%.

The following table compares the average performance of ELSER v1 to some other
strong baselines. The OpenAI results are separated out because they use a
different subset of the BEIR suite.

image::images/ml-nlp-elser-average-ndcg.png[alt="ELSER v1 average performance compared to other baselines",align="center"]
_Average NDCG@10 for BEIR data sets vs. various high quality baselines (higher_
_is better). OpenAI chose a different subset; ELSER v1 results on this set are_
_reported separately._

To read more about the evaluation details, refer to
https://www.elastic.co/blog/may-2023-launch-information-retrieval-elasticsearch-ai-model[this blog post].

image::images/ml-nlp-bm25-elser-v2.png[alt="ELSER V2 benchmarks compared to BM25",align="center"]
_NDCG@10 for BEIR data sets for BM25 and ELSER V2 (higher values are better)_
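
For context, ELSER retrieval is performed with the `text_expansion` query type.
A minimal sketch - the index name, field name, and query text here are
hypothetical, not part of this PR:

[source,console]
----
GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_2",
        "model_text": "How do avalanches form?"
      }
    }
  }
}
----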


[discrete]
@@ -435,36 +440,33 @@ realistic view on the model performance for your use case.


[discrete]
==== ELSER v1

Two data sets were utilized to evaluate the performance of ELSER v1 in different
hardware configurations: `msmarco-long-light` and `arguana`.

|==============================================================================================================
| **Data set** ^| **Data set size** ^| **Average count of tokens / query** ^| **Average count of tokens / document**
| `msmarco-long-light` ^| 37367 documents ^| 9 ^| 1640
| `arguana` ^| 8674 documents ^| 238 ^| 202
|==============================================================================================================

The `msmarco-long-light` data set contains long documents with an average of
over 512 tokens, which provides insights into the performance implications
of indexing and {infer} time for long documents. This is a subset of the
"msmarco" dataset specifically designed for document retrieval (it shouldn't be
confused with the "msmarco" dataset used for passage retrieval, which primarily
consists of shorter spans of text).

The `arguana` data set is a https://github.com/beir-cellar/beir[BEIR] data set.
It consists of long queries with an average of 200 tokens per query. It can
represent an upper limit for query slowness.

The table below presents benchmarking results for ELSER v1 using various
hardware configurations.

|==================================================================================================================================================================================
| 3+^| `msmarco-long-light` 3+^| `arguana` |
| ^.^| inference ^.^| indexing ^.^| query latency ^.^| inference ^.^| indexing ^.^| query latency |
| **ML node 4GB - 2 vCPUs (1 allocation * 1 thread)** ^.^| 581 ms/call ^.^| 1.7 doc/sec ^.^| 713 ms/query ^.^| 1200 ms/call ^.^| 0.8 doc/sec ^.^| 169 ms/query |
| **ML node 16GB - 8 vCPUs (7 allocations * 1 thread)** ^.^| 568 ms/call ^.^| 12 doc/sec ^.^| 689 ms/query ^.^| 1280 ms/call ^.^| 5.4 doc/sec ^.^| 159 ms/query |
| **ML node 16GB - 8 vCPUs (1 allocation * 8 threads)** ^.^| 102 ms/call ^.^| 9.7 doc/sec ^.^| 164 ms/query ^.^| 220 ms/call ^.^| 4.5 doc/sec ^.^| 40 ms/query |
| **ML node 32 GB - 16 vCPUs (15 allocations * 1 thread)** ^.^| 565 ms/call ^.^| 25.2 doc/sec ^.^| 608 ms/query ^.^| 1260 ms/call ^.^| 11.4 doc/sec ^.^| 138 ms/query |
|==================================================================================================================================================================================
==== ELSER V2

Overall, the platform specific V2 model ingested at a maximum rate of 26
docs/s, compared with the maximum rate of 14 docs/s from the ELSER V1
benchmark, resulting in a 90% increase in throughput.
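
Ingest throughput of this kind is typically measured by indexing documents
through an ingest pipeline that runs the model with an {infer} processor. A
minimal sketch, assuming the 8.11-era processor options; the pipeline name and
source field are hypothetical:

[source,console]
----
PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2_linux-x86_64",
        "target_field": "ml",
        "field_map": {
          "text": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
----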

The benefit of virtual cores (that is, when the number of allocations is
greater than half of the available vCPUs) has also increased. Previously, the
performance increase between 8 and 16 allocations was around 7%; it is now 17%
(ELSER V1 on 8.11) and 20% (ELSER V2 platform specific). These tests were
performed on a 16 vCPU machine, with all documents containing exactly 256
tokens.
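
The allocation and thread counts in such a setup are chosen when the deployment
is started. A hedged sketch of the call involved, using the 8-allocation
configuration discussed above (parameter names as of the 8.11 trained models
API):

[source,console]
----
POST _ml/trained_models/.elser_model_2_linux-x86_64/deployment/_start?number_of_allocations=8&threads_per_allocation=1
----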

IMPORTANT: The length of the documents in your particular dataset will have a
significant impact on your throughput numbers.

image::images/ml-nlp-elser-bm-summary.png[alt="Summary of ELSER V1 and V2 benchmark reports",align="center"]

**The platform specific** model's results show nearly linear growth up to 8
allocations, after which the improvements become smaller. In this case, the
throughput at 8 allocations was 22 docs/s, while the throughput at 16
allocations was 26 docs/s, indicating a 20% performance increase due to virtual
cores.

image::images/ml-nlp-elser-v2-ps-bm-results.png[alt="ELSER V2 platform specific benchmarks",align="center"]

**The platform agnostic** model's throughput at 8 and 16 allocations was 14
docs/s and 16 docs/s respectively, indicating a 12% performance improvement due
to virtual cores.

image::images/ml-nlp-elser-v2-pa-bm-results.png[alt="ELSER V2 platform agnostic benchmarks",align="center"]
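
To benchmark a deployment on your own data (as recommended above), you can
track throughput and latency through the deployment statistics. A minimal
sketch; the response fields named below reflect the 8.x API and may vary:

[source,console]
----
GET _ml/trained_models/.elser_model_2_linux-x86_64/_stats
----

The `deployment_stats` section of the response reports per-node inference
counts and average inference times, which correspond to the docs/s figures
above.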