Currently, training `EmbeddingRetriever` with sentence-transformers uses the `MarginMSE` loss. We want to also support `MultipleNegativesRankingLoss`, which has a simpler data requirement; in particular, it doesn't require a "score" for the data pairs/tuples.

Related discussion: deepset-ai/haystack-tutorials#35
### Background and related work
#2388 added the ability to train `EmbeddingRetriever` (sentence-transformers variant) with Generative Pseudo Labeling (GPL). It uses the `MarginMSE` loss with labels coming from a soft pseudo-labeling process. Example Colab notebook.

Input data is then of the format:
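The original format block didn't survive here; as a sketch, the GPL training data is a list of dicts where each entry pairs a query with a positive and a hard-negative passage plus a pseudo-label margin score (the field names below are assumptions based on the GPL workflow, not confirmed from this issue):

```python
# Hedged sketch of the MarginMSE/GPL training-data format.
# Field names ("question", "pos_doc", "neg_doc", "score") are assumed
# to match the output of the GPL pseudo-labeling step.
training_data = [
    {
        "question": "What is GPL?",
        "pos_doc": "Generative Pseudo Labeling adapts a dense retriever to a new domain ...",
        "neg_doc": "An unrelated or hard-negative passage mined for this query ...",
        # Pseudo-label: cross-encoder margin between pos_doc and neg_doc,
        # which MarginMSE regresses against.
        "score": 0.85,
    },
    # ... more (question, pos_doc, neg_doc, score) tuples
]
```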
It works well. However, there can be cases where users want to move directly on to retriever training from their data, without pseudo-labeling or some other intermediate process to come up with the scores. Supporting `MultipleNegativesRankingLoss` (MNRL) would provide such an option.
### Proposal
Provide an argument (maybe a string, or directly the loss class from `sentence-transformers`) to the `EmbeddingRetriever#train` method to allow selection between the two losses. Data checks could be added to make sure the loss choice and data format are compatible.

For MNRL, the format would be as below (with `neg_doc` also being optional).

### Next up
I'll open a draft PR and start working on this. Finer implementation details can be worked out there. In the meantime, if there are any thoughts, please drop them here.
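For reference, the MNRL format mentioned in the proposal could look like the sketch below: the same shape as the GPL data, minus the `"score"` field, with `"neg_doc"` optional. The field names and the `train_loss` parameter in the usage comment are assumptions for illustration, not a settled API:

```python
# Hedged sketch of the proposed MNRL training-data format: no "score"
# field is needed, and "neg_doc" (a hard negative) is optional because
# MNRL can fall back to in-batch negatives.
mnrl_training_data = [
    {
        "question": "What is MNRL?",
        "pos_doc": "MultipleNegativesRankingLoss treats the positives of other "
                   "examples in the batch as negatives for this query ...",
    },
    {
        "question": "Another query ...",
        "pos_doc": "Its relevant passage ...",
        "neg_doc": "An optional mined hard negative ...",
    },
]

# Hypothetical usage, assuming a string-valued loss selector is added:
# retriever.train(training_data=mnrl_training_data, train_loss="mnrl")
```

A string selector keeps the common case simple, while accepting the loss class directly would leave room for other `sentence-transformers` losses later.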
cc: @mkkuemmel @mathislucka @vblagoje