This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Commit

Merge branch 'docs-loss' of https://github.com/jina-ai/finetuner into docs-loss
bwanglzu committed Jun 27, 2023
2 parents d944c36 + 0b9bd9a commit a4f3d2d
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions docs/concepts/loss-functions.md
@@ -18,22 +18,22 @@ For more information, please checkout {doc}`/concepts/data-preparation` .
`TripletMarginLoss` is a *contrastive* loss function, meaning that the loss is calculated by comparing the embeddings of multiple documents (three, to be exact) to each other.
Each triplet of documents consists of an anchor document, a positive document and a negative document.
The anchor and the positive document belong to the same class, and the negative document belongs to a different class.
The goal of `TripletMarginLoss` is to maximise the difference between the distance from the anchor to the positive document, and the distance from the anchor to the negative document.
The goal of `TripletMarginLoss` is to minimize the distance from the anchor to the positive document, and maximize the distance from the anchor to the negative document.
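
The objective described above can be illustrated with a minimal sketch. It follows the common textbook formulation of a triplet margin loss, `max(d(a, p) - d(a, n) + margin, 0)`, and is not taken from Finetuner's code. The use of Euclidean distance, the default `margin=1.0`, and the helper names are assumptions made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet margin loss for a single triplet (illustrative only)."""
    d_pos = euclidean(anchor, positive)  # should become small
    d_neg = euclidean(anchor, negative)  # should become large
    # No loss once the negative is at least `margin` farther away than the positive.
    return max(d_pos - d_neg + margin, 0.0)


anchor = [0.0, 0.0]
near = [0.0, 1.0]  # distance 1 from the anchor
far = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_margin_loss(anchor, positive=near, negative=far)  # 0.0
loss_hard = triplet_margin_loss(anchor, positive=far, negative=near)  # 5.0
```

When the positive is already much closer than the negative, the loss is zero; when the roles are reversed, the loss grows with the gap between the two distances.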

## ArcFaceLoss and CosFaceLoss

SphereFace loss is a loss function that was first formulated for computer vision and face recognition tasks.
Finetuner supports two variations of this loss function, `ArcFaceLoss` and `CosFaceLoss`.
Instead of attempting to minimise the distance between positive pairs and maximise the distance between negative pairs, the SphereFace loss functions compare each sample with an estimate of the center point of each classes' embeddings.
and attempt to minimize the *angular distance* between the document and its class centroid, and maximise the angular distance between the document and the centroids of the other classes.
Instead of attempting to minimize the distance between positive pairs and maximize the distance between negative pairs, the SphereFace loss functions compare each sample with an estimate of the center point of each class's embeddings,
and attempt to minimize the *angular distance* between the document and its class centroid, and maximize the angular distance between the document and the centroids of the other classes.

The `ArcFaceLoss` and `CosFaceLoss` both deviate from the traditional SphereFace loss by including a margin and scaling parameter, which can be used to increase the boundary between each class.
If an item's embedding is within the boundary of the class it belongs to, then no loss is incurred. Choosing appropriate values for the margin and scaling parameter is very important for effective training.
For more information on how `ArcFaceLoss` and `CosFaceLoss` calculate loss, and how these parameters affect the output, see this article on [loss metrics for deep learning](https://hav4ik.github.io/articles/deep-metric-learning-survey#cosface).
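
To make the role of the margin and scaling parameter more concrete, here is a minimal sketch based on the commonly published formulations of CosFace, `s * (cos(theta) - m)`, and ArcFace, `s * cos(theta + m)`, applied to the true class before a softmax cross-entropy. It is not Finetuner's implementation: the function names, the example cosine values, and the margin and scale values are assumptions chosen for illustration, and the handling of `theta + m` beyond pi that practical implementations need is left out.

```python
import math


def margin_logits(cosines, target, margin, scale, kind):
    """Scale the cosine similarities to each class centroid,
    penalizing only the true class with a margin."""
    logits = []
    for idx, cos_theta in enumerate(cosines):
        if idx == target:
            if kind == "cosface":
                # Additive margin on the cosine itself.
                cos_theta = cos_theta - margin
            elif kind == "arcface":
                # Additive margin on the angle.
                theta = math.acos(max(-1.0, min(1.0, cos_theta)))
                cos_theta = math.cos(theta + margin)
            else:
                raise ValueError(f"unknown kind: {kind}")
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


# Cosine similarity of one embedding to three class centroids; class 0 is correct.
cosines = [0.8, 0.1, -0.3]
scale = 10.0  # illustrative value

plain_loss = cross_entropy([scale * c for c in cosines], 0)
cosface_loss = cross_entropy(margin_logits(cosines, 0, 0.35, scale, "cosface"), 0)
arcface_loss = cross_entropy(margin_logits(cosines, 0, 0.5, scale, "arcface"), 0)
```

In both variants the margin lowers the logit of the correct class, so the same embedding incurs a higher loss than it would without a margin and has to move closer to its class centroid to compensate.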

`TripletMarginLoss` uses a `ClassSampler` to construct batches with an equal number of samples of each class in the batch. However, since only one sample is needed to calculate the loss with the `ArcFaceLoss` and `CosFaceLoss` functions, there are no constraints on what each batch needs to contain.
Therefore, we can construct batches using random sampling, which is a much simpler and less time-consuming method.
By default, runs created using `ArcFaceLoss` or `CosfaceLoss` will use random sampling, however you can specify which type of sampling method you would like to use like so:
By default, runs created using `ArcFaceLoss` or `CosFaceLoss` use random sampling; however, you can specify which sampling method to use with the `sampler` parameter:

```diff
run = finetuner.fit(
@@ -47,7 +47,7 @@ run = finetuner.fit(
```

In cases where the chosen loss function is a form of contrastive loss, such as the default `TripletMarginLoss`, or the `CLIPLoss` function (the loss function used for `text-to-image` tasks), a class sampler is needed for the loss to function properly.
In these cases, this `sampler` parameters is ignored and the `ClassSampler` is always used.
In these cases, this `sampler` parameter is ignored and the `ClassSampler` is always used.
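
The difference between class-based and random sampling can be sketched as follows. This is a hypothetical illustration of building batches with an equal number of samples per class; it is not Finetuner's `ClassSampler`, and the function name and its parameters are assumptions made for this example.

```python
import random
from collections import defaultdict


def class_balanced_batches(labels, num_classes_per_batch, num_items_per_class, seed=0):
    """Group dataset indices into batches that contain the same number
    of items for every class present in the batch (illustrative only)."""
    rng = random.Random(seed)

    # Collect the indices belonging to each class, then shuffle each pool.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    pools = {label: rng.sample(idxs, len(idxs)) for label, idxs in by_class.items()}

    batches = []
    while True:
        # Only classes with enough remaining items can contribute to a batch.
        eligible = [lab for lab, pool in pools.items() if len(pool) >= num_items_per_class]
        if len(eligible) < num_classes_per_batch:
            break
        batch = []
        for label in rng.sample(eligible, num_classes_per_batch):
            for _ in range(num_items_per_class):
                batch.append(pools[label].pop())
        batches.append(batch)
    return batches


labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
batches = class_balanced_batches(labels, num_classes_per_batch=2, num_items_per_class=2)
```

Every batch produced this way contains positives and negatives for each of its items, which is what a contrastive loss needs. Random sampling skips all of this bookkeeping, which is why it is the simpler choice when the loss only looks at one sample at a time.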

## CLIPLoss

@@ -69,7 +69,7 @@ The goal of `CosineSimilarityLoss` is to minimize the MSE (mean squared error) b

```{admonition} Note
:class: hint
Use `MarginMSELoss` together with data synthesis job.
Use `MarginMSELoss` together with data synthesis.
Check out {doc}`/concepts/synthesis-function` for more information.
```
