
All the reference sources should be non-silent #94

Open
DavidDiazGuerra opened this issue Jul 3, 2023 · 11 comments
@DavidDiazGuerra
Contributor

Hello,

If you call museval.eval_mus_track with at least one of the references being completely silent, you get a ValueError saying:

All the reference sources should be non-silent (not all-zeros), but at least one of the reference sources is all 0s, which introduces ambiguity to the evaluation. (Otherwise we can add infinitely many all-zero sources.)

I do not understand the reason for this. According to the error message ("Otherwise we can add infinitely many all-zero sources"), it seems to be related to some kind of permutation-invariance problem, but since both the references and the estimates are labeled when calling museval.eval_mus_track, there is no need to solve any permutation to compute the metrics.

I've tried commenting out this part of the code and letting museval.eval_mus_track run. The result is that the SDR is NaN in all the frames of the silent references, so the aggregated metrics for those references are also NaN and are ignored when computing the aggregated metrics of the dataset. This seems like quite reasonable behavior to me. Is there something I'm missing here, or is the requirement that all references be non-silent indeed unnecessary?
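The behavior described above can be sketched with plain numpy. This is not museval's actual implementation, just a simplified stand-in `sdr` function to show why a silent reference produces -inf/NaN scores and why NaN-aware aggregation then drops them:

```python
import numpy as np

def sdr(reference, estimate):
    # Simplified stand-in for the framewise bss_eval SDR:
    # SDR = 10*log10(||s||^2 / ||s - s_hat||^2)
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return 10 * np.log10(num / den)

rng = np.random.default_rng(0)
silent_ref = np.zeros(1000)
estimate = rng.standard_normal(1000)

score = sdr(silent_ref, estimate)  # log10(0 / nonzero) -> -inf
print(score)

# NaN/-inf frames simply drop out of NaN-aware aggregation:
frames = np.array([4.2, np.nan, 3.8])
print(np.nanmedian(frames))
```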

I guess another option would be just removing the silent references before calling museval.eval_mus_track. How would this affect the aggregated metrics for the whole dataset?

Thanks,
David

@faroit
Member

faroit commented Jul 3, 2023

@DavidDiazGuerra I think you are right but we wanted to just warn users instead of creating NaN values.

But maybe we should convert this into a warning then?

Feel free to provide a PR

@DavidDiazGuerra
Contributor Author

Cool! I can do a PR, but just a couple of questions first:

  1. After raising the warning, are you okay with creating NaN values? Or should I try something different, like removing the silent stems?
  2. What message should the warning contain? It will need to be quite different from the original error message. Would something like At least one of the reference sources is all 0s; this will generate NaN values in the metrics of the silent sources, and this track will not be taken into account when computing the aggregated metrics of these sources. be okay?
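A warning-based check along these lines could look as follows. This is a hypothetical sketch, not museval's code: the function name `check_silent_references` and the exact message are assumptions, and the array layout `(nsrc, nsamples, nchannels)` follows the shapes museval works with:

```python
import warnings
import numpy as np

def check_silent_references(reference_sources):
    # Hypothetical replacement for the all-zeros ValueError: warn
    # instead of raising. reference_sources has shape
    # (nsrc, nsamples, nchannels).
    silent = np.all(reference_sources == 0, axis=(1, 2))
    if silent.any():
        warnings.warn(
            "At least one of the reference sources is all 0s; this will "
            "generate NaN values in the metrics of the silent sources, "
            "and this track will not be taken into account when "
            "computing the aggregated metrics of these sources.",
            UserWarning,
        )

refs = np.zeros((2, 44100, 2))
refs[0] = np.random.default_rng(0).standard_normal((44100, 2))
check_silent_references(refs)  # emits a UserWarning for source 1
```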

@faroit
Member

faroit commented Jul 5, 2023

Thinking more about it, NaNs can't be stored in JSON, can they?

@DavidDiazGuerra
Contributor Author

I think JSON doesn't officially support NaNs, but simplejson can read and write them. I think museval is already writing NaNs to its JSON files, because when I submitted #91 I was still working with the standard MUSDB18 dataset and hadn't removed the error about silent references yet.
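For reference, Python's standard-library json module (which simplejson mirrors) already round-trips NaN by emitting the non-standard token NaN, since allow_nan defaults to True; strict parsers in other languages may reject such files:

```python
import json
import math

# By default the stdlib writes the non-standard token NaN...
text = json.dumps({"SDR": math.nan})
print(text)  # {"SDR": NaN}

# ...and reads it back as a float NaN:
print(math.isnan(json.loads(text)["SDR"]))  # True

# Strict-JSON mode fails instead of silently emitting NaN:
try:
    json.dumps({"SDR": math.nan}, allow_nan=False)
except ValueError as exc:
    print("strict mode:", exc)
```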

@faroit
Member

faroit commented Jul 7, 2023

Ha. Thanks for the reminder.

@DavidDiazGuerra
Contributor Author

I was working on a PR about this and realized that the function bss_eval has a parameter called compute_permutation. I guess that if this is true, museval ignores the names of the references and the estimates and tries to find the best assignment between them? Should we keep raising an error when compute_permutation is true and at least one of the references is completely silent?
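The concern can be illustrated with a toy brute-force matcher. This is an assumption-laden sketch, not museval's permutation solver: it uses a simplified `sdr` stand-in and simply shows that when one reference is silent, every candidate assignment scores -inf, so the "best" permutation is ambiguous:

```python
import itertools
import numpy as np

def sdr(reference, estimate):
    # simplified stand-in for the framewise bss_eval SDR
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return 10 * np.log10(num / den)

rng = np.random.default_rng(0)
references = np.stack([rng.standard_normal(1000), np.zeros(1000)])
estimates = rng.standard_normal((2, 1000))

# Any assignment pairs the silent reference with some estimate, so
# every permutation's mean score is -inf and the argmax is ambiguous:
scores = [
    float(np.mean([sdr(references[i], estimates[p])
                   for i, p in enumerate(perm)]))
    for perm in itertools.permutations(range(2))
]
print(scores)  # [-inf, -inf]
```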

@faroit
Member

faroit commented Jul 14, 2023

Maybe @aliutkus can chime in?

@aliutkus
Member

aliutkus commented Jul 14, 2023

The issue of evaluating separation when some target(s) are identically zero is ill-defined with the BSS Eval metrics, and I have been having discussions forever with many people on how to address it, with no clear consensus emerging.

Removing the zero targets, as I understand you suggest, creates a problem of its own, since the corresponding estimates may not be zero and hence comprise interference from other sources.

My opinion today is that museval/bsseval shouldn't make such a design decision and should raise an error (as they do now) when some targets are identically zero. The message could indeed be better, though.

Then, a user may totally code a workaround of her/his own, handling such cases the desired way.
If museval were to evolve on this, I think the very least would be that the change is supported by a peer-reviewed paper or a challenge like MDX.
@rabitt @Jonathan-LeRoux @ethman @bryan-pardo @pseeth

@DavidDiazGuerra
Contributor Author

I was proposing just letting the library compute the metrics according to their definitions, so SDR and SIR are -inf/NaN (SAR is still a numerical value), and then excluding these frames when computing the aggregated metrics. This is what the library already does when a target is silent in a frame, so I'm basically proposing doing the same when all the frames are silent: the NaN is propagated to the aggregated metric of the track and then excluded when computing the aggregated metric of the dataset. Adding some extra metrics to measure the interference, noise, and artifact levels in these silent frames/tracks could be interesting, since with the current definitions their metrics are always -inf no matter how well the system estimated the silence of the target, but I think this is a good starting point.
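The two-level aggregation in this proposal can be sketched with np.nanmedian; the values below are made up for illustration and the median is only one of the aggregation statistics museval reports:

```python
import numpy as np

# Hypothetical framewise SDR values for one target of one track; NaN
# marks silent frames, which the library already skips when aggregating:
frames = np.array([4.0, np.nan, 3.0, np.nan])
track_sdr = np.nanmedian(frames)  # median(4.0, 3.0) = 3.5

# The proposal applies the same rule one level up: a fully silent
# target yields an all-NaN track, which the dataset-level aggregation
# then skips in turn.
tracks = np.array([track_sdr, np.nan, 5.0])
dataset_sdr = np.nanmedian(tracks)  # median(3.5, 5.0) = 4.25
print(track_sdr, dataset_sdr)
```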

I think allowing silent targets is important when working on systems that try to separate more instruments than just the 4 stems of MUSDB18, or when working with other music genres, since in those situations allowing only tracks where all the instruments are present would dramatically limit the size of our datasets.

I'm not sure how easy coding a workaround would be without modifying the code of the library itself. I think one of the most interesting things about this library is that authors working on music source separation can use a "standard" set of metrics with the same implementation, making it easier to compare results; having each author code their own workaround would break this.

You can check the details of what I'm proposing in the last commit of my fork: DavidDiazGuerra@353247e. I think this could be the default behavior of the library, since it doesn't change any results computed before the change (it just allows computing results in situations that previously raised an error) and it just does what the library already does when some frames of a target are silent. Another option could be adding some kind of allow_silent_targets argument to bss_eval that is False by default but can be set to True by the user.

@aliutkus
Member

Hi, returning inf/NaN in that case looks totally legit to me, in which case the error could be changed to a warning.
Thanks a lot

@DavidDiazGuerra
Contributor Author

Hi, sorry for the late reply; I went on holidays and then forgot about this.

I can do a PR with the accepted solution, but I still have this doubt:

> I was working on a PR about this and realized that the function bss_eval has a parameter called compute_permutation. I guess that if this is true, museval ignores the names of the references and the estimates and tries to find the best assignment between them? Should we keep raising an error when compute_permutation is true and at least one of the references is completely silent?

Just let me know what to do if compute_permutation==True and I'll submit a PR.
