After running through some test prompts, there are many cases where nothing separates the two answers (they're equally wrong, or wrong in the same way). There's probably something more statistically valid than picking at random in those cases.
Thanks for the feedback! The concern is valid. I will need to think about how ties should be handled, though.
I did not implement "both wrong" / "both OK" options for two reasons:
1. Ties conflict with the elimination-based tournament process, which lets you pick the better responses among each model's own responses first, before comparing responses from different models.
2. Personally, a two-way decision feels less mentally taxing than a three- or four-way one.
I need to think about how to handle ties in elimination matches, or whether to replace elimination with something else.
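To make the conflict concrete, here is a minimal, hypothetical Python sketch (not the project's code). The names `eliminate`, `tournament`, `round_robin_scores`, and a `judge` callback returning `"left"`, `"right"`, or `"tie"` are all assumptions for illustration. A knockout match must send exactly one response forward, so a tie has no natural outcome and the sketch falls back to a coin flip. The round robin at the end is one possible alternative, where a tie simply counts as half a win for each side.

```python
import itertools
import random
from typing import Callable, Dict, List, Literal

# Hypothetical verdict type: which side of a pairwise comparison won, or a tie.
Verdict = Literal["left", "right", "tie"]
Judge = Callable[[str, str], Verdict]


def eliminate(items: List[str], judge: Judge, rng: random.Random) -> str:
    """Single elimination: pair items off and advance one winner per match.

    A knockout match must advance exactly one item, so a tie has no natural
    outcome here; this sketch breaks it with a coin flip.
    """
    if not items:
        raise ValueError("need at least one item")
    pool = list(items)
    while len(pool) > 1:
        next_round = []
        for a, b in zip(pool[0::2], pool[1::2]):
            verdict = judge(a, b)
            if verdict == "tie":
                verdict = rng.choice(["left", "right"])  # arbitrary tie-break
            next_round.append(a if verdict == "left" else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # the unpaired item gets a bye
        pool = next_round
    return pool[0]


def tournament(responses_by_model: Dict[str, List[str]], judge: Judge,
               rng: random.Random) -> str:
    """Two stages: pick each model's best response, then compare the models."""
    champions = [eliminate(responses, judge, rng)
                 for responses in responses_by_model.values()]
    return eliminate(champions, judge, rng)


def round_robin_scores(items: List[str], judge: Judge) -> Dict[str, float]:
    """Alternative: compare every pair once; a win is 1, a tie is 0.5 each."""
    scores = {item: 0.0 for item in items}
    for a, b in itertools.combinations(items, 2):
        verdict = judge(a, b)
        if verdict == "left":
            scores[a] += 1.0
        elif verdict == "right":
            scores[b] += 1.0
        else:
            scores[a] += 0.5
            scores[b] += 0.5
    return scores


if __name__ == "__main__":
    # Toy judge over made-up quality values: higher wins, equal is a tie.
    quality = {"a1": 3, "a2": 1, "b1": 2, "b2": 2}

    def judge(a: str, b: str) -> Verdict:
        if quality[a] > quality[b]:
            return "left"
        if quality[a] < quality[b]:
            return "right"
        return "tie"

    print(tournament({"A": ["a1", "a2"], "B": ["b1", "b2"]}, judge, random.Random(0)))
    # prints "a1"
    print(round_robin_scores(["a1", "a2", "b1", "b2"], judge))
    # prints {'a1': 3.0, 'a2': 0.0, 'b1': 1.5, 'b2': 1.5}
```

The trade-off is effort: a knockout over n responses needs n - 1 comparisons, while a round robin needs n(n - 1)/2, which is the price of letting ties count instead of forcing a pick.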