
Wombat-7B, Wombat-7B-gpt4, and ChatGPT comparison on the Vicuna test set, evaluated by GPT-4 #18

Open
onlyfish79 opened this issue Apr 24, 2023 · 4 comments

Comments

@onlyfish79

1. Wombat-7B vs. ChatGPT, compared on the Vicuna test set and scored by GPT-4:
   Wombat-7B: 599.0 (average score: 7.5)
   ChatGPT: 710.5 (average score: 8.9)
   wombat-7b / gpt35 = 84.31%
2. Wombat-7B-gpt4 vs. ChatGPT, compared on the Vicuna test set and scored by GPT-4:
   Wombat-7B-gpt4: 577.0 (average score: 7.2)
   ChatGPT: 734.5 (average score: 9.2)
   wombat-7b-gpt4 / gpt35 = 78.13%

Wombat-7B and Wombat-7B-gpt4 were both recovered with the script recover_wombat_7b.sh.
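
For clarity, the percentages above are ratios of total GPT-4 scores (e.g. 599.0 / 710.5 ≈ 84.31%), and the averages are totals divided by the 80 Vicuna test questions. A minimal Python sketch of the computation (the score lists here are hypothetical):

```python
# Hypothetical per-question GPT-4 scores (the real lists have 80 entries,
# one per Vicuna test question).
wombat_scores = [7.0, 8.0, 7.5]
chatgpt_scores = [9.0, 8.5, 9.0]

total_wombat = sum(wombat_scores)    # corresponds to 599.0 above
total_chatgpt = sum(chatgpt_scores)  # corresponds to 710.5 above

# The reported percentage is the ratio of total scores,
# and the average is the total divided by the number of questions.
print(f"ratio: {total_wombat / total_chatgpt:.2%}")
print(f"avg wombat: {total_wombat / len(wombat_scores):.1f}")
print(f"avg chatgpt: {total_chatgpt / len(chatgpt_scores):.1f}")
```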

According to the above results, Wombat-7B performs better than Wombat-7B-gpt4. Does this result meet your expectations?

@GanjinZero
Owner

Yes, it does meet our expectations, and we observe a similar score for Wombat-7B-gpt4 vs. ChatGPT.
The reason is that Wombat-7B uses 5 responses per query to train RRHF.
Although Wombat-7B-gpt4 uses better responses, it only has 2 responses per query.
We think more diverse responses are the most important factor when training RRHF.
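
For readers unfamiliar with RRHF: it ranks the k sampled responses for each query by reward and trains the policy so that higher-reward responses receive higher length-normalized log-probability, plus a fine-tuning loss on the best response. A minimal PyTorch sketch of that objective (function name and shapes are illustrative, not the repo's actual API):

```python
import torch

def rrhf_loss(logprobs, lengths, rewards, best_idx):
    """Sketch of the RRHF objective for k responses to one query.

    logprobs: (k,) summed token log-probs of each response under the policy
    lengths:  (k,) token counts, used for length normalization
    rewards:  (k,) reward scores used only to order the responses
    best_idx: index of the highest-reward response
    """
    # Length-normalized conditional log-probability p_i of each response.
    p = logprobs / lengths

    # Ranking loss: penalize every pair where a lower-reward response
    # out-scores a higher-reward one under the policy.
    rank_loss = torch.zeros(())
    k = p.shape[0]
    for i in range(k):
        for j in range(k):
            if rewards[i] < rewards[j]:
                rank_loss = rank_loss + torch.relu(p[i] - p[j])

    # Fine-tuning (cross-entropy) loss on the best response keeps the
    # policy anchored to high-quality text.
    ft_loss = -logprobs[best_idx]

    return rank_loss + ft_loss
```

With k = 5 responses per query (Wombat-7B) the ranking loss sees up to 10 ordered pairs, versus a single pair with k = 2 (Wombat-7B-gpt4), which is one way to read the "diversity matters" point above.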

@GanjinZero
Owner

Another possible factor is that Wombat-7B uses responses sampled from its initial checkpoint, while Wombat-7B-gpt4 does not.
Since RRHF tries to improve the model based on itself, not using responses from the initial checkpoint worsens RRHF's performance.
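
That is, Wombat-7B's response pool includes samples drawn from the initial checkpoint itself. A hedged sketch of generating such samples with Hugging Face transformers (the checkpoint path, prompt, and sampling parameters are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative: sample responses from the initial (pre-RRHF) checkpoint
# so they can be scored and mixed into the training pool.
tokenizer = AutoTokenizer.from_pretrained("path/to/initial-checkpoint")
model = AutoModelForCausalLM.from_pretrained("path/to/initial-checkpoint")

inputs = tokenizer("Explain RRHF in one sentence.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # stochastic sampling for diverse responses
    top_p=0.9,               # nucleus sampling (illustrative value)
    temperature=1.0,
    num_return_sequences=4,  # several candidates per query
    max_new_tokens=128,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```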

@onlyfish79
Author

Understood, thank you for your response.
May I ask about the upcoming roadmap for RRHF?

@GanjinZero
Owner

> Understood, thank you for your response. May I ask about the upcoming roadmap for RRHF?

Chain-of-thought reasoning & scaling to 13B, 30B, 65B LLaMA / Alpaca.
