I also got slightly Lower Rouge score for the same code #128

Open
milktea0917 opened this issue Jul 15, 2022 · 0 comments

Hello!
I appreciate and am inspired by your great work on extractive summarization.
I ran the scripts from your GitHub repo and got the following ROUGE scores:

For the Transformer model:

In the paper: ROUGE-F(1/2/L): 43.25/20.24/39.63
My best run: ROUGE-F(1/2/L): 43.04/20.19/39.48
[screenshots: ROUGE output logs from my runs]

The values above are the best scores I got from a single checkpoint when following the instructions, while the paper reports an average over 3 checkpoints, which would be even lower in my case. This is the same situation as issue #100.


What I did to reproduce the results:

  1. The CNN/DM data came from https://drive.google.com/file/d/1DN7ClZCCXsk2KegmC6t4ClBwtAf5galI/view , i.e., the provided preprocessed data; I did no preprocessing of my own.
  2. The ROUGE test passed successfully.
  3. Both the training and validation settings are the same as in https://github.com/ShehabMMohamed/PreSumm#readme .

I used a single NVIDIA 1080 GPU.

But if I use the upper bound of each ROUGE score's reported 95% confidence interval, for example:

[screenshots: ROUGE output with confidence intervals for the three checkpoints]
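For context, ROUGE-1.5.5 (as wrapped by pyrouge) prints each average score together with a 95% confidence interval, which is the "range" I am referring to. Below is a minimal sketch of pulling the mean and the interval bounds out of that output; the line format follows standard ROUGE-1.5.5 logs, and the numbers are illustrative placeholders rather than my actual results:

```python
import re

# Standard ROUGE-1.5.5 / pyrouge log lines: each average F-score comes with
# a 95% confidence interval. These numbers are illustrative placeholders.
rouge_output = """\
1 ROUGE-1 Average_F: 0.43042 (95%-conf.int. 0.42805 - 0.43278)
1 ROUGE-2 Average_F: 0.20190 (95%-conf.int. 0.19960 - 0.20441)
1 ROUGE-L Average_F: 0.39480 (95%-conf.int. 0.39250 - 0.39709)
"""

LINE_RE = re.compile(
    r"ROUGE-(\S+) Average_F: ([\d.]+) "
    r"\(95%-conf\.int\. ([\d.]+) - ([\d.]+)\)"
)

for metric, mean, low, high in LINE_RE.findall(rouge_output):
    print(f"ROUGE-{metric} F: mean={float(mean):.5f}, "
          f"95% CI = [{float(low):.5f}, {float(high):.5f}]")
```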

Then, if we average those ROUGE scores:
ROUGE-1 F: (43.278 + 43.058 + 43.187) / 3 ≈ 43.174
ROUGE-2 F: (20.441 + 20.414 + 20.297) / 3 = 20.384
ROUGE-L F: (39.709 + 39.662 + 39.541) / 3 ≈ 39.637
This averaged result seems more plausible than the earlier 43.04/20.19/39.48, which came from a single model checkpoint without averaging, and it is also close to the score reported in the paper.
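To double-check the arithmetic, here is the same averaging as a quick script; the three values per metric are the upper-bound F-scores quoted above:

```python
# Upper-bound F-scores of the three checkpoints, copied from the
# ROUGE outputs quoted above.
rouge_1 = [43.278, 43.058, 43.187]
rouge_2 = [20.441, 20.414, 20.297]
rouge_l = [39.709, 39.662, 39.541]

for name, scores in (("ROUGE-1", rouge_1),
                     ("ROUGE-2", rouge_2),
                     ("ROUGE-L", rouge_l)):
    print(f"{name} F, averaged over 3 checkpoints: {sum(scores) / len(scores):.3f}")
# ROUGE-1 F, averaged over 3 checkpoints: 43.174
# ROUGE-2 F, averaged over 3 checkpoints: 20.384
# ROUGE-L F, averaged over 3 checkpoints: 39.637
```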


So I have several questions:

Q1: Can the training settings in the README reproduce the score from the paper? If not, may I ask for the settings needed to reproduce a better score?
Q2: May I ask which three model checkpoints you selected for the testing phase?
Q3: Should the score be calculated as I discussed above, i.e., using the upper bound of the ROUGE confidence interval?

Hoping to get your response!
Thanks
