Order inconsistency of output candidate file with original test.json when testing bertSum Extractive #129

Open
cece00 opened this issue Jul 21, 2022 · 1 comment

Comments

cece00 commented Jul 21, 2022

In "test" mode, two output files are written: xxx.candidate and xxx.gold.
The texts in these two files are in the same order as each other, but that order is not consistent with the original test.json.
I have checked that "shuffle=False" is set in the dataloader, so where does the reordering come from?
Has anyone encountered the same problem? Can anyone help?

ashokurlana commented Jul 29, 2022

@cece00 Modify line 89 of src/model/data_loader.py.
The following code fixed a similar issue for me:

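import glob
import re

def atoi(text):
    # Convert digit runs to int so they compare numerically
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split into digit and non-digit runs, e.g. 'test.10.' -> ['test.', 10, '.']
    return [atoi(c) for c in re.split(r'(\d+)', text)]

pts = sorted(glob.glob(args.bert_data_path + 'cnndm.' + corpus_type + '.[0-9]*.bert.pt'))
# Re-sort so that shard 2 comes before shard 10
pts.sort(key=natural_keys)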
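
To illustrate why this probably helps: a plain sorted() compares file names character by character, so shard 10 lands before shard 2, and the test examples are then read in a different order than they appear in test.json. The sketch below uses made-up shard names that follow the cnndm.<corpus_type>.<n>.bert.pt pattern from the glob above. The shard numbers are an assumption for illustration, not taken from the actual dataset.

```python
import re

def natural_keys(text):
    # Digit runs become ints, so they are compared numerically (2 < 10).
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

# Hypothetical shard names, for illustration only.
shards = ['cnndm.test.%d.bert.pt' % i for i in (0, 1, 2, 10, 11)]

lexicographic = sorted(shards)               # plain string comparison
natural = sorted(shards, key=natural_keys)   # numeric-aware comparison

print(lexicographic)  # shards 10 and 11 come before shard 2
print(natural)        # shards in numeric order: 0, 1, 2, 10, 11
```

If the .candidate file is still out of order after this change, the shard order is likely not the only cause.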
