IndexError: index 2002 is out of bounds for dimension 0 with size 768 #11

Open
Dheeraj-kkde opened this issue Jan 31, 2024 · 0 comments


Initially I was getting the following error:

OSError: Can't load tokenizer for 'bert-base-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.

The above error is the same for all the LM models mentioned in the README.md file.

I resolved it by following this post: https://stackoverflow.com/questions/69286889/transformers-and-bert-downloading-to-your-local-machine

After that, I get the error below:
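For anyone hitting the same OSError: its message mentions a local directory with the same name as the model. The sketch below is a quick, standard-library-only check for that situation. The helper name `diagnose_model_path` and its status strings are made up for illustration and are not part of `transformers` or of this repo. It only assumes that a BERT tokenizer directory needs a `vocab.txt` file.

```python
import os

# vocab.txt is the WordPiece vocabulary file a BERT tokenizer directory needs.
REQUIRED_FILES = ("vocab.txt",)


def diagnose_model_path(name, base_dir="."):
    """Hypothetical helper: classify how `name` looks on disk under `base_dir`."""
    path = os.path.join(base_dir, name)
    if not os.path.isdir(path):
        # Nothing local shadows the model name.
        return "no-local-dir"
    missing = [
        f for f in REQUIRED_FILES
        if not os.path.isfile(os.path.join(path, f))
    ]
    if missing:
        # A directory with the model's name exists but lacks tokenizer files.
        return "incomplete-local-dir"
    return "usable-local-dir"


if __name__ == "__main__":
    print(diagnose_model_path("bert-base-uncased"))
```

If this prints `incomplete-local-dir` when run from the folder containing metric.py, a partially downloaded or empty `bert-base-uncased` folder may be what the tokenizer loader is tripping over.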

PS C:\Users\dheerajkumar11\AI-CoE\Trusted AI-Fairness Metrics\crows-pairs-master> python metric.py --input_file data/crows_pairs_anonymized.csv --lm_model bert --output_file
ERROR:
Traceback (most recent call last):
  File "C:\Users\dheerajkumar11\AI-CoE\Trusted AI-Fairness Metrics\crows-pairs-master\metric.py", line 296, in <module>
    evaluate(args)
  File "C:\Users\dheerajkumar11\AI-CoE\Trusted AI-Fairness Metrics\crows-pairs-master\metric.py", line 234, in evaluate
    score = mask_unigram(data, lm)
  File "C:\Users\dheerajkumar11\AI-CoE\Trusted AI-Fairness Metrics\crows-pairs-master\metric.py", line 149, in mask_unigram
    score1 = get_log_prob_unigram(sent1_masked_token_ids, sent1_token_ids, template1[i], lm)
  File "C:\Users\dheerajkumar11\AI-CoE\Trusted AI-Fairness Metrics\crows-pairs-master\metric.py", line 74, in get_log_prob_unigram
    log_probs = log_softmax(hs)[target_id]
IndexError: index 2002 is out of bounds for dimension 0 with size 768

My findings:

In the metric.py file,
hs = hidden_states[mask_idx]
target_id = token_ids[0][mask_idx]
log_probs = log_softmax(hs)[target_id]

Here hs.size() returns torch.Size([768]), so hs has only 768 entries, while target_id is 2002 (the index reported in the IndexError).
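One possible reading of these numbers, not confirmed against the repo: 768 is the hidden width of BERT-base, and token ids index into the vocabulary (30522 entries for bert-base-uncased). So the vector passed to log_softmax may be a hidden state, not vocabulary-sized logits. In `transformers`, the bare `BertModel` returns hidden states, while `BertForMaskedLM` returns vocabulary-sized logits. The pure-Python sketch below stands in for the torch code to show the mismatch. `target_log_prob` and the two constants are illustrative names, not taken from metric.py.

```python
import math

HIDDEN_SIZE = 768    # BERT-base hidden width; matches hs.size() in the report
VOCAB_SIZE = 30522   # bert-base-uncased vocabulary size


def log_softmax(scores):
    """Numerically stable log-softmax over a flat list of floats."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def target_log_prob(scores, target_id, vocab_size=VOCAB_SIZE):
    """Hypothetical guard: return log P(target_id) only for vocabulary-sized scores."""
    if len(scores) != vocab_size:
        raise ValueError(
            f"expected {vocab_size} vocabulary scores, got {len(scores)}; "
            "this looks like a hidden state, not logits"
        )
    return log_softmax(scores)[target_id]


# A 768-wide vector (like hs in the report) is rejected with a clear message
# instead of failing later with an IndexError.
hidden_like = [0.0] * HIDDEN_SIZE
try:
    target_log_prob(hidden_like, 2002)
except ValueError as err:
    message = str(err)

# A vocabulary-sized vector can be indexed by token id 2002.
logits_like = [0.0] * VOCAB_SIZE
uniform = target_log_prob(logits_like, 2002)
```

If the model was reloaded from a local folder while working around the tokenizer error, it may be worth checking which model class is being instantiated and printing the shape of the model output before it is indexed.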

It seems to be breaking in the LM scoring logic. Could someone please look into this error? The script does not work even for the sample CSV file provided in the repo.

Thanks and regards,
Dheeraj Kumar
