train model shows entity overlap #38

Open
Manikandan0001 opened this issue Aug 11, 2020 · 4 comments

Manikandan0001 commented Aug 11, 2020

@OmkarPathak Can you please help me train a custom model? I need to train it without overlapping entities. Is there a function or method to avoid the overlap?

ValueError: [E103] Trying to set conflicting doc.ents: '(4774, 4778, 'Location')' and '(4744, 4789, 'College Name')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

@BenSturgeon

I encountered a similar issue and edited the custom_train file to fix it. Most of the changes are in the method called "determine". As far as I can tell, the problem arises when converting from the Dataturks format to the spaCy format, but the fix should eliminate overlaps in general. Let me know if it helps.

custom_train_fixed.zip
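
For readers who cannot open the attachment, here is a minimal sketch of one general way to remove overlaps before training. It is not taken from custom_train_fixed and may differ from what that file does. The function name `drop_overlapping_spans` is hypothetical, and keeping the longer span is an assumed policy, not something spaCy prescribes. Spans are assumed to be `(start, end, label)` tuples with an exclusive end offset, as in the E103 message.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end offset is exclusive


def drop_overlapping_spans(spans: List[Span]) -> List[Span]:
    """Greedily keep longer spans and drop any span that overlaps a kept one.

    Hypothetical helper for illustration; "longest span wins" is an assumption.
    """
    # Visit longer spans first; ties are broken by the earlier start offset.
    by_length = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for start, end, label in by_length:
        # Two half-open ranges are disjoint when one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    # Return the surviving spans in document order.
    return sorted(kept)


# The two conflicting spans from the E103 error message above.
spans = [(4774, 4778, "Location"), (4744, 4789, "College Name")]
print(drop_overlapping_spans(spans))  # [(4744, 4789, 'College Name')]
```

Whether the longer or the shorter span should win depends on the annotation scheme; here the nested 'Location' span is discarded in favor of the enclosing 'College Name' span.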

@Manikandan0001

Thanks for your response, @BenSturgeon. I'll let you know if it works.

@Manikandan0001

@BenSturgeon Training completed without any errors using your code. Thanks. However, the parsing results after training are not very accurate. Is that expected?

qarampage commented Oct 26, 2020

Hi,
I am still getting a warning during training after switching to the custom_train_fixed file:
C:\projects\py_virtual_env\venvr\venv\lib\site-packages\spacy\language.py:482: UserWarning: [W030] Some entities could not be aligned in the text "Ritesh
To be an asset to the company and de..." with entities "[[1427, 1470, 'Email Address'], [996, 1039, 'Skill...". Use spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities) to check the alignment. Misaligned entities ('-') will be ignored during training.
gold = GoldParse(doc, **gold)
Losses {'ner': 65305.11264929587}
Starting iteration 1
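
A W030 warning means that some entity offsets do not line up with token boundaries. One common cause, which is only an assumption about this particular dataset, is an annotated span that starts or ends on whitespace. The sketch below trims such spans; `trim_entity_span` and `trim_entities` are hypothetical helper names, and the entity format `[start, end, label]` with an exclusive end is assumed from the warning text.

```python
from typing import List, Tuple


def trim_entity_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink the half-open range [start, end) so it has no surrounding whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def trim_entities(text: str, entities: List[list]) -> List[list]:
    """Trim every [start, end, label] entry and drop whitespace-only spans."""
    trimmed = []
    for start, end, label in entities:
        new_start, new_end = trim_entity_span(text, start, end)
        if new_start < new_end:  # skip spans that contained only whitespace
            trimmed.append([new_start, new_end, label])
    return trimmed


# Made-up sample text: the span covers " a@b.com \n", including stray whitespace.
text = "Email: a@b.com \nSkills: Python"
entities = [[6, 16, "Email Address"]]
print(trim_entities(text, entities))  # [[7, 14, 'Email Address']]
```

Spans that are misaligned for other reasons, such as an offset that falls in the middle of a word, are not repaired by this; the check suggested in the warning itself (`spacy.gold.biluo_tags_from_offsets`) is the way to find those.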

I also get an error when executing test_name.py, after running the training module above only once. I am not sure where it is picking up en_training from:
C:\projects\py_virtual_env\venvr\venv\lib\site-packages\spacy\util.py:275: UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.2). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
Traceback (most recent call last):
File "C:/projects/mygitlab/mlpython/Jupyter_Notebooks/Projects_LARGE/Resume-Parser-Source/test_name.py", line 44, in
test_local_name()
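
On the question of where en_training comes from: in spaCy v2, a model name can resolve to a shortcut link in spaCy's data directory, to an installed Python package, or to a filesystem path. The sketch below covers only the installed-package case and uses nothing but the standard library; `find_package_location` is a hypothetical helper name, and whether en_training is actually installed as a package in this environment is an assumption.

```python
import importlib.util
from typing import Optional


def find_package_location(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None.

    Hypothetical helper for illustration; it does not import the package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # no importable package with this name
    return spec.origin  # may also be None for namespace packages


# Prints a path if 'en_training' is an installed package in this environment.
print(find_package_location("en_training"))
```

If this prints None, the name is probably being resolved some other way, for example through a shortcut link or a directory path passed by the calling code.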
