Extend support for token classification #2969

Closed
2 of 4 tasks
mathislucka opened this issue Aug 4, 2022 · 3 comments
Labels
P3 Low priority, leave it in the backlog type:feature New feature or request wontfix This will not be worked on

Comments

@mathislucka
Member

mathislucka commented Aug 4, 2022

Is your feature request related to a problem? Please describe.
It would be great if the support for token classification could be extended beyond what the Extractor currently offers. Specifically, we'd also need training and evaluation for token classification models. The node should also support splitting and aggregation of longer texts to work around the 512-token limit present in most language models.

Describe the solution you'd like
Extension / re-implementation of the Extractor node to support the additional features.

  • training of token classification models
  • evaluation of token classification models
  • splitting and aggregation of longer texts
  • update postprocessing to work with models using SentencePiece tokenizers
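To make the splitting and aggregation item concrete, here is a minimal sketch of one common approach: overlapping windows over the token sequence, with per-token labels merged back by preferring the prediction made farthest from a window edge. The function names (`split_into_windows`, `merge_window_labels`) and the default sizes are assumptions for illustration only, not the Extractor's actual API.

```python
from typing import List, Tuple


def split_into_windows(
    tokens: List[str], max_len: int = 512, stride: int = 128
) -> List[Tuple[int, List[str]]]:
    """Cover `tokens` with windows of at most `max_len` tokens.

    Consecutive windows overlap by `stride` tokens. Each result is
    (start_offset, window_tokens). Hypothetical helper, not a library call.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += step
    return windows


def merge_window_labels(
    window_labels: List[Tuple[int, List[str]]], total_len: int
) -> List[str]:
    """Merge per-window label lists into one label per original token.

    Where windows overlap, keep the label predicted farthest from a window
    edge, assuming the model has the most context there.
    """
    merged = ["O"] * total_len
    best_margin = [-1] * total_len
    for start, labels in window_labels:
        for i, label in enumerate(labels):
            margin = min(i, len(labels) - 1 - i)  # distance to nearest edge
            pos = start + i
            if margin > best_margin[pos]:
                merged[pos] = label
                best_margin[pos] = margin
    return merged
```

In practice the windows would be built over subword tokens, and each window's labels would come from the token classification model before being passed to `merge_window_labels`.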
@mathislucka mathislucka added the type:feature New feature or request label Aug 4, 2022
@sjrl
Contributor

sjrl commented Aug 5, 2022

Additionally, we want to consider different postprocessing strategies for combining the predicted labels. For example, the prediction ["B-DEFENDER", "I-DEFENDER"] will be combined into one entity, but what should be done with a prediction like ["O", "I-DEFENDER", "O"], where an I- tag appears without a preceding B- tag?
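As a rough sketch of what a configurable strategy could look like, the function below groups BIO labels into spans and exposes a switch for the orphan I- case. The name `group_entities` and the `orphan_i` options ("start" and "drop") are hypothetical and chosen only for illustration; they do not describe an existing implementation.

```python
from typing import List, Optional, Tuple


def group_entities(
    labels: List[str], orphan_i: str = "start"
) -> List[Tuple[str, int, int]]:
    """Group BIO labels into (entity_type, start, end_exclusive) spans.

    orphan_i controls an I- tag that does not continue an open entity:
      "start": treat it as the beginning of a new entity
      "drop":  discard it
    """
    if orphan_i not in ("start", "drop"):
        raise ValueError("orphan_i must be 'start' or 'drop'")

    spans: List[Tuple[str, int, int]] = []
    cur_type: Optional[str] = None
    cur_start = 0

    for i, label in enumerate(labels):
        prefix, _, ent_type = label.partition("-")
        if prefix == "I" and ent_type == cur_type:
            continue  # extends the currently open entity
        if cur_type is not None:
            spans.append((cur_type, cur_start, i))  # close the open entity
            cur_type = None
        if prefix == "B" or (prefix == "I" and orphan_i == "start"):
            cur_type, cur_start = ent_type, i

    if cur_type is not None:
        spans.append((cur_type, cur_start, len(labels)))
    return spans


well_formed = group_entities(["B-DEFENDER", "I-DEFENDER"])
orphan_kept = group_entities(["O", "I-DEFENDER", "O"], orphan_i="start")
orphan_dropped = group_entities(["O", "I-DEFENDER", "O"], orphan_i="drop")
```

With "start", the orphan tag yields a one-token DEFENDER entity; with "drop", it yields no entity at all. Other strategies, such as merging with a neighbouring entity of the same type, would fit the same interface.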

@masci
Contributor

masci commented Nov 2, 2022

@sjrl was this resolved by #3154 ?

@sjrl
Contributor

sjrl commented Nov 14, 2022

Hi @masci, PR #3154 partially resolves this issue. The PR did not add the training and evaluation of token classification models. I can edit the text of the main issue to better reflect the remaining tasks.

@masci masci added the P3 Low priority, leave it in the backlog label Apr 12, 2023
@masci masci unassigned sjrl Apr 12, 2023
@masci masci added the wontfix This will not be worked on label Feb 26, 2024
@masci masci closed this as completed Feb 26, 2024

3 participants