Extend support for token classification #2969

mathislucka · 2022-08-04T15:25:37Z

Is your feature request related to a problem? Please describe.
It would be great if support for token classification could be extended beyond what the Extractor currently offers. Specifically, we also need training and evaluation for token classification models. The node should also support splitting and aggregation of longer texts to work around the 512-token limit present in most language models.

Describe the solution you'd like
Extension / re-implementation of the Extractor node to support the additional features.

- training of token classification models
- evaluation of token classification models
- splitting and aggregation of longer texts
- update postprocessing to work with models using SentencePiece tokenizers
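
To make the "splitting and aggregation of longer texts" item more concrete, here is a minimal sketch of one possible approach: overlapping windows over the token sequence, with the highest-scoring prediction kept for tokens that appear in more than one window. The function names (`split_into_windows`, `classify_long`), the `predict_window` callable, and the `max_len` / `stride` defaults are all hypothetical and are not taken from the Extractor node or from any PR.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a model call that returns one (label, score) per input token.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def split_into_windows(n_tokens: int, max_len: int = 512, stride: int = 128) -> List[range]:
    """Return overlapping index ranges that together cover n_tokens tokens.

    Consecutive windows share `stride` tokens so that tokens near a window
    edge also appear with more context in a neighbouring window.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    if n_tokens == 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        windows.append(range(start, end))
        if end >= n_tokens:
            break
        start = end - stride  # step back so the next window overlaps this one
    return windows


def classify_long(
    tokens: Sequence[str],
    predict_window: Predictor,
    max_len: int = 512,
    stride: int = 128,
) -> List[str]:
    """Label every token, keeping the highest-scoring prediction per token."""
    best: List[Tuple[str, float]] = [("O", float("-inf"))] * len(tokens)
    for window in split_into_windows(len(tokens), max_len, stride):
        preds = predict_window([tokens[i] for i in window])
        for i, (label, score) in zip(window, preds):
            if score > best[i][1]:
                best[i] = (label, score)
    return [label for label, _ in best]
```

Keeping the highest-scoring label is only one aggregation choice; averaging scores across windows or preferring the window where the token sits farthest from an edge would be equally plausible alternatives.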

sjrl · 2022-08-05T13:45:52Z

Additionally, we want to consider different postprocessing strategies for combining the predicted labels. For example, the prediction ["B-DEFENDER", "I-DEFENDER"] will be combined into one entity, but what should be done with a prediction like ["O", "I-DEFENDER", "O"]?
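
As an illustration of the question above, the sketch below groups BIO tags into entity spans and exposes the handling of an "orphan" `I-` tag (one with no preceding `B-` or `I-` tag of the same type) as a switch. The function name `group_bio_tags` and the option values `"start"` and `"drop"` are hypothetical, chosen only for this example; they do not describe what the EntityExtractor actually does.

```python
from typing import List, Optional, Tuple


def group_bio_tags(tags: List[str], orphan_i: str = "start") -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end_exclusive) spans.

    orphan_i controls what happens to an I-X tag that does not continue
    an open X entity:
      "start" -> treat it as the beginning of a new X entity
      "drop"  -> ignore it, as if it were "O"
    """
    if orphan_i not in ("start", "drop"):
        raise ValueError("orphan_i must be 'start' or 'drop'")

    spans: List[Tuple[str, int, int]] = []
    current: Optional[Tuple[str, int]] = None  # (entity_type, start_index)

    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")

        # An I-X tag directly following an open X entity extends it.
        if prefix == "I" and current is not None and current[0] == etype:
            continue

        # Any other tag closes the entity that was open, if there was one.
        if current is not None:
            spans.append((current[0], current[1], i))
            current = None

        if prefix == "B" or (prefix == "I" and orphan_i == "start"):
            current = (etype, i)

    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return spans
```

With this sketch, `["B-DEFENDER", "I-DEFENDER"]` yields a single span under either option, while `["O", "I-DEFENDER", "O"]` yields one single-token span with `orphan_i="start"` and no span with `orphan_i="drop"`.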

masci · 2022-11-02T07:49:04Z

@sjrl was this resolved by #3154?

sjrl · 2022-11-14T15:48:37Z

Hi @masci, PR #3154 partially resolves this issue. The PR did not add the training and evaluation of token classification models. I can edit the text of the main issue to better reflect the remaining tasks.

mathislucka added the type:feature New feature or request label Aug 4, 2022

mathislucka assigned sjrl Aug 4, 2022

sjrl added the journey:advanced label Aug 4, 2022

This was referenced Sep 5, 2022

feat: Updated EntityExtractor to handle long texts and added better postprocessing #3154

Merged

EntityExtractor can't deal well with out-of-vocabulary words #1706

Closed

masci added the P3 Low priority, leave it in the backlog label Apr 12, 2023

masci unassigned sjrl Apr 12, 2023

masci removed the journey:advanced label Dec 6, 2023

masci added the wontfix This will not be worked on label Feb 26, 2024

masci closed this as completed Feb 26, 2024
