token-classification

The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, where we are actively working. This repository is actively maintained, and new features are continuously being added.

biases synthetic-dataset-generation layoutlm synthetic-dataset layoutxlm token-classification layoutlmv3 layoutlmv2 llms-benchmarking

Updated Sep 6, 2024
Python

aditeyabaral / maple

Star

Implementation of the paper, MAPLE - MAsking words to generate blackout Poetry using sequence-to-sequence LEarning, ICNLSP 2021

natural-language-processing transformers summarization sequence-labeling token-classification blackout-poetry

Updated Sep 30, 2022
Python

pha123661 / NTU-2022Fall-ADL

Star

Applied Deep Learning 深度學習之應用 by Vivian Chen 陳縕儂 at NTU CSIE

natural-language-processing reinforcement-learning deep-learning question-answering policy-gradient summarization seq2seq ntu sequence-labeling sequence-classification adl multimodal-deep-learning token-classification prompt-tuning parameter-efficient-tuning

Updated Feb 1, 2023
Python

Babelscape / ID10M

Star

Data and code for the paper "ID10M: Idiom Identification in 10 Languages" (NAACL 2022).

nlp machine-learning pytorch dataset multiword-expressions figurative-language multilinguality idiomatic-expressions token-classification

Updated Feb 1, 2023
Python

AlexKly / Detailed-NER-Dataset-RU

Star

Labeled Russian text token-by-token for training models for NER task based samples got from parsing different resources and generated by ChatGPT.

nlp dataset text-processing ner ner-task token-classification

Updated Jun 20, 2023
Python

aditeyabaral / maple-v2

Star

MAPLEv2 - Multi-task Approach for generating blackout Poetry with Linguistic Evaluation

transformers grammar-checker perplexity token-classification blackout-poetry

Updated Apr 2, 2023
Python

naivenlp / rapidnlp-datasets

Star

Data pipelines for both TensorFlow and PyTorch!

tensorflow keras pytorch question-answering sequence-classification dataset-loader simcse masked-language-models token-classification

Updated Feb 1, 2022
Python

dsashulya / biobert-distillation

Star

summer internship project @ JetBrains Research

transformers ner distillation biobert token-classification

Updated Aug 27, 2021
Python

TiagoSanti / LID-token-classification

Star

Scrap, token classification and model deployment for a selective process.

scraping transformers bert fine-tuning huggingface wandb bert-fine-tuning token-classification label-studio-ml-backend

Updated Aug 8, 2024
Python

matteo-stat / transformers-nlp-ner-token-classification

Star

This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines and optimizing token classification models for inference. They are based on my experience developing a custom chatbot, I’m sharing these in the hope they will help others to quickly fine-tune and use models in their projects! 😊