pytorch-Text_Classification

Dataset : UCC - Unhealthy Comment Corpus :

The goal of this practical project is to implement state-of-the-art NLP models in PyTorch to perform multi-label text classification on the high-quality UCC dataset. The dataset was published in 2020 in the paper "Six Attributes of Unhealthy Conversations".

The dataset contains over 40,000 healthy comments and fewer than 3,000 unhealthy comments. In addition to the binary healthy/unhealthy label, it annotates 6 unhealthy sub-attributes: (1) hostile, (2) insulting and trolling, (3) dismissive, ..., (6) unfair generalization. For some of these attributes, UCC was the first large, publicly available dataset to capture them.
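A single comment can carry several of these attributes at once, or none. Its labels are therefore naturally represented as a multi-hot vector with one 0/1 entry per attribute. The sketch below assumes a hypothetical list of label names and a hypothetical order. The README elides two of the six attribute names, so "attribute_4" is a placeholder and "sarcasm" is inferred only from the Results section. This is not the dataset's actual column layout.

```python
from typing import Iterable, List

# Hypothetical label names and order: the README lists only some of the six
# attributes, so these are placeholders, not the dataset's real column names.
LABELS: List[str] = [
    "hostile",
    "insulting_trolling",
    "dismissive",
    "attribute_4",  # placeholder: name elided in the README
    "sarcasm",      # inferred from the Results section; position is assumed
    "unfair_generalization",
]


def to_multi_hot(active: Iterable[str], labels: List[str] = LABELS) -> List[float]:
    """Encode the set of attributes present in one comment as a 0/1 vector."""
    active_set = set(active)
    unknown = active_set - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if name in active_set else 0.0 for name in labels]


# A comment may have several attributes (multi-label) or none at all.
print(to_multi_hot(["hostile", "sarcasm"]))  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
print(to_multi_hot([]))                      # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```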

Model training :

The original paper's main goal was to present the dataset; as a baseline, the authors trained a BERT model on the classification task. I fine-tuned BERT-base, T5 and RoBERTa models. RoBERTa achieved the best scores on the unhealthy labels.
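The README does not show the training code. A common setup for multi-label fine-tuning puts one output logit per label on top of the encoder. Each label is then trained as an independent binary problem with a sigmoid and binary cross-entropy. This setup is assumed here, not taken from roberta_ucc.py. In PyTorch that loss is torch.nn.BCEWithLogitsLoss. Hugging Face transformers uses the same loss for sequence-classification models configured with problem_type="multi_label_classification". The plain-Python sketch below spells out what the loss computes and how per-label predictions are thresholded. The helper names and the 0.5 threshold are illustrative assumptions.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logits: List[List[float]], targets: List[List[float]]) -> float:
    """Mean binary cross-entropy over every (example, label) pair.

    Stable form: max(x, 0) - x*y + log(1 + exp(-|x|)), the quantity that
    torch.nn.BCEWithLogitsLoss computes with its default reduction="mean".
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same number of rows")
    total, count = 0.0, 0
    for row_x, row_y in zip(logits, targets):
        if len(row_x) != len(row_y):
            raise ValueError("each row needs one logit per label")
        for x, y in zip(row_x, row_y):
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


def predict(logits: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Threshold each label's sigmoid independently (hypothetical 0.5 default)."""
    return [[1 if sigmoid(x) >= threshold else 0 for x in row] for row in logits]


# One comment, three labels: confident positive, confident negative, undecided.
logits = [[2.0, -1.0, 0.0]]
targets = [[1.0, 0.0, 0.0]]
print(round(bce_with_logits(logits, targets), 4))
print(predict(logits))  # [[1, 0, 1]]
```

Because each sigmoid is thresholded separately rather than taking a softmax argmax, a comment can be tagged with zero, one or several attributes. A per-label threshold tuned on validation data is a common refinement for rare labels.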

Results :

Since the original paper (2020) focused mainly on presenting the dataset, I reused the performance measure the authors reported and obtained better scores on all labels, even before any hyperparameter optimization.
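The README does not name the performance measure it shares with the original paper. The sketch below assumes it is ROC AUC computed separately for each label, which is not stated above. It computes AUC as the probability that a randomly chosen positive comment gets a higher score than a randomly chosen negative one, with ties counting as half. For binary labels this agrees with sklearn.metrics.roc_auc_score. The function names are hypothetical.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when a label has only one class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_label_auc(y_true: List[List[int]], scores: List[List[float]]) -> List[float]:
    """Score each label column separately, one AUC per attribute."""
    n_labels = len(y_true[0])
    return [
        roc_auc([row[j] for row in y_true], [row[j] for row in scores])
        for j in range(n_labels)
    ]


# Toy data: four comments, two labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.4], [0.8, 0.1], [0.1, 0.3]]
print(per_label_auc(y_true, scores))  # [1.0, 0.5]
```

An AUC of 0.5 means the scores rank positives and negatives no better than chance, as for the second label in this toy example. The pairwise loop is quadratic in the number of examples, so a rank-based implementation is preferable on the full dataset.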

The authors reported a score of 50% on the sarcasm label and discussed how difficult sarcasm is to detect. The fine-tuned RoBERTa model reached 75% on this label, again before any hyperparameter optimization.

jlacv/pytorch-Text_Classification
