Training improvement ideas

What else can be done to train a model that gives more accurate tag predictions?

Stratify data split
Force Tag (--forcetag 1), i.e. require at least one predicted tag per excerpt. Revisit
Model creativity as a new input: allow the model to suggest more tags, and change the metric so that missed tags are punished much more than extra tags (i.e. weight recall above precision)
Use a smaller max.length: this makes training faster, and may in principle also improve model quality by reducing the complexity of the training landscape
Tune the learning rate
Try other optimizers
Add extra info to excerpts (hazard, sector, etc.). Revisit
Add PER component description to training data
Use a GAN to generate synthetic training data
Predict Dimensions too and use them to update the choice of Subdimension
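
Two of the ideas above, forcing at least one tag and punishing missed tags more than extra tags, can be sketched in a few lines. This is only an illustration under assumptions, not the project's actual implementation: the function names, the per-tag score dictionary, the 0.5 threshold and the example tag names are all hypothetical, and how --forcetag is really implemented is not shown here. The metric uses F-beta with beta > 1, which is one common way to weight recall above precision.

```python
from typing import Dict, List, Set


def select_tags(scores: Dict[str, float], threshold: float = 0.5,
                force_tag: bool = True) -> Set[str]:
    """Return the tags whose score passes the threshold.

    With force_tag=True an excerpt never ends up with zero tags: if nothing
    passes the threshold, the single highest-scoring tag is returned instead.
    (Hypothetical helper; the score format and threshold are assumptions.)
    """
    chosen = {tag for tag, score in scores.items() if score >= threshold}
    if not chosen and force_tag and scores:
        chosen = {max(scores, key=scores.get)}
    return chosen


def fbeta(true_tags: List[Set[str]], pred_tags: List[Set[str]],
          beta: float = 2.0) -> float:
    """Micro-averaged F-beta over all excerpts.

    beta > 1 weights recall above precision, so a missed tag (false negative)
    costs more than an extra tag (false positive).
    """
    tp = sum(len(t & p) for t, p in zip(true_tags, pred_tags))
    fp = sum(len(p - t) for t, p in zip(true_tags, pred_tags))
    fn = sum(len(t - p) for t, p in zip(true_tags, pred_tags))
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # No score passes the threshold, so the best-scoring tag is forced.
    print(select_tags({"tag_a": 0.2, "tag_b": 0.4}))

    # One missed tag versus one extra tag, both scored with beta = 2.
    missed = fbeta([{"tag_a", "tag_b"}], [{"tag_a"}])
    extra = fbeta([{"tag_a"}], [{"tag_a", "tag_b"}])
    print(missed < extra)
```

With beta = 1 the two cases in the last example score the same, so the choice of beta is what controls how strongly missed tags are penalised relative to extra ones.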