dshantsev edited this page Jan 28, 2022 · 3 revisions

Training improvement ideas

What else can be done to train a model that gives more accurate tag predictions?

  • Stratify data split

  • Force Tag (--forcetag 1), i.e. require at least one predicted tag per excerpt. Revisit

  • Model creativity as a new input: allow the model to suggest more tags, and change the metric so that it punishes missed tags much more than extra tags

  • Use a smaller max.length: it makes training faster, and may in principle also improve model quality by reducing the complexity of the training landscape

  • Learning rate

  • Optimizer

  • Add extra info to excerpts (hazard, sector, etc.): Revisit

  • Add PER component description to training data

  • Use a GAN to generate synthetic data

  • Predict Dimensions too and use them to update the choice of Subdimension
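
Two of the ideas above, forcing at least one tag and a metric that punishes missed tags more than extra ones, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `select_tags` and `f_beta` are hypothetical names, the 0.5 threshold is a placeholder, and the sketch does not show how --forcetag is actually implemented in the training code. The metric used here is the standard F-beta score, which with beta = 2 weights a missed tag four times as heavily as an extra tag; it is one possible choice, not necessarily the one the project would adopt.

```python
from typing import Dict, List, Set


def select_tags(scores: Dict[str, float], threshold: float = 0.5,
                force_tag: bool = False) -> List[str]:
    """Return the tags whose score reaches the threshold (hypothetical helper).

    With force_tag=True, fall back to the single highest-scoring tag when
    nothing reaches the threshold, so an excerpt never ends up untagged.
    """
    chosen = [tag for tag, score in scores.items() if score >= threshold]
    if not chosen and force_tag and scores:
        chosen = [max(scores, key=scores.get)]
    return sorted(chosen)


def f_beta(true_tags: Set[str], predicted_tags: Set[str],
           beta: float = 2.0) -> float:
    """F-beta score for one excerpt; beta > 1 penalises missed tags more."""
    tp = len(true_tags & predicted_tags)   # correctly predicted tags
    fp = len(predicted_tags - true_tags)   # extra tags
    fn = len(true_tags - predicted_tags)   # missed tags
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumption: an excerpt with no true and no predicted tags scores 1.0.
    return (1 + b2) * tp / denom if denom else 1.0


if __name__ == "__main__":
    scores = {"hazard": 0.2, "sector": 0.4}
    print(select_tags(scores))                   # nothing reaches 0.5
    print(select_tags(scores, force_tag=True))   # best-scoring tag is kept
    print(f_beta({"a", "b"}, {"a"}))             # one missed tag
    print(f_beta({"a"}, {"a", "b"}))             # one extra tag
```

With beta = 2, the prediction that misses a tag scores lower than the one that adds an extra tag, which is the asymmetry the "model creativity" idea asks for.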
