Skip to content

Commit

Permalink
[DOCS] Documents that models with one allocations might have downtime.
Browse files Browse the repository at this point in the history
  • Loading branch information
szabosteve committed Oct 17, 2023
1 parent 0d20ee9 commit c4a4300
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,11 @@ nodes. Model allocations are independent units of work for NLP tasks. To
influence model performance, you can configure the number of allocations and the
number of threads used by each allocation of your deployment.

IMPORTANT: If your deployed trained model only has one allocation, it's very
likely that you will experience some downtime in the service your trained model
performs. You can reduce or eliminate downtime by adding more allocations to
your trained models.

Throughput can be scaled by adding more allocations to the deployment; it
increases the number of {infer} requests that can be performed in parallel. All
allocations assigned to a node share the same copy of the model in memory. The
Expand Down

0 comments on commit c4a4300

Please sign in to comment.