Commit

Update sentencetransformermodel.py
Signed-off-by: Thanawan Atchariyachanvanit <latchari@amazon.com>
thanawan-atc committed Aug 15, 2023
1 parent 1402bf4 commit f783502
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions opensearch_py_ml/ml_models/sentencetransformermodel.py
@@ -791,8 +791,9 @@ def save_as_pt(
         zip_file_name = str(model_id.split("/")[-1] + ".zip")
         zip_file_path = os.path.join(model_output_path, zip_file_name)
 
-        # handle undefined model_max_length in model's tokenizer (e.g. "intfloat/e5-small-v2" )
-        if model.tokenizer.model_max_length == 1000000000000000019884624838656:
+        # handle when model_max_length is unproperly defined in model's tokenizer (e.g. "intfloat/e5-small-v2")
+        # (See PR #219 and https://github.com/huggingface/transformers/issues/14561 for more context)
+        if model.tokenizer.model_max_length > model.get_max_seq_length():
             model.tokenizer.model_max_length = model.get_max_seq_length()
             print(
                 f"The model_max_length is not properly defined in tokenizer_config.json. Setting it to be {model.tokenizer.model_max_length}"
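The effect of this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: `FakeTokenizer`, `FakeModel`, and `clamp_model_max_length` are hypothetical stand-ins for the real tokenizer and SentenceTransformer objects, and only the attribute `model_max_length` and the method `get_max_seq_length()` are taken from the diff. The old literal `1000000000000000019884624838656` equals `int(1e30)`, which appears to be the placeholder a tokenizer reports when no maximum length is configured. Comparing with `>` instead of `==` also catches values that are too large but are not that placeholder.

```python
from dataclasses import dataclass

# The literal from the removed line. It equals int(1e30); treating it as a
# "no limit configured" placeholder is an assumption made for this sketch.
OLD_SENTINEL = 1000000000000000019884624838656


@dataclass
class FakeTokenizer:
    """Hypothetical stand-in exposing only the attribute used in the diff."""
    model_max_length: int


@dataclass
class FakeModel:
    """Hypothetical stand-in for the model object used in save_as_pt."""
    tokenizer: FakeTokenizer
    max_seq_length: int

    def get_max_seq_length(self) -> int:
        return self.max_seq_length


def clamp_model_max_length(model: FakeModel) -> bool:
    """Cap tokenizer.model_max_length at the model's max sequence length.

    Returns True when the tokenizer value was changed.
    """
    limit = model.get_max_seq_length()
    if model.tokenizer.model_max_length > limit:
        model.tokenizer.model_max_length = limit
        return True
    return False


# Placeholder value: caught by both the old and the new comparison.
unset = FakeModel(FakeTokenizer(OLD_SENTINEL), max_seq_length=512)
# Too large, but not the placeholder: only the new comparison catches it.
oversized = FakeModel(FakeTokenizer(1024), max_seq_length=512)
# Already within the limit: left untouched.
fine = FakeModel(FakeTokenizer(256), max_seq_length=512)

changed = [clamp_model_max_length(m) for m in (unset, oversized, fine)]
```

After running it, `unset` and `oversized` both report a `model_max_length` of 512, while `fine` keeps 256.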
