Duplicate definition of name (last_hidden_state) for float 16 onnx #16

Open

MrRace opened this issue May 7, 2022 · 0 comments
MrRace commented May 7, 2022

After training as follows:

# For compression with a replacement scheduler
export GLUE_DIR=glue_script/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /home/bert-base \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir result/ \
  --evaluate_during_training \
  --replacing_rate 0.3 \
  --scheduler_type linear \
  --scheduler_linear_k 0.0006

I then convert the resulting model to a Hugging Face checkpoint with convert_to_hf_ckpt.py and try to export it to ONNX:

output = torch.onnx.export(model,
                               org_dummy_input,
                               MODEL_ONNX_PATH,
                               verbose=True,
                               operator_export_type=OPERATOR_EXPORT_TYPE,
                               opset_version=12,
                               input_names=['input_ids', 'attention_mask', 'token_type_ids'], 
                               output_names=['last_hidden_state', 'pooler_output'],  
                               do_constant_folding=True,
                               dynamic_axes={"input_ids": {0: "batch_size", 1: 'seq_length'},
                                             "token_type_ids": {0: "batch_size", 1: 'seq_length'},
                                             "attention_mask": {0: "batch_size", 1: 'seq_length'},
                                             "pooler_output": {0: "batch_size"},
                                             "last_hidden_state": {0: "batch_size", 1: 'seq_length'}}
                               )
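
For reference, model and org_dummy_input in the snippet above are built roughly like this (a minimal sketch; the checkpoint path, dummy sentence, and OPERATOR_EXPORT_TYPE value here are illustrative placeholders, not my exact code):

import torch
from transformers import BertModel, BertTokenizer

# Load the checkpoint produced by convert_to_hf_ckpt.py (path is an example)
model = BertModel.from_pretrained("result/hf_ckpt")
model.eval()

# Build dummy inputs; the tuple order matches the input_names passed to export
tokenizer = BertTokenizer.from_pretrained("/home/bert-base")
enc = tokenizer("a dummy sentence", return_tensors="pt",
                padding="max_length", max_length=128)
org_dummy_input = (enc["input_ids"], enc["attention_mask"], enc["token_type_ids"])

MODEL_ONNX_PATH = "onnx/bert_fp32.onnx"
OPERATOR_EXPORT_TYPE = torch.onnx.OperatorExportTypes.ONNX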

The export produces a float32 ONNX model. I then try to convert it to float16 ONNX:

python3 -m onnxruntime_tools.transformers.optimizer --input  onnx/bert_fp32.onnx --output onnx/bert_fp16.onnx --float16

However, when I run inference with the float16 ONNX model, it fails with the following error:

  session = InferenceSession(model_file, providers=['CUDAExecutionProvider'])
  File "/usr/local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/model_compression/BERT-of-Theseus/onnx/bert_fp16.onnx failed:This is an invalid model. Error: Duplicate definition of name (last_hidden_state).

Have you tried converting the PyTorch model to float16 ONNX? Why does the half-precision conversion cause the "Duplicate definition of name (last_hidden_state)" error? By the way, the float32 ONNX model loads and runs successfully without errors.
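
In case it helps with debugging, here is a quick sketch (using the onnx Python package) that lists which names are defined more than once in the fp16 graph; the file path is the one from above:

import onnx
from collections import Counter

m = onnx.load("onnx/bert_fp16.onnx")

# Collect every name that defines a value in the graph: graph inputs,
# initializers, and node outputs. A name defined more than once is what
# triggers "Duplicate definition of name (...)".
names = [i.name for i in m.graph.input]
names += [init.name for init in m.graph.initializer]
for node in m.graph.node:
    names += [o for o in node.output if o]

dupes = [n for n, c in Counter(names).items() if c > 1]
print("duplicated names:", dupes)

# onnx.checker may flag the same SSA violation
onnx.checker.check_model(m)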
