
[Bug] --loadInputs not working: input name mismatch when Flatten is the input node #1990

Closed
lazycal opened this issue May 16, 2022 · 11 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

lazycal commented May 16, 2022

Description

I created a model with an input named i0. However, when I tried to pass an input to trtexec with --loadInputs=i0:id.bin to run the model, I got the following error: Cannot find input tensor with name "i0" in the engine bindings! Please make sure the input tensor names are correct. Below is the code snippet I used to create the model and the input:

import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import struct
N = 2

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N])

node = onnx.helper.make_node(
    'Flatten',
    inputs=['i0'],
    outputs=['o0'],
    axis=0,
)

graph_def = helper.make_graph(
    [node],        # nodes
    'test-model',      # name
    [i0],  # inputs
    [o0],               # outputs
)

# Create the model (ModelProto)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

print('The model is:\n{}'.format(model_def))
onnx.checker.check_model(model_def, full_check=True)
print('The model is checked!')
onnx.save(model_def, "output.onnx")

with open("id.bin", "wb") as f:
    for i in range(N):
        d = float(i)
        f.write(struct.pack('<f', d))

[screenshot of the trtexec error]

I tried adding more layers afterwards and the issue persists, but replacing the Flatten with another op does not trigger this error. For example, the code below replaces it with Unsqueeze and passes:

import numpy as np
import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import struct
N = 2

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N])
values = np.array([0], dtype=np.int64)

const = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['axes'],
    value=onnx.helper.make_tensor(
        name='const_tensor',
        data_type=onnx.TensorProto.INT64,
        dims=values.shape,
        vals=values,
    ),
)

# node = onnx.helper.make_node(
#     'Flatten',
#     inputs=['i0'],
#     outputs=['o0'],
#     axis=0,
# )
node = helper.make_node(
    'Unsqueeze',
    inputs=['i0', 'axes'],
    outputs=['o0'],
)

graph_def = helper.make_graph(
    [const, node],        # nodes
    'test-model',      # name
    [i0],  # inputs
    [o0],               # outputs
)

# Create the model (ModelProto)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

print('The model is:\n{}'.format(model_def))
onnx.checker.check_model(model_def, full_check=True)
print('The model is checked!')
onnx.save(model_def, "output.onnx")

with open("id.bin", "wb") as f:
    for i in range(N):
        d = float(i)
        f.write(struct.pack('<f', d))

P.S. PyTorch's ONNX exporter is buggy when handling Flatten, so I had to build the ONNX model manually with the onnx helper instead of exporting it from PyTorch. If this is worth investigating, it is probably best to reproduce with the manually built model rather than one exported from PyTorch.
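For context, a rough sketch of the PyTorch export path that was avoided here; FlattenModel, the file name, and the shapes are only illustrative, and the traced graph may not contain a literal Flatten node:

import torch

class FlattenModel(torch.nn.Module):
    def forward(self, x):
        # Mirrors ONNX Flatten(axis=0) on a 1-D input: the result has shape [1, N].
        return torch.flatten(x, start_dim=0).unsqueeze(0)

torch.onnx.export(FlattenModel(), (torch.zeros(2),), "output_pt.onnx",
                  input_names=["i0"], output_names=["o0"])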

Environment

TensorRT Version: 8.4.0.6
NVIDIA GPU: RTX2080
NVIDIA Driver Version: 495.29.05
CUDA Version: 11.5
CUDNN Version: 8.3.0
Operating System: Ubuntu 18.04.1
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): Container built atop nvcr.io/nvidia/tensorrt 21.11-py3

Relevant Files

output.onnx 3.zip

Steps To Reproduce

Run trtexec --onnx=output.onnx --loadInputs='i0':id.bin
The full output log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # trtexec --onnx=output.onnx --loadInputs=i0:id.bin
[05/16/2022-13:02:56] [I] === Model Options ===
[05/16/2022-13:02:56] [I] Format: ONNX
[05/16/2022-13:02:56] [I] Model: output.onnx
[05/16/2022-13:02:56] [I] Output:
[05/16/2022-13:02:56] [I] === Build Options ===
[05/16/2022-13:02:56] [I] Max batch: explicit batch
[05/16/2022-13:02:56] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/16/2022-13:02:56] [I] minTiming: 1
[05/16/2022-13:02:56] [I] avgTiming: 8
[05/16/2022-13:02:56] [I] Precision: FP32
[05/16/2022-13:02:56] [I] LayerPrecisions: 
[05/16/2022-13:02:56] [I] Calibration: 
[05/16/2022-13:02:56] [I] Refit: Disabled
[05/16/2022-13:02:56] [I] Sparsity: Disabled
[05/16/2022-13:02:56] [I] Safe mode: Disabled
[05/16/2022-13:02:56] [I] DirectIO mode: Disabled
[05/16/2022-13:02:56] [I] Restricted mode: Disabled
[05/16/2022-13:02:56] [I] Save engine: 
[05/16/2022-13:02:56] [I] Load engine: 
[05/16/2022-13:02:56] [I] Profiling verbosity: 0
[05/16/2022-13:02:56] [I] Tactic sources: Using default tactic sources
[05/16/2022-13:02:56] [I] timingCacheMode: local
[05/16/2022-13:02:56] [I] timingCacheFile: 
[05/16/2022-13:02:56] [I] Input(s)s format: fp32:CHW
[05/16/2022-13:02:56] [I] Output(s)s format: fp32:CHW
[05/16/2022-13:02:56] [I] Input build shapes: model
[05/16/2022-13:02:56] [I] Input calibration shapes: model
[05/16/2022-13:02:56] [I] === System Options ===
[05/16/2022-13:02:56] [I] Device: 0
[05/16/2022-13:02:56] [I] DLACore: 
[05/16/2022-13:02:56] [I] Plugins:
[05/16/2022-13:02:56] [I] === Inference Options ===
[05/16/2022-13:02:56] [I] Batch: Explicit
[05/16/2022-13:02:56] [I] Input inference shapes: model
[05/16/2022-13:02:56] [I] Iterations: 10
[05/16/2022-13:02:56] [I] Duration: 3s (+ 200ms warm up)
[05/16/2022-13:02:56] [I] Sleep time: 0ms
[05/16/2022-13:02:56] [I] Idle time: 0ms
[05/16/2022-13:02:56] [I] Streams: 1
[05/16/2022-13:02:56] [I] ExposeDMA: Disabled
[05/16/2022-13:02:56] [I] Data transfers: Enabled
[05/16/2022-13:02:56] [I] Spin-wait: Disabled
[05/16/2022-13:02:56] [I] Multithreading: Disabled
[05/16/2022-13:02:56] [I] CUDA Graph: Disabled
[05/16/2022-13:02:56] [I] Separate profiling: Disabled
[05/16/2022-13:02:56] [I] Time Deserialize: Disabled
[05/16/2022-13:02:56] [I] Time Refit: Disabled
[05/16/2022-13:02:56] [I] Skip inference: Disabled
[05/16/2022-13:02:56] [I] Inputs:
[05/16/2022-13:02:56] [I] i0<-id.bin
[05/16/2022-13:02:56] [I] === Reporting Options ===
[05/16/2022-13:02:56] [I] Verbose: Disabled
[05/16/2022-13:02:56] [I] Averages: 10 inferences
[05/16/2022-13:02:56] [I] Percentile: 99
[05/16/2022-13:02:56] [I] Dump refittable layers:Disabled
[05/16/2022-13:02:56] [I] Dump output: Disabled
[05/16/2022-13:02:56] [I] Profile: Disabled
[05/16/2022-13:02:56] [I] Export timing to JSON file: 
[05/16/2022-13:02:56] [I] Export output to JSON file: 
[05/16/2022-13:02:56] [I] Export profile to JSON file: 
[05/16/2022-13:02:56] [I] 
[05/16/2022-13:02:56] [I] === Device Information ===
[05/16/2022-13:02:56] [I] Selected Device: NVIDIA GeForce RTX 2080
[05/16/2022-13:02:56] [I] Compute Capability: 7.5
[05/16/2022-13:02:56] [I] SMs: 46
[05/16/2022-13:02:56] [I] Compute Clock Rate: 1.71 GHz
[05/16/2022-13:02:56] [I] Device Global Memory: 7982 MiB
[05/16/2022-13:02:56] [I] Shared Memory per SM: 64 KiB
[05/16/2022-13:02:56] [I] Memory Bus Width: 256 bits (ECC disabled)
[05/16/2022-13:02:56] [I] Memory Clock Rate: 7 GHz
[05/16/2022-13:02:56] [I] 
[05/16/2022-13:02:56] [I] TensorRT version: 8.4.0
[05/16/2022-13:02:56] [I] [TRT] [MemUsageChange] Init CUDA: CPU +314, GPU +0, now: CPU 322, GPU 263 (MiB)
[05/16/2022-13:02:57] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 342 MiB, GPU 263 MiB
[05/16/2022-13:02:57] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 569 MiB, GPU 337 MiB
[05/16/2022-13:02:57] [I] Start parsing network model
[05/16/2022-13:02:57] [I] [TRT] ----------------------------------------------------------------
[05/16/2022-13:02:57] [I] [TRT] Input filename:   output.onnx
[05/16/2022-13:02:57] [I] [TRT] ONNX IR version:  0.0.8
[05/16/2022-13:02:57] [I] [TRT] Opset version:    15
[05/16/2022-13:02:57] [I] [TRT] Producer name:    onnx-example
[05/16/2022-13:02:57] [I] [TRT] Producer version: 
[05/16/2022-13:02:57] [I] [TRT] Domain:           
[05/16/2022-13:02:57] [I] [TRT] Model version:    0
[05/16/2022-13:02:57] [I] [TRT] Doc string:       
[05/16/2022-13:02:57] [I] [TRT] ----------------------------------------------------------------
[05/16/2022-13:02:57] [I] Finish parsing network model
[05/16/2022-13:02:57] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +508, GPU +224, now: CPU 1077, GPU 561 (MiB)
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +114, GPU +52, now: CPU 1191, GPU 613 (MiB)
[05/16/2022-13:02:57] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.3.0
[05/16/2022-13:02:57] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/16/2022-13:02:57] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/16/2022-13:02:57] [I] [TRT] Total Host Persistent Memory: 0
[05/16/2022-13:02:57] [I] [TRT] Total Device Persistent Memory: 0
[05/16/2022-13:02:57] [I] [TRT] Total Scratch Memory: 0
[05/16/2022-13:02:57] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[05/16/2022-13:02:57] [I] [TRT] Total Activation Memory: 0
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1191, GPU 595 (MiB)
[05/16/2022-13:02:57] [I] [TRT] Loaded engine size: 0 MiB
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/16/2022-13:02:57] [I] Engine built in 1.54448 sec.
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/16/2022-13:02:57] [E] Cannot find input tensor with name "i0" in the engine bindings! Please make sure the input tensor names are correct.
[05/16/2022-13:02:57] [E] Invalid tensor names found in --loadInputs flag.
[05/16/2022-13:02:57] [E] Inference set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8400] # trtexec --onnx=output.onnx --loadInputs=i0:id.bin
nvpohanh self-assigned this May 17, 2022
nvpohanh added the "triaged" label May 17, 2022
nvpohanh (Collaborator) commented:

It looks like TRT has done some optimization around the Flatten op and the input tensor name has changed. Could you try the following?

  • trtexec --onnx=output.onnx --saveEngine=output.engine --buildOnly
  • Then, use the following Python script to get the input tensor names of the engine:
with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print(engine.get_binding_name(0)) # print out the input tensor name
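For reference, a minimal sketch (assuming the TensorRT 8.x Python bindings) that additionally lists every binding and whether it is an input or an output, which makes a renamed input easy to spot:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    # Print each binding's index, direction, name, and shape.
    for i in range(engine.num_bindings):
        direction = "input" if engine.binding_is_input(i) else "output"
        print(i, direction, engine.get_binding_name(i), engine.get_binding_shape(i))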

lazycal (Author) commented May 17, 2022

@nvpohanh Sure, I tried the script and its output is __o0.

lazycal (Author) commented May 17, 2022

The full output:

[05/16/2022-23:00:43] [TRT] [I] [MemUsageChange] Init CUDA: CPU +315, GPU +0, now: CPU 324, GPU 263 (MiB)
[05/16/2022-23:00:43] [TRT] [I] Loaded engine size: 0 MiB
[05/16/2022-23:00:43] [TRT] [V] Deserialization required 580 microseconds.
[05/16/2022-23:00:43] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
__o0

and the script:

import tensorrt as trt
logger = trt.Logger(trt.Logger.VERBOSE)

with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print(engine.get_binding_name(0))  # print out the input tensor name

nvpohanh (Collaborator) commented:

Yes. Could you now try: trtexec --onnx=output.onnx --loadInputs='__o0':id.bin?
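If it helps to confirm that the loaded values actually propagate, trtexec's --dumpOutput flag can be appended to print the inference results:

trtexec --onnx=output.onnx --loadInputs='__o0':id.bin --dumpOutput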

lazycal (Author) commented May 17, 2022

@nvpohanh Yes, that works. I just realized I may not have been clear: this issue is only intended to report the bug, and there is no urgency (both the Unsqueeze variant and your suggestion work as workarounds). Sorry if I didn't make that clear :-).

lazycal changed the title from "--loadInputs not working: input name not found" to "[Bug] --loadInputs not working: input name not found" May 17, 2022
lazycal changed the title from "[Bug] --loadInputs not working: input name not found" to "[Bug] --loadInputs not working: input name mismatch when Flatten is the input node" May 17, 2022
nvpohanh (Collaborator) commented:

I see. Thanks for the feedback. We can take a look at why the input tensor name is changed.

One question: does the input tensor name still change if the network contains more than one op, e.g. a normal ONNX file containing a whole network?

lazycal (Author) commented May 18, 2022

@nvpohanh Yes.
[screenshot]
For example, I added a ReLU and the input name still gets changed (to the ReLU's input tensor name). To reproduce, use the following code:

import numpy as np
import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import struct
N = 2

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N])
values = np.array([0], dtype=np.int64)

# Leftover from the Unsqueeze variant; this Constant node is not added to the graph below.
const = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['axes'],
    value=onnx.helper.make_tensor(
        name='const_tensor',
        data_type=onnx.TensorProto.INT64,
        dims=values.shape,
        vals=values,
    ),
)

node = onnx.helper.make_node(
    'Flatten',
    inputs=['i0'],
    outputs=['x'],
    axis=0,
)
# node = helper.make_node(
#     'Unsqueeze',
#     inputs=['i0', 'axes'],
#     outputs=['o0'],
# )
relu_node = helper.make_node(
    'Relu',
    inputs=['x'],
    outputs=['o0'],
)

graph_def = helper.make_graph(
    [node, relu_node],        # nodes
    'test-model',      # name
    [i0],  # inputs
    [o0],               # outputs
)


# Create the model (ModelProto)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

print('The model is:\n{}'.format(model_def))
onnx.checker.check_model(model_def, full_check=True)
print('The model is checked!')
onnx.save(model_def, "output.onnx")

with open("id.bin", "wb") as f:
    for i in range(N):
        d = float(i)
        f.write(struct.pack('<f', d))
from subprocess import check_call
check_call(
    'trtexec --onnx=output.onnx --saveEngine=output.engine --buildOnly', shell=True)

import tensorrt as trt
logger = trt.Logger(trt.Logger.VERBOSE)

with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print(engine.get_binding_name(0))  # print out the input tensor name

I also tried changing N and making i0 2-D, and the issue occurs in both cases.
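For completeness, a minimal sketch of the 2-D variant of the repro above (the extra dimension of 3 is arbitrary); with Flatten(axis=0) the output shape becomes [1, N * 3], and id.bin then needs N * 3 floats:

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N, 3])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N * 3])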

nvpohanh added the "bug" label May 19, 2022
nvpohanh (Collaborator) commented:

Internal bug tracking id: 3660435

We are investigating the issue. Will keep you updated. Thanks

nvpohanh (Collaborator) commented Jul 6, 2022

This has been fixed internally and will be available in the next TRT release. Thanks for reporting this issue!

lazycal closed this as completed Jul 6, 2022
ghost commented Dec 7, 2022

Hi, which version of TensorRT fixes this? I ran into a similar problem with TensorRT 8.5.1.7.

nvpohanh (Collaborator) commented Dec 7, 2022

This was supposed to be fixed in TRT 8.5.1.7. @zhangtaoshan could you share your ONNX file?
