
[Bug] --loadInputs not working: input name mismatch when Flatten is the input node #1990

Closed
lazycal opened this issue May 16, 2022 · 11 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

lazycal commented May 16, 2022

Description

I created a model with an input named i0. However, when I tried to pass an input to trtexec with --loadInputs=i0:id.bin to run the model, I got the following error: Cannot find input tensor with name "i0" in the engine bindings! Please make sure the input tensor names are correct. Below is the code snippet I used to create the model and the input:

import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import struct
N = 2

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N])

node = onnx.helper.make_node(
    'Flatten',
    inputs=['i0'],
    outputs=['o0'],
    axis=0,
)

graph_def = helper.make_graph(
    [node],        # nodes
    'test-model',      # name
    [i0],  # inputs
    [o0],               # outputs
)

# Create the model (ModelProto)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

print('The model is:\n{}'.format(model_def))
onnx.checker.check_model(model_def, full_check=True)
print('The model is checked!')
onnx.save(model_def, "output.onnx")

with open("id.bin", "wb") as f:
    for i in range(N):
        d = float(i)
        f.write(struct.pack('<f', d))

[screenshot of the trtexec error]

I tried adding more layers afterwards and the issue persists, but replacing the Flatten with another op does not trigger this error. For example, the code below replaces it with Unsqueeze and passes:

import numpy as np
import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import struct
N = 2

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N])
values = np.array([0], dtype=np.int64)

const = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['axes'],
    value=onnx.helper.make_tensor(
        name='const_tensor',
        data_type=onnx.TensorProto.INT64,
        dims=values.shape,
        vals=values,
    ),
)

# node = onnx.helper.make_node(
#     'Flatten',
#     inputs=['i0'],
#     outputs=['o0'],
#     axis=0,
# )
node = helper.make_node(
    'Unsqueeze',
    inputs=['i0', 'axes'],
    outputs=['o0'],
)

graph_def = helper.make_graph(
    [const, node],        # nodes
    'test-model',      # name
    [i0],  # inputs
    [o0],               # outputs
)

# Create the model (ModelProto)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

print('The model is:\n{}'.format(model_def))
onnx.checker.check_model(model_def, full_check=True)
print('The model is checked!')
onnx.save(model_def, "output.onnx")

with open("id.bin", "wb") as f:
    for i in range(N):
        d = float(i)
        f.write(struct.pack('<f', d))

P.S. PyTorch's ONNX exporter is buggy when handling Flatten, so I had to build the ONNX model manually with the onnx helper instead of exporting it from PyTorch. If this is worth investigating, it is probably best to reproduce with the manually built model rather than one exported from PyTorch.
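For context, a rough sketch of the PyTorch export path that was avoided here; FlattenModel, the file name, and the shapes are only illustrative, and the traced graph may not contain a literal Flatten node:

import torch

class FlattenModel(torch.nn.Module):
    def forward(self, x):
        # Mirrors ONNX Flatten(axis=0) on a 1-D input: the result has shape [1, N].
        return torch.flatten(x, start_dim=0).unsqueeze(0)

torch.onnx.export(FlattenModel(), (torch.zeros(2),), "output_pt.onnx",
                  input_names=["i0"], output_names=["o0"])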

Environment

TensorRT Version: 8.4.0.6
NVIDIA GPU: RTX2080
NVIDIA Driver Version: 495.29.05
CUDA Version: 11.5
CUDNN Version: 8.3.0
Operating System: Ubuntu 18.04.1
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): Container built atop nvcr.io/nvidia/tensorrt 21.11-py3

Relevant Files

output.onnx 3.zip

Steps To Reproduce

Run trtexec --onnx=output.onnx --loadInputs='i0':id.bin
The full output log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # trtexec --onnx=output.onnx --loadInputs=i0:id.bin
[05/16/2022-13:02:56] [I] === Model Options ===
[05/16/2022-13:02:56] [I] Format: ONNX
[05/16/2022-13:02:56] [I] Model: output.onnx
[05/16/2022-13:02:56] [I] Output:
[05/16/2022-13:02:56] [I] === Build Options ===
[05/16/2022-13:02:56] [I] Max batch: explicit batch
[05/16/2022-13:02:56] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/16/2022-13:02:56] [I] minTiming: 1
[05/16/2022-13:02:56] [I] avgTiming: 8
[05/16/2022-13:02:56] [I] Precision: FP32
[05/16/2022-13:02:56] [I] LayerPrecisions: 
[05/16/2022-13:02:56] [I] Calibration: 
[05/16/2022-13:02:56] [I] Refit: Disabled
[05/16/2022-13:02:56] [I] Sparsity: Disabled
[05/16/2022-13:02:56] [I] Safe mode: Disabled
[05/16/2022-13:02:56] [I] DirectIO mode: Disabled
[05/16/2022-13:02:56] [I] Restricted mode: Disabled
[05/16/2022-13:02:56] [I] Save engine: 
[05/16/2022-13:02:56] [I] Load engine: 
[05/16/2022-13:02:56] [I] Profiling verbosity: 0
[05/16/2022-13:02:56] [I] Tactic sources: Using default tactic sources
[05/16/2022-13:02:56] [I] timingCacheMode: local
[05/16/2022-13:02:56] [I] timingCacheFile: 
[05/16/2022-13:02:56] [I] Input(s)s format: fp32:CHW
[05/16/2022-13:02:56] [I] Output(s)s format: fp32:CHW
[05/16/2022-13:02:56] [I] Input build shapes: model
[05/16/2022-13:02:56] [I] Input calibration shapes: model
[05/16/2022-13:02:56] [I] === System Options ===
[05/16/2022-13:02:56] [I] Device: 0
[05/16/2022-13:02:56] [I] DLACore: 
[05/16/2022-13:02:56] [I] Plugins:
[05/16/2022-13:02:56] [I] === Inference Options ===
[05/16/2022-13:02:56] [I] Batch: Explicit
[05/16/2022-13:02:56] [I] Input inference shapes: model
[05/16/2022-13:02:56] [I] Iterations: 10
[05/16/2022-13:02:56] [I] Duration: 3s (+ 200ms warm up)
[05/16/2022-13:02:56] [I] Sleep time: 0ms
[05/16/2022-13:02:56] [I] Idle time: 0ms
[05/16/2022-13:02:56] [I] Streams: 1
[05/16/2022-13:02:56] [I] ExposeDMA: Disabled
[05/16/2022-13:02:56] [I] Data transfers: Enabled
[05/16/2022-13:02:56] [I] Spin-wait: Disabled
[05/16/2022-13:02:56] [I] Multithreading: Disabled
[05/16/2022-13:02:56] [I] CUDA Graph: Disabled
[05/16/2022-13:02:56] [I] Separate profiling: Disabled
[05/16/2022-13:02:56] [I] Time Deserialize: Disabled
[05/16/2022-13:02:56] [I] Time Refit: Disabled
[05/16/2022-13:02:56] [I] Skip inference: Disabled
[05/16/2022-13:02:56] [I] Inputs:
[05/16/2022-13:02:56] [I] i0<-id.bin
[05/16/2022-13:02:56] [I] === Reporting Options ===
[05/16/2022-13:02:56] [I] Verbose: Disabled
[05/16/2022-13:02:56] [I] Averages: 10 inferences
[05/16/2022-13:02:56] [I] Percentile: 99
[05/16/2022-13:02:56] [I] Dump refittable layers:Disabled
[05/16/2022-13:02:56] [I] Dump output: Disabled
[05/16/2022-13:02:56] [I] Profile: Disabled
[05/16/2022-13:02:56] [I] Export timing to JSON file: 
[05/16/2022-13:02:56] [I] Export output to JSON file: 
[05/16/2022-13:02:56] [I] Export profile to JSON file: 
[05/16/2022-13:02:56] [I] 
[05/16/2022-13:02:56] [I] === Device Information ===
[05/16/2022-13:02:56] [I] Selected Device: NVIDIA GeForce RTX 2080
[05/16/2022-13:02:56] [I] Compute Capability: 7.5
[05/16/2022-13:02:56] [I] SMs: 46
[05/16/2022-13:02:56] [I] Compute Clock Rate: 1.71 GHz
[05/16/2022-13:02:56] [I] Device Global Memory: 7982 MiB
[05/16/2022-13:02:56] [I] Shared Memory per SM: 64 KiB
[05/16/2022-13:02:56] [I] Memory Bus Width: 256 bits (ECC disabled)
[05/16/2022-13:02:56] [I] Memory Clock Rate: 7 GHz
[05/16/2022-13:02:56] [I] 
[05/16/2022-13:02:56] [I] TensorRT version: 8.4.0
[05/16/2022-13:02:56] [I] [TRT] [MemUsageChange] Init CUDA: CPU +314, GPU +0, now: CPU 322, GPU 263 (MiB)
[05/16/2022-13:02:57] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 342 MiB, GPU 263 MiB
[05/16/2022-13:02:57] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 569 MiB, GPU 337 MiB
[05/16/2022-13:02:57] [I] Start parsing network model
[05/16/2022-13:02:57] [I] [TRT] ----------------------------------------------------------------
[05/16/2022-13:02:57] [I] [TRT] Input filename:   output.onnx
[05/16/2022-13:02:57] [I] [TRT] ONNX IR version:  0.0.8
[05/16/2022-13:02:57] [I] [TRT] Opset version:    15
[05/16/2022-13:02:57] [I] [TRT] Producer name:    onnx-example
[05/16/2022-13:02:57] [I] [TRT] Producer version: 
[05/16/2022-13:02:57] [I] [TRT] Domain:           
[05/16/2022-13:02:57] [I] [TRT] Model version:    0
[05/16/2022-13:02:57] [I] [TRT] Doc string:       
[05/16/2022-13:02:57] [I] [TRT] ----------------------------------------------------------------
[05/16/2022-13:02:57] [I] Finish parsing network model
[05/16/2022-13:02:57] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +508, GPU +224, now: CPU 1077, GPU 561 (MiB)
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +114, GPU +52, now: CPU 1191, GPU 613 (MiB)
[05/16/2022-13:02:57] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.3.0
[05/16/2022-13:02:57] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/16/2022-13:02:57] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/16/2022-13:02:57] [I] [TRT] Total Host Persistent Memory: 0
[05/16/2022-13:02:57] [I] [TRT] Total Device Persistent Memory: 0
[05/16/2022-13:02:57] [I] [TRT] Total Scratch Memory: 0
[05/16/2022-13:02:57] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[05/16/2022-13:02:57] [I] [TRT] Total Activation Memory: 0
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1191, GPU 595 (MiB)
[05/16/2022-13:02:57] [I] [TRT] Loaded engine size: 0 MiB
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/16/2022-13:02:57] [I] Engine built in 1.54448 sec.
[05/16/2022-13:02:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/16/2022-13:02:57] [E] Cannot find input tensor with name "i0" in the engine bindings! Please make sure the input tensor names are correct.
[05/16/2022-13:02:57] [E] Invalid tensor names found in --loadInputs flag.
[05/16/2022-13:02:57] [E] Inference set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8400] # trtexec --onnx=output.onnx --loadInputs=i0:id.bin
nvpohanh self-assigned this May 17, 2022
nvpohanh added the "triaged" label May 17, 2022
nvpohanh (Collaborator) commented:

It looks like TRT has done some optimization around the Flatten op and the input tensor name has changed. Could you try the following?

  • trtexec --onnx=output.onnx --saveEngine=output.engine --buildOnly
  • Then, use the following Python script to get the input tensor names of the engine:
with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print(engine.get_binding_name(0)) # print out the input tensor name
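For reference, a minimal sketch (assuming the TensorRT 8.x Python bindings) that additionally lists every binding and whether it is an input or an output, which makes a renamed input easy to spot:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    # Print each binding's index, direction, name, and shape.
    for i in range(engine.num_bindings):
        direction = "input" if engine.binding_is_input(i) else "output"
        print(i, direction, engine.get_binding_name(i), engine.get_binding_shape(i))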

lazycal (Author) commented May 17, 2022

@nvpohanh Sure, I tried the script and its output is __o0.

lazycal (Author) commented May 17, 2022

The full output:

[05/16/2022-23:00:43] [TRT] [I] [MemUsageChange] Init CUDA: CPU +315, GPU +0, now: CPU 324, GPU 263 (MiB)
[05/16/2022-23:00:43] [TRT] [I] Loaded engine size: 0 MiB
[05/16/2022-23:00:43] [TRT] [V] Deserialization required 580 microseconds.
[05/16/2022-23:00:43] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
__o0

and the script:

import tensorrt as trt
logger = trt.Logger(trt.Logger.VERBOSE)

with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print(engine.get_binding_name(0))  # print out the input tensor name

nvpohanh (Collaborator) commented:

Yes. Could you now try: trtexec --onnx=output.onnx --loadInputs='__o0':id.bin?
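If it helps to confirm that the loaded values actually propagate, trtexec's --dumpOutput flag can be appended to print the inference results:

trtexec --onnx=output.onnx --loadInputs='__o0':id.bin --dumpOutput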

lazycal (Author) commented May 17, 2022

@nvpohanh Yes, that works. I just realized I may not have been clear: this issue is only intended to report the bug, and there is no urgency (both the Unsqueeze variant and your suggestion work as workarounds). Sorry if I didn't make that clear :-).

lazycal changed the title from "--loadInputs not working: input name not found" to "[Bug] --loadInputs not working: input name not found" May 17, 2022
lazycal changed the title from "[Bug] --loadInputs not working: input name not found" to "[Bug] --loadInputs not working: input name mismatch when Flatten is the input node" May 17, 2022
nvpohanh (Collaborator) commented:

I see. Thanks for the feedback. We can take a look at why the input tensor name is changed.

One question: does the input tensor name still change if the network contains more than one op, e.g. a normal ONNX file containing a whole network?

lazycal (Author) commented May 18, 2022

@nvpohanh Yes.
[screenshot]
For example, I added a ReLU and the input name still gets changed (to the ReLU's input tensor name). To reproduce, use the following code:

import numpy as np
import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import struct
N = 2

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N])
values = np.array([0], dtype=np.int64)

# Leftover from the Unsqueeze variant; this Constant node is not added to the graph below.
const = onnx.helper.make_node(
    'Constant',
    inputs=[],
    outputs=['axes'],
    value=onnx.helper.make_tensor(
        name='const_tensor',
        data_type=onnx.TensorProto.INT64,
        dims=values.shape,
        vals=values,
    ),
)

node = onnx.helper.make_node(
    'Flatten',
    inputs=['i0'],
    outputs=['x'],
    axis=0,
)
# node = helper.make_node(
#     'Unsqueeze',
#     inputs=['i0', 'axes'],
#     outputs=['o0'],
# )
relu_node = helper.make_node(
    'Relu',
    inputs=['x'],
    outputs=['o0'],
)

graph_def = helper.make_graph(
    [node, relu_node],        # nodes
    'test-model',      # name
    [i0],  # inputs
    [o0],               # outputs
)


# Create the model (ModelProto)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

print('The model is:\n{}'.format(model_def))
onnx.checker.check_model(model_def, full_check=True)
print('The model is checked!')
onnx.save(model_def, "output.onnx")

with open("id.bin", "wb") as f:
    for i in range(N):
        d = float(i)
        f.write(struct.pack('<f', d))
from subprocess import check_call
check_call(
    'trtexec --onnx=output.onnx --saveEngine=output.engine --buildOnly', shell=True)

import tensorrt as trt
logger = trt.Logger(trt.Logger.VERBOSE)

with open("output.engine", "rb") as f:
    serialized_engine = f.read()

with trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print(engine.get_binding_name(0))  # print out the input tensor name

I also tried changing N and making i0 2-D, and the issue occurs in both cases.
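For completeness, a minimal sketch of the 2-D variant of the repro above (the extra dimension of 3 is arbitrary); with Flatten(axis=0) the output shape becomes [1, N * 3], and id.bin then needs N * 3 floats:

i0 = helper.make_tensor_value_info('i0', TensorProto.FLOAT, [N, 3])
o0 = helper.make_tensor_value_info('o0', TensorProto.FLOAT, [1, N * 3])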

nvpohanh added the "bug" label May 19, 2022
nvpohanh (Collaborator) commented:

Internal bug tracking id: 3660435

We are investigating the issue. Will keep you updated. Thanks

nvpohanh (Collaborator) commented Jul 6, 2022

This has been fixed internally and will be available in the next TRT release. Thanks for reporting this issue!

lazycal closed this as completed Jul 6, 2022
ghost commented Dec 7, 2022

Hi, which version of TensorRT fixes this? I ran into a similar problem with TensorRT 8.5.1.7.

nvpohanh (Collaborator) commented Dec 7, 2022

This was supposed to be fixed in TRT 8.5.1.7. @zhangtaoshan could you share your ONNX file?
