Segfault on const+prelu+reduce_mean+comparison_op #1738

Closed
lazycal opened this issue Jan 17, 2022 · 4 comments
Labels
bug Something isn't working Myelin triaged Issue has been triaged by maintainers

Comments

lazycal commented Jan 17, 2022

Description

[Image: graph of the model]

Running `trtexec --onnx=./output.onnx --verbose` on the model above causes a segfault. The model file is attached, and it can also be generated with PyTorch:

```python
import numpy as np
import onnx
import onnx.checker
import torch

# This model takes no graph inputs; the commented entries show the input format.
inputs = {
    # 'i1': np.random.rand(1, 1, 1, 1).astype(np.float32),
    # 'i2': np.random.rand(1, 1, 1, 1).astype(np.float32),
}


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.prelu = torch.nn.PReLU()
        # Frozen (1, 1, 1, 1) parameter; exported as the ONNX initializer `c`.
        self.c = torch.nn.Parameter(
            torch.rand(1, 1, 1, 1), requires_grad=False)

    @torch.no_grad()
    def forward(self):
        o10 = self.prelu(self.c)   # PRelu on a constant
        o10 = o10.mean(dim=0)      # ReduceMean over axis 0 -> (1, 1, 1)
        return (o10 > 0,)          # Greater against a scalar constant


model = Model()
model.eval()
torch_inp = {k: torch.from_numpy(v) for k, v in inputs.items()}
torch.onnx.export(
    model, tuple(torch_inp.values()),
    "output.onnx", input_names=list(inputs.keys()),
    verbose=False, opset_version=10)

onnx_model = onnx.load("output.onnx")
onnx.checker.check_model(onnx_model)
```
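
For comparison once an engine does build, here is a minimal pure-Python sketch of what the first exported graph should compute. It assumes PyTorch's default `PReLU` slope of 0.25 and uses the shapes reported in the verbose log (`c` is `(1, 1, 1, 1)`, the `ReduceMean` output is `(1, 1, 1)`). The helper names are hypothetical and not part of ONNX, PyTorch, or TensorRT.

```python
# Hypothetical reference for the first graph: PRelu -> ReduceMean(axis=0) -> Greater(0).
# Tensors are flat row-major lists paired with a shape tuple.
from math import prod


def prelu(xs, slope):
    # ONNX PRelu: x for x >= 0, slope * x otherwise (one shared slope here).
    return [x if x >= 0 else slope * x for x in xs]


def reduce_mean_axis0(xs, shape):
    # Mean over the leading axis with keepdims=0, so (d0, *rest) -> (*rest).
    d0, rest = shape[0], prod(shape[1:])
    out = [sum(xs[i * rest + j] for i in range(d0)) / d0 for j in range(rest)]
    return out, tuple(shape[1:])


def greater_than_zero(xs):
    # Greater against the scalar constant 0.0, applied element-wise.
    return [x > 0.0 for x in xs]


def reference_model(c, shape=(1, 1, 1, 1), slope=0.25):
    # slope=0.25 assumes PyTorch's default PReLU initialization.
    mean, out_shape = reduce_mean_axis0(prelu(c, slope), shape)
    return greater_than_zero(mean), out_shape


print(reference_model([0.37]))   # ([True], (1, 1, 1))
print(reference_model([-0.4]))   # ([False], (1, 1, 1))
```

Because `c` comes from `torch.rand` (values in [0, 1)), PReLU acts as the identity here, so the expected output is `True` unless `c` is exactly zero.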

Sorry for the odd-looking model; it is the smallest network I could construct that triggers the bug. After trying a few combinations, the trigger appears to be a constant fed through PReLU and ReduceMean, with a comparison op somewhere in the graph. For instance, this variant also segfaults:

[Image: graph of the variant model]

The PyTorch code for this variant:

```python
import numpy as np
import onnx
import onnx.checker
import torch

inputs = {
    'i1': np.random.rand(1, 1, 1, 1).astype(np.float32),
    # 'i2': np.random.rand(1, 1, 1, 1).astype(np.float32),
}


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.prelu = torch.nn.PReLU()
        # Frozen (1, 1, 1, 1) parameter; exported as an ONNX initializer.
        self.c = torch.nn.Parameter(
            torch.rand(1, 1, 1, 1), requires_grad=False)

    @torch.no_grad()
    def forward(self, i1):
        o10 = self.prelu(self.c)   # PRelu on a constant
        o10 = o10.mean(dim=0)      # ReduceMean over axis 0 -> (1, 1, 1)
        # The comparison is on the graph input i1, not on the ReduceMean output;
        # the bool mask is promoted to float and broadcast against o10.
        return (i1 > 0) * o10


model = Model()
model.eval()
torch_inp = {k: torch.from_numpy(v) for k, v in inputs.items()}
torch.onnx.export(
    model, tuple(torch_inp.values()),
    "output.onnx", input_names=list(inputs.keys()),
    verbose=False, opset_version=10)

onnx_model = onnx.load("output.onnx")
onnx.checker.check_model(onnx_model)
```
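
Until a fixed TensorRT release is available, it may help to screen exported models for the combination reported here. The sketch below is a hypothetical heuristic, not a TensorRT or ONNX API: it flags a graph when a `PRelu` whose data input is constant (an initializer or a `Constant` output) feeds a `ReduceMean`, and the graph also contains a comparison op anywhere. The op-type list and the matching rule are assumptions drawn only from the two reproducers above.

```python
# Hypothetical screening heuristic for the reported trigger:
# constant -> PRelu -> ReduceMean, plus any comparison op in the graph.
# Nodes are (op_type, inputs, outputs) tuples; initializers is a set of names.

COMPARISON_OPS = {"Greater", "Less", "Equal", "GreaterOrEqual", "LessOrEqual"}


def has_reported_pattern(nodes, initializers):
    # Outputs of Constant nodes count as constants, like initializers.
    constants = set(initializers)
    for op, _, outputs in nodes:
        if op == "Constant":
            constants.update(outputs)

    # Outputs of PRelu nodes whose data input (input 0) is constant.
    const_prelu_outs = set()
    for op, inputs, outputs in nodes:
        if op == "PRelu" and inputs and inputs[0] in constants:
            const_prelu_outs.update(outputs)

    feeds_reduce_mean = any(
        op == "ReduceMean" and inputs and inputs[0] in const_prelu_outs
        for op, inputs, _ in nodes
    )
    has_comparison = any(op in COMPARISON_OPS for op, _, _ in nodes)
    return feeds_reduce_mean and has_comparison


# The first model, as reported by the ONNX parser in the verbose log.
first_model = [
    ("PRelu", ["c", "7"], ["3"]),
    ("ReduceMean", ["3"], ["4"]),
    ("Constant", [], ["5"]),
    ("Greater", ["4", "5"], ["6"]),
]
print(has_reported_pattern(first_model, {"c", "7"}))  # True
```

With the `onnx` package, the inputs can be built from a loaded model `m` as `[(n.op_type, list(n.input), list(n.output)) for n in m.graph.node]` and `{t.name for t in m.graph.initializer}`.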

Environment

TensorRT Version: 8.2.1.8
NVIDIA GPU: RTX 2080
NVIDIA Driver Version: 495.29.05
CUDA Version: 11.5
CUDNN Version: 8.3.0
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10.1
Baremetal or Container (if so, version): Container

Relevant Files

Archive.zip

Steps To Reproduce

Run `trtexec --onnx=./output.onnx --verbose`, where `output.onnx` is the model from the attached archive. The output for the first model is pasted below:

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=./output.onnx --verbose
[01/17/2022-21:44:15] [I] === Model Options ===
[01/17/2022-21:44:15] [I] Format: ONNX
[01/17/2022-21:44:15] [I] Model: ./output.onnx
[01/17/2022-21:44:15] [I] Output:
[01/17/2022-21:44:15] [I] === Build Options ===
[01/17/2022-21:44:15] [I] Max batch: explicit batch
[01/17/2022-21:44:15] [I] Workspace: 16 MiB
[01/17/2022-21:44:15] [I] minTiming: 1
[01/17/2022-21:44:15] [I] avgTiming: 8
[01/17/2022-21:44:15] [I] Precision: FP32
[01/17/2022-21:44:15] [I] Calibration: 
[01/17/2022-21:44:15] [I] Refit: Disabled
[01/17/2022-21:44:15] [I] Sparsity: Disabled
[01/17/2022-21:44:15] [I] Safe mode: Disabled
[01/17/2022-21:44:15] [I] DirectIO mode: Disabled
[01/17/2022-21:44:15] [I] Restricted mode: Disabled
[01/17/2022-21:44:15] [I] Save engine: 
[01/17/2022-21:44:15] [I] Load engine: 
[01/17/2022-21:44:15] [I] Profiling verbosity: 0
[01/17/2022-21:44:15] [I] Tactic sources: Using default tactic sources
[01/17/2022-21:44:15] [I] timingCacheMode: local
[01/17/2022-21:44:15] [I] timingCacheFile: 
[01/17/2022-21:44:15] [I] Input(s)s format: fp32:CHW
[01/17/2022-21:44:15] [I] Output(s)s format: fp32:CHW
[01/17/2022-21:44:15] [I] Input build shapes: model
[01/17/2022-21:44:15] [I] Input calibration shapes: model
[01/17/2022-21:44:15] [I] === System Options ===
[01/17/2022-21:44:15] [I] Device: 0
[01/17/2022-21:44:15] [I] DLACore: 
[01/17/2022-21:44:15] [I] Plugins:
[01/17/2022-21:44:15] [I] === Inference Options ===
[01/17/2022-21:44:15] [I] Batch: Explicit
[01/17/2022-21:44:15] [I] Input inference shapes: model
[01/17/2022-21:44:15] [I] Iterations: 10
[01/17/2022-21:44:15] [I] Duration: 3s (+ 200ms warm up)
[01/17/2022-21:44:15] [I] Sleep time: 0ms
[01/17/2022-21:44:15] [I] Idle time: 0ms
[01/17/2022-21:44:15] [I] Streams: 1
[01/17/2022-21:44:15] [I] ExposeDMA: Disabled
[01/17/2022-21:44:15] [I] Data transfers: Enabled
[01/17/2022-21:44:15] [I] Spin-wait: Disabled
[01/17/2022-21:44:15] [I] Multithreading: Disabled
[01/17/2022-21:44:15] [I] CUDA Graph: Disabled
[01/17/2022-21:44:15] [I] Separate profiling: Disabled
[01/17/2022-21:44:15] [I] Time Deserialize: Disabled
[01/17/2022-21:44:15] [I] Time Refit: Disabled
[01/17/2022-21:44:15] [I] Skip inference: Disabled
[01/17/2022-21:44:15] [I] Inputs:
[01/17/2022-21:44:15] [I] === Reporting Options ===
[01/17/2022-21:44:15] [I] Verbose: Enabled
[01/17/2022-21:44:15] [I] Averages: 10 inferences
[01/17/2022-21:44:15] [I] Percentile: 99
[01/17/2022-21:44:15] [I] Dump refittable layers:Disabled
[01/17/2022-21:44:15] [I] Dump output: Disabled
[01/17/2022-21:44:15] [I] Profile: Disabled
[01/17/2022-21:44:15] [I] Export timing to JSON file: 
[01/17/2022-21:44:15] [I] Export output to JSON file: 
[01/17/2022-21:44:15] [I] Export profile to JSON file: 
[01/17/2022-21:44:15] [I] 
[01/17/2022-21:44:15] [I] === Device Information ===
[01/17/2022-21:44:15] [I] Selected Device: NVIDIA GeForce RTX 2080
[01/17/2022-21:44:15] [I] Compute Capability: 7.5
[01/17/2022-21:44:15] [I] SMs: 46
[01/17/2022-21:44:15] [I] Compute Clock Rate: 1.71 GHz
[01/17/2022-21:44:15] [I] Device Global Memory: 7982 MiB
[01/17/2022-21:44:15] [I] Shared Memory per SM: 64 KiB
[01/17/2022-21:44:15] [I] Memory Bus Width: 256 bits (ECC disabled)
[01/17/2022-21:44:15] [I] Memory Clock Rate: 7 GHz
[01/17/2022-21:44:15] [I] 
[01/17/2022-21:44:15] [I] TensorRT version: 8.2.1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::ScatterND version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::EfficientNMS_TFTRT_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::Proposal version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::Split version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[01/17/2022-21:44:15] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[01/17/2022-21:44:15] [I] [TRT] [MemUsageChange] Init CUDA: CPU +321, GPU +0, now: CPU 333, GPU 265 (MiB)
[01/17/2022-21:44:16] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 333 MiB, GPU 265 MiB
[01/17/2022-21:44:16] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 468 MiB, GPU 299 MiB
[01/17/2022-21:44:16] [I] Start parsing network model
[01/17/2022-21:44:16] [I] [TRT] ----------------------------------------------------------------
[01/17/2022-21:44:16] [I] [TRT] Input filename:   ./output.onnx
[01/17/2022-21:44:16] [I] [TRT] ONNX IR version:  0.0.6
[01/17/2022-21:44:16] [I] [TRT] Opset version:    10
[01/17/2022-21:44:16] [I] [TRT] Producer name:    pytorch
[01/17/2022-21:44:16] [I] [TRT] Producer version: 1.9
[01/17/2022-21:44:16] [I] [TRT] Domain:           
[01/17/2022-21:44:16] [I] [TRT] Model version:    0
[01/17/2022-21:44:16] [I] [TRT] Doc string:       
[01/17/2022-21:44:16] [I] [TRT] ----------------------------------------------------------------
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::GridAnchorRect_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::ScatterND version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::EfficientNMS_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::EfficientNMS_TFTRT_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::Split version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[01/17/2022-21:44:16] [V] [TRT] Importing initializer: c
[01/17/2022-21:44:16] [V] [TRT] Importing initializer: 7
[01/17/2022-21:44:16] [V] [TRT] Parsing node: PRelu_0 [PRelu]
[01/17/2022-21:44:16] [V] [TRT] Searching for input: c
[01/17/2022-21:44:16] [V] [TRT] Searching for input: 7
[01/17/2022-21:44:16] [V] [TRT] PRelu_0 [PRelu] inputs: [c -> (1, 1, 1, 1)[FLOAT]], [7 -> (1, 1, 1)[FLOAT]], 
[01/17/2022-21:44:16] [V] [TRT] Registering layer: c for ONNX node: c
[01/17/2022-21:44:16] [V] [TRT] Registering layer: 7 for ONNX node: 7
[01/17/2022-21:44:16] [V] [TRT] Registering layer: PRelu_0 for ONNX node: PRelu_0
[01/17/2022-21:44:16] [V] [TRT] Registering tensor: 3 for ONNX tensor: 3
[01/17/2022-21:44:16] [V] [TRT] PRelu_0 [PRelu] outputs: [3 -> (1, 1, 1, 1)[FLOAT]], 
[01/17/2022-21:44:16] [V] [TRT] Parsing node: ReduceMean_1 [ReduceMean]
[01/17/2022-21:44:16] [V] [TRT] Searching for input: 3
[01/17/2022-21:44:16] [V] [TRT] ReduceMean_1 [ReduceMean] inputs: [3 -> (1, 1, 1, 1)[FLOAT]], 
[01/17/2022-21:44:16] [V] [TRT] Registering layer: ReduceMean_1 for ONNX node: ReduceMean_1
[01/17/2022-21:44:16] [V] [TRT] Registering tensor: 4 for ONNX tensor: 4
[01/17/2022-21:44:16] [V] [TRT] ReduceMean_1 [ReduceMean] outputs: [4 -> (1, 1, 1)[FLOAT]], 
[01/17/2022-21:44:16] [V] [TRT] Parsing node: Constant_2 [Constant]
[01/17/2022-21:44:16] [V] [TRT] Constant_2 [Constant] inputs: 
[01/17/2022-21:44:16] [V] [TRT] Constant_2 [Constant] outputs: [5 -> ()[FLOAT]], 
[01/17/2022-21:44:16] [V] [TRT] Parsing node: Greater_3 [Greater]
[01/17/2022-21:44:16] [V] [TRT] Searching for input: 4
[01/17/2022-21:44:16] [V] [TRT] Searching for input: 5
[01/17/2022-21:44:16] [V] [TRT] Greater_3 [Greater] inputs: [4 -> (1, 1, 1)[FLOAT]], [5 -> ()[FLOAT]], 
[01/17/2022-21:44:16] [V] [TRT] Registering layer: 5 for ONNX node: 5
[01/17/2022-21:44:16] [V] [TRT] Registering layer: Greater_3 for ONNX node: Greater_3
[01/17/2022-21:44:16] [V] [TRT] Registering tensor: 6_0 for ONNX tensor: 6
[01/17/2022-21:44:16] [V] [TRT] Greater_3 [Greater] outputs: [6 -> (1, 1, 1)[BOOL]], 
[01/17/2022-21:44:16] [V] [TRT] Marking 6_0 as output: 6
[01/17/2022-21:44:16] [I] Finish parsing network model
[01/17/2022-21:44:16] [V] [TRT] Applying generic optimizations to the graph for inference.
[01/17/2022-21:44:16] [V] [TRT] Original: 8 layers
[01/17/2022-21:44:16] [V] [TRT] After dead-layer removal: 8 layers
[01/17/2022-21:44:16] [V] [TRT] Running: ConstShuffleFusion
[01/17/2022-21:44:16] [V] [TRT] ConstShuffleFusion: Fusing 7 with (Unnamed Layer* 2) [Shuffle]
[01/17/2022-21:44:16] [V] [TRT] Running: ConstShuffleFusion
[01/17/2022-21:44:16] [V] [TRT] ConstShuffleFusion: Fusing 5 with (Unnamed Layer* 6) [Shuffle]
[01/17/2022-21:44:16] [V] [TRT] After Myelin optimization: 1 layers
[01/17/2022-21:44:16] [V] [TRT] Applying ScaleNodes fusions.
[01/17/2022-21:44:16] [V] [TRT] After scale fusion: 1 layers
[01/17/2022-21:44:16] [V] [TRT] After vertical fusions: 1 layers
[01/17/2022-21:44:16] [V] [TRT] After dupe layer removal: 1 layers
[01/17/2022-21:44:16] [V] [TRT] After final dead-layer removal: 1 layers
[01/17/2022-21:44:16] [V] [TRT] After tensor merging: 1 layers
[01/17/2022-21:44:16] [V] [TRT] After concat removal: 1 layers
[01/17/2022-21:44:16] [V] [TRT] Graph construction and optimization completed in 0.000802345 seconds.
[01/17/2022-21:44:16] [V] [TRT] Using cublasLt as a tactic source
[01/17/2022-21:44:16] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +508, GPU +222, now: CPU 976, GPU 521 (MiB)
[01/17/2022-21:44:16] [V] [TRT] Using cuDNN as a tactic source
[01/17/2022-21:44:16] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +113, GPU +52, now: CPU 1089, GPU 573 (MiB)
[01/17/2022-21:44:16] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/17/2022-21:44:16] [V] [TRT] Constructing optimization profile number 0 [1/1].
[01/17/2022-21:44:16] [V] [TRT] Reserving memory for activation tensors. Host: 0 bytes Device: 4 bytes
[01/17/2022-21:44:16] [V] [TRT] =============== Computing reformatting costs
[01/17/2022-21:44:16] [V] [TRT] =============== Computing costs for 
[01/17/2022-21:44:16] [V] [TRT] *************** Autotuning format combination:  -> Bool(1,1,1) ***************
[01/17/2022-21:44:16] [V] [TRT] --------------- Timing Runner: {ForeignNode[c...Greater_3]} (Myelin)
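
The log stops while timing the single fused `ForeignNode[c...Greater_3]` on Myelin, presumably where the segfault occurred. When triaging similar crashes, a small hypothetical helper like the one below can pull the last `Timing Runner` entry out of a saved verbose log. It relies only on the line format visible above, which may differ between TensorRT versions.

```python
import re

# Matches lines such as:
# "... [V] [TRT] --------------- Timing Runner: {ForeignNode[c...Greater_3]} (Myelin)"
# The backend is taken from the final parenthesized group on the line.
TIMING_RE = re.compile(r"Timing Runner: (?P<node>.+?) \((?P<backend>[^)]+)\)\s*$")


def last_timing_runner(log_text):
    """Return (node, backend) for the last 'Timing Runner' line, or None."""
    last = None
    for line in log_text.splitlines():
        match = TIMING_RE.search(line)
        if match:
            last = (match.group("node"), match.group("backend"))
    return last


sample = (
    "[01/17/2022-21:44:16] [V] [TRT] --------------- "
    "Timing Runner: {ForeignNode[c...Greater_3]} (Myelin)\n"
)
print(last_timing_runner(sample))  # ('{ForeignNode[c...Greater_3]}', 'Myelin')
```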

@zerollzeng
Collaborator

I can reproduce this; I will look into it further.

@ttyio ttyio added triaged Issue has been triaged by maintainers Myelin bug Something isn't working labels Jan 24, 2022
@ttyio
Collaborator

ttyio commented Jan 24, 2022

Internal bug filed, thanks @zerollzeng @lazycal

@nvpohanh
Collaborator

nvpohanh commented Jul 1, 2022

@ttyio Has this been fixed in TRT 8.4?

@ttyio
Collaborator

ttyio commented Aug 5, 2022

Yes, it is fixed in TRT 8.4. Closing; thanks, all!

@ttyio ttyio closed this as completed Aug 5, 2022
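
Since the fix landed in TensorRT 8.4 and the report used 8.2.1.8, a workaround can be gated on the installed version. A minimal sketch follows; the function names are hypothetical, it compares numeric version components only, and it assumes a dotted version string such as the one `tensorrt.__version__` reports.

```python
def parse_version(version):
    # "8.2.1.8" -> (8, 2, 1, 8); stops at the first non-numeric component.
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_prelu_reducemean_fix(trt_version, fixed_in=(8, 4)):
    # Tuple comparison: (8, 2, 1, 8) < (8, 4) and (8, 4, 1, 5) >= (8, 4).
    return parse_version(trt_version) >= fixed_in


print(has_prelu_reducemean_fix("8.2.1.8"))  # False
print(has_prelu_reducemean_fix("8.4.1.5"))  # True
```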