Compiling yolov5 #253

Closed · Ownmarc opened this issue Mar 23, 2021 · 19 comments

Ownmarc commented Mar 23, 2021

Hey, I am looking to run the YOLOv5 model (https://github.com/ultralytics/yolov5) on an inf1 instance for inference.

I am first trying to get the original COCO model to compile, but I am hitting the error below. I have followed several AWS tutorials (YOLOv4 and ResNet) and am compiling on a c5.xlarge instance (4 vCPUs, 8 GB of RAM) using the Ubuntu 18 DLAMI in the aws_neuron_pytorch_p36 Python environment.

One thing I noticed is that neuron-cc requires numpy <= 1.18.4 while yolov5 requires numpy >= 1.18.5. I first made sure the model ran correctly by updating numpy to 1.18.5, then downgraded numpy to 1.18.4 per the neuron-cc requirement before compiling/converting the model.

I am not exactly sure where to look to debug this (if it is possible at all) and would welcome any hint.
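For reference, the setup before tracing looked roughly like this (a sketch; the attempt_load helper and the weights filename are assumptions about my script, not something shown in the logs below):

import torch
import torch_neuron  # registers the torch.neuron.* API used below
from models.experimental import attempt_load  # helper from the yolov5 repo

model = attempt_load('yolov5.pt', map_location='cpu')  # FP32 weights, placeholder filename
model = model.eval()  # inference mode so Detect() takes the deployment code path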

fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
Here is the output of torch.neuron.analyze_model(model, example_inputs=[fake_image]):

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:The following operations are currently supported in torch-neuron for this model:
INFO:Neuron:prim::TupleConstruct
INFO:Neuron:aten::permute
INFO:Neuron:aten::slice
INFO:Neuron:prim::Constant
INFO:Neuron:prim::ListConstruct
INFO:Neuron:aten::pow
INFO:Neuron:aten::max_pool2d
INFO:Neuron:aten::upsample_nearest2d
INFO:Neuron:aten::Int
INFO:Neuron:aten::mul
INFO:Neuron:aten::_convolution
INFO:Neuron:prim::NumToTensor
INFO:Neuron:aten::sub
INFO:Neuron:aten::sigmoid
INFO:Neuron:aten::silu
INFO:Neuron:prim::TupleUnpack
INFO:Neuron:aten::expand
INFO:Neuron:aten::contiguous
INFO:Neuron:aten::copy_
INFO:Neuron:aten::size
INFO:Neuron:aten::view
INFO:Neuron:aten::cat
INFO:Neuron:aten::select
INFO:Neuron:aten::add
INFO:Neuron:100.00% of all operations (including primitives) (2369 of 2369) are supported
INFO:Neuron:100.00% of arithmetic operations (304 of 304) are supported

I then run the compilation with model_neuron = torch.neuron.trace(model, example_inputs=[fake_image], compiler_args="-O2"), which fails with the following trace:

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 304, fused = 304, percent fused = 100.0%
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:779: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  name, func, example_inputs, var_lookup_fn, strict, _force_outplace
INFO:Neuron:Compiler args type is <class 'str'> value is -O2
INFO:Neuron:compiling function _NeuronGraph$1842 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmp274rqrqq/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp274rqrqq/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[1, 3, 608, 608], "float32"]}, "outputs": ["concat_14:0"]} -O2 --verbose 35'
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph#0(%[790] : torch.float32(1, 3, 608, 608)):
  Focus#50:
    %[./6] : torch.float32(1, 3, 304, 608) = ./aten::slice#5(%[790])
    %[./12] : torch.float32(1, 3, 304, 304) = ./aten::slice#10(%[./6])
    %[./17] : torch.float32(1, 3, 304, 608) = ./aten::slice#15(%[790])
    %[./22] : torch.float32(1, 3, 304, 304) = ./aten::slice#20(%[./17])
    %[./27] : torch.float32(1, 3, 304, 608) = ./aten::slice#25(%[790])
    %[./32] : torch.float32(1, 3, 304, 304) = ./aten::slice#30(%[./27])
    %[./37] : torch.float32(1, 3, 304, 608) = ./aten::slice#35(%[790])
    %[./42] : torch.float32(1, 3, 304, 304) = ./aten::slice#40(%[./37])
    %[./45] : torch.float32(1, 12, 304, 304) = ./aten::cat#43()
  Focus#50/Conv#44/Conv2d#2:
        %[Focus#50/Conv#44/6] : torch.float32(1, 48, 304, 304) = ./aten::_convolution#20(%[Focus#50/45])
  Focus#50/Conv#44/SiLU#3:
        %[4215] : torch.float32(1, 48, 304, 304) = ./aten::silu_#0(%[Focus#50/Conv#44/6])
  Conv#51/Conv2d#2:
      %[Conv#51/6] : torch.float32(1, 96, 152, 152) = ./aten::_convolution#20(%[4215])
  Conv#51/SiLU#3:
      %[4216] : torch.float32(1, 96, 152, 152) = ./aten::silu_#0(%[Conv#51/6])
  C3#52/Conv#4/Conv2d#2:
        %[C3#52/Conv#4/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[4216])
  C3#52/Conv#4/SiLU#3:
        %[C3#52/13] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Conv#4/6])
  C3#52/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/13])
  C3#52/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#2/8] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#52/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/Sequential#5/Bottleneck#2/8])
  C3#52/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#2/9] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#52/Sequential#5/Bottleneck#2:
        %[C3#52/Sequential#5/6] : torch.float32(1, 48, 152, 152) = ./aten::add#5(%[C3#52/13], %[./9])
  C3#52/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/Sequential#5/6])
  C3#52/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#3/8] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#52/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/Sequential#5/Bottleneck#3/8])
  C3#52/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#3/9] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#52/Sequential#5/Bottleneck#3:
        %[C3#52/14] : torch.float32(1, 48, 152, 152) = ./aten::add#5(%[C3#52/Sequential#5/6], %[./9])
  C3#52/Conv#6/Conv2d#2:
        %[C3#52/Conv#6/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[4216])
  C3#52/Conv#6/SiLU#3:
        %[C3#52/15] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Conv#6/6])
  C3#52:
    %[./11] : torch.float32(1, 96, 152, 152) = ./aten::cat#9()
  C3#52/Conv#10/Conv2d#2:
        %[C3#52/Conv#10/6] : torch.float32(1, 96, 152, 152) = ./aten::_convolution#20(%[C3#52/11])
  C3#52/Conv#10/SiLU#3:
        %[4217] : torch.float32(1, 96, 152, 152) = ./aten::silu_#0(%[C3#52/Conv#10/6])
  Conv#53/Conv2d#2:
      %[Conv#53/6] : torch.float32(1, 192, 76, 76) = ./aten::_convolution#20(%[4217])
  Conv#53/SiLU#3:
      %[4218] : torch.float32(1, 192, 76, 76) = ./aten::silu_#0(%[Conv#53/6])
  C3#54/Conv#4/Conv2d#2:
        %[C3#54/Conv#4/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4218])
  C3#54/Conv#4/SiLU#3:
        %[C3#54/13] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Conv#4/6])
  C3#54/Sequential#5/Bottleneck#6/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#6/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/13])
  C3#54/Sequential#5/Bottleneck#6/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#6/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#6/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#6/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#6/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#6/8])
  C3#54/Sequential#5/Bottleneck#6/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#6/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#6/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#6:
        %[C3#54/Sequential#5/14] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/13], %[./9])
  C3#54/Sequential#5/Bottleneck#7/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#7/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/14])
  C3#54/Sequential#5/Bottleneck#7/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#7/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#7/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#7/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#7/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#7/8])
  C3#54/Sequential#5/Bottleneck#7/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#7/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#7/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#7:
        %[C3#54/Sequential#5/15] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/14], %[./9])
  C3#54/Sequential#5/Bottleneck#8/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#8/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/15])
  C3#54/Sequential#5/Bottleneck#8/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#8/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#8/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#8/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#8/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#8/8])
  C3#54/Sequential#5/Bottleneck#8/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#8/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#8/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#8:
        %[C3#54/Sequential#5/16] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/15], %[./9])
  C3#54/Sequential#5/Bottleneck#9/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#9/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/16])
  C3#54/Sequential#5/Bottleneck#9/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#9/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#9/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#9/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#9/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#9/8])
  C3#54/Sequential#5/Bottleneck#9/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#9/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#9/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#9:
        %[C3#54/Sequential#5/17] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/16], %[./9])
  C3#54/Sequential#5/Bottleneck#10/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#10/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/17])
  C3#54/Sequential#5/Bottleneck#10/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#10/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#10/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#10/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#10/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#10/8])
  C3#54/Sequential#5/Bottleneck#10/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#10/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#10/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#10:
        %[C3#54/Sequential#5/18] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/17], %[./9])
  C3#54/Sequential#5/Bottleneck#11/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#11/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/18])
  C3#54/Sequential#5/Bottleneck#11/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#11/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#11/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#11/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#11/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#11/8])
  C3#54/Sequential#5/Bottleneck#11/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#11/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#11/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#11:
        %[C3#54/14] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/18], %[./9])
  C3#54/Conv#6/Conv2d#2:
        %[C3#54/Conv#6/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4218])
  C3#54/Conv#6/SiLU#3:
        %[C3#54/15] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Conv#6/6])
  C3#54:
    %[./11] : torch.float32(1, 192, 76, 76) = ./aten::cat#9()
  C3#54/Conv#10/Conv2d#2:
        %[C3#54/Conv#10/6] : torch.float32(1, 192, 76, 76) = ./aten::_convolution#20(%[C3#54/11])
  C3#54/Conv#10/SiLU#3:
        %[4219] : torch.float32(1, 192, 76, 76) = ./aten::silu_#0(%[C3#54/Conv#10/6])
  Conv#55/Conv2d#2:
      %[Conv#55/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[4219])
  Conv#55/SiLU#3:
      %[4220] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[Conv#55/6])
  C3#56/Conv#4/Conv2d#2:
        %[C3#56/Conv#4/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4220])
  C3#56/Conv#4/SiLU#3:
        %[C3#56/13] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Conv#4/6])
  C3#56/Sequential#5/Bottleneck#6/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#6/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/13])
  C3#56/Sequential#5/Bottleneck#6/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#6/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#6/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#6/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#6/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#6/8])
  C3#56/Sequential#5/Bottleneck#6/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#6/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#6/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#6:
        %[C3#56/Sequential#5/14] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/13], %[./9])
  C3#56/Sequential#5/Bottleneck#7/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#7/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/14])
  C3#56/Sequential#5/Bottleneck#7/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#7/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#7/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#7/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#7/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#7/8])
  C3#56/Sequential#5/Bottleneck#7/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#7/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#7/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#7:
        %[C3#56/Sequential#5/15] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/14], %[./9])
  C3#56/Sequential#5/Bottleneck#8/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#8/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/15])
  C3#56/Sequential#5/Bottleneck#8/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#8/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#8/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#8/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#8/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#8/8])
  C3#56/Sequential#5/Bottleneck#8/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#8/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#8/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#8:
        %[C3#56/Sequential#5/16] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/15], %[./9])
  C3#56/Sequential#5/Bottleneck#9/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#9/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/16])
  C3#56/Sequential#5/Bottleneck#9/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#9/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#9/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#9/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#9/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#9/8])
  C3#56/Sequential#5/Bottleneck#9/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#9/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#9/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#9:
        %[C3#56/Sequential#5/17] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/16], %[./9])
  C3#56/Sequential#5/Bottleneck#10/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#10/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/17])
  C3#56/Sequential#5/Bottleneck#10/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#10/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#10/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#10/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#10/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#10/8])
  C3#56/Sequential#5/Bottleneck#10/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#10/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#10/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#10:
        %[C3#56/Sequential#5/18] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/17], %[./9])
  C3#56/Sequential#5/Bottleneck#11/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#11/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/18])
  C3#56/Sequential#5/Bottleneck#11/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#11/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#11/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#11/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#11/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#11/8])
  C3#56/Sequential#5/Bottleneck#11/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#11/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#11/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#11:
        %[C3#56/14] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/18], %[./9])
  C3#56/Conv#6/Conv2d#2:
        %[C3#56/Conv#6/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4220])
  C3#56/Conv#6/SiLU#3:
        %[C3#56/15] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Conv#6/6])
  C3#56:
    %[./11] : torch.float32(1, 384, 38, 38) = ./aten::cat#9()
  C3#56/Conv#10/Conv2d#2:
        %[C3#56/Conv#10/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[C3#56/11])
  C3#56/Conv#10/SiLU#3:
        %[4221] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[C3#56/Conv#10/6])
  Conv#57/Conv2d#2:
      %[Conv#57/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[4221])
  Conv#57/SiLU#3:
      %[4222] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[Conv#57/6])
  SPP#58/Conv#8/Conv2d#2:
        %[SPP#58/Conv#8/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4222])
  SPP#58/Conv#8/SiLU#3:
        %[SPP#58/18] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[SPP#58/Conv#8/6])
  SPP#58/MaxPool2d#9:
      %[SPP#58/19] : torch.float32(1, 384, 19, 19) = ./aten::max_pool2d#13(%[SPP#58/18])
  SPP#58/MaxPool2d#10:
      %[SPP#58/20] : torch.float32(1, 384, 19, 19) = ./aten::max_pool2d#13(%[SPP#58/18])
  SPP#58/MaxPool2d#11:
      %[SPP#58/21] : torch.float32(1, 384, 19, 19) = ./aten::max_pool2d#13(%[SPP#58/18])
  SPP#58:
    %[./16] : torch.float32(1, 1536, 19, 19) = ./aten::cat#14()
  SPP#58/Conv#15/Conv2d#2:
        %[SPP#58/Conv#15/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[SPP#58/16])
  SPP#58/Conv#15/SiLU#3:
        %[4223] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[SPP#58/Conv#15/6])
  C3#59/Conv#4/Conv2d#2:
        %[C3#59/Conv#4/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4223])
  C3#59/Conv#4/SiLU#3:
        %[C3#59/13] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Conv#4/6])
  C3#59/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/13])
  C3#59/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#59/Sequential#5/Bottleneck#2/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#59/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/Sequential#5/Bottleneck#2/6])
  C3#59/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#59/Sequential#5/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/Sequential#5/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#59/Sequential#5/Bottleneck#3/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/Sequential#5/Bottleneck#3/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#59/14] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#59/Conv#6/Conv2d#2:
        %[C3#59/Conv#6/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4223])
  C3#59/Conv#6/SiLU#3:
        %[C3#59/15] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Conv#6/6])
  C3#59:
    %[./11] : torch.float32(1, 768, 19, 19) = ./aten::cat#9()
  C3#59/Conv#10/Conv2d#2:
        %[C3#59/Conv#10/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[C3#59/11])
  C3#59/Conv#10/SiLU#3:
        %[4224] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[C3#59/Conv#10/6])
  Conv#60/Conv2d#2:
      %[Conv#60/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4224])
  Conv#60/SiLU#3:
      %[4225] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[Conv#60/6])
  Upsample#61:
    %[4226] : torch.float32(1, 384, 38, 38) = ./aten::upsample_nearest2d#4(%[4225])
  Concat#62:
    %[4227] : torch.float32(1, 768, 38, 38) = ./aten::cat#2()
  C3#63/Conv#4/Conv2d#2:
        %[C3#63/Conv#4/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4227])
  C3#63/Conv#4/SiLU#3:
        %[C3#63/13] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Conv#4/6])
  C3#63/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/13])
  C3#63/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#63/Sequential#5/Bottleneck#2/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#63/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/Sequential#5/Bottleneck#2/6])
  C3#63/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#63/Sequential#5/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/Sequential#5/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#63/Sequential#5/Bottleneck#3/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/Sequential#5/Bottleneck#3/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#63/14] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#63/Conv#6/Conv2d#2:
        %[C3#63/Conv#6/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4227])
  C3#63/Conv#6/SiLU#3:
        %[C3#63/15] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Conv#6/6])
  C3#63:
    %[./11] : torch.float32(1, 384, 38, 38) = ./aten::cat#9()
  C3#63/Conv#10/Conv2d#2:
        %[C3#63/Conv#10/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[C3#63/11])
  C3#63/Conv#10/SiLU#3:
        %[4228] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[C3#63/Conv#10/6])
  Conv#64/Conv2d#2:
      %[Conv#64/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4228])
  Conv#64/SiLU#3:
      %[4229] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[Conv#64/6])
  Upsample#65:
    %[4230] : torch.float32(1, 192, 76, 76) = ./aten::upsample_nearest2d#4(%[4229])
  Concat#66:
    %[4231] : torch.float32(1, 384, 76, 76) = ./aten::cat#2()
  C3#67/Conv#4/Conv2d#2:
        %[C3#67/Conv#4/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4231])
  C3#67/Conv#4/SiLU#3:
        %[C3#67/13] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Conv#4/6])
  C3#67/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/13])
  C3#67/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#67/Sequential#5/Bottleneck#2/6] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#67/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/Sequential#5/Bottleneck#2/6])
  C3#67/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#67/Sequential#5/6] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/Sequential#5/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#67/Sequential#5/Bottleneck#3/6] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/Sequential#5/Bottleneck#3/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#67/14] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#67/Conv#6/Conv2d#2:
        %[C3#67/Conv#6/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4231])
  C3#67/Conv#6/SiLU#3:
        %[C3#67/15] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Conv#6/6])
  C3#67:
    %[./11] : torch.float32(1, 192, 76, 76) = ./aten::cat#9()
  C3#67/Conv#10/Conv2d#2:
        %[C3#67/Conv#10/6] : torch.float32(1, 192, 76, 76) = ./aten::_convolution#20(%[C3#67/11])
  C3#67/Conv#10/SiLU#3:
        %[4232] : torch.float32(1, 192, 76, 76) = ./aten::silu_#0(%[C3#67/Conv#10/6])
  Conv#68/Conv2d#2:
      %[Conv#68/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4232])
  Conv#68/SiLU#3:
      %[4233] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[Conv#68/6])
  Concat#69:
    %[4234] : torch.float32(1, 384, 38, 38) = ./aten::cat#2()
  C3#70/Conv#4/Conv2d#2:
        %[C3#70/Conv#4/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4234])
  C3#70/Conv#4/SiLU#3:
        %[C3#70/13] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Conv#4/6])
  C3#70/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/13])
  C3#70/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#70/Sequential#5/Bottleneck#2/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#70/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/Sequential#5/Bottleneck#2/6])
  C3#70/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#70/Sequential#5/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/Sequential#5/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#70/Sequential#5/Bottleneck#3/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/Sequential#5/Bottleneck#3/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#70/14] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#70/Conv#6/Conv2d#2:
        %[C3#70/Conv#6/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4234])
  C3#70/Conv#6/SiLU#3:
        %[C3#70/15] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Conv#6/6])
  C3#70:
    %[./11] : torch.float32(1, 384, 38, 38) = ./aten::cat#9()
  C3#70/Conv#10/Conv2d#2:
        %[C3#70/Conv#10/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[C3#70/11])
  C3#70/Conv#10/SiLU#3:
        %[4235] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[C3#70/Conv#10/6])
  Conv#71/Conv2d#2:
      %[Conv#71/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4235])
  Conv#71/SiLU#3:
      %[4236] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[Conv#71/6])
  Concat#72:
    %[4237] : torch.float32(1, 768, 19, 19) = ./aten::cat#2()
  C3#73/Conv#4/Conv2d#2:
        %[C3#73/Conv#4/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4237])
  C3#73/Conv#4/SiLU#3:
        %[C3#73/13] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Conv#4/6])
  C3#73/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/13])
  C3#73/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#73/Sequential#5/Bottleneck#2/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#73/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/Sequential#5/Bottleneck#2/6])
  C3#73/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#73/Sequential#5/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/Sequential#5/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#73/Sequential#5/Bottleneck#3/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/Sequential#5/Bottleneck#3/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#73/14] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#73/Conv#6/Conv2d#2:
        %[C3#73/Conv#6/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4237])
  C3#73/Conv#6/SiLU#3:
        %[C3#73/15] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Conv#6/6])
  C3#73:
    %[./11] : torch.float32(1, 768, 19, 19) = ./aten::cat#9()
  C3#73/Conv#10/Conv2d#2:
        %[C3#73/Conv#10/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[C3#73/11])
  C3#73/Conv#10/SiLU#3:
        %[4238] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[C3#73/Conv#10/6])
  Detect#74/Conv2d#7:
      %[Detect#74/433] : torch.float32(1, 255, 76, 76) = ./aten::_convolution#20(%[4232])
  Detect#74:
    %[./11] : 1 = ./aten::size#9(%[./433])
    %[./13] : torch.int32() = ./aten::Int#11()
    %[./14] : torch.int32() = ./aten::Int#12()
    %[./19] : 76 = ./aten::size#14(%[./433])
    %[./21] : torch.int32() = ./aten::Int#16()
    %[./23] : 76 = ./aten::size#18(%[./433])
    %[./25] : torch.int32() = ./aten::Int#20()
    %[./29] : torch.float32(1, 3, 85, 76, 76) = ./aten::view#24(%[./433])
    %[./36] : torch.float32(1, 3, 76, 76, 85) = ./aten::permute#31(%[./29])
    %[./38] : torch.float32(1, 3, 76, 76, 85) = ./aten::contiguous#33(%[./36])
    %[./72] : torch.float32(1, 3, 76, 76, 85) = ./aten::sigmoid#35(%[./38])
    %[./77] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#40(%[./72])
    %[./79] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#42(%[./77])
    %[./82] : torch.float32(1, 3, 76, 76, 2) = ./aten::sub#45(%[./79])
    %[./84] : torch.float32(1, 3, 76, 76, 2) = ./aten::add#47(%[./82])
    %[./88] : torch.float32() = ./aten::select#51()
    %[./89] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#52(%[./84], %[./88])
    %[./94] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#57(%[./72])
    %[./100] : torch.float32(3, 76, 76, 2) = ./aten::view#63(%[./89])
    %[./108] : torch.float32(1, 3, 76, 76, 2) = ./aten::expand#71(%[./100])
    %[./110] : torch.float32(1, 3, 76, 76, 2) = ./aten::copy_#73(%[./94], %[./108])
    %[./115] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#78(%[./72])
    %[./117] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#80(%[./115])
    %[./119] : torch.float32(1, 3, 76, 76, 2) = ./aten::pow#82(%[./117])
    %[./122] : torch.float32(1, 3, 1, 1, 2) = ./aten::select#85()
    %[./123] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#86(%[./119], %[./122])
    %[./128] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#91(%[./72])
    %[./134] : torch.float32(3, 76, 76, 2) = ./aten::view#97(%[./123])
    %[./142] : torch.float32(1, 3, 76, 76, 2) = ./aten::expand#105(%[./134])
    %[./144] : torch.float32(1, 3, 76, 76, 2) = ./aten::copy_#107(%[./128], %[./142])
    %[./148] : torch.float32(1, 17328, 85) = ./aten::view#111(%[./72])
  Detect#74/Conv2d#112:
      %[Detect#74/434] : torch.float32(1, 255, 38, 38) = ./aten::_convolution#20(%[4235])
    %[./152] : 1 = ./aten::size#114(%[./434])
    %[./154] : torch.int32() = ./aten::Int#116()
    %[./155] : torch.int32() = ./aten::Int#117()
    %[./160] : 38 = ./aten::size#119(%[./434])
    %[./162] : torch.int32() = ./aten::Int#121()
    %[./164] : 38 = ./aten::size#123(%[./434])
    %[./166] : torch.int32() = ./aten::Int#125()
    %[./170] : torch.float32(1, 3, 85, 38, 38) = ./aten::view#129(%[./434])
    %[./177] : torch.float32(1, 3, 38, 38, 85) = ./aten::permute#136(%[./170])
    %[./179] : torch.float32(1, 3, 38, 38, 85) = ./aten::contiguous#138(%[./177])
    %[./213] : torch.float32(1, 3, 38, 38, 85) = ./aten::sigmoid#140(%[./179])
    %[./218] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#145(%[./213])
    %[./220] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#147(%[./218])
    %[./223] : torch.float32(1, 3, 38, 38, 2) = ./aten::sub#150(%[./220])
    %[./225] : torch.float32(1, 3, 38, 38, 2) = ./aten::add#152(%[./223])
    %[./228] : torch.float32() = ./aten::select#155()
    %[./229] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#156(%[./225], %[./228])
    %[./234] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#161(%[./213])
    %[./240] : torch.float32(3, 38, 38, 2) = ./aten::view#167(%[./229])
    %[./248] : torch.float32(1, 3, 38, 38, 2) = ./aten::expand#175(%[./240])
    %[./250] : torch.float32(1, 3, 38, 38, 2) = ./aten::copy_#177(%[./234], %[./248])
    %[./255] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#182(%[./213])
    %[./257] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#184(%[./255])
    %[./259] : torch.float32(1, 3, 38, 38, 2) = ./aten::pow#186(%[./257])
    %[./262] : torch.float32(1, 3, 1, 1, 2) = ./aten::select#189()
    %[./263] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#190(%[./259], %[./262])
    %[./268] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#195(%[./213])
    %[./274] : torch.float32(3, 38, 38, 2) = ./aten::view#201(%[./263])
    %[./282] : torch.float32(1, 3, 38, 38, 2) = ./aten::expand#209(%[./274])
    %[./284] : torch.float32(1, 3, 38, 38, 2) = ./aten::copy_#211(%[./268], %[./282])
    %[./288] : torch.float32(1, 4332, 85) = ./aten::view#215(%[./213])
  Detect#74/Conv2d#216:
      %[Detect#74/435] : torch.float32(1, 255, 19, 19) = ./aten::_convolution#20(%[4238])
    %[./292] : 1 = ./aten::size#218(%[./435])
    %[./294] : torch.int32() = ./aten::Int#220()
    %[./295] : torch.int32() = ./aten::Int#221()
    %[./300] : 19 = ./aten::size#223(%[./435])
    %[./302] : torch.int32() = ./aten::Int#225()
    %[./304] : 19 = ./aten::size#227(%[./435])
    %[./306] : torch.int32() = ./aten::Int#229()
    %[./310] : torch.float32(1, 3, 85, 19, 19) = ./aten::view#233(%[./435])
    %[./317] : torch.float32(1, 3, 19, 19, 85) = ./aten::permute#240(%[./310])
    %[./319] : torch.float32(1, 3, 19, 19, 85) = ./aten::contiguous#242(%[./317])
    %[./353] : torch.float32(1, 3, 19, 19, 85) = ./aten::sigmoid#244(%[./319])
    %[./358] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#249(%[./353])
    %[./360] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#251(%[./358])
    %[./363] : torch.float32(1, 3, 19, 19, 2) = ./aten::sub#254(%[./360])
    %[./365] : torch.float32(1, 3, 19, 19, 2) = ./aten::add#256(%[./363])
    %[./368] : torch.float32() = ./aten::select#259()
    %[./369] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#260(%[./365], %[./368])
    %[./374] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#265(%[./353])
    %[./380] : torch.float32(3, 19, 19, 2) = ./aten::view#271(%[./369])
    %[./388] : torch.float32(1, 3, 19, 19, 2) = ./aten::expand#279(%[./380])
    %[./390] : torch.float32(1, 3, 19, 19, 2) = ./aten::copy_#281(%[./374], %[./388])
    %[./395] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#286(%[./353])
    %[./397] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#288(%[./395])
    %[./399] : torch.float32(1, 3, 19, 19, 2) = ./aten::pow#290(%[./397])
    %[./402] : torch.float32(1, 3, 1, 1, 2) = ./aten::select#293()
    %[./403] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#294(%[./399], %[./402])
    %[./408] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#299(%[./353])
    %[./414] : torch.float32(3, 19, 19, 2) = ./aten::view#305(%[./403])
    %[./422] : torch.float32(1, 3, 19, 19, 2) = ./aten::expand#313(%[./414])
    %[./424] : torch.float32(1, 3, 19, 19, 2) = ./aten::copy_#315(%[./408], %[./422])
    %[./428] : torch.float32(1, 1083, 85) = ./aten::view#319(%[./353])
    %[./431] : torch.float32(1, 22743, 85) = ./aten::cat#322()
    %[4239] : (torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85), torch.float32(1, 22743, 85)) = ./prim::TupleConstruct#323(%[./38], %[./179], %[./319], %[./431])
  %[4211] : torch.float32(1, 3, 76, 76, 85), %[4212] : torch.float32(1, 3, 38, 38, 85), %[4213] : torch.float32(1, 3, 19, 19, 85), %[4214] : torch.float32(1, 22743, 85) = prim::TupleUnpack#75(%[4239])
  %[3088] : [torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85)] = prim::ListConstruct#76(%[4211], %[4212], %[4213])
  %[3089] : (torch.float32(1, 22743, 85), [torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85)]) = prim::TupleConstruct#77(%[4214], %[3088])
  return(%[3089] : (torch.float32(1, 22743, 85), [torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85)]))
; falling back to native python function call
ERROR:Neuron:3107
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py", line 448, in _convert_item
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 194, in trace
    return create_runnable(metaneff, neff_ts, jit_trace, example_inputs, preprocessor, postprocessor, output_tensors)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 313, in create_runnable
    neuron_trace = torch.jit.trace(neuron_function, example_inputs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 779, in trace
    name, func, example_inputs, var_lookup_fn, strict, _force_outplace
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 312, in neuron_function
    return postprocessor(output_tensors)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 1129, in __call__
    for value in node.inputs()]
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 1129, in <listcomp>
    for value in node.inputs()]
KeyError: 3107
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 304, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 12 [supported]
INFO:Neuron: => aten::_convolution: 86 [supported]
INFO:Neuron: => aten::add: 17 [supported]
INFO:Neuron: => aten::cat: 15 [supported]
INFO:Neuron: => aten::contiguous: 3 [supported]
INFO:Neuron: => aten::copy_: 6 [supported]
INFO:Neuron: => aten::expand: 6 [supported]
INFO:Neuron: => aten::max_pool2d: 3 [supported]
INFO:Neuron: => aten::mul: 12 [supported]
INFO:Neuron: => aten::permute: 3 [supported]
INFO:Neuron: => aten::pow: 3 [supported]
INFO:Neuron: => aten::select: 6 [supported]
INFO:Neuron: => aten::sigmoid: 3 [supported]
INFO:Neuron: => aten::silu: 83 [supported]
INFO:Neuron: => aten::size: 9 [supported]
INFO:Neuron: => aten::slice: 20 [supported]
INFO:Neuron: => aten::sub: 3 [supported]
INFO:Neuron: => aten::upsample_nearest2d: 2 [supported]
INFO:Neuron: => aten::view: 12 [supported]
Ownmarc (Author) commented Mar 24, 2021

I managed to get some of it to compile by adding the following code:

def subgraph_builder_function(node):
    return 'Detect' not in node.name  # keep the Detect() head out of the Neuron-compiled subgraph

model_neuron = torch.neuron.trace(model, example_inputs=[fake_image], subgraph_builder_function=subgraph_builder_function)

But I'm not sure if this is the way to go. I tried to figure out what was causing the problem with whole-model compilation, but couldn't find it.
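In case it is useful, this is roughly how I save the partially compiled model and reload it on the inf1 instance (a sketch; the filename is arbitrary, and the save/load calls are the standard torch.jit ones since torch.neuron.trace returns a TorchScript module):

model_neuron.save('yolov5_neuron.pt')  # TorchScript module with the compiled Neuron subgraph embedded

# later, on the inf1 instance
import torch
import torch_neuron  # required so the Neuron ops can be deserialized and executed
model_neuron = torch.jit.load('yolov5_neuron.pt')
output = model_neuron(fake_image)  # same [1, 3, 608, 608] input shape as at trace time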

Here is the full trace of the model I managed to compile:

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::_convolution, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 304, fused = 207, percent fused = 68.09%
INFO:Neuron:compiling function _NeuronGraph$12969 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmpygd_jg_7/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpygd_jg_7/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[1, 3, 608, 608], "float32"]}, "outputs": ["mul_66:0", "mul_74:0", "mul_82:0"]} --verbose 35'
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:441: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  outs = wrap_retval(mod(*_clone_inputs(inputs)))
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py:383: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  return self.func(*inputs)
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 304, compiled = 207, percent compiled = 68.09%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 83
INFO:Neuron: => aten::add: 14
INFO:Neuron: => aten::cat: 14
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::silu: 83
INFO:Neuron: => aten::slice: 8
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 12 [supported]
INFO:Neuron: => aten::_convolution: 3 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 3 [supported]
INFO:Neuron: => aten::copy_: 6 [supported]
INFO:Neuron: => aten::expand: 6 [supported]
INFO:Neuron: => aten::mul: 12 [supported]
INFO:Neuron: => aten::permute: 3 [supported]
INFO:Neuron: => aten::pow: 3 [supported]
INFO:Neuron: => aten::select: 6 [supported]
INFO:Neuron: => aten::sigmoid: 3 [supported]
INFO:Neuron: => aten::size: 9 [supported]
INFO:Neuron: => aten::slice: 12 [supported]
INFO:Neuron: => aten::sub: 3 [supported]
INFO:Neuron: => aten::view: 12 [supported]

Ownmarc (Author) commented Mar 24, 2021

So I got YOLOv5x to run on an inf1 instance and did some speed tests to see how it compares. This is a very basic test with a single image run 10 times; here are my results on CPU (c5.xlarge), a GTX 1080 Ti (my own computer), and an inf1.xlarge instance:

[benchmark screenshots: inference timing on CPU (c5.xlarge), GTX 1080 Ti, and inf1.xlarge]
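The timing loop itself was essentially of this shape (a minimal sketch; the warm-up runs and the use of a zero tensor instead of a real image are assumptions):

import time
import torch

image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)

for _ in range(3):  # warm-up, not timed
    model_neuron(image)

start = time.time()
for _ in range(10):  # single image run 10 times, as above
    model_neuron(image)
print('average latency: %.1f ms' % ((time.time() - start) / 10 * 1000))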

Here is the trace of the model compilation:

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::_convolution, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 414, fused = 317, percent fused = 76.57%
INFO:Neuron:compiling function _NeuronGraph$1426 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmphtcdzsru/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmphtcdzsru/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[1, 3, 608, 608], "float32"]}, "outputs": ["mul_106:0", "mul_118:0", "mul_130:0"]} --verbose 35'
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:441: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  outs = wrap_retval(mod(*_clone_inputs(inputs)))
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py:383: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  return self.func(*inputs)
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 414, compiled = 317, percent compiled = 76.57%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 131
INFO:Neuron: => aten::add: 28
INFO:Neuron: => aten::cat: 14
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::silu: 131
INFO:Neuron: => aten::slice: 8
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 12 [supported]
INFO:Neuron: => aten::_convolution: 3 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 3 [supported]
INFO:Neuron: => aten::copy_: 6 [supported]
INFO:Neuron: => aten::expand: 6 [supported]
INFO:Neuron: => aten::mul: 12 [supported]
INFO:Neuron: => aten::permute: 3 [supported]
INFO:Neuron: => aten::pow: 3 [supported]
INFO:Neuron: => aten::select: 6 [supported]
INFO:Neuron: => aten::sigmoid: 3 [supported]
INFO:Neuron: => aten::size: 9 [supported]
INFO:Neuron: => aten::slice: 12 [supported]
INFO:Neuron: => aten::sub: 3 [supported]
INFO:Neuron: => aten::view: 12 [supported]
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,

aws-zejdaj (Contributor) commented

Hi Marc, thanks for all the good input. As you noted, the model runs successfully when the Detect layer is excluded. We expect our upcoming Neuron release to improve performance, and we will update back once it is released so you can run the test again.

Ownmarc (Author) commented Mar 26, 2021

@aws-zejdaj thanks! Do you know if this model will compile 100%? Also curious to know what's not compatible right now :)

glenn-jocher commented Mar 29, 2021

@aws-zejdaj @Ownmarc thanks for your efforts on getting YOLOv5 to run correctly on the AWS Inferentia chips! If I can be of any help, let me know; I'm the primary maintainer at https://github.com/ultralytics/yolov5.

If you have any specific feedback about the Detect() layer and what is causing incompatibilities, we may be able to feed this into future YOLOv5 design decisions as well, since part of our goal is easy YOLOv5 deployment across the largest addressable markets.

To provide additional info, the YOLOv5 Detect() layer is the very last layer, which combines multiple heads (P3/8-small, P4/16-medium, P5/32-large and optionally P6/64-xlarge) into a single output. The source is here. Note that this layer behaves differently during training and deployment, and that a YOLOv5 model.fuse() op will fuse batchnorm and nn.Conv2d() layers together, typically before compilation.
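
As a rough illustration of that fuse-then-trace flow (a sketch only, not code from this thread; the autoshape=False argument and yolov5s weights are assumptions, and Hub-loaded pretrained models are typically already fused):

import torch
import torch_neuron

# autoshape=False is assumed to return the raw detection model rather than the AutoShape wrapper
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', autoshape=False)
model = model.fuse().eval()  # fold BatchNorm parameters into the preceding nn.Conv2d layers

fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
model_neuron = torch.neuron.trace(model, example_inputs=[fake_image])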

@Ownmarc, if we wanted to incorporate torch.neuron export capability, we could also update export.py for this.

https://github.com/ultralytics/yolov5/blob/2bf34f50fda2d5997f301364f9a0b196fa57117b/models/yolo.py#L24-L64

class Detect(nn.Module):
    stride = None  # strides computed during build
    export = False  # onnx export

    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
        super(Detect, self).__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        self.register_buffer('anchors', a)  # shape(nl,na,2)
        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv

    def forward(self, x):
        # x = x.copy()  # for profiling
        z = []  # inference output
        self.training |= self.export
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

                y = x[i].sigmoid()
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

    @staticmethod
    def _make_grid(nx=20, ny=20):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()

@aws-zejdaj
Contributor

Glenn, we'll be happy to discuss the best handling of the Detect layer, feel free to reach out to us at aws-neuron-support@amazon.com

@aws-renfu

Hi Glenn, just want to let you know that this is a high priority item and we are actively working on this issue. Thanks!

@glenn-jocher

@aws-zejdaj that's great news! We welcome PRs, so if you discover any useful improvements to the codebase at https://github.com/ultralytics/yolov5 we'd be happy to review and integrate them there as well.

@glenn-jocher

@Ownmarc @aws-zejdaj @aws-renfu good news 😃! This issue may now be fixed ✅ in PR ultralytics/yolov5#2953. To receive this update you can:

  • git pull from within your yolov5/ directory
  • git clone https://github.com/ultralytics/yolov5 again
  • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@Ownmarc
Author

Ownmarc commented May 1, 2021

Nice @glenn-jocher ! Thanks

@aws-joshim
Contributor

Closing this ticket since the latest torch-neuron release supports the Ultralytics YOLOv5 model - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/torch-neuron/torch-neuron.html#pytorch-neuron-rn

@diazGT94

diazGT94 commented Sep 23, 2021

Hi, I'm trying to replicate the steps indicated here to convert YOLOv5s to Neuron on Inf1.

I am using the Ubuntu 18.04 DLAMI with the aws_neuron_pytorch_p36 python env activated.

  1. Installed this: pip install -r https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt
  2. Then import from Pytorch-Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  3. Create a Fake Image: fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
  4. Model inspection: model_neuron_for_inspection = torch.neuron.trace(model, fake_image, skip_compiler=True)

But this gives me the following error:

Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients
Adding AutoShape...
/home/ubuntu/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
/home/ubuntu/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
Traceback (most recent call last):
  File "neuron_converter.py", line 11, in <module>
    model_neuron_for_inspection = torch.neuron.trace(model, fake_image, skip_compiler=True)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py", line 103, in trace
    neuron_graph, jit_trace = to_graph(func, example_inputs, return_trace=True, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py", line 182, in to_graph
    jit_trace = torch.jit.trace(func_or_mod, example_inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 742, in trace
    _module_class,
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 966, in trace_module
    _module_class,
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 519, in _check_trace
    raise TracingCheckError(*diag_info)
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!

Could you please guide me through how to perform the conversion for deploying it on Inf1? @Ownmarc

Thanks,

@Ownmarc
Author

Ownmarc commented Sep 23, 2021

@diazGT94, I think I hit this bug too when running torch.neuron.analyze_model. The first time I ran it I got an exception; the second time it worked fine, and then I could run torch.neuron.trace without any issue.

Try this:

import torch
import torch_neuron

fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)

# The first analyze_model call sometimes raises an exception; a second call goes through
try:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])
except Exception:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])

model_neuron = torch.neuron.trace(model, example_inputs=[fake_image])

@diazGT94

diazGT94 commented Sep 23, 2021

@Ownmarc Thanks, the code you provided helped me. With the new upgrades on the YOLOv5 side, is it still required to define this and perform the conversion?

I just noticed that when the function below is not included in the conversion, the detections don't perform well even if the NMS parameters are modified.

def subgraph_builder_function(node):
    return 'Detect' not in node.name

@diazGT94

diazGT94 commented Dec 9, 2021

Hello, until a couple of weeks ago I was able to use the following structure to convert my custom models to Neuron:

import torch.neuron

def subgraph_builder_function(node):
    return 'Detect' not in node.name

model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt',force_reload=True) 
fake_image = torch.zeros([1, 3, 416, 416], dtype=torch.float32)  #Need to be equal to the input size of the image
try:
    torch.neuron.trace(model, example_inputs=[fake_image],subgraph_builder_function=subgraph_builder_function)
except Exception:
    torch.neuron.trace(model, example_inputs=[fake_image],subgraph_builder_function=subgraph_builder_function)

model_neuron = torch.neuron.trace(model, example_inputs=[fake_image],subgraph_builder_function=subgraph_builder_function)

model_neuron.save('neuron_model.pt')  

However, today I tried to do it again but I got the following error:

INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmp3f_r9pcn/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp3f_r9pcn/graph_def.neff --io-config {"inputs": {"tensor:0": [[1, 3, 416, 416], "float32"]}, "outputs": ["tensor:0"]} --verbose 35'
Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmp3f_r9pcn/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp3f_r9pcn/graph_def.neff --io-config {"inputs": {"tensor:0": [[1, 3, 416, 416], "float32"]}, "outputs": ["tensor:0"]} --verbose 35'
.12/09/2021 04:26:36 PM ERROR [neuron-cc]: tensor tensor:0 appears in both input and output of --io-config

I can overcome the error by not passing the subgraph_builder_function argument to the trace call. However, when I ran inference with a model created with this "hack", the predictions performed much worse than the ones I used to get when I passed that argument. Should I downgrade torch-neuron to get the previous results?

@glenn-jocher

@diazGT94 this is likely related to ultralytics/yolov5#5845 from 5 days ago. A temporary workaround might be to drop down a level, i.e. use model.model instead of model, but I'll think about a better long-term solution to restore the original behavior.
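
A minimal sketch of that workaround, reusing the custom-weights Hub call and subgraph_builder_function from the snippets above (best.pt, the input size, and the single-level unwrap are illustrative assumptions):

import torch
import torch_neuron

def subgraph_builder_function(node):
    # Keep the Detect layer on CPU, as in the earlier snippets
    return 'Detect' not in node.name

hub_model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt', force_reload=True)
inner_model = hub_model.model  # drop down one wrapper level, as suggested above
inner_model.eval()

fake_image = torch.zeros([1, 3, 416, 416], dtype=torch.float32)
model_neuron = torch.neuron.trace(inner_model,
                                  example_inputs=[fake_image],
                                  subgraph_builder_function=subgraph_builder_function)
model_neuron.save('neuron_model.pt')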

@jpoberhauser

@diazGT94 I was actually able to convert by running:

import torch
import torch_neuron
model_v5 = torch.hub.load(<path_to_local_yolov5>,
        'custom',
        path=model_path,
        source='local',
        force_reload=True)  # local repo
# Create an example input for compilation
image = torch.zeros([1, 3, 640, 480], dtype=torch.float32)
#get model trace
model_neuron = torch.neuron.trace(model_v5, example_inputs=[image])

@josebenitezg

josebenitezg commented May 11, 2022

Hi!

I was able to convert the model from YOLOv5 to Neuron with the following code:

import torch
import torch_neuron
from torchvision import models

model = torch.hub.load('yolo5',
        'custom',
        path='yolov5.pt',
        source='local',
        force_reload=True)  # local repo

fake_image = torch.zeros([1, 3, 640, 640], dtype=torch.float32)
#fake_image = (torch.rand(3), torch.rand(3))
try:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])
except Exception:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])

model_neuron = torch.neuron.trace(model, 
                                example_inputs=[fake_image])

## Export to saved model
model_neuron.save("model_converted.pt")
Now that I am trying to test and compare, the tensor outputs are different from YOLOv5, as follows:

Neuron Yolov5 Model:

[tensor([[-0.0356,  0.1790,  0.7456,  0.6292,  0.9359, 13.0000],
        [ 0.5830,  0.1404,  1.1279,  0.6628,  0.9359, 13.0000],
        [ 0.0823,  0.6350,  0.6272,  1.1599,  0.9315, 13.0000],
        [-0.1443,  0.1416,  0.2542,  0.5107,  0.9224, 13.0000],
        [ 0.3516,  0.6426,  0.7500,  1.0137,  0.9188, 13.0000],
        [ 0.3555,  0.1436,  0.7539,  0.5127,  0.9147, 13.0000]])]

Yolov5:

[tensor([[334.57495, 176.98302, 407.46155, 213.81169,   0.93721,  13.00000]])]

Inference script:

im = cv2.imread('test_img.jpg')
img0 = im.copy()
im = cv2.resize(im, (640, 640), interpolation = cv2.INTER_AREA)
# Convert
im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
im = np.ascontiguousarray(im)
# Convert into torch
im = torch.from_numpy(im)
im = im.float()  # uint8 to fp16/32
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # expand for batch dim

# Load the compiled model
model = torch.jit.load('model_converted.pt')

# Inference
pred = model(im)
pred = non_max_suppression(pred) #nms function used same as yolov5 detect.py

#Process predictions
for i, det in enumerate(pred):  # per image
    im0 = img0.copy()
    color=(30, 30, 30)
    txt_color=(255, 255, 255)
    h_size, w_size = im.shape[-2:]
    print(h_size, w_size)
    lw = max(round(sum(im.shape) / 2 * 0.003), 2) 

    if len(det):
        # Write results
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            label = f'{CLASSES[c]} {conf:.2f}'
            print(label)
            box = xyxy 
            p1, p2 = (int(box[0]* w_size), int(box[1]* h_size)), (int(box[2]* w_size), int(box[3]* h_size))
            cv2.rectangle(im0, p1, p2, color, thickness=lw, lineType=cv2.LINE_AA)
            tf = max(lw - 1, 1)  # font thickness
            w, h = cv2.getTextSize(label, 0, fontScale=lw / 3, thickness=tf)[0]  # text width, height
            outside = p1[1] - h - 3 >= 0  # label fits outside box
            p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
            cv2.rectangle(im0, p1, p2, color, -1, cv2.LINE_AA)  # filled
            cv2.putText(im0,
                        label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                        0,
                        lw / 3,
                        txt_color,
                        thickness=tf,
                        lineType=cv2.LINE_AA)
    # Save results (image with detections)
    status = cv2.imwrite('out.jpg', im0)

Is there something wrong with the model conversion or the inference script? The label and the accuracy seem to be the same as expected, but the tensors are not.

@jeffhataws
Contributor

jeffhataws commented Jun 20, 2022

Is there something wrong with the model conversion or the inference script? The label and the accuracy seem to be the same as expected, but the tensors are not.

Sorry @josebenitezg, we did not notice this new issue as it was posted in a closed issue. I have gone ahead and created a new GitHub issue for you: #435.
