Expand the model zoo (example model set) #700

nudles · 2020-05-17T08:49:33Z

SINGA has multiple example models at http://singa.apache.org/docs/examples/
Some are implemented from scratch and some are converted from ONNX, which has a bigger model zoo: https://github.com/onnx/models.
The task is to convert more ONNX models and to implement some popular (and interesting) models that are not in the ONNX model zoo.
Here are some reference model zoos: https://modelzoo.co/, https://gluon-nlp.mxnet.io/model_zoo/index.html

nudles · 2020-05-17T14:29:32Z

Examples (To be updated):
MobileNetV2
ShuffleNetV2
YOLO
Mask RCNN
Faster RCNN
InceptionNetV3

Shashankwer · 2020-05-19T15:36:33Z

I will start working on MobileNetV2 and InceptionNetV3.

Alvinnyk · 2020-05-21T03:09:10Z

I will work on Mask RCNN

joddiy · 2020-05-21T06:39:38Z

Hi, guys, @Shashankwer , @Alvinnyk , thanks for your help.

Before you start, please check this Jira, https://issues.apache.org/jira/projects/SINGA/issues/SINGA-509?filter=allopenissues

It lists the models which require support and for some of the models, SINGA may lack the necessary operators now.

So I hope you:

1. Check the Jira carefully to see which models we can do now (I guess these are: super_resolution, ResNet101_DUC_HDC, ShuffleNet_V2, ShuffleNet_V1, inception_v2, squeezenet).
2. Find these models in the ONNX model zoo, download them, and open them with a tool called Netron. Try to understand each model and check each part against the ONNX IR doc.
3. Read the code of SingaRep and SingaBackend in python/singa/sonnx.py to understand how we parse and construct the SINGA model.
4. Read the code in examples/onnx to see how we load an ONNX model and run inference with it.
5. Then try to support one of the models from the first point yourself.
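
As a rough illustration of the first two steps, the sketch below counts which operator types in a model are missing from a supported set. The `SUPPORTED_OPS` set and the hand-written operator list are assumptions made for the example; they are not taken from sonnx.py or from any real model.

```python
from collections import Counter

# Hypothetical set of operator types a backend can translate; the real list
# lives in SINGA's sonnx.py and is not reproduced here.
SUPPORTED_OPS = {"Conv", "Relu", "MaxPool", "Concat", "Softmax"}


def unsupported_ops(op_types, supported=SUPPORTED_OPS):
    """Return a Counter of operator types that are not in `supported`."""
    return Counter(op for op in op_types if op not in supported)


# In practice the list could come from the onnx package, e.g.
#   op_types = [node.op_type for node in onnx.load(path).graph.node]
# Here a hand-written list stands in for a real model.
op_types = ["Conv", "Relu", "LRN", "MaxPool", "Conv", "LRN", "Softmax"]
missing = unsupported_ops(op_types)
print(dict(missing))
```

A model whose report is empty is a reasonable first candidate; one that reports missing operators probably has to wait until they are implemented.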

agnesnatasya · 2020-05-22T15:45:39Z

I will try to work on the ShuffleNet

agnesnatasya · 2020-05-24T16:46:42Z

Hi, may I ask how do we get the Amazon AWS link to download the model itself? For example, for ResNet, the link is https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.tar.gz. How do we get the link for the others? Because if I copy the link from ONNX's github, the link to download is Github's link, not Amazon AWS link. Thank you!

Shashankwer · 2020-05-25T16:22:21Z

For the model conversion, do we also need to train the model on SINGA after converting it from ONNX, and produce a comparison matrix?

For example, super_resolution (via a sub-pixel convolution layer) was trained on BSD 300. The model was originally written in PyTorch and converts well to SINGA through ONNX. It gives similar results for an image across PyTorch, onnxruntime and SINGA. Do we need to compare the model's performance across these platforms on the dataset it was trained on, or is SINGA compatibility what we are checking at the moment?

Thanks and Regards,
Shashank Nigam

nudles · 2020-05-26T02:29:19Z

There is no need to retrain the model.
But we need to evaluate the model over a test dataset to make sure the conversion is correct.

nudles · 2020-05-26T02:31:09Z

> Hi, may I ask how do we get the Amazon AWS link to download the model itself? For example, for ResNet, the link is https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.tar.gz. How do we get the link for the others? Because if I copy the link from ONNX's github, the link to download is Github's link, not Amazon AWS link. Thank you!

https://github.com/onnx/models
I think you need to go to the github page for each model and then find the link for the model checkpoint file, which may not all be on aws.
@joddiy any comments?

joddiy · 2020-05-26T03:19:01Z

> Hi, may I ask how do we get the Amazon AWS link to download the model itself? For example, for ResNet, the link is https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.tar.gz. How do we get the link for the others? Because if I copy the link from ONNX's github, the link to download is Github's link, not Amazon AWS link. Thank you!

Hi @agnesnatasya, it's not necessary to use the AWS download link. As you said, for the image classification models, I found the download links in the backend test cases here. For other models, you can use the download link in the GitHub repo directly.

joddiy · 2020-05-26T03:41:19Z

> For the models conversion do we also need to train model on SINGA after conversion from onnx, and have a comparison matrix?
>
> For example super_resolution via subpixel convolution layer was trained on BSD 300. Model is originally present in pytorch and can be converted well into SINGA using ONNX. It gives similar result for an image across pytorch, onnxruntime and SINGA. Do we need to compare the performance of the model across these platforms with the dataset they were trained on or SINGA compatibility is what we are checking at the moment?
>
> Thanks and Regards,
> Shashank Nigam

For this version of SINGA, we don't need to consider training (or retraining). We just need to test that we can run the model without errors and that the result is correct. In the next version (which we are working on now), SINGA's ONNX support will include retraining.

So for now, we just need to do three things:

1. Load the model correctly, without errors.
2. Verify the model using the test dataset in the model download package. If you choose Download (with sample test data), you will see a directory called test_data_set_0, and you can verify the model's results against that test data.
3. For the demo, run inference on human-readable data (for example an image): find a public image, run the model on it, and show the result.

agnesnatasya · 2020-05-26T11:56:02Z

@joddiy Thank you for the reply, I could find the link to download, and I have completed SqueezeNet. May I ask how do I test if my conversion is already correct? Thank you!

joddiy · 2020-05-27T03:23:22Z

> @joddiy Thank you for the reply, I could find the link to download, and I have completed SqueezeNet. May I ask how do I test if my conversion is already correct? Thank you!

You can use this code to verify the model against its test dataset:

    import os
    import numpy as np
    from singa import tensor
    from utils import load_dataset

    # verify against the test dataset; `model` and `dev` are created earlier
    inputs, ref_outputs = load_dataset(
        os.path.join('/tmp', 'resnet100', 'test_data_set_0'))
    x_batch = tensor.Tensor(device=dev, data=inputs[0])
    outputs = model.forward(x_batch)
    # compare each output with its reference to 4 decimal places
    for ref_o, o in zip(ref_outputs, outputs):
        np.testing.assert_almost_equal(ref_o, tensor.to_numpy(o), 4)
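
For readers wondering what the final argument `4` means: `numpy.testing.assert_almost_equal` accepts a pair of values when `abs(desired - actual) < 1.5 * 10**(-decimal)`. The sketch below restates that rule in plain Python for flat lists of floats. The helper names and sample numbers are made up for illustration and are not part of SINGA or NumPy.

```python
def almost_equal(actual, desired, decimal=4):
    """Mirror the tolerance rule of numpy.testing.assert_almost_equal."""
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)


def all_almost_equal(actual_seq, desired_seq, decimal=4):
    """Element-wise check over two flat sequences of equal length."""
    if len(actual_seq) != len(desired_seq):
        return False
    return all(
        almost_equal(a, d, decimal) for a, d in zip(actual_seq, desired_seq))


# Made-up numbers: `close` stays within 1.5e-4 of the reference, `off` does not.
reference = [0.12345, 0.50000, 0.99990]
close = [0.12350, 0.50010, 0.99985]
off = [0.12345, 0.50100, 0.99990]
print(all_almost_equal(close, reference))
print(all_almost_equal(off, reference))
```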

Shashankwer · 2020-05-28T12:59:17Z

Hi,

Can anyone verify if for SuperResolutionNet, attached notebook would be sufficient?

Thanks and Regards,
Shashank Nigam
SuperResolution.zip

Shashankwer · 2020-05-30T16:20:24Z

Hi,

For ShuffleNetV2, the conversion from ONNX fails at the average pooling layer with the following error:
ValueError: Not implemented yet for count_include_pad or ceil_mode

The stack trace is:

ValueError Traceback (most recent call last)
in ()
----> 1 sg_ir = sonnx.prepare(onnx_model, device=dev)

3 frames
/usr/local/lib/python3.7/site-packages/singa/sonnx.py in prepare(cls, model, device, **kwargs)
2033 opset_version = 1
2034 weights, singa_ops = cls._onnx_model_to_singa_net(
-> 2035 model, init_inputs, device, opset_version)
2036 return SingaRep(model, weights, singa_ops,
2037 cls.keep_initializers_as_inputs)

/usr/local/lib/python3.7/site-packages/singa/sonnx.py in _onnx_model_to_singa_net(cls, model, init_inputs, device, opset_version)
1971 ]
1972 handle, forward = cls._onnx_node_to_singa_op(
-> 1973 node, inputs, opset_version)
1974 # if it is Constant, we hanlde it as a weight
1975 # otherwise, we run it and add its output into map for being used by later operators

/usr/local/lib/python3.7/site-packages/singa/sonnx.py in _onnx_node_to_singa_op(cls, onnx_node, inputs, opset_version)
1816 else:
1817 translator = cls._common_onnx_node_to_singa_op
-> 1818 return translator(onnx_node, inputs, opset_version)
1819
1820 classmethod

/usr/local/lib/python3.7/site-packages/singa/sonnx.py in _create_max_avg_pool(cls, onnx_node, inputs, opset_version)
1635 if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs:
1636 raise ValueError(
-> 1637 "Not implemented yet for count_include_pad or ceil_mode")
1638
1639 # only support 2d

ValueError: Not implemented yet for count_include_pad or ceil_mode.

Kindly let us know the further steps to be taken

Thanks and Regards,
Shashank Nigam

agnesnatasya · 2020-05-30T16:48:29Z

@Shashankwer I got the same error too, there is a "not implemented error" too when I try to run ShufflenetV1

I got a floating point exception: 8 when loading GPT2, does anyone know what causes this?

agnesnatasya · 2020-05-30T16:52:02Z

@joddiy Hi, may I ask: for the other models that are not in ONNX, how do we find a reliable reference? I looked at the papers, and they usually only discuss the general idea and comparisons between models, not the details of the operations performed in each layer.

Shashankwer · 2020-05-30T17:07:43Z

> @Shashankwer I got the same error too, there is a "not implemented error" too when I try to run ShufflenetV1
>
> I got a floating point exception: 8 when loading GPT2, does anyone know what causes this?

Hi @agnesnatasya,

For ShufflenetV1, the issue is that it requires a CUDA GPU. It uses grouped 1x1 convolution, which does not seem to be supported by the CPU module and is written only for CUDA. The device returned by get_default_device has id -1, while GPU devices start from 0. For me it failed on the check condition device.id() == -1 in sonnx.py.
Running the same model on Colab resolved the issue.

Thanks,
Shashank

agnesnatasya · 2020-05-31T02:55:07Z

@Shashankwer Oh, I see, thank you for that!

agnesnatasya · 2020-06-03T13:11:55Z

@joddiy Hi, may I ask: for the other models that are not in ONNX, how do we find a reliable reference? I looked at the papers, and they usually only discuss the general idea and comparisons between models, not the details of the operations performed in each layer. If I look at https://modelzoo.co/, there are a lot of different implementations. How do we choose which one to use as a reference? Thank you!

joddiy · 2020-06-03T13:22:59Z

> @joddiy Hi, can I ask, for the other models that is not in ONNX, how do we find the reliable reference for it? I look at the papers, and they usually only discuss about the general idea and comparison between models, but not the details on operations to be done on every layer. If I look at https://modelzoo.co/, there are a lot of different implementations. How do we choose the one that we could reference from? Thank you!

Typically, authors release their code when they publish a paper. So if you cannot find the code on GitHub, you can try emailing the authors to ask for it.

joddiy · 2020-06-03T13:27:32Z

> @Shashankwer I got the same error too, there is a "not implemented error" too when I try to run ShufflenetV1
>
> I got a floating point exception: 8 when loading GPT2, does anyone know what causes this?

@Shashankwer @agnesnatasya, sorry for the late reply, I saw this just now. Yes, as @agnesnatasya said, some ONNX features have not been implemented in SINGA yet. For these errors, please skip the model. I guess we will implement these features soon.

Shashankwer · 2020-06-03T15:17:49Z

Hi @joddiy,

For ShuffleNetV2, the model conversion fails because of ceil_mode. Although the value is set to false (0) in the ONNX model, the check still fails because it only tests whether the attribute is present.

I have opened a new issue for this. If we split the condition check

    if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs:

into

    if "ceil_mode" in onnx_node.attrs and onnx_node.attrs["ceil_mode"]:
        raise ValueError("Not implemented yet for ceil_mode")
    if "count_include_pad" in onnx_node.attrs:
        raise ValueError("Not implemented yet for count_include_pad")

then ShuffleNetV2 can be converted to SINGA successfully.
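
For context on what ceil_mode controls: in the ONNX pooling operators it selects whether the output size is rounded down or up. The sketch below follows the formula in the ONNX operator documentation with dilation ignored. The function name and the sample sizes are chosen only for illustration.

```python
import math


def pool_output_size(in_size, kernel, stride, pad_total=0, ceil_mode=False):
    """Output length of one pooled spatial dimension (dilation ignored)."""
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((in_size + pad_total - kernel) / stride) + 1


# Same input, kernel and stride; only the rounding mode differs.
print(pool_output_size(28, kernel=3, stride=2))
print(pool_output_size(28, kernel=3, stride=2, ceil_mode=True))
```

With ceil_mode set to 0 the rounding is the default floor behaviour, which is why a node that merely carries the attribute with value 0 does not need special support.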

joddiy · 2020-06-03T15:24:40Z

> Hi @joddiy,
>
> For Shufflenetv2 the model conversion is failing due to use of ceil_mode. Although the value is set to false in onnx model (0) its still failing at a condition.
>
> I have opened a new issue for the same. If we can split the condition checking
> if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs
> to
> if "ceil_mode" in onnx_node.attrs and onnx_node.attrs["ceil_mode"]:
>   RaiseError
> if "count_include_pad" in onnx_node.attrs:
>   RaiseError
> Shufflenetv2 can be successfully converted to singa.

Thanks, @Shashankwer, you can open a PR with an update like this:

    ceil_mode = onnx_node.getattr("ceil_mode", 0)
    count_include_pad = onnx_node.getattr("count_include_pad", 0)
    if ceil_mode != 0 or count_include_pad != 0:
        ...
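
To make the difference between the two checks concrete, here is a small self-contained sketch. `FakeNode` is a hypothetical stand-in with an `attrs` dict and a `getattr` helper; it is not SINGA's actual node class, and the attribute values are invented.

```python
class FakeNode:
    """Hypothetical stand-in for a parsed ONNX node; not SINGA's class."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)

    def getattr(self, key, default=None):
        return self.attrs.get(key, default)


def presence_check(node):
    """Old behaviour: reject as soon as either attribute is present."""
    return "count_include_pad" in node.attrs or "ceil_mode" in node.attrs


def value_check(node):
    """Suggested behaviour: reject only when an attribute is non-zero."""
    ceil_mode = node.getattr("ceil_mode", 0)
    count_include_pad = node.getattr("count_include_pad", 0)
    return ceil_mode != 0 or count_include_pad != 0


# A node that carries ceil_mode=0, as reported for ShuffleNetV2 above.
node = FakeNode({"ceil_mode": 0, "kernel_shape": [3, 3]})
print(presence_check(node), value_check(node))
```

The presence-based check rejects this node even though ceil_mode is 0, while the value-based check accepts it and still rejects any node where either attribute is actually switched on.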

Shashankwer · 2020-06-03T15:34:22Z

> @joddiy Hi, can I ask, for the other models that is not in ONNX, how do we find the reliable reference for it? I look at the papers, and they usually only discuss about the general idea and comparison between models, but not the details on operations to be done on every layer. If I look at https://modelzoo.co/, there are a lot of different implementations. How do we choose the one that we could reference from? Thank you!
>
> Typically, when an author published his paper, he usually had to release his code at the same time. So, I guess if you cannot find the code on the GitHub, you can try to send an email to the author to ask for the code.

Hi @joddiy ,

Do we need to prepare the model architecture or train it as well? For training what dataset should we include? (some models are trained on imagenet where dataset can be huge).

joddiy · 2020-06-03T15:37:47Z

> @joddiy Hi, can I ask, for the other models that is not in ONNX, how do we find the reliable reference for it? I look at the papers, and they usually only discuss about the general idea and comparison between models, but not the details on operations to be done on every layer. If I look at https://modelzoo.co/, there are a lot of different implementations. How do we choose the one that we could reference from? Thank you!
>
> Typically, when an author published his paper, he usually had to release his code at the same time. So, I guess if you cannot find the code on the GitHub, you can try to send an email to the author to ask for the code.
>
> Hi @joddiy ,
>
> Do we need to prepare the model architecture or train it as well? For training what dataset should we include? (some models are trained on imagenet where dataset can be huge).

Sure, all of this work has been done in PR #703. If you are interested, you can check the new SONNX code.

agnesnatasya · 2020-06-06T06:38:08Z

Hi @joddiy, I tried to implement InceptionV1, but it uses LRN, which is not supported yet. I tried to implement InceptionV2, but I received this error:

F0606 13:54:23.837922 336954816 tensor.cc:1177] Check failed: size == t.Size() / t.shape(axis) (196 vs. 169) The size of all axis should  be the same except the concatenated axis
*** Check failure stack trace: ***
Abort trap: 6

May I ask if there is any other model (outside ONNX) that I could implement? Thank you!
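
The check in the error message above compares the element count of each tensor after dividing out the concatenated axis. A minimal sketch of that kind of check follows. The shapes are hypothetical, chosen only because 14 x 14 = 196 and 13 x 13 = 169 reproduce the numbers in the message; the real shapes in InceptionV2 are not known from this thread.

```python
from math import prod


def size_without_axis(shape, axis):
    """Number of elements divided by the extent along `axis`."""
    return prod(shape) // shape[axis]


def concat_compatible(shapes, axis):
    """True if all shapes agree on every dimension except `axis`."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            return False
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                return False
    return True


# Hypothetical NCHW branch outputs being concatenated along the channel axis.
branch_a = (1, 1, 14, 14)
branch_b = (1, 1, 13, 13)
print(size_without_axis(branch_a, 1), size_without_axis(branch_b, 1))
print(concat_compatible([branch_a, branch_b], axis=1))
```

Printing the shape of every Concat input in this way may help narrow down which branch of the network produces the unexpected spatial size.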

joddiy · 2020-06-11T05:34:52Z

Thanks @agnesnatasya , I'll test the InceptionV2 again to check the error.

And thank you all for your contributions! @Shashankwer @agnesnatasya @Alvinnyk

For the next plan, I have listed some operators we need to implement:

Type 1

~~Abs 1h~~
~~Floor 1h~~

Type 2

~~Exp 1h~~
~~Round 6h~~
~~Pad 6h~~
~~Expand 6h~~
~~Upsample 6h~~

Type 3

~~SpaceToDepth 12h~~
ScatterElements 12h
RoiAlign 12h
NonMaxSuppression 12h
Resize 12h

Type 4

TopK c++
ReduceMin c++
ReduceMax c++

Type 1 means the operator has already been implemented in autograd.py, so we only need to add it to sonnx.py.
Type 2 and Type 3 mean the operator doesn't exist in autograd.py, so we need to implement it first. Type 2 is easier and Type 3 is harder.
Type 4 requires modifying the C++ code; we'll consider it later.
Xh is the estimated workload in hours.

So the plan is:

1. Read the code of sonnx.py on the dev branch. We have updated sonnx to support re-training, so it's a little different from the old version.
2. Try to add a Type 1 operator. You may need to add a key to SingaBackend._rename_operators, and if you need a function to modify the operator, add a key to SingaBackend._special_operators that points to that function.
3. Add test cases for your new operator. The test cases are listed here, and you should check the code in test_onnx_backend.py. There are two kinds of patterns: with include patterns, operators that do not match are excluded; with exclude patterns, operators that match are excluded.
4. When you finish Type 1, you should be familiar with sonnx, so you can try Type 2 and Type 3. These require adding operators to autograd.py: read the code there, add a class with forward and backward functions, and add a function that calls the class's forward function. Don't forget to add test cases in test_operation.py. After adding the operator to autograd, add it to sonnx following points 2 and 3.
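
The shape described in the last point (a class with forward and backward, plus a function that calls forward) can be sketched in plain Python as below. This is only an illustration of the pattern: the `Operator` base class here is hypothetical, it works on flat lists of floats rather than tensors, and it does not use SINGA's real autograd classes or signatures.

```python
import math


class Operator:
    """Hypothetical minimal base class; SINGA's real base class differs."""

    def __call__(self, *xs):
        return self.forward(*xs)


class Floor(Operator):
    """Element-wise floor over a flat list of floats."""

    def forward(self, x):
        return [float(math.floor(v)) for v in x]

    def backward(self, dy):
        # floor is piecewise constant, so its gradient is zero almost everywhere
        return [0.0 for _ in dy]


def floor(x):
    """Functional wrapper that calls the operator's forward pass."""
    return Floor()(x)


y = floor([1.7, -0.2, 3.0])
print(y)
print(Floor().backward([1.0, 1.0, 1.0]))
```

A test for such an operator would typically compare `forward` against a reference implementation and check that `backward` returns a gradient of the same shape as the input.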

I know it's a little difficult at first, so there is no need to rush. Read the code carefully first, and if you have any questions, feel free to ask me.

By the way, please add your code to the dev branch. Please also skip the frontend code in sonnx.py and read only the backend code; I'm working on upgrading the frontend.

joddiy · 2020-06-14T12:36:56Z

Hi, you can follow my PR as an example.

#734

and

#736

Shashankwer · 2020-06-15T18:53:58Z

Hi @joddiy,

For the Type 1 operators, Floor is not present in autograd.py. Can we implement it in autograd.py, using the implementation of Ceil as a reference? Also, Equal is already implemented in the existing code.

Thanks,
Shashank

nudles · 2020-06-16T01:25:20Z

Yes. I think you can implement it following Ceil implementation.


joddiy · 2020-06-16T07:27:54Z

> Hi @joddiy,
>
> For The Type I operator floor is not present in autograd.py can we implement the same in autograd.py on taking the implementation of Ceil as the reference. Also Equal is implemented in the existing code.
>
> Thanks,
> Shashank

Yes, @Shashankwer, you can create a new operator in autograd.py using the function singa.Floor.

Shashankwer · 2020-06-17T12:59:06Z

Hi,

Exp operator is already implemented in autograd.py

Thanks and Regards,
Shashank Nigam

joddiy · 2020-06-18T07:36:28Z

> Hi,
>
> Exp operator is already implemented in autograd.py
>
> Thanks and Regards,
> Shashank Nigam

Hi @Shashankwer, Type 1 means the operator has already been implemented in autograd.py, and we only need to add it to sonnx.py.

You can follow my PR to see how to add an operator to sonnx.py. By the way, you should also run the tests in test_onnx_backend.py.
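
When working with test_onnx_backend.py, the include and exclude patterns described earlier in this thread behave roughly like the filter sketched below. This is an assumption-laden illustration: the patterns are assumed to be regular expressions, and the function and test names are invented for the example rather than copied from the test file.

```python
import re


def select_tests(names, include=(), exclude=()):
    """Keep names that match an include pattern and no exclude pattern.

    With no include patterns, every name is a candidate.
    """
    selected = []
    for name in names:
        if include and not any(re.search(p, name) for p in include):
            continue
        if any(re.search(p, name) for p in exclude):
            continue
        selected.append(name)
    return selected


# Invented test names, for illustration only.
names = ["test_abs_cpu", "test_floor_cpu", "test_floor_example_cpu",
         "test_topk_cpu"]
print(select_tests(names, include=[r"test_floor"], exclude=[r"example"]))
```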

@Shashankwer @Alvinnyk @agnesnatasya
There are two points I should explain a little more:

    onnx_node.set_attr_inputs(onnx_node.inputs[1], 'depth')
    onnx_node.set_weight_inputs(onnx_node.inputs[2], 'scale')

set_attr_inputs is for the case where the ONNX Operator Schemas treat an element as an input but we treat it as an attribute, so we mark the second input, onnx_node.inputs[1], as the attribute depth in SINGA.

set_weight_inputs is for the case where the ONNX Operator Schemas treat an element as an input and store its value, but we treat it as a weight in layer.py, so we mark the third input, onnx_node.inputs[2], as the weight scale in SINGA.
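
As a toy model of this bookkeeping (not SINGA's implementation), the sketch below records which ONNX inputs have been reclassified and reports which ones remain ordinary tensor inputs. `ToyNode`, its fields, and the input names are all hypothetical; only the two method names come from the snippet above.

```python
class ToyNode:
    """Hypothetical node used only to illustrate the idea."""

    def __init__(self, inputs):
        self.inputs = list(inputs)
        self.attr_inputs = {}    # input name -> attribute name
        self.weight_inputs = {}  # input name -> weight name

    def set_attr_inputs(self, input_name, attr_name):
        self.attr_inputs[input_name] = attr_name

    def set_weight_inputs(self, input_name, weight_name):
        self.weight_inputs[input_name] = weight_name

    def tensor_inputs(self):
        """Inputs still treated as ordinary tensors."""
        return [i for i in self.inputs
                if i not in self.attr_inputs and i not in self.weight_inputs]


node = ToyNode(["x", "depth_in", "scale_in"])
node.set_attr_inputs(node.inputs[1], "depth")
node.set_weight_inputs(node.inputs[2], "scale")
print(node.tensor_inputs())
```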

Shashankwer · 2020-06-24T18:38:15Z

Hi,

Just one question, does apache singa CTensor support advanced indexing as what numpy supports?

Thanks,
Shashank

joddiy · 2020-06-25T07:00:24Z

> Hi,
>
> Just one question, does apache singa CTensor support advanced indexing as what numpy supports?
>
> Thanks,
> Shashank

Not yet. For this operator, I guess you can use tensor.to_numpy(tensor.from_raw_tensor(CTensor)) to convert it to a NumPy array and do the indexing there for now. We will consider implementing it on the C++ side later.

Shashankwer · 2020-07-06T05:52:57Z

Hi,

I will work on RoiAlign. Also, can GatherElements be implemented similarly to ScatterElements?

Thanks and Regards,
Shashank Nigam
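
On the GatherElements question, the two operators walk the index tensor in the same way and differ only in whether they read from or write to the indexed position. The sketch below shows this for 2-D nested lists, following the ONNX operator definitions. It handles only non-negative indices and is a plain-Python illustration, not the code from any SINGA PR.

```python
import copy


def gather_elements(data, indices, axis):
    """GatherElements for 2-D nested lists (non-negative indices only)."""
    out = []
    for i, row in enumerate(indices):
        out_row = []
        for j, idx in enumerate(row):
            out_row.append(data[idx][j] if axis == 0 else data[i][idx])
        out.append(out_row)
    return out


def scatter_elements(data, indices, updates, axis):
    """ScatterElements for 2-D nested lists (non-negative indices only)."""
    out = copy.deepcopy(data)
    for i, row in enumerate(indices):
        for j, idx in enumerate(row):
            if axis == 0:
                out[idx][j] = updates[i][j]
            else:
                out[i][idx] = updates[i][j]
    return out


data = [[1, 2], [3, 4]]
print(gather_elements(data, [[0, 0], [1, 0]], axis=1))
print(scatter_elements([[0, 0, 0], [0, 0, 0]], [[1, 0, 1]], [[7, 8, 9]], axis=0))
```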

joddiy · 2020-07-06T06:35:26Z

> Hi,
>
> Will work on RoiAlign. Also can GatherElements be implemented similar to ScatterElements?
>
> Thanks and Regards,
> Shashank Nigam

Thanks @Shashankwer. I just saw your PR for ScatterElements.

I suggest implementing NonMaxSuppression first, since it's easier than RoiAlign. RoiAlign needs some operators that we cannot support yet, for example crop, whereas NonMaxSuppression is straightforward now.

You can follow this one:

https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py

As for RoiAlign, we have found an implementation on GitHub, and we're discussing whether to implement it in Python or C++.

https://github.com/longcw/RoIAlign.pytorch
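
The greedy procedure behind NonMaxSuppression can be sketched in a few lines of plain Python. This is a simplified single-class version on `(x1, y1, x2, y2)` boxes. It leaves out the per-batch and per-class handling, score threshold, and output limit of the ONNX operator, and the boxes and scores below are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS; returns kept indices in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5, while the third box is disjoint from both and is kept.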

Shashankwer · 2020-07-06T13:06:27Z

Thanks for the reference, I will try implementing NonMaxSuppression first.

joddiy mentioned this issue May 28, 2020

Implement Squeezenet using Squeezenet1.1 #711

Merged

joddiy mentioned this issue Jun 22, 2020

Adding Operators #738

Merged

nudles pinned this issue Jun 24, 2020

chrishkchris unpinned this issue Jun 25, 2020
