Cluster Serving refine Timer #2825

Merged
merged 7 commits into from
Sep 8, 2020

Litchilitchy
Contributor

No description provided.

Litchilitchy merged commit ff7cd88 into intel-analytics:master on Sep 8, 2020
Litchilitchy deleted the timer branch on September 16, 2020 06:53
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 20, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 26, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 30, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 7, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 9, 2021
dding3 added a commit that referenced this pull request Aug 12, 2021
* Modify the thread pool to adopt mkldnn models (#2608)

`Engine.default` now supports a single thread, including for `LocalOptimizer` and `DistriOptimizer`. To support a single-thread version of the `invokeAndWait2` method in `ThreadPool`, the thread pool is set to the current thread.

1. For dnn models, affinity is used to bind the OMP threads. For performance reasons, the default thread must be the current main thread.
2. `MTLabeledBGRImgToBatch` uses a separate new thread pool called io, so it is not blocked when the default thread pool is single-threaded.
3. `FileWriter` does not use the default pool; otherwise the whole app gets stuck while creating the summary.
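The single-thread behaviour described above can be illustrated with a minimal Python sketch. `SketchThreadPool` and `invoke_and_wait` are hypothetical names, not the BigDL `ThreadPool` API; the sketch only assumes that a pool of size 1 runs tasks on the caller's thread instead of handing them to workers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SketchThreadPool:
    """Hypothetical stand-in for a thread pool; not the BigDL ThreadPool."""

    def __init__(self, size):
        self.size = size
        # Worker threads are only created when more than one is requested.
        self._executor = ThreadPoolExecutor(max_workers=size) if size > 1 else None

    def invoke_and_wait(self, tasks):
        """Run every task and return the results in submission order."""
        if self._executor is None:
            # Single-thread mode: run directly on the caller's thread.
            return [task() for task in tasks]
        futures = [self._executor.submit(task) for task in tasks]
        return [future.result() for future in futures]

    def shutdown(self):
        if self._executor is not None:
            self._executor.shutdown()


def current_thread_name():
    """Report which thread a task ran on."""
    return threading.current_thread().name
```

With a pool of size 1, `invoke_and_wait([current_thread_name])` reports the caller's own thread, which is the property the main-thread affinity binding relies on.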

* feature: add shutdown for optimizer which will release the native resources (#2609)

Release native resources at the end of training.

At the end of training, `release` is called on every model cloned in the optimizer.

1) `LocalOptimizer` is simple because all cloned models are local.
2) `DistriOptimizer` is a little more complicated. The release must happen before `models.unpersist`;
     otherwise the model is serialized and transferred again. `ModelBroadcast` also clones a new model
     when `value` is called, so those clones must be released as well.
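A minimal sketch of the release-at-end-of-training idea, in Python. `SketchModel` and `train_and_release` are hypothetical names, and the use of `try`/`finally` is an assumption of this sketch rather than something the commit describes.

```python
class SketchModel:
    """Hypothetical model holding a native resource flag."""

    def __init__(self):
        self.released = False

    def clone(self):
        return SketchModel()

    def release(self):
        # Stands in for freeing native (off-heap) memory.
        self.released = True


def train_and_release(model, n_clones, train_step):
    """Clone the model, train each clone, then release every clone."""
    clones = [model.clone() for _ in range(n_clones)]
    try:
        for clone in clones:
            train_step(clone)
    finally:
        # Release runs even if a training step raises.
        for clone in clones:
            clone.release()
    return clones
```

The original model is left untouched here; only the clones created for training are released.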

* fix: dnn currently does not support Windows; it will be supported in the future (#2613)

* [Bug Fix] Handle grey images correctly if the model requires a 3-channel tensor input (#2616)

* handle grey images correctly if the model requires a 3-channel tensor input

* move the test image files so old tests do not break

* fix style error

* fix: set the encoding of input and output files to UTF-8 (#2615)

* [Feature] Allow users to customize how the model is broadcast in distributed training (#2618)

* allow user to override ModelBroadcast

* update configuration doc

* meet code review

* NLL unlabeled data fix (#2620)

* fix: the inference performance regression of mkldnn (#2622)

Weights need to be copied in `updateOutput` only during training. For inference, the weights are loaded beforehand and do not change.

* fix: style check errors (#2625)

* [new feature]Hit Ratio and NDCG (#2623)

* [new feature]Parallel Adam (#2626)

* feat: training ResNet50 w/ dnn backend on Spark. (#2624)

* feat: resnet50 training with distributed mode
* fix: unknown segmentation fault
* fix: clone of dnn tensor
* fix: delete unused code
* fix: bn initialization
* fix: performance regression
* fix: convergence regression
* fix: delete the release in ModelBroadcast
* fix: pass all unit tests and remove the segmentation fault.

* feat: add dnn vgg model. (#2627)

* feat: add dnn vgg model.

* fix: rename the ResNet50Perf to Perf

* Fix issue 2592 (#2629)

* fix issue Predictor 2592

* adjust the algorithm

* fix whitespace style check

* fix code review issue

* fix loop efficiency

* feat: add example for lenet w/ dnn (#2635)

* fix join table will throw exception during backward if batchsize is changed (#2638)

* fix join table backward

* change to resize as

* change Reshape to InferReShape in reshapeLoadTF (#2637)

* change Reshape to InferReShape in reshapeLoadTF

* fix docs

* fix failed code

* fix failed code

* fixes after code review

* fixes after code review

* fix

* fix infer

* add unit tests

* feature: vgg-16 with mkldnn backend (#2631)

* feat: vgg-16 with mkldnn backend
* fix: tests errors
* fix: case class has too many arguments
* fix: vgg_16 blas model supports
* fix: protobuf of serializer
* fix: sgd poly test case error
* fix: consistency of poly impl
* fix: rename the version2 of Xavier to varianceNormAverage

* Refine the synchronizer to support priority and also make it event driven (#2634)

* refinement

* refinement per comments

* refinement per review

* perf: need not narrow the gradients and zero gradients for dnn backend (#2632)

* perf: need not narrow the gradients and zero gradients for dnn backend
* fix: empty gradient zero for dnn backend
* fix: delete affine

* fix break unit test (#2642)

* New parallel optimizer (#2643)

* add new parallel optimizer

* change info back to debug for metrics log

* refinement per comments

* refinement per comments on single model optimization

* refinement for sharing common methods

* fix style

* refinement to reuse duplicate code

* Fix transfer learning (#2645)

* fix transfer learning
* add ParseSingleExample, DecodeBmp tf loader
* add corresponding unit tests

* remove potential performance downgrader (#2651) (#2652)

* Fix transfer learning (#2645)

* fix transfer learning
* add ParseSingleExample, DecodeBmp tf loader
* add corresponding unit tests

* remove potential performance downgrader

* abstractify tests with common spark lifecycle (#2654)

apply SparkContextLifeCycle to tests

default app name + extend SparkContextLifeCycle to other compatible tests

add custom before and after

* bump version to 0.8.0-SNAPSHOT (#2660)

* bump version to 0.8.0-SNAPSHOT

* add core link update

* [Enhancement] - Deco legacy transformers and train InceptionV1 to meet training target (#2661)

* refinement inception v1 training code

* fix ut due to the init change

* fix type

* fix param

* Add python API for new transformers and apply them in inception training example (#2663)

* refinement on python API

* fix ut

* fix: uuid() will return a new uuid every call (#2667)

* fix: uuid() will return a new uuid every call
* fix: add partitionId to value()
* fix: we need not add partition id to the value()
* fix: code clean

* [Bug Fix] Fix predictor issue while batch size == 1 for some topology (#2669)

* fix predictor issue

* refinement per batchsize only

* fix ut

* remove unused code

* fix batch size == 1

* fix ut

* AND/OR compound triggers support (#2675)

* AND/OR compound triggers support

* Unit-tests for compound triggers

* Unit-tests for compound triggers updates for OR

* Style fixes for Trigger.and/or

* Trigger.endWhen docs update
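As an illustration of how AND/OR compound triggers compose, here is a small Python sketch. The function names and the `state` dictionary keys (`"epoch"`, `"loss"`) are assumptions made for this example and are not the BigDL `Trigger` API.

```python
def max_epoch(limit):
    """Fires once the epoch counter exceeds the limit."""
    return lambda state: state["epoch"] > limit


def min_loss(threshold):
    """Fires once the loss drops below the threshold."""
    return lambda state: state["loss"] < threshold


def trigger_and(first, *others):
    """Fires only when every sub-trigger fires."""
    return lambda state: first(state) and all(t(state) for t in others)


def trigger_or(first, *others):
    """Fires when at least one sub-trigger fires."""
    return lambda state: first(state) or any(t(state) for t in others)


# Stop after epoch 10, or earlier once loss < 0.1 and at least 2 epochs ran.
end_when = trigger_or(max_epoch(10), trigger_and(min_loss(0.1), max_epoch(2)))
```

Because each compound trigger is itself a trigger, the combinators nest to any depth.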

* add dnn graph (#2666)

* add dnn graph

* move compile to forward, add graph test to perf

* add dnn graph option to example

* style check

* replace dnn with dnn graph in examples

* Update README.md (#2681)

* Fix ut failed due to duplicated spark context (#2687)

* Fix ut failed

* fix ut

* add no phase api when initPrimitives (#2686)

* delete phase in initPrimitives

* fix style check

* improve memoryReorder layer to handle conversion between nhwc and nchw (#2683)

* fix reorder to handle nhwc

* add init memory for ReorderMemory

* support same padding in dnn layer (#2684)

* support same padding in dnn layer

* meet review

* add BlasWrapper (#2690)

* add BlasWrapper

* refactor code

* meet review

* SerializerSpec excluded mkldnn.BlasWrapper

* change some comments

* add dnn output layer (#2691)

* add dnn output layer

* SerializerSpec excluded mkldnn Output

* change some comments

* Irelement (#2695)

* add ir element and conversion

* add more comments

* meet comments

* change map name

* support dlclassifiermodel binary classification (#2705)

* add IR graph and conversion from IR graph to blas graph or dnn graph (#2704)

* add ir graph

* fix model evaluate & conv without bias

* add dnnMode & support table inputs

* irelement & graph layer use same weights

* meet pr comments and code refactor

* update  latest weight for validation (#2710)

* convert static graph to IR graph and build (#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit tests to avoid the dynamic resource allocation issue caused by docker (#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (#2682)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. If
the parent thread exits and is recreated, as happens with threads from
Executors.newFixedThreadPool, the whole app hits a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
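The idea of pinning all native computation to one long-lived thread can be sketched in Python as follows. The names are hypothetical, and `threading.local` only stands in for the native thread-local variables mentioned above; this is not how BigDL implements it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stands in for native thread-local state owned by the computing thread.
_native_state = threading.local()

# A dedicated pool with exactly one long-lived worker thread.
dnn_pool = ThreadPoolExecutor(max_workers=1)


def init_native():
    """Create the thread-local state on whichever thread runs this."""
    _native_state.handle = threading.get_ident()
    return _native_state.handle


def compute(x):
    """Only valid on the thread that previously ran init_native."""
    if getattr(_native_state, "handle", None) != threading.get_ident():
        raise RuntimeError("native state was created on another thread")
    return x * 2


def run_on_dnn_thread(fn, *args):
    """Submit work to the dedicated thread and wait for the result."""
    return dnn_pool.submit(fn, *args).result()
```

Since the single worker outlives any caller thread, the state created by `init_native` is still present for every later `compute` call, no matter which thread submits it.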

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    needs more memory, for example 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime.
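The padding arithmetic in the first issue can be checked with a short Python sketch. `padded_shape` is a hypothetical helper; it assumes the channel dimension is dimension 1 and that the SIMD width is 8 floats for avx2, as in the example above.

```python
def padded_shape(shape, simd_width=8, channel_dim=1):
    """Round the channel dimension up to a multiple of the SIMD width."""
    padded = list(shape)
    channels = padded[channel_dim]
    # Ceiling division, then scale back up to a multiple of simd_width.
    padded[channel_dim] = -(-channels // simd_width) * simd_width
    return tuple(padded)


def element_count(shape):
    """Number of elements a tensor of this shape holds."""
    total = 1
    for dim in shape:
        total *= dim
    return total
```

For the 4x1x28x28 example, the padded tensor holds 8 times as many elements as the logical one, which is where the extra memory goes.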

* add computshape for some layers and add skip primitives in DnnGraph (#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (#2737)

* refactor predict for dnn model

* remove some unit tests (#2752)

* remove some conflict tests (#2753)

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to `MemoryData`.
2. Because scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.
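To show why a scale has to travel with the data between FP32 and int8, here is a minimal Python sketch of symmetric quantization. The function names, the 127 / max-abs scale, and the [-127, 127] clamp are assumptions of this sketch; the commit does not state which scheme the reorder uses.

```python
def scale_for(max_abs):
    """Scale that maps the range [-max_abs, max_abs] onto [-127, 127]."""
    return 127.0 / max_abs


def quantize(values, scale):
    """FP32 -> int8: multiply by the scale, round, and clamp."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """int8 -> FP32: divide by the same scale."""
    return [q / scale for q in quantized]
```

The round trip recovers each in-range value to within half a quantization step, that is `0.5 / scale`, and only if the same scale is used in both directions.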

* fix conversion accuracy (#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want an fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compile,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
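Step 1 can be pictured with a small Python sketch that collects max-abs statistics under a dimension mask. It assumes the MKL-DNN convention that a set bit `i` in the mask means one value per index along dimension `i`, and it works on a plain 2-D list; `max_abs_with_mask` is a hypothetical helper, not the `generateScalesWithMask` implementation.

```python
def max_abs_with_mask(matrix, mask):
    """Max-abs statistics of a 2-D list under a dimension mask."""
    rows, cols = len(matrix), len(matrix[0])
    if mask == 0:
        # No bit set: a single value for the whole tensor.
        return [max(abs(v) for row in matrix for v in row)]
    if mask == 1:
        # Bit 0 set: one value per index along dimension 0 (per row).
        return [max(abs(v) for v in row) for row in matrix]
    if mask == 2:
        # Bit 1 set: one value per index along dimension 1 (per column).
        return [max(abs(matrix[r][c]) for r in range(rows)) for c in range(cols)]
    raise ValueError("this sketch only handles masks 0, 1 and 2")
```

A finer mask gives each slice its own range, which usually wastes less of the int8 range than a single tensor-wide value.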

* change some docs about mkldnn (#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (#2783)

* fix: inplace of input/output and weight dimension error (#2779)

Some layers' input and output use the same memory, so we can't do a forward in
`calcScales`: by that time the input has already been changed, and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

This runs in two steps. seq.forward(input) runs first. Then, on reaching the ReLU,
another forward runs, so the input is already the output, and the scales are
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for a dnn convolution without groups, the weight's dimension
should be 4.
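The in-place problem can be reproduced in a few lines of Python. This is a sketch with hypothetical helper names: a list stands in for the shared input/output buffer, and max-abs stands in for the scale statistic.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, like an in-place layer."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


def input_stat_after_forward(buf):
    """Wrong order: the forward has already overwritten the input."""
    relu_inplace(buf)
    return max_abs(buf)


def input_stat_before_forward(buf):
    """Right order: read the input statistic, then run the forward."""
    stat = max_abs(buf)
    relu_inplace(buf)
    return stat
```

For the buffer `[-4.0, 1.0, 2.0]`, measuring after the forward sees only the ReLU output and reports 2.0 instead of 4.0.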

* fix: the blas wrapper has no scales (#2778)

* fix softmax (#2777)

* fix: performance regression on resnet50 (#2774)

In this case, u8 to s8 or s8 to u8 needs no reorder.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* flip to 0.9.0 (#2792)

* test: should compare the right grad input (#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
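The block id ambiguity fixed in #2802 is easy to reproduce. The Python sketch below follows the two formats quoted in the bug-fix note; the function names are hypothetical.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Old format: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """New format: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"
```

With the old format, `(id=1, pidTo=12)` and `(id=11, pidTo=2)` both start with "112" and therefore collide; the separator makes the two keys distinct.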

* bugfix - set mask for container (#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (#2821)

* Optimize backward graph generation and CAddTable (#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (#2822)

* Use one AllReduceParameter for multi-optim method  training (#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* bug fix for cmul (#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (#2824)

* fix: fusion for multi-group of convolution (#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (#2843)

* fix acc bug & init dnn thread (#2841)

* support tnc and ntc conversion (#2844)

* support ntc in dnn layer (#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (#2853)

* fix: wrong affinity settings (#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (#2858)

* Add beam search in transformer (#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (#2854)

* feat: add axis to softmax (#2859)

* flip version to 0.10.0 (#2869)

* [Bug Fix] - Fix module version comparison  (#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (#2872)

* feat: RoiAlign Forward (#2874)

* Add set input output format API in Python (#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (#2870)

* fix memory leak for ir graph training (#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (#2885)

* add shape layer

* add Gather layer (#2897)

* add gather layer

* [New feature] Add maskhead (#2892)

* support for maskhead

* fix unit tests (#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (#2930)

* Onnx support: add pos parameter to softmax (#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (#2940)

* revert back api (#2943)

* fix: softmax and bn+scale fusion (#2937)

* feat: multi models support with MKL-DNN backend (#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
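For the IOU API and the "BBox now inclusive" change in #2935, a minimal Python sketch of box IoU with inclusive coordinates follows. It assumes "inclusive" means that both corner pixels belong to the box, so width is `x2 - x1 + 1`; the function names are hypothetical and boxes are `(x1, y1, x2, y2)` tuples.

```python
def box_area(box):
    """Area of a box whose corner coordinates are both inclusive."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1 + 1) * max(0, y2 - y1 + 1)


def iou(a, b):
    """Intersection over union of two inclusive boxes."""
    intersection = box_area((
        max(a[0], b[0]), max(a[1], b[1]),
        min(a[2], b[2]), min(a[3], b[3]),
    ))
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0
```

Two 10x10 boxes offset by 5 pixels in each direction overlap in a 5x5 region, giving an IoU of 25 / 175.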

* COCO Seq file reader: grey to bgr (#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* add maskrcnn inference example (#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (#2947)

* fix: takeSample only works for dnn backend and get one batch

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (#2959)

* fix: the squeeze should not be included in IRElement (#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (#2973)

* fix: nms stability when using treeset. (#2972)

* flip version to 0.11 (#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (#2971)

* fix: enable integration accuracy tests (#2976)

* fix: softmax dnn backend wrong order of primitive (#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* feat: add distri optimizer v2 (#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* Remove final for AbstractModule (#3001)

* DistriOptimizerV2 argument (#3003)

* call DistriOptimizerV2

* fix inception (#3010)

* fix top1 and treenn (#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* test examples by distrioptimizerv2 (#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* deprecate nn.keras (#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (#3021)

* fix loss

* fix ut

* fix style check (#3022)

* flip version to 0.12 (#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* [WIP] spark 3.0 (#3054)

* spark 3.0

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (#3094)

* back port master (#3096)

* set seed to avoid random error in PredictionServiceUT (#3097)

* add serializeUid (#3099)

* update doc (#3104)

* remove DLFrames (#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* make default DistriOptimizer as V2 (#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (#3133)

* DistriOptimizerV2 logger (#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (#3137)

* upgrade spark version (#3138)

* upgrade log4j (#3141)

* flip0.14 (#3142)

* flip0.14

* update

* fix common compile issue

* change dllib package name

* fix serialization failure

* fix FileSpec failure

Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: Ian Wong <yiheng.wang@intel.com>
Co-authored-by: Griffin Kardos <kardosgriffin@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: megaSpoon <bowen.she@intel.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Tao Pathompong Ruangyam <tao@starcolon.com>
Co-authored-by: abdmob <abdulla.abd.m@gmail.com>
Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Le-Zheng added a commit that referenced this pull request Sep 1, 2021
* convert static graph to IR graph and build (#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (#2682)

* add spark 2.4 support (#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. So if
the parent thread exits and is recreated, such as a thread from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
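
The idea of a dedicated compute thread can be sketched as follows. This is only an illustration in plain Python, not the BigDL implementation: `compute_pool`, `run_on_compute_thread` and the `dnn-compute` name are hypothetical. It shows that every task submitted to a single-thread pool runs on the same long-lived worker, so thread-local state survives between calls and the caller's thread is never used for the computation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker stands in for the dedicated compute thread (hypothetical name).
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated thread and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


def current_thread_name():
    """Report which thread actually executed the task."""
    return threading.current_thread().name


# Five separate submissions all land on the same worker thread.
names = {run_on_compute_thread(current_thread_name) for _ in range(5)}
caller = threading.current_thread().name
```

Because the worker is never recreated between tasks, anything it stores in thread-local variables stays valid for the lifetime of the pool.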

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn will use a padded tensor, which
    uses more memory, such as 4x1x28x28 to 4x8x28x28 (avx2). It will
    pad to a multiple of the simd width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
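
The padding arithmetic in item 1 can be sketched as below. This is an illustration only: `padded_shape` is a hypothetical helper, and it assumes the padding applies to the channel dimension of an NCHW shape with a SIMD width of 8, as the 4x1x28x28 to 4x8x28x28 example suggests.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def element_count(shape):
    """Number of elements a tensor of this shape holds."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
growth = element_count(padded) / element_count(original)
```

Under these assumptions a single-channel input grows eight times in memory, which is why the padded tensor matters for small channel counts.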

* add compute shape for some layers and add skip primitives in DnnGraph (#2740)

* add compute shape for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (#2737)

* refactor predict for dnn model

* remove some unit tests (#2752)

* remove some conflict tests (#2753)

* Update documentation (#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
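
A minimal sketch of the FP32->Int8->FP32 round trip that makes a `scales` attribute necessary. It assumes symmetric quantization with a single scale of 127 / max|x|; the helper names are hypothetical and this is not the BigDL or mkl-dnn implementation.

```python
def compute_scale(values):
    """Hypothetical helper: choose the scale that maps the largest magnitude to 127."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def to_int8(values, scale):
    """FP32 -> Int8: multiply by the scale, round, and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def to_fp32(quantized, scale):
    """Int8 -> FP32: divide by the same scale used for quantization."""
    return [q / scale for q in quantized]


fp32 = [0.4, -1.0, 0.25]
scale = compute_scale(fp32)
int8 = to_int8(fp32, scale)
restored = to_fp32(int8, scale)
```

The restored values differ from the originals by at most half a quantization step, and only if the same scale travels with the data in both directions.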

* fix conversion accuracy (#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want an fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (#2759)

It includes 3 steps.

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
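
Step 1 can be illustrated with a small sketch of scale generation at two granularities. The mask convention here is an assumption made for the example (0 means one scale for the whole tensor, 1 means one scale per row of a 2-D weight); `max_abs_with_mask` and `scales_from_max` are hypothetical names, not the `generateScalesWithMask` API.

```python
def max_abs_with_mask(matrix, mask):
    """Collect max |x| over a 2-D list, per tensor (mask 0) or per row (mask 1)."""
    if mask == 0:
        return [max(abs(v) for row in matrix for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in matrix]
    raise ValueError("this sketch only handles mask 0 and mask 1")


def scales_from_max(max_values):
    """Turn each max |x| into an int8 scale, guarding against all-zero slices."""
    return [127.0 / m if m > 0 else 1.0 for m in max_values]


weights = [[0.5, -2.0], [0.1, 0.05]]
per_tensor = max_abs_with_mask(weights, 0)
per_row = max_abs_with_mask(weights, 1)
per_row_scales = scales_from_max(per_row)
```

With a per-row mask the second row gets a much larger scale than the first, so its small values keep more precision after quantization.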

* update readme for v1 training (#2763)

* update release doc for preparation (#2764)

* change some docs about mkldnn (#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (#2783)

* fix: inplace of input/output and weight dimension error (#2779)

Some layers use the same memory for input and output, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It will do two steps. seq.forward(input) runs first, and when it goes into the ReLU, it
will do another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
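
The in-place problem can be reproduced with plain lists. This is a hypothetical illustration, not BigDL code: `relu_inplace` overwrites its input buffer, so a max-abs statistic taken after the ReLU has run no longer describes the convolution output.

```python
def relu_inplace(buf):
    """ReLU that reuses its input buffer as its output buffer."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """The statistic a scale would be derived from."""
    return max(abs(v) for v in buf)


conv_out = [-3.0, 1.0, 2.0]
before_relu = max_abs(conv_out)    # statistic of the real convolution output
relu_out = relu_inplace(conv_out)  # same list object, now overwritten
after_relu = max_abs(conv_out)     # statistic of the ReLU output instead
```

Recording the statistic before the next layer runs is the only way to get the value that belongs to the convolution.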

* fix: the blas wrapper has no scales (#2778)

* fix softmax (#2777)

* fix: performance regression on resnet50 (#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (#2671)

* flip to 0.9.0 (#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
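
The ambiguity described in the bug fix can be shown with a short sketch. The two formatting functions are hypothetical and only mirror the patterns quoted above; the exact id layout in AllReduceParameter is an assumption.

```python
def old_block_id(block_id, pid_to, pid_from):
    """Old pattern: id and pidTo are concatenated with nothing in between."""
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(block_id, pid_to, pid_from):
    """New pattern: an underscore separates id from pidTo."""
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# {1}{12} and {11}{2} collide under the old pattern.
old_a = old_block_id(1, 12, 0)
old_b = old_block_id(11, 2, 0)

# The separator keeps them distinct.
new_a = new_block_id(1, 12, 0)
new_b = new_block_id(11, 2, 0)
```

Two different (id, pidTo) pairs produce the same key without the separator, so one gradient block could silently overwrite or be read in place of another.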

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (#2821)

* Optimize backward graph generation and CAddTable (#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (#2822)

* Use one AllReduceParameter for multi-optim method  training (#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (#2824)

* fix: fusion for multi-group of convolution (#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (#2843)

* fix acc bug & init dnn thread (#2841)

* support tnc and ntc conversion (#2844)

* support ntc in dnn layer (#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (#2853)

* fix: wrong affinity settings (#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (#2858)

* Add beam search in transformer (#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (#2854)

* feat: add axis to softmax (#2859)

* add release doc for 0.9 (#2862)

* fix: update core ref to master (#2865)

* flip version to 0.10.0 (#2869)

* [Bug Fix] - Fix module version comparison  (#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (#2872)

* tutorial fix (#2879)

* feat: RoiAlign Forward (#2874)

* Add set input output format API in Python (#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (#2870)

* fix memory leak for ir graph training (#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (#2885)

* add shape layer

* add Gather layer (#2897)

* add gather layer

* [New feature] Add maskhead (#2892)

* support for maskhead

* fix unit tests (#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (#2930)

* Onnx support: add pos parameter to softmax (#2933)

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (#2940)

* revert back api (#2943)

* fix: softmax and bn+scale fusion (#2937)

* feat: multi models support with MKL-DNN backend (#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (#2959)

* fix: the squeeze should not be included in IRElement (#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (#2973)

* fix: nms stability when using treeset. (#2972)

* flip version to 0.11 (#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (#2971)

* fix: enable integration accuracy tests (#2976)

* fix: softmax dnn backend wrong order of primitive (#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (#3002)

* Remove final for AbstractModule (#3001)

* DistriOptimizerV2 argument (#3003)

* call DistriOptimizerV2

* fix inception (#3010)

* fix top1 and treenn (#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (#3020)

* test examples by distrioptimizerv2 (#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (#3021)

* fix loss

* fix ut

* fix style check (#3022)

* specify pyspark version (#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (#3054)

* spark 3.0

* add spark3.0 deployment (#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (#3094)

* back port master (#3096)

* set seed to avoid random error in PredictionServiceUT (#3097)

* Jdk11 support (#3098)

* update for jdk 11 support and doc

* add serializeUid (#3099)

* update doc (#3104)

* add doc for running in ide (#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (#3111)

* add list of df support (#3113)

* Update readme (#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (#3133)

* DistriOptimizerV2 logger (#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (#3137)

* upgrade spark version (#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (#3141)

* flip0.14 (#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 2, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated, as happens with threads from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
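
The idea of a dedicated single-thread pool can be sketched in Python as follows. This is only an illustration of the pattern, not the BigDL implementation: `compute_pool`, `native_state` and `run_primitive` are hypothetical names, and `threading.local` stands in for the native thread-local variables mentioned above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one long-lived worker thread: every task runs on it,
# so the caller (parent) thread never executes the compute itself.
compute_pool = ThreadPoolExecutor(max_workers=1)

# Hypothetical stand-in for native thread-local state.
native_state = threading.local()


def run_primitive(step):
    # Initialise the thread-local state once, then reuse it on later calls.
    if not hasattr(native_state, "initialised_on"):
        native_state.initialised_on = threading.get_ident()
    return step, threading.get_ident(), native_state.initialised_on


results = [compute_pool.submit(run_primitive, i).result() for i in range(3)]
worker_ids = {thread_id for _, thread_id, _ in results}
compute_pool.shutdown()
```

Because the worker thread outlives any single caller, state initialised on it stays valid even if the thread that submits the work is recreated.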

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn will use a padded tensor, which
    uses more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
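
The padding arithmetic in the first issue can be sketched as below. This is a minimal illustration, assuming the channel dimension of an NCHW shape is rounded up to a multiple of the SIMD width (8 in the avx2 example above); `padded_channels` and `padded_shape` are hypothetical helpers, not mkl-dnn or BigDL functions.

```python
def padded_channels(channels, simd_width=8):
    # Round up to the next multiple of simd_width (ceiling division).
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    # Assumes NCHW layout; only the channel dimension is padded.
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


original = (4, 1, 28, 28)
padded = padded_shape(original)
```

With these assumptions the 4x1x28x28 tensor from the example occupies the memory of a 4x8x28x28 tensor, which is eight times larger.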

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
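
How the three attributes might fit together can be sketched as below. This is an illustration only: `MemoryDataSketch` and `expected_scale_count` are hypothetical names, and the interpretation of `mask` follows the mkl-dnn convention (bit i set means a separate scale for each index along dimension i, mask 0 means one common scale), which is assumed here rather than confirmed by the commit message.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class MemoryDataSketch:
    # Hypothetical stand-in for the attributes described above.
    shape: tuple
    data_type: str = "fp32"
    mask: int = 0
    scales: list = field(default_factory=list)


def expected_scale_count(shape, mask):
    # Multiply the sizes of every dimension whose bit is set in the mask.
    # With no bits set the product of nothing is 1: a single common scale.
    return prod(dim for i, dim in enumerate(shape) if mask & (1 << i))


per_tensor = MemoryDataSketch(shape=(4, 8, 28, 28), data_type="int8", mask=0, scales=[127.0])
per_channel_count = expected_scale_count((4, 8, 28, 28), mask=1 << 1)
```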

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want an fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
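
The three steps can be illustrated with a toy, pure-Python sketch. It does not call BigDL or mkldnn: `generate_scale`, `quantize` and `dequantize` are hypothetical helpers, and the symmetric scheme (scale = 127 / max absolute value) is an assumption chosen for the illustration.

```python
def generate_scale(sample):
    # Step 1: derive a scale from fp32 calibration data.
    max_abs = max(abs(v) for v in sample)
    return 127.0 / max_abs if max_abs else 1.0


def quantize(values, scale):
    # Step 2: map fp32 values to the signed 8-bit range [-127, 127].
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    # Step 3 (after int8 inference): map results back to fp32.
    return [q / scale for q in quantized]


calibration = [-2.0, 1.5, 0.1]
scale = generate_scale(calibration)
quantized = quantize(calibration, scale)
restored = dequantize(quantized, scale)
```

The restored values differ from the originals by at most half a quantization step, which is the accuracy cost traded for the smaller data type.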

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do the forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

it will do two steps: seq.forward(input) first, and then, when it goes into the ReLU, it
will do another forward, so the input will be the output and the scales will be
wrong.
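
The in-place problem can be reproduced with a toy sketch. It is not BigDL code: `max_abs` and `relu_inplace` are hypothetical helpers, and the maximum absolute value is used as a stand-in for the scale statistic.

```python
def max_abs(buffer):
    # Stand-in for the statistic a scale would be derived from.
    return max(abs(v) for v in buffer)


def relu_inplace(buffer):
    # Writes the result into the input buffer, as an in-place layer would.
    for i, v in enumerate(buffer):
        if v < 0:
            buffer[i] = 0.0
    return buffer


conv_output = [-4.0, 2.0, 1.0]
scale_before = max_abs(conv_output)      # what ReLU really receives as input
relu_output = relu_inplace(conv_output)  # overwrites conv_output
scale_after = max_abs(conv_output)       # the "input" now holds the output
```

Measuring the input after the forward pass sees 2.0 instead of 4.0, so the statistic has to be taken before the in-place layer runs.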

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
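
The shape adjustment can be sketched as below. `to_dnn_weight_shape` is a hypothetical helper, and it assumes the group is the leading dimension of the 5-dimensional weight, which the commit message does not state.

```python
def to_dnn_weight_shape(shape, n_group):
    # Assumption: layout is (group, out, in, kernel_h, kernel_w).
    # With a single group, drop the leading dimension to get a 4-D weight.
    if n_group == 1 and len(shape) == 5:
        return tuple(shape[1:])
    return tuple(shape)


ungrouped = to_dnn_weight_shape((1, 64, 3, 7, 7), n_group=1)
grouped = to_dnn_weight_shape((2, 32, 16, 3, 3), n_group=2)
```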

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
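
The ambiguity and its fix can be shown with a short sketch. The two functions are hypothetical and only mirror the string formats quoted above; they are not the actual `AllReduceParameter` code.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Original format: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    # Fixed format: a "_" separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
fixed = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```

Both old ids start with "112", so two different (id, pidTo) pairs map to the same block; with the separator they become "1_12..." and "11_2...".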

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
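
The memory cost of the padding described in item 1 can be estimated with a small sketch. This is an illustration only, not mkl-dnn's code: it assumes the padded dimension is the channel axis (index 1, as the 4x1x28x28 example suggests) and that the SIMD width is 8 floats for avx2. The helper names `padded_shape` and `num_elements` are hypothetical.

```python
def padded_shape(shape, simd_width=8, channel_axis=1):
    """Round the channel dimension up to the next multiple of simd_width.

    Assumption: only the channel axis is padded, and simd_width=8
    corresponds to avx2 with 32-bit floats.
    """
    padded = list(shape)
    channels = padded[channel_axis]
    # Ceiling division, then scale back up to a multiple of simd_width.
    padded[channel_axis] = -(-channels // simd_width) * simd_width
    return tuple(padded)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original element count, i.e. the memory blow-up.
overhead = num_elements(padded) / num_elements(original)
```

For the shape quoted in the commit message the padded buffer holds eight times as many elements, which is why deferring the DnnTensor allocation to runtime (item 2) matters.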

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` api is compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
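
The three steps above can be illustrated with a generic symmetric int8 quantization sketch. This is not the BigDL or mkldnn implementation: the functions `generate_scale`, `quantize`, and `dequantize` are hypothetical names, and the scheme (scale = 127 / max absolute value) is a common convention assumed here for illustration.

```python
def generate_scale(values, max_int=127):
    """Step 1: derive one scale from the largest magnitude in fp32 data."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    return max_int / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, max_int=127):
    """Step 2: map fp32 values to integers in [-max_int, max_int]."""
    return [max(-max_int, min(max_int, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map the integers back to approximate fp32 values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)          # 127.0 for this data
q_weights = quantize(weights, scale)     # step 3 would run forward on these
restored = dequantize(q_weights, scale)
```

The round trip loses at most half a quantization step per value, which is the accuracy cost traded for int8 throughput.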

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it goes into the ReLU, it
does another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
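
The in-place problem described above can be reproduced with plain Python lists standing in for tensors. This is a hypothetical sketch, not BigDL code: `relu_inplace` and `max_abs` are illustrative helpers, and the scale statistic is assumed to be the maximum absolute value of the buffer.

```python
def relu_inplace(buffer):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, value in enumerate(buffer):
        if value < 0:
            buffer[i] = 0.0
    return buffer


def max_abs(buffer):
    """Statistic assumed here to drive the scale: largest magnitude."""
    return max(abs(v) for v in buffer)


# Output of the convolution, which is also the ReLU's input buffer.
conv_out = [-4.0, 1.0, 2.0]

# Correct: measure the ReLU input before the layer runs.
stat_before = max_abs(conv_out)

# The ReLU shares memory with its input, so forward destroys the original.
relu_inplace(conv_out)

# Wrong: measuring afterwards sees the ReLU output instead of its input.
stat_after = max_abs(conv_out)
```

Here the statistic drops from 4.0 to 2.0 once the buffer is overwritten, so any scale computed after the second forward pass describes the output rather than the input.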

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
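
The ambiguity in the bug fix can be shown with a short sketch. The function names `block_id_old` and `block_id_new` are hypothetical, and the exact new format is an assumption: the commit message only states that a "_" now separates id from pidTo.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """Assumed fixed layout: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
first = (1, 12, 0)
second = (11, 2, 0)

old_collides = block_id_old(*first) == block_id_old(*second)
new_collides = block_id_new(*first) == block_id_new(*second)
```

With the old layout both pairs produce the same key, so two gradient blocks would overwrite each other; with the separator the keys differ.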

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread is the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
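
The idea of a dedicated single-thread pool can be illustrated with a minimal Python sketch. It is not the BigDL implementation: `compute`, `compute_pool`, and the `_native_state` thread-local are hypothetical stand-ins, used only to show that one long-lived worker thread keeps its thread-local state across calls and is separate from the calling (parent) thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for thread-local state held by a native library.
_native_state = threading.local()


def compute(x):
    # Initialise the per-thread state once, then reuse it on later calls.
    if not hasattr(_native_state, "calls"):
        _native_state.calls = 0
    _native_state.calls += 1
    return threading.get_ident(), _native_state.calls, x * 2


# One long-lived worker thread dedicated to all compute calls.
compute_pool = ThreadPoolExecutor(max_workers=1)

first = compute_pool.submit(compute, 1).result()
second = compute_pool.submit(compute, 2).result()

# Release the worker thread once the work is done.
compute_pool.shutdown()
```

Both tasks report the same thread identifier, which differs from the caller's, and the second call sees the state left by the first.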

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime.
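
The memory growth from padding can be checked with a small Python sketch. It assumes, based only on the 4x1x28x28 to 4x8x28x28 example above, that the channel axis is the one padded and that the SIMD width is 8; `round_up`, `padded_shape`, and `element_count` are hypothetical helpers, not mkl-dnn functions.

```python
def round_up(size, multiple):
    # Smallest multiple of `multiple` that is >= size.
    return ((size + multiple - 1) // multiple) * multiple


def padded_shape(shape, simd_width=8, channel_axis=1):
    # Assumption: only the channel axis is padded to the SIMD width.
    padded = list(shape)
    padded[channel_axis] = round_up(shape[channel_axis], simd_width)
    return tuple(padded)


def element_count(shape):
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original element count, i.e. the memory growth factor.
growth = element_count(padded) / element_count(original)
```

Under these assumptions a single-channel input takes eight times the memory once padded, which is why allocating such tensors eagerly at model creation is costly.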

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want an fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
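
The three steps can be sketched numerically in plain Python. This is only an illustration under a loud assumption: it uses symmetric max-abs scaling (scale = 127 / max|x|), which the commit message does not confirm as the scheme used by mkldnn. `generate_scale`, `quantize`, and `dequantize` are hypothetical helpers and do not correspond to BigDL APIs.

```python
def generate_scale(values, int8_max=127):
    # Step 1: derive a scale from the fp32 values (assumed max-abs scheme).
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, int8_max=127):
    # Step 2: map fp32 values to the signed 8-bit range, with clamping.
    return [max(-int8_max, min(int8_max, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    # Map int8 values back to fp32, as a reorder to fp32 would.
    return [q / scale for q in quantized]


weights = [1.27, -0.5, 0.1]
scale = generate_scale(weights)
quantized = quantize(weights, scale)
# Step 3 would run inference on the quantized values; here we only
# check how closely they reproduce the fp32 originals.
restored = dequantize(quantized, scale)
```

The restored values differ from the originals by at most one quantization step, which is the accuracy cost traded for the smaller data type.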

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed, so its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It takes two steps: seq.forward(input) first, and then, on entering the ReLU, it
does another forward, so the input will be the output. And the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
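
The in-place problem described above can be reproduced with a short Python sketch. It is a simplified illustration, not BigDL code: `relu_inplace` and `max_abs` are hypothetical helpers, and the max-abs value stands in for whatever statistic the scale calculation reads from a layer's input.

```python
def relu_inplace(buf):
    # Overwrites the input buffer, so input and output share memory.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    # Stand-in for the statistic a scale calculation would read.
    return max(abs(v) for v in buf)


# Pretend this is the Conv output, which is also the ReLU input.
conv_output = [-4.0, 1.0, 2.0]

stat_before = max_abs(conv_output)       # read before ReLU runs
relu_output = relu_inplace(conv_output)  # mutates conv_output
stat_after = max_abs(conv_output)        # read after the input was overwritten
```

Reading the statistic after the forward pass sees the ReLU output instead of the original input, so the two values disagree.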

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
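
The ambiguity is easy to reproduce in a few lines of Python. The two functions below are hypothetical: they only mirror the key layout quoted above, and the exact form of the fixed key beyond the added "_" is an assumption.

```python
def block_id_old(block_id, pid_to, pid_from):
    # Original layout: {id}{pidTo}gradientBytes{pidFrom}
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(block_id, pid_to, pid_from):
    # Fixed layout (assumed): a "_" separates id from pidTo.
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)

new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

With the old layout both pairs produce the same key, so one gradient block can overwrite or be read in place of the other; the separator keeps them distinct.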

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread keeps some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will segfault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
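
As a rough illustration of the idea (not the actual BigDL code, which is Scala), the sketch below routes every computation through one dedicated, long-lived worker thread instead of the caller's thread. The names `compute_pool` and `run_on_compute_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dedicated single-thread compute pool.
# max_workers=1 means every submitted task runs on the same worker thread,
# which stays alive for as long as the pool does.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated worker thread and wait for its result."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    return threading.get_ident()


# The caller (parent) thread never executes the work itself.
caller_id = threading.get_ident()
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
```

Because the worker thread is created once and reused, any thread-local state a native library attaches to it survives across calls, and the caller thread's affinity settings are not inherited by accident.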

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
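
The padding arithmetic in point 1 can be sketched as follows. This is only a worked example: the SIMD width of 8 and the choice of axis 1 as the padded (channel) axis are assumptions read off the 4x1x28x28 to 4x8x28x28 example, and `pad_to_multiple` / `padded_shape` are hypothetical helper names.

```python
def pad_to_multiple(n, width):
    """Round n up to the next multiple of width."""
    return ((n + width - 1) // width) * width


def padded_shape(shape, simd_width=8, channel_axis=1):
    """Return shape with the channel axis rounded up to the SIMD width.

    simd_width=8 and channel_axis=1 are assumptions taken from the
    4x1x28x28 -> 4x8x28x28 (avx2) example in the commit message.
    """
    padded = list(shape)
    padded[channel_axis] = pad_to_multiple(padded[channel_axis], simd_width)
    return tuple(padded)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original element count shows the memory overhead.
overhead = num_elements(padded) / num_elements(original)
```

For a single-channel input the padded buffer holds 8 times as many elements, which is why allocating these tensors eagerly at model creation was costly.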

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when the type is Double while importing a TensorFlow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because the scales must be carried across FP32->int8 and int8->FP32 reorders,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model during conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is still an fp32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   mkldnn generates the quantized weight, output, and input at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do a forward in
`calcScales`: by that time the input has already been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it reaches the ReLU it
does another forward, so the input has become the output and the scales are
wrong.

A convolution's weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution, if there's no group, the weight should
have 4 dimensions.
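
The in-place problem can be reproduced with a toy example. This is a sketch under stated assumptions: `relu_inplace` and `max_abs` are hypothetical helpers, plain Python lists stand in for tensors, and the maximum absolute value stands in for whatever statistic the scale is derived from.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer and returns the same object."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """Statistic a scale would be derived from (assumed here: max |x|)."""
    return max(abs(v) for v in buf)


conv_output = [-4.0, 1.0, 2.0]            # this is also the ReLU's input
stat_before = max_abs(conv_output)        # taken before ReLU runs

relu_output = relu_inplace(conv_output)   # shares memory with conv_output
stat_after = max_abs(conv_output)         # the "input" now holds the output
```

Reading the input statistic after the in-place layer has run sees the clipped values, so the recorded statistic no longer describes the real input.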

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise adaptive rate scaling), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
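
The ambiguity is easy to demonstrate. In this sketch `block_id_old` and `block_id_new` are hypothetical helpers, and the exact layout of the new id (only the "_" between id and pidTo) is an assumption based on the description above.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """Fixed layout (assumed): a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
first = (1, 12, 0)
second = (11, 2, 0)

old_collides = block_id_old(*first) == block_id_old(*second)
new_collides = block_id_new(*first) == block_id_new(*second)
```

With the old layout both pairs produce `112gradientBytes0`, so two distinct gradient blocks would share one key; the separator makes them `1_12gradientBytes0` and `11_2gradientBytes0`.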

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 8, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example a thread from
Executors.newFixedThreadPool), the whole app will segfault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
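
The idea of keeping native computation on one dedicated, long-lived thread can be sketched in Python. This is only an analogy under stated assumptions, not BigDL code: `_native_state` and `compute` are hypothetical stand-ins for the thread-local native state and the MKL-DNN call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for native state kept in thread-local storage.
_native_state = threading.local()

def compute(x):
    # Initialise the per-thread state once, then reuse it on later calls.
    if not hasattr(_native_state, "handle"):
        _native_state.handle = threading.get_ident()
    return x * 2, _native_state.handle

# A single long-lived worker: every task runs on the same thread,
# so the thread-local state is never lost between calls.
pool = ThreadPoolExecutor(max_workers=1)
results = [pool.submit(compute, i).result() for i in range(4)]
outputs = [r[0] for r in results]
worker_ids = {r[1] for r in results}
pool.shutdown()
```

Because the caller never runs `compute` itself, the caller thread can exit and be recreated without touching the state owned by the worker.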

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
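
The memory cost of the padding in point 1 can be estimated with a small sketch. It assumes an NCHW layout where only the channel dimension is padded and a width of 8, matching the 4x1x28x28 to 4x8x28x28 example; `padded_shape` and `num_elements` are illustrative helpers, not library functions.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension (index 1, NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)

def num_elements(shape):
    total = 1
    for d in shape:
        total *= d
    return total

logical = (4, 1, 28, 28)
physical = padded_shape(logical)
# Ratio of allocated elements to logical elements.
overhead = num_elements(physical) / num_elements(logical)
```

For a single-channel input the padded buffer holds eight times as many elements, which is why allocating such buffers eagerly at model creation is expensive.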

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.
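
How a `mask` and a list of `scales` might relate can be illustrated as follows. This sketch assumes the MKL-DNN convention that a set bit i in the mask means one scale per index along dimension i, and assumes scale = 127 / max|x|; `compute_scales` and `quantize` are hypothetical helpers, not BigDL APIs.

```python
def compute_scales(weights, mask, max_int8=127.0):
    """weights: list of rows, one row per output channel.
    mask == 0 -> a single scale for the whole tensor;
    mask == 1 -> one scale per row (bit 0 selects dimension 0)."""
    def scale_of(values):
        peak = max(abs(v) for v in values)
        return max_int8 / peak if peak > 0 else 1.0
    if mask == 0:
        return [scale_of([v for row in weights for v in row])]
    if mask == 1:
        return [scale_of(row) for row in weights]
    raise ValueError("this sketch only handles mask 0 and 1")

def quantize(weights, scales):
    out = []
    for i, row in enumerate(weights):
        s = scales[i] if len(scales) > 1 else scales[0]
        out.append([int(round(v * s)) for v in row])
    return out

weights = [[1.0, 0.0], [0.25, -0.25]]
per_tensor = compute_scales(weights, mask=0)
per_channel = compute_scales(weights, mask=1)
q_tensor = quantize(weights, per_tensor)
q_channel = quantize(weights, per_channel)
```

With a per-channel mask the second row uses the full int8 range, while a single per-tensor scale leaves it with much coarser resolution.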

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
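
The three steps can be sketched end to end on a toy model. Everything here is hypothetical (the `TinyLinear` class and its method names are not the BigDL API); it only shows the order of operations: calibrate scales on an fp32 model, set a quantize flag, then produce the int8 data lazily at forward time.

```python
class TinyLinear:
    """Hypothetical one-layer model: y = sum(w_i * x_i)."""

    def __init__(self, weights):
        self.weights = weights
        self.input_peak = 0.0   # filled in by calibration (step 1)
        self.quantized = False  # flag set by quantize() (step 2)

    def generate_scales(self, samples):
        # Step 1: record the largest input magnitude; the model stays fp32.
        self.input_peak = max(abs(v) for s in samples for v in s)
        return self

    def quantize(self):
        # Step 2: only set the flag; int8 data is produced in forward().
        self.quantized = True
        return self

    def forward(self, x):
        # Step 3: inference, in fp32 or int8 depending on the flag.
        if not self.quantized:
            return sum(w * v for w, v in zip(self.weights, x))
        w_scale = 127.0 / max(abs(w) for w in self.weights)
        x_scale = 127.0 / self.input_peak
        qw = [int(round(w * w_scale)) for w in self.weights]
        qx = [int(round(v * x_scale)) for v in x]
        acc = sum(a * b for a, b in zip(qw, qx))  # integer accumulate
        return acc / (w_scale * x_scale)          # dequantize back to fp32

model = TinyLinear([0.5, -1.0])
x = [2.0, 1.0]
fp32_out = model.forward(x)
model.generate_scales([[2.0, 1.0], [-4.0, 3.0]]).quantize()
int8_out = model.forward(x)
```

The quantized result differs from the fp32 result only by a small rounding error, which is the trade-off the calibration step is meant to keep under control.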

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first, and when it goes into the ReLU it
does another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
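
The in-place problem described above can be reproduced with plain lists. This is a minimal sketch with hypothetical stand-ins (`conv_like`, `relu_inplace`, `max_abs`), assuming the scale is derived from the maximum absolute value of a layer's input.

```python
def conv_like(xs):
    # Stand-in for a convolution: allocates a fresh output buffer.
    return [2.0 * x - 3.0 for x in xs]

def relu_inplace(buf):
    # In-place ReLU: overwrites its input buffer with the result.
    for i, v in enumerate(buf):
        buf[i] = max(v, 0.0)
    return buf

def max_abs(buf):
    return max(abs(v) for v in buf)

x = [0.0, 1.0, 2.0]
conv_out = conv_like(x)                   # [-3.0, -1.0, 1.0]
true_relu_input_peak = max_abs(conv_out)  # measured before ReLU runs
relu_out = relu_inplace(conv_out)         # same list object, now clipped
stale_peak = max_abs(conv_out)            # the "input" is already the output
```

Measuring the ReLU input after the forward pass has finished sees the clipped values, so the statistic has to be captured before the in-place layer runs.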

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
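
The block id collision is easy to reproduce. In this sketch `old_block_id` follows the layout quoted in the message, while `new_block_id` is an assumed layout: only the "_" between id and pidTo comes from the message, the rest is kept unchanged for illustration.

```python
def old_block_id(id_, pid_to, pid_from):
    # Layout from the commit message: {id}{pidTo}gradientBytes{pidFrom}
    return f"{id_}{pid_to}gradientBytes{pid_from}"

def new_block_id(id_, pid_to, pid_from):
    # Assumed layout: a "_" separates id from pidTo, nothing else changes.
    return f"{id_}_{pid_to}gradientBytes{pid_from}"

# (id=1, pidTo=12) and (id=11, pidTo=2) both start with "112".
collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
fixed = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```

Any fixed separator that cannot appear inside the numeric fields removes the ambiguity, since the key can then be split back into its parts in only one way.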

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 8, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread keeps some unknown thread-local variables, so if
the parent thread exits and is recreated, as happens with threads from
Executors.newFixedThreadPool, the whole app will segfault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
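
As a rough illustration of the idea (not the BigDL code itself), the sketch below routes every "native" call through one long-lived worker thread instead of the caller's thread. It uses Python's standard `concurrent.futures`; the names `compute_pool`, `native_compute`, and `run_on_compute_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: the same thread serves every task,
# so any thread-local state of a native library stays on that thread.
compute_pool = ThreadPoolExecutor(max_workers=1)


def native_compute(x):
    # Hypothetical stand-in for a call into a native library.
    # Returns the id of the thread that ran it, plus a result.
    return threading.get_ident(), x * 2


def run_on_compute_thread(x):
    # The caller (main thread or a worker thread) only submits and waits.
    return compute_pool.submit(native_compute, x).result()


results = [run_on_compute_thread(i) for i in range(4)]
worker_ids = {tid for tid, _ in results}
values = [v for _, v in results]
```

Because the caller never runs the computation itself, it can exit and be recreated without touching the state held by the compute thread.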

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
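
The padding arithmetic from the first issue can be sketched as follows. This is only an illustration of rounding the channel dimension up to a multiple of the SIMD width (8 in the avx2 example above); the helper names are hypothetical and this is not how mkl-dnn is implemented internally.

```python
def pad_to_simd(channels, simd_width=8):
    """Round a channel count up to the next multiple of simd_width."""
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    """Pad the channel dimension of an NCHW shape."""
    n, c, h, w = shape
    return (n, pad_to_simd(c, simd_width), h, w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


logical = (4, 1, 28, 28)
padded = padded_shape(logical)
# Ratio of padded to logical element count, i.e. the memory overhead factor.
overhead = num_elements(padded) / num_elements(logical)
```

For a single-channel input the padded buffer holds eight times as many elements, which is why allocating such buffers eagerly at model creation is costly.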

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modify the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modify the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
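
To make the role of the scales concrete, here is a minimal sketch of generic symmetric int8 quantization: derive a scale from the largest absolute value, quantize, then dequantize. It is an assumption-laden illustration, not the mkldnn or BigDL implementation, and the function names are hypothetical.

```python
def generate_scale(values, max_int=127):
    """Scale that maps the largest absolute value onto max_int."""
    max_abs = max(abs(v) for v in values)
    return max_int / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, max_int=127):
    """FP32 -> int8: multiply by the scale, round, and clamp."""
    return [max(-max_int, min(max_int, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """int8 -> FP32: divide by the same scale."""
    return [q / scale for q in quantized]


weights = [0.3, -1.0, 0.75]
scale = generate_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)
```

The scale is computed once from the fp32 data and reused in both directions, which is why it has to travel with the tensor description.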

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do a forward in
`calcScales`: by that time the input has already been changed and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first, and then on entering the ReLU it
does another forward, so the input is already the output and the scales are
wrong.

A convolution's weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution, if there is no group, the weight should have
4 dimensions.
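
The in-place problem can be reproduced in a few lines. The sketch below uses plain Python lists and hypothetical helpers (`max_abs`, `relu_inplace`); it only illustrates why a range measured after an in-place layer has run no longer describes that layer's input.

```python
def max_abs(values):
    """The statistic a scale would be derived from."""
    return max(abs(v) for v in values)


def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer does."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


# Pretend this is the Conv output, which is also the ReLU input.
conv_output = [-4.0, 1.0, 2.0]

# Correct: measure the ReLU input before the layer overwrites it.
input_max_before = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Wrong: the "input" buffer now holds the output, so the range shrank.
input_max_after = max_abs(conv_output)
```

Measuring during the first forward pass, before the buffer is overwritten, avoids the mismatch.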

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modify the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
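
The ambiguity described in the bug fix is easy to reproduce. In this sketch the two functions are hypothetical stand-ins that only mirror the naming patterns quoted above; they are not the AllReduceParameter code.

```python
def block_id_old(id_, pid_to, pid_from):
    # Old pattern: {id}{pidTo}gradientBytes{pidFrom}
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def block_id_new(id_, pid_to, pid_from):
    # New pattern: "_" separates id from pidTo
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that collide under the old pattern.
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)

# The same pairs stay distinct once the separator is added.
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

Any separator that cannot appear inside a decimal number would work equally well; the underscore is the one the commit chose.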

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng pushed a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. If
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
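
As a rough illustration of the idea (not the BigDL implementation, which is written in Scala), the sketch below routes every compute call through one dedicated, long-lived worker thread instead of the caller's thread. The pool name `dnn-compute` and the helper `run_on_compute_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: every task runs on the same long-lived thread,
# so thread-local native state is never lost when the calling thread is recreated.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated compute thread and wait for its result."""
    return compute_pool.submit(fn, *args).result()


def current_thread_name():
    return threading.current_thread().name


first = run_on_compute_thread(current_thread_name)
second = run_on_compute_thread(current_thread_name)
print(first, second, threading.current_thread().name)
```

Both calls report the same worker thread, and that thread differs from the caller's.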

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime.
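
The padding arithmetic in item 1 can be sketched as follows. This is an illustration only: it assumes the channel dimension of an NCHW shape is the one being padded and that the SIMD width is 8 floats, as the 4x1x28x28 to 4x8x28x28 example suggests. The helper names are hypothetical.

```python
def padded_channels(channels, simd_width=8):
    """Round the channel count up to the next multiple of simd_width."""
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    """Pad the channel dimension of an (N, C, H, W) shape."""
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


def element_count(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


logical = (4, 1, 28, 28)
padded = padded_shape(logical)
# Ratio of padded to logical element count shows the extra memory cost.
print(padded, element_count(padded) // element_count(logical))
```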

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
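
To make the role of `mask` and `scales` concrete, here is a minimal sketch in plain Python. It assumes a simplified convention that is not taken from the patch: `mask == 0` means one scale for the whole tensor, and any other value means one scale per row (output channel). All function names are hypothetical.

```python
def scale_for(values, int8_max=127):
    """Scale that maps the largest magnitude in values onto int8_max."""
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0


def compute_scales(rows, mask):
    """mask == 0: one scale for the whole tensor; otherwise one scale per row."""
    if mask == 0:
        return [scale_for([v for row in rows for v in row])]
    return [scale_for(row) for row in rows]


def quantize(values, scale):
    """FP32 -> int8: multiply by the scale, round, and clamp."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """int8 -> FP32: divide by the same scale."""
    return [q / scale for q in quantized]


weights = [[0.02, -0.01], [4.0, -2.0]]
per_tensor = compute_scales(weights, mask=0)
per_channel = compute_scales(weights, mask=1)
# With a single scale, the small first row collapses to almost nothing.
print(quantize(weights[0], per_tensor[0]))
print(quantize(weights[0], per_channel[0]))
```

A per-channel scale keeps more precision for rows whose values are much smaller than the tensor-wide maximum.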

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
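
The three steps can be sketched on a toy dot product. This is plain Python under stated assumptions, not the BigDL or mkldnn API: the functions `generate_scales`, `quantize`, and `int8_dot` are hypothetical, and the scale is assumed to be 127 divided by the largest magnitude seen.

```python
def max_abs(values):
    return max(abs(v) for v in values)


def generate_scales(weights, calibration_inputs):
    """Step 1: derive scales from the FP32 weights and sample inputs."""
    w_scale = 127.0 / max_abs(weights)
    in_scale = 127.0 / max(max_abs(x) for x in calibration_inputs)
    return w_scale, in_scale


def quantize(values, scale):
    """Step 2: map FP32 values to clamped int8 values."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def int8_dot(weights_q, x_q, w_scale, in_scale):
    """Step 3: accumulate in integers, then rescale the result to FP32."""
    acc = sum(w * x for w, x in zip(weights_q, x_q))
    return acc / (w_scale * in_scale)


weights = [0.5, -0.25, 1.0]
calibration = [[1.0, 2.0, -4.0], [0.5, 3.0, 1.0]]
w_scale, in_scale = generate_scales(weights, calibration)
weights_q = quantize(weights, w_scale)

x = [1.0, 2.0, -4.0]
approx = int8_dot(weights_q, quantize(x, in_scale), w_scale, in_scale)
exact = sum(w * v for w, v in zip(weights, x))
print(exact, approx)
```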

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It will do two steps: seq.forward(input) first, and then, when it goes into the ReLU, it
will do another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
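
Both points can be illustrated with a small sketch. It is plain Python, not the patched code: the buffer stands in for memory shared between a layer's input and output, and the 5-D weight layout (group, output, input, kernel height, kernel width) is an assumption made for the example.

```python
def max_abs(values):
    return max(abs(v) for v in values)


def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer does."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


conv_output = [-3.0, 1.0, 2.0]           # also the ReLU input
stat_before = max_abs(conv_output)        # statistic of the true ReLU input
relu_inplace(conv_output)                 # the buffer now holds the ReLU output
stat_after = max_abs(conv_output)         # reading it now gives the wrong statistic
print(stat_before, stat_after)


def dnn_weight_shape(shape5, n_group):
    """Drop the leading group dimension when there is only one group."""
    return tuple(shape5) if n_group > 1 else tuple(shape5[1:])


print(dnn_weight_shape((1, 64, 3, 7, 7), n_group=1))
```

Reading the input statistic only after the forward pass sees the overwritten buffer, which is why the scales come out wrong.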

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
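
The ambiguity is easy to reproduce. The sketch below only mirrors the string formats described in the commit message; the function names are hypothetical and this is not the AllReduceParameter code itself.

```python
def old_block_id(param_id, pid_to, pid_from):
    """Original format: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    """Fixed format: an underscore separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs collide under the old format.
print(old_block_id(1, 12, 0), old_block_id(11, 2, 0))
# The separator keeps them distinct.
print(new_block_id(1, 12, 0), new_block_id(11, 2, 0))
```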

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
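
The idea of routing all native computation through one dedicated, long-lived thread can be sketched as below. This is only a Python illustration of the pattern, not BigDL's Scala implementation; `compute_pool`, `native_compute`, and `run_on_compute_thread` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dedicated compute pool: exactly one long-lived worker.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")

def native_compute(x):
    # Stand-in for a native call; also reports which thread executed it.
    return x * 2, threading.current_thread().name

def run_on_compute_thread(x):
    # Callers (main thread or worker threads) submit work instead of calling directly,
    # so thread-local native state always lives on the same thread.
    return compute_pool.submit(native_compute, x).result()

results = [run_on_compute_thread(i) for i in range(4)]
values = [value for value, _ in results]
thread_names = {name for _, name in results}
compute_pool.shutdown()
```

Every task in this sketch runs on the same worker thread, regardless of which thread submitted it.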

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, for example 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
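
The padding arithmetic in the first issue can be sketched as follows. This is a minimal illustration assuming the channel dimension of an NCHW shape is rounded up to a multiple of the SIMD width (8 in the avx2 example above); the function names are hypothetical and this is not mkl-dnn code.

```python
def padded_channels(channels, simd_width=8):
    # Round the channel count up to the next multiple of the SIMD width.
    return -(-channels // simd_width) * simd_width

def padded_shape(shape, simd_width=8):
    # Assumes an NCHW shape; only the channel dimension is padded.
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)

def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total

original = (4, 1, 28, 28)
padded = padded_shape(original)
memory_ratio = num_elements(padded) / num_elements(original)
```

For a single-channel input the padded tensor holds eight times as many elements, which is why the extra memory is noticeable.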

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want an fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which sets the quantize flag. At compile time,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
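
The three steps can be illustrated with a toy sketch. It assumes a symmetric per-tensor scheme where the scale is 127 divided by the largest absolute value; the commit message does not spell out the formula, and the function names below are hypothetical rather than the BigDL API.

```python
def generate_scale(values, int8_max=127):
    # Step 1 (assumed scheme): derive one scale from the fp32 value range.
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0

def quantize(values, scale, int8_max=127):
    # Step 2: map fp32 values to int8, clamping to the symmetric range.
    return [max(-int8_max, min(int8_max, round(v * scale))) for v in values]

def dequantize(quantized, scale):
    # Used after int8 inference (step 3) to get back to fp32.
    return [q / scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
scale = generate_scale(weights)
quantized = quantize(weights, scale)
restored = dequantize(quantized, scale)
```

The restored values differ from the originals by at most one quantization step, which is the accuracy cost traded for int8 speed.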

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do a forward in
`calcScales`: by that time the input has already been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps. seq.forward(input) runs first, and when it reaches the ReLU it
does another forward, so the input has become the output and the scales are
wrong.
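
The aliasing problem can be reproduced with a small sketch. It uses plain Python lists in place of tensors, and `conv_like` and `relu_inplace` are hypothetical stand-ins for the real layers.

```python
def conv_like(x):
    # Stand-in for a convolution: writes its result into a new buffer.
    return [v - 1.0 for v in x]

def relu_inplace(buf):
    # In-place ReLU: the output reuses the input buffer.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf

def max_abs(buf):
    return max(abs(v) for v in buf)

x = [0.5, -3.0, 2.0]
conv_out = conv_like(x)                    # this buffer is ReLU's input
range_before_relu = max_abs(conv_out)      # the range the scales should be based on
relu_out = relu_inplace(conv_out)          # overwrites conv_out
range_after_relu = max_abs(conv_out)       # what a late observer would see instead
```

Reading the ReLU input after the forward pass sees the already-clipped values, so any scale computed from it reflects the output range rather than the input range.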

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
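
A shape-only sketch of that rule is shown below. It assumes the 5-dimensional weight layout puts the group count first, i.e. (group, out, in, kH, kW); that layout and the function name are assumptions for illustration.

```python
def dnn_weight_shape(blas_shape):
    # Drop the leading group dimension when there is effectively no grouping.
    if len(blas_shape) == 5 and blas_shape[0] == 1:
        return tuple(blas_shape[1:])
    return tuple(blas_shape)

ungrouped = dnn_weight_shape((1, 64, 3, 7, 7))
grouped = dnn_weight_shape((2, 32, 16, 3, 3))
```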

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
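
The ambiguity and the fix can be shown with a short sketch. It follows the id format quoted above; the function names are hypothetical and this is not the actual Scala code.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Original format: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def new_block_id(param_id, pid_to, pid_from):
    # Fixed format: an underscore separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"

# Two different (id, pidTo) pairs that both concatenate to "112".
collision_old = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
collision_new = new_block_id(1, 12, 0) == new_block_id(11, 2, 0)
```

With the separator, the two pairs map to "1_12gradientBytes0" and "11_2gradientBytes0", so the block ids no longer collide.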

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread keeps some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
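
The idea of pinning all compute to one long-lived worker can be illustrated with a minimal Python sketch. It is not the BigDL implementation (which is Scala); `compute_pool` and `run_on_compute_thread` are hypothetical names used only to show that a single-worker pool reuses the same thread, so thread-local state survives across calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A dedicated pool with exactly one worker: every submitted task runs on
# the same long-lived thread instead of on whichever thread called us.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated worker and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    """Return the identifier of the thread that executes this call."""
    return threading.get_ident()


# Collect the thread id seen by five separate tasks.
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
```

All five tasks report the same thread id, and it differs from the caller's id, which is the property the fix relies on.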

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads a dimension up to a multiple
    of the SIMD width, which uses more memory, for example 4x1x28x28
    becomes 4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
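
The padding arithmetic in the first issue can be sketched as follows. This is an illustration only: it assumes the padded dimension is the channel dimension (index 1) and that AVX2 corresponds to 8 fp32 lanes; `pad_to_multiple` and `padded_shape` are hypothetical helpers, not mkl-dnn functions.

```python
def pad_to_multiple(size, width):
    """Round size up to the nearest multiple of width."""
    return ((size + width - 1) // width) * width


def padded_shape(shape, simd_width):
    """Pad the channel dimension (index 1) of a 4-D shape."""
    n, c, h, w = shape
    return (n, pad_to_multiple(c, simd_width), h, w)


def element_count(shape):
    """Number of elements a dense tensor of this shape holds."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original, 8)  # assumption: 8 fp32 lanes for AVX2
growth = element_count(padded) // element_count(original)
```

With a single input channel the padded tensor holds eight times as many elements, which is why the extra memory matters.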

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called `dataType` is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
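
The generate-scales and quantize steps above can be sketched generically. This is plain symmetric max-abs quantization written for illustration, not the `generateScalesWithMask` or `quantize()` implementation; all function names below are hypothetical.

```python
def generate_scale(values):
    """Step 1: derive a scale from the largest absolute fp32 value."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to integers clamped to [-127, 127]."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map the integers back to approximate fp32 values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
scale = generate_scale(weights)
q = quantize(weights, scale)
restored = dequantize(q, scale)
```

The round trip loses at most about one quantization step per value, which is the accuracy cost traded for int8 inference.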

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do a forward in
`calcScales`: by that time the input has already been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first; then, on reaching the ReLU, it
does another forward, so the input is now the output and the scales are
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.
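
The in-place problem can be reproduced with a small sketch. It uses plain Python lists in place of tensors and a max-abs value as a stand-in for the scale; `max_abs_scale` and `relu_inplace` are hypothetical helpers, not BigDL functions.

```python
def max_abs_scale(values):
    """Stand-in for a scale: the largest absolute value in the buffer."""
    return max(abs(v) for v in values)


def relu_inplace(buffer):
    """Overwrite the buffer with its ReLU, as an in-place layer would."""
    for i, v in enumerate(buffer):
        buffer[i] = max(0.0, v)
    return buffer


conv_output = [-4.0, 1.0, 2.0]

# Correct: record the scale of the ReLU input before it is overwritten.
scale_before = max_abs_scale(conv_output)

relu_inplace(conv_output)

# Wrong: after the in-place forward, the "input" buffer holds the output.
scale_after = max_abs_scale(conv_output)
```

Because the negative value that dominated the input is gone after the in-place forward, the late measurement underestimates the input scale.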

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optims for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
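
The collision described in the bug fix can be shown with a short sketch. The two format strings follow the commit message; the function names `block_id_old` and `block_id_new` are hypothetical and exist only for this illustration.

```python
def block_id_old(id_, pid_to, pid_from):
    """Original layout: id and pidTo are concatenated with no separator."""
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def block_id_new(id_, pid_to, pid_from):
    """Fixed layout: an underscore separates id from pidTo."""
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

The old ids collide for the two pairs, while the separated form keeps them distinct.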

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. If the
parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
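The padding arithmetic in the first issue can be sketched in plain Python. This is only an illustration, not mkl-dnn or BigDL code: the helper names `pad_to_multiple` and `padded_shape` are hypothetical, and the block width of 8 and the choice of the channel axis are assumptions read off the 4x1x28x28 to 4x8x28x28 (avx2) example in the message.

```python
from math import prod


def pad_to_multiple(size, width):
    """Round size up to the next multiple of width (ceiling division)."""
    return -(-size // width) * width


def padded_shape(shape, simd_width=8, channel_axis=1):
    """Return shape with the channel axis padded to a multiple of simd_width.

    simd_width=8 and channel_axis=1 are assumptions taken from the
    avx2 example in the commit message, not from the mkl-dnn API.
    """
    dims = list(shape)
    dims[channel_axis] = pad_to_multiple(dims[channel_axis], simd_width)
    return tuple(dims)


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded element count to original element count.
overhead = prod(padded) / prod(original)
```

With a single input channel the padded buffer holds eight times as many elements as the original, which is the extra memory cost the message refers to.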

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model; the model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
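The idea behind the first two steps can be sketched in plain Python. This does not call BigDL or mkldnn: the functions `generate_scale`, `quantize_values` and `dequantize_values` are hypothetical, and the symmetric `127 / max|x|` scale is an assumed convention chosen for illustration, not something the commit message states.

```python
def generate_scale(values, qmax=127):
    """Step 1 (assumed convention): derive one scale from the fp32 range."""
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize_values(values, scale, qmax=127):
    """Step 2: map fp32 values to signed 8-bit integers using the scale."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize_values(quantized, scale):
    """Map int8 values back to fp32, as a reorder to fp32 output would."""
    return [q / scale for q in quantized]


weights = [1.0, -0.5, 0.25]
scale = generate_scale(weights)
int8_weights = quantize_values(weights, scale)
restored = dequantize_values(int8_weights, scale)
```

The restored values differ slightly from the originals; that rounding error is the accuracy cost of running inference in int8.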

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

This takes two steps: seq.forward(input) runs first, and on reaching the ReLU it
does another forward, so the input is already the output and the scales will be
wrong.
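The effect can be reproduced with a small plain-Python sketch. It does not use BigDL: `relu_inplace` and `max_abs_scale` are hypothetical helpers, and taking the maximum absolute value as the "scale" is an assumption made only to show how an in-place layer changes the statistic.

```python
def relu_inplace(buffer):
    """Apply ReLU by overwriting the input buffer, as an in-place layer does."""
    for i, value in enumerate(buffer):
        if value < 0:
            buffer[i] = 0.0
    return buffer


def max_abs_scale(buffer):
    """Assumed scale statistic: the largest absolute value in the buffer."""
    return max(abs(v) for v in buffer)


conv_output = [-4.0, 1.0, 2.0]              # what the ReLU receives as input
scale_before = max_abs_scale(conv_output)   # measured before the ReLU runs
relu_output = relu_inplace(conv_output)     # shares memory with conv_output
scale_after = max_abs_scale(conv_output)    # measured too late: input overwritten
```

Because `relu_output` and `conv_output` are the same list, a scale computed after the forward pass describes the ReLU output rather than its input.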

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
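The ambiguity in the block id can be shown with a short plain-Python sketch. The function names `block_id_old` and `block_id_new` are hypothetical; only the `{id}{pidTo}gradientBytes{pidFrom}` layout and the added "_" separator come from the message above.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Old layout: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """New layout: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both start with "112" in the old layout.
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

In the old layout both pairs produce the same key, so one gradient block could overwrite the other; with the separator the keys stay distinct.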

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. So if
the parent thread exits and is recreated, such as a thread from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
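
The idea of moving the computation off the parent thread can be sketched as follows. This is a Python stand-in for the JVM behaviour, not BigDL code: `run_on_compute_thread`, `native_forward`, and the pool name are hypothetical, and no real affinity or native state is involved. It only shows that one long-lived worker thread serves every call, so the caller's thread never owns the "native" state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one long-lived worker thread (hypothetical name).
_compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    # Submit the work and block until the single worker has finished it.
    return _compute_pool.submit(fn, *args).result()


def native_forward(x):
    # Stand-in for a native call; reports which thread executed it.
    return threading.current_thread().name, x * 2


name1, y1 = run_on_compute_thread(native_forward, 3)
name2, y2 = run_on_compute_thread(native_forward, 4)
caller = threading.current_thread().name

print(name1, name2, caller)
```

Both calls run on the same worker thread, which is distinct from the calling thread.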

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, such as 4x1x28x28 padded to
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
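
The memory cost of the padding in item 1 can be estimated with a small sketch. The helpers `padded_shape` and `num_elements` are hypothetical, and the width of 8 and the choice of the channel dimension are assumptions read off the 4x1x28x28 to 4x8x28x28 example above, not taken from the mkl-dnn API.

```python
def padded_shape(shape, simd_width=8, channel_dim=1):
    # Round the channel dimension up to the next multiple of simd_width.
    padded = list(shape)
    channels = padded[channel_dim]
    padded[channel_dim] = -(-channels // simd_width) * simd_width
    return tuple(padded)


def num_elements(shape):
    # Total number of elements in a tensor of the given shape.
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
growth = num_elements(padded) // num_elements(original)

print(padded, growth)
```

For a single-channel input the padded buffer holds eight times as many elements, which is why allocating it eagerly for every layer is expensive.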

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistirbuted in optimizer (#2755)

* change asInstanceOf to toDistirbuted

* change asInstanceOf to toDistirbuted

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
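
A rough sketch of how `mask` and `scales` could drive an FP32 to int8 reorder is shown below. It assumes the MKL-DNN convention that a mask of 0 means one common scale and a set bit 0 means one scale per index along dimension 0; whether BigDL's `MemoryData` follows exactly this convention is an assumption. The function `reorder_fp32_to_int8` and the `127 / max|v|` scale rule are illustrative, not the library's API.

```python
def reorder_fp32_to_int8(rows, mask, scales):
    # rows is a 2-D list. mask == 0: common scale; bit 0 set: one scale per row.
    out = []
    for i, row in enumerate(rows):
        scale = scales[i] if mask & 1 else scales[0]
        # Multiply by the scale, round, and clamp to the int8 range.
        out.append([max(-128, min(127, round(v * scale))) for v in row])
    return out


rows = [[0.25, -1.0], [10.0, 40.0]]

# One scale for the whole tensor (assumed rule: 127 / max |v|).
common = [127.0 / max(abs(v) for row in rows for v in row)]
q_common = reorder_fp32_to_int8(rows, mask=0, scales=common)

# One scale per row, selected by bit 0 of the mask.
per_row = [127.0 / max(abs(v) for v in row) for row in rows]
q_per_row = reorder_fp32_to_int8(rows, mask=1, scales=per_row)

print(q_common, q_per_row)
```

With a common scale the small values in the first row collapse to a few integer levels, while per-row scales use the full int8 range for each row.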

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want an fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps:

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
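
The three steps above can be mimicked on a toy dot product. This is a conceptual sketch only: `generate_scale`, `quantize`, and `int8_dot` are hypothetical helpers, not the BigDL `generateScalesWithMask` or `quantize()` APIs, and the `127 / max|v|` scale rule is an assumption.

```python
def generate_scale(values):
    # Step 1: derive an int8 scale from FP32 values.
    return 127.0 / max(abs(v) for v in values)


def quantize(values, scale):
    # Step 2: map FP32 values to int8 with rounding and clamping.
    return [max(-128, min(127, round(v * scale))) for v in values]


def int8_dot(q_w, q_x, w_scale, x_scale):
    # Step 3: accumulate in integers, then convert back to FP32 once.
    acc = sum(a * b for a, b in zip(q_w, q_x))
    return acc / (w_scale * x_scale)


weights = [0.25, -0.75, 1.0]
inputs = [3.0, 1.0, -4.0]

w_scale = generate_scale(weights)
x_scale = generate_scale(inputs)
q_w = quantize(weights, w_scale)
q_x = quantize(inputs, x_scale)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(q_w, q_x, w_scale, x_scale)

print(exact, approx)
```

The int8 result stays close to the FP32 result because both operands are scaled to use the full int8 range before rounding.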

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because at that time the input has already been changed, so its
scales may not be right. For example,

Sequential().add(Conv).add(ReLU)

will do two steps: seq.forward(input) first, and then, when going into the ReLU,
another forward, so the input will already be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even if the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
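
The in-place problem described in this commit message can be shown with plain lists. This is a toy sketch under stated assumptions: `conv`, `relu_inplace`, and `max_abs` are hypothetical stand-ins, and taking the maximum absolute value as the scale statistic is an assumption, not the actual `calcScales` logic.

```python
def conv(x):
    # Stand-in for a convolution: produces a new output buffer.
    return [v - 4.0 for v in x]


def relu_inplace(buf):
    # In-place ReLU: the output reuses the input's memory.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    # Assumed scale statistic: largest absolute value in the buffer.
    return max(abs(v) for v in buf)


conv_out = conv([1.0, 2.0, 5.0])
stat_before = max_abs(conv_out)   # statistic of ReLU's real input

relu_out = relu_inplace(conv_out)
stat_after = max_abs(conv_out)    # the same buffer, read after the forward pass

print(stat_before, stat_after, relu_out is conv_out)
```

Reading the input statistic after the forward pass sees the ReLU output instead of the ReLU input, which is why the scales have to be collected before the buffer is overwritten.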

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
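
The gradient block ID ambiguity fixed in the LARS change above can be reproduced with plain string formatting. This is a sketch only: the function names are hypothetical, and the formats are taken from the description ({id}{pidTo}gradientBytes{pidFrom}, with a "_" added between id and pidTo).

```python
def block_id_old(param_id, pid_to, pid_from):
    # Original format: id and pidTo are concatenated with no separator.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def block_id_new(param_id, pid_to, pid_from):
    # Fixed format: "_" separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"

# {1}{12} and {11}{2} collide under the old format...
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)

# ...but stay distinct once the separator is added.
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```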

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated, as with a thread from
Executors.newFixedThreadPool, the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
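
The idea of routing all compute through one long-lived worker thread can be sketched in Python as an analogy. This is not the BigDL/JVM implementation; `compute_pool` and `run_on_compute_thread` are hypothetical names, and the sketch does not model core affinity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: every task lands on the same thread,
# so any thread-local state created there stays valid between calls.
compute_pool = ThreadPoolExecutor(max_workers=1)

def run_on_compute_thread(fn, *args):
    # Submit the task and block until it finishes, like a synchronous call.
    return compute_pool.submit(fn, *args).result()

def current_thread_id():
    return threading.get_ident()

caller_id = threading.get_ident()
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
compute_pool.shutdown()
```

All five tasks report the same worker thread ID, and it differs from the caller's, so the caller thread can exit or be recreated without touching the worker's state.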

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
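
The memory cost of the padding in issue 1 can be estimated with a small sketch. It assumes that only the channel dimension of an NCHW shape is padded and that the SIMD width is 8 (eight fp32 lanes in a 256-bit AVX2 register); `padded_shape` and the other names are hypothetical helpers, not mkl-dnn APIs.

```python
def round_up(n, multiple):
    # Smallest multiple of `multiple` that is >= n.
    return ((n + multiple - 1) // multiple) * multiple

def padded_shape(shape, simd_width=8):
    # Assumption: only the channel dimension (index 1) is padded.
    n, c, h, w = shape
    return (n, round_up(c, simd_width), h, w)

def num_elements(shape):
    total = 1
    for d in shape:
        total *= d
    return total

logical = (4, 1, 28, 28)
padded = padded_shape(logical)
overhead = num_elements(padded) / num_elements(logical)
```

For the shape in the commit message, a single-channel input ends up occupying eight times as many elements once padded.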

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistirbuted in optimizer (#2755)

* change asInstanceOf to toDistirbuted

* change asInstanceOf to toDistirbuted

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we need to convert the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want an fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When the model is compiled,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it goes into the ReLU, it
does another forward, so the input will already be the output. The scales will
then be wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
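
The in-place problem can be reproduced in a few lines. This is a minimal sketch, assuming the scale is derived from the largest magnitude in the buffer; the function names are hypothetical and plain Python lists stand in for tensors.

```python
from typing import List


def max_abs(buf: List[float]) -> float:
    """Largest magnitude in the buffer (assumed basis for the scale)."""
    return max(abs(v) for v in buf)


def relu_inplace(buf: List[float]) -> List[float]:
    """ReLU that overwrites its input, so input and output share memory."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def input_scale_after_forward(conv_output: List[float]) -> float:
    """Buggy order: the ReLU input is inspected after ReLU already ran in place."""
    relu_inplace(conv_output)
    return max_abs(conv_output)


def input_scale_before_forward(conv_output: List[float]) -> float:
    """Safe order: record the input statistic first, then run the layer."""
    recorded = max_abs(conv_output)
    relu_inplace(conv_output)
    return recorded
```

With a buffer of `[-3.0, 1.0, 2.0]`, the buggy order reports 2.0 for the ReLU input while the true value is 3.0, because the negative entry was already overwritten.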

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} causes ambiguity, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
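
The ambiguity is easy to reproduce. The sketch below follows the id layout quoted above; the function names are hypothetical, and the exact position of the separator is an assumption based on the description.

```python
def block_id_old(param_id: int, pid_to: int, pid_from: int) -> str:
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}, with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id: int, pid_to: int, pid_from: int) -> str:
    """Fixed layout (assumed): a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"
```

Under the old layout, `(id=1, pidTo=12)` and `(id=11, pidTo=2)` both produce a key starting with `112`, so two different blocks collide. With the separator they become `1_12...` and `11_2...`.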

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there will be two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. So if
the parent thread exits and is recreated, such as a thread from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
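
The idea of routing all native computation through one dedicated, long-lived thread can be sketched as follows. This is an illustrative Python analogue using the standard library, not the BigDL `ThreadPool` code; the names `compute_pool` and `run_on_compute_thread` are made up for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated worker thread that outlives any caller thread, so state tied
# to that thread (thread-local variables, affinity) is never recreated.
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(fn, *args):
    """Submit work to the dedicated thread and block until the result is ready."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id() -> int:
    """Identifier of whichever thread executes this call."""
    return threading.get_ident()
```

Every call made through `run_on_compute_thread` executes on the same worker, and never on the caller's own thread, which is the property the fix relies on.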

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn will use a padded tensor, which
    uses more memory, such as 4x1x28x28 to 4x8x28x28 (avx2). It will
    pad to a multiple of the simd width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model is created, which costs too much
    space. So this patch allocates it at runtime.
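
The padding arithmetic from the first issue can be checked with a short sketch. It assumes an NCHW shape and that only the channel dimension is rounded up to a multiple of the simd width (8 floats for avx2, as in the example above); the function names are hypothetical.

```python
import math
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (batch, channels, height, width)


def padded_shape(shape: Shape, simd_width: int = 8) -> Shape:
    """Round the channel dimension up to the next multiple of simd_width."""
    n, c, h, w = shape
    padded_c = math.ceil(c / simd_width) * simd_width
    return (n, padded_c, h, w)


def num_elements(shape: Shape) -> int:
    """Total number of elements the tensor occupies."""
    n, c, h, w = shape
    return n * c * h * w
```

For the 4x1x28x28 case, the padded tensor has 8 times as many elements as the original, which is where the extra memory goes.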

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
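
How `mask` and `scales` relate can be sketched as follows. The commit does not spell out the semantics, so this assumes the MKL-DNN convention that a mask of 0 means one scale for the whole tensor and a set bit i means one scale per index along dimension i. The function `max_abs_scales` is hypothetical and only handles a 2-D tensor given as a list of rows.

```python
def max_abs_scales(tensor, mask):
    """Return max-abs scales for a 2-D tensor given as a list of rows.

    mask == 0: one scale shared by the whole tensor.
    mask == 1: bit 0 is set, so dimension 0 gets one scale per index (per row).
    """
    if mask == 0:
        return [max(abs(v) for row in tensor for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in tensor]
    raise ValueError("this sketch only handles mask 0 and mask 1")


weights = [[1.0, -2.0], [0.5, 0.25]]
shared = max_abs_scales(weights, mask=0)   # a single scale
per_row = max_abs_scales(weights, mask=1)  # one scale per row
```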

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
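
The three steps can be illustrated with a small stand-alone sketch. It is not the BigDL or mkldnn implementation: `generate_scale`, `quantize`, and `dequantize` are hypothetical helpers, and the sketch assumes symmetric signed int8 quantization in which the scale is the largest absolute fp32 value.

```python
QMAX = 127  # largest magnitude used for symmetric signed int8


def generate_scale(values):
    """Step 1: record the largest absolute fp32 value as the scale."""
    return max(abs(v) for v in values)


def quantize(values, scale):
    """Step 2: map fp32 values onto integers in [-QMAX, QMAX]."""
    factor = QMAX / scale if scale else 0.0
    return [max(-QMAX, min(QMAX, round(v * factor))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to fp32, as a reorder to fp32 would."""
    return [q * scale / QMAX for q in quantized]


weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)
q = quantize(weights, scale)     # step 3 would run forward on these values
restored = dequantize(q, scale)  # close to the original weights
```

Each restored value differs from the original by at most one quantization step, `scale / QMAX`.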

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

will take two steps: seq.forward(input) runs first, and when it goes into the ReLU, it
will do another forward, so the input will be the output and the scales will be
wrong.
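
The in-place problem can be reproduced with plain lists. This is a minimal sketch, not BigDL code: `conv` and `relu_inplace` are hypothetical stand-ins, and the scale is assumed to be the largest absolute value of a tensor.

```python
def conv(x):
    """Stand-in for a convolution: allocates a new output buffer."""
    return [2.0 * v - 3.0 for v in x]


def relu_inplace(x):
    """ReLU that overwrites its input, so input and output share memory."""
    for i, v in enumerate(x):
        if v < 0:
            x[i] = 0.0
    return x


def max_abs(x):
    return max(abs(v) for v in x)


conv_out = conv([0.0, 1.0, 2.0])      # this buffer is the ReLU input
scale_before = max_abs(conv_out)      # measured before ReLU runs
relu_inplace(conv_out)                # the buffer now holds the ReLU output
scale_after = max_abs(conv_out)       # measured too late: input is gone
```

Measuring the ReLU input scale after the forward pass yields the output's scale instead, which is the wrong value described above.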

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
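
The shape adjustment can be sketched as below. The helper `dnn_weight_shape` is hypothetical, and the sketch assumes the 5-D layout is (group, out / group, in / group, kH, kW), which the commit message does not state.

```python
def dnn_weight_shape(shape_5d):
    """Drop the leading group dimension when there is only one group."""
    if len(shape_5d) != 5:
        raise ValueError("expected a 5-D weight shape")
    group = shape_5d[0]
    return tuple(shape_5d[1:]) if group == 1 else tuple(shape_5d)


ungrouped = dnn_weight_shape((1, 64, 3, 7, 7))   # 4-D for the dnn layer
grouped = dnn_weight_shape((2, 32, 16, 3, 3))    # stays 5-D
```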

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
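
The ambiguity is easy to reproduce. In this sketch the function names `old_block_id` and `new_block_id` are hypothetical; only the two string formats come from the description above.

```python
def old_block_id(id_, pid_to, pid_from):
    """Original format: id and pidTo are concatenated with no separator."""
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def new_block_id(id_, pid_to, pid_from):
    """Fixed format: an underscore separates id from pidTo."""
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that collide under the old format.
collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
distinct = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```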

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit that referenced this pull request Sep 17, 2021
* [Enhancement] Check duplicate layers in the container (#2351)

* check duplicate layers in the container

* add more unit tests

* fix unit test

* fix test

* meet code review

* add [[MkString]] Operation (#2355)

* add [[MkString]] Operation

* add SerializerTest

* modify the ScalaDoc

* fix the compiler error

* Enhance and refactor the logic of InferShape (#2293)

* refactor keras api

style and imports

fix

* fix warning

* fix python

* fix ser test

* fix test

* fix test

* remove checking for weight sharing

* fix unittest

* Keras-like API for training and evaluation (#2306)

* update scala compile fit

* update

* python topology

* refactor lenet

* update

* refactor fit

* add python ut

* ut for image dataset

* fix ut

* fix

* update docs

* update readme

* fix

* [BugFix] TensorOp & SelectTensor (#2380)

* solve conflicts

* resolve conflict again

* Keras-like API functional merge and some fix (#2313)

* resolve conflicts

* update merge

* update merge

* update doc

* refactor python

* fix

* style

* meet review

* Fix reload model in python (#2382)

* fix

* fix default

* Public getInputShape and getOutputShape (#2401)

* open getInputShape and getOutputShape

* clean

* remove logger

* fix

* [Bug Fix] Fix duplicate check sometimes should be suspend (#2403)

* allow keras.Input layer skip duplicate check

* fix some bug

* add some comments

* meet code review

* fix timedistributed to make it compatible with 0.4 (#2408)

* Add Keras Website Documentation for Layers I (#2414)

* add core layers doc

* add advancedactivations and convolutional layers doc

* add dropout, embedding and normalization layers doc

* add pooling and recurrent layers doc

* update

* change

* update

* add merge

* add embedding

* change data

* modify

* update

* update

* change

* update

* Refine Keras-like API LeNet definition (#2407)

* lenet keras seq

* update

* refine

* update

* update

* fix java creator

* meet review

* fix lenet definition

* fix lenet

* test

* fix lenet

* update

* fix

* remove

* [Bug Fix] Fix typo in SpatialSeparableConvoluiton layer name and add related docs (#2420)

* fix typo in spatialseparableconvoluiton name and add docs

* meet code review

* meet code review and fix tests

* Add Keras Website Documentation for Layers II (#2421)

* add zeropadding doc

* modify

* add upsampling

* add atrousconvolution

* add deconvolution2d

* add more

* add locallyconnected

* refactor model evaluate (#2434)

* Keras-Style API website doc for training; refine doc format (#2441)

* training api

* add

* finish

* remove

* fix

* refine doc

* update

* remove

* update

* [Enhancement] ImageFrame adding more APIs (#2464)

* add api for evaluation

* rename method

* fix

* wrapper parquet

* fix api

* fix read issue

* fix BCE return Nan (#2473)

* [bug fix]refine getTimes and time counting. (#2506)

* refine getTimes

* delete some useless code

* sort times

* add unit test

* [New Feature] Add new operation Gather (#2510)

* add operation gather

* add comments

* gather support float indices

* add gather unit test

* some change

* fix style check

* fix squeeze test (#2509)

* [new feature]add operation max (#2523)

* add max

* add serialization test

* max spec

* meet code review

* meet code review

* fix unit test

* [new feature]add generateBackward for loadTF (#2529)

* max support one-element tensor indices (#2530)

* fix const status not handled correct in loop (#2531)

* add parameter sync for batchnorm (#2559)

* add parameter sync for batchnorm

* fix test

* fix test

* fix test

* fix test

* refinement

* fix

* fix test issue

* add backward compatibility

* refine to set Id instead of renaming

* fix style issue

* refinement per review

* fix

* add change to example

* refinement

* fix ut issue to avoid multiple context

* add comments

* fix style

* [new feature] refine Stridedslice, support begin/end/shrinkAxis mask (#2526)

* refine stridedslice

* delete some file

* meet code review

* fix serial unit test

* fix serialization test

* [new feature] multi optimMethods support in Optimizer (#2560)

* multiOptimMethod

* some update

* fix unit test

* fix ut

* fix unit test

* fix python unit test

* meet code review

* update optimizer.py

* update python

* meet code review

* fix: memory leak in `model.predictImageSet`. (#2557)

* fix: memory leak in `model.predictImageSet`.

There are three causes of the memory leak.

1. Repeated allocations in bigquant, which will be fixed in BigDL-core.
2. Modules are cloned repeatedly but never released: `model.predictImageSet`
   creates a new Predictor on every call.
3. Shared weights.

This patch adds a `StorageManager`, which contains a concurrent hash map
that tracks all allocations of native memory/resources and prevents
duplicate release. It is also helpful for debugging.
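
The bookkeeping idea can be illustrated with a minimal Python sketch. This is not the BigDL `StorageManager`: the class `NativeRegistry` and its method names are hypothetical, and a lock-guarded dict stands in for the concurrent hash map.

```python
import threading


class NativeRegistry:
    """Hypothetical registry of native allocations (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = {}  # handle -> description of the allocation

    def register(self, handle, description):
        # Record a new native allocation so it can be audited later.
        with self._lock:
            self._live[handle] = description

    def release(self, handle):
        # Return True only for the first release of a handle;
        # later calls are ignored, which prevents a duplicate release.
        with self._lock:
            if handle not in self._live:
                return False
            del self._live[handle]
            return True

    def leaked(self):
        # Handles that were registered but never released (debugging aid).
        with self._lock:
            return sorted(self._live)


registry = NativeRegistry()
registry.register(0x1000, "conv weight")
registry.register(0x2000, "bn buffer")
first = registry.release(0x1000)
second = registry.release(0x1000)  # duplicate release is a no-op
```

Anything still listed by `leaked()` at shutdown was allocated but never released, which is what makes such a registry useful for debugging.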

* fix: delete .

* refactor:  as the API for AbstractModule

* fix: distribute predictor memory leak

* fix: move delete operation to ModelBroadcast

* refinement per review

* fix ut

* fix scala version issue

* Feat: MKL-DNN Supports (#2482)

This feature enables mkl-dnn support, which can speed up deep learning models. The native C API is wrapped in Java in the BigDL-core project. In BigDL, we integrated convolution, batchnorm, maxpooling, avgpooling, relu, lrn, softmax, caddtable and concattable. Currently, it supports creating models that contain only dnn layers or containers.

Because the data layout is optimized in mkl-dnn, an mkl-dnn model uses `DnnTensor`, which holds a native buffer, as its default tensor. This has two consequences:

1. The user should copy the data from the JVM heap at the first layer and copy it back to the JVM heap at the last layer.
2. The user should compile the model with the phase (training/inference) and the input tensor size. The remaining information is inferred and allocated during compilation.

* fix: linear performance issue and serialization of java object in MklDnnTensor

* memory leak refactor

* memory leak and bn performance issues

1. Memory Leak
The internal buffer of MklDnnTensor should not be re-assigned without
being released, so we check it first. At the first iteration, or after the
input size changes, we create a new MklDnnTensor as the buffer.

2. Bn perf
The JIT BatchNormalization only supports avx2 or avx512, and has much
better performance than the ref version. The input and gradOutput formats
should be the same to get the best performance.
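
The release-before-reassign rule from the memory leak item can be sketched in Python as follows. `FakeNativeBuffer` and `Layer` are hypothetical stand-ins, not BigDL classes; the class-level counter only exists to make leaks visible.

```python
class FakeNativeBuffer:
    """Hypothetical native buffer; `live` counts unreleased instances."""

    live = 0

    def __init__(self, size):
        self.size = size
        FakeNativeBuffer.live += 1

    def release(self):
        FakeNativeBuffer.live -= 1


class Layer:
    def __init__(self):
        self._buffer = None

    def forward(self, input_size):
        # Reuse the buffer unless this is the first call or the size changed.
        if self._buffer is None or self._buffer.size != input_size:
            if self._buffer is not None:
                self._buffer.release()  # release before re-assigning
            self._buffer = FakeNativeBuffer(input_size)
        return self._buffer


layer = Layer()
a = layer.forward(16)
b = layer.forward(16)  # same size: the buffer is reused
c = layer.forward(32)  # size changed: old buffer released, new one created
```

Without the `release()` call, the live count would grow by one every time the input size changes.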

* test: add some test cases for BatchNorm.

Floating-point computation in the JVM does not give exactly the same results as native C/C++.
Batch norm amplifies the difference, for example 10^-8 -> 10^-4 -> 10^-1.

* fix: rebase with upstream master:

1. Concat and ConcatTable should inherit from DynamicContainer.
2. updateParameters has been deprecated.
3. zeroGradParameters should be final. But from now on, the Linear
   should use it.
4. Some other syntax or semantic errors.

* perf: single node and single model performance

* perf: single model

* feat: add fusion for mkl-dnn

* test: add test utils to compare dnn output

* test: add some tests compared with caffe

* add unit tests for dnn tensor

* add unit test for reorder memory

* test: fix the test regression errors

* checkin reorder manager

* add backward for sequential

* fix some bugs

* update core ref

* add unit tests

* refactor: move the static class DataType, AlgKind and so on to standalone class (#4)

* refactor: delete MklDnn.MemoryFormat

* refactor: move the static class DataType, AlgKind and so on to standalone class

* fix: core refactor errors

* refactor: spec errors (#5)

* Mkl dnn dev (#6)

* checkin reorder manager

* add container and refine reorder manager

* fix merge issue

* add join table forward

* refine interface (#7)

* add LRN and ReLU

* add pooling

* refactor: conv + linear + bn

* add JoinTable backward

* refactor: conv + linear + bn

* add cAddTable concattable

* fix: reorder failed on some of convs

* refactor: softmax

* refactor: fusion support

* refactor: resnet_50

* refactor: move tests to this branch

* refactor: delete useless files and enable the special old tests.
refactor: delete unused methods in MklDnnOps
fix: scalastyle check

* fix: rebase with upstream

* fix: ignore the prototxt tests

* fix: do not change the core commit ref

* fix: move set num of threads for mkldnn to ResNet50Perf

* fix: serialization disabled for mkldnn module

* [Issue fix] - Fix MM layer multi forward/backward issue (#2583)

* fix mm issue

* refinement

* [Bug Fix]clear preTopology's output while cloneCells (#2585)

* clear preTopology's output while cloneCells

* fix unit test

* Dnn model serialization supports. (#2598)

* feat: add simple serialization supports

* feat: mkl-dnn modules serialization supports

* fix: make primitive(desc) to private

* fix: typo

* fix: modified based on comments

* test: private to call api.

* Add serialization for mkl dnn (#2593)

Add dense weights and gradients and support optimizer (local, distribute).

Add a `Blob` for the pair of dense and native weights/gradients with the MemoryData layout.

* Modify the thread pool to adopt mkldnn models (#2608)

`Engine.default` now supports a single thread, including for `LocalOptimizer` and `DistriOptimizer`. To support a single-thread version of the `invokeWait2` method in `ThreadPool`, the thread pool is set to the current thread.

1. A dnn model uses affinity to bind the omp threads. For performance reasons, the default thread pool must use the current main thread.
2. MTLabeledBGRImgToBatch uses a separate new thread pool called io, so it is not blocked when the default thread pool is single-threaded.
3. FileWriter does not use the default pool; otherwise the whole app would get stuck while creating the summary.
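
A rough Python sketch of the "run on the current thread when the pool is single-threaded" behaviour is shown below. `invoke_and_wait` is a hypothetical helper written for this illustration; it is not the BigDL `invokeWait2` method.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def invoke_and_wait(tasks, pool_size):
    """Run tasks and return their results in order (hypothetical helper)."""
    if pool_size == 1:
        # Single-thread mode: execute directly on the calling thread.
        return [task() for task in tasks]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]


caller = threading.get_ident()
single = invoke_and_wait([threading.get_ident] * 3, pool_size=1)
multi = invoke_and_wait([lambda: 1, lambda: 2], pool_size=2)
```

In single-thread mode every task reports the caller's thread id, so any thread affinity already set on the caller also applies to the work.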

* feature: add shutdown for optimizer which will release the native resources (#2609)

Release native resources at the end of training.

At the end of training, `release` is called on every model cloned in the optimizer.

1) `LocalOptimizer` is simple because all cloned models are local.
2) `DistriOptimizer` is a little more complicated. We should release before `models.unpersist`; otherwise
     the model is serialized and transferred again. `ModelBroadcast` also clones a new model when `value` is
     called, so those clones should be released as well.

* NLL unlabeled data fix (#2620)

* fix: the inference performance regression of mkldnn (#2622)

We should copy weights in updateOutput only during training. For inference, the weights are loaded beforehand and do not change.

* feat: training ResNet50 w/ dnn backend on Spark. (#2624)

* feat: resnet50 training with distributed mode
* fix: unknown segmentation fault
* fix: clone of dnn tensor
* fix: delete unused codes
* fix: bn initialization
* fix: performance regression
* fix: convergence regression
* fix: delete the release in ModelBroadcast
* fix: pass all unit tests and remove the segmentation fault.

* feat: add dnn vgg model. (#2627)

* feat: add dnn vgg model.

* fix: rename the ResNet50Perf to Perf

* fix join table will throw exception during backward if batchsize is changed (#2638)

* fix join table backward

* change to resize as

* feature: vgg-16 with mkldnn backend (#2631)

* feat: vgg-16 with mkldnn backend
* fix: tests errors
* fix: case class too much arguments
* fix: vgg_16 blas model supports
* fix: protobuf of serializer
* fix: sgd poly test case error
* fix: consistency of poly impl
* fix: rename the version2 of Xavier to varianceNormAverage

* perf: need not narrow the gradients and zero gradients for dnn backend (#2632)

* perf: need not narrow the gradients and zero gradients for dnn backend
* fix: empty gradient zero for dnn backend
* fix: delete affine

* New parallel optimizer (#2643)

* add new parallel optimizer

* change info back to debug for metrics log

* refinement per comments

* refinement per comments on single model optimization

* refinement for sharing common methods

* fix style

* refinement to reuse duplicate code

* Fix transfer learning (#2645)

* fix transfer learning
* add ParseSingleExample, DecodeBmp tf loader
* add corresponding unit tests

* remove potential performance downgrader (#2651) (#2652)

* Fix transfer learning (#2645)

* fix transfer learning
* add ParseSingleExample, DecodeBmp tf loader
* add corresponding unit tests

* remove potential performance downgrader

* add dnn graph (#2666)

* add dnn graph

* move compile to forward, add graph test to perf

* add dnn graph option to example

* style check

* replace dnn with dnn graph in examples

* add no phase api when initPrimitives (#2686)

* delete phase in initPrimitives

* fix style check

* improve memoryReorder layer to handle conversion between nhwc and nchw (#2683)

* fix reorder to handle nhwc

* add init memory for ReorderMemory

* support same padding in dnn layer (#2684)

* support same padding in dnn layer

* meet review

* add BlasWrapper (#2690)

* add BlasWrapper

* refactor code

* meet review

* SerializerSpec excluded mkldnn.BlasWrapper

* change some comments

* add dnn output layer (#2691)

* add dnn output layer

* SerializerSpec excluded mkldnn Output

* change some comments

* add IR graph and conversion from IR graph to blas graph or dnn graph (#2704)

* add ir graph

* fix model evaluate & conv without bias

* add dnnMode & support table inputs

* irelement & graph layer use same weights

* meet pr comments and code refactor

* convert static graph to IR graph and build (#2711)

* add static graph to IR graph

* meet pr comments

* fix: move mkldnn computing to a single thread pool (#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. If
the parent thread exits and is recreated, as happens with threads from
Executors.newFixedThreadPool, the whole app crashes with a segmentation fault.
The parent thread is the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
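
Why a dedicated, long-lived worker thread helps can be shown with a small Python sketch. It is an analogy only: `threading.local` stands in for the native thread-local variables, and `dnn_pool` and `compute` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

native_state = threading.local()  # stand-in for native thread-local variables


def compute(x):
    # Initialise the per-thread state once, then reuse it on later calls.
    if not hasattr(native_state, "calls"):
        native_state.calls = 0
    native_state.calls += 1
    return threading.get_ident(), native_state.calls, x * 2


# One long-lived worker thread handles every computation.
dnn_pool = ThreadPoolExecutor(max_workers=1)
results = [dnn_pool.submit(compute, i).result() for i in range(3)]
dnn_pool.shutdown()

worker_ids = {r[0] for r in results}
call_counts = [r[1] for r in results]
outputs = [r[2] for r in results]
```

All three calls land on the same worker thread, so the thread-local state survives between calls regardless of what happens to the thread that submitted the work.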

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, for example 4x1x28x28 becomes 4x8x28x28 (avx2). The
    tensor is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime.
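
The memory cost of the padding in item 1 can be worked out with a short Python sketch. It assumes that only the channel dimension is rounded up and that the width is 8, as in the avx2 example above; the real blocking rules depend on the memory format mkl-dnn selects. `padded_shape` and `num_elements` are hypothetical helpers.

```python
def padded_shape(shape, simd_width):
    """Round the channel dimension (index 1) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


logical = (4, 1, 28, 28)
physical = padded_shape(logical, simd_width=8)
overhead = num_elements(physical) / num_elements(logical)
```

For a single-channel input the padded buffer holds eight times as many elements as the logical tensor, which is why allocating such buffers eagerly at model creation is expensive.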

* add computshape for some layers and add skip primitives in DnnGraph (#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (#2737)

* refactor predict for dnn model

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be passed along when converting FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.
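
A descriptor carrying these attributes might look like the Python sketch below. `MemoryDesc` is a hypothetical class, not BigDL's `MemoryData`; the field names follow the commit message, while the layout string and scale value are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemoryDesc:
    """Hypothetical descriptor; frozen so equality and hash use every field."""

    shape: Tuple[int, ...]
    layout: str
    data_type: str = "fp32"
    mask: int = 0
    scales: Tuple[float, ...] = ()


fp32 = MemoryDesc((4, 8, 28, 28), "nchw")
int8 = MemoryDesc((4, 8, 28, 28), "nchw", data_type="int8", scales=(127.0,))

# Descriptors that differ only in data type must not compare equal,
# otherwise a cache keyed on them could skip a required reorder.
needs_reorder = fp32 != int8
cache = {fp32: "fp32 buffer", int8: "int8 buffer"}
```

Including the data type in equality and hashing keeps the two descriptors as separate cache entries even though shape and layout match.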

* fix conversion accuracy (#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is still an fp32 model.
2. Quantize the model.
   The `quantize()` api is compatible with the `bigquant`
   backend and sets the quantize flag. During compile,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
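
The three steps can be mimicked with a generic symmetric int8 scheme in plain Python. This is a sketch of the general technique under simplifying assumptions (one scale per tensor, max-abs calibration), not the mkldnn or BigDL implementation; all function names are hypothetical.

```python
def generate_scale(values):
    """Step 1: derive a scale from the largest absolute value seen."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to the int8 range using the scale."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def int8_dot(qa, scale_a, qb, scale_b):
    """Step 3: accumulate in integers, then rescale the result to fp32."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc / (scale_a * scale_b)


weights = [0.5, -1.0, 0.25, 0.0]
inputs = [1.0, 2.0, -0.5, 3.0]

w_scale = generate_scale(weights)
x_scale = generate_scale(inputs)
q_weights = quantize(weights, w_scale)
q_inputs = quantize(inputs, x_scale)

approx = int8_dot(q_weights, w_scale, q_inputs, x_scale)
exact = sum(w * x for w, x in zip(weights, inputs))
```

The quantized result tracks the fp32 result closely but not exactly; the gap is the rounding error introduced in step 2.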

* enable fusion by default (#2766)

* fix: use too much memory of mkldnn models (#2783)

* fix: inplace of input/output and weight dimension error (#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been overwritten and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps. seq.forward(input) runs first; then, on reaching the ReLU, it
does another forward, so the input is already the output and the scales are
wrong.
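
The aliasing problem is easy to reproduce in a few lines of Python. The sketch uses a plain list as the shared buffer and a max-abs statistic as a stand-in for the scale calculation; `relu_inplace` and `max_abs` are hypothetical helpers.

```python
def relu_inplace(buf):
    # Overwrites its input, so input and output share the same memory.
    for i, v in enumerate(buf):
        buf[i] = v if v > 0 else 0.0
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


conv_output = [-4.0, 1.0, 2.0]

# Correct: record the statistic for ReLU's input before ReLU runs.
input_stat = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Wrong: reading the "input" afterwards sees the output instead.
stale_stat = max_abs(conv_output)
```

Here the statistic taken before the in-place ReLU is 4.0, while the one taken afterwards is 2.0, so a scale derived from the second value would not describe the layer's real input.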

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (#2778)

* fix softmax (#2777)

* fix: performance regression on resnet50 (#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* bugfix - set mask for container (#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* fix: memory data hash code should contain data type (#2821)

* Optimize backward graph generation and CAddTable (#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (#2823)

* feat: enable global average pooling

* test: add more unit tests

* Dilation in MKL-DNN Convolution (#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (#2822)

* [New feature] add transformer layer (#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* Support init_spark_on_yarn and RayContext (#1344)

* rayrunner

* add a jvm killer

* disable killer from spark job and rely on jvm killer only

* add env and verify the cv2 installation

* enhance

* minor

* style

* comments

* local and enhancement

* better local strategy

* doc and style

* doc

* style

* revert

* doc

* disable

* comments

* fix test

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* bug fix for cmul (#2836)

* bug fix for cmul

* meet pr comments

* Enhance Ray on spark (#1449)

* add more doc, spark_conf and extra_options

* fix pip install

* doc

* fix release.sh

* We should search and find the necessary jars from python env rather than upload them again (#1460)

* fix path

* jars

* set new storage to weight and bias for weight fusion (#2839)

* Add transformer to LM example (#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (#2824)

* fix: fusion for multi-group of convolution (#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix acc bug & init dnn thread (#2841)

* support tnc and ntc conversion (#2844)

* support ntc in dnn layer (#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* Remove ray dependencies from init_spark_on_yarn (#1500)

* remove ray dependencies

* delete

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (#2858)

* Add beam search in transformer (#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (#2854)

* feat: add axis to softmax (#2859)

* Add an example notebook for implementing PS with Ray  (#1522)

* upup

* add ps notebook for ray

* Expose more option for driver in RayContext (#1541)

* expose more option for driver

* minor

* more

* modify run-pytests to check version before test ray (#1532)

* Create .keep

* update .keep path

* add rl_pong example

* move to rl_pong direction

* remove original file

* add parameter server example

* add license

* PEP8 checks

* PEP8 checks

* Add into integration test

* Wrap tests into bash function

* Update license

* PEP8 checks

* Correct syntax of rl_pong

* modify run-pytests to check version before test ray

* test pyspark version and spark home

* add check spark_home's pyspark in case pyspark can't be found

* add check version before run ray examples

* change spark home

* change spark home

* install packages which are needed in ray examples

* check error

* fix error

* change execution to spark-submit

* change memory

* change object memory to test

* add atari_py dependency

* remove .keep

* move ray test to new files

* change some ray-pip lines into function

* remove rl_pong and fix parameter_server iterations

* add iteration

* change iterate, print info

* add more info

* add __init__ files

* change ray to rayexample to avoid conflict and change spark-submit to python to submit tasks

* renamed foreach_evaluator to foreach_worker because rllib update and rename file rllib to rllibexample

* add a dedicated file for the ray test

* PEP8 check fix

* PEP8 check fix

* remove test_split

* remove --doctest-modules about ray

* add time.sleep

* feat: RoiAlign Forward (#2874)

* feat: Feature Pyramid Networks Forward (#2870)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (#2885)

* add shape layer

* add Gather layer (#2897)

* add gather layer

* [New feature] Add maskhead (#2892)

* support for maskhead

* fix ray and add more test (#1596)

* fix ray and add more test

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* modify raycontext and move test file to func

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* modify process and add sc.stop in the end

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* delete one repeat and check PEP8

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* change file name and remove some useless code

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* rename test yarn reinit file

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* ignore test reinit raycontext

Signed-off-by: Jieru Hong <hongjieru30@gmail.com>

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (#2902)

* remove duplicated layers

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* Fix memory leaks on training (#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batch for mask head and pooler (#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* Onnx support: add pos parameter to softmax (#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* revert back api (#2943)

* fix: softmax and bn+scale fusion (#2937)

* feat: multi models support with MKL-DNN backend (#2936)

* feat: multi models support with MKL-DNN backend

* add no argument apply api for softmax (#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* add maskrcnn inference example (#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Minor fix for detecting sc is local (#1735)

* fix sc local

* update

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* add callZooFunc and change all callBigDlFunc to callZooFunc (#1793)

* feat: add softmax backward (#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (#2966)

* fix: fuse bn scale and relu.

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (#2960)

* ROIAlign refactor

* fix unit tests

* support roialign backward (#2975)

* support roialign backward

* fix sparselinear unit test

* Remove default mkl settings when start ray (#1837)

* fix: bn nhwc error, the channel should be the last dim (#2981)

* fix: softmax dnn backend wrong order of primitive (#2986)

* Add a method to merge nested StaticGraphs (#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Fix ray get in related unit tests (#1985)

* move automl ut test from run-pytests into run-pytests-ray, add limitation for object_store_memory

* config unit tests

* fix echo message in tests script

* Support MXNetTrainer (#2143)

* add mxnet support

* add dummy data

* refactor api

* minor

* print ip

* update to the latest

* some update

* remove image classification example and update lenet

* add lenet

* fix style

* minor update

* update example

* add unit test

* style

* meet review

* style

* update

* fix resource tags

* update

* try to fix

* stop sc and ray_ctx

* add conftest to fix ut

* fix ray local test

* trial

* test

* move mxnet to package

* update conftest

* revert

* update

* [Fix issue #2150] Remove waiting in Ray init (#2151)

* remove wait

* remove waiting before and after

* Xshard pandas on ray support (#2115)

* add xshard interfaces

* add example

* update hdfs,s3 implementation

* update xshard api and example

* remote get_shards method

* update support for s3

* update pyarrow order

* add comments, docs, update example

* update example readme

* update code, add test

* update apply, read

* update text

* update by comments

* restore gitignore

* update license

* update test

* add conftest

* fix style

* fix style

* add xshard test

* update pytest

* update pytest

* Refactor RayContext (#2175)

* Add doc for MXNetTrainer and some code refactor (#2198)

* add doc and some code refactor

* minor

* update

* minor

* Change ray import folder structure (#2194)

* rebase

* revert automl readme

* Add README for MXNet LeNet example (#2208)

* initial readme

* update

* typo

* typo

* add batch size

* update style

* add validation for gluon

* fix style

* minor

* update

* minor

* more doc

* minor

* Upgrade ray to 0.8.4 (#2249)

* fix ray ip mismatch

* adjust to ray 0.8

* upgrade ray

* upgrade automl ray 0.8.4, feature ut failed

* fix test case and doc

* fix style

* fix style

* fix rllib

* update docker

* remove docker changes and change setup.py

* remove docker change

* reflect ray changes

* fix style

* fix cluster

Co-authored-by: Yu Shan <yushan0105@163.com>
Co-authored-by: Shan Yu <shan.yu@intel.com>

* orca init (#2304)

* orca init

* xshard migration

* doc fix

* add license

* indent

* add csv files

* fix path

* Expose driver core in RayContext; polish ray docs (#2315)

* update doc

* style

* Orca MXNetTrainer migration (#2320)

* migrate mxnet_trainer to estimator

* fix

* newline

* indent

* fix test path

* fix

* ignore mxnet estimator test in spark2.4-

* estimator to trainer

* style

* Remove final for AbstractModule (#3001)

* Add initial version of PyTorchTrainer in orca.learn (#2349)

* add torchtrainer

* style

* meet review

* update

* fix typo

* add ut

* add test

* RayContext init return address info (#2376)

* return

* meet review

* style

* Get the current RayContext (#2411)

* initial

* update

* minor

* remove final setExtraParameters (#3014)

* deprecate nn.keras (#3013)

* deprecate nn.keras

* Move orca learn ray tests to a ray sub dir (#2577)

* move orca learn ray tests to a ray sub dir

* fix path

* fix KerasLayer new parameters() (#3034)

* Update init_spark_on_yarn (#2587)

* update

* meet review

* update doc

* style

* Minor update on Ray (#2604)

* minor update

* update

* Fix Spark configurations in RayContext (#2642)

* update

* meet review

* style

* minor

* update

* Add init_spark_standalone for local node (#2685)

* initial version

* remove

* update to local

* remove

* style

* fix path

* fix

* update pythonhome

* Add horovod estimator save/load/get_models/shutdown (#2702)

* add get_model/save/load/shutdown for pytorch horovod estimator

* change args

* remove condition

* change name

* remove get multiple models

* add pytorch estimator ut and disable for now

* meet comments

* Add Horovod tests (#2761)

* add pytorch horovod tests

* add horovod tf tests

* fix

* fix style

* fix tests

* Add init_orca_context (#2774)

* initial imple

* update

* meet review

* review and style

* remove stopped

* add doc

* minor

* move import

* fix mxnet

* remove

* add file lock on ray start to avoid port conflict on the same machine (#2777)

* Update UTs and examples with init_orca_context (#2787)

* update unit tests

* minor

* update

* update mxnet

* move barrier

* fix mxnet

* update

* bug fix

* update

* update test

* update mxnet example

* update mxnet

* minor

* minor

* minor

* update examples

* move ray import dependencies

* readme

* minor

* bug fix

* remove default

* fix duplicated ray worker logs (#2799)

* fix multiple ray worker logs

* refine comments

* Update raycontext.py

* Support init_spark_on_k8s (#2813)

* initial

* fix

* code refactor

* bug fix

* update docker

* style

* Support RayOnSpark for k8s and add docs (#2836)

* support ray on k8s

* add to init orca context

* style

* minor

* minor

* ut

* Hotfix for ray psutil on macOS (#2856)

* Fix psutil test fail on macOS.
* Add exception handle for running without root.

* [WIP] spark 3.0 (#3054)

* spark 3.0

* expose tcmf num_workers to zouwu (#2884)

* expose tcmf num_workers to zouwu

* fix style & add logger handler & default num_workers

* change backend to horovod

* change zouwu forecast base test case from ZooTestCase to TestCase

* split zouwu model forecast test into with and without ft

* change import

* change import

* change zouwu tests structure

* fix typo

* expose evaluate and fix num_workers after rebase

* change default value of num_workers

* Add ray rdd (#2996)

* add ray rdd

* fix style

* add more tests

* Fix orca ray pytorch example (#3007)

* fix horovod pytorch example

* fix bug

* fix process group

* fix style

* fix tests

* fix test

* fix tests

* revert ray context change

* Fix validate in Orca PyTorch Estimator (#3012)

* fix validate

* rename

* fix

* fix ut

* update

* minor

* fix ut

* squeeze target dimension (corner case) in ClassNLLCriterion (#3072)

* fix target dimension match error

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix locale environment variables when launching Ray (#3167)

* run ray stop before ray tests started (fix ray memory test fail issue) (#3193)

* run ray stop before ray tests started

* add more ray stop

* fix ray memory issue

* change scope to class

* change mxnet scope to function

* attempt to fix ray memory (#3205)

* attempt to fix ray memory

* exclude webui

* Update doc for RayContext (#3211)

* update

* update

* Fix macOS ZombieProcess Exception (#3221)

* Fix macOS ZombieProcess
* Add ProcessLookupError catch

* add serializeUid (#3099)

* update doc (#3104)

* upgrade ray to 1.0 (#3257)

* upgrade ray to 1.0

fix automl

ray port

* fix tests

* fix bug

* fix bug

* fix tests

* fix example

* fix example

* fix tests

* change back

* comment out test

* update setup

* Add initial version of auto estimator (#3731)

* add initial version of auto estimator

* support str for torch optimizer and loss

* move create searcher

* change name to model_builder

* add best model

* add convert probability to class in metrics.Accuracy

* change condition order

* add pytorch ut

* change util name

* add LR_NAME

* add document for LR_NAME

* add check with best model and optimizer class test

* add ut for tf.keras

* move tests to orca/automl/autoestimator/

* change name

* add auto test

* fix pep8

* remove return in fit

* add raise error for fit multiple times

* remove optimizer class

* fix bug

* change error message

* read parquet dataset as tf.data.Dataset (#3956)

* Change zouwu to chronos in source codes (#4000)

* change zouwu to chronos in all codes

* change autotcn location

* change zouwu to chronos in notebook

* fix conflict

* fix conflict again

* fix bug

* manual check fix

* links in doc

* Add support for non-barrier mode to launch ray (#4014)

* add support for non-barrier mode

* fix style

* meet review

* meet review

* move barrier mode to zoocontext

* bug fix

* modify

* update

* May fix jenkins AutoEstimator randomly fail (#4164)

* change order of autoestimator test

* reduce core num and trial num

* Add assert error message when launching Ray for non-barrier mode (#4221)

* Be compatible to ray 1.5.0 (#4387)

* compatible to ray 1.5.0

* refine

* fix raycontext (#4412)

* Kill process group instead of iterator of pids in shutdown hook (#4494)

* kill process group instead of process iter

* change name

* change name

* update doc

* fix style

* change to string

* Move automl.model to orca.automl (#4667)

* delete common folder

* fix reference

* move test

* modify scripts

* Delete automl (#4680)

* rm automl

* rm test automl

* rm automl in tests

* Add ray daemon to kill ray processes (#4571)

* add ray daemon

* remove in bigdl

* add ray daemon in start_restricted_worker

* change to static method

* remove ProcessMonitor.register_shutdown_hook and clean_fn

* change name

* clean useless code

* add license

* zoo.ray -> bigdl.orca.ray; zoo.util->bigdl.dllib.utils

* zoo -> bigdl.dllib.utils.nncontext

* comment out other tests than ray

* change path in run-pytest-ray

* update test script and rename tfpark package

* fix pythotfpark ut I

* turn off scala style check (#4695)

* add resources for python tfpark

* fix prepare_env

* add __init__.py in test subfolders

* remove orca/src/test

* fix scala style check and disable header check  (#4715)

* uncomment partial keras ut test (#4716)

* move autograd to keras/autograd and migrate local estimator (#4721)

* bigdl keras private (#4731)

* add inferenceModelLoadOpenVINONg (#4730)

* uncomment TimeDistributed (#4736)

* move bigdl.nn to bigdl.dllib.nn

* rm bigdl keras ut test from keras

* update keras path in ut

Co-authored-by: Ian Wong <yiheng.wang@intel.com>
Co-authored-by: tosky001 <yaojianhang001@126.com>
Co-authored-by: li,zhichao <zhichao.li@intel.com>
Co-authored-by: Kai Huang <huangkaivision@gmail.com>
Co-authored-by: Xu Xiao <lovedreamf@gmail.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Quincy2014 <412363303@qq.com>
Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: Griffin Kardos <kardosgriffin@gmail.com>
Co-authored-by: megaSpoon <bowen.she@intel.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: Jieru Hong <30741238+respecteverything@users.noreply.github.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Yu Shan <yushan0105@163.com>
Co-authored-by: Shane Huang <shengsheng.huang@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Shan Yu <shan.yu@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Qiyuan Gong <qiyuan.gong@intel.com>
Co-authored-by: Junwei Deng <35031544+TheaperDeng@users.noreply.github.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 17, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
The parent thread here means the main thread (Local Mode) or the worker
thread of mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, e.g. 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous
    implementation allocated the DnnTensor when the model was created,
    which cost too much space, so this patch allocates it at runtime.
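
The padding cost in item 1 can be sketched as follows. This is only an illustration, assuming the padded dimension is the channel axis and the block width is 8, as the 4x1x28x28 to 4x8x28x28 example suggests; `padded_shape` and `num_elements` are hypothetical helpers, not mkl-dnn or BigDL APIs.

```python
def padded_shape(shape, simd_width=8, channel_axis=1):
    """Round the channel dimension up to the next multiple of simd_width.

    simd_width=8 and channel_axis=1 are assumptions taken from the
    4x1x28x28 -> 4x8x28x28 (avx2) example in the commit message.
    """
    padded = list(shape)
    channels = padded[channel_axis]
    # Ceiling division, then scale back up to a multiple of simd_width.
    padded[channel_axis] = -(-channels // simd_width) * simd_width
    return tuple(padded)


def num_elements(shape):
    """Number of elements a dense tensor of this shape holds."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Memory blow-up factor caused by padding a single channel to a full block.
overhead = num_elements(padded) / num_elements(original)
```

With a single input channel the padded buffer holds eight times as many elements, which is why the extra memory is noticeable for small-channel inputs.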

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. When the model is compiled,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
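
The three steps above can be sketched in plain Python. This is a generic symmetric int8 scheme written for illustration only, not the BigDL or mkldnn implementation; the function names and the definition `scale = 127 / max|x|` are assumptions.

```python
INT8_MAX = 127


def generate_scale(values):
    """Step 1: derive one scale from fp32 values (assumed: 127 / max|x|)."""
    max_abs = max(abs(v) for v in values)
    return INT8_MAX / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to integers clamped to [-127, 127]."""
    return [max(-INT8_MAX, min(INT8_MAX, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Step 3 helper: recover approximate fp32 values after int8 compute."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)
int8_weights = quantize(weights, scale)
restored = dequantize(int8_weights, scale)
```

Each restored value differs from the original by at most half a quantization step, `0.5 / scale`, which is the accuracy cost traded for the smaller data type.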

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by that time the input has already been changed, and its scales
may not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first; then, on entering the ReLU,
another forward runs, so the input is now the output and the scales are
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
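
A toy illustration of the in-place problem described above, using plain Python lists rather than BigDL tensors. `conv_like` and `relu_inplace` are hypothetical stand-ins, and "scale" here simply means the maximum absolute value of a buffer.

```python
def conv_like(x):
    """Stand-in for a layer that writes into a fresh output buffer."""
    return [v - 1.0 for v in x]


def relu_inplace(x):
    """Stand-in for an in-place ReLU: overwrites its own input buffer."""
    for i, v in enumerate(x):
        if v < 0:
            x[i] = 0.0
    return x


def max_abs(x):
    return max(abs(v) for v in x)


data = [3.0, -4.0, 0.5]
hidden = conv_like(data)           # this buffer is the ReLU's input

scale_before = max_abs(hidden)     # statistic of the true ReLU input
out = relu_inplace(hidden)         # input and output share memory
scale_after = max_abs(hidden)      # same buffer, now holding the output
```

Reading the input buffer after the forward pass yields `scale_after`, which reflects the ReLU output rather than its input, so the input scale has to be captured before the in-place layer runs.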

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
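
The ambiguity described in this bug fix can be reproduced with a small sketch. The key layout follows the `{id}{pidTo}gradientBytes{pidFrom}` pattern quoted above; the function names are hypothetical, and the exact fixed format beyond the added "_" is an assumption.

```python
def old_block_id(param_id, pid_to, pid_from):
    """Original layout: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    """Fixed layout (assumed): a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs collide under the old layout.
collision_a = old_block_id(1, 12, 0)
collision_b = old_block_id(11, 2, 0)

# The separator keeps them distinct.
fixed_a = new_block_id(1, 12, 0)
fixed_b = new_block_id(11, 2, 0)
```

Both old keys start with "112", so two distinct gradient blocks would map to the same block id; with the separator the keys begin with "1_12" and "11_2" and no longer clash.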

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). The size
    is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
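
A minimal sketch of the padding arithmetic in issue 1, in plain Python rather than BigDL or mkl-dnn code. It assumes the channel dimension is the one being padded and that the block width is 8 for avx2, both inferred from the 4x1x28x28 to 4x8x28x28 example; the helper names are hypothetical.

```python
def pad_to_multiple(size: int, multiple: int) -> int:
    """Round size up to the nearest multiple of `multiple`."""
    return ((size + multiple - 1) // multiple) * multiple


def padded_shape(shape, simd_width):
    """Pad the channel dimension of an NCHW shape (assumed layout)."""
    n, c, h, w = shape
    return (n, pad_to_multiple(c, simd_width), h, w)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original, simd_width=8)  # width 8 assumed for avx2
memory_ratio = num_elements(padded) / num_elements(original)
```

With a single input channel the padded tensor holds eight times as many elements, which is why the extra memory matters for small-channel inputs.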

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend and will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
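
To make the role of the scales concrete, here is a rough sketch of symmetric per-tensor int8 quantization in plain Python. It is an assumption about the general technique, not the BigDL or mkldnn implementation: the formula `scale = 127 / max|x|` and the helper names below are illustrative only and do not correspond to `generateScalesWithMask` or `quantize()`.

```python
def compute_scale(values, qmax=127):
    """Step 1 (assumed form): derive one scale from the fp32 value range."""
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize_values(values, scale, qmax=127):
    """Step 2 (assumed form): map fp32 values to clipped int8 integers."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize_values(quantized, scale):
    """Map int8 integers back to approximate fp32 values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25]
scale = compute_scale(weights)
quantized = quantize_values(weights, scale)
restored = dequantize_values(quantized, scale)
```

The restored values differ from the originals by at most half a quantization step, which is the accuracy cost traded for int8 inference speed.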

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do the forward in
`calcScales`: by that time the input has already been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first, and when it goes into the ReLU it
does another forward, so the input will be the output and the scales will be
wrong.
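
The hazard can be reproduced in a few lines of plain Python. This is only an illustration of the aliasing problem, assuming the scale is derived from the maximum absolute value of the buffer; the function names are hypothetical and are not BigDL APIs.

```python
def relu_inplace(buffer):
    """ReLU that overwrites its input buffer, like an in-place layer."""
    for i, value in enumerate(buffer):
        if value < 0:
            buffer[i] = 0.0
    return buffer


def max_abs(buffer):
    """Stand-in for the statistic a scale would be computed from (assumed)."""
    return max(abs(v) for v in buffer)


# Stands in for the Conv output, which is also the ReLU input.
conv_output = [-4.0, 1.0, 2.0]

stat_before = max_abs(conv_output)        # measured on the true ReLU input
relu_output = relu_inplace(conv_output)   # same memory, now overwritten
stat_after = max_abs(conv_output)         # measured too late: sees the output
```

Because `relu_output` and `conv_output` are the same object, any statistic read after the forward pass describes the ReLU output rather than its input.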

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
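
The collision is easy to reproduce with plain string formatting. The sketch below only mirrors the id scheme described in this message; the function names are hypothetical and the exact format used in the BigDL source may differ.

```python
def block_id_old(param_id: int, pid_to: int, pid_from: int) -> str:
    """Original scheme: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id: int, pid_to: int, pid_from: int) -> str:
    """Fixed scheme: an underscore separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both start with "112".
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

Under the old scheme both pairs map to the same key, so one gradient block could overwrite or be read in place of another; with the separator the keys stay distinct.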

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples with different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
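
As a rough illustration of the fix (in Python, not the Scala code of this patch), the sketch below routes all computation through one long-lived worker thread instead of whichever caller thread happens to be running. The names `compute_pool` and `run_on_compute_thread` are hypothetical and only stand in for the dedicated pool described above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the dedicated computing pool: a single
# long-lived worker thread that outlives short-lived caller threads.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the single worker thread and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


def current_thread_name():
    return threading.current_thread().name


# Every call runs on the same worker thread, never on the caller's thread.
names = {run_on_compute_thread(current_thread_name) for _ in range(3)}
print(names)
```

Because the worker is never the parent thread, any thread-local state it holds survives even when caller threads are recreated.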

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
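
The memory cost of the padding in issue 1 can be estimated with a small sketch. It assumes an NCHW shape whose channel dimension is rounded up to a multiple of the SIMD width; `simd_width=8` matches the avx2 example above, and the helper names are hypothetical.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension (index 1 of NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
print(padded, num_elements(padded) // num_elements(original))
```

For the 4x1x28x28 case the padded tensor holds eight times as many elements as the original, which is why allocating such tensors eagerly is expensive.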

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.
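
To make the role of the scales concrete, here is a minimal quantization sketch in plain Python. It is not the MKL-DNN reorder itself: the `per_row` flag is a hypothetical stand-in for what a nonzero `mask` might select (one scale per slice instead of one for the whole tensor), and the symmetric range of 127 is an assumption.

```python
def compute_scales(rows, per_row=False, qmax=127):
    """Return scales mapping fp32 values onto [-qmax, qmax].

    per_row=False gives a single scale for the whole tensor;
    per_row=True gives one scale per row (assumed analogue of a mask).
    """
    if per_row:
        return [qmax / (max(abs(v) for v in row) or 1.0) for row in rows]
    overall = max(abs(v) for row in rows for v in row) or 1.0
    return [qmax / overall]


def quantize(rows, scales, qmax=127):
    """FP32 -> Int8: multiply by the scale, round, and clamp."""
    out = []
    for i, row in enumerate(rows):
        s = scales[i] if len(scales) > 1 else scales[0]
        out.append([max(-qmax, min(qmax, round(v * s))) for v in row])
    return out


def dequantize(q_rows, scales):
    """Int8 -> FP32: divide by the same scale."""
    out = []
    for i, row in enumerate(q_rows):
        s = scales[i] if len(scales) > 1 else scales[0]
        out.append([q / s for q in row])
    return out


weights = [[1.0, -0.3], [0.02, -0.005]]
print(quantize(weights, compute_scales(weights)))
print(quantize(weights, compute_scales(weights, per_row=True)))
```

With a single scale the second row collapses to very small integers, while per-row scales use the full int8 range for each row. That difference is one reason to carry both the scales and an indication of how they are laid out.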

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and then, on reaching the ReLU, it
does another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
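
The in-place problem can be reproduced with a toy example. This is a plain-Python sketch, not BigDL code: `relu_inplace` and `max_abs` are hypothetical helpers, and the max absolute value stands in for whatever statistic the scale calculation records.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """Statistic used here as a stand-in for a layer's input scale."""
    return max(abs(v) for v in buf)


conv_out = [-4.0, 1.0, 2.0]       # output of Conv, which is also ReLU's input
scale_before = max_abs(conv_out)  # recorded before ReLU overwrites the buffer
relu_inplace(conv_out)
scale_after = max_abs(conv_out)   # recorded too late: the buffer now holds ReLU's output
print(scale_before, scale_after)
```

The two values differ (4.0 versus 2.0), so a scale computed after the in-place layer has run no longer describes that layer's true input.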

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
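
The ambiguity is easy to reproduce. The sketch below builds both key formats from the description above; the function names are hypothetical and the exact string layout beyond what the message states is an assumption.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original format: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """Fixed format: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs collide under the old format...
print(block_id_old(1, 12, 0), block_id_old(11, 2, 0))
# ...but stay distinct once the separator is added.
print(block_id_new(1, 12, 0), block_id_new(11, 2, 0))
```

Any fixed separator that cannot appear inside the numeric fields would remove the collision; the patch uses "_".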

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix missing dimension parameter in TimeDistributedCriterion() (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add warning about the deprecation

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. If the
parent thread exits and is recreated, as happens with a thread from
Executors.newFixedThreadPool, the whole app will segfault.
The parent thread is the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
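
The idea of routing all compute through one dedicated worker thread, rather than the parent thread, can be sketched as follows. This is only an illustration in Python using the standard `concurrent.futures` module; BigDL itself runs on JVM thread pools, and the names `compute_pool` and `run_compute` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: every task submitted to it runs on the
# same long-lived thread, never on the caller (parent) thread.
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_compute(task_id):
    """Stand-in for a compute step; returns the id of the thread it ran on."""
    return threading.get_ident()


# Submit several tasks from the parent thread and collect the worker ids.
worker_ids = [compute_pool.submit(run_compute, i).result() for i in range(4)]
parent_id = threading.get_ident()
compute_pool.shutdown()
```

Because the worker outlives any single task, thread-local state attached to it is not lost when the parent thread is recreated.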

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    needs more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime.
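
The padding arithmetic in the first issue can be sketched as below. This is a minimal illustration, not mkl-dnn code: the helper names are hypothetical, the choice of 8 float lanes for AVX2 is an assumption (256-bit registers, 32-bit floats), and padding the channel dimension is inferred from the 4x1x28x28 to 4x8x28x28 example.

```python
def pad_to_multiple(size, simd_width):
    """Round size up to the nearest multiple of simd_width."""
    return ((size + simd_width - 1) // simd_width) * simd_width


def padded_shape(shape, simd_width):
    """Pad the channel dimension (index 1) of an NCHW shape."""
    n, c, h, w = shape
    return (n, pad_to_multiple(c, simd_width), h, w)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


# Assumption: 8 float lanes for AVX2 (256-bit registers / 32-bit floats).
AVX2_FLOAT_LANES = 8
original = (4, 1, 28, 28)
padded = padded_shape(original, AVX2_FLOAT_LANES)
# How many times more memory the padded tensor needs.
growth = num_elements(padded) // num_elements(original)
```

For a single-channel input the padded tensor holds eight times as many elements, which is why the extra memory is noticeable.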

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. During compilation,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
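
The three steps can be illustrated with a generic symmetric int8 quantization sketch. This is not the BigDL or mkldnn API: every function name below is hypothetical, and the max-abs scale is an assumed, commonly used way to derive scales.

```python
def generate_scale(values, int8_max=127):
    """Step 1: derive a scale that maps the largest magnitude to int8_max."""
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, int8_max=127):
    """Step 2: scale, round, and clamp each value into the int8 range."""
    return [max(-int8_max, min(int8_max, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to approximate fp32 values."""
    return [q / scale for q in quantized]


def int8_dot(x, w):
    """Step 3: a toy forward pass, a dot product computed on int8 values."""
    x_scale, w_scale = generate_scale(x), generate_scale(w)
    acc = sum(a * b for a, b in zip(quantize(x, x_scale), quantize(w, w_scale)))
    # Undo both scales to return to the fp32 domain.
    return acc / (x_scale * w_scale)


weights = [1.0, -0.5, 0.25]
inputs = [0.2, 0.4, -0.8]
approx = int8_dot(inputs, weights)
exact = sum(a * b for a, b in zip(inputs, weights))
```

The int8 result only approximates the fp32 one, so the quality of the generated scales decides how much accuracy is lost.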

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do the forward in
`calcScales`: by that time the input has already been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps. seq.forward(input) runs first; then, on entering the ReLU, it
does another forward, so the input will be the output and the scales will be
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
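
The in-place problem described above can be reproduced with a small sketch. It is a hedged illustration only: `max_abs_scale` and `relu_inplace` are hypothetical stand-ins, and the max-abs statistic is an assumed way of computing scales.

```python
def max_abs_scale(buffer):
    """Scale statistic used for quantization: the largest magnitude seen."""
    return max(abs(v) for v in buffer)


def relu_inplace(buffer):
    """ReLU that overwrites its input buffer instead of allocating a new one."""
    for i, v in enumerate(buffer):
        buffer[i] = max(v, 0.0)
    return buffer


# Hypothetical output of a convolution, which is also the ReLU's input.
conv_output = [-4.0, 2.0, 1.0]

# Correct: record the statistic before the in-place layer runs.
scale_before = max_abs_scale(conv_output)

relu_inplace(conv_output)

# Wrong: the buffer now holds the ReLU output, so the statistic has changed.
scale_after = max_abs_scale(conv_output)
```

Reading the statistic after the in-place layer has run silently reports the output's range as if it were the input's.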

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
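
The ambiguity in the block id is easy to demonstrate. The sketch below only mirrors the string layout quoted in the commit message; the function names are hypothetical and the exact format used inside BigDL may differ.

```python
def old_block_id(block_id, pid_to, pid_from):
    """Original layout from the commit message: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(block_id, pid_to, pid_from):
    """Layout with an underscore separating id from pidTo."""
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs collide under the old layout ...
collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
# ... but stay distinct once the separator is added.
distinct = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```

Any delimiter that cannot appear inside the numeric fields would work; the commit chose "_".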

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker
thread of mapPartition (Distributed Mode).
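
The idea of moving compute onto a dedicated pool can be sketched as follows. This is a minimal illustration in Python, not the project's Scala code; `compute_pool` and `run_on_compute_thread` are hypothetical names. Every task goes to one long-lived worker thread, so state tied to that thread persists between calls and the caller's thread is never used for compute.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread dedicated to compute tasks (hypothetical name).
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(task, *args):
    """Run task on the dedicated worker and block until it returns."""
    return compute_pool.submit(task, *args).result()


def current_thread_id():
    """Return the id of whichever thread executes this call."""
    return threading.get_ident()


# Collect the id of the thread that served three separate submissions.
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(3)}
```

All three submissions are served by the same worker, which is a different thread from the caller.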

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
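
The padding arithmetic in item 1 can be sketched as below. This is an illustration only, assuming the channel dimension of an NCHW shape is the one rounded up (as in the 4x1x28x28 example) and a SIMD width of 8 floats for avx2; the helper names are hypothetical.

```python
def pad_to_multiple(n, width):
    """Round n up to the nearest multiple of width."""
    return -(-n // width) * width


def padded_shape(shape, simd_width):
    """Pad the channel dimension of an NCHW shape to a multiple of simd_width."""
    n, c, h, w = shape
    return (n, pad_to_multiple(c, simd_width), h, w)


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original, simd_width=8)  # assumed avx2 width for fp32
growth = element_count(padded) // element_count(original)
```

For a single-channel input the padded tensor holds 8 times as many elements, which is why the extra memory matters.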

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistirbuted in optimizer (#2755)

* change asInstanceOf to toDistirbuted

* change asInstanceOf to toDistirbuted

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales
   of the fp32 model; the model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
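
The three steps can be illustrated with a generic symmetric int8 scheme. This is a sketch under stated assumptions, not BigDL's `generateScalesWithMask` or `quantize()`: the scale is taken as the largest absolute value, values map to the signed range [-127, 127], and all function names are hypothetical.

```python
def generate_scale(values):
    """Step 1: the largest absolute value seen in an fp32 tensor."""
    return max(abs(v) for v in values)


def quantize(values, scale, qmax=127):
    """Step 2: map fp32 values onto signed 8-bit integers."""
    if scale == 0:
        return [0 for _ in values]
    return [max(-qmax, min(qmax, round(v * qmax / scale))) for v in values]


def dequantize(quantized, scale, qmax=127):
    """Map int8 values back to approximate fp32 values."""
    return [q * scale / qmax for q in quantized]


activations = [-2.0, 0.5, 2.0]
scale = generate_scale(activations)      # step 1
int8_values = quantize(activations, scale)  # step 2
restored = dequantize(int8_values, scale)   # used around step 3
```

The restored values differ from the originals by at most one quantization step, `scale / 127`.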

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do a forward
in `calcScales`: by that time the input has already been changed and its
scales may be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps. seq.forward(input) comes first; when it then goes into the
ReLU, it does another forward, so the input will be the output and the scales
will be wrong.
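
The effect can be reproduced with a small sketch. It assumes the scale is the maximum absolute value of a buffer and that ReLU overwrites its input in place; the helper names are hypothetical and the list stands in for the shared tensor memory.

```python
def max_abs(values):
    """Scale candidate: the largest absolute value in the buffer."""
    return max(abs(v) for v in values)


def relu_inplace(values):
    """Overwrite negative entries with zero in the same buffer."""
    for i, v in enumerate(values):
        if v < 0:
            values[i] = 0.0
    return values


conv_output = [-3.0, 1.0, 2.0]
scale_before = max_abs(conv_output)  # correct scale of the ReLU input

relu_input = conv_output             # same memory, no copy
relu_inplace(relu_input)
scale_after = max_abs(conv_output)   # computed too late: buffer now holds the output
```

Reading the scale after the in-place forward yields the output's range (2.0) instead of the input's (3.0).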

For convolution's weight, the dimension is always 5, even when the group
number is 1. But for dnn convolution, if there's no group, the weight's
dimension should be 4.

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
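
The ambiguity described in the bug fix can be shown with plain string formatting. This is a sketch, not the AllReduceParameter code: the function names are hypothetical, and the exact layout of the fixed key (a single "_" between id and pidTo) is an assumption based on the description above.

```python
def old_block_id(block_id, pid_to, pid_from):
    """Original key layout: id and pidTo are concatenated with no separator."""
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(block_id, pid_to, pid_from):
    """Assumed fixed layout: "_" separates id from pidTo."""
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both start with the digits "112".
first = (1, 12, 0)
second = (11, 2, 0)

old_collides = old_block_id(*first) == old_block_id(*second)
new_collides = new_block_id(*first) == new_block_id(*second)
```

With the separator, the two pairs map to `1_12gradientBytes0` and `11_2gradientBytes0`, so they no longer collide.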

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
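
The idea of a dedicated single-thread pool can be sketched in Python as follows. This is only an illustration of the pattern, not the BigDL implementation: `run_native_step` and `_native_state` are hypothetical stand-ins for native compute and its thread-local state, and no affinity is set here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for thread-local state held by a native library.
_native_state = threading.local()


def run_native_step(x):
    """Pretend native compute that relies on state tied to the calling thread."""
    if not hasattr(_native_state, "calls"):
        _native_state.calls = 0  # first use on this thread
    _native_state.calls += 1
    return threading.current_thread().name, _native_state.calls, x * 2


# One long-lived worker thread: every task runs on it, so the thread-local
# state is created once and survives between tasks.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")
results = [compute_pool.submit(run_native_step, i).result() for i in range(3)]
compute_pool.shutdown()

thread_names = {name for name, _, _ in results}
call_counts = [calls for _, calls, _ in results]
```

All three tasks report the same worker thread and an increasing call count, which is the property the fix relies on: the compute thread is never the caller's thread and is not recreated between calls.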

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads a tensor to a multiple of
    the SIMD width, which uses more memory, e.g. 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
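
The memory cost of the padding in issue 1 can be worked through with a short Python sketch. It assumes the channel dimension is rounded up to a block of 8, which is inferred from the 4x1x28x28 to 4x8x28x28 (avx2) example above; `pad_channels` is a hypothetical helper, not an mkl-dnn call.

```python
def pad_channels(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = pad_channels(original)
growth = num_elements(padded) // num_elements(original)
```

For a single-channel input the padded buffer holds 8 times as many elements as the logical tensor, which is why the extra memory matters.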

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, `MemoryData` needs a new attribute
    called dataType.
2. Because the scales must be carried across FP32->Int8 and Int8->FP32,
    `MemoryData` needs two more attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is still an fp32 model.
2. Quantize the model.
   The `quantize()` api stays compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
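
The three steps can be illustrated with a minimal Python sketch of symmetric int8 quantization. This is not the BigDL or mkldnn code: the functions below are hypothetical, and the choice of one scale per tensor computed as 127 / max(|x|) is an assumption made for illustration only.

```python
def generate_scale(values):
    """Step 1: derive one scale for the whole fp32 tensor (assumed 127 / max|x|)."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to int8 using the scale, clamping to [-127, 127]."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(q_values, scale):
    """Map int8 values back to fp32, as a reorder to fp32 output would."""
    return [q / scale for q in q_values]


weights = [0.5, -1.0, 0.25, 0.0]
scale = generate_scale(weights)          # scales come from the fp32 model
q_weights = quantize(weights, scale)     # quantized representation
restored = dequantize(q_weights, scale)  # step 3 would run forward on q_weights
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The scales are computed from the fp32 values first and only applied afterwards, which mirrors why step 1 still returns an fp32 model.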

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do a forward in
`calcScales`: by that time the input has already been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first, and when it reaches the ReLU it
does another forward, so the input has become the output and the scales are
wrong.

A convolution's weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
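
The in-place problem described above can be reproduced with a small Python sketch. The helpers are hypothetical and use max(|x|) as a stand-in for the real scale calculation; the point is only that reading a shared buffer after an in-place ReLU gives a different result.

```python
def max_abs_scale(values):
    """Stand-in for a scale calculation: the largest absolute value in the buffer."""
    return max(abs(v) for v in values)


def relu_inplace(buffer):
    """ReLU that overwrites its input, so input and output share the same memory."""
    for i, v in enumerate(buffer):
        if v < 0:
            buffer[i] = 0.0
    return buffer


conv_output = [-3.0, 1.0, 2.0]              # this buffer is also the ReLU input
scale_before = max_abs_scale(conv_output)   # computed on the true ReLU input
relu_inplace(conv_output)                   # forward overwrites the shared buffer
scale_after = max_abs_scale(conv_output)    # computed too late: input is gone
```

Here the scale read after the forward (2.0) no longer matches the scale of the real input (3.0), so the scale has to be taken before the in-place layer runs.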

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
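
The ambiguity can be checked with a short Python sketch. `block_id_old` and `block_id_new` are hypothetical helpers that only mirror the id layout quoted above; the exact string built by AllReduceParameter, beyond the added "_", is an assumption.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """Fixed layout: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
first = (1, 12, 0)
second = (11, 2, 0)

old_collides = block_id_old(*first) == block_id_old(*second)
new_collides = block_id_new(*first) == block_id_new(*second)
```

With the old layout both pairs map to the same block id, while the separator keeps them distinct.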

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, two bugs appear:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app will crash with a
segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker
thread of mapPartition (Distributed Mode).
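
As a rough illustration of the idea (not the Scala code from this patch), the sketch below routes all work through one long-lived worker thread instead of the caller's thread, so thread-local state stays with a thread that is never recreated. The names `compute_pool` and `run_on_compute_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dedicated compute pool: exactly one worker
# thread that stays alive between tasks.
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated worker thread and wait for its result."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    return threading.get_ident()


# Both calls land on the same worker thread, never on the calling thread.
first = run_on_compute_thread(current_thread_id)
second = run_on_compute_thread(current_thread_id)
caller = threading.get_ident()
print(first == second, first != caller)

compute_pool.shutdown()
```

Keeping the pool at a single thread means affinity settings and native thread-local data apply to one known thread rather than to whichever thread happens to call in.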

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, e.g. 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too
    much space. This patch allocates it at runtime instead.
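
The padding arithmetic in the first issue can be sketched as follows. This is only an illustration of rounding the channel dimension up to a multiple of the SIMD width; the block size of 8 is taken from the avx2 example above, and the helper names are hypothetical rather than mkl-dnn APIs.

```python
def padded_channels(channels, simd_width=8):
    """Round channels up to the next multiple of simd_width."""
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    """Pad the channel dimension of an NCHW shape."""
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original element count, i.e. the memory overhead.
overhead = num_elements(padded) / num_elements(original)
print(padded, overhead)
```

For a single-channel input the padded tensor holds eight times as many elements, which is why the extra memory is noticeable.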

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistirbuted in optimizer (#2755)

* change asInstanceOf to toDistirbuted

* change asInstanceOf to toDistirbuted

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

Including 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
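
To make the role of the scales concrete, here is a minimal sketch of generic symmetric int8 quantization: derive a scale from the largest magnitude, quantize, then dequantize. It is an assumption-laden illustration, not the mkldnn or BigDL implementation; the function names are hypothetical and no mask is modelled.

```python
def generate_scale(values, max_int=127):
    """Scale that maps the largest magnitude in values to max_int."""
    max_abs = max(abs(v) for v in values)
    return max_int / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, max_int=127):
    """Multiply by the scale, round, and clamp to the int8 range."""
    return [max(-max_int, min(max_int, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to approximate fp32 values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)
quantized = quantize(weights, scale)
restored = dequantize(quantized, scale)
print(scale, quantized, restored)
```

The restored values differ from the originals by at most one quantization step (`1 / scale`), which is the accuracy cost traded for int8 storage and compute.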

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It will do two steps: seq.forward(input) first, and then, when going into the
ReLU, it will do another forward, so the input will be the output and the
scales will be wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
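
The in-place problem can be reproduced with plain lists. The sketch below is a hypothetical illustration, not BigDL code: once an in-place ReLU has run, the buffer that was its input holds its output, so a statistic computed afterwards (here the maximum absolute value, a common basis for a scale) no longer describes the input.

```python
def relu_inplace(buffer):
    """Overwrite negative entries with zero, reusing the same buffer."""
    for i, value in enumerate(buffer):
        if value < 0:
            buffer[i] = 0.0
    return buffer


def max_abs(buffer):
    return max(abs(v) for v in buffer)


# Hypothetical convolution output that ReLU will consume in place.
conv_output = [-4.0, 1.0, 2.0]

stat_before = max_abs(conv_output)  # computed on the true ReLU input
relu_inplace(conv_output)
stat_after = max_abs(conv_output)   # computed on the overwritten buffer

print(stat_before, stat_after)
```

The two statistics differ, so the value used for the ReLU input has to be captured before the layer's forward overwrites the shared memory.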

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
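
The ambiguity is easy to reproduce with string formatting. The sketch below mirrors the two id layouts described above; the function names are hypothetical, and the exact new layout is an assumption based on the description (a "_" between id and pidTo).

```python
def block_id_old(param_id, pid_to, pid_from):
    """Old layout: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """New layout (assumed): a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)

print(old_a == old_b)  # the old ids collide
print(new_a == new_b)  # the new ids stay distinct
```

With the separator, each block id maps back to exactly one (id, pidTo) pair, so gradient blocks from different parameters can no longer overwrite each other.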

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Litchilitchy added a commit to Litchilitchy/analytics-zoo that referenced this pull request Sep 23, 2021