This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Feat encoder generic auto trainable #8630

Open
wants to merge 6 commits into master

Conversation

sidphbot

@sidphbot sidphbot commented Jun 5, 2021

AutoKerasEncoder trains and encodes documents with a custom (best-fit) encoder architecture suited to the dataset, found via neural architecture search with AutoKeras.

  • Data Format: a tuple of numpy.ndarray or tf.data.Dataset. The two elements are:

    1. input data - x
      for vision (image): the shape of the data should be (samples, width, height) or (samples, width, height, channels).
      for bert (text): the data should be one-dimensional; each element should be a string containing a full sentence.

    2. output data - y (labels)
      for classification-based training: labels can be raw, one-hot encoded (more than two classes), or binary encoded (two classes). Raw labels are encoded automatically: to a single column if two classes are found, or one-hot if more than two.
      for regression-based training: labels can be single-column or multi-column; all values should be numerical.

  • Model architectures searched and tuned:

    'vision' mode : ResNet(variants), Xception(variants), conv2d
    'bert' mode : Vanilla, Transformer, ngram
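The accepted data formats above can be illustrated with plain NumPy. This is a hedged sketch: the array shapes and label encodings follow the description in this PR, but the concrete sizes (100 samples, 28×28 images, 10 classes) and the one-hot helper are assumptions for illustration, not taken from the PR's code.

```python
import numpy as np

# Vision mode: shape (samples, width, height) or (samples, width, height, channels)
x_vision = np.random.rand(100, 28, 28, 3)           # 100 RGB images, 28x28 (assumed sizes)
y_vision = np.random.randint(0, 10, size=(100,))    # raw integer labels, 10 classes

# Raw labels with more than two classes get one-hot encoded; a plain
# NumPy equivalent of that encoding:
y_onehot = np.eye(10)[y_vision]                     # shape (100, 10), one 1 per row

# Bert (text) mode: a one-dimensional array of full sentences
x_text = np.array(["this is a full sentence.",
                   "this is another full sentence."])
y_text = np.array([0, 1])                           # binary labels, single column
```

Either pair (x, y) would then be passed as a tuple, e.g. `encoder.train((x_vision, y_vision))`.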

Usage:

encoder = AutoKerasEncoder(model_type='vision')    # init
encoder.train((x_train, y_train))                  # architecture search and train
encoder.encode((x_catalog, y_catalog))             # encode

or,

encoder = AutoKerasEncoder(model_type='vision')    # init
encoder.encode((x_full, y_full))                   # architecture search, train and encode

@maximilianwerk
Member

Hey, thanks a lot for your contribution. I can see the value in having this integrated training/encoding for Jina.

Could you clean up the code a bit more? There are quite a few left-over comments and TODOs, and I am pretty sure the training via the encode function does not work.

Thanks a lot.

@sidphbot
Author

sidphbot commented Jun 7, 2021 via email

@sidphbot
Author

sidphbot commented Jun 7, 2021

Hey, I have updated the branch with the fix for the training inside encode and included related tests. I have tried to keep only informative comments and TODOs for strictly future performance/feature improvements. Please check once.

3 participants