This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Feat encoder generic auto trainable #8630

Open
wants to merge 6 commits into master

Conversation

sidphbot

@sidphbot sidphbot commented Jun 5, 2021

AutoKerasEncoder trains and encodes documents with a custom (best-fit) encoder architecture suited to the dataset, found via neural architecture search with AutoKeras.

  • Data Format: a tuple of numpy.ndarray or tf.data.Dataset. The two elements are:

    1. input data - x
      for vision (image): the shape of the data should be (samples, width, height) or (samples, width, height, channels).
      for bert (text): the data should be one-dimensional; each element should be a string containing a full sentence.

    2. output data - y (labels)
      for classification-based training: labels can be raw, one-hot encoded (more than two classes), or binary encoded (two classes). Raw labels are encoded automatically: to a single column if two classes are found, or one-hot if more than two.
      for regression-based training: labels can be single-column or multi-column; all values should be numerical.

  • Model architectures searched and tuned:

    'vision' mode : ResNet(variants), Xception(variants), conv2d
    'bert' mode : Vanilla, Transformer, ngram
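The accepted data formats above can be illustrated with plain NumPy. This is a hedged sketch: the array shapes and label encodings follow the description in this PR, but the concrete sizes (100 samples, 28×28 images, 10 classes) and the one-hot helper are assumptions for illustration, not taken from the PR's code.

```python
import numpy as np

# Vision mode: shape (samples, width, height) or (samples, width, height, channels)
x_vision = np.random.rand(100, 28, 28, 3)           # 100 RGB images, 28x28 (assumed sizes)
y_vision = np.random.randint(0, 10, size=(100,))    # raw integer labels, 10 classes

# Raw labels with more than two classes get one-hot encoded; a plain
# NumPy equivalent of that encoding:
y_onehot = np.eye(10)[y_vision]                     # shape (100, 10), one 1 per row

# Bert (text) mode: a one-dimensional array of full sentences
x_text = np.array(["this is a full sentence.",
                   "this is another full sentence."])
y_text = np.array([0, 1])                           # binary labels, single column
```

Either pair (x, y) would then be passed as a tuple, e.g. `encoder.train((x_vision, y_vision))`.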

Usage:

encoder = AutoKerasEncoder(model_type='vision')    # init
encoder.train((x_train, y_train))                  # architecture search and train
encoder.encode((x_catalog, y_catalog))             # encode

or,

encoder = AutoKerasEncoder(model_type='vision')    # init
encoder.encode((x_full, y_full))                   # architecture search, train and encode

@maximilianwerk
Member

Hey, thanks a lot for your contribution. I can see the value in having this integrated training/encoding for Jina.

Could you clean up the code a bit more? There are quite a few left-over comments and TODOs, and I am pretty sure the training via the encode function does not work.

Thanks a lot.

@sidphbot
Author

sidphbot commented Jun 7, 2021 via email

@sidphbot
Author

sidphbot commented Jun 7, 2021

Hey, I have updated the branch with the fix for the training inside encode and included related tests. I have tried to keep only informative comments and TODOs for strictly future performance/feature improvements. Please check once.

3 participants