Bangla-sentence-embedding-transformer

This is a Transformer-based Bangla sentence embedding model. I trained it on 250,000 Bangla sentences (from Wikipedia) using sentence transformers. The embedding dimension is 300.

What do you get from here?

A pretrained sentence embedding model built with transformers
How to add new data and continue training the model
How to create your own pretrained sentence embedding model

Python package

sentence_transformers

Install it with the command below:

pip3 install sentence_transformers
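
To confirm the install worked before downloading the 1.1 GB model, a quick check like the one below can help. It is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("sentence_transformers"):
        print("sentence_transformers is available")
    else:
        print("sentence_transformers is missing; run: pip3 install sentence_transformers")
```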

Model download

The model is 1.1 GB, which is too large to upload here, so I uploaded it to Google Drive: drive link

Alternatively, you can use our Python module sbnltk. Check it out!

Clone this project, then download my model. After downloading, unzip the folder into the 'Bangla-sentence-embedding-transformer' directory.
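
If you prefer to script the unzip step, something like the following may work. It is only a sketch: the archive name and target directory in the example call are placeholders, since the actual file name of the download is not given here, and `extract_model` is a hypothetical helper.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_model(archive_path: str, project_dir: str) -> List[str]:
    """Unzip the downloaded archive into the project directory and list its contents."""
    target = Path(project_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example call (both paths are placeholders, adjust them to your setup):
# extract_model("model.zip", "Bangla-sentence-embedding-transformer")
```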

How to use it?

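# Run this from inside the cloned 'Bangla-sentence-embedding-transformer' directory.
# A hyphenated directory name cannot be used in a Python import statement,
# so the module file is imported directly.
from Bangla_transformer import Bangla_sentence_transformer_small

transformer=Bangla_sentence_transformer_small()

sentences=['আপনার বয়স কত','আমি তোমার বয়স জানতে চাই','আমার ফোন ভাল আছে','আপনার সেলফোনটি দুর্দান্ত দেখাচ্ছে']

# The result of encode() is indexed by the sentence text itself.
sentences_embeddings=transformer.encode(sentences)

# Compare every sentence with itself and with each later sentence.
for i in range(len(sentences)):
    for j in range(i,len(sentences)):
        s1=sentences[i]
        s2=sentences[j]
        print(s1,' --- ',s2,transformer.similarity(sentences_embeddings[s1],sentences_embeddings[s2]))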

Output:

আপনার বয়স কত  ---  আপনার বয়স কত tensor([[1.0000]])
আপনার বয়স কত  ---  আমি তোমার বয়স জানতে চাই tensor([[0.8607]])
আপনার বয়স কত  ---  আমার ফোন ভাল আছে tensor([[0.1994]])
আপনার বয়স কত  ---  আপনার সেলফোনটি দুর্দান্ত দেখাচ্ছে tensor([[0.2581]])
আমি তোমার বয়স জানতে চাই  ---  আমি তোমার বয়স জানতে চাই tensor([[1.0000]])
আমি তোমার বয়স জানতে চাই  ---  আমার ফোন ভাল আছে tensor([[0.1960]])
আমি তোমার বয়স জানতে চাই  ---  আপনার সেলফোনটি দুর্দান্ত দেখাচ্ছে tensor([[0.2495]])
আমার ফোন ভাল আছে  ---  আমার ফোন ভাল আছে tensor([[1.0000]])
আমার ফোন ভাল আছে  ---  আপনার সেলফোনটি দুর্দান্ত দেখাচ্ছে tensor([[0.9281]])
আপনার সেলফোনটি দুর্দান্ত দেখাচ্ছে  ---  আপনার সেলফোনটি দুর্দান্ত দেখাচ্ছে tensor([[1.0000]])
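
The scores above behave like cosine similarities (identical sentences score 1.0000), although the metric is not named here. Under that assumption, the sketch below shows how such a score is computed and how sentence pairs can be ranked by it. The toy 3-dimensional vectors and the `rank_pairs` helper are hypothetical; real embeddings from the model are 300-dimensional.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pairs(embeddings):
    """Return (score, s1, s2) for every distinct pair, most similar first."""
    scored = [
        (cosine_similarity(embeddings[s1], embeddings[s2]), s1, s2)
        for s1, s2 in combinations(embeddings, 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Made-up vectors standing in for real sentence embeddings.
toy = {
    "age question A": [1.0, 0.1, 0.0],
    "age question B": [0.9, 0.2, 0.0],
    "phone remark": [0.0, 0.1, 1.0],
}

for score, s1, s2 in rank_pairs(toy):
    print(f"{s1} --- {s2}: {score:.4f}")
```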

How to add and train new data?

To train on more data or add new data, you need a CUDA-capable NVIDIA GPU.
If you don't have an NVIDIA graphics card, use a Google Colab GPU instead.
Training transformer models on a CPU is very slow.
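
Before starting a long training run, it can be worth checking which device is actually available. The sketch below uses PyTorch's `torch.cuda.is_available()` and falls back to "cpu" when PyTorch is not installed; the `pick_device` helper is a name invented for this example and is not part of this project.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU support to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Training would run on:", pick_device())
```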

Suppose you have a dataset and want to continue training my model on it.

# Run from inside the cloned project directory (a hyphenated directory name cannot be imported).
from Bangla_transformer import Bangla_sentence_transformer_small

transformer=Bangla_sentence_transformer_small()

path='/dataset.txt'
transformer.train(path)

Run this in Google Colab. You must first download my model from Google Drive.

How to create my own model?

It works in much the same way.

# Run from inside the cloned project directory (a hyphenated directory name cannot be imported).
from Bangla_transformer import Bangla_sentence_transformer_small

transformer=Bangla_sentence_transformer_small()

path='/dataset.txt'
transformer.train_new(path)

You don't need to download my model if you want to create your own.

How to prepare a dataset?

This model needs a parallel English-Bangla dataset. Each line of your text file must contain an English sentence and its Bangla translation, separated by a tab. Sentence length should be less than 128.

English sentence1 \tab Bangla sentence1
English sentence2 \tab Bangla sentence2
English sentence3 \tab Bangla sentence3
- - - 
- - -
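
A dataset file in this format can be sanity-checked before training with a small loader like the one below. This is a sketch, not part of the project: `load_parallel_pairs` is a hypothetical helper, and measuring the length limit in whitespace-separated words is an assumption, since the unit of the 128 limit is not stated.

```python
from typing import List, Tuple

MAX_LEN = 128  # limit from this README; counting words is an assumption


def load_parallel_pairs(path: str, max_len: int = MAX_LEN) -> List[Tuple[str, str]]:
    """Read tab-separated English/Bangla pairs, raising on malformed lines."""
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            parts = line.split("\t")
            if len(parts) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated fields, got {len(parts)}"
                )
            english, bangla = parts[0].strip(), parts[1].strip()
            if not english or not bangla:
                raise ValueError(f"line {line_no}: empty sentence")
            if len(english.split()) >= max_len or len(bangla.split()) >= max_len:
                raise ValueError(f"line {line_no}: sentence has {max_len} or more words")
            pairs.append((english, bangla))
    return pairs


# Example call (the path is a placeholder):
# pairs = load_parallel_pairs("/dataset.txt")
```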

If you only have Bangla sentences, you can translate them with Google Translate and check the results manually, or use the translations directly. Google Translate's accuracy is about 85%.

About my model

I prepared a parallel dataset of 250,000 sentence pairs for training using Google Translate, then roughly checked it.

Epochs=5
Iterations per epoch=7000
Device=Google Colab GPU
MSE=11.39
Evaluation size=500
Training time=3 hours 44 minutes
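
For a rough sense of scale, the figures above can be combined as follows. The total iteration count and time per iteration follow directly from the listed numbers; the pairs-per-iteration figure additionally assumes that each epoch is one full pass over all 250,000 pairs, which is not stated here.

```python
EPOCHS = 5
ITERATIONS_PER_EPOCH = 7000
TRAINING_PAIRS = 250_000
TRAINING_SECONDS = 3 * 3600 + 44 * 60  # 3 hours 44 minutes

total_iterations = EPOCHS * ITERATIONS_PER_EPOCH
seconds_per_iteration = TRAINING_SECONDS / total_iterations

# Assumption: one epoch covers the whole dataset exactly once.
pairs_per_iteration = TRAINING_PAIRS / ITERATIONS_PER_EPOCH

print(f"total iterations:      {total_iterations}")
print(f"seconds per iteration: {seconds_per_iteration:.3f}")
print(f"pairs per iteration:   {pairs_per_iteration:.1f}")
```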

Foysal87/Bangla-sentence-embedding-transformer
