Awesome datasets for Bangla language computing.
Updated Mar 7, 2022 - Python
Bangla news classification and generation
Various Bangla datasets for sentiment analysis of Bangla text
A collection of Bangla newspaper and blog crawlers. Can be used to mine Bangla text data for Natural Language Processing tasks.
Nirmol is an open-source dataset and API for detecting Bangla slang words. Detect offensive/bad/slang words in Bangla/Bengali/Banglish sentences. A helpful API and dataset for developers and researchers.
Scrape 4000+ Bangla Song Lyrics
Bangla dataset for Opinion Mining
Zilla-64: A Bangla Handwritten Word Dataset Of 64 Districts Name of Bangladesh and Recognition Using Holistic Approach
"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1383 offline handwritten text documents contributed by 190 writers. The dataset is composed of both simple and compound characters.
Bangla Q&A dataset that contains questions, answers, and paragraphs for training your model
In this project, we have built a database of Bangla Handwritten Letters which contains handwritten images of 84 Bangla letters (10 numerals, 11 vowels, 39 consonants, 24 compound letters). We also investigated some of the existing Bangla character recognition models and found that these models have lower accuracy when the database contains some …
Noise Identification, Noise reduction, and Sentiment Analysis on Bangla Noisy Texts
Bengali Natural Language Processing (BengaliNLP)
Bengali/Bangla Fake Review Detection Dataset
The official GitHub repository of the Bangla Visual Question Answering (VQA) system ChitroJera
This is the official repository of the paper titled "BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation", accepted at the 17th Workshop on Building and Using Comparable Corpora (BUCC 2024), co-located with LREC-COLING 2024. It contains the code and the dataset.
Implementation of the paper 'Towards Full page Offline Bangla Handwritten Text Recognition using Image-to-Sequence Architecture'. For details, please read the README section.
Handwritten Bangla character classification using ResNet-34 trained on the BanglaLekha dataset. The system is implemented in PyTorch. For details, see the README file.
A synthetic Bangla license plate dataset, generated with a mixture of deep learning and image processing. The labels are in Darknet YOLO format (.txt, .data, .names).
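For readers unfamiliar with the Darknet YOLO label format used by the license plate dataset, here is a minimal sketch of reading one annotation line. In that format each line of a .txt label file holds a class index followed by the box centre x, centre y, width, and height, all normalized to the image size. The function names and the sample values are hypothetical and are not taken from the repository.

```python
from typing import Tuple


def parse_yolo_label_line(line: str) -> Tuple[int, float, float, float, float]:
    """Parse one Darknet YOLO label line: 'class x_center y_center width height'."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    return class_id, x_center, y_center, width, height


def yolo_to_pixel_box(
    x_center: float,
    y_center: float,
    width: float,
    height: float,
    img_w: int,
    img_h: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized YOLO box to pixel (left, top, right, bottom)."""
    left = (x_center - width / 2) * img_w
    top = (y_center - height / 2) * img_h
    right = (x_center + width / 2) * img_w
    bottom = (y_center + height / 2) * img_h
    return round(left), round(top), round(right), round(bottom)


if __name__ == "__main__":
    # Hypothetical label line for a 400x200 image (values are illustrative only).
    class_id, xc, yc, w, h = parse_yolo_label_line("3 0.5 0.5 0.25 0.5")
    print(class_id, yolo_to_pixel_box(xc, yc, w, h, img_w=400, img_h=200))
```

The class index refers to a line in the accompanying .names file, so a loader would typically read that file once and map indices to class names.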