milad-s5/Speech-Recognition-on-the-Free-Spoken-Digit-Dataset-FSDD

Spoken Digit Recognition Dataset

Repository for the Statistical Learning course project: Spoken Digit Recognition with machine learning methods


Dataset: Free Spoken Digit Dataset (FSDD)

A simple audio/speech dataset consisting of recordings of spoken digits in WAV files sampled at 8 kHz. The recordings are trimmed so that they have near-minimal silence at the beginning and end.
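
As a rough illustration of working with such recordings, the sketch below reads one WAV file with Python's standard library and checks its sample rate. It is not taken from the notebook. It assumes mono 16-bit PCM files and the FSDD file naming scheme `<digit>_<speaker>_<index>.wav`; the helper names `parse_label` and `read_wav` are hypothetical.

```python
import os
import struct
import wave


def parse_label(path):
    """Split an FSDD-style file name into (digit, speaker, index).

    Assumption: files are named "<digit>_<speaker>_<index>.wav".
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    digit, speaker, index = stem.split("_")
    return int(digit), speaker, int(index)


def read_wav(source):
    """Read a mono 16-bit PCM WAV file (path or file-like object).

    Returns (sample_rate, samples) with samples scaled to [-1.0, 1.0).
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Each sample is a little-endian signed 16-bit integer.
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw[: count * 2])
    return rate, [s / 32768.0 for s in ints]
```

A call such as `read_wav("recordings/7_jackson_32.wav")` (a hypothetical path) should report a rate of 8000 for an FSDD recording, and `parse_label` on the same path gives the digit label to use as the classification target.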

FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation in scientific journals, the dataset is versioned using git tags.

Model and Training

Notebook.ipynb consists of four parts:

  • Phase_1: Preprocessing
  • Phase_2a: Supervised Learning without extracting features
  • Phase_2b: Supervised Learning with extracting features
  • Phase_3: Unsupervised Learning
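
This README does not state which preprocessing steps or features the notebook uses, so the following is only a minimal sketch of what the phases above could involve. It assumes a recording is a list of float samples at 8 kHz. `pad_or_trim` shows one common way to get fixed-length inputs for learning directly on the waveform, and `frame_features` computes two simple hand-crafted features (short-time energy and zero-crossing rate) as a stand-in for a feature extraction step. Both function names and the frame sizes are illustrative choices, not the notebook's.

```python
def pad_or_trim(samples, length):
    """Return exactly `length` samples: cut the tail or append zeros."""
    trimmed = list(samples[:length])
    return trimmed + [0.0] * (length - len(trimmed))


def frame_features(samples, frame_len=200, hop=100):
    """Compute (energy, zero-crossing rate) for each overlapping frame.

    With 8 kHz audio, 200 samples is a 25 ms frame and a hop of 100
    samples gives 50% overlap (both values are assumptions).
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features
```

Fixed-length waveforms can be fed to a classifier as they are, while the per-frame features give a much smaller representation that can be used for either a supervised model or a clustering method.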