
Speaker Diarization

Group: Enthusiasm_Overflow
Shivam Kumar 170668 
Yash Mittal 170818 
Prateek Varshney 170494

[Figure: speaker-diarization pipeline]

Instructions for setting up Drive

Since we ran all our experiments on Google Colab, reproducing our code requires downloading the data folders listed below and uploading them to the corresponding locations on your Google Drive (a mount-and-verify sketch follows the table):

| Folder / file to download | Description | Path in your Google Drive |
| --- | --- | --- |
| YashVAD, CNN, TransferLearningBestModels | Folders containing model weights | /content/drive/MyDrive/ |
| LSTM_keras_50epochs_completedata_nonfreeze_SGD.h5, LSTM_keras_50epochs_completedata_history_nofreeze_SGD | Saved weights for Transfer Learning Variant 3 | /content/drive/MyDrive/ |
| ATML | Folder containing ami_public_manual_1.6.2 and the code folder | /content/drive/MyDrive/ |
| amicorpusfinal | Training AMI WAV dataset | /content/drive/MyDrive/ |
| Hindi | Contains the dataset, model, and Python scripts for the Hindi-English BiLSTM model | /content/drive/MyDrive/ |
| plots | Create an empty folder named plots to store generated plots | /content/drive/MyDrive/ |
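
After uploading, the in-notebook setup reduces to mounting Drive and checking the paths. A minimal sketch, assuming only the folder names from the table above:

```python
# Minimal Colab setup sketch: mount Google Drive and verify that the folders
# listed in the table above are in place. Runs only inside Google Colab.
import os
from google.colab import drive

drive.mount('/content/drive')  # Colab prompts for authorization

root = '/content/drive/MyDrive'
expected = ['YashVAD', 'CNN', 'TransferLearningBestModels',
            'ATML', 'amicorpusfinal', 'Hindi', 'plots']
for name in expected:
    status = 'OK' if os.path.exists(os.path.join(root, name)) else 'MISSING'
    print(f'{name}: {status}')
```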

Description of the files in this GitHub repository

Main Project Codes

Contains the following Jupyter notebooks:

| File | Description |
| --- | --- |
| Resemblyser_spectral.ipynb | Baseline speaker-diarization code that uses a pre-trained Resemblyzer model (trained on fixed-length segments extracted from a large corpus) as the embedding module of our system (see the sketch after this table). |
| CNN_embedding_submission.ipynb | Uses Mel-log-spectrum and MFCC feature extractors plus a denoiser to remove silence and speech noise, and a CNN model to generate the embeddings. |
| AMI_LSTM_Submission_BaseLine.ipynb | Uses the log-mel spectrum of the WAV chunks as the input features to an LSTM-based embedding module. |
| DER_Hindi_English.ipynb | Speaker diarization using a BiLSTM model trained on a custom Hindi-English dataset. |
| vad_comparisons.ipynb | Compares the performance of three VAD methods: WebRTC-VAD, Voice Activity Detector, and an LSTM-based model. |
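
To make the overall flow concrete, here is a minimal sketch of the baseline pipeline (Resemblyzer embeddings clustered with spectralcluster); 'meeting.wav' is a placeholder input, and the notebooks add VAD, denoising, and DER evaluation on top of this:

```python
# Minimal sketch of the baseline: Resemblyzer yields one embedding per
# sliding window, and spectral clustering groups the windows by speaker.
# 'meeting.wav' is a placeholder input file.
from resemblyzer import VoiceEncoder, preprocess_wav
from spectralcluster import SpectralClusterer

wav = preprocess_wav('meeting.wav')  # loads, resamples to 16 kHz, trims silence
encoder = VoiceEncoder()

# return_partials=True gives per-window embeddings plus the sample range
# (a Python slice) that each window covers.
_, partial_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True)

clusterer = SpectralClusterer(min_clusters=2, max_clusters=10)
labels = clusterer.predict(partial_embeds)  # one speaker label per window

for label, split in zip(labels, wav_splits):
    print(f'speaker {label}: {split.start / 16000:.1f}s - {split.stop / 16000:.1f}s')
```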

Transfer Learning Variants

Contains the following Jupyter notebooks:

| File | Description |
| --- | --- |
| Transfer_Learning_Variant1.ipynb | Passes the dataset through the pre-trained Hindi-English BiLSTM and uses the resulting "refined" features to train a new embedding module from scratch, effectively chaining two models one after the other. |
| Transfer_Learning_Variant2.ipynb | Combines the above two models into one: freezes the weights of the BiLSTM layers of the Hindi-English BiLSTM model, replaces the TimeDistributed Dense layers with one LSTM plus simple Dense layers, and retrains the model on MFCC features of the AMI-Corpus dataset, so that only the top layers are trained (a sketch follows this table). |
| Transfer_Learning_Variant3.ipynb | Like Variant 2, except that the BiLSTM layers are also unfrozen, i.e., the pre-trained model (with the Dense layers replaced) is fine-tuned end to end on the current dataset. |
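
A hedged Keras sketch of the Variant 2 recipe; the model path, layer indexing, and layer sizes below are illustrative assumptions, not the notebook's exact architecture:

```python
# Hedged sketch of Variant 2 in Keras. The model file name, layer indexing,
# and layer sizes are illustrative assumptions, not the notebook's exact
# architecture.
from tensorflow import keras
from tensorflow.keras import layers

num_speakers = 4  # placeholder: number of output classes

# Load the pre-trained Hindi-English BiLSTM (placeholder file name).
base = keras.models.load_model('/content/drive/MyDrive/Hindi/bilstm_model.h5')

# Freeze the BiLSTM layers so only the new head is trained.
for layer in base.layers:
    if isinstance(layer, layers.Bidirectional):
        layer.trainable = False

# Drop the TimeDistributed Dense head (assumed to be the last layer) and
# keep the sequence output of the frozen backbone.
backbone = keras.Model(base.input, base.layers[-2].output)

# New head: one LSTM plus a simple Dense classifier on AMI MFCC features.
x = layers.LSTM(128)(backbone.output)
out = layers.Dense(num_speakers, activation='softmax')(x)
model = keras.Model(backbone.input, out)

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(ami_mfcc_train, ami_labels_train, epochs=50)  # MFCCs of the AMI corpus
```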

Demo

Demos_Part1 contains the results of running speaker diarization live on a YouTube clip.

Demo_Part2 contains the results of applying our transfer-learning variants to adapt the model from one dataset to another.

Required libraries

We use the following libraries:

  • pydub
  • xmltodict
  • resemblyzer
  • pyannote
  • noisereduce
  • spectralcluster
  • PyTorch
  • pyannote.metrics
  • pyannote.core
  • hdbscan
  • keras
  • tensorflow_addons
  • python_speech_features

Note: To install any of the above libraries:

  1. Use pip install library_name for your local system.
  2. Use !pip install library_name when installing on Colab.
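
For example, everything can be installed in a single Colab cell, assuming the standard PyPI package names (PyTorch installs as torch):

```
!pip install pydub xmltodict resemblyzer noisereduce spectralcluster torch \
    pyannote.metrics pyannote.core hdbscan keras tensorflow-addons python_speech_features
```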
