Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ComputeKaldiPitch prototype #1063

Closed
wants to merge 1 commit into from
Closed

Conversation

mthrok
Copy link
Collaborator

@mthrok mthrok commented Dec 2, 2020

Superseded by #1243

This is the initial step to add Kaldi's pitch feature.

Approach

Kaldi's pitch feature is somewhat convoluted and it involves resampling and Viterbi algorithm, which is not easy to reproduce correctly in another language (i.e. Python). Therefore I took an approach to reuse Kaldi's original implementation as much as possible.

In this approach, I re-implemented the minimum interface of the family of Kaldi's matrix library, (such as kaldi::VectorBase, kaldi::MatrixBase, etc...) with torch::Tensor class, and pulled the source files required to compile kaldi/src/feat/pitch-functions.cc with minimum modifications. (Commenting out some #include statements and correcting the type definitions. This is done in my fork of kaldi.)

The reason I used torch::Tensor class is that, this way, we do not need to be worried about the availability of a BLAS library, and the existing build process should just work. (with the exception of the addition of git submodule initialization) Once we find out a reliable way to detect BLAS library that PyTorch is linked against, then we should be able to switch to use bare Vector/Matrix classes.

The resulting torchaudio.functional.compute_kaldi_pitch function produces numerically identical result as the compute-kaldi-pitch-feats binary, but it is very slow. (x60) One major reason seems to be element-wise access, which Kaldi uses a lot but is not efficient in PyTorch.

There are three major things to consider forward;

1. Speed

The possible way to improve the speed is;

  1. Modify the higher level implementation like ComputeKaldiPitch function to apply operation in vectorized manner.
  2. Use the stripped version of the original implementation of Kaldi's Vector/Matrix classes and compile them with the same BLAS library that PyTorch uses. In Conda environment this is kind of easy, but we need to figure out the correct way to do so cross-platform (Linux/Windows/macOS, conda/pip)

2. Interface

Batch

Currently torchaudio.functional.compute_kaldi_pitch is a simple wrapper around kaldi::ComputeKaldiPitch and can only process one waveform at a time. We need to extend the function so that it can handle batched samples. A naive approach will be to parallelize the operation over batch dimension.

Return values

Currently the returned Tensor is 2D with [time, pitch then NCCF]. Providing an easy way to get pitch or NCCF without manually doing index slicing is preferred. One possible way is named Tensor.

3. Implementation detail

There are miscellaneous implementation details to be polished. Examples are;

  • I found that bare KALDI_ASSERT hangs, so they have to be replaced with TORCH_CHECK or TORCH_INTERNAL_ASSERT
  • Better way to maintain modified Kaldi. We should maintain a fork of Kaldi in pytorch org.

@mthrok mthrok force-pushed the pitch-feature branch 2 times, most recently from 6b09bd8 to dbe8039 Compare December 2, 2020 18:47
@mthrok mthrok force-pushed the pitch-feature branch 2 times, most recently from ab00886 to e9439e8 Compare December 10, 2020 21:28
@vincentqb
Copy link
Contributor

To consider: would git subtree allow for a more appropriate way of handling the history and modification of kaldi code than git submodule ? e.g. here

@mthrok
Copy link
Collaborator Author

mthrok commented Dec 11, 2020

Proposed improved approach;

  • find out the way to submoduling only needed (unmodified files) so that we do not need to keep original source code that we do not need to maintain the copy of unmodified files.
  • Only keep the modified source code.

@vincentqb vincentqb mentioned this pull request Jan 8, 2021
@mthrok mthrok force-pushed the pitch-feature branch 2 times, most recently from 044bfc2 to 061dabe Compare January 15, 2021 23:36
@mthrok mthrok added this to the v0.8 milestone Jan 25, 2021
@mthrok mthrok force-pushed the pitch-feature branch 2 times, most recently from a004912 to 1a72ebc Compare January 26, 2021 21:32
@mthrok mthrok closed this Feb 8, 2021
@mthrok mthrok deleted the pitch-feature branch February 8, 2021 19:25
@mthrok mthrok mentioned this pull request Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants