
reworking the FFT in power_spectrum #2

Closed · skewballfox opened this issue Jan 13, 2022 · 9 comments

@skewballfox (Collaborator) commented Jan 13, 2022

Okay, so I've been tackling this translation this morning, and I realized that this part is going to be a bit more complicated. The r in np.fft.rfft doesn't stand for reverse: it's an FFT computed for real input, and it outputs complex numbers (see the docs). For more information on the difference between np.fft.fft and np.fft.rfft, please see this Stack Overflow question.

There are only two crates that handle this:

  1. realfft
  2. ndrustfft
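
For reference, here's a minimal sketch of how realfft exposes the same contract as np.fft.rfft (n real samples in, n/2 + 1 complex bins out). This is just an illustration of the API, not a drop-in for our code:

```rust
use realfft::RealFftPlanner;

fn main() {
    let n = 320; // one frame's worth of real samples
    let mut planner = RealFftPlanner::<f64>::new();
    let r2c = planner.plan_fft_forward(n);

    let mut input = r2c.make_input_vec();   // n real values, zero-initialized
    let mut output = r2c.make_output_vec(); // n/2 + 1 complex bins
    input[0] = 1.0; // a unit impulse has a flat magnitude spectrum

    r2c.process(&mut input, &mut output).unwrap();
    assert_eq!(output.len(), n / 2 + 1); // 161 complex bins for n = 320
}
```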

Also, it seems that numpy's FFTs normalize a portion of the returned matrices: the third argument (norm=None) actually indicates that the default normalization option ("backward") should be used, which appears to be distinct from an inverse FFT (see the list of real FFTs).

ndrustfft may be a better fit here, but I'm still trying to figure out whether it has options for normalization.

@skewballfox (Collaborator, Author)

After reading over the docs a couple more times, it looks like the output of this particular function isn't normalized; normalization only happens for inverse FFTs (unless I'm misinterpreting):

The default normalization ("backward") has the direct (forward) transforms unscaled and the inverse (backward) transforms scaled by $1/n$.

It's happening along the last axis, but the number of output elements isn't determined by the number of elements in that vector, and it looks like the output should be a 2D matrix.
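
To make that convention concrete, here's a small round-trip sketch with realfft. The assumption (worth verifying) is that realfft's inverse, like rustfft's, is unnormalized, so matching numpy's default "backward" norm means leaving the forward output alone and scaling the inverse output by 1/n ourselves:

```rust
use realfft::RealFftPlanner;

fn main() {
    let n = 8;
    let mut planner = RealFftPlanner::<f64>::new();
    let r2c = planner.plan_fft_forward(n);
    let c2r = planner.plan_fft_inverse(n);

    let mut signal: Vec<f64> = (0..n).map(|i| (i as f64).sin()).collect();
    let original = signal.clone();

    // forward transform: left unscaled under the "backward" convention
    let mut spectrum = r2c.make_output_vec();
    r2c.process(&mut signal, &mut spectrum).unwrap();

    // the DC and Nyquist bins of a real signal's spectrum are purely real
    spectrum[0].im = 0.0;
    spectrum[n / 2].im = 0.0;

    // inverse transform: scale by 1/n, which is what numpy's default does
    let mut roundtrip = c2r.make_output_vec();
    c2r.process(&mut spectrum, &mut roundtrip).unwrap();
    for x in roundtrip.iter_mut() {
        *x /= n as f64;
    }

    assert!(original
        .iter()
        .zip(&roundtrip)
        .all(|(a, b)| (a - b).abs() < 1e-9));
}
```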

@StuartIanNaylor

There is an example in C, if that helps:
https://github.com/42io/dataset/tree/master/google_speech_commands
Also hidden away in tensorflow is https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/microfrontend

@skewballfox (Collaborator, Author)

Thanks for the links. It's definitely useful to have some other implementations to reference. I'm going to look at this again sometime Monday.

@StuartIanNaylor commented May 28, 2022

The 42io/dataset repo is quite a good example, as dumping to stdout is a good method for creating a pipeline.

Also, SpecAugment is now quite commonly used and is an optional part of MFCC creation.
https://github.com/DemisEom/SpecAugment

@skewballfox There's https://github.com/HEnquist/realfft, but compared to, say, pocketfft or FFTW, I don't think there is a NEON implementation, and Rust itself, from a complete noob's perspective, looks at first glance to be lacking in embedded Arm optimisation.
The C implementations that do support NEON could be 2x+ faster.

@skewballfox (Collaborator, Author)

I don't think there is a NEON implementation, and Rust itself, from a complete noob's perspective, looks at first glance to be lacking in embedded Arm optimisation.
The C implementations that do support NEON could be 2x+ faster.

I'm still pretty new to this stuff: is this supposed to be handled directly by the library, or more as a compile-time optimization? I think the language has support for (most) NEON instruction sets, going off this issue and the last corresponding PR.

Also, a bit off topic, but what are some good resources for learning about this?

@StuartIanNaylor commented May 31, 2022

Dunno, as I'm in the same boat with C: I can hack bits and pieces together. I had a little look at Rust out of interest, but apart from 'let mut' sounding like it's about bringing the dog in, the only thing I can see is that you either create your own crates or you're limited to what Rust supplies.

I have been wondering, especially with voice AI being so Arm-centric, whether maybe even the Arm libs might be the way to go. With FFT it's usually FFTW, but the new one that TensorFlow and Torch have employed is pocketfft.
https://docs.rs/rustfft/latest/rustfft/ seems to be AVX/SSE4.1 with no mention of NEON?
pocketfft & FFTW are NEON-optimised, but again I'm wondering what gains you actually get from Arm-optimised libs on Arm hardware; Arm makes some big claims, but I've never checked, as FFT generally twists my melon.
https://developer.arm.com/documentation/101004/2100/Fast-Fourier-Transforms-FFTs/Fast-Fourier-Transforms-FFTs-Introduction
https://developer.arm.com/documentation/102620/latest/

I doubt I'm any further along than you, apart from that when you mentioned the real FFT, reverse (iFFT) also got a mention.
There are a lot of algorithms in the input stream of a voice AI pipeline, and they are often standalone stages where the final step uses an iFFT to get back to a PCM stream, only for the next stage to convert to FFT again for processing.
I think the biggest optimisation could be to compute the FFT once, treat the algorithm pipeline as a whole, and stay in the frequency domain all the way through to MFCC.
The thing is, with the math these algorithms employ, you need to be a rocket scientist to have a clue what is going on, and that's where I'm a big fail.

I still think plain old C is a better option, as it's so much more portable across platforms in a general way (yeah, you will get arguments against that), and the existing codebase and set of examples are much larger.
Still, even if you have the language and the libs, the math can be a huge barrier: if you hit bugs with only a shallow understanding of the math, it's probably impossible to debug.

@sheosi (Collaborator) commented May 31, 2022

Some nuggets of info about SIMD, C, Rust, and embedded.

Neither C nor Rust provides architecture-agnostic SIMD; both rely on libraries that take advantage of the intrinsics to offer something more comfortable. Not long ago, Rust (1.59.0) added stable support for some SIMD on AArch64 (it was nightly-only before that).

Overall, C support is much more mature (especially in embedded), while Rust's is improving rapidly. Other than ready-made libs (which are important, but can be wrapped), I can't see a huge impact in Rust vs Clang C embedded-wise. And the funny thing is that Rust has opportunities to be faster than C: ownership enables writing non-allocating code even in complex codebases, and pointer aliasing doesn't happen in Rust, which is another source of potential optimizations.

And on the topic of RustFFT and NEON, there's a PR working on exactly that, now that it's finally in Rust stable.
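
For anyone following along, here's a tiny sketch of what those stabilized intrinsics look like (std::arch::aarch64, stable since Rust 1.59.0). The function is purely illustrative; the NEON path only compiles on aarch64 targets:

```rust
#[cfg(target_arch = "aarch64")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::aarch64::*;
    // Safety: NEON is a mandatory feature on aarch64, so these
    // intrinsics are always available on this target.
    unsafe {
        let va = vld1q_f32(a.as_ptr()); // load 4 f32 lanes
        let vb = vld1q_f32(b.as_ptr());
        let vsum = vaddq_f32(va, vb); // lane-wise add
        let mut out = [0.0f32; 4];
        vst1q_f32(out.as_mut_ptr(), vsum);
        out
    }
}

#[cfg(not(target_arch = "aarch64"))]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // scalar fallback for non-Arm targets
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

fn main() {
    assert_eq!(add4([1.0; 4], [2.0; 4]), [3.0; 4]);
}
```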

@StuartIanNaylor

And on the topic of RustFFT and NEON, there's a PR working on exactly that

That is exactly what I was saying: 'working on' still doesn't mean it's available, which is what I was pointing out.
I'm not here to play fanboy for a favourite programming language; I was just pointing out that what you're currently doing with FFT likely isn't optimised for NEON, which can be a pretty big deal.

All the complex math libs, FFT, OpenBLAS, and various others, are highly optimised after years of development, and the likes of Intel & Arm create API-identical optimised versions. It's likely that a fresh generalised lib such as Rust's is not as performant as, say, FFTW or pocketfft, and definitely not as performant as the platform-optimised vendor versions, which Rust lacks.

I'm not bothered about what you use, but that is very much the situation. It's probably no bother, though, as this is already part of torch.audio & tf.signal, so you probably don't even need to provide it unless you're fanning out to multiple models with the same MFCC.

@skewballfox (Collaborator, Author)

Coming back to this, as it's actually the current source of failure for the MFCC test.

In both the Python version and this implementation, the shape of frames is [6248, 320]. On the Python side, the shape of SPECTRUM_VECTOR inside the fft_spectrum function is [6248, 257]; currently the Rust code panics due to incompatible shapes. I could provide a view into a correctly sized slice of frames, or pad it with zeros, but I'm not sure whether either would be correct.

My intuition says that zero padding is probably correct: this is a 512-point FFT, which produces 512/2 + 1 = 257 output bins, and there are only 320 entries along the specified axis. But I need to confirm.
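
If it helps, here's a minimal sketch of the zero-padding route for a single frame, assuming realfft and a fixed 512-point transform (the frame contents are just a stand-in):

```rust
use realfft::RealFftPlanner;

fn main() {
    const N_FFT: usize = 512; // 512/2 + 1 = 257 output bins
    let frame = vec![0.5f32; 320]; // stand-in for one row of `frames`

    let mut planner = RealFftPlanner::<f32>::new();
    let r2c = planner.plan_fft_forward(N_FFT);

    let mut padded = r2c.make_input_vec(); // zero-initialized, length 512
    padded[..frame.len()].copy_from_slice(&frame); // 320 real samples, rest stay 0

    let mut spectrum = r2c.make_output_vec();
    r2c.process(&mut padded, &mut spectrum).unwrap();

    assert_eq!(spectrum.len(), N_FFT / 2 + 1); // 257, matching the Python shape
}
```

Applied row by row, that would turn the [6248, 320] frames matrix into the [6248, 257] spectrum the Python side produces (and in real code the planner would be created once, outside the loop).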
