Getting text back with no audio #157

Open
serosenstein opened this issue Aug 15, 2024 · 0 comments

Using an ESP32-S3-BOX-3 with Willow installed.

Spoken command: "Hi ESP, lock front door"

Text displayed on the ESP32: "Front door has been locked". Audio: none.

Expected audio: "Front door has been locked"

Server log for this request:

[2024-08-15 22:53:06 +0000] [93] [DEBUG] FASTAPI: Got WILLOW request for model medium beam size 1 language detection False
[2024-08-15 22:53:06 +0000] [93] [DEBUG] WILLOW: Audio information: sample rate: 16000, bits: 16, channel(s): 1, codec: pcm
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WILLOW: Source audio is raw PCM, creating WAV container
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Loading audio took 1.5610000000000002 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Feature extraction took 34.336 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Using system default language en
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Using model medium with beam size 1
[2024-08-15 22:53:07 +0000] [93] [DEBUG] Processing GPU batch 1 of expected 1
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Model took 322.387 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Decode took 0.339 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: ASR transcript: Lock front door.
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Inference took 359.313 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Inference speedup: 3x
[2024-08-15 22:53:09 +0000] [93] [DEBUG] FASTAPI: Got TTS request for speaker CLB with format FLAC and text: Front door has been locked.
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Got request for speaker CLB with text: Front door has been locked.
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Loaded included speaker CLB
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Loading speaker embedding took 1.484 ms
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Getting inputs took 1.0970000000000002 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Generating audio took 493.322 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Generating file took 3.4099999999999997 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Total time took 499.855 ms

I'm running WIS (Willow Inference Server) in Docker on Ubuntu 24.04, inside a Proxmox VM with GPU passthrough for a Tesla P40.

Side note: WebRTC recording also doesn't work, but I can generate TTS speech through the API docs.
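To narrow down whether the missing audio comes from the server or from playback on the device, a TTS response saved from the API docs can be checked for a well-formed FLAC header. Below is a minimal Python sketch, assuming the response body has been saved to a local file; `read_flac_streaminfo` and the script name are hypothetical and not part of WIS. It only parses the STREAMINFO block that a FLAC stream is required to start with.

```python
import sys


def read_flac_streaminfo(data: bytes) -> dict:
    """Parse the STREAMINFO block at the start of a FLAC stream.

    Hypothetical helper for debugging; raises ValueError if the bytes
    do not begin with a valid FLAC header.
    """
    # A FLAC stream starts with the 4-byte marker "fLaC", followed by a
    # 4-byte metadata block header and the 34-byte STREAMINFO body.
    if len(data) < 4 + 4 + 34 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")

    # Block header: 1-bit last-block flag, 7-bit block type, 24-bit length.
    block_type = data[4] & 0x7F
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")

    info = data[8:8 + 34]
    # STREAMINFO bytes 10..17 pack: sample rate (20 bits), channels - 1
    # (3 bits), bits per sample - 1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(info[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)

    # total_samples == 0 means the encoder left the length unknown.
    duration_s = total_samples / sample_rate if sample_rate else 0.0
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "total_samples": total_samples,
        "duration_s": duration_s,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_flac.py <saved_tts_response.flac>
    with open(sys.argv[1], "rb") as f:
        print(read_flac_streaminfo(f.read()))
```

For example, `python check_flac.py tts.flac` prints the sample rate, channel count, bit depth and duration of the saved clip; a `ValueError` means the bytes are not a FLAC stream at all. A `total_samples` of 0 means the encoder did not record the length, so the duration is reported as 0.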
