Support for real-time TTS! #5994

Open · wants to merge 6 commits into dev
Conversation

@czuzu commented May 8, 2024

Hello,

I've set up TGUI with the alltalk_tts extension locally and modified the setup to pass LLM replies to the extension as they are being generated (stream mode), enabling real-time (aka "incremental") TTS.

A PR for the extension side is in the backlog as well, and streaming TTS works as expected locally. This PR covers the parts of TGUI I needed to adjust or extend for this to work smoothly.

Mainly, two changes were needed:

  1. Add an output_modifier_stream handler for extensions (currently chat mode only), which enables streaming the LLM text to extensions as it is generated.
  2. Perform the chat HTML updates structurally and incrementally ("diff" mode), updating only what changed via JS. This was needed because "audio" elements in the chat HTML were previously re-rendered continuously, which made audio streaming impossible.

(The remaining commits are miscellaneous changes: adding a llama3 instruction template and a commented-out line to allow remotely debugging TGUI.)
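To illustrate change 1, here is a minimal sketch of what an extension using the new output_modifier_stream hook could look like. Only the hook name comes from this PR; the sentence-buffering logic, the tts_queue, and the assumed calling convention (the hook receives the full reply-so-far on each streamed chunk) are illustrative assumptions, not the PR's actual code.

```python
import re

tts_queue = []   # stands in for a real TTS engine's input queue
_spoken = ""     # text already handed off to the TTS engine

def output_modifier_stream(string, state=None):
    """Hypothetical stream hook: called with the reply-so-far on every
    streamed chunk. Flushes newly completed sentences to the TTS queue."""
    global _spoken
    pending = string[len(_spoken):]
    # Flush everything up to the last sentence terminator seen so far.
    matches = list(re.finditer(r'[.!?](\s|$)', pending))
    if matches:
        cut = matches[-1].end()
        tts_queue.append(pending[:cut].strip())
        _spoken += pending[:cut]
    return string  # the visible chat text is passed through unchanged
```

Buffering until a sentence boundary is what makes the TTS "incremental" rather than per-token: the engine receives speakable units while the reply is still streaming.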

Let me know what you think. By the way, nice project!
Thanks!


czuzu added 4 commits May 8, 2024 11:12
1. Work around gradio's limitation that data cannot be passed directly
   from Python to JS (only indirectly, through components) - see create_dataholder_gradio
2. Update the chat HTML structurally and incrementally ("diff" mode) -
   see js_chat_html_update

This makes the updates significantly more efficient (no redundant HTML
rendering) and also keeps components in the chat stable in streaming
mode (important, for example, when extensions add <audio> elements to the chat - e.g. alltalk_tts).
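The "diff" mode described above can be sketched as follows. This is an assumption-laden illustration of the idea, not the PR's code: the real implementation patches the chat DOM via JS (js_chat_html_update), while this sketch just shows the core decision of which messages need re-rendering so that untouched ones keep their stable elements (e.g. <audio> players).

```python
def diff_chat_updates(old_messages, new_messages):
    """Return (index, text) pairs for messages that need re-rendering.
    Messages whose text is unchanged are skipped, so stable elements
    inside their DOM nodes are left untouched."""
    updates = []
    for i, text in enumerate(new_messages):
        if i >= len(old_messages) or old_messages[i] != text:
            updates.append((i, text))
    return updates
```

In streaming mode only the last message changes on each chunk, so this approach re-renders one node per update instead of the whole chat history.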
Allow extensions to modify the output in chat mode while bot replies
are being streamed. This is useful for extensions that need access to
the bot replies as they are streamed (e.g. incremental/streaming TTS).
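For context, the dispatcher side of such a hook could look roughly like this. The registry and calling convention here are assumptions for illustration; TGUI's actual extension machinery lives in modules/extensions.py and may differ.

```python
loaded_extensions = []  # extension modules/objects registered at startup

def apply_output_modifier_stream(partial_reply, state=None):
    """Pass the reply-so-far through each loaded extension that defines
    an output_modifier_stream hook, in registration order."""
    for ext in loaded_extensions:
        handler = getattr(ext, "output_modifier_stream", None)
        if handler is not None:
            partial_reply = handler(partial_reply, state)
    return partial_reply
```

Chaining the handlers in order mirrors how the existing (non-streaming) output_modifier is typically applied across extensions.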
@hypersniper05

Please support instruct mode

@hypersniper05

@oobabooga please consider this 🙏

@bobcate commented May 25, 2024

Hey @czuzu, would you consider making this for SillyTavern?
Since you listed only two changes for it, I thought I'd just ask, if it's no trouble.

@RandomInternetPreson (Contributor)

I gotta check the PR list more often, this is something I've needed for a while. Thank God textgen is open source and I can implement these changes on my own rig. Ty❤️❤️
