QuiLLMan: Voice Chat with LLMs

A complete chat app that transcribes audio in real time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

This repo is meant to serve as a starting point for your own language model-based apps, as well as a playground for experimentation. Contributions are welcome and encouraged!

OpenAI Whisper V3 produces a transcript, which is passed to the Llama 3.1 8B Instruct language model to generate a response. Coqui's XTTS text-to-speech model then synthesizes that response as speech. Together, these produce a voice-to-voice chat experience.

You can find the demo live here.

[Note: this code is provided for illustration only; please remember to check the license before using any model for commercial purposes.]

File structure

React frontend (src/frontend/)
FastAPI server (src/app.py)
Whisper transcription module (src/whisper.py)
XTTS text-to-speech module (src/xtts.py)
Llama 3.1 text generation module (src/llama.py)

Developing locally

Requirements

modal installed in your current Python virtual environment (pip install modal)
A Modal account
A Modal token set up in your environment (modal token new)

Developing the inference modules

Whisper, XTTS, and Llama each have a local_entrypoint that is invoked when you run that file directly with modal run. This is useful for testing each module standalone, without needing to run the whole app.

For example, to test the Whisper transcription module, run:

modal run -q src.whisper

Developing the HTTP server and frontend

The HTTP server at src/app.py is a FastAPI app that chains the inference modules into a single pipeline.

It also serves the frontend as static files.
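
To make the chaining concrete, here is a minimal sketch of the transcribe, generate, synthesize flow in plain Python. It does not use FastAPI or Modal and is not the code in src/app.py. The function voice_pipeline, the three stage signatures, and the choice to synthesize each streamed text chunk as it arrives are all assumptions made for illustration; the real modules may be wired differently.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical stage signatures; the real modules in src/ may differ.
Transcribe = Callable[[bytes], str]        # audio -> transcript (Whisper's role)
Generate = Callable[[str], Iterable[str]]  # transcript -> streamed text chunks (Llama's role)
Synthesize = Callable[[str], bytes]        # text chunk -> audio (XTTS's role)


def voice_pipeline(
    audio: bytes,
    transcribe: Transcribe,
    generate: Generate,
    synthesize: Synthesize,
) -> Iterator[bytes]:
    """Chain speech-to-text, text generation, and text-to-speech."""
    transcript = transcribe(audio)
    # Assumption: synthesize each chunk as soon as it is generated,
    # rather than waiting for the full response.
    for text_chunk in generate(transcript):
        yield synthesize(text_chunk)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without loading any models.
    def fake_transcribe(audio: bytes) -> str:
        return audio.decode("utf-8")

    def fake_generate(prompt: str) -> Iterable[str]:
        return [f"You said: {prompt}.", "Anything else?"]

    def fake_synthesize(text: str) -> bytes:
        return text.encode("utf-8")

    for chunk in voice_pipeline(b"hello", fake_transcribe, fake_generate, fake_synthesize):
        print(chunk)
```

Passing the stages in as callables keeps the chaining logic testable with stand-ins, which mirrors the idea of developing each inference module on its own before running the whole app.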

To run a development server, execute this command from the root directory of this repo:

modal serve src.app

In the terminal output, you'll find a URL that you can visit to use your app. While the modal serve process is running, changes to any of the project files are applied automatically. Press Ctrl+C to stop the app. Note that to see frontend changes, you will need to clear the browser cache.

Deploying to Modal

Once you're happy with your changes, deploy your app:

modal deploy src.app

Note that leaving the app deployed on Modal doesn't cost you anything while it is idle! Modal apps are serverless and scale to 0 when not in use.
