Commit

web page with samples
taras-sereda committed Jan 3, 2024
1 parent 54d5ee8 commit 494912e
Showing 52 changed files with 39 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/_config.yml
@@ -0,0 +1 @@
theme: jekyll-theme-cayman
8 changes: 8 additions & 0 deletions docs/assets/css/style.scss
@@ -0,0 +1,8 @@
---
---

@import "{{ site.theme }}";

.main-content {
max-width: 90%;
}
30 changes: 30 additions & 0 deletions docs/index.md
@@ -0,0 +1,30 @@
## PHEME: Efficient and Conversational Speech Generation.

- Abstract. In recent years, speech generation has seen remarkable progress, now achieving one-shot generation capability that is often virtually indistinguishable from a real human voice. Integrating such advancements in speech generation with large language models might revolutionize a wide range of applications. However, certain applications, such as assistive conversational systems, require natural and conversational speech generation that also operates efficiently in real time. Current state-of-the-art models like VALL-E and SoundStorm, powered by hierarchical neural audio codecs, require large neural components and extensive training data to work well. In contrast, MQTTS aims to build more compact conversational TTS models while capitalizing on smaller-scale real-life conversational speech data. However, its autoregressive nature yields high inference latency and thus limits its real-time usage. To mitigate the current limitations of state-of-the-art TTS models while capitalizing on their strengths, in this work we propose the *PHEME* model series that **1)** offers compact yet high-performing models, **2)** allows for parallel speech generation of **3)** natural conversational speech, and **4)** can be trained efficiently on smaller-scale conversational data, cutting data demands by more than 10x while still matching the quality of autoregressive TTS models. We also show that simple teacher-student distillation yields significant improvements in voice quality for single-speaker setups on top of pretrained *PHEME* checkpoints, relying solely on synthetic speech generated by much larger teacher models.
- [Code](https://github.com/PolyAI-LDN/pheme)
- [Paper](...)

### Artificial Voice TTS Examples



| Prompt audio | Reference audio | PHEME (300M) no training on artificial voice | PHEME (300M) | Prompt text | Reference text |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| <audio src="samples/empress/46.wav" type="audio/wav" controls preload></audio> | <audio src="samples/empress/269.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-empress-300/270.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-empress-300/270.wav" type="audio/wav" controls preload></audio> | Our garden terrace is a lovely spot for afternoon tea. | The city’s ghost walk is a spooky and fascinating evening adventure. |
| <audio src="samples/empress/29.wav" type="audio/wav" controls preload></audio> | <audio src="samples/empress/234.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-empress-300/235.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-empress-300/235.wav" type="audio/wav" controls preload></audio> | If you need a quiet place to work, our library is just perfect. | Our hotel’s evening bonfires are a great place to socialize. |
| <audio src="samples/empress/114.wav" type="audio/wav" controls preload></audio> | <audio src="samples/empress/242.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-empress-300/243.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-empress-300/243.wav" type="audio/wav" controls preload></audio> | There’s a delightful chocolate factory tour, great for families. | Our rooftop jazz nights feature some of the best local talent. |
| <audio src="samples/empress/161.wav" type="audio/wav" controls preload></audio> | <audio src="samples/empress/226.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-empress-300/227.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-empress-300/227.wav" type="audio/wav" controls preload></audio> | The rooftop bar hosts a live DJ on Friday nights. | Our in-house sommelier leads an exquisite wine and cheese pairing event. |
| <audio src="samples/empress/148.wav" type="audio/wav" controls preload></audio> | <audio src="samples/empress/189.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-empress-300/190.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-empress-300/190.wav" type="audio/wav" controls preload></audio> | The comedy club in town is known for its hilarious acts. | The annual food fair showcases the best of local cuisine. |




### GigaSpeech TTS Examples

| Prompt audio | Reference audio | PHEME (100M) | PHEME (300M) no speaker embeddings | PHEME (300M) | Prompt text | Reference text |
| :----------------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| <audio src="samples/gigaspeech/YOU1000000044_S0000798.wav" type="audio/wav" controls preload></audio> | <audio src="samples/gigaspeech/YOU1000000044_S0000799.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-100/209.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-spkr-300/209.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-300/209.wav" type="audio/wav" controls preload></audio> | let's just say in her own words, once i sat down and watched it i never moved, i was enthralled by it. | and she told me the next time she went back she would take me with her. and i waited, of course, like i said, thirteen years. |
| <audio src="samples/gigaspeech/POD1000000004_S0000246.wav" type="audio/wav" controls preload></audio> | <audio src="samples/gigaspeech/POD1000000004_S0000247.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-100/019.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-spkr-300/019.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-300/019.wav" type="audio/wav" controls preload></audio> | in early twenty-twenty, blue apron put the word out that it was interested in possibly getting scooped up. maybe by a big grocery chain. or someone else with deep pockets who wanted to own a meal kit delivery business. | at the same time, garcia says, the company acted like it was in turnaround mode. it decided to streamline operations, including shutting down its fulfillment center in texas |
| <audio src="samples/gigaspeech/POD1000000018_S0000253.wav" type="audio/wav" controls preload></audio> | <audio src="samples/gigaspeech/POD1000000018_S0000254.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-100/042.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-spkr-300/042.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-300/042.wav" type="audio/wav" controls preload></audio> | aside from influencing basically everyone who matters he was one of the first if not, in fact the first artist to bring an electric guitar player with him on to the grand oleopry stage. | if you want to call it a honky tonk, and it happened after ernest tubb. it was influenced by ernest tubb. before i get to the story and episode, i'd like to address one other thing. |
| <audio src="samples/gigaspeech/POD1000000048_S0000035.wav" type="audio/wav" controls preload></audio> | <audio src="samples/gigaspeech/POD1000000048_S0000036.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-100/080.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-spkr-300/080.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-300/080.wav" type="audio/wav" controls preload></audio> | so it's ah i think there's a range of risks, but generally speaking ah there's going to be a study increase in the floor of the skill level as these ah a i technologies diffuse. | that is, there will be more and more ah capabilities available to people at the bottom of the scale, that is individuals as well as people with more access to computing power, ah money, and data at the higher end. |
| <audio src="samples/gigaspeech/YOU1000000006_S0000052.wav" type="audio/wav" controls preload></audio> | <audio src="samples/gigaspeech/YOU1000000006_S0000051.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-100/188.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-no-spkr-300/188.wav" type="audio/wav" controls preload></audio> | <audio src="samples/pheme-300/188.wav" type="audio/wav" controls preload></audio> | so after they put in their name, phone number, email address onto your landing page. where would you like to send them? would you like to send them to your facebook page your website? | book an appointment to a buyer on facebook messenger bot, a seller messenger bot. where would you like to send them? so for this example i'm just gonna say book an appointment. |
Binary file added docs/samples/empress/114.wav
Binary file not shown.
Binary file added docs/samples/empress/148.wav
Binary file not shown.
Binary file added docs/samples/empress/161.wav
Binary file not shown.
Binary file added docs/samples/empress/189.wav
Binary file not shown.
Binary file added docs/samples/empress/217.wav
Binary file not shown.
Binary file added docs/samples/empress/226.wav
Binary file not shown.
Binary file added docs/samples/empress/234.wav
Binary file not shown.
Binary file added docs/samples/empress/242.wav
Binary file not shown.
Binary file added docs/samples/empress/262.wav
Binary file not shown.
Binary file added docs/samples/empress/269.wav
Binary file not shown.
Binary file added docs/samples/empress/29.wav
Binary file not shown.
Binary file added docs/samples/empress/46.wav
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added docs/samples/pheme-100/019.wav
Binary file not shown.
Binary file added docs/samples/pheme-100/042.wav
Binary file not shown.
Binary file added docs/samples/pheme-100/080.wav
Binary file not shown.
Binary file added docs/samples/pheme-100/188.wav
Binary file not shown.
Binary file added docs/samples/pheme-100/209.wav
Binary file not shown.
Binary file added docs/samples/pheme-300/019.wav
Binary file not shown.
Binary file added docs/samples/pheme-300/042.wav
Binary file not shown.
Binary file added docs/samples/pheme-300/080.wav
Binary file not shown.
Binary file added docs/samples/pheme-300/188.wav
Binary file not shown.
Binary file added docs/samples/pheme-300/209.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/001.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/002.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/190.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/227.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/235.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/243.wav
Binary file not shown.
Binary file added docs/samples/pheme-empress-300/270.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-empress-300/190.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-empress-300/227.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-empress-300/235.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-empress-300/243.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-empress-300/270.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-spkr-300/019.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-spkr-300/042.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-spkr-300/080.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-spkr-300/188.wav
Binary file not shown.
Binary file added docs/samples/pheme-no-spkr-300/209.wav
Binary file not shown.