AnswerToSpeech #2584

Merged 74 commits on Jun 15, 2022
Changes from 4 commits
1332691
Add new audio answer primitives
ZanSara May 20, 2022
0529a6e
Add AnswerToSpeech
ZanSara May 20, 2022
b4fbb85
Add dependency group
ZanSara May 20, 2022
2e12d3c
Update Documentation & Code Style
github-actions[bot] May 20, 2022
3968131
Extract TextToSpeech in a helper class, create DocumentToSpeech and p…
ZanSara Jun 1, 2022
8b0f129
Add tests
ZanSara Jun 1, 2022
1ba2010
Update Documentation & Code Style
github-actions[bot] Jun 1, 2022
98ff519
Add ability to compress audio and more tests
ZanSara Jun 1, 2022
e449fe2
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 1, 2022
1641248
Add audio group to test, all and all-gpu
ZanSara Jun 1, 2022
092a32f
fix pylint
ZanSara Jun 1, 2022
1a801f5
Update Documentation & Code Style
github-actions[bot] Jun 1, 2022
010de18
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 2, 2022
6152f90
Accidental git tag
ZanSara Jun 2, 2022
5b905f8
Merge branch 'master' into text2speech
ZanSara Jun 2, 2022
2a8e291
Try pleasing mypy
ZanSara Jun 2, 2022
4434e55
Update Documentation & Code Style
github-actions[bot] Jun 2, 2022
b59be2f
fix pylint
ZanSara Jun 2, 2022
088e446
Add warning for missing OS library and support in CI
ZanSara Jun 2, 2022
836be33
Try fixing mypy
ZanSara Jun 2, 2022
bf1ae0e
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 2, 2022
f704d90
Update Documentation & Code Style
github-actions[bot] Jun 2, 2022
61e95b3
Add docs, simplify args for audio nodes and add tutorials
ZanSara Jun 6, 2022
04df900
Fix mypy
ZanSara Jun 6, 2022
90f8d42
Fix run_batch
ZanSara Jun 6, 2022
fcef153
Feedback on tutorials
ZanSara Jun 6, 2022
ab4bdd8
fix mypy and pylint
ZanSara Jun 6, 2022
af2e88b
Fix mypy again
ZanSara Jun 6, 2022
b268061
Fix mypy yet again
ZanSara Jun 6, 2022
bbc57cc
Fix the ci
ZanSara Jun 6, 2022
c72a65d
Fix dicts merge and install ffmpeg on CI
ZanSara Jun 6, 2022
292482f
Make the audio nodes import safe
ZanSara Jun 6, 2022
6d24456
Trying to increase tolerance in audio test
ZanSara Jun 6, 2022
330a11a
Fix import paths
ZanSara Jun 6, 2022
88da3e7
fix linter
ZanSara Jun 6, 2022
fc4a3ad
Merge branch 'master' into text2speech
ZanSara Jun 6, 2022
6cb3a31
Update Documentation & Code Style
github-actions[bot] Jun 6, 2022
a1a4343
Merge branch 'master' into text2speech
ZanSara Jun 7, 2022
47acec3
Add audio libs in unit tests
ZanSara Jun 7, 2022
2661316
Update _text_to_speech.py
agnieszka-m Jun 8, 2022
695908b
Update answer_to_speech.py
agnieszka-m Jun 8, 2022
60b2f84
Use dedicated dataset & update telemetry
ZanSara Jun 10, 2022
746244e
Remove and use distilled roberta
ZanSara Jun 10, 2022
cd22f9d
Revert special primitives so that the nodes run in indexing
ZanSara Jun 10, 2022
3efd157
Improve tutorials and fix smaller bugs
ZanSara Jun 10, 2022
8c18a95
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 10, 2022
1916567
Update Documentation & Code Style
github-actions[bot] Jun 10, 2022
0cfef18
Fix serialization issue
ZanSara Jun 10, 2022
f79d0e1
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 10, 2022
c5f3155
Update Documentation & Code Style
github-actions[bot] Jun 10, 2022
b5bdbc1
Improve tutorial
ZanSara Jun 10, 2022
8fd1051
Update Documentation & Code Style
github-actions[bot] Jun 10, 2022
5ab9da2
Update _text_to_speech.py
agnieszka-m Jun 13, 2022
754dae0
Minor lg updates
agnieszka-m Jun 13, 2022
b1758a4
Minor lg updates to tutorial
agnieszka-m Jun 13, 2022
60e5e6e
Making indexing work in tutorials
ZanSara Jun 13, 2022
8d725ef
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 13, 2022
8c21f13
Update Documentation & Code Style
github-actions[bot] Jun 13, 2022
b0f352a
Improve docstrings
ZanSara Jun 13, 2022
c1df345
Try to use GPU when available
ZanSara Jun 13, 2022
72d47d9
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 13, 2022
d6bbb8b
Update Documentation & Code Style
github-actions[bot] Jun 13, 2022
70714c6
Fixi mypy and pylint
ZanSara Jun 13, 2022
2edb898
Try to pass the device correctly
ZanSara Jun 13, 2022
dc3ab30
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 13, 2022
acfb3b2
Merge branch 'master' into text2speech
ZanSara Jun 13, 2022
2880862
Update Documentation & Code Style
github-actions[bot] Jun 13, 2022
71fff71
Use type of device
ZanSara Jun 13, 2022
66d4a21
Merge branch 'text2speech' of github.com:deepset-ai/haystack into tex…
ZanSara Jun 13, 2022
15abd0c
use .cpu()
ZanSara Jun 13, 2022
113a088
Improve .ipynb
ZanSara Jun 13, 2022
d8eea89
update apt index to be able to download libsndfile1
ZanSara Jun 13, 2022
c754292
Fix SpeechDocument.from_dict()
ZanSara Jun 13, 2022
04b05c1
Change pip URL
ZanSara Jun 13, 2022
60 changes: 60 additions & 0 deletions haystack/nodes/audio/answer_to_speech.py
@@ -0,0 +1,60 @@
import logging
from typing import Union, List, Dict, Any, Tuple

import os
import hashlib
from pathlib import Path

from espnet2.bin.tts_inference import Text2Speech
import soundfile as sf

from haystack.nodes import BaseComponent
from haystack.schema import Answer, AudioAnswer, GeneratedAudioAnswer


class AnswerToSpeech(BaseComponent):

    outgoing_edges = 1

    def __init__(
        self,
        model_name_or_path: Union[str, Path] = "espnet/kan-bayashi_ljspeech_vits",
        generated_audio_path: Path = Path(__file__).parent / "generated_audio_answers",
    ):
        super().__init__()
        self.model = Text2Speech.from_pretrained(model_name_or_path)
        self.generated_audio_path = generated_audio_path

        if not os.path.exists(self.generated_audio_path):
            os.mkdir(self.generated_audio_path)

    def text_to_speech(self, text: str) -> Any:
        filename = hashlib.md5(text.encode("utf-8")).hexdigest()
        path = self.generated_audio_path / f"{filename}.wav"

        # Duplicate answers might be in the list, in this case we save time by not regenerating.
        if not os.path.exists(path):
            output = self.model(text)["wav"]
            sf.write(path, output.numpy(), self.model.fs, "PCM_16")

        return path

    def run(self, answers: List[Answer]) -> Tuple[Dict[str, AudioAnswer], str]:

        audio_answers = []
        for answer in answers:

            logging.info(f"Processing answer '{answer.answer}' and its context...")
            answer_audio = self.text_to_speech(answer.answer)
            context_audio = self.text_to_speech(answer.context)

            audio_answer = GeneratedAudioAnswer.from_text_answer(
                answer_object=answer, generated_audio_answer=answer_audio, generated_audio_context=context_audio
            )
            audio_answer.type = "generative"
            audio_answers.append(audio_answer)

        return {"answers": audio_answers}, "output_1"

    def run_batch(self, answers: List[Answer]) -> Tuple[Dict[str, AudioAnswer], str]:
        return self.run(answers)
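
The `text_to_speech` method above names each file after the MD5 hash of its text and skips synthesis when that file already exists. A minimal sketch of that caching pattern follows, using only the standard library. `fake_synthesize` and `cached_audio_path` are hypothetical stand-ins (the node in the diff calls an ESPnet `Text2Speech` model and writes with `soundfile`), so this illustrates the idea rather than the PR's implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

calls = []  # records every text that actually reached the synthesizer


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a TTS model: returns placeholder bytes."""
    calls.append(text)
    return text.encode("utf-8")


def cached_audio_path(text: str, out_dir: Path, synthesize: Callable[[str], bytes]) -> Path:
    """Return the file for `text`, generating it only if it is not on disk yet."""
    # The file name is derived from the text, so identical texts map to the same file.
    filename = hashlib.md5(text.encode("utf-8")).hexdigest()
    path = out_dir / f"{filename}.wav"
    if not path.exists():
        out_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(text))
    return path


out_dir = Path(tempfile.mkdtemp()) / "generated_audio_answers"
first = cached_audio_path("Paris", out_dir, fake_synthesize)
second = cached_audio_path("Paris", out_dir, fake_synthesize)  # already on disk, not regenerated
third = cached_audio_path("Berlin", out_dir, fake_synthesize)
```

The sketch uses `Path.mkdir(parents=True, exist_ok=True)` where the diff uses `os.path.exists` followed by `os.mkdir`; the former also creates missing parent directories and tolerates a directory that already exists.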
36 changes: 36 additions & 0 deletions haystack/schema.py
@@ -327,6 +327,42 @@ def from_json(cls, data):
        return cls.from_dict(data)


@dataclass
class AudioAnswer(Answer):
    answer: Path
    context: Optional[Path] = None
    offsets_in_document: Optional[Any] = None
    offsets_in_context: Optional[Any] = None

    def __str__(self):
        return f"<AudioAnswer: answer='{self.answer}', score={self.score}, context='{self.context}'>"

    def __repr__(self):
        return f"<AudioAnswer {asdict(self)}>"


@dataclass
class GeneratedAudioAnswer(AudioAnswer):
    type: str = "text-to-speech"
    answer_transcript: Optional[str] = None
    context_transcript: Optional[str] = None

    @classmethod
    def from_text_answer(
        cls, answer_object: Answer, generated_audio_answer: Any, generated_audio_context: Optional[Any] = None
    ):
        answer_dict = answer_object.to_dict()
        answer_dict = {key: value for key, value in answer_dict.items() if value}

        answer_dict["answer_transcript"] = answer_dict["answer"]
        answer_dict["context_transcript"] = answer_dict["context"]

        answer_dict["answer"] = generated_audio_answer
        answer_dict["context"] = generated_audio_context

        return cls(**answer_dict)


@dataclass
class Label:
    id: str
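
`GeneratedAudioAnswer.from_text_answer` above moves the text fields into the `*_transcript` fields and puts the audio paths in their place. The sketch below reproduces that field swap with simplified, hypothetical dataclasses (`TextAnswer` and `SpokenAnswer` are not Haystack classes). One thing worth noting: the diff drops falsy values from the dictionary before reading `answer_dict["context"]`, so an answer without a context would seem to raise a `KeyError` at that point. The sketch simply omits that filter, which is an assumption about the intended behaviour.

```python
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class TextAnswer:
    """Hypothetical, simplified stand-in for a text answer."""
    answer: str
    context: Optional[str] = None
    score: Optional[float] = None


@dataclass
class SpokenAnswer:
    """Hypothetical, simplified stand-in for a generated audio answer."""
    answer: Path
    context: Optional[Path] = None
    score: Optional[float] = None
    answer_transcript: Optional[str] = None
    context_transcript: Optional[str] = None

    @classmethod
    def from_text_answer(
        cls, text_answer: TextAnswer, audio_answer: Path, audio_context: Optional[Path] = None
    ) -> "SpokenAnswer":
        fields = asdict(text_answer)
        # Keep the original text as transcripts (None stays None when there is no context).
        fields["answer_transcript"] = fields["answer"]
        fields["context_transcript"] = fields["context"]
        # The audio file paths take the place of the text fields.
        fields["answer"] = audio_answer
        fields["context"] = audio_context
        return cls(**fields)


spoken = SpokenAnswer.from_text_answer(
    TextAnswer(answer="Paris", context="Paris is the capital of France.", score=0.9),
    audio_answer=Path("answer.wav"),
    audio_context=Path("context.wav"),
)
no_context = SpokenAnswer.from_text_answer(TextAnswer(answer="Paris"), Path("answer.wav"))
```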
9 changes: 7 additions & 2 deletions setup.cfg
@@ -150,6 +150,12 @@ docstores =
    farm-haystack[faiss,milvus,weaviate,graphdb,pinecone]
docstores-gpu =
    farm-haystack[faiss-gpu,milvus,weaviate,graphdb,pinecone]

audio =
    espnet
    espnet-model-zoo
beir =
    beir; platform_system != 'Windows'
crawler =
    selenium
    webdriver-manager
@@ -172,10 +178,9 @@ ray =
    ray>=1.9.1,<2; platform_system != 'Windows'
    ray>=1.9.1,<2,!=1.12.0; platform_system == 'Windows' # Avoid 1.12.0 due to https://github.com/ray-project/ray/issues/24169 (fails on windows)
    aiorwlock>=1.3.0,<2

colab =
    grpcio==1.43.0
beir =
    beir; platform_system != 'Windows'
dev =
    # Type check
    mypy
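
The new `audio` group in `setup.cfg` above makes `espnet` and `espnet-model-zoo` optional dependencies. A minimal sketch of how calling code might check for them before importing the audio node follows. `missing_modules` is a hypothetical helper, not part of Haystack, and the module names `espnet2` and `soundfile` are taken from the imports in `answer_to_speech.py`.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Module names taken from the imports in answer_to_speech.py.
missing = missing_modules(["espnet2", "soundfile"])
if missing:
    print(f"Audio support unavailable, missing modules: {', '.join(missing)}")
else:
    print("Audio dependencies found.")
```

With the group defined as in the diff, the dependencies would typically be installed through the extras syntax, for example `pip install farm-haystack[audio]`.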