Implement text-to-speech support on Android, iOS, HTML5, Linux, macOS, and Windows. #56192

Merged (1 commit) on Apr 28, 2022


bruvzg
Member

@bruvzg bruvzg commented Dec 23, 2021

Improved version of #21478 and https://github.com/bruvzg/godot_tts/tree/master.

Thanks to https://github.com/hpvb/dynload-wrapper, the LGPL dependency on Linux is loaded dynamically (the same way the other LGPL libraries are linked: asoundlib, pulseaudio and udev). On other platforms, TTS is part of the core system API.
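A minimal sketch of the runtime-loading idea, assuming a POSIX system with `dlfcn.h` (this is not the code dynload-wrapper generates): the library is opened with `dlopen` and a symbol is resolved with `dlsym`, and any failure simply disables TTS instead of preventing the engine from starting. The soname `libspeechd.so.2` and the symbol `spd_open` come from the generated wrapper in this PR; the helper name `speechd_available` is hypothetical. On glibc older than 2.34, link with `-ldl`.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Hypothetical helper (not dynload-wrapper's generated code): returns true only
// if `soname` can be opened at runtime and exports `symbol`. Nothing is linked
// against the library at build time, so a missing library just disables TTS.
bool speechd_available(const char *soname, const char *symbol) {
	assert(soname != nullptr && symbol != nullptr);
	void *handle = dlopen(soname, RTLD_LAZY);
	if (handle == nullptr) {
		std::fprintf(stderr, "TTS disabled: %s\n", dlerror());
		return false;
	}
	dlerror(); // Clear any stale error state before calling dlsym().
	void *sym = dlsym(handle, symbol);
	const char *err = dlerror();
	bool found = (err == nullptr && sym != nullptr);
	dlclose(handle);
	return found;
}
```

For example, a caller could check `speechd_available("libspeechd.so.2", "spd_open")` once at startup and only create the Linux TTS backend when it returns true.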

New functions:

  • bool DisplayServer.tts_is_speaking()
  • bool DisplayServer.tts_is_paused()
  • Array DisplayServer.tts_get_voices() - voice/language enumeration; returns an Array of Dictionaries with the following key-value pairs: "name": String, "id": String, "language": String.
  • PackedStringArray DisplayServer.tts_get_voices_for_language(language) - returns array of voice IDs for the language.
  • void DisplayServer.tts_speak(text, voice_id, volume, pitch, rate, utterance_id, interrupt) - asynchronous; adds an utterance with the specified parameters to the queue.
  • void DisplayServer.tts_stop()
  • void DisplayServer.tts_pause()
  • void DisplayServer.tts_resume()
  • void DisplayServer.tts_set_utterance_callback(event, callable) - adds a callback for the specific utterance event:
    • started, ended, canceled/failed: the callback function takes one int parameter, the utterance_id passed to tts_speak.
    • boundary: the callback function takes two int parameters, the char_index of the boundary and the utterance_id passed to tts_speak.
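To make the queue and callback semantics above concrete, here is a hypothetical, single-threaded C++ model. The class, enum and method names are illustrative assumptions, not Godot's API, and real backends speak asynchronously. `speak()` appends to a queue, `interrupt` cancels whatever is still pending, and each spoken utterance reports started, boundary (char_index, utterance_id) and ended events.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, single-threaded model of the queue/callback semantics listed
// above. It is NOT Godot's implementation; real backends speak asynchronously.
enum class UtteranceEvent { Started, Ended, Canceled, Boundary };

class TTSModel {
public:
	using IdCallback = std::function<void(int)>; // (utterance_id)
	using BoundaryCallback = std::function<void(int, int)>; // (char_index, utterance_id)

	void set_callback(UtteranceEvent ev, IdCallback cb) {
		assert(ev != UtteranceEvent::Boundary); // Boundary has its own setter.
		id_callbacks[static_cast<int>(ev)] = std::move(cb);
	}
	void set_boundary_callback(BoundaryCallback cb) { boundary_cb = std::move(cb); }

	// Queue an utterance; with interrupt == true, pending utterances are canceled first.
	void speak(const std::string &text, int utterance_id, bool interrupt = false) {
		if (interrupt) {
			stop();
		}
		queue.emplace_back(text, utterance_id);
	}

	// Drop every queued utterance, firing the canceled callback for each one.
	void stop() {
		for (const auto &u : queue) {
			fire(UtteranceEvent::Canceled, u.second);
		}
		queue.clear();
	}

	// "Speak" the next queued utterance to completion; returns false if idle.
	bool process_next() {
		if (queue.empty()) {
			return false;
		}
		std::pair<std::string, int> u = queue.front();
		queue.pop_front();
		fire(UtteranceEvent::Started, u.second);
		const std::string &text = u.first;
		for (std::size_t i = 0; i < text.size(); ++i) {
			// Report a boundary at the first character of each space-separated word.
			bool word_start = text[i] != ' ' && (i == 0 || text[i - 1] == ' ');
			if (word_start && boundary_cb) {
				boundary_cb(static_cast<int>(i), u.second);
			}
		}
		fire(UtteranceEvent::Ended, u.second);
		return true;
	}

private:
	void fire(UtteranceEvent ev, int utterance_id) {
		const IdCallback &cb = id_callbacks[static_cast<int>(ev)];
		if (cb) {
			cb(utterance_id);
		}
	}

	std::deque<std::pair<std::string, int>> queue;
	IdCallback id_callbacks[4];
	BoundaryCallback boundary_cb;
};
```

With this model, speaking "hi there" reports boundaries at char_index 0 and 3 before the ended event.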

Also added the ICU-backed TextServer.string_get_word_breaks(text, language), which is required to add index marks at word breaks.
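One plausible way to turn word breaks into index marks, sketched under several assumptions. The break list is taken to be a flat array of [start, end) pairs. The stand-in `word_breaks()` below only splits on ASCII whitespace, unlike the ICU-backed method, which also handles scripts without spaces. Offsets are byte offsets into a `std::string` rather than positions in Godot's UTF-32 String. Marks are emitted as SSML `<mark/>` elements whose name is the word's start offset. All function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for TextServer.string_get_word_breaks(): returns a flat
// list of [start, end) pairs, one pair per word, splitting on ASCII whitespace.
std::vector<int> word_breaks(const std::string &text) {
	std::vector<int> breaks;
	int n = static_cast<int>(text.size());
	int i = 0;
	while (i < n) {
		while (i < n && std::isspace(static_cast<unsigned char>(text[i]))) {
			++i;
		}
		if (i >= n) {
			break;
		}
		int start = i;
		while (i < n && !std::isspace(static_cast<unsigned char>(text[i]))) {
			++i;
		}
		breaks.push_back(start);
		breaks.push_back(i);
	}
	return breaks;
}

// Append `s` to `out`, escaping the characters that are special in XML.
static void append_escaped(std::string &out, const std::string &s) {
	for (char c : s) {
		switch (c) {
			case '&': out += "&amp;"; break;
			case '<': out += "&lt;"; break;
			case '>': out += "&gt;"; break;
			default: out += c;
		}
	}
}

// Build SSML with a <mark/> before each word. The mark name is the word's start
// offset, so a "mark reached" event can be mapped back to a char_index.
std::string add_index_marks(const std::string &text) {
	std::vector<int> breaks = word_breaks(text);
	assert(breaks.size() % 2 == 0);
	std::string out = "<speak>";
	int prev = 0;
	for (std::size_t w = 0; w < breaks.size(); w += 2) {
		int start = breaks[w];
		append_escaped(out, text.substr(prev, start - prev));
		out += "<mark name=\"" + std::to_string(start) + "\"/>";
		prev = start;
	}
	append_escaped(out, text.substr(prev));
	out += "</speak>";
	return out;
}
```

For example, `add_index_marks("Hi you")` yields `<speak><mark name="0"/>Hi <mark name="3"/>you</speak>`, so a mark event named "3" maps straight back to a boundary callback with char_index 3.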

Platforms:

  • Android, implemented and tested (on Android 11 / Poco F3).
  • Linux (speech-dispatcher), implemented and tested (on x86_64 Fedora 35, arm64 Ubuntu 21.10).
  • Windows (SAPI), implemented and tested (on Windows 11).
  • macOS (AVSpeech on 10.14+ / NSSpeech on older versions), implemented and tested (on M1 macOS and x86_64 via Rosetta).
  • iOS (AVSpeech), same code as macOS, tested on an M1 Mac.
  • Web, implemented and tested (on Firefox / Windows 11).

Notes:

All methods and callbacks are implemented on all platforms, but the Web version probably won't have boundary callbacks on Linux, since they don't seem to be implemented in Chromium and Firefox.

Demo:

tts2_test.zip (demo last updated: 29 Dec 2021)

[Screenshot of the demo, 2021-12-28 14:10:07]

Related: #14011, #20683, #20254

Partially implements godotengine/godot-proposals#983

Bugsquad edit: This closes godotengine/godot-proposals#3584.

@bruvzg bruvzg added this to the 4.0 milestone Dec 23, 2021
@bruvzg bruvzg force-pushed the tts2.0 branch 7 times, most recently from aec2b7c to 2ca7483 Compare December 30, 2021 07:18
@bruvzg bruvzg force-pushed the tts2.0 branch 3 times, most recently from d7f2742 to c76e988 Compare January 13, 2022 11:46
@bruvzg bruvzg marked this pull request as ready for review January 13, 2022 11:46
@bruvzg bruvzg requested review from a team as code owners January 13, 2022 11:46
doc/classes/DisplayServer.xml (outdated, resolved)
if (ids.has(p_id)) {
int pos = 0;
if ((TTSUtteranceEvent)p_event == DisplayServer::TTS_UTTERANCE_BOUNDARY) {
// Convert position from UTF-8 to UTF-32.
Contributor


Why is this needed?

Also, it may be better to add this utility method to CharString (if it's not available already) in order to prevent unnecessary code duplication.

Member Author


This should be UTF-16, not UTF-8, but the idea is the same: Java strings and Char16String encode characters with codes above 0xFFFF as two code units (a surrogate pair), while Godot's String stores them as one. This compensates for the difference in offsets. I'm not sure it's worth adding a utility method to CharString / Char16String: the UTF-8 version is used once (for JavaScript), the UTF-16 version twice (for Android and Windows), and more uses are unlikely.
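A sketch of the offset conversion being discussed, assuming the platform reports positions in UTF-16 code units. The helper names are hypothetical, and the real code operates on Godot's String and Char16String types rather than `std::u16string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: convert an offset in UTF-16 code units (as used by Java
// strings and Char16String) into an offset in code points (as used by a UTF-32
// string such as Godot's String). A surrogate pair is two UTF-16 units but a
// single code point, so the low half of a pair is not counted.
std::size_t utf16_offset_to_utf32(const std::u16string &s, std::size_t utf16_offset) {
	assert(utf16_offset <= s.size());
	std::size_t code_points = 0;
	for (std::size_t i = 0; i < utf16_offset; ++i) {
		bool pair_tail = i > 0 && is_low_surrogate(s[i]) && is_high_surrogate(s[i - 1]);
		if (!pair_tail) {
			++code_points;
		}
	}
	return code_points;
}
```

For `u"a\U0001F600b"` (4 UTF-16 units), UTF-16 offset 3 (the `b`) maps to code-point offset 2.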

Contributor


I think the fact that it's used more than once warrants an addition :).

I'm mostly trying to avoid subtle bugs creeping in if one side is updated and the other isn't. With common methods, it's easy to see who's using them, and changes propagate automatically.
With custom logic, you lose that advantage.
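To illustrate how small such a shared utility could be, here is a sketch of the UTF-8 case mentioned in this thread (used once, for the Web platform). The name `utf8_offset_to_utf32` is hypothetical; it counts every byte that is not a UTF-8 continuation byte (bit pattern `10xxxxxx`) before the given byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared helper: convert a byte offset into UTF-8 text into a
// code-point offset. Every code point starts with exactly one non-continuation
// byte, so counting those bytes before the offset counts the code points.
std::size_t utf8_offset_to_utf32(const std::string &s, std::size_t byte_offset) {
	assert(byte_offset <= s.size());
	std::size_t code_points = 0;
	for (std::size_t i = 0; i < byte_offset; ++i) {
		unsigned char b = static_cast<unsigned char>(s[i]);
		if ((b & 0xC0) != 0x80) { // Not a continuation byte (10xxxxxx).
			++code_points;
		}
	}
	return code_points;
}
```

For "aéb" (4 bytes in UTF-8), byte offset 3 maps to code-point offset 2.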

Contributor

@m4gr3d m4gr3d left a comment


Most of the Android logic looks sound; the majority of my comments are about styling and organization, so let me know if you have any questions.

@@ -166,6 +167,7 @@ private void setButtonPausedState(boolean paused) {

public static GodotIO io;
public static GodotNetUtils netUtils;
public static GodotTTS tts;
Contributor


This can be made private. I believe the jni logic should still work as expected with the scope change, but let me know if it doesn't.

import java.util.LinkedList;
import java.util.Set;

public class GodotTTS {
Contributor


Can you add a javadoc describing the role/use of the class?

From this PR's implementation, it looks like this class is mostly used on the rendering thread. I didn't see anything in the documentation that would suggest otherwise, but were you able to validate that calling the android tts apis on the render thread works as expected?

If you expect this class to be called both from the render thread and from the main thread, then some of its fields should be made thread-safe. Let me know if that's the case and I can advise on the best manner to do so.

GodotTTS tts = null;

public void updateTTS() {
if (!tts.tts_speaking && tts.tts_queue.size() > 0) {
Contributor


It's recommended to use methods instead of fields, as this simplifies any future refactoring (e.g. the method signature stays the same while the internal logic and fields change).

That said, given these classes are within the same package and interdependent, it's not a big issue here, so I leave the decision up to you.
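The two review points above (guard shared state if it is touched from more than one thread, and reach it through methods rather than fields) can be sketched together. This is written in C++ to keep one language across these examples; in the Java code, `synchronized` methods or a `java.util.concurrent` collection would play the same role. The class and method names are hypothetical, and `start_next()` only mirrors the `updateTTS()` check (start the next utterance when nothing is speaking).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical C++ analogue of the GodotTTS queue discussed above. The queue and
// the "speaking" flag are private and only reached through methods, and every
// method locks the same mutex, so one thread can enqueue while another drains.
class UtteranceQueue {
public:
	void enqueue(std::string text) {
		std::lock_guard<std::mutex> lock(mtx);
		queue.push_back(std::move(text));
	}

	// Mirrors the updateTTS() check: start the next utterance only if nothing is
	// speaking. Returns true and moves the text into `out` when one is started.
	bool start_next(std::string &out) {
		std::lock_guard<std::mutex> lock(mtx);
		if (speaking || queue.empty()) {
			return false;
		}
		out = std::move(queue.front());
		queue.pop_front();
		speaking = true;
		return true;
	}

	// Called when the platform reports that the current utterance has finished.
	void on_done() {
		std::lock_guard<std::mutex> lock(mtx);
		assert(speaking && "on_done() called with no active utterance");
		speaking = false;
	}

	bool is_speaking() const {
		std::lock_guard<std::mutex> lock(mtx);
		return speaking;
	}

	std::size_t pending() const {
		std::lock_guard<std::mutex> lock(mtx);
		return queue.size();
	}

private:
	mutable std::mutex mtx;
	std::deque<std::string> queue;
	bool speaking = false;
};
```

Because callers only see methods, the fields (or the locking strategy) can later change without touching the call sites.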

@bruvzg bruvzg force-pushed the tts2.0 branch 2 times, most recently from 2ada3fa to 91018a3 Compare January 21, 2022 07:43
@akien-mga
Member

akien-mga commented Apr 28, 2022

Since we seem to make most "advanced" features opt-out at compile time on Linux, this could be done here too:

diff --git a/platform/linuxbsd/SCsub b/platform/linuxbsd/SCsub
index 479659dfa4..d7ee9821b6 100644
--- a/platform/linuxbsd/SCsub
+++ b/platform/linuxbsd/SCsub
@@ -8,9 +8,9 @@ import platform_linuxbsd_builders
 common_linuxbsd = [
     "crash_handler_linuxbsd.cpp",
     "os_linuxbsd.cpp",
-    "tts_linux.cpp",
     "joypad_linux.cpp",
     "freedesktop_screensaver.cpp",
+    "tts_linux.cpp",
 ]
 
 if "x11" in env and env["x11"]:
@@ -27,7 +27,8 @@ if "vulkan" in env and env["vulkan"]:
 if "udev" in env and env["udev"]:
     common_linuxbsd.append("libudev-so_wrap.c")
 
-common_linuxbsd.append("speechd-so_wrap.c")
+if "speechd" in env and env["speechd"]:
+    common_linuxbsd.append("speechd-so_wrap.c")
 
 prog = env.add_program("#bin/godot", ["godot_linuxbsd.cpp"] + common_linuxbsd)
 
diff --git a/platform/linuxbsd/detect.py b/platform/linuxbsd/detect.py
index 2fba58fc53..1ebfd941d5 100644
--- a/platform/linuxbsd/detect.py
+++ b/platform/linuxbsd/detect.py
@@ -76,6 +76,7 @@ def get_opts():
         BoolVariable("pulseaudio", "Detect and use PulseAudio", True),
         BoolVariable("dbus", "Detect and use D-Bus to handle screensaver", True),
         BoolVariable("udev", "Use udev for gamepad connection callbacks", True),
+        BoolVariable("speechd", "Detect and use Speech Dispatcher for Text-to-Speech support", True),
         BoolVariable("x11", "Enable X11 display", True),
         BoolVariable("debug_symbols", "Add debugging symbols to release/release_debug builds", True),
         BoolVariable("separate_debug_symbols", "Create a separate file containing debugging symbols", False),
@@ -337,6 +338,13 @@ def configure(env):
         else:
             print("Warning: D-Bus development libraries not found. Disabling screensaver prevention.")
 
+    if env["speechd"]:
+        if os.system("pkg-config --exists speech-dispatcher") == 0:  # 0 means found
+            env.Append(CPPDEFINES=["SPEECHD_ENABLED"])
+            env.ParseConfig("pkg-config speech-dispatcher --cflags")  # Only cflags, we dlopen the library.
+        else:
+            print("Warning: Speech Dispatcher development libraries not found. Disabling Text-to-Speech support.")
+
     if platform.system() == "Linux":
         env.Append(CPPDEFINES=["JOYDEV_ENABLED"])
         if env["udev"]:
diff --git a/platform/linuxbsd/display_server_x11.cpp b/platform/linuxbsd/display_server_x11.cpp
index 44f96fd69e..59056d8be3 100644
--- a/platform/linuxbsd/display_server_x11.cpp
+++ b/platform/linuxbsd/display_server_x11.cpp
@@ -308,6 +308,8 @@ void DisplayServerX11::_flush_mouse_motion() {
 	xi.relative_motion.y = 0;
 }
 
+#ifdef SPEECHD_ENABLED
+
 bool DisplayServerX11::tts_is_speaking() const {
 	ERR_FAIL_COND_V(!tts, false);
 	return tts->is_speaking();
@@ -343,6 +345,8 @@ void DisplayServerX11::tts_stop() {
 	tts->stop();
 }
 
+#endif
+
 void DisplayServerX11::mouse_set_mode(MouseMode p_mode) {
 	_THREAD_SAFE_METHOD_
 
@@ -4669,8 +4673,10 @@ DisplayServerX11::DisplayServerX11(const String &p_rendering_driver, WindowMode
 	xdnd_finished = XInternAtom(x11_display, "XdndFinished", False);
 	xdnd_selection = XInternAtom(x11_display, "XdndSelection", False);
 
+#ifdef SPEECHD_ENABLED
 	// Init TTS
 	tts = memnew(TTS_Linux);
+#endif
 
 	//!!!!!!!!!!!!!!!!!!!!!!!!!!
 	//TODO - do Vulkan and OpenGL support checks, driver selection and fallback
@@ -5024,7 +5030,9 @@ DisplayServerX11::~DisplayServerX11() {
 		memfree(xmbstring);
 	}
 
+#ifdef SPEECHD_ENABLED
 	memdelete(tts);
+#endif
 
 #ifdef DBUS_ENABLED
 	memdelete(screensaver);
diff --git a/platform/linuxbsd/display_server_x11.h b/platform/linuxbsd/display_server_x11.h
index 9c77fe4189..10be853604 100644
--- a/platform/linuxbsd/display_server_x11.h
+++ b/platform/linuxbsd/display_server_x11.h
@@ -45,7 +45,6 @@
 #include "servers/audio_server.h"
 #include "servers/rendering/renderer_compositor.h"
 #include "servers/rendering_server.h"
-#include "tts_linux.h"
 
 #if defined(GLES3_ENABLED)
 #include "gl_manager_x11.h"
@@ -60,6 +59,10 @@
 #include "freedesktop_screensaver.h"
 #endif
 
+#if defined(SPEECHD_ENABLED)
+#include "tts_linux.h"
+#endif
+
 #include <X11/Xcursor/Xcursor.h>
 #include <X11/Xlib.h>
 #include <X11/extensions/XInput2.h>
@@ -113,7 +116,9 @@ class DisplayServerX11 : public DisplayServer {
 	bool keep_screen_on = false;
 #endif
 
+#if defined(SPEECHD_ENABLED)
 	TTS_Linux *tts = nullptr;
+#endif
 
 	struct WindowData {
 		Window x11_window;
@@ -301,6 +306,7 @@ public:
 	virtual bool has_feature(Feature p_feature) const override;
 	virtual String get_name() const override;
 
+#if defined(SPEECHD_ENABLED)
 	virtual bool tts_is_speaking() const override;
 	virtual bool tts_is_paused() const override;
 	virtual Array tts_get_voices() const override;
@@ -309,6 +315,7 @@ public:
 	virtual void tts_pause() override;
 	virtual void tts_resume() override;
 	virtual void tts_stop() override;
+#endif
 
 	virtual void mouse_set_mode(MouseMode p_mode) override;
 	virtual MouseMode mouse_get_mode() const override;
diff --git a/platform/linuxbsd/speechd-so_wrap.c b/platform/linuxbsd/speechd-so_wrap.c
index 34a2418033..1a3f8e5436 100644
--- a/platform/linuxbsd/speechd-so_wrap.c
+++ b/platform/linuxbsd/speechd-so_wrap.c
@@ -1,7 +1,7 @@
 // This file is generated. Do not edit!
 // see https://github.com/hpvb/dynload-wrapper for details
-// generated by ./dynload-wrapper/generate-wrapper.py 0.3 on 2021-11-05 07:08:15
-// flags: ./dynload-wrapper/generate-wrapper.py --sys-include <speech-dispatcher/libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c
+// generated by ./generate-wrapper.py 0.3 on 2022-04-28 11:58:03
+// flags: ./generate-wrapper.py --sys-include <libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c --omit-prefix spd_get_client_list
 //
 #include <stdint.h>
 
@@ -71,7 +71,6 @@
 #define spd_set_output_module spd_set_output_module_dylibloader_orig_speechd
 #define spd_set_output_module_all spd_set_output_module_all_dylibloader_orig_speechd
 #define spd_set_output_module_uid spd_set_output_module_uid_dylibloader_orig_speechd
-#define spd_get_client_list spd_get_client_list_dylibloader_orig_speechd
 #define spd_get_message_list_fd spd_get_message_list_fd_dylibloader_orig_speechd
 #define spd_list_modules spd_list_modules_dylibloader_orig_speechd
 #define free_spd_modules free_spd_modules_dylibloader_orig_speechd
@@ -85,7 +84,7 @@
 #define spd_execute_command_wo_mutex spd_execute_command_wo_mutex_dylibloader_orig_speechd
 #define spd_send_data spd_send_data_dylibloader_orig_speechd
 #define spd_send_data_wo_mutex spd_send_data_wo_mutex_dylibloader_orig_speechd
-#include <speech-dispatcher/libspeechd.h>
+#include <libspeechd.h>
 #undef SPDConnectionAddress__free
 #undef spd_get_default_address
 #undef spd_open
@@ -152,7 +151,6 @@
 #undef spd_set_output_module
 #undef spd_set_output_module_all
 #undef spd_set_output_module_uid
-#undef spd_get_client_list
 #undef spd_get_message_list_fd
 #undef spd_list_modules
 #undef free_spd_modules
@@ -171,7 +169,7 @@
 void (*SPDConnectionAddress__free_dylibloader_wrapper_speechd)( SPDConnectionAddress*);
 SPDConnectionAddress* (*spd_get_default_address_dylibloader_wrapper_speechd)( char**);
 SPDConnection* (*spd_open_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode);
-SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode, SPDConnectionAddress*, int, char**);
+SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode,const SPDConnectionAddress*, int, char**);
 int (*spd_get_client_id_dylibloader_wrapper_speechd)( SPDConnection*);
 void (*spd_close_dylibloader_wrapper_speechd)( SPDConnection*);
 int (*spd_say_dylibloader_wrapper_speechd)( SPDConnection*, SPDPriority,const char*);
@@ -234,7 +232,6 @@ char* (*spd_get_language_dylibloader_wrapper_speechd)( SPDConnection*);
 int (*spd_set_output_module_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 int (*spd_set_output_module_all_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 int (*spd_set_output_module_uid_dylibloader_wrapper_speechd)( SPDConnection*,const char*, unsigned int);
-int (*spd_get_client_list_dylibloader_wrapper_speechd)( SPDConnection*, char**, int*, int*);
 int (*spd_get_message_list_fd_dylibloader_wrapper_speechd)( SPDConnection*, int, int*, char**);
 char** (*spd_list_modules_dylibloader_wrapper_speechd)( SPDConnection*);
 void (*free_spd_modules_dylibloader_wrapper_speechd)( char**);
@@ -242,10 +239,10 @@ char* (*spd_get_output_module_dylibloader_wrapper_speechd)( SPDConnection*);
 char** (*spd_list_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 SPDVoice** (*spd_list_synthesis_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 void (*free_spd_voices_dylibloader_wrapper_speechd)( SPDVoice**);
-char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*, char**);
-int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*, char*);
+char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*, char**);
+int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 char* (*spd_send_data_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 char* (*spd_send_data_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 int initialize_speechd(int verbose) {
@@ -787,14 +784,6 @@ int initialize_speechd(int verbose) {
       fprintf(stderr, "%s\n", error);
     }
   }
-// spd_get_client_list
-  *(void **) (&spd_get_client_list_dylibloader_wrapper_speechd) = dlsym(handle, "spd_get_client_list");
-  if (verbose) {
-    error = dlerror();
-    if (error != NULL) {
-      fprintf(stderr, "%s\n", error);
-    }
-  }
 // spd_get_message_list_fd
   *(void **) (&spd_get_message_list_fd_dylibloader_wrapper_speechd) = dlsym(handle, "spd_get_message_list_fd");
   if (verbose) {
diff --git a/platform/linuxbsd/speechd-so_wrap.h b/platform/linuxbsd/speechd-so_wrap.h
index 043ba6c3c6..b8c59bf0d8 100644
--- a/platform/linuxbsd/speechd-so_wrap.h
+++ b/platform/linuxbsd/speechd-so_wrap.h
@@ -2,8 +2,8 @@
 #define DYLIBLOAD_WRAPPER_SPEECHD
 // This file is generated. Do not edit!
 // see https://github.com/hpvb/dynload-wrapper for details
-// generated by ./dynload-wrapper/generate-wrapper.py 0.3 on 2021-11-05 07:08:15
-// flags: ./dynload-wrapper/generate-wrapper.py --sys-include <speech-dispatcher/libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c
+// generated by ./generate-wrapper.py 0.3 on 2022-04-28 11:58:03
+// flags: ./generate-wrapper.py --sys-include <libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c --omit-prefix spd_get_client_list
 //
 #include <stdint.h>
 
@@ -73,7 +73,6 @@
 #define spd_set_output_module spd_set_output_module_dylibloader_orig_speechd
 #define spd_set_output_module_all spd_set_output_module_all_dylibloader_orig_speechd
 #define spd_set_output_module_uid spd_set_output_module_uid_dylibloader_orig_speechd
-#define spd_get_client_list spd_get_client_list_dylibloader_orig_speechd
 #define spd_get_message_list_fd spd_get_message_list_fd_dylibloader_orig_speechd
 #define spd_list_modules spd_list_modules_dylibloader_orig_speechd
 #define free_spd_modules free_spd_modules_dylibloader_orig_speechd
@@ -87,7 +86,7 @@
 #define spd_execute_command_wo_mutex spd_execute_command_wo_mutex_dylibloader_orig_speechd
 #define spd_send_data spd_send_data_dylibloader_orig_speechd
 #define spd_send_data_wo_mutex spd_send_data_wo_mutex_dylibloader_orig_speechd
-#include <speech-dispatcher/libspeechd.h>
+#include <libspeechd.h>
 #undef SPDConnectionAddress__free
 #undef spd_get_default_address
 #undef spd_open
@@ -154,7 +153,6 @@
 #undef spd_set_output_module
 #undef spd_set_output_module_all
 #undef spd_set_output_module_uid
-#undef spd_get_client_list
 #undef spd_get_message_list_fd
 #undef spd_list_modules
 #undef free_spd_modules
@@ -237,7 +235,6 @@ extern "C" {
 #define spd_set_output_module spd_set_output_module_dylibloader_wrapper_speechd
 #define spd_set_output_module_all spd_set_output_module_all_dylibloader_wrapper_speechd
 #define spd_set_output_module_uid spd_set_output_module_uid_dylibloader_wrapper_speechd
-#define spd_get_client_list spd_get_client_list_dylibloader_wrapper_speechd
 #define spd_get_message_list_fd spd_get_message_list_fd_dylibloader_wrapper_speechd
 #define spd_list_modules spd_list_modules_dylibloader_wrapper_speechd
 #define free_spd_modules free_spd_modules_dylibloader_wrapper_speechd
@@ -254,7 +251,7 @@ extern "C" {
 extern void (*SPDConnectionAddress__free_dylibloader_wrapper_speechd)( SPDConnectionAddress*);
 extern SPDConnectionAddress* (*spd_get_default_address_dylibloader_wrapper_speechd)( char**);
 extern SPDConnection* (*spd_open_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode);
-extern SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode, SPDConnectionAddress*, int, char**);
+extern SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode,const SPDConnectionAddress*, int, char**);
 extern int (*spd_get_client_id_dylibloader_wrapper_speechd)( SPDConnection*);
 extern void (*spd_close_dylibloader_wrapper_speechd)( SPDConnection*);
 extern int (*spd_say_dylibloader_wrapper_speechd)( SPDConnection*, SPDPriority,const char*);
@@ -317,7 +314,6 @@ extern char* (*spd_get_language_dylibloader_wrapper_speechd)( SPDConnection*);
 extern int (*spd_set_output_module_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 extern int (*spd_set_output_module_all_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 extern int (*spd_set_output_module_uid_dylibloader_wrapper_speechd)( SPDConnection*,const char*, unsigned int);
-extern int (*spd_get_client_list_dylibloader_wrapper_speechd)( SPDConnection*, char**, int*, int*);
 extern int (*spd_get_message_list_fd_dylibloader_wrapper_speechd)( SPDConnection*, int, int*, char**);
 extern char** (*spd_list_modules_dylibloader_wrapper_speechd)( SPDConnection*);
 extern void (*free_spd_modules_dylibloader_wrapper_speechd)( char**);
@@ -325,10 +321,10 @@ extern char* (*spd_get_output_module_dylibloader_wrapper_speechd)( SPDConnection
 extern char** (*spd_list_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 extern SPDVoice** (*spd_list_synthesis_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 extern void (*free_spd_voices_dylibloader_wrapper_speechd)( SPDVoice**);
-extern char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-extern int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-extern int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*, char**);
-extern int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*, char*);
+extern char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+extern int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+extern int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*, char**);
+extern int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 extern char* (*spd_send_data_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 extern char* (*spd_send_data_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 int initialize_speechd(int verbose);
diff --git a/platform/linuxbsd/tts_linux.cpp b/platform/linuxbsd/tts_linux.cpp
index aea1183d3d..0ffa52f7bb 100644
--- a/platform/linuxbsd/tts_linux.cpp
+++ b/platform/linuxbsd/tts_linux.cpp
@@ -30,6 +30,8 @@
 
 #include "tts_linux.h"
 
+#ifdef SPEECHD_ENABLED
+
 #include "core/config/project_settings.h"
 #include "servers/text_server.h"
 
@@ -259,3 +261,5 @@ TTS_Linux::~TTS_Linux() {
 
 	singleton = nullptr;
 }
+
+#endif // SPEECHD_ENABLED
diff --git a/platform/linuxbsd/tts_linux.h b/platform/linuxbsd/tts_linux.h
index 12a3d0f052..fcc243eaa6 100644
--- a/platform/linuxbsd/tts_linux.h
+++ b/platform/linuxbsd/tts_linux.h
@@ -31,6 +31,8 @@
 #ifndef TTS_LINUX_H
 #define TTS_LINUX_H
 
+#ifdef SPEECHD_ENABLED
+
 #include "core/os/thread.h"
 #include "core/os/thread_safe.h"
 #include "core/string/ustring.h"
@@ -39,8 +41,6 @@
 #include "core/variant/array.h"
 #include "servers/display_server.h"
 
-#include <speech-dispatcher/libspeechd.h>
-
 #include "speechd-so_wrap.h"
 
 class TTS_Linux {
@@ -77,4 +77,6 @@ public:
 	~TTS_Linux();
 };
 
+#endif // SPEECHD_ENABLED
+
 #endif // TTS_LINUX_H

It's questionable whether this is really useful, as the dlopen mechanism already makes it optional, but it can be used to remove the compile-time dependency (since the wrapper needs the actual library header).
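A stripped-down sketch of the compile-time gating in the diff above, using hypothetical class names rather than Godot's real ones. The base class provides fallback TTS methods that report the feature as unsupported, and the platform class overrides them only when the build system defines `SPEECHD_ENABLED` (which, in the proposed detect.py change, happens only if pkg-config finds speech-dispatcher). `TTSBackend::speak()` is a placeholder.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical base class: fallback TTS method used when no backend is built in.
class DisplayServerBase {
public:
	virtual ~DisplayServerBase() = default;
	virtual bool tts_speak(const std::string &text) {
		(void)text;
		std::fprintf(stderr, "TTS is not supported by this display server.\n");
		return false;
	}
};

#ifdef SPEECHD_ENABLED
// Only compiled when the build system found the library headers.
class TTSBackend {
public:
	bool speak(const std::string &text) { return !text.empty(); } // Placeholder.
};
#endif

// Hypothetical platform class: the member and the override exist only when
// SPEECHD_ENABLED is defined, mirroring the #ifdef blocks in the diff.
class DisplayServerLinux : public DisplayServerBase {
#ifdef SPEECHD_ENABLED
	TTSBackend tts;

public:
	bool tts_speak(const std::string &text) override { return tts.speak(text); }
#endif
};
```

Built without `-DSPEECHD_ENABLED`, calling `tts_speak()` through the base class falls back to the unsupported message, which matches the "TTS is not supported by this display server" output reported later in this thread.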

BTW, I changed the system header include from <speech-dispatcher/libspeechd.h> to <libspeechd.h>, since the speech-dispatcher include directory seems to be part of the CFLAGS:

 $ pkg-config speech-dispatcher --cflags
-I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include

So this would be needed to compile on NixOS (see #59991), or on any distro where the speechd includes are not in /usr/include.
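As an aside, a sketch of how code could detect which header spelling is visible, using the standard C++17 `__has_include`. This is an alternative under assumptions, not what the PR does (the proposed change relies on the pkg-config CFLAGS and includes `<libspeechd.h>`); `speechd_header()` is a hypothetical name. With `-I/usr/include/speech-dispatcher` from pkg-config, the unprefixed spelling resolves; without it, only distros that install under `/usr/include` find the prefixed one.

```cpp
#include <cassert>

// Pick whichever spelling of the header the compiler can currently see.
#if defined(__has_include)
#if __has_include(<libspeechd.h>)
#define SPEECHD_HEADER_FOUND "<libspeechd.h>"
#elif __has_include(<speech-dispatcher/libspeechd.h>)
#define SPEECHD_HEADER_FOUND "<speech-dispatcher/libspeechd.h>"
#endif
#endif

#ifndef SPEECHD_HEADER_FOUND
#define SPEECHD_HEADER_FOUND "none" // Neither spelling resolves on this system.
#endif

// Hypothetical helper: report which include spelling was detected at build time.
const char *speechd_header() {
	return SPEECHD_HEADER_FOUND;
}
```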

@bruvzg
Member Author

bruvzg commented Apr 28, 2022

Added a build option and, for extra compatibility, a new wrapper generated from the older libspeechd2 0.9.1-4 (the Ubuntu 20.04 LTS version).

@akien-mga
Member

I think you need to use --sys-include <libspeechd.h> instead of --sys-include <speech-dispatcher/libspeechd.h> as I mentioned here:

BTW, I changed the system header include from <speech-dispatcher/libspeechd.h> to <libspeechd.h>, since the speech-dispatcher include directory seems to be part of the CFLAGS:

$ pkg-config speech-dispatcher --cflags
-I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include

So this would be needed to compile on NixOS (see #59913), or on any distro where the speechd includes are not in /usr/include.

… and Windows.

Implement TextServer word break method.
@bruvzg
Member Author

bruvzg commented Apr 28, 2022

I think you need to use --sys-include <libspeechd.h> instead of --sys-include <speech-dispatcher/libspeechd.h> as I mentioned here

Changed, it seems to work both ways.

Member

@akien-mga akien-mga left a comment


Tested and reviewed Linux code, looks good!

@akien-mga akien-mga merged commit d25c3aa into godotengine:master Apr 28, 2022
@akien-mga
Member

Thanks!

@bruvzg bruvzg deleted the tts2.0 branch April 28, 2022 13:15
@Sslaxx

Sslaxx commented Apr 29, 2022

Would there be any chance of backporting this to 3.x?

CC @Cheeseness

@CsloudX

CsloudX commented May 1, 2022

Wow! Godot, why I love you so much!!!

@CsloudX

CsloudX commented May 13, 2022

Why are these methods under DisplayServer rather than AudioServer?

@seocwen

seocwen commented May 17, 2022

The demo for this is out of date. I believe line 31 should be:
$ButtonPause.set_pressed(DisplayServer.tts_is_paused())

Great work though. I'd love it if the voice playback could be routed through the audio buses so we could apply effects to it.

@lesleyrs

Linux (speech-dispatcher), implemented and tested (on x86_64 Fedora 35, arm64 Ubuntu 21.10)

@bruvzg On Fedora 37, it just seems to spam "TTS is not supported by this display server", even after switching to X11. I haven't tried any other version.

How do you get it working?

@Calinou
Member

Calinou commented Dec 31, 2022

@bruvzg On Fedora 37, it just seems to spam "TTS is not supported by this display server", even after switching to X11. I haven't tried any other version.

How do you get it working?

See #67863.

@ghost

ghost commented Feb 17, 2023

Linux speech-dispatcher makes my ears bleed. It's also hard to understand, just a mess of sound. Have you guys tried using RHVoice instead? It has much, MUCH better quality.

@Cheeseness
Contributor

Linux speech-dispatcher makes my ears bleed. It's also hard to understand, just a mess of sound. Have you guys tried using RHVoice instead? It has much, MUCH better quality.

Out of the box, Speech Dispatcher doesn't sound great to my ear, but I can be confident that any users dependent on it will already have it configured with voices (you're not limited to the default!), speeds, volumes, etc. to match their own tastes and needs, which is important!

The understanding I've gained from talking to vision-impaired players is that the kind of robotic voice that Speech Dispatcher uses by default can end up being easier to parse at higher speed for people who are used to it. The analogy I often use to help explain this is the difference between a fancy script font that feels handwritten and full of personality, and a monospace or dyslexia-friendly font that's focused on readability but not very pretty by comparison: which one is easier to skim for useful information?

IMO, TTS as an accessibility tool (rather than a substitute for voice acting) is best viewed as interface rather than content - an insight that took me some time to internalise.

All that said, I don't rely on these features for my day-to-day computer usage and will happily defer to anybody who does.

And with all that said, from what I can see, RHVoice uses Speech Dispatcher on Linux, just with some different initial configuration loaded, which again, people who use this stuff will already have set up the way they need/like it to be.

@Calinou
Member

Calinou commented Feb 18, 2023

The understanding I've gained from talking to vision-impaired players is that the kind of robotic voice that Speech Dispatcher uses by default can end up being easier to parse at higher speed for people who are used to it.

This is indeed the case 🙂

Here's a sample of a screenreader used at such high speeds: https://s3.amazonaws.com/freecodecamp/screen-reader.mp3
It may sound like gibberish, but it is intelligible with proper training.

@ghost

ghost commented Feb 19, 2023

Out of the box, Speech Dispatcher doesn't sound great to my ear, but I can be confident that any users dependent on it will already have it configured with voices (you're not limited to the default!), speeds, volumes, etc. to match their own tastes and needs, which is important!

Sure, but I'm talking about voicing characters rather than the screen reader feature. In this particular case, character voices should sound the way the developer wants them to. You might suggest using pre-recorded .wav files, but what if I want characters to say the player's name as well?

It may sound like gibberish, but it is intelligible with proper training.

Eminem mode, lol

@Zireael07
Contributor

Sure, but I'm talking about voicing characters rather than the screen reader feature. In this particular case, character voices should sound the way the developer wants them to.

Current text to speech voices are very limited and not at all relevant to voicing characters.

@Calinou
Member

Calinou commented Feb 20, 2023

Text-to-speech is meant to be an accessibility or convenience feature (e.g. to read player text chat aloud).

TTS APIs aren't what you should use for voicing characters. You will need an entirely different solution, which is more complex to develop as there's no standard for this. AI-based voice synthesis is a thing but it needs to be done offline, as it requires lots of hardware resources.

@ghost

ghost commented Feb 24, 2023

You will need an entirely different solution, which is more complex to develop as there's no standard for this.

I realize that my suggestion may not be convenient for many people, but everyone should have a choice. It's more a matter of freedom; it's cheap and it's better than nothing. Besides, it might give the game a twist, making it unique. So why not?

Development: successfully merging this pull request may close the issue "Implement support for text-to-speech".