Feat: Voice activity threshold #2556

Closed
Changes from 74 commits
78 commits
d92a209
wip: voice activity detection
hugohutri Jul 23, 2022
aed1a05
feat: voice activity detection
hugohutri Aug 1, 2022
7572556
feat: voice activity slider
hugohutri Aug 1, 2022
7ad7ee2
style: remove unrelated code
hugohutri Aug 2, 2022
0d04f77
feat: voice activity cooldown debounce
hugohutri Aug 13, 2022
3aa93d8
wip: voice activity detection
hugohutri Jul 23, 2022
9f6d7ad
feat: voice activity detection
hugohutri Aug 1, 2022
44ddb40
feat: voice activity slider
hugohutri Aug 1, 2022
418c207
style: remove unrelated code
hugohutri Aug 2, 2022
57a6f05
chore: add prettier config
hugohutri Aug 15, 2022
c0380d3
style: remove legacy code
hugohutri Aug 15, 2022
2a5e074
style: add newline
hugohutri Aug 15, 2022
f440ce4
style: remove legacy code
hugohutri Aug 15, 2022
1ff489b
fix: calling undefined
hugohutri Aug 17, 2022
7fff660
test: add voice activation tests
hugohutri Aug 24, 2022
587bb1d
style: fix typos
hugohutri Aug 24, 2022
1f0c2d1
style: prettier changes
hugohutri Aug 24, 2022
a367e10
fix formatting
DashieTM Aug 25, 2022
0a98fbe
fix VAD
DashieTM Aug 25, 2022
512cf68
Update yarn.lock
DashieTM Aug 25, 2022
23310a5
fix VAD
DashieTM Aug 25, 2022
d28a845
fix VAD
DashieTM Aug 25, 2022
a68e525
Update src/webrtc/groupCall.ts
DashieTM Aug 25, 2022
17ca726
Merge pull request #1 from DashieTM/robertlong/group-call
hugohutri Aug 25, 2022
df8665c
fix more formatting
DashieTM Aug 25, 2022
3f1c406
fix more formatting
DashieTM Aug 25, 2022
85503c2
Delete .prettierrc
DashieTM Aug 25, 2022
5e645a9
fix more formatting
DashieTM Aug 25, 2022
bf94307
fix more formatting
DashieTM Aug 25, 2022
2f6e5e5
fix more formatting
DashieTM Aug 25, 2022
5b00df1
lower polling rate
DashieTM Aug 25, 2022
31e0764
fix indentation
DashieTM Aug 25, 2022
3be467a
fix indentation
DashieTM Aug 25, 2022
55122f7
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Aug 25, 2022
e224eeb
chore: remove leftover code
DashieTM Aug 25, 2022
e3577aa
chore: remove leftover code
DashieTM Aug 25, 2022
0f66073
chore: fix comments
DashieTM Aug 26, 2022
6839e85
chore: fix various code oversights
DashieTM Aug 26, 2022
d997d95
chore: fix various code oversights
DashieTM Aug 26, 2022
f31db86
chore: fix various code oversights
DashieTM Aug 26, 2022
5748253
chore: set threshold to -100
DashieTM Sep 1, 2022
6346873
chore: remove blank padding
DashieTM Sep 1, 2022
c505b00
fix delay on treshold
DashieTM Sep 14, 2022
6368ac2
fix typos
DashieTM Sep 21, 2022
519f1e5
wip: move to props
hugohutri Sep 21, 2022
ae1c3c8
Update src/webrtc/callFeed.ts
DashieTM Oct 6, 2022
922f341
Update src/webrtc/groupCall.ts
DashieTM Oct 6, 2022
84e6304
refactor: streams
DashieTM Oct 6, 2022
db4dae4
refactor: streams
DashieTM Oct 6, 2022
bf35af7
refactor: add vad event
hugohutri Oct 6, 2022
3ac74e2
fix: undefined localcallfeed
hugohutri Oct 6, 2022
4b920b1
style: rename cooldown and remove comments
hugohutri Oct 6, 2022
b546985
test: add voice activity cooldown tests
hugohutri Oct 6, 2022
6db68a9
update branch
DashieTM Oct 8, 2022
b160979
update branch
DashieTM Oct 8, 2022
e5ea935
refactor: add groupcall method for treshold
hugohutri Oct 8, 2022
9721750
style: remove unused file
hugohutri Oct 8, 2022
a9ea691
refactor: remove legacy code
DashieTM Oct 8, 2022
3166245
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 8, 2022
30b75fd
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 8, 2022
44bbf2a
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 8, 2022
217ed4f
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 8, 2022
be4fac6
refactor: Mute Cooldown on stop speaking
DashieTM Oct 8, 2022
0b5a908
style: change naming
DashieTM Oct 8, 2022
6b43fea
style: add comments to tests
DashieTM Oct 8, 2022
cc6f7c9
style: unify test style
DashieTM Oct 8, 2022
7d6122b
style: add gainnode / delaynode comment
DashieTM Oct 8, 2022
266893e
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 9, 2022
dc92b26
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 9, 2022
96ad11a
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 9, 2022
28fdc47
Update src/webrtc/callFeed.ts
DashieTM Oct 9, 2022
b5ac607
refactor: constant audio delay
DashieTM Oct 9, 2022
ce3254b
refactor: constant audio delay
DashieTM Oct 9, 2022
eaead7d
refactor: vad tests
DashieTM Oct 9, 2022
b7c0e9e
Update spec/unit/webrtc/callFeed.spec.ts
DashieTM Oct 9, 2022
24862b6
refactor: vad tests
DashieTM Oct 9, 2022
bee03fb
refactor: vad tests
DashieTM Oct 9, 2022
b37c93b
gfix: improve delay and increase cooldown
DashieTM Oct 9, 2022
120 changes: 120 additions & 0 deletions spec/unit/webrtc/callFeed.spec.ts
@@ -85,5 +85,125 @@ describe("CallFeed", () => {
expect(feed.isVideoMuted()).toBeTruthy();
});
});

describe("voice activity detection", () => {
it("If voice activity is disabled we should not mute/unmute with it", () => {
// @ts-ignore Mock
feed.stream.addTrack(new MockMediaStreamTrack("track", "audio", true));

// set threshold to infinity, this ensures we never hit the threshold.
feed.setVoiceActivityThreshold(Infinity);
// the number doesn't matter, anything below infinity is ok.
feed.speakingVolumeSamples = [-60];
// we expect no mute/unmute behavior from vad from here on.
feed.VADEnabled = false;

setTimeout(() => {
expect(feed.stream.getAudioTracks()[0].enabled).toBe(true);
}, 1000);
});

it("enables track when volume is above threshold", () => {
// @ts-ignore Mock
feed.stream.addTrack(new MockMediaStreamTrack("track", "audio", true));

// set the threshold and the samples to ensure the user is unmuted at the start.
feed.setVoiceActivityThreshold(-80);
feed.speakingVolumeSamples = [-60];

// the user is at -60 dB, which is louder than the -80 dB threshold, so the user should be unmuted.
setTimeout(() => {
expect(feed.stream.getAudioTracks()[0].enabled).toBe(true);
}, 1000);
});

it("disables track when volume is below threshold", () => {
// @ts-ignore Mock
feed.stream.addTrack(new MockMediaStreamTrack("track", "audio", true));

// set the threshold and the samples to ensure the user is muted at the start.
feed.setVoiceActivityThreshold(-80);
feed.speakingVolumeSamples = [-90];

// the user is too quiet, user should be muted.
setTimeout(() => {
expect(feed.stream.getAudioTracks()[0].enabled).toBe(false);
}, 1000);
});

it("should not disable audio track after a few milliseconds", async () => {
// Someone speaks
// Stops speaking for a few milliseconds
// -> Is not muted before cooldown -> (VAD_COOLDOWN)

// @ts-ignore Mock
feed.stream.addTrack(new MockMediaStreamTrack("track", "audio", true));

// set the threshold and the samples to ensure the user is unmuted at the start.
feed.setVoiceActivityThreshold(-80);
feed.speakingVolumeSamples = [-60];

// pretend the user is silent after 100ms
setTimeout(() => {
feed.speakingVolumeSamples = [-Infinity];
}, 100);

// the user should still be unmuted after another 50ms.
// Cooldown is 200ms, so this is within the range.
setTimeout(() => {
expect(feed.stream.getAudioTracks()[0].enabled).toBe(true);
}, 150);
});

it("should disable audio track after cooldown", async () => {
// Someone speaks
// Stops speaking
// -> Is muted after cooldown -> (VAD_COOLDOWN)

//@ts-ignore Mock
feed.stream.addTrack(new MockMediaStreamTrack("track", "audio", true));

// set the threshold and the samples to ensure the user is unmuted at the start.
feed.setVoiceActivityThreshold(-80);
feed.speakingVolumeSamples = [-60];

// pretend the user is silent after 100ms
setTimeout(() => {
feed.speakingVolumeSamples = [-Infinity];
}, 100);

// The user should be muted after another 210ms.
// 200ms is the cooldown, so we are outside of that range.
setTimeout(() => {
expect(feed.stream.getAudioTracks()[0].enabled).toBe(false);
}, 310);
});

it("cooldown should be reset when speaking", async () => {
// Cooldown is reset after speaking again

// @ts-ignore Mock
feed.stream.addTrack(new MockMediaStreamTrack("track", "audio", true));

// set the threshold and the samples to ensure the user is unmuted at the start.
feed.setVoiceActivityThreshold(-80);
feed.speakingVolumeSamples = [-60];

// pretend the user is silent after 100ms
setTimeout(() => {
feed.speakingVolumeSamples = [-Infinity];
}, 100);

// pretend the user starts speaking again after another 100ms
setTimeout(() => {
feed.speakingVolumeSamples = [-60];
}, 200);

// after another 110ms (310ms in total), check that the user is still unmuted.
setTimeout(() => {
expect(feed.stream.getAudioTracks()[0].enabled).toBe(true);
}, 310);
});
});
});
});
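
The timing tests above place their `expect` calls inside real `setTimeout` callbacks, and nothing in the test body waits for them, so a test runner may finish the test before the assertion runs. Jest's fake timers (`jest.useFakeTimers()` with `jest.advanceTimersByTime()`) are one common way around that. The sketch below is a minimal hand-rolled illustration of the same idea and is not code from this PR: `ManualClock`, its methods and the `log` array are hypothetical names.

```typescript
// Hypothetical stand-in for a fake-timer facility; not part of this PR or of Jest.
class ManualClock {
    private now = 0;
    private queue: { at: number; fn: () => void }[] = [];

    // Schedule a callback relative to the clock's current (virtual) time.
    public setTimeout(fn: () => void, delayMs: number): void {
        this.queue.push({ at: this.now + delayMs, fn });
    }

    // Move virtual time forward and run every callback that has become due, in time order.
    public advance(ms: number): void {
        const target = this.now + ms;
        for (;;) {
            const due = this.queue.filter((t) => t.at <= target).sort((a, b) => a.at - b.at)[0];
            if (!due) break;
            this.queue.splice(this.queue.indexOf(due), 1);
            this.now = due.at;
            due.fn();
        }
        this.now = target;
    }
}

const clock = new ManualClock();
const log: string[] = [];

// Same shape as the cooldown tests: go silent at 100ms, check at 150ms.
clock.setTimeout(() => log.push("silent at 100ms"), 100);
clock.setTimeout(() => log.push("checked at 150ms"), 150);

clock.advance(120); // only the 100ms callback is due
const ranAfterFirstAdvance = log.length;
clock.advance(50); // virtual time is now 170ms, so the 150ms callback runs too
```

Because the callbacks run synchronously inside `advance`, an assertion placed in them is evaluated before the test function returns.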
104 changes: 94 additions & 10 deletions src/webrtc/callFeed.ts
@@ -21,9 +21,12 @@ import { RoomMember } from "../models/room-member";
import { logger } from "../logger";
import { TypedEventEmitter } from "../models/typed-event-emitter";

-const POLLING_INTERVAL = 200; // ms
+const POLLING_INTERVAL = 1; // ms
export const SPEAKING_THRESHOLD = -60; // dB
const VAD_THRESHOLD = -100; // dB
const SPEAKING_SAMPLE_COUNT = 8; // samples
const VAD_COOLDOWN = 200; // ms
const VAD_AUDIO_DELAY = 0.001; // s (= 1 ms); createDelay() takes seconds

export interface ICallFeedOpts {
client: MatrixClient;
@@ -39,6 +42,11 @@ export interface ICallFeedOpts {
* Whether or not the remote SDPStreamMetadata says video is muted
*/
videoMuted: boolean;
/**
* Mute the tracks while the volume is below the voice activity threshold.
* This does not show the user as muted.
*/
VADEnabled?: boolean;
}

export enum CallFeedEvent {
@@ -47,6 +55,8 @@ export enum CallFeedEvent {
LocalVolumeChanged = "local_volume_changed",
VolumeChanged = "volume_changed",
Speaking = "speaking",
VoiceActivityThresholdChanged = "voice_activity_threshold_changed",
VADMuteStateChanged = "vad_mute_state_changed",
Disposed = "disposed",
}

@@ -56,19 +66,25 @@ type EventHandlerMap = {
[CallFeedEvent.LocalVolumeChanged]: (localVolume: number) => void;
[CallFeedEvent.VolumeChanged]: (volume: number) => void;
[CallFeedEvent.Speaking]: (speaking: boolean) => void;
[CallFeedEvent.VoiceActivityThresholdChanged]: (threshold: number) => void;
[CallFeedEvent.VADMuteStateChanged]: (VADMuted: boolean) => void;
[CallFeedEvent.Disposed]: () => void;
};

export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap> {
public stream: MediaStream;
public volumeLooperStream: MediaStream;
public sdpMetadataStreamId: string;
public userId: string;
public purpose: SDPStreamMetadataPurpose;
public speakingVolumeSamples: number[];
public voiceActivityThreshold: number;
public VADEnabled = false;
public maxVolume = -Infinity;

private client: MatrixClient;
private roomId: string;
private audioMuted: boolean;
private vadAudioMuted: boolean;
private videoMuted: boolean;
private localVolume = 1;
private measuringVolumeActivity = false;
@@ -77,7 +93,15 @@ export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap>
private frequencyBinCount: Float32Array;
private speakingThreshold = SPEAKING_THRESHOLD;
private speaking = false;
private vadSpeaking = false;
private audioDelay: DelayNode;
private gainNode: GainNode;
private volumeLooperTimeout: ReturnType<typeof setTimeout>;
/**
* Start of the voice activity detection cooldown. We do not mute immediately when the user
* stops speaking, only once they have been silent for VAD_COOLDOWN (200 ms).
*/
private VADCooldownStarted = new Date();
private _disposed = false;

constructor(opts: ICallFeedOpts) {
@@ -91,14 +115,26 @@ export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap>
this.videoMuted = opts.videoMuted;
this.speakingVolumeSamples = new Array(SPEAKING_SAMPLE_COUNT).fill(-Infinity);
this.sdpMetadataStreamId = opts.stream.id;
this.voiceActivityThreshold = VAD_THRESHOLD;

this.updateStream(null, opts.stream);

this.VADEnabled = opts.VADEnabled ?? false;

if (this.hasAudioTrack) {
this.initVolumeMeasuring();
}
}

public setVoiceActivityThreshold(threshold: number): void {
this.voiceActivityThreshold = threshold;
if (threshold === -100) {
this.VADEnabled = false;
} else {
this.VADEnabled = true;
}
}

private get hasAudioTrack(): boolean {
return this.stream.getAudioTracks().length > 0;
}
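
In `setVoiceActivityThreshold` above, the value `-100` doubles as an "off" switch: any other threshold turns VAD on as a side effect. A minimal sketch of that rule in isolation follows. `VAD_DISABLED_THRESHOLD`, `VadSettings` and `applyThreshold` are hypothetical names used only for illustration; the PR itself compares against the literal `-100`.

```typescript
// Hypothetical name for the sentinel; the PR uses the literal -100 (the VAD_THRESHOLD default).
const VAD_DISABLED_THRESHOLD = -100; // dB

interface VadSettings {
    threshold: number; // dB
    enabled: boolean;
}

// Mirrors the rule in setVoiceActivityThreshold: only the sentinel value disables VAD.
function applyThreshold(threshold: number): VadSettings {
    return { threshold, enabled: threshold !== VAD_DISABLED_THRESHOLD };
}

const sliderAtMinimum = applyThreshold(-100);
const sliderAtMinus80 = applyThreshold(-80);
const sliderAtInfinity = applyThreshold(Infinity);
```

This is why the first test has to set `VADEnabled = false` after calling `setVoiceActivityThreshold(Infinity)`: the call itself switches VAD on.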
@@ -132,9 +168,23 @@ export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap>
this.analyser.fftSize = 512;
this.analyser.smoothingTimeConstant = 0.1;

-const mediaStreamAudioSourceNode = this.audioContext.createMediaStreamSource(this.stream);
+this.volumeLooperStream = this.stream.clone();
+const mediaStreamAudioSourceNode = this.audioContext.createMediaStreamSource(this.volumeLooperStream);
mediaStreamAudioSourceNode.connect(this.analyser);

// When we handle audio manually we need a GainNode, a stream source node and a destination.
// We also add a DelayNode so that VAD is enabled / disabled at the right time.
// The DelayNode needs to be kept on the class to ensure it functions.
// Without the GainNode, the audio does not appear to work correctly.
const streamNode = this.audioContext.createMediaStreamSource(this.stream);
this.audioDelay = this.audioContext.createDelay(VAD_AUDIO_DELAY);
this.gainNode = this.audioContext.createGain();
streamNode.connect(this.gainNode);
this.gainNode.connect(this.audioDelay);
const destination = this.audioContext.createMediaStreamDestination();
this.audioDelay.connect(destination);
this.stream = destination.stream;

this.frequencyBinCount = new Float32Array(this.analyser.frequencyBinCount);
}

@@ -207,6 +257,13 @@ export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap>
}
if (videoMuted !== null) this.videoMuted = videoMuted;
this.emit(CallFeedEvent.MuteStateChanged, this.audioMuted, this.videoMuted);
this.VADEnabled = !audioMuted;
}

public setVoiceActivityDetectionMuteLocal(audioMuted: boolean | null): void {
if (audioMuted !== null) {
this.vadAudioMuted = audioMuted;
}
}

/**
@@ -237,17 +294,39 @@ export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap>

this.analyser.getFloatFrequencyData(this.frequencyBinCount);

-let maxVolume = -Infinity;
+let maxCurrentVolume = -Infinity;
for (let i = 0; i < this.frequencyBinCount.length; i++) {
-if (this.frequencyBinCount[i] > maxVolume) {
-maxVolume = this.frequencyBinCount[i];
+if (this.frequencyBinCount[i] > maxCurrentVolume) {
+maxCurrentVolume = this.frequencyBinCount[i];
}
}

+this.maxVolume = maxCurrentVolume;
this.speakingVolumeSamples.shift();
-this.speakingVolumeSamples.push(maxVolume);
+this.speakingVolumeSamples.push(this.maxVolume);

-this.emit(CallFeedEvent.VolumeChanged, maxVolume);
+this.emit(CallFeedEvent.VolumeChanged, this.maxVolume);

// Handle voice activity detection only if it is enabled and user has not manually muted themselves
if (this.VADEnabled) {
// If the user is speaking
if (this.maxVolume > this.voiceActivityThreshold) {
this.vadSpeaking = true;

if (this.vadAudioMuted) {
this.emit(CallFeedEvent.VADMuteStateChanged, false);
}
} else if (!this.vadAudioMuted) {
// User stops speaking

if (this.vadSpeaking) {
this.VADCooldownStarted = new Date();
this.vadSpeaking = false;
} else if (!this.isVADInCooldown()) {
// the user has been silent for at least VAD_COOLDOWN milliseconds
this.emit(CallFeedEvent.VADMuteStateChanged, true);
}
}
}

let newSpeaking = false;
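
The hunk above takes the loudest frequency bin reported by the analyser (the values are in dB) as the current volume and compares it with the VAD threshold. A minimal, self-contained sketch of that step is shown here. `peakDecibels` and `isAboveThreshold` are hypothetical helper names; the PR performs the loop inline in `volumeLooper`.

```typescript
// Hypothetical helper: returns the loudest bin, or -Infinity for an empty array.
function peakDecibels(bins: Float32Array): number {
    let peak = -Infinity;
    bins.forEach((level) => {
        if (level > peak) peak = level;
    });
    return peak;
}

// Same comparison as `this.maxVolume > this.voiceActivityThreshold` in the diff.
function isAboveThreshold(bins: Float32Array, thresholdDb: number): boolean {
    return peakDecibels(bins) > thresholdDb;
}

const exampleBins = new Float32Array([-90, -42.5, -70]);
const examplePeak = peakDecibels(exampleBins);
const speakingAtMinus80 = isAboveThreshold(exampleBins, -80);
const speakingAtMinus40 = isAboveThreshold(exampleBins, -40);
```

Note that a single loud bin is enough to count as speaking under this rule, since only the maximum is compared.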

@@ -268,6 +347,11 @@ export class CallFeed extends TypedEventEmitter<CallFeedEvent, EventHandlerMap>
this.volumeLooperTimeout = setTimeout(this.volumeLooper, POLLING_INTERVAL);
};

private isVADInCooldown(): boolean {
return new Date().getTime() - this.VADCooldownStarted.getTime() < VAD_COOLDOWN;
}

public clone(): CallFeed {
const mediaHandler = this.client.getMediaHandler();
const stream = this.stream.clone();
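
The VAD branch in `volumeLooper` together with `isVADInCooldown` amounts to a small state machine: unmute as soon as the volume exceeds the threshold, start a cooldown on the first silent sample, and mute only once the cooldown has elapsed. The sketch below restates that logic as a pure function with the current time passed in, which makes it testable without real timers. It is a simplification under stated assumptions: `VadState`, `stepVad` and `COOLDOWN_MS` are hypothetical names, and the mute flag is updated directly here, whereas the PR emits `VADMuteStateChanged` and leaves the actual muting to the listener.

```typescript
const COOLDOWN_MS = 200; // mirrors VAD_COOLDOWN in the diff

interface VadState {
    muted: boolean; // whether VAD currently mutes the track
    speaking: boolean; // whether the previous sample was above the threshold
    cooldownStartedAt: number; // ms timestamp of the last speaking -> silent transition
}

// One polling step. Assumption: muting is folded into the returned state instead of an event.
function stepVad(state: VadState, volumeDb: number, thresholdDb: number, nowMs: number): VadState {
    if (volumeDb > thresholdDb) {
        // Speaking: unmute immediately.
        return { ...state, speaking: true, muted: false };
    }
    if (state.muted) return state; // already muted, nothing to do
    if (state.speaking) {
        // First silent sample: start the cooldown instead of muting.
        return { ...state, speaking: false, cooldownStartedAt: nowMs };
    }
    const inCooldown = nowMs - state.cooldownStartedAt < COOLDOWN_MS;
    return inCooldown ? state : { ...state, muted: true };
}

let s: VadState = { muted: false, speaking: false, cooldownStartedAt: 0 };
s = stepVad(s, -60, -80, 0); // speaking
s = stepVad(s, -Infinity, -80, 100); // goes silent, cooldown starts at 100ms
const at250 = stepVad(s, -Infinity, -80, 250); // 150ms of silence: still within the cooldown
const at310 = stepVad(s, -Infinity, -80, 310); // 210ms of silence: cooldown has elapsed
const resumed = stepVad(at310, -60, -80, 320); // speaking again
```

The timestamps follow the scenarios in `callFeed.spec.ts`: silence starting at 100ms is not muted within the cooldown but is muted by 310ms.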