HowTo: Pause and Resume Speech Recognition with Microsoft engines

In the SpeechTurtle application, I’ve just added speech feedback (voicing of a command) when an available command is executed by clicking its name with the mouse.

That could also help the user learn the expected pronunciation in English in case the speech recognition engine doesn’t understand some of the commands as voiced by the user. One can assume most of what the Speech Synthesis engine outputs to be recognizable by the Speech Recognition engine.

One issue with this approach, though, is that the synthesized speech can accidentally trigger speech recognition, unless the recognition engine handles this case automatically by ignoring speech that the synthesis engine is generating in parallel.

In fact, this can also be a security issue: a malicious agent could deliver voice commands to your system via some audio or video file/stream they lure you into listening to or watching, or via some web page they lure you into visiting (even a web page that is not itself malicious might host a malicious ad served by an ad network).

So, we need some way to pause speech recognition while speaking, to avoid misfires. In my experience, the speech synthesis and recognition engines from .NET’s System.Speech namespace on recent Windows versions (I tried Windows 10) do have this issue.

In SpeechLib (which SpeechTurtle uses via the SpeechLib NuGet package), I’ve added Pause and Resume methods to the ISpeechRecognition interface (defined in the SpeechLib.Models project and its NuGet package, and implemented in the SpeechLib.Recognition and SpeechLib.Recognition.KinectV1 projects and NuGet packages).

So, in SpeechTurtle, I can do:

public void SpeakCommand(string command)
{   
  speechRecognition.Pause(); //pause the speech recognizer
  speechSynthesis.Speak(command);   
  speechRecognition.Resume();
}
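
If Speak throws, the code above never reaches Resume and recognition stays paused. A minimal sketch of a more defensive variant follows; PausedScope and its Run method are hypothetical names used for illustration only and are not part of SpeechLib:

```csharp
using System;

// Hypothetical helper (not part of SpeechLib): runs an action between a
// pause step and a resume step, resuming even when the action throws.
public static class PausedScope
{
    public static void Run(Action pause, Action action, Action resume)
    {
        pause();
        try
        {
            action(); // assumed to block until the work (e.g. speaking) is done
        }
        finally
        {
            resume(); // executed on both normal completion and exceptions
        }
    }
}
```

In SpeakCommand this could be called as PausedScope.Run(speechRecognition.Pause, () => speechSynthesis.Speak(command), speechRecognition.Resume), assuming Speak is synchronous as in the snippet above.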

Note the pattern used in SpeechRecognition.cs, which retries up to 10 times to pause the speech recognition engine, since errors are thrown if one tries to stop it or set its audio input to none while it is performing some recognition.

public void Pause()
{
  //(re)try 10 times
  //(since we wait 100 ms at failure below before retrying, max wait is 1000ms=1sec)
  for (int i = 0; i < 10; i++)
  {
    try
    {
      SetInputToNone();
      return; //exit retry loop if succeeded
    }
    catch //catch and ignore any error saying that recognition is currently running
    {
      Thread.Sleep(100); //retry in 100ms
    }
  }
}
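
The Pause method above gives up silently after the tenth failure, so the caller cannot tell whether the engine was actually paused. As a rough sketch, the same retry loop can be factored into a helper that reports the outcome; Retry.Try and its parameter names are assumptions made for this example, not SpeechLib code:

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of SpeechLib): generic form of the retry loop.
public static class Retry
{
    // Runs action up to maxAttempts times, sleeping delayMs after each failure.
    // Returns true as soon as one attempt succeeds, false if every attempt threw.
    public static bool Try(Action action, int maxAttempts, int delayMs)
    {
        for (int i = 0; i < maxAttempts; i++)
        {
            try
            {
                action();
                return true; // exit the retry loop on success
            }
            catch (Exception)
            {
                Thread.Sleep(delayMs); // wait before the next attempt
            }
        }
        return false; // all attempts failed
    }
}
```

With this, Pause could be written as Retry.Try(SetInputToNone, 10, 100) and return the result or log a warning when it is false.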

Update 1:

After more testing, it seems the above approach with the loop and try/catch won’t work if one uses the async versions of the Speech Recognition methods, since the exceptions are thrown from another thread. In that case one needs to add a global exception handler.

Update 2:

After lots of trial and error, I ended up with this working pattern for Pause and Resume in SpeechLib’s SpeechRecognition.cs (note that paused is a bool(ean) field of that class, defaulting to false, and PAUSE_LOOP_SLEEP is a const(ant) int(eger) set to 10 (msec)):

public void Pause()
{   
  paused = true;
  speechRecognitionEngine.RequestRecognizerUpdate();
}
 
public void Resume()
{   
  paused = false;
}

In the constructor of that SpeechRecognition class I do:

speechRecognitionEngine.RecognizerUpdateReached += (s, e) =>
{
  while (paused) Thread.Sleep(PAUSE_LOOP_SLEEP);
};

I loop in the RecognizerUpdateReached event handler to make sure the Speech Recognition thread waits until the paused field changes back to false. That event occurs after the call to RequestRecognizerUpdate in the Pause method (which is made after first setting paused=true there).
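
The polling loop reads the paused field from a different thread than the one that writes it, and it wakes up every PAUSE_LOOP_SLEEP msec while paused. One possible alternative, sketched here under the assumption that blocking inside the RecognizerUpdateReached handler is acceptable (as it is in the pattern above), is to wait on a ManualResetEventSlim instead. PauseGate is a hypothetical class name and is not part of SpeechLib:

```csharp
using System.Threading;

// Hypothetical helper (not part of SpeechLib): a gate that is open while
// running and closed while paused. Waiting threads block without polling.
public class PauseGate
{
    // Signaled (set) means "not paused"; the gate starts open.
    private readonly ManualResetEventSlim gate = new ManualResetEventSlim(true);

    public bool IsPaused
    {
        get { return !gate.IsSet; }
    }

    public void Pause()
    {
        gate.Reset(); // close the gate; later WaitWhilePaused calls will block
    }

    public void Resume()
    {
        gate.Set(); // open the gate; releases any blocked waiter
    }

    // Call from the thread that should hold still while paused,
    // for example from inside a RecognizerUpdateReached handler.
    public void WaitWhilePaused()
    {
        gate.Wait();
    }
}
```

In the SpeechRecognition class, Pause would close the gate and then call RequestRecognizerUpdate as before, Resume would open it, and the handler body would become a single call to WaitWhilePaused. I have not tested this variant against the speech engine itself.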
