Introduction
The purpose of this article is to give you a small insight into the capabilities of the System.Speech assembly, in particular the usage of the SpeechRecognitionEngine class. The class is documented on MSDN.

Background
I read several articles about how to use text to speech, but when I wanted to find out how to do it the other way around, I realized that there is a lack of easily understandable articles covering this topic. So I decided to write a very basic one on my own and share my experiences with you.

The Solution
So now let's start. First of all, you need to reference the System.Speech assembly, which is located in the GAC, in your application. This is the only reference needed. It contains the following namespaces and their classes:
System.Speech.AudioFormat
System.Speech.Recognition
System.Speech.Recognition.SrgsGrammar
System.Speech.Synthesis
System.Speech.Synthesis.TtsEngine
The System.Speech.Recognition namespace contains the Windows Desktop Speech technology types for implementing speech recognition.
Before you can use the SpeechRecognitionEngine, you have to set up several properties and invoke some methods. In this case, I guess, code says more than words:
// the recognition engine
SpeechRecognitionEngine speechRecognitionEngine = null;
// create the engine with a custom method (described later)
speechRecognitionEngine = createSpeechEngine("de-DE");
// hook up the needed events
speechRecognitionEngine.AudioLevelUpdated +=
    new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
speechRecognitionEngine.SpeechRecognized +=
    new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);
// load a custom grammar, also described later
loadGrammarAndCommands();
// use the system's default microphone, you can also dynamically
// select audio input from devices, files, or streams.
speechRecognitionEngine.SetInputToDefaultAudioDevice();
// start listening in RecognizeMode.Multiple, that specifies
// that recognition does not terminate after completion.
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
Now, in detail, the function createSpeechEngine(string preferredCulture). The standard constructor and its overloads are the following:
SpeechRecognitionEngine(): Initializes a new instance using the default speech recognizer for the system.
SpeechRecognitionEngine(CultureInfo): Initializes a new instance using the default speech recognizer for a specified locale.
SpeechRecognitionEngine(RecognizerInfo): Initializes a new instance using the information in a RecognizerInfo object to specify the recognizer to use.
SpeechRecognitionEngine(String): Initializes a new instance of the class with a string parameter that specifies the name of the recognizer to use.
You can change the CultureInfo that is used by the SpeechRecognitionEngine, but as far as I know, this is only supported on Win7 Ultimate/Enterprise.
private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
{
    SpeechRecognitionEngine engine = null;
    foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (config.Culture.ToString() == preferredCulture)
        {
            engine = new SpeechRecognitionEngine(config);
            break;
        }
    }
    // if the desired culture is not installed, fall back to the system default
    if (engine == null)
    {
        engine = new SpeechRecognitionEngine();
        // report the culture of the recognizer that was actually created
        MessageBox.Show("The desired culture is not installed " +
            "on this machine, the speech engine will continue using " +
            engine.RecognizerInfo.Culture.ToString() +
            " as the default culture.", "Culture " + preferredCulture + " not found!");
    }
    return engine;
}
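Before picking a culture, it can help to see which recognizers are installed at all. The following is a minimal sketch, not part of the original application: the class name RecognizerLister and the method Describe are made up for illustration, and it only relies on SpeechRecognitionEngine.InstalledRecognizers() and the Culture and Name properties of RecognizerInfo. It assumes a Windows machine with the System.Speech assembly referenced.

```csharp
using System.Collections.Generic;
using System.Speech.Recognition;

// Hypothetical helper, not part of the article's application.
public static class RecognizerLister
{
    // Returns one entry per installed recognizer in the form
    // "<culture name>: <recognizer name>". The list is empty when
    // no recognizer is installed.
    public static List<string> Describe()
    {
        var lines = new List<string>();
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            lines.Add(info.Culture.Name + ": " + info.Name);
        }
        return lines;
    }
}
```

Writing the returned strings to the console or a log shows which values are valid candidates for the preferredCulture parameter of createSpeechEngine.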
The next step is to set up the Grammar that is loaded by the SpeechRecognitionEngine. In our case, we create a custom text file that contains key-value pairs of texts, which are wrapped in the custom class SpeechToText.Word. I chose this approach because I wanted to extend the usability of the program and give you a little showcase of what is possible with SAPI. It is interesting because it lets us associate texts or even commands with a recognized word. Here is the wrapper class SpeechToText.Word:
namespace SpeechToText
{
    public class Word
    {
        public Word() { }
        public string Text { get; set; }          // the word to be recognized by the engine
        public string AttachedText { get; set; }  // the text associated with the recognized word
        public bool IsShellCommand { get; set; }  // flag determining whether this word is a command or not
    }
}
Here is the method that sets up the Choices used by the Grammar. In the foreach loop, we create the Word instances, store them in a lookup List<Word> for later use, and add the text of each word to the Choices object. Finally, we build the Grammar with a GrammarBuilder and load it synchronously into the SpeechRecognitionEngine. You could also simply add strings to the Choices object by hand or load a predefined XML file. Now our engine is ready to recognize the predefined words.
private void loadGrammarAndCommands()
{
    // exceptions (for example a missing file) propagate to the caller
    Choices texts = new Choices();
    string[] lines = File.ReadAllLines(Path.Combine(Environment.CurrentDirectory, "example.txt"));
    foreach (string line in lines)
    {
        // skip comment lines and empty lines
        if (line.StartsWith("--") || line.Trim().Length == 0) continue;
        // split the line into text, attached text, and command flag
        var parts = line.Split(new char[] { '|' });
        // skip malformed lines that do not have all three fields
        if (parts.Length < 3) continue;
        // add word to the list for later lookup or execution
        words.Add(new Word() { Text = parts[0], AttachedText = parts[1],
            IsShellCommand = (parts[2] == "true") });
        // add the text to the known choices of the speech engine
        texts.Add(parts[0]);
    }
    Grammar wordsList = new Grammar(new GrammarBuilder(texts));
    speechRecognitionEngine.LoadGrammar(wordsList);
}
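The contents of example.txt are not shown here. Judging from the method above, each line holds three fields separated by '|' (the word, the attached text, and "true" or "false" for the command flag), and lines starting with "--" are comments. The sketch below isolates that parsing step so it can be tried without a microphone. The class name WordFileParser and the sample entries in the comment are assumptions for illustration only, and Word is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper; the sample lines below are made up:
//   -- greetings
//   hello|Hello, nice to meet you|false
//   notepad|notepad.exe|true
public static class WordFileParser
{
    // Converts "text|attached text|flag" lines into Word objects.
    // Comment lines ("--"), blank lines, and lines with fewer than
    // three fields are skipped instead of raising an exception.
    public static List<Word> Parse(IEnumerable<string> lines)
    {
        var words = new List<Word>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            if (line.StartsWith("--", StringComparison.Ordinal)) continue;

            string[] parts = line.Split('|');
            if (parts.Length < 3) continue;

            words.Add(new Word
            {
                Text = parts[0].Trim(),
                AttachedText = parts[1].Trim(),
                IsShellCommand = parts[2].Trim()
                    .Equals("true", StringComparison.OrdinalIgnoreCase)
            });
        }
        return words;
    }
}
```

The result of WordFileParser.Parse(File.ReadAllLines(path)) could then feed both the lookup list and the Choices object. Unlike the method above, this variant trims whitespace around the fields and accepts "True" as well as "true", which is a design choice rather than something the original code does.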
To start the SpeechRecognitionEngine, we call SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple). This means that the recognizer continues performing asynchronous recognition operations until the RecognizeAsyncCancel() or RecognizeAsyncStop() method is called. To retrieve the result of an asynchronous recognition operation, attach an event handler to the recognizer's SpeechRecognized event. The recognizer raises this event whenever it successfully completes a synchronous or asynchronous recognition operation.
// attach event handler
speechRecognitionEngine.SpeechRecognized +=
    new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);
// start recognition
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
// Recognized-event
void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    txtSpoken.Text += "\r" + getKnownTextOrExecute(e.Result.Text);
    scvText.ScrollToEnd();
}
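To try the whole cycle outside of the sample application, the following console sketch puts the steps together: it adds phrases to the Choices by hand, starts asynchronous recognition, and cancels it again. It is a rough sketch under several assumptions. The class MinimalRecognizer, the phrases "hello" and "notepad", and the confidence threshold of 0.7 are made up for illustration. Setting GrammarBuilder.Culture to the culture of the recognizer is a precaution, because as far as I know LoadGrammar rejects a grammar whose culture differs from that of the recognizer.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical console sketch, not part of the article's application.
public static class MinimalRecognizer
{
    // Assumed threshold; tune it for your microphone and vocabulary.
    public const float MinConfidence = 0.7f;

    public static bool IsConfident(float confidence)
    {
        return confidence >= MinConfidence;
    }

    // Builds a grammar from hand-written phrases for the given engine.
    public static Grammar BuildGrammar(SpeechRecognitionEngine engine, params string[] phrases)
    {
        var builder = new GrammarBuilder();
        // use the culture of the recognizer for the grammar as well
        builder.Culture = engine.RecognizerInfo.Culture;
        builder.Append(new Choices(phrases));
        return new Grammar(builder);
    }

    public static void Run()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(BuildGrammar(engine, "hello", "notepad"));

            engine.SpeechRecognized += (sender, e) =>
            {
                // Result.Confidence is the score the recognizer assigns to the result
                if (IsConfident(e.Result.Confidence))
                    Console.WriteLine("Recognized: " + e.Result.Text);
                else
                    Console.WriteLine("Ignored (low confidence): " + e.Result.Text);
            };
            engine.SpeechRecognitionRejected += (sender, e) =>
                Console.WriteLine("Speech was detected but not matched.");

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Listening. Press Enter to stop.");
            Console.ReadLine();

            // cancel right away instead of waiting for the current operation
            engine.RecognizeAsyncCancel();
        }
    }
}
```

Calling MinimalRecognizer.Run() from a Main method starts listening on the default microphone. RecognizeAsyncStop() could be used instead of RecognizeAsyncCancel() if the current recognition operation should be allowed to finish first.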
And here comes the gimmick of this application: when the engine recognizes one of our predefined words, we decide whether to return the associated text or to execute a shell command. This is done in the following function:
private string getKnownTextOrExecute(string command)
{
    try
    {
        // use a little bit of LINQ for our lookup list
        var cmd = words.FirstOrDefault(c => c.Text == command);
        // unknown word: return the recognized text unchanged
        if (cmd == null) return command;
        if (cmd.IsShellCommand)
        {
            Process proc = new Process();
            proc.EnableRaisingEvents = false;
            proc.StartInfo.FileName = cmd.AttachedText;
            proc.Start();
            return "you just started : " + cmd.AttachedText;
        }
        return cmd.AttachedText;
    }
    catch (Exception)
    {
        // starting the process failed: fall back to the recognized text
        return command;
    }
}
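For a handful of words, searching the list is perfectly fine. If the word list grows, a dictionary keyed by the recognized text is one possible alternative. The sketch below is only an illustration of that idea: WordLookup and TryFind are hypothetical names, the case-insensitive comparison is a design choice of this sketch rather than the behavior of the function above, and Word is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper, not part of the article's application.
public class WordLookup
{
    private readonly Dictionary<string, Word> byText =
        new Dictionary<string, Word>(StringComparer.OrdinalIgnoreCase);

    public WordLookup(IEnumerable<Word> words)
    {
        foreach (Word word in words)
        {
            if (word == null || word.Text == null) continue;
            // a later duplicate replaces an earlier entry
            byText[word.Text] = word;
        }
    }

    // Returns true and sets 'word' when the recognized text is known;
    // returns false and sets 'word' to null otherwise.
    public bool TryFind(string recognizedText, out Word word)
    {
        if (recognizedText == null)
        {
            word = null;
            return false;
        }
        return byText.TryGetValue(recognizedText, out word);
    }
}
```

The lookup would be built once after loading the grammar, and the event handler would call TryFind instead of searching the list for every recognized word.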
That is it! There are plenty of other possibilities to use the SAPI for, maybe a Visual Studio plug-in for coding? Let me know what ideas you guys have! I hope you enjoyed my first article.