Sunday, April 21, 2013

Introduction to Speech Capabilities in Windows Phone 8 – Part 3

Today I am posting the last part of this series. I hope Part 1 and Part 2 went well for you and that you tried out the speech capabilities of Windows Phone 8. We have already talked a lot about the different approaches to Text To Speech; in this part we will look at the exact reverse, Speech To Text. It is often considered very hard to implement, but the APIs in Windows Phone 8 make it pretty easy. So let's see how you can build such an app quickly.

Create a new Windows Phone 8 application project. I will leave the design to you; for now I have put this inside a Pivot item like this :

XAML :

<phone:PivotItem Header="Speech2Text" DoubleTap="LoadSpeechToText">               
</phone:PivotItem>

C# Code :

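private async void LoadSpeechToText(object sender, RoutedEventArgs e)
{
    // Requires: using Windows.Phone.Speech.Recognition;
    SpeechRecognizerUI myspeechRecognizer = new SpeechRecognizerUI();
    // Example phrases shown to the user on the listening screen
    myspeechRecognizer.Settings.ExampleText = "Ex. Call,Search,Run";
    // Heading shown while the recognizer is listening
    myspeechRecognizer.Settings.ListenText = "Listening...";
    ….
}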

These settings configure the popup in which you talk or give a command. Two more lines of code make the recognizer read the recognized text back to you and show a confirmation screen :

myspeechRecognizer.Settings.ReadoutEnabled = true;
myspeechRecognizer.Settings.ShowConfirmation = true;

Once you run this, you will see the regular speech popup like this :

Launching

But just showing this screen is not sufficient; we need to capture the text and display it to the user. For this we need to add a few more lines of code :

// Show the popup and wait for the user to finish speaking
SpeechRecognitionUIResult Speechresult = await myspeechRecognizer.RecognizeWithUIAsync();
if (Speechresult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
{
    MessageBox.Show(Speechresult.RecognitionResult.Text);
}

Now you can see the result in a MessageBox. While testing I said “Nokia”, and you can see the result on the screen.

Launchingheardsay

And here you can see the result in the MessageBox.

final

Now it is your decision where you want to use this piece of code. There are a lot of business cases where you can use this kind of speech recognition: you can use it to launch certain commands in your application, or to record voice and convert it to text for any purpose.

I hope you find this useful. This is the quick end of my Speech Capability series. I have kept it short, but in the coming days I am going to post a detailed article on these features built around a business case. Till then, happy coding! I will soon post a Calendar-related article, and we will see how to make the best use of that API.
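As one illustration of the command-launching idea mentioned above, here is a minimal sketch of mapping recognized text to an app command. The names VoiceCommand and VoiceCommandRouter are hypothetical (they are not part of the Windows Phone SDK), and the three commands are simply the ones from the ExampleText hint; treat it as a starting point under those assumptions rather than a finished design.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical set of commands the app understands (taken from the ExampleText hint)
public enum VoiceCommand { None, Call, Search, Run }

// Hypothetical helper: not an SDK class, just plain C#
public static class VoiceCommandRouter
{
    private static readonly Dictionary<string, VoiceCommand> Commands =
        new Dictionary<string, VoiceCommand>(StringComparer.OrdinalIgnoreCase)
        {
            { "call", VoiceCommand.Call },
            { "search", VoiceCommand.Search },
            { "run", VoiceCommand.Run }
        };

    // Looks only at the first word of the recognized text, ignoring case
    // and trailing punctuation such as "Call."
    public static VoiceCommand Match(string recognizedText)
    {
        if (string.IsNullOrWhiteSpace(recognizedText))
        {
            return VoiceCommand.None;
        }

        string firstWord = recognizedText.Trim().Split(' ')[0].TrimEnd('.', ',', '!', '?');

        VoiceCommand command;
        return Commands.TryGetValue(firstWord, out command) ? command : VoiceCommand.None;
    }
}
```

Inside the handler, instead of showing the MessageBox, you could call VoiceCommandRouter.Match(Speechresult.RecognitionResult.Text) and switch on the returned value. Anything that is not a known command comes back as VoiceCommand.None, so the app can fall back to showing the text.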

Vikram.

Tuesday, April 2, 2013

Introduction to Speech Capabilities in Windows Phone 8 – Part 2

I hope you enjoyed my last article on the speech capability in Windows Phone 8. Today I am posting another part, or you can say a little extension to what I did in Part 1.

In the first part we saw how to incorporate the built-in speech capability using the Speech APIs in the Windows Phone 8 SDK, and how they have an edge over earlier Windows Phone builds like 7 and above. We saw a simple Hello World kind of demo. Today I am going to demonstrate how to leverage SSML (Speech Synthesis Markup Language) using the Speech APIs in Windows Phone.

What is SSML ? :

As per W3C, SSML can be defined as :

SSML is part of a larger set of markup specifications for voice browsers developed through the open processes of the W3C. It is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications.

Possible Scenarios of SSML Implementation : This is very useful in a multilingual app where you need to implement Text To Speech for content in different languages. It also provides a high level of control over the grammar, the choice of language, a male or female voice and so on, with the help of the tags defined in SSML. So let's see a simple demo of incorporating SSML in Windows Phone 8. How you then use it in your app is your call!

Namespaces :

using Windows.Phone.Speech.Synthesis;

Design (XAML) :

<phone:PivotItem Header="SSML" DoubleTap="LoadSSML">
                <TextBlock x:Name="TTSSSML" HorizontalAlignment="Left" Height="500" Margin="33,26,0,0" TextWrapping="Wrap" VerticalAlignment="Top" Width="389"/>
</phone:PivotItem>

C# Code :

private async void LoadSSML(object sender,RoutedEventArgs e)
{ … }

I am using an async method here which has two parts. The first part just displays the markup in the TextBlock, and the second part actually reads that SSML markup aloud using the speech synthesizer. Here is the first part :

//Speech Synthesis Markup Language for Display
          TTSSSML.Text = @"<speak version=""1.0""
           xmlns=""http://www.w3.org/2001/10/synthesis"" xml:lang=""ja-JP"">
           <voice gender=""male"">       
               趣味は日本語を勉強することです
               趣味はいろんな新しい食べ物に挑戦することです
               パソコンいじりが得意なので、何か手伝えることがありましたら声をかけて下さい。               
           </voice>                       
           </speak>";

Here you can see the SSML markup. I admit I am not an SSML expert; I took these SSML tags from some research on the internet and spent a little time converting the text to Japanese (I actually can read and write Japanese :) ..it's a different story) instead of keeping it in simple English. In your scenario, all you need to do is change the xml:lang attribute from “ja-JP” to your own language, such as en-US, and try it out with content in that language. You can also choose a male or female voice with the gender attribute of the voice tag, for example <voice gender="female">.

The assumption is that you have speech enabled on your phone and that you have enabled the speech capability in the manifest file, as I demonstrated in my first article. The rest is just routine coding.

Now here is part two of this snippet. After looking at it, you will realize that I am hardly making any changes :

//Actual Speech in Japanese Language using SSML
            var ttsJP = new SpeechSynthesizer();
            await ttsJP.SpeakSsmlAsync(@"<speak version=""1.0""
            xmlns=""http://www.w3.org/2001/10/synthesis"" xml:lang=""ja-JP"">
            <voice gender=""male"">       
                趣味は日本語を勉強することです
                趣味はいろんな新しい食べ物に挑戦することです
                パソコンいじりが得意なので、何か手伝えることがありましたら声をかけて下さい。
            </voice>                      
            </speak>");

All set! Now just press F5 and enjoy! Here are a few screenshots if you are trying to visualize how it will look on the device.

In English version of SSML :

SSML

In Japanese version of SSML

JPSSML
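Both parts of the snippet above repeat the same SSML string, and switching language or gender means editing it by hand. Here is a minimal sketch of a small helper that builds the markup once. SsmlBuilder is a hypothetical name of my own (it is not an SDK class), and the sketch assumes your project references System.Xml.Linq.

```csharp
using System.Xml.Linq;

// Hypothetical helper: builds a <speak> document with a single <voice> element
public static class SsmlBuilder
{
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // language: for example "ja-JP" or "en-US"; gender: "male" or "female"
    public static string Build(string language, string gender, string text)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language),
            new XElement(Ssml + "voice",
                new XAttribute("gender", gender),
                text)); // XElement escapes characters such as & and < in the text

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

With this in place, LoadSSML could build the string once, for example string ssml = SsmlBuilder.Build("ja-JP", "male", yourText), then assign it to TTSSSML.Text and pass the same variable to SpeakSsmlAsync.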

That’s all! I hope you liked this part. So far, in both parts, we have seen the Text To Speech capability in a nutshell. In my next article, which might be the last in this short speech capability series, I am going to talk about Speech To Text. After these parts I will move to Maps for a while and then come back with a few more interesting, deep-dive articles. Till then..enjoy Windows Phone 8!

Vikram.

Introduction to Speech Capabilities in Windows Phone 8 – Part 1

After a long time..I am writing a blog post. I hope and wish that I will resume blogging like I used to in the past. Lots of things happened in the past few months. I changed my job, got married and what not! Well, life!

Today I am going to share a few things about the speech capabilities in Windows Phone 8. I haven't talked about this in the past for Windows Phone 7-7.5 because there were lots of limitations in this area, both in the APIs and in accuracy. With Windows Phone 8 things are totally different. Up to 7.5, speech was totally dependent on the Bing service, so a network or internet connection was mandatory. Now it works offline without any data/internet connection. Thanks to Microsoft for this improvement, and a little thanks to Microsoft MVPs like me! (little pat on the back) ..surprised? Well, I was a volunteer and part of a secret mission, and I am proud to have been associated with it. Our contribution was small compared to the efforts of the Microsoft Product Group members, but it was recognized at the recent Microsoft TechEd 2013 in Pune, India by Sanket Akerkar, Managing Director, Microsoft India.

SanketAkerkar

Well, let's come back to the main topic. I was actually planning to write one big article, but I now plan to break it into a few parts. So today let's build a Hello World type app to understand the TTS (Text To Speech) capabilities.

Initial Work :

Open a brand new Windows Phone Project from Visual Studio 2012

Open

Choose Windows Phone OS 8.0

OSChoice

Design :

<phone:Pivot Title="Speech Capability">
            <!--Pivot item one-->
            <phone:PivotItem Header="howdy">
                <Button x:Name="TTSHowdy" Content="Hello World !" HorizontalAlignment="Left" Width="456" Height="87" VerticalAlignment="Top" Margin="0,82,0,0" Click="TTSHowdy_Click"/>
            </phone:PivotItem>

</phone:Pivot>

I am putting it in a Pivot because I wish to demonstrate a couple more speech features within a single app. In your design you can very well change the layout.

Namespace Required :

using Windows.Phone.Speech.Synthesis;
using Windows.Phone.Speech.Recognition;

C# Code :

private async void TTSHowdy_Click(object sender, RoutedEventArgs e)
        {
            var TTS = new SpeechSynthesizer();
            await TTS.SpeakTextAsync("Welcome to Microsoft TechEd India 2013 in Pune");
        }

SpeakTextAsync is an async method with two overloads: one takes only the content, and the other takes the content and a user state object. In the same way, we can pass a long string or the text of a TextBlock to this method, and it will speak the content using the default voice installed on your phone.

Here is the output : (On actual device/emulator, you can hear the Sound )

howdy

Now, after this Hello World, let's build another Pivot item which will display as well as play all the voices installed on your phone. To showcase this, I am making use of the “Long List Selector” in my UI.

Design :

<phone:PivotItem Header="voices" DoubleTap="LoadTTSAllVoices">
               <phone:LongListSelector  x:Name="llstNames" HorizontalAlignment="Left" Width="456" Height="232" VerticalAlignment="Top" Margin="0,3,0,0"/>              
           </phone:PivotItem>

C# Code :

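// Requires: using System.Linq; (for ToList)
List<string> lstVoices = new List<string>();

private async void LoadTTSAllVoices(object sender, RoutedEventArgs e)
{
    // Start fresh so a repeated double tap does not duplicate entries
    lstVoices.Clear();

    // Get all the voices installed on the phone
    foreach (var voice in InstalledVoices.All)
    {
        lstVoices.Add(voice.DisplayName + ", " + voice.Language + ", " + voice.Gender);

        // Let each voice introduce itself
        using (var text2speech = new SpeechSynthesizer())
        {
            text2speech.SetVoice(voice);
            await text2speech.SpeakTextAsync("Hello world! I'm " + voice.DisplayName + ".");
        }

        // Rebind a copy of the list so the new entry shows up
        llstNames.ItemsSource = lstVoices.ToList();
    }
}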

Basically, this async method loops over the collection of installed voices. For each voice, it adds the voice's name, language and gender to a List<string>, picks that voice with the SetVoice method, and then uses the same SpeakTextAsync method we used above to read the text content. Finally it binds the list to the Long List Selector. So it reads the content and adds each voice to the list one after another.

Here is the Output : (On actual device/emulator, you can hear the Sound )

voices
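If you want one particular voice rather than all of them, the same collection can be filtered. Below is a minimal sketch of a helper that picks a voice by language, with a fallback to any voice that shares the primary language (for example en-US when en-GB is not installed). VoicePicker is a hypothetical name of my own, and it is written generically so that it does not depend on the phone SDK types; the fallback rule is my assumption about what is useful, not something the SDK does for you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: works on any sequence, the caller says how to read the language
public static class VoicePicker
{
    public static T PickByLanguage<T>(IEnumerable<T> voices, Func<T, string> languageOf, string language)
        where T : class
    {
        List<T> list = voices.ToList();

        // 1. Prefer an exact match such as "ja-JP"
        T exact = list.FirstOrDefault(v =>
            string.Equals(languageOf(v), language, StringComparison.OrdinalIgnoreCase));
        if (exact != null)
        {
            return exact;
        }

        // 2. Otherwise accept the same primary language, e.g. "en-US" for "en-GB"
        string primary = language.Split('-')[0] + "-";
        return list.FirstOrDefault(v =>
            (languageOf(v) ?? string.Empty).StartsWith(primary, StringComparison.OrdinalIgnoreCase));
    }
}
```

On the phone this could be called as VoicePicker.PickByLanguage(InstalledVoices.All, v => v.Language, "ja-JP"). The result is null when nothing matches, so check for that before passing it to SetVoice.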

So that is all I want to cover in Part 1. I will post more interesting stuff in upcoming parts; I am actually planning to post 2-3 more. Meanwhile you can try this out and keep these points in mind :

1. Your PC/laptop speakers should be on to hear the voices.

2. There are no separate SDKs or tools to install; these Speech APIs come by default with the Phone SDK.

3. You need to turn on the Microphone and Speech capability options in WMAppManifest.xml like this :

Capabilities 
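If you prefer editing the manifest by hand instead of using the designer, the two capabilities from point 3 look roughly like this inside the Capabilities element of WMAppManifest.xml. This is a sketch of only the relevant entries; any other capabilities your project template already lists stay as they are.

```xml
<Capabilities>
  <!-- Speech APIs -->
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
  <!-- Microphone access -->
  <Capability Name="ID_CAP_MICROPHONE" />
</Capabilities>
```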

Part 2 is already in progress, and you can expect a few more deep-dive posts on the speech capabilities in the coming parts. Do try out the capabilities above, enjoy, and feel free to share your feedback.

Vikram.