Alexa Voice

Notes from Voice Summit 2019

I spent part of this past week at the massive Voice Summit 2019. Thousands attended to learn and share their experience in the voice space, and attendees and speakers came from a broad set of backgrounds. Many of the Summit's talks and workshops were given by the companies behind the smart speakers and digital assistants themselves, like Amazon with Alexa and Google with Assistant. In addition to helping market the event, Amazon Alexa staffed a lot of workshops and breakout sessions, most of which were aimed at helping developers hone their skills in the voice space. Never used AWS services to boost your skill's feature set? They held sessions to walk attendees through exactly what to do. Samsung's Bixby and Microsoft's Cortana also made appearances at the Summit, hosting workshops and announcing new feature releases on the big stage.

The major voice platforms are each taking a different approach to their devices, but one big takeaway from the event is the area where they all agree: these devices require serious computing services. Between storage, compute power, and scalability, the main platforms concur that you need a strong back-end to make a powerful voice application. Amazon through its AWS services, Google through its Cloud Platform, and Microsoft through Azure all offer options to support an amazing voice experience. It became a major theme of the Summit: voice works because of a combination of technologies. Fast 4G or WiFi lets voice devices communicate with servers in the cloud to quickly give the user helpful information, and it is the combination of all these pieces that makes the most compelling voice applications. That was one side of the conference; the other was all about crafting great dialog.

The consensus seemed to be that to make a compelling voice application, you need great computing and you need great dialog. Voice applications are easy enough to build and ship; they are hard to make helpful and sticky. On Alexa alone, the vast majority of skills (over 90%) are used once by a user who never returns. The emphasis on computing holds: you need useful information and tasks for a compelling skill. But beyond that, you need to engage the user in a compelling way, and the voice interaction needs to be quick and to the point.

Alexa and other devices let you replace the native assistant's voice with actual human voice-overs. The number of voice actors at the Summit was quite astonishing; I counted at least five sessions focused on voice acting for the voice assistant age. Speakers talked about how the use of a human voice differentiated their voice applications and made them feel more natural.

Many other speakers talked about how they use context to speed up their voice experiences. By remembering users, skills can quickly repeat actions without asking for the same data again. Every time I check the weather, I shouldn't need to tell Alexa where I am. Voice applications that take advantage of this have succeeded. Voice applications even allow for third-party integration: want to know the balance of your bank account? Enter your username and password once, then get the answer quickly every time you ask. As with other applications driven by user experience, the voice applications that reduce customer friction have found the most success to date. Just because it's called Conversational UI doesn't mean you have to architect a half-hour gab session just to get some basic information from your device. Interactions designed for efficiency are getting the most utilization.
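
To make that concrete, here is a minimal sketch of what context-keeping can look like on Alexa. A skill's JSON response can carry sessionAttributes, which the Alexa service echoes back on every subsequent request in the same session; the attribute name and city below are illustrative, and remembering a user across sessions would additionally require persistent storage on the back-end.

    {
      "version": "1.0",
      "sessionAttributes": {
        "homeCity": "Philadelphia"
      },
      "response": {
        "outputSpeech": {
          "type": "PlainText",
          "text": "Right now in Philadelphia it is 72 degrees and sunny."
        },
        "shouldEndSession": false
      }
    }

Once the skill is holding homeCity, follow-up turns in the conversation can skip the "What city?" question entirely.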

Find out more about how voice could help you by reaching out to our team of Alexa Subject Matter Experts, VUI designers, and developers:

https://www.bluefintechnologypartners.com/voice-interface-development

5 Tips for Giving Alexa a New Voice

Do you know someone who would make for a great Alexa voice? As much as we enjoy a good joke from Alexa or a quick update on the weather, her voice playback doesn't always provide the unique brand experience you might want to create. Fortunately, with the right voice talent, the platform allows you to substitute Alexa's voice for something a bit more...

Customizing Alexa’s Voice

Recently, we worked with our great partner at Cecilia.fm to build out a new skill for a client that is uniquely voice-branded. As part of the project, we thought we'd share a few tips for changing Alexa's voice.

  1. Write out your complete script and build it with Alexa’s voice first

While writing out your script first seems obvious, building it using Alexa's native voice first is a must. It will save you time in the long run and help you plan out each transition in the conversation, so you can build out a complete voice asset list.
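
As a rough illustration, each prompt starts life as a plain response in Alexa's native voice; the wording below is just an example. Every distinct <speak> string you write at this stage becomes one line on the recording list you hand to your voice talent later.

    {
      "outputSpeech": {
        "type": "SSML",
        "ssml": "<speak>Welcome back. Would you like to pick up where you left off?</speak>"
      }
    }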

  2. Don’t forget the unexpected while you have your Voice Talent

Outside of the expected dialogs in your conversation map, don't forget that errors can occur during a conversation, and if you don't have recorded audio files for those paths, the experience will awkwardly transition from your voice actor to Alexa's voice and back. While cutting audio in the studio, make sure you account for the "Oops", the "That wasn't supposed to happen", and all those unexpected awkward moments.
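
A hypothetical error response using one of those recorded clips might look like the following; the file name and URL are placeholders. Keeping a generic "oops" recording on hand means even the failure path stays in your brand's voice.

    {
      "response": {
        "outputSpeech": {
          "type": "SSML",
          "ssml": "<speak><audio src=\"https://example.com/audio/oops.mp3\"/></speak>"
        },
        "shouldEndSession": false
      }
    }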

  3. Keep your response pieces under 90 seconds

The only way to change out of Alexa's voice is to use the SSML <audio> tag, and the amount of SSML audio playback in a given response has a hard limit of 90 seconds. You can use multiple <audio> tags in a single response (up to 5), but the sum of all that audio still cannot be greater than 90 seconds.
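
For instance, a single response can chain several clips like this (the URLs are illustrative; Alexa requires the files to be MP3s served over HTTPS). The clips together still have to fit under the 90-second cap.

    {
      "outputSpeech": {
        "type": "SSML",
        "ssml": "<speak><audio src=\"https://example.com/audio/intro.mp3\"/><audio src=\"https://example.com/audio/menu.mp3\"/></speak>"
      }
    }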

  4. Using Audio over 90 seconds

If your skill requires audio longer than 90 seconds, you need to use the AudioPlayer interface in your code, and your user will be exited out of the skill once the audio finishes. Unfortunately, this is a limitation of the AudioPlayer for now. If you need to use it, it may be helpful to let the user know, at the end of the audio file that plays, that they are leaving the skill. It is also good practice to kindly remind them how to get back into the skill to encourage repeat visits.
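
Here is a sketch of what that hand-off might look like in the response JSON, assuming an illustrative URL and token. The outputSpeech before the directive is a natural place for that reminder about how to get back into the skill.

    {
      "response": {
        "outputSpeech": {
          "type": "SSML",
          "ssml": "<speak>Here is the full recording. When it ends, just say: open, followed by the skill name, to come back.</speak>"
        },
        "directives": [
          {
            "type": "AudioPlayer.Play",
            "playBehavior": "REPLACE_ALL",
            "audioItem": {
              "stream": {
                "url": "https://example.com/audio/full-episode.mp3",
                "token": "full-episode",
                "offsetInMilliseconds": 0
              }
            }
          }
        ],
        "shouldEndSession": true
      }
    }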

  5. Using SSML in Slot Filling

If your skill uses Slots to capture key data from your users, you can use SSML <audio> in Alexa's speech prompts for Slot Filling and Slot Confirmation. However, there can be issues with the SSML formatting if you enter it through the Alexa Skills Kit interface. We recommend using the JSON Editor to make the changes directly in the JSON code to ensure the SSML tag is formatted properly. This approach also lets you make sure you are escaping characters properly to prevent any extra spaces between your Slot variables and the SSML tag.
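
In the JSON Editor, a slot elicitation prompt carrying SSML might look something like this (the prompt id and audio URL are illustrative):

    {
      "prompts": [
        {
          "id": "Elicit.Slot.time",
          "variations": [
            {
              "type": "SSML",
              "value": "<speak><audio src=\"https://example.com/audio/ask-length.mp3\"/></speak>"
            }
          ]
        }
      ]
    }

Note the escaped quotes around the src attribute; that escaping is exactly the kind of detail that is easier to get right in the JSON Editor than in the visual interface.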

For example, in the skill we used SSML in a prompt to play back a custom audio file based on a prior input from the user. We ask the user for the length of time they would like and store it in the slot {time}. We then use that {time} value as part of a URL to call a custom audio file.
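
Since the original code screenshot isn't reproduced here, the following is a rough reconstruction of what such a prompt entry might look like; the domain and file naming are illustrative. Alexa substitutes the captured slot value wherever {time} appears in the prompt.

    {
      "type": "SSML",
      "value": "<speak><audio src=\"https://example.com/audio/clip_{time}.mp3\"/></speak>"
    }

A user who asks for ten minutes would then resolve to clip_10.mp3, letting one prompt serve every duration you have recorded.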

Creating a Unique Voice Experience

Leveraging a custom voice can create a very unique and engaging experience for your end users. The effort requires a bit more planning and some massaging of the code, but it is worth it in the end. The developer forum at Amazon is a great resource to leverage, and we'd be happy to help you if needed. Feel free to drop us a line.

Cheers -

Jay

Photo Source: https://www.pexels.com/photo/black-amazon-echo-on-table-977296/

Owner: Fabian Hurnaus

Site: Pexels.com

Edited with Adobe XD