Do you know someone who would make a great Alexa voice? As much as we enjoy a good joke from Alexa or a quick update on the weather, her voice doesn’t always deliver the unique brand experience you might want to create. Fortunately, with the right voice talent, the platform allows you to substitute Alexa’s voice for something a bit more...
Customizing Alexa’s Voice
Recently, we worked with our great partner at Cecilia.fm to build a new, uniquely voice-branded skill for a client. As part of the project, we thought we’d share a few tips for changing Alexa’s voice.
Write out your complete script and build it with Alexa’s voice first
Writing out your script first may seem obvious, but building the skill with Alexa’s native voice first is a must. It will save you time in the long run and help you plan each transition in the conversation, so you can put together a complete list of the voice assets you need to record.
Don’t forget the unexpected while you have your Voice Talent
Beyond the expected dialogue in your conversation map, don’t forget that errors can occur during a conversation. If you don’t have recorded audio files for those moments, the skill falls back to Alexa’s voice, which makes for an awkward transition from your voice actor to Alexa and back. While cutting audio in the studio, make sure you account for the “Oops”, the “That wasn’t supposed to happen”, and all those other unexpected, awkward moments.
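One way to keep Alexa’s voice from slipping in is to route every response through a lookup that falls back to a recorded error clip. The sketch below is only an illustration: the state names, the URLs, and the `clip_for` helper are hypothetical and not taken from the client skill.

```python
# Hypothetical asset list: dialogue state -> URL of the recorded clip.
RECORDED_CLIPS = {
    "welcome": "https://example.com/audio/welcome.mp3",
    "goodbye": "https://example.com/audio/goodbye.mp3",
    "error": "https://example.com/audio/oops.mp3",  # the recorded "Oops" take
}


def clip_for(state, clips=RECORDED_CLIPS):
    """Return SSML for the state's clip, or for the recorded error clip
    when no clip exists for that state."""
    url = clips.get(state, clips["error"])
    return f'<speak><audio src="{url}"/></speak>'


print(clip_for("welcome"))
print(clip_for("something_unplanned"))  # falls back to the "Oops" clip
```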
Keep your response pieces under 90 seconds
The only way to switch out of Alexa’s voice is to use the SSML <audio> tag. SSML audio playback for a given response has a hard limit of 90 seconds. You can use multiple <audio> tags in a single response (up to 5), but the sum of all that audio cannot be greater than 90 seconds.
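The two limits above can be checked before a response ever leaves your code. Here is a minimal sketch, assuming you already know each clip’s duration (for example, from your voice asset list). The `build_audio_ssml` helper and the example URLs are hypothetical.

```python
from xml.sax.saxutils import quoteattr

MAX_AUDIO_TAGS = 5         # per-response tag limit quoted above
MAX_TOTAL_SECONDS = 90.0   # combined audio limit quoted above


def build_audio_ssml(clips):
    """Return a <speak> document that plays each (url, seconds) clip in order.

    Raises ValueError if the clips would break either limit.
    """
    if len(clips) > MAX_AUDIO_TAGS:
        raise ValueError(f"{len(clips)} clips exceeds the {MAX_AUDIO_TAGS}-tag limit")
    total = sum(seconds for _, seconds in clips)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"{total:.1f}s of audio exceeds the {MAX_TOTAL_SECONDS:.0f}s limit")
    # quoteattr escapes special characters and wraps the URL in quotes.
    tags = "".join(f"<audio src={quoteattr(url)}/>" for url, _ in clips)
    return f"<speak>{tags}</speak>"


print(build_audio_ssml([
    ("https://example.com/audio/intro.mp3", 12.0),
    ("https://example.com/audio/tip.mp3", 30.5),
]))
```

Failing loudly at build time is a design choice here: it surfaces an over-long response during development rather than in front of a user.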
Using Audio over 90 seconds
If your skill requires audio that is longer than 90 seconds, you need to use the Audio Player in your code, and your user will exit the skill once the audio finishes. Unfortunately, this is a limitation of the Audio Player for now. If you need to use the Audio Player, it helps to let users know that they will leave the skill when the audio file ends. It is also good practice to remind them how to get back into the skill, to encourage repeat visits.
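As a rough sketch, a raw skill response that hands a long file to the Audio Player could look like the following. The directive fields follow the AudioPlayer.Play format as we understand it; the `build_long_audio_response` helper, the URLs, and the token are hypothetical. The short SSML clip played first is where your voice talent can tell users they are leaving and how to come back.

```python
import json


def build_long_audio_response(stream_url, token, farewell_ssml):
    """Build a response that speaks a short farewell, then streams a long file."""
    return {
        "version": "1.0",
        "response": {
            # Short recorded clip: "you are leaving, here is how to return".
            "outputSpeech": {"type": "SSML", "ssml": farewell_ssml},
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    "playBehavior": "REPLACE_ALL",
                    "audioItem": {
                        "stream": {
                            "url": stream_url,  # must be an HTTPS URL
                            "token": token,     # opaque ID for this stream
                            "offsetInMilliseconds": 0,
                        }
                    },
                }
            ],
            # The session ends here; the stream plays outside the skill.
            "shouldEndSession": True,
        },
    }


response = build_long_audio_response(
    "https://example.com/audio/long-session.mp3",
    "long-session-1",
    '<speak><audio src="https://example.com/audio/see-you-soon.mp3"/></speak>',
)
print(json.dumps(response, indent=2))
```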
Using SSML in Slot Filling
If your skill needs to use slots to capture key data from your users, you can use the SSML <audio> tag in the Alexa speech prompts for slot filling and slot confirmation. However, the SSML formatting can run into issues if you enter it through the Alexa Skills Kit interface. We recommend using the JSON Editor to make the changes directly in the JSON code, which ensures the SSML tag is formatted properly. This approach also lets you make sure you are escaping characters properly and prevents extra spaces between your slot variables and the SSML tag.
Snippet of Code
For example, in one JSON file we used SSML to play back a custom audio file based on a prior input from the user. In the skill, we ask the user for the length of time they would like and store it in a slot variable. We then use that variable as part of a URL to call a custom audio file.
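To give a feel for the shape of such a prompt, here is a minimal sketch that generates a prompt entry for the JSON Editor. It is not the client’s actual file: the prompt ID, the `{duration}` slot name, and the URL pattern are all assumptions, and `slot_audio_prompt` is a hypothetical helper. Generating the JSON with `json.dumps` takes care of escaping the quotes inside the SSML.

```python
import json


def slot_audio_prompt(prompt_id, audio_url_template):
    """Build one dialog prompt whose only variation is an SSML <audio> tag.

    audio_url_template may contain a slot reference such as {duration};
    it is passed through untouched, with no spaces added around it.
    """
    ssml = f'<speak><audio src="{audio_url_template}"/></speak>'
    return {"id": prompt_id, "variations": [{"type": "SSML", "value": ssml}]}


prompt = slot_audio_prompt(
    "Confirm.Slot.duration",                            # hypothetical prompt ID
    "https://example.com/audio/confirm_{duration}.mp3",  # hypothetical URL pattern
)
# json.dumps escapes the double quotes inside the SSML as \" for you.
print(json.dumps(prompt, indent=2))
```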
Creating a Unique Voice Experience
A custom voice can create a unique and engaging experience for your end users. It takes a bit more planning and some massaging of the code, but it is worth it in the end. The developer forum at Amazon is a great resource, and we’d be happy to help if needed. Feel free to drop us a line.
Owner: Fabian Hurnaus