One small step for man, one giant leap for robot-kind? Well, sort of. After years of waiting, we are starting to see AI technology become more enjoyable to interact with. Last week, Amazon released five new language nuance enhancements to the Alexa API. Speech Synthesis Markup Language (SSML) is the markup the Alexa voice service uses to control text-to-speech output; it is a W3C standard, and Alexa supports a subset of it plus some Amazon-specific tags. We are excited about these API updates because they help us give Alexa more vocal character than the bridge computer on Star Trek. Here are the new Alexa SSML tags:
- Whisper: Speak in a soft, whispered voice
- Prosody: Control the volume, pitch, and rate of speech
- Expletive beeps: Bleep out words @#%!
- Sub: Have Alexa say something other than what's written
- Emphasis: Change the rate and volume of speech to stress words
You can learn more about taking greater control of Alexa's speech on the Alexa Skills Kit SSML Reference page.
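Because SSML is XML, any plain text placed inside a `<speak>` element needs its reserved characters escaped, or the markup may be rejected as invalid. A minimal sketch in Python, assuming responses are built by hand as strings; `to_ssml` is a hypothetical helper name, while `xml.sax.saxutils.escape` comes from the Python standard library:

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in a <speak> element (hypothetical helper).

    escape() converts &, < and > into XML entities so that
    user-supplied text cannot break the surrounding markup.
    """
    return "<speak>" + escape(text) + "</speak>"


print(to_ssml("Salt & pepper < 5 grams"))
```

Escaping at the point where plain text enters the markup keeps the later tag helpers free to insert real SSML without double-escaping it.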
So, why are these additions so important?
When you write an Alexa Interaction Model, there's more at stake than defining a human-computer interface that maps cleanly to your back-end server configuration. Every software application is a human-computer interface, and User Experience (UX) design in its many forms is what makes an application easy and intuitive to use. Alexa should always have great UX, too. After all, I am not sure I would yet describe conversing with a digital assistant as a great experience. Making Alexa more human-like is the answer, but that doesn't mean it gets less creepy (take the whisper effect, for example).
We couldn’t help but give each of them a little test drive.
Whisper
<speak> <amazon:effect name="whispered"> It really works. </amazon:effect> Don't you think? </speak>
To my ear, “It really works” came out a little too fast. Can we combine the whisper with other SSML tags to slow it down? Probably.
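Writing these tags by hand gets tedious once responses are generated from data. One option is a small helper per effect; the sketch below assumes a hypothetical `whispered()` function and rebuilds the whisper example above. It inserts its input verbatim, so it can wrap text that already contains other SSML tags.

```python
def whispered(ssml: str) -> str:
    """Wrap SSML content in Alexa's whisper effect (hypothetical helper).

    The input is inserted verbatim, so it may already contain other tags.
    """
    return '<amazon:effect name="whispered">' + ssml + "</amazon:effect>"


# Whisper the first sentence, then return to the normal voice.
speech = "<speak>" + whispered("It really works.") + " Don't you think?</speak>"
print(speech)
```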
Prosody
Prosody controls the volume, pitch, and rate of speech through its volume, pitch, and rate attributes.
<speak> <amazon:effect name="whispered"> It really works. </amazon:effect> <prosody rate="slow"> Don't you think? </prosody> </speak>
So we can place two effects side by side inside a speak tag. Can they also be nested?
<speak> <prosody rate="slow"> Don't you think? <amazon:effect name="whispered"> It really works. </amazon:effect> </prosody> </speak>
Great! This is what we would expect from a markup language: the elements are composable, and they can be nested.
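Because the elements nest, helper functions compose the same way the tags do. A minimal sketch, assuming a hypothetical `prosody()` helper that only writes out the attributes it is given; the attribute names `rate`, `pitch`, and `volume` come from the Prosody tag, but the values (such as `slow`) are passed through unchecked, so consult the SSML reference for the values Alexa accepts.

```python
from xml.sax.saxutils import quoteattr


def prosody(ssml: str, rate=None, pitch=None, volume=None) -> str:
    """Wrap SSML content in a <prosody> tag (hypothetical helper).

    Only attributes that are not None are written out; values are
    quoted with quoteattr() but otherwise passed through unchecked.
    """
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items() if value is not None
    )
    return f"<prosody{rendered}>{ssml}</prosody>"


# Nesting the calls nests the tags, mirroring the nested example above.
inner = '<amazon:effect name="whispered">It really works.</amazon:effect>'
speech = "<speak>" + prosody("Don't you think? " + inner, rate="slow") + "</speak>"
print(speech)
```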
Expletive Bleeping
!@#%@
<speak> Frankly, my dear, I don't give a <say-as interpret-as="expletive">damn</say-as>. </speak>
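When the words to censor come from data (user input or a word list, say), the tag can be applied programmatically. A minimal sketch, assuming a hypothetical `bleep()` helper and a caller-supplied word list; it uses Python's `re` module to match whole words case-insensitively and leaves the rest of the sentence untouched.

```python
import re


def bleep(sentence: str, words: list[str]) -> str:
    """Wrap each listed word in an expletive say-as tag (hypothetical helper).

    Matching is case-insensitive and limited to whole words, and the
    original spelling of each match is preserved inside the tag.
    """
    if not words:
        return sentence
    # \b stops "damn" from matching inside a longer word such as "damned";
    # re.escape guards against regex metacharacters in the word list.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    return pattern.sub(r'<say-as interpret-as="expletive">\1</say-as>', sentence)


text = "Frankly, my dear, I don't give a damn."
print("<speak>" + bleep(text, ["damn"]) + "</speak>")
```

Whole-word matching is a deliberate choice here: bleeping substrings would also hit innocent words that happen to contain a listed one.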
Sub
Sub is especially useful for generating data-driven speech from Alexa. Here, placeholders in the alias attributes are filled in from a dictionary:
# Map each placeholder to the word Alexa should actually say
string_map = {'a': 'stop', 'b': 'drop', 'c': 'roll'}
phrase = ('<speak>Remember to <sub alias="{a}">a</sub>, '
          '<sub alias="{b}">b</sub>, and <sub alias="{c}">c</sub></speak>')
# Fill each {placeholder} alias with its mapped word
for key, word in string_map.items():
    phrase = phrase.replace('{' + key + '}', word)
print(phrase)
This prints:
<speak>Remember to <sub alias="stop">a</sub>, <sub alias="drop">b</sub>, and <sub alias="roll">c</sub></speak>
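The string-replacement approach above works, but a helper that builds each `<sub>` element from a (written, spoken) pair avoids the placeholder bookkeeping and escapes both parts. A minimal sketch, assuming a hypothetical `sub()` helper; `escape` and `quoteattr` come from Python's standard `xml.sax.saxutils` module, and the element-symbol data is purely illustrative.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(written: str, spoken: str) -> str:
    """Show `written` in the text but have Alexa say `spoken` (hypothetical helper)."""
    # quoteattr() adds the surrounding quotes and escapes the alias value.
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


# Illustrative data: element symbols and the names Alexa should pronounce.
elements = [("Al", "aluminum"), ("Fe", "iron"), ("Au", "gold")]
parts = [sub(symbol, name) for symbol, name in elements]
speech = "<speak>Common metals include " + ", ".join(parts[:-1]) + ", and " + parts[-1] + ".</speak>"
print(speech)
```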
Emphasis
<speak> <prosody rate="slow"> <amazon:effect name="whispered"> I have a secret to tell you. </amazon:effect> </prosody> Careful and thoughtful phrase construction is extremely <emphasis level="strong"> important </emphasis> to creating compelling Alexa voice experiences. </speak>
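A typo in an attribute value is easy to make, so a helper can reject unsupported emphasis levels before the response is sent. A minimal sketch, assuming a hypothetical `emphasis()` helper; the accepted levels `strong`, `moderate`, and `reduced` are the ones listed in Amazon's SSML reference, so check the current documentation before relying on them.

```python
# Levels accepted by the Alexa <emphasis> tag per Amazon's SSML reference
# (an assumption to verify against the current documentation).
EMPHASIS_LEVELS = {"strong", "moderate", "reduced"}


def emphasis(ssml: str, level: str = "moderate") -> str:
    """Wrap SSML content in an <emphasis> tag (hypothetical helper).

    Raises ValueError for levels outside EMPHASIS_LEVELS instead of
    producing markup that the service might reject.
    """
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unsupported emphasis level: {level!r}")
    return f'<emphasis level="{level}">{ssml}</emphasis>'


print("<speak>This is extremely " + emphasis("important", "strong") + ".</speak>")
```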
We are always excited to try new things with Alexa. This isn't Amazon's first improvement to how naturally Alexa speaks: back in February, Amazon added “speechcons,” special words and phrases that Alexa pronounces more expressively. And, as always, Amazon has more in the works. Each release brings us closer to a digital assistant that users interact with naturally.
Want to try it out for yourself? Check out the SSML reference in the Alexa Skills Kit documentation.
If you’re still new to Alexa and all that it can do, check out our blog post “So, You Want to Add Alexa Control to Your Thing?”