Tips & Tricks for Playing Video with APL and Alexa

I am currently in the process of building an Alexa skill that contains all of the knowledge of the Star Wars Universe.  This includes characters, droids, weapons, vehicles, planets, creatures, and even different species and organizations.  It also includes the ability to request the opening crawl videos from each of the movies in the Star Wars saga, and the trailers for the movies, television shows, and video games.

It’s the videos that have brought me here to share what I have learned.

Alexa is available on a wide variety of devices: some small, some big, some with screens, others without.  For those devices with screens, I want to provide my users with a simple workflow:

  1. Ask for a specific video.
  2. View the requested video.
  3. Continue the conversation when the video ends.

The first two steps were surprisingly easy to implement using the Alexa Presentation Language (APL).  The third step required some research and trial and error, but I have it working successfully now.

Identifying the Video a User Requested

While there is nothing complicated about identifying a user’s request, I’ll show you how I’m handling it so that you have everything you need if you want to build your own version.

In my Interaction Model, I have an intent called “CrawlIntent.”  This is there to handle all of the ways a user might ask to see the opening crawl of a specific film.  It looks like this:

{
  "name": "CrawlIntent",
  "slots": [
  {
    "name": "media",
    "type": "Media"
  }
  ],
  "samples": [
    "show me the {media} crawl",
    "{media} crawl",
    "can I see the {media} crawl",
    "show the crawl for {media}",
    "for the {media} crawl",
    "to show the crawl for {media}",
    "show me the {media} opening crawl",
    "{media} opening crawl",
    "can I see the {media} opening crawl",
    "show the opening crawl for {media}",
    "for the {media} opening crawl",
    "to show the opening crawl for {media}",
    "play the {media} opening crawl",
    "play the {media} crawl"
  ]
}

When a user says something to my skill like one of the utterances above, I can be confident they are looking for the opening crawl video for a specific film.  I also have a slot, called media, that contains a list of all of the films and shows I want my skill to be aware of.

{
  "values": [
    {"name": { "value": "Battlefront 2","synonyms": ["battlefront 2", "battlefront"]}},
    {"name": { "value": "Clone Wars","synonyms": ["the clone wars"]}},
    {"name": { "value": "Episode 1","synonyms": ["the phantom menace"]}},
    {"name": { "value": "Episode 2","synonyms": ["attack of the clones"]}},
    {"name": { "value": "Episode 3","synonyms": ["revenge of the sith"]}},
    {"name": { "value": "Episode 4","synonyms": ["a new hope", "new hope"]}},
    {"name": { "value": "Episode 5","synonyms": ["empire", "the empire strikes back", "empire strikes back"]}},
    {"name": { "value": "Episode 6","synonyms": ["return of the jedi", "jedi"]}},
    {"name": { "value": "Episode 7","synonyms": ["the force awakens", "force awakens"]}},
    {"name": { "value": "Episode 8","synonyms": ["the last jedi", "last jedi"]}},
    {"name": { "value": "Episode 9","synonyms": ["rise of skywalker", "the rise of skywalker"]}},
    {"name": { "value": "Rebels","synonyms": ["star wars rebels"]}},
    {"name": { "value": "Resistance","synonyms": ["star wars resistance"]}},
    {"name": { "value": "Rogue One","synonyms": ["rogue one a star wars story"]}},
    {"name": { "value": "Solo","synonyms": ["han solo movie", "solo a star wars story"]}},
    {"name": { "value": "The Mandalorian","synonyms": ["the mandalorian"]}}
  ],
  "name": "Media"
}

This slot lets me match the user’s request against the list of items my skill can handle, using Entity Resolution, so I can be certain I’m choosing the right video for their request.
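
If you’re curious how that looks in code, here is a minimal sketch of reading the resolved value with the ask-sdk-core utilities.  The getResolvedMediaName helper is just a name I’m using for illustration; the resolutionsPerAuthority structure and the ER_SUCCESS_MATCH status code are what Entity Resolution actually returns.

const Alexa = require('ask-sdk-core');

// Hypothetical helper: returns the canonical slot value when Entity Resolution
// found a match (e.g. "Episode 4" even if the user said "a new hope"), otherwise
// falls back to whatever the user actually said.
function getResolvedMediaName(handlerInput) {
  const slot = Alexa.getSlot(handlerInput.requestEnvelope, 'media');
  const authorities = slot
    && slot.resolutions
    && slot.resolutions.resolutionsPerAuthority;

  if (authorities && authorities[0].status.code === 'ER_SUCCESS_MATCH') {
    return authorities[0].values[0].value.name;
  }
  return slot ? slot.value : undefined;
}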

Playing A Video Using APL

For the code of my skill, I am using the Alexa Skills Kit (ASK) SDK for Node.js.  This makes parsing the JSON that Alexa provides far easier, and gives me greater control over building responses for my users.

To add APL to my skill’s response, I do something like this:

const apl = require("./apl/videoplayer.json");
apl.document.mainTemplate.items[0].items[0].source = media.fields.Crawl;
handlerInput.responseBuilder.addDirective({
  type: 'Alexa.Presentation.APL.RenderDocument',
  token: '[SkillProvidedToken]',
  version: '1.0',
  document: apl.document,
  datasources: apl.datasources
});
return handlerInput.responseBuilder.getResponse();

Line #1 refers to the location of my APL document.  This document is the markup that tells the screen what to show.  Line #2 dynamically updates the source of the video file to be played, so that the appropriate video plays for each request.

As you’ll see in the APL document below, we define a Video element, and include a source property that indicates a specific URL for our video.

The important lesson I learned when building this is that I don’t want to include any speech or reprompts to my user in this response.  I can send this APL document to the user’s device, which immediately starts playing the video.  This is completely counter-intuitive to everything I’ve ever considered when building an Alexa skill, but it makes sense.  I’m sending them a video to watch…not trying to continue our conversation.
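
To show how these pieces fit together, here is a sketch of what a complete CrawlIntent handler could look like.  The handler name, the lookupMedia helper, and the media.fields.Crawl lookup are placeholders for however you store your video URLs; the two things to notice are that canHandle only claims the request on devices that support APL, and that handle returns the RenderDocument directive with no speech or reprompt at all.

const Alexa = require('ask-sdk-core');

const CrawlIntentHandler = {
  canHandle(handlerInput) {
    // Only claim CrawlIntent on devices that actually support APL (i.e. have a screen).
    const supportedInterfaces = Alexa.getSupportedInterfaces(handlerInput.requestEnvelope);
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'CrawlIntent'
      && supportedInterfaces['Alexa.Presentation.APL'] !== undefined;
  },
  handle(handlerInput) {
    // Find the requested film (see the Entity Resolution sketch above) and its video URL.
    const media = lookupMedia(getResolvedMediaName(handlerInput));

    const apl = require("./apl/videoplayer.json");
    apl.document.mainTemplate.items[0].items[0].source = media.fields.Crawl;

    // No speak() or reprompt() here: the response only renders the video.
    return handlerInput.responseBuilder
      .addDirective({
        type: 'Alexa.Presentation.APL.RenderDocument',
        token: '[SkillProvidedToken]',
        document: apl.document,
        datasources: apl.datasources
      })
      .getResponse();
  }
};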

Adding an Event to the Video When It Is Finished

Finally, I had to do some exploration to figure out how to not only identify when the video has concluded, but also prompt my skill to speak to the user in order to continue the conversation.  This is done using the onEnd event on the Video element that we created earlier.  Here is the entire APL document.

{
  "document": {
    "type": "APL",
    "version": "1.1",
    "settings": {},
    "theme": "dark",
    "import": [],
    "resources": [],
    "styles": {},
    "onMount": [],
    "graphics": {},
    "commands": {},
    "layouts": {},
    "mainTemplate": {
      "parameters": [
        "payload"
      ],
      "items": [
      {
        "type": "Container",
        "items": [
          {
            "type": "Video",
            "width": "100%",
            "height": "100%",
            "autoplay": true,
            "source": "https://starwarsdatabank.s3.amazonaws.com/openingcrawl/Star+Wars+Episode+I+The+Phantom+Menace+Opening+Crawl++StarWars.com.mp4",
            "scale": "best-fit",
            "onEnd": [
            {
              "type": "SendEvent",
              "arguments": [
                "VIDEOENDED"
              ],
              "components": [
                "idForTheTextComponent"
              ]
            }
            ]
          }
        ],
        "height": "100%",
        "width": "100%"
      }
      ]
    }
  },
  "datasources": {}
}

This is the second lesson I learned while building this.  With the onEnd event in place, when the video finishes playing, your skill receives a new kind of request type: Alexa.Presentation.APL.UserEvent.  You will need to handle this new request type and prompt the user to say something in order to continue the conversation.  I included the argument “VIDEOENDED” so that I could be confident I was handling the appropriate UserEvent.  Here is my example code for handling this:

const VideoEndedIntent = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'Alexa.Presentation.APL.UserEvent'
    && handlerInput.requestEnvelope.request.arguments[0] === 'VIDEOENDED';
  },
  handle(handlerInput) {
    const actionQuery = "What would you like to know about next?";
    return handlerInput.responseBuilder
      .speak(actionQuery)
      .reprompt(actionQuery)
      .getResponse();
  }
};
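
The last piece is registering these handlers with the skill builder.  A minimal sketch, assuming the standard ask-sdk-core Lambda setup and the CrawlIntentHandler name from the sketch above:

const Alexa = require('ask-sdk-core');

// The UserEvent handler is registered just like any other request handler.
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(
    CrawlIntentHandler,
    VideoEndedIntent
  )
  .lambda();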

With these few additions to my Alexa skill, I was able to play videos for my users and then bring them back to the conversation once the video concludes.

Have you built anything using APL?  Have you published an Alexa skill?  I’d love to hear about it.  Share your creations in the comments!