For those of you that have been following me for a bit, you know I’ve been live-coding Alexa skill development on Twitch for a few months now. One of my favorite tools for development these days is Airtable, which is mostly just a cloud database, but its user interface, api, and documentation is one of the industry’s best. Seriously.
For many of the Alexa things I build on Airtable, whether that’s a trivia game, or just the speech data that I want to say to my users, I always come back to one fundamental problem:
Airtable doesn’t provide a way for me to retrieve a single random record from the database.
There are LOTS of ways for me to retrieve my data. Filter formulas, sort orders, and primary keys give me lots of options. I have access to created and modified dates for each record. If I know what I’m looking for, it’s incredibly easy. But if I want one random trivia question out of the hundreds I have in my table, I typically have to grab ALL of them, and choose one randomly from inside my software. This is a potentially huge, potentially slow process, and one I’m desperately trying to avoid, but when most of my tables have fewer than 100 records, it’s also not destructive. Yet.
Enter sound effects. The skill I’m building has a library of over 1000 sound effects that a user can request. Sometimes, they will say “play the cat meow sound effect,” and I can determine which one that is, retrieve the record, and play it for them. Other times, they will say “play a random sound effect,” and expect to hear something new. How do I solve this randomization problem? 1000 records is too many to grab every time I need to do this.
Here’s how I “solved” this. Each record has a unique identifier, which I call the RecordId. (You can see the entire dataset here.) Below are ten examples of the RecordIds:
They all start with “rec”, and then contain 14 additional random characters. Because I can apply filters to my queries, my first thought was to filter the records based on a single character. So, if I filter on “a” in this example, I’d end up with only two of the ten records above (#6 & #10). (It’s case insensitive.)
When I tried this against the entire dataset, it reduced the number of records I was pulling from 1,000 to around 350, pretty consistently. 350ish is better, but it’s really still way too many. In my next experiment, I tried two letter combinations. I assumed there would be cases, even in 1,000 records, where some two letter combination wouldn’t exist, but I was delighted to find that about 15 records matched every two letter combination as well. 15 randomly selected records is something I’m willing to accept. I can pick from one of those and give my user what they asked for. (Three letter combinations resulted in ZERO records most of the time.)
There are a couple of obvious exceptions to this rule: “re” and “ec” each bring back 1,000 records, since every record contains those letters (“rec”). So I just decided to eliminate “e” from my random alphabet pair generator.
Now, when I need a random record from this collection, I pass in filtering rules on the RecordId for a randomly generated pair of letters, and I get a very small collection of data to work with. It’s not perfect (I’d still love to be able to get just one), but this is FAR better than dragging 1000 records across the internet every time I need just one.
Now for the real question: How would YOU solve this problem? Mine is almost certainly not the best solution, it’s “a” solution.