Lots of articles (some with shiny infographics) will tell you about how much data we’re now creating, and how it’s increasing at a stunning rate every year. And yet, it’s still not enough data to make algorithms actually useful most of the time.
When I first started talking with people in the content industry about semantic technology, about a decade ago, some of them worried that things like natural language processing and artificial intelligence were going to make human content professionals obsolete.
My feeling at the time was “not any time soon.” These technologies seemed useful for assisting people, especially for managing data at scale, but they were always going to need to be guided and tweaked by people.
The basic expectation that most content professionals have is that algorithms will help us understand what people are interested in, and this information will be used to dynamically serve up more content that will be of interest. Some organizations may even use this information to guide content creation. Ideally, smart systems will even provide some level of assistance in producing that content.
The bots are coming!
There have been many examples reinforcing that this tech-driven intelligent content ecosystem is not quite there yet. Some are fascinating, artsy experiments, like Sunspring, a science-fiction movie written by an AI. Or paint colors created by a neural net. Or funny examples like image recognition APIs that can’t distinguish between blueberry muffins and chihuahuas. And most of us have probably played silly games with our phone’s autocomplete feature at some time or another.
“My gender is the main reason I thought you were going to send me a picture of the Fishermen.” (Rachel Lovinger, @rlovinger, December 29, 2018)
Then, there are less benign examples. Google Photos excluded “gorilla” from its possible tags after learning that its API was applying the term to photos of black people. Microsoft shut off a chatbot after the Internet taught her to be racist in less than 24 hours. A later iteration, designed to block conversations about potentially volatile topics, had its own set of shortcomings. YouTube purged a whole bunch of content and channels after James Bridle wrote about the vast number of creepy and alarming children’s videos that appeared to be both created and recommended based on loopholes and misuse of bots and algorithms.
But, I’m not here today to go down the rabbit hole of horrifying, pre-apocalyptic examples of AI gone wrong. I’m not even here to talk about the trashy link-bait promos that have infected most online journalism like a plague. I want to talk about how, even in their most mundane, limited functions, algorithms just aren’t hitting the mark as much as I’d have expected them to by now.
The bots are boring!
My complaint is with Google Now. My Android phone knows more about me than any technology really should. It knows what I search for, it knows where I go, it reads my email and knows (among other things) what movie tickets I bought. So, in theory, it should be able to show me some interesting things in the daily feed. I mean, specifically interesting to me, based on my actual interests.
Sometimes it works. It showed me an interview with Wim Wenders about the recently remastered “Wings of Desire” after I bought tickets to see the movie. That was a pretty cool article that I wouldn’t have even guessed was available. But generally the feed is roughly 80-90% things that are completely uninteresting to me.
To some degree, this is because I do a limited set of things on my phone, even within the larger realm of things I do online. For example, I always look up Fortnite hints and maps on my phone because my computer is too far away when I’m in the living room using the Xbox. So, my phone obviously thinks I’m a huge Fortnite fan and it now constantly shows me news updates, leaks, and articles about fan suggestions for the game.
I also wonder if there’s some kind of crossed-signal demographic effect going on (“people who like Fortnite also like XYZ”), because my feed recently included a string of stories about various football figures, even though I have never once shown any interest in football in anything I’ve done online. I had to manually mark a whole bunch of topics as “not interested.”
However, the deeper source of failure really seems to be that there isn’t the volume of unique, high-quality content out there to meet the need that Google is trying to fill with this tailored feed. When I first started using Google Now, I noted that there were a lot of cases where I would read an article and then it would show me “similar” articles which were really just summaries that other sites had written of the original article.
Lately I’ve noticed a different trend, which I’m sure is also influenced by these algorithms and metrics. While Google has previously gone to great lengths to cut down on content farms, it has also created an appetite for nutritionless content. And there are plenty of sources ready to jump in and fill that hungry void.
For example, let’s take Avengers: Infinity War. I was very interested in seeing this movie, but I didn’t particularly read a lot about it. I probably looked up the release date at some point before it came out, watched the trailer when it was released, and then bought tickets to see it in a theater. After seeing it, I looked up the expected release date for the sequel. It’s possible (even likely) that I did all of these things on my phone.
Since the movie came out last April, my phone has shown me content about it every single day. At first it was explanations of the ending, and analysis of the poster showing how it secretly contained spoilers for what happened in the movie. But it very quickly became a non-stop stream of speculation, fan theories, hints, spoilers, and occasionally legitimate news about the sequel (which comes out this April).
I cannot tell you how uninterested I am in almost all of this. I definitely didn’t want to read about Avengers for an entire year between movies. I have zero interest in fan theories that explain some speculative aspect of the sequel. Sure, it ended with a dramatic cliff-hanger, but I just want to quietly go about my business for a year and then go see part 2 when it’s ready and I can enjoy the culmination of 10+ years of Marvel Cinematic Universe storytelling. I could mark this topic as “not interested,” but that wouldn’t be accurate. I am interested in it. Just not to that degree, and not in wild speculation and rumors.
Maybe if Google Now knew more about my other interests, the feed would be more balanced. But my guess is that this topic, being broadly popular, has a more steady stream of source material than the obscure “long tail” topics I’m interested in.
So, is the failing with the algorithms, or is the failing with the sources of content? Or is it some kind of dysfunctional way that they learn from and influence each other? In all of the examples described above, from spectacularly disturbing to humdrum disappointing, the problem seems to call for the capability to course-correct, to monitor the algorithms and tune them to be more discerning. That gets into some very subjective areas that our AIs are not quite ready for. More likely, we will just keep feeding them whatever they demand and hope for the best.