The Algorithms are Hangry

Lots of articles (some with shiny infographics) will tell you about how much data we’re now creating, and how it’s increasing at a stunning rate every year. And yet, it’s still not enough data to make algorithms actually useful most of the time.

About a decade ago, when I first started talking with people in the content industry about what was happening with semantic technology, some of them worried that things like natural language processing and artificial intelligence were going to make human content professionals obsolete.

My feeling at the time was “not any time soon.” These technologies seemed useful for assisting people, especially for managing data at scale, but they were always going to need to be guided and tweaked by people.

The basic expectation that most content professionals have is that algorithms will help us understand what people are interested in, and this information will be used to dynamically serve up more content that will be of interest. Some organizations may even use this information to guide content creation. Ideally, smart systems will even provide some level of assistance in producing that content.

The bots are coming!

There have been many examples reinforcing that this tech-driven intelligent content ecosystem is not quite there yet. Some are fascinating, artsy experiments, like Sunspring, a science-fiction movie written by an AI. Or paint colors created by a neural net. Or funny examples like image recognition APIs that can’t distinguish between blueberry muffins and chihuahuas. And most of us have probably played silly games with our phone’s autocomplete feature at some time or another.

“My gender is the main reason I thought you were going to send me a picture of the Fishermen.” (Rachel Lovinger, @rlovinger, December 29, 2018)

Then, there are less benign examples. Google Photos excluded “gorilla” from its possible tags after learning that its API was applying the term to photos of black people. Microsoft shut off a chatbot after the Internet taught her to be racist in less than 24 hours. A later iteration, designed to block conversations about potentially volatile topics, had its own set of shortcomings. YouTube purged a whole bunch of content and channels after James Bridle wrote about the vast number of creepy and alarming children’s videos that appeared to be both created and recommended based on loopholes and misuse of bots and algorithms.

But, I’m not here today to go down the rabbit hole of horrifying, pre-apocalyptic examples of AI gone wrong. I’m not even here to talk about the trashy link-bait promos that have infected most online journalism like a plague. I want to talk about how, even in their most mundane, limited functions, algorithms just aren’t hitting the mark as much as I’d have expected them to by now.

The bots are boring!

My complaint is with Google Now. My Android phone knows more about me than any technology really should. It knows what I search for, it knows where I go, it reads my email and knows (among other things) what movie tickets I bought. So, in theory, it should be able to show me some interesting things in the daily feed. I mean, specifically interesting to me, based on my actual interests.

Sometimes it works. It showed me an interview with Wim Wenders about the recently remastered “Wings of Desire” after I bought tickets to see the movie. That was a pretty cool article that I wouldn’t have even guessed was available. But generally the feed is roughly 80-90% things that are completely uninteresting to me.

To some degree, this is because I do a limited set of things on my phone, even within the larger realm of things I do online. For example, I always look up Fortnite hints and maps on my phone because my computer is too far away when I’m in the living room using the Xbox. So, my phone obviously thinks I’m a huge Fortnite fan and it now constantly shows me news updates, leaks, and articles about fan suggestions for the game.

I also wonder if there’s some kind of crossed-signal demographic effect going on (“people who like Fortnite also like XYZ”), because my feed recently included a string of stories about various football figures, even though I have never once shown any interest in football in anything I’ve done online. I had to manually mark a whole bunch of topics as “not interested.”

However, the deeper source of failure really seems to be that there isn’t the volume of unique, high-quality content out there to meet the need that Google is trying to fill with this tailored feed. When I first started using Google Now, I noted that there were a lot of cases where I would read an article and then it would show me “similar” articles which were really just summaries that other sites had written of the original article.

Lately I’ve noticed a different trend, which I’m sure is also influenced by these algorithms and metrics. While Google has previously gone to great lengths to cut down on content farms, it has also created an appetite for nutritionless content. And there are plenty of sources ready to jump in and fill that hungry void.

For example, let’s take Avengers: Infinity War. I was very interested in seeing this movie, but I didn’t particularly read a lot about it. I probably looked up the release date at some point before it came out, watched the trailer when it was released, and then bought tickets to see it in a theater. After seeing it, I looked up the expected release date for the sequel. It’s possible (even likely) that I did all of these things on my phone.

Since the movie came out last April, my phone has shown me content about it every single day. At first it was explanations of the ending, and analysis of the poster showing how it secretly contained spoilers for what happened in the movie. But it very quickly became a stream of non-stop speculation, fan theories, hints, spoilers, and occasionally legitimate news about the sequel (which comes out this April).

I cannot tell you how uninterested I am in almost all of this. I definitely didn’t want to read about Avengers for an entire year between movies. I have zero interest in fan theories that explain some speculative aspect of the sequel. Sure, it ended with a dramatic cliff-hanger, but I just want to quietly go about my business for a year and then go see part 2 when it’s ready and I can enjoy the culmination of 10+ years of Marvel Cinematic Universe storytelling. I could mark this topic as “not interested,” but that wouldn’t be accurate. I am interested in it. Just not to that degree, and not in wild speculation and rumors.

Maybe if Google Now knew more about my other interests, the feed would be more balanced. But my guess is that this topic, being broadly popular, has a more steady stream of source material than the obscure “long tail” topics I’m interested in.

So, is the failing with the algorithms, or is the failing with the sources of content? Or is it some kind of dysfunctional way that they learn from and influence each other? In all of the examples described above, from spectacularly disturbing to humdrum disappointing, the problem seems to call for the capability to course-correct, to monitor the algorithms and tune them to be more discerning. That gets into some very subjective areas that our AIs are not quite ready for. More likely, we will just keep feeding them whatever they demand and hope for the best.

Archive: Resources, October 2009

I realized recently that I haven’t updated the Resources page in three years. Obviously, there are a lot more recent, more interesting resources out there now. So many, in fact, that I should probably entirely replace what was there. But I do want to retain that info, so here it is in a post. Refreshed Resources page, coming soon.

[As of 10/7/09]

I gathered the following resources to be a handout to go along with my “Content Gone Wild!” talk at the MIMA Summit 2009. These articles and sites support the Content Strategy practices discussed in the examples from the presentation. These are not the only resources, and they’re not necessarily the final word on these topics, but they should provide some good information and get you started with practical techniques and tips.

General Content Strategy Resources

Research

Content Assessment

Writing for the Web

Voice / Tone

Taxonomy & Metadata

Social Media Strategy

Corporate Blog Strategy

Globalization

Looking for Taxonomy & Metadata Resources?

Here are some resources I gathered about metadata, taxonomy and ontology data.

Glossaries

Making a Business Case

Working with Existing Data

  • Thousands of OWL documents are indexed in Google. Add “filetype:owl” to your search and see what comes up.
  • Piggy Bank – open source tool for scraping data from a website.

Prototyping – test it out

  • MindJet® MindManager® – commercial mind mapping software
  • FreeMind – open source mind mapping software
  • Bubbl.us – an online brainstorming tool
  • TopBraid Composer™ – a commercial tool for building ontologies and semantic web applications. TopBraid Ensemble™ adds a layer that makes it more user-friendly for content providers, and may also be useful in prototyping.
  • Protégé – an open source ontology editor
  • Knoodl.com – a semantic wiki, combining collaborative editing with ontology models

Shared Knowledge – join a community

Information Design

Announcing: Nimble!

Since the beginning of the year I’ve been researching, writing, and editing a report called Nimble: A Razorfish report on publishing in the digital age. It launched this week, and so far the response has been really great. I’ve written about it over on Scatter/Gather and you can view or download the report itself at http://nimble.razorfish.com. There’s even a Twitter account for it (@NimbleRF).

In June I’ll be doing a presentation about the report at the Semantic Technology Conference in San Francisco. And there will be other presentations and developments in the coming months.

What else? I wrote a couple other pieces for Scatter/Gather:

And I’m helping to organize two interesting events for Internet Week next week:

Hope to see you there!

SXSW Panel Picker: Please Vote!

Vote for my PanelPicker idea! This year I’m determined to present at SXSW. To that end, I’m involved in five (5!) proposals. Two of them are talks, and the rest are panels submitted by other people that, SXSW-gods willing, I will be participating in.

SXSW likes to have the community get involved in deciding what panels will be chosen for the conference, so they use this Panel Picker to let people indicate which ones are of greatest interest. It’s free and easy to register to vote, so please consider voting for these proposals:

While you’re in there, here are some other really interesting panels by some of my friends and colleagues. Please consider voting for these as well!

There are many others that will probably be amazing, and I haven’t even touched on all the ones about the Semantic Web (will have to write a separate post for that), so get started voting now – you only have until September 4th!

Semantic Web for Publishers

When I got back from the Semantic Technology Conference last month, I helped my colleague, Domenic Venuto, write a piece for MinOnline about the things magazine publishers should know about the Semantic Web. I summed up some of the most relevant presentations at SemTech this year, and why I think these things should be important to publishers. Domenic put it all into the context of the work we do with our Media and Entertainment clients, and we worked together to try to express why they should really get moving on this stuff now!

After the article came out, Semantic Universe posted video from a lot of the talks that I mentioned. Very interesting, if you want more detail:

Semantic Web takes root at the IA Summit

At the recent IA Summit, I was surprised and delighted to see how many talks there were about the Semantic Web. Before this emerging technology can really catch on, we will need more Information Architects and Interaction Designers who understand the potential and can design elegant solutions to real problems (both user problems and business problems). In some ways, I wish the conversation were further along, but I realize that it has to start somewhere. The fact that the subject exploded onto the scene in such a big way is a good indication that Web 3.0 is on a lot of people’s minds. 

These are the talks I saw:

Semantic Web for Dummies

Jeff Pollock has just released a book called Semantic Web for Dummies. Over at Semantic Universe you can download a free chapter (registration required), order the book, or read Jeff’s blog posts. I haven’t read the book yet, but Jeff is a really smart person with the ability to speak plainly and compellingly. This book is bound to be useful for people who are trying to understand the Semantic Web, or are still struggling with how to explain it to others. I just put my copy on order.