flickr – Meaningful Data

A recent Pew report states that 47% of Americans have looked themselves up on Google or some other search engine. I like to practice a more specific form of vanity search – I use a couple of blog search sites to see where my flickr photos have been used. All my photos are posted under a “by-nc” Creative Commons license, which states that they can be reused for non-commercial purposes, provided that I’m given credit for the image.

I’d like to point out that I’m not doing this vanity search to check up on people; I just really enjoy seeing how my photos are used. It’s fun to see my images of celebrities, New York landmarks, comedians, or even something as mundane as cherry blossoms adorning someone’s blog post. Generally the authors attribute the photos to my screenname and link back to the originals on my flickr account. Some people even get in touch with me and ask for permission beforehand, though that’s not required with the CC license. I don’t even pay much attention if the site has Google ads, even though, strictly speaking, that qualifies it as “commercial” usage.

There’s still the occasional shock, though, and the latest one is so deeply ironic I can’t really even comprehend it. I discovered a photo I took of the IAC building that was used, without any attribution at all, on a website called paidContent.org. Despite the “.org” in the name, this is clearly a commercial website. On the page that contains my photo, I counted 16 ad spots, 3 sponsored channels, and 3 calls to action to “ADVERTISE ON PAIDCONTENT.” On top of that, the article is about a Pre-Conference Reception being held by paidContent.org at the IAC building, and it features a prominent button inviting readers to buy tickets to the conference (plus 8 additional links to sponsors of the conference). So, not only is my photo being displayed on a commercial site, but it’s directly being used to promote the sale of tickets to an event (well, it was – the event is now in the past).

The most troubling part about this is that these are not people who can claim ignorance. This issue falls right in their area of expertise, which according to their site is “global coverage of the business of digital content.” Plus, their own website is published under a “by-nc-sa” CC license, which means it has the same restrictions as my photo, but also that if you reuse the content, you can only do so under the same license (in other words you can’t put their content into your own work and then copyright it).

Although, maybe they don’t understand Creative Commons as well as I would expect them to, because right under the message that says “This work is licensed under a CreativeCommons License.” there’s another message that says “Copyright ContentNext Media Inc. 2002—2007.”

Since I’m a Content Strategist and a consultant, paidContent.org seems like exactly the type of organization that would appeal to me. But this experience makes me question how they could possibly claim to be experts in the field of digital content.

The first talk I was able to attend at Hack Day was by Flickr’s Aaron Straup Cope and Dan Catt, about Machine Tags. I’m really interested in this because it adds another layer of metadata to tags, allowing them to be read by machines. I’ve heard them described as triples, and in a way I suppose that’s true, but these are not like RDF triples. Basically, a machine tag consists of a namespace, a predicate, and a value organized in a certain syntax (namespace:predicate=value). It’s pretty simple, but should allow services to make use of the additional data pretty easily. I scribbled a note on my paper that says:

folksonomy :: taxonomy
machine tags :: RDF

That’s a simplification, of course, but it seemed to be a good way to describe the relationship. The two main issues that will affect the adoption of machine tags are:

What can you do with them? I think the answer to this one is pretty wide open. You can make apps that will use machine tags to express relationships between content, people, etc, and trigger all kinds of behaviors. Flickr’s API lets you query machine tags, and basically what you do with it is just limited by your imagination.
Where is the data coming from? The question is, though, will anyone aside from you be adding the kind of machine tags that will make your application work? This is really two questions.
1. What’s going to make me go back into nearly 1500 photos and add more tags to them? Something needs to be done to make this a little easier or people will never do it. I’m a pretty dedicated information geek. I’ve spent an hour disambiguating two names on Wikipedia. But I’m already dragging my feet adding my backlog of Flickr photos to the map, and I can’t see myself sifting through all of them again.
2. Even if I do, what’s to say my machine tags will be compatible with your application? Do we need some kind of standards? Or are we expecting people to add new machine tags to their content for each application they want to contribute to?
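To make the namespace / predicate / value idea a little more concrete, here is a minimal Python sketch of splitting a machine tag into its three parts and building a query for the Flickr API’s flickr.photos.search method, which accepts a machine_tags parameter. The helper names (parse_machine_tag, search_url), the example tags, and the simplified pattern for what counts as a valid namespace or predicate are my own assumptions for illustration, not Flickr’s definition. Nothing here calls the network; it only builds the URL.

```python
import re
from urllib.parse import urlencode

# namespace:predicate=value
# Assumption for this sketch: namespace and predicate start with a letter and
# contain only letters, digits and underscores. The value is everything after "=".
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    return match.groups()


def search_url(api_key, machine_tag):
    """Build (but do not send) a flickr.photos.search request URL."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,            # placeholder; a real key is required
        "machine_tags": machine_tag,   # e.g. "upcoming:event=123"
        "format": "json",
        "nojsoncallback": "1",
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)


if __name__ == "__main__":
    for tag in ["geo:lat=40.7", "upcoming:event=123", "cherryblossoms"]:
        print(tag, "->", parse_machine_tag(tag))
    print(search_url("YOUR_API_KEY", "upcoming:event=123"))
```

The second question above shows up right in the parser: two applications only see each other’s data if they agree on the same namespace and predicate strings, and nothing in the syntax itself enforces that.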

Clearly, what’s needed is something that will assist users by automatically generating suggested machine tags that they can then revise, approve, or decline. Interesting things to think about at Hack Day…

One of the big winners of the day was a hack that used machine tags – Flickr Tunes by Steffan Jones. Basically, it was a Mac OS X widget that used the BBC MusicBrainz database (I think) and the Flickr API to match machine tagged photos with a song. So, if a person took a photo that they felt illustrated a particular song, and they used the appropriate machine tags to capture the song name (and even a time code), then the images would display while the song played, as a sort of slide show, even keying to the specific moment in the song, if indicated.
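I don’t know how Flickr Tunes was actually built, but the matching idea can be sketched in a few lines of Python. Everything specific here is hypothetical: the tunes:song and tunes:time tag names, the photo IDs, and the function names are invented for illustration, and the time code is assumed to be whole seconds into the song. Photos with no time code are shown from the start.

```python
from bisect import bisect_right


def tag_value(tags, namespace, predicate):
    """Return the value of the first namespace:predicate=value tag, or None."""
    prefix = f"{namespace}:{predicate}="
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def build_slideshow(photos, song):
    """Collect (seconds, photo_id) cues for one song, sorted by time.

    `photos` maps a photo ID to its list of tags. The tag names
    "tunes:song" and "tunes:time" are hypothetical.
    """
    cues = []
    for photo_id, tags in photos.items():
        if tag_value(tags, "tunes", "song") != song:
            continue
        seconds = tag_value(tags, "tunes", "time")
        # No (or unparseable) time code: treat the photo as a cue at 0 seconds.
        start = int(seconds) if seconds is not None and seconds.isdigit() else 0
        cues.append((start, photo_id))
    return sorted(cues)


def photo_at(cues, elapsed):
    """Return the photo whose cue most recently passed, or None if no cues."""
    times = [start for start, _ in cues]
    index = bisect_right(times, elapsed)
    return cues[index - 1][1] if index else None


if __name__ == "__main__":
    photos = {
        "p1": ["tunes:song=yesterday", "tunes:time=45"],
        "p2": ["tunes:song=yesterday"],
        "p3": ["tunes:song=help", "tunes:time=10"],
        "p4": ["tunes:song=yesterday", "tunes:time=90"],
    }
    cues = build_slideshow(photos, "yesterday")
    for elapsed in (0, 50, 120):
        print(elapsed, photo_at(cues, elapsed))
```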

Pretty cool, but as I mentioned, how much data would need to be entered to make it a valuable experience for fans of all different kinds of music?

Creative Commons is not “carte blanche”

Hack Day: Machine Tags