AI & Local News Challenge, Overtone
What can you, as a person or even as a publisher, know about an article online? When you have a url in front of you, there is little to tell you what is actually inside.
The data that does exist on an individual article is often focused on “derivative” metrics that don’t explore what the piece is, but how people engage with it. Some examples are time on page, click through rates, and shares.
These engagement metrics are the driving force behind huge parts of the internet economy, from social media algorithms that surface content to users in order to maximize them, to the calculations that bring ad revenue to news outlets. But from my time as a journalist, I know these metrics don’t serve news publishers well, especially local news. My co-founders Philip Allin and Reagan Nunnally, who have experience in tech and in advertising, know that a focus on vanity metrics also plagues the wider media ecosystem.
Here’s the problem: fundamentally those types of metrics are divorced from the actual experience of a reader reading an article.
Overtone has used AI and natural language processing to create metrics based on the text of an article itself, rather than things such as the clicks, what website it is on, who wrote it or who shared it. Our Overtone scores are based on the editorial differences between articles, differences that I knew during my time as a journalist and most editors would understand.
Our algorithm “reads” the text of a given article and looks for signals such as original reporting instead of aggregation, interviews rather than copying tweets, use of data, and exploration of different angles to a story. This creates a primary metric that can instantly tell whether an article is one with little added news value, (think a rant-y blog, recipe, or quick status update) or a deeply reported enterprise piece where the journalist may have spent weeks gathering and verifying facts – or somewhere in between.
This data we generate is separate from all the other facts like domain, author and engagement metrics. Because it comes from the text, it is also available the very second an article is published (or even in a CMS), meaning that it can be used immediately to decide where to distribute an article. The easiest way to do this is by connecting to the Overtone API, which can send our data anywhere in a publisher’s system. The Overtone score can act as another column on a spreadsheet that a journalist can use along with other data. For example, you can instantly select all the in-depth articles about sports in the last week and put them in a newsletter, or choose to automatically keep a certain type of article outside of your paywall. For a less tech-y product, we also offer visual “feeds” of articles curated for what a client wants to see from their own or other publishers’ articles.
Why is this useful? Because it allows publishers, without having to spend hours of time reading and re-reading their journalists work, to discover it, productize it and monetize it. In a recent case study with a publisher, we found that over several months, two-thirds of their articles that led to a conversion to a subscriber were ones with our higher Overtone scores. Not all articles should be of the same type, of course, though this suggests that readers, too, understand the editorial differences between articles and may want to subscribe because they value a certain type.
We are proud of the work we have done with our first model, in English, though with the help of NYC Media Lab we have been able to expand into our second language, Spanish. Algorithms based on engagement metrics risk doing more harm than good for local news. The same holds true for Spanish-language news and other marginalized communities in the U.S. Overtone wants to serve all communities, so this spring we brought Natalia Gutiérrez aboard as our Spanish language editor and made a proof of concept of our model in Spanish. We are pleased to announce that we have started to see results that approach the performance of the English model.
This enables us to highlight more great journalism that otherwise might not get the attention it deserves, based on the traditional metrics in use. A better, automatic understanding of what sorts of articles are in a publisher’s system will also help them provide more quality information to their communities. We hope this will lead to more sustainable journalism businesses, and ultimately a better, more sustainable internet.
A five person team from diverse backgrounds, we have all seen the need for new solutions to sort through articles online. Christopher and Natalia come from journalism, with experience in reporting as well as audience engagement. Philip and Reagan have spent time both at startups and marquee brands, where they have seen the limitations in the current approach as well as the promise of new innovation. Marley is an engineer with a strong background in machine learning. Through decades of experience they have seen the issues that plague the media ecosystem and have created a primary datapoint, rather than another derivative metric, that can help build sustainable news businesses.