Chartbeat Blog

What We Talk About When We Talk About Iran

How is Iran framed by the media? Is it the nuclear Iran, the religious Iran, or the oil-rich Iran? The contemporary Iran flourishing with cinema or the ancient empire of Persia?

It goes without saying that global media perceptions of Iran are multivalent, myriad, and always evolving. To investigate those perceptions, in all their complexity, is a task for computational journalism. By looking at both what Iran-related topics are covered in the media and how the proportional relationship of those topics changes in time, computers can model the macro-level trends which might give insight into media biases, patterns, or anomalies. So in the spirit of inquiry, the Chartbeat Data Science Team tried to answer our original question, “How is Iran framed by the media?” with some good, old-fashioned computational analysis.

We took a corpus of 5,000 articles published between August 21st and October 28th by 16 major media outlets (Note: none of this analysis uses those publishers’ Chartbeat data, only the words on their article pages). We selected articles containing the named entities “Iran,” “Iranian,” “Persia,” or “Persian,” then searched for trends along two dimensions: temporal and topical.
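As an illustration, the article-selection step might look like the sketch below, using a simple keyword match (the actual pipeline presumably used proper named-entity recognition; the `articles` list is a hypothetical stand-in for the real corpus):

```python
import re

# Hypothetical sample corpus; the real study used ~5,000 articles
# from 16 major outlets.
articles = [
    {"title": "A", "text": "Talks with Iranian officials resumed this week."},
    {"title": "B", "text": "Markets rallied on strong earnings."},
    {"title": "C", "text": "An exhibit on ancient Persia opens in June."},
]

# Case-insensitive match on the four query terms from the study.
IRAN_TERMS = re.compile(r"\b(Iran|Iranian|Persia|Persian)\b", re.IGNORECASE)

def mentions_iran(article):
    """Return True if the article text contains any Iran-related term."""
    return bool(IRAN_TERMS.search(article["text"]))

corpus = [a for a in articles if mentions_iran(a)]
print([a["title"] for a in corpus])  # → ['A', 'C']
```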

Temporal

You might assume that coverage of Iran is purely event-driven: some current event transpires, and that event precipitates a media response. This is far from the case. Fluctuations in daily article count are shaped by several time-based patterns, including:

  1. The slow-varying nature of news media production objectives
  2. The cyclical nature of week-to-week reporting
  3. The spiking nature of breaking news

In practice, this means that the number of articles might vary because news organizations generally write a certain number of articles about Iran per month, because they produce more articles on certain days of the week, or because something newsworthy has transpired in the Middle East. So, to understand how many articles were published about Iran on a given day, it is important to break the daily counts into these three components, each shown in its own graph below.
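To make the decomposition concrete, here is a minimal sketch on synthetic daily counts (the study’s actual decomposition method isn’t specified; the window size, baseline, and injected spike are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Synthetic daily article counts over the study window (illustrative).
rng = np.random.default_rng(0)
dates = pd.date_range("2015-08-21", "2015-10-28", freq="D")
counts = pd.Series(20 + 5 * (dates.dayofweek < 5)   # weekday bump
                   + rng.poisson(3, len(dates)),    # day-to-day noise
                   index=dates, dtype=float)
counts.loc["2015-10-01"] += 40                      # a breaking-news spike

# 1. Slow-varying production objectives: centered 28-day rolling mean.
trend = counts.rolling(28, center=True, min_periods=14).mean()

# 2. Weekly cycle: mean deviation from trend for each day of the week.
detrended = counts - trend
weekly = detrended.groupby(detrended.index.dayofweek).transform("mean")

# 3. Breaking news: the spikes left unexplained by trend and cycle.
spikes = counts - trend - weekly

print(spikes.idxmax().date())  # → 2015-10-01
```

The residual series isolates the injected spike even though the trend and weekly components absorb part of it.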



The first graph corresponds to slow-varying production objectives, the second to the weekly cycle, and the third to breaking news items. You can see that this final graph is highly volatile; we return below to the specific events that drove those spikes.

Now that we’ve mapped articles by temporal pattern, we can explore the topics that populate these articles and the proportions of those topics in global coverage.

Topical

Doing a topical analysis of content requires a good deal of math. For those of you interested, read on; those who don’t love applied statistics can just skip to the graph. We use Latent Dirichlet Allocation (LDA) to efficiently uncover essential statistical relationships that are useful for clustering, event detection and summarization. LDA models each article as a finite mixture of latent topics, where each topic is a distribution over words. It is a generative model: a document — a bag of words — is created by initially sampling an overall topical distribution. Each word in the document is then selected by first sampling a topic from this distribution, and then sampling a word from the topic-word distribution. Given a document, we can use Bayes’ rule to explicitly represent the document by its topic proportions. All that math results in this graph:

This graph illustrates the topics most commonly discussed across the articles in our corpus. That is to say, for example, more media coverage of Iran during the period studied focused on U.S. foreign policy than on the proliferation of ISIL. But not all topics are created equal.

Some of these topics have high entropy and some have low entropy, meaning some topics are more specific than others. Most, if not all, articles about the Iran Agreement concerned exactly that — JCPOA. Other topics like Immigration & Refugees concerned different aspects of Iran and its relationship to immigration. “Iran Agreement” has low topic entropy and is more specific; “Immigration & Refugees” has higher entropy and is less specific.
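The entropy in question is ordinary Shannon entropy over each topic’s word distribution: a concentrated distribution scores low, a spread-out one scores high. A toy sketch (the two distributions are made-up stand-ins for the real topics):

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical topic-word distributions over a 4-word vocabulary.
iran_agreement = [0.85, 0.05, 0.05, 0.05]   # concentrated: specific topic
immigration    = [0.25, 0.25, 0.25, 0.25]   # spread out: vaguer topic

print(round(entropy(iran_agreement), 2))  # → 0.59
print(round(entropy(immigration), 2))     # → 1.39
```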

Temporal meets Topical

Now that we’ve analyzed the articles in this corpus both with respect to the temporal patterns by which they were published and the topical concerns about which they were written, we can draw some conclusions and answer our original question more definitively. By combining temporal and topical patterns, we can answer questions like “What are the most popular times to write about Iranian culture?” or “Which Iran-related breaking news items generate the most coverage?”
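Mechanically, combining the two comes down to grouping per-article topic proportions by publication date. A minimal sketch with hypothetical records (the topic names match the study; the dates and weights are invented):

```python
import pandas as pd

# Hypothetical per-article records: publication date plus the topic
# proportions inferred by the topic model.
records = [
    {"date": "2015-08-25", "topic": "Iran Agreement", "weight": 0.7},
    {"date": "2015-08-25", "topic": "Russia & Syria", "weight": 0.1},
    {"date": "2015-10-01", "topic": "Iran Agreement", "weight": 0.2},
    {"date": "2015-10-01", "topic": "Russia & Syria", "weight": 0.8},
]
df = pd.DataFrame(records)

# Average topic weight per day: one row per date, one column per topic.
daily = df.pivot_table(index="date", columns="topic",
                       values="weight", aggfunc="mean")
print(daily)
```

Plotting each column of `daily` against its index yields graphs like the ones below.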

This first graph maps high-entropy (more vague) topics by date:

These graphs map low-entropy (more specific) topics by date:

Looking at these graphs, you can see that discourse changed radically over the course of a few months, even though the number of articles published remained roughly constant from month to month. August through mid-September was dominated by American legal coverage of the JCPOA. From then on, Iran’s role in Russian-Syrian affairs dominated. In addition, we can isolate two Russia/Syria coverage peaks, on October 1st and 8th, corresponding to a series of Russian air strikes and heavy fighting in Raqqa and Aleppo, during which Iranian military officials were killed.

So, if you want to know how the media framed Iran from August through October, look no further.

Conclusions

Clearly, Iran pops up in media coverage both where you’d expect it to and where you might not. Although you might assume that the U.S. Congress’s debate over the Iran Agreement drew the most coverage during the period studied, it did not. In fact, articles concerning Russia & Syria more commonly cited Iran.

More broadly, this study can give journalists and data analysts alike insight into the importance of contextualizing data. This study comes down to context. Whether topical or temporal, the data casts major issues in a new light. But without any context, or even with the wrong context, we are forced to rely on assumptions and guesses.