

March, 2018


The FiveThirtyEight feature The Next Bechdel Test is a wonderful piece of data journalism that we found inspirational, and a great jumping-off point for us to use FLIQ to extend this work. We’re going to dramatically increase the scope of the analysis from 50 movies to over 1,900, analyze the trend over time, and highlight the data gaps that still exist, so that we can more clearly understand what the data does and does not tell us about gender balance in movies.

FiveThirtyEight is pretty much the best in the business, so rather than an indictment of their work, this is more instructive of the state of most things from a data quality and coverage standpoint. Great and complete data is really hard to get, and this is especially the case in movies. If you want to get a sense of how hard, try looking up the budget for Balls of Fury sometime, or search for ‘Justice Smith’ on IMDB.


Let’s start with the FiveThirtyEight article itself, which is highly encouraged reading. But if you haven’t read it, here is an incredibly reductive summary:

1. Bechdel test is an OK measure of gender balance in movies.
2. Bechdel test is a low bar and certainly not meant as a comprehensive measure.
3. 50 top-earning 2016 movies were selected for this feature.
4. Of these 50 movies: 32 pass the Bechdel test and 18 fail.
5. Using a panel, 12 additional tests were developed and these 50 movies were scored on them.
6. Results summary for every test:

Pass percentage for the 50 FiveThirtyEight movies for every ‘Next Bechdel Test’

7. Conclusion:
► A test or metric isn’t everything
► Setting a meaningful goal is very difficult
► Information is imperfect in any case
► The trend is clear: there is a significant gender imbalance in movies
► Speaks to selection bias, power dynamics
► We have a ways to go in understanding and addressing it

That is thoughtful and intellectually honest work. The conclusion is especially well stated. However, as is often the case with small sample sizes, we have more questions than answers at this stage. For instance, how is the female participation rate changing over time? Is it different in lower-budget movies not covered in the initial analysis? And how complete is the actor and crew data that the original feature was based on?

FiveThirtyEight Data Check

1. A sample of the top 50 earning movies from one year is very limited:
  a. Represents 80% of total Box Office revenue made that year – OK
  b. Represents 30% of major releases that year – Not Great
  c. One year may be outlier in either direction and doesn’t show movement – Bad

2. The person data for these 50 movies is incomplete:
  a. For actors, coverage is good with an average of 40 actors per movie – Good
  b. For the crew, it is so incomplete we thought there was a mistake when we first saw the data – Bad

Count of crew members by movie in the FiveThirtyEight sample; we suspect this is an issue with the data file

We should stress that getting Great Data, especially in the movie space, is like achieving the speed of light: entirely theoretical, and all you can really hope for is to get close. But we can build on the foundation FiveThirtyEight laid down by bringing to bear the data collected by FLIQ.

Extending FiveThirtyEight Data

1. We are able to create a sample of 1,973 movies that cover major releases 2005-2018
  a. Represents 98% of total box office revenue over that time – Good
  b. Represents 99% of wide releases over that time – Good
  c. Multi-year sample, we should be able to see trends – Good

2. Person data:
  a. For the 50 movie overlap, we have 15% more actors mapped – Good
  b. If we assume the public FiveThirtyEight crew file is busted, we’re adding about 10% more crew – OK

To get a sense of our data coverage, the dashboard screenshot below illustrates the total number of players mapped for every movie (green bars), and the percentage of those players whose gender we have mapped.

Overview of the data coverage for this article. Green lines are total people in each movie.

Note that currently, we are unable to confidently map the gender for 25% of our players. This is a work in progress, but we do know that this unknown population leans disproportionately female. This means that for all the figures cited below we are likely underestimating the % of women by ~5%.
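To make the arithmetic behind that caveat concrete, here is a minimal sketch of how an unmapped group can pull the measured figure down. Only the 25% unmapped fraction comes from our data; the function name, the 25% measured share, and the 45% share among unmapped people are invented for illustration, and we read "~5%" as percentage points, which is also an assumption.

```python
def blended_women_share(known_share: float,
                        unknown_fraction: float,
                        unknown_women_share: float) -> float:
    """Women's share across everyone, mixing the gender-mapped group with
    an assumed share for the unmapped group (all inputs are fractions 0..1)."""
    inputs = {
        "known_share": known_share,
        "unknown_fraction": unknown_fraction,
        "unknown_women_share": unknown_women_share,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return ((1.0 - unknown_fraction) * known_share
            + unknown_fraction * unknown_women_share)


# 25% unmapped is from the text; the two shares below are hypothetical.
measured = 0.25   # women's share among people whose gender is mapped
blended = blended_women_share(measured, unknown_fraction=0.25,
                              unknown_women_share=0.45)
gap_points = (blended - measured) * 100
print(f"measured {measured:.0%}, blended {blended:.0%}, "
      f"gap {gap_points:.1f} points")
```

With a quarter of people unmapped, the unmapped group would need a women's share about 20 points higher than the mapped group to produce a 5-point gap.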

The Next Bechdel Test(s) – As Tests

One of the things that we liked about the introduction of so many tests by FiveThirtyEight is that it highlights how inadequate a single test is. Gender balance, and more broadly diversity, is a complex and nuanced subject. By looking at it through multiple facets we do get a better sense of the underlying dynamics. This is good.

The challenging part, from an analysis standpoint, is that these tests vary widely in how quantitative or qualitative they are. This reflects the subjectivity of the original Bechdel test (check out the discussion of whether The Dark Knight Rises passes the test sometime; hint: it does not) and of analyzing movies in general. Some things we can measure quantitatively (how many women are in the production?), and some things are more judgment calls (does the primary female character cause a problem for the male protagonist?). It’s a spectrum, and both kinds of questions are informative, but one kind is easier to get more answers for faster. And ‘more answers faster’ is a good place to start =)

Here is how the tests stack up from this perspective:

Here you can clearly see the tension of “what we want to know” vs “what we are able to know” playing out, which is really at the center of most good analysis. For our purposes in this post, we’re going to focus on the tests we’re able to address most readily: UPHOLD and KOEZE-DOTTLE.

Comparing Results

The most straightforward evaluations for us to replicate are the UPHOLD and KOEZE-DOTTLE tests, because we can do these two in the most data-driven fashion. The UPHOLD test is based on the following condition: if 50% of the on-set crew are women, the movie passes. The KOEZE-DOTTLE test looks at the cast excluding featured actors and, similar to UPHOLD, a movie passes if more than 50% of this group are women.
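As a rough illustration of how these two checks could be computed from person-level data, here is a minimal Python sketch. The record layout (`role`, `gender`, `featured`), the function names, and the choice to leave people with unmapped gender out of the denominator are all assumptions for illustration, not a description of the FLIQ schema. We also read the UPHOLD condition as "at least 50%", which is our interpretation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Person:
    role: str               # "crew" or "cast" (hypothetical labels)
    gender: Optional[str]   # "F", "M", or None when gender is unmapped
    featured: bool = False  # True for featured actors (cast only)


def women_share(people: Iterable[Person]) -> Optional[float]:
    """Share of women among people with a known gender; None if there are none."""
    known = [p for p in people if p.gender in ("F", "M")]
    if not known:
        return None
    return sum(p.gender == "F" for p in known) / len(known)


def uphold_share(movie: List[Person]) -> Optional[float]:
    """Women's share of the crew."""
    return women_share(p for p in movie if p.role == "crew")


def koeze_dottle_share(movie: List[Person]) -> Optional[float]:
    """Women's share of the cast, excluding featured actors."""
    return women_share(p for p in movie if p.role == "cast" and not p.featured)


def passes_uphold(movie: List[Person]) -> bool:
    share = uphold_share(movie)
    return share is not None and share >= 0.5   # assumed: "at least 50%"


def passes_koeze_dottle(movie: List[Person]) -> bool:
    share = koeze_dottle_share(movie)
    return share is not None and share > 0.5    # "more than 50%"


# A made-up movie: 1 of 4 mapped crew are women, 2 of 3 non-featured cast.
example_movie = [
    Person("crew", "F"), Person("crew", "M"), Person("crew", "M"),
    Person("crew", "M"), Person("crew", None),
    Person("cast", "M", featured=True),
    Person("cast", "F"), Person("cast", "F"), Person("cast", "M"),
]
print(uphold_share(example_movie), passes_uphold(example_movie))
print(koeze_dottle_share(example_movie), passes_koeze_dottle(example_movie))
```

Returning the share alongside the pass-fail flag keeps the underlying percentage available, which matters for movies that sit close to the threshold.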

If we look at the 50 movies in the FiveThirtyEight test set, we get similar results on the UPHOLD test: not a single movie passes. For the KOEZE-DOTTLE test, however, we diverge: we have only 12% passing vs FiveThirtyEight’s 34%. This is where it’s instructive to look at the underlying numbers, as we show on the right, rather than at hard pass-fail criteria. For KOEZE-DOTTLE, many movies sit between 40% and 50%, so a different gender estimation technique, a more or less complete actor list, or even a different definition of a featured actor could move these movies into the pass category.

UPHOLD and KOEZE-DOTTLE evaluations performed on the 50 movie FiveThirtyEight sample, using FLIQ data
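The sensitivity of borderline movies described above can be sketched with simple bounds: treat every person of unmapped gender as a man for the lower bound and as a woman for the upper bound. This is only one way to frame the uncertainty, not necessarily how either analysis handled it, and the function names and counts below are hypothetical.

```python
from typing import Tuple


def share_bounds(women: int, men: int, unknown: int) -> Tuple[float, float]:
    """Lowest and highest possible women's share given unmapped genders."""
    if min(women, men, unknown) < 0:
        raise ValueError("counts must be non-negative")
    total = women + men + unknown
    if total == 0:
        raise ValueError("at least one person is required")
    lower = women / total               # every unknown person is a man
    upper = (women + unknown) / total   # every unknown person is a woman
    return lower, upper


def could_pass(women: int, men: int, unknown: int,
               threshold: float = 0.5) -> bool:
    """True if some assignment of the unknown genders clears the threshold."""
    _, upper = share_bounds(women, men, unknown)
    return upper > threshold


# Made-up supporting cast: 9 women, 11 men, 4 people of unmapped gender.
low, high = share_bounds(9, 11, 4)
print(f"known-only share {9 / 20:.0%}, bounds {low:.1%} to {high:.1%}")
print("could pass:", could_pass(9, 11, 4))
```

In this made-up case the movie fails on known genders alone (45%), yet its upper bound is above 50%, so a different mapping of four people could flip the result.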

By looking at the actual percentages we have a better sense of the gender balance in these films. That is, according to these two metrics, women’s participation in movie production crews is quite inequitable, and so is their participation in non-featured acting roles, though not nearly to the same extent as in the crew.

But what if we now wanted to extend this analysis to the rest of the 1,900+ movies in our dataset? We would see the following result. These two charts show how every major release from 2005 to 2018 scored on the UPHOLD and KOEZE-DOTTLE tests. It is a striking result:

UPHOLD Evaluation for major releases 2005-2018, from highest percentage to lowest

KOEZE-DOTTLE Evaluation for major releases 2005-2018, from highest percentage to lowest

For the percent of the crew, the thing that jumps out is how narrow the band of participation is: it’s basically between 15% and 30%. The movie with the highest % of women in the crew is Before Midnight, with 42%. No major movie in the last 13 years has cleared the 50% threshold.

For the cast, we see more of a power curve, with a few movies featuring a relatively high percentage of women in non-featured acting roles, and then a long tail of movies where the percentage is increasingly small. About 12% pass the 50% threshold.

This is where it is helpful to look at the percent of representation in addition to strict pass-fail criteria. It gives us a deeper understanding of how close or far many movies are from “goal”. It’s also useful to trend this data over time to understand how it may be changing:

This was surprising; we really expected to see more movement towards gender balance over time. There is an unmistakable upward trend for the crew (UPHOLD): ignoring 2018, which is still largely incomplete, the last three years all represent successive all-time highs in women’s participation rate in movie production crews. We are talking about slow and incremental change here, but change nevertheless.

For the cast (KOEZE-DOTTLE), on the other hand, there is no discernible trend towards balance, just year-to-year variance.
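For readers who want to reproduce this kind of yearly trend on their own data, here is a minimal sketch. It averages per-movie shares within each release year, which is one reasonable choice but an assumption on our part (pooling all people per year is another), and the function name and sample values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Set, Tuple


def yearly_mean_share(
    movies: Iterable[Tuple[int, Optional[float]]],
    exclude_years: Optional[Set[int]] = None,
) -> Dict[int, float]:
    """Average the per-movie women's share for each release year.

    `movies` holds (year, share) pairs; movies with no share (None) and
    years listed in `exclude_years` (for example, an incomplete year)
    are skipped. Years are returned in ascending order.
    """
    skip = exclude_years or set()
    by_year = defaultdict(list)
    for year, share in movies:
        if share is None or year in skip:
            continue
        by_year[year].append(share)
    return {year: mean(shares) for year, shares in sorted(by_year.items())}


# Invented per-movie crew shares, only to show the shape of the input.
sample = [
    (2016, 0.20), (2015, 0.18), (2016, 0.30),
    (2015, 0.22), (2016, None), (2018, 0.40),
]
trend = yearly_mean_share(sample, exclude_years={2018})
for year, share in trend.items():
    print(year, f"{share:.1%}")
```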

Digging In Deeper
Two quick charts to highlight where we would like to take this work next:

Women’s participation rate by different departments

Women’s participation rate by budget category

In the first chart, we see a breakout of the overall women’s participation rate by department, as well as the overall number of people working in that department (again, this is across our entire movie sample). This gives us a better sense of where the disparity is concentrated, which can point to root drivers of this imbalance.

For example, we see that the only department where women participate in a greater proportion than men is Costumes, which is also the only department that broadly aligns with traditional gender roles. We see the lowest female participation in the most technical departments, and overall women are underrepresented in every aspect of film production outside of wardrobe and makeup.
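A department breakdown like the one described above can be sketched by pooling people across the whole sample and reporting both headcount and women's share per department. The input layout, the function name, and the sample rows are hypothetical; "Camera" is an invented department name used only for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def department_breakdown(
    people: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, Tuple[int, Optional[float]]]:
    """Map each department to (headcount, women's share among known genders).

    `people` holds (department, gender) pairs, with gender "F", "M", or
    None when unmapped. The share is None if no gender is known.
    """
    headcount: Counter = Counter()
    known: Counter = Counter()
    women: Counter = Counter()
    for department, gender in people:
        headcount[department] += 1
        if gender in ("F", "M"):
            known[department] += 1
            if gender == "F":
                women[department] += 1
    return {
        dept: (headcount[dept],
               women[dept] / known[dept] if known[dept] else None)
        for dept in headcount
    }


# Invented rows, pooled across movies.
rows = [
    ("Costumes", "F"), ("Costumes", "F"), ("Costumes", "M"),
    ("Camera", "M"), ("Camera", "M"), ("Camera", "M"),
    ("Camera", "F"), ("Camera", None),
]
breakdown = department_breakdown(rows)
for dept, (count, share) in breakdown.items():
    print(dept, count, "n/a" if share is None else f"{share:.0%}")
```

Keeping the headcount next to the share helps show whether a department's figure rests on many people or only a handful.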

In the second chart, we perform a similar breakdown but by budget category. The big takeaway here is that the bigger the production, the fewer women are involved.

The more that we dig into this data, the more it points to massive systemic underrepresentation. But it is a complicated thing that deserves to be understood with nuance. We will be digging deeper into this topic in our next post, where we will break down role participation in more detail (we would specifically like to home in on department leads and positions of authority), better understand the trend behind large vs small productions, and develop a perspective on the impact of gender balance on outcomes (revenue, critics/audience ratings).

Understanding and tracking diversity is an ongoing endeavor, and we’d love to get some feedback on this work. We’re looking for anything from good questions to collaborators, so if you have some thoughts, please say hello =)

In the meantime, we have added two features in FLIQ that let us monitor the gender component at a glance. The first is a gender balance card for the movie detail screens; this is available for every movie in our database. The second is a trend card added to the main dashboard, which lets you see all the recent happenings in the movie space at a glance. Hopefully, these are helpful, but we’d love to hear what you think. You can log in right now and check these out:


@2017 FLIQ.AI - All Rights Reserved