For the release of “Rogue One,” may the force of NLP be with you!

16 Dec 2016
But wait, what is NLP again?

Explaining NLP (natural language processing), text mining, data extraction and text analysis to people outside the industry can be a challenge. While everyone expects fast, accurate and reliable search results, few understand the work and the tools behind them.

NLP is to search what an engine is to a car. When you buy a car, you expect it to run, and you trust the engine to be reliable. However, when you get to the dealership, you probably don’t take the engine apart as part of your decision-making process. Similarly, people rarely consider the inner workings of NLP in search, but just as a car doesn’t run without its engine, good search doesn’t run without NLP.

NLP powers more than just search. Today we’re highlighting our European partner Precognox, who shows how data mining and text analytics can be used to analyze the Star Wars movie scripts! This is their gift to you for today’s release of “Rogue One: A Star Wars Story.”


For the premiere of Rogue One, Precognox applied sentiment and network-based (who talks with whom) analysis to the Star Wars universe. Their findings are presented with interactive data visualizations, providing an easy and illustrative view of the information, harvested automatically from the movie scripts themselves.

Precognox took the seven episodes of Star Wars, split them into scenes and analyzed the dialogues, categorizing them by participants and overall sentiment, or mood.
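
To make the idea concrete, here is a minimal Python sketch of that kind of scene-level analysis. It is not Precognox’s actual pipeline: the toy scenes, the tiny word lists, and the function names `scene_sentiment` and `interaction_counts` are all assumptions made up for illustration, and a real system would use a proper script parser and sentiment model.

```python
from collections import Counter
from itertools import combinations

# Made-up toy data: each scene is a list of (speaker, line) pairs.
# Lines carry no punctuation so that a plain split() is enough here.
SCENES = [
    [("LUKE", "I have a bad feeling about this"),
     ("HAN", "That is a terrible idea")],
    [("HAN", "Great shot kid"),
     ("LEIA", "I love it"),
     ("LUKE", "Good")],
]

# Tiny illustrative lexicons (assumptions, not a real sentiment resource).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "hate", "terrible"}


def scene_sentiment(scene):
    """Score a scene as (positive word count) minus (negative word count)."""
    score = 0
    for _speaker, line in scene:
        for word in line.lower().split():
            if word in POSITIVE:
                score += 1
            elif word in NEGATIVE:
                score -= 1
    return score


def interaction_counts(scenes):
    """Count, for each pair of characters, the scenes in which both speak."""
    pairs = Counter()
    for scene in scenes:
        speakers = sorted({speaker for speaker, _line in scene})
        for a, b in combinations(speakers, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    for number, scene in enumerate(SCENES, start=1):
        print(f"Scene {number}: sentiment {scene_sentiment(scene)}")
    for pair, count in interaction_counts(SCENES).most_common():
        print(pair, count)
```

The per-scene scores are the sort of numbers a sentiment bar chart could be drawn from, and the pair counts are the edge weights of a who-talks-with-whom network.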

For instance, by analyzing conversations in the original trilogy, we can see that Luke, C-3PO, Han Solo, Leia, and Chewbacca are the dominant characters (OK, you may have guessed that without NLP). More interestingly, we can also see that Han Solo has the most spoken interactions of any character in the original trilogy, and C-3PO is only the second most “chatty” character (surprise!). Unexpectedly, Darth Vader, while central to the films’ narrative, has only 50 conversations in the entire original trilogy.

The interactive graphs visualize the dialogues of the two trilogies and The Force Awakens as a network, searchable by character.

Since Rogue One is set just before Episode IV, here is a screenshot of the sentiment of all the scenes in “A New Hope”. As you can see, Star Wars is actually quite a “darth” movie. In the diagram below, the dark purple bars reflect negative sentiment, and the yellow ones positive. You can also drill down by character or scene.

Sentiment scores of all the scenes in “A New Hope”

On another graph, Precognox built a network of the characters from all seven episodes in an attempt to identify the Kevin Bacon of the Star Wars universe: the one within six degrees of separation of every other character. Can you guess who it is? Check out the visualization to see for yourself.

Prepare yourself for “Rogue One” and rediscover the previous seven Star Wars episodes from an NLP angle. Find the entire analysis on the Precognox Blog.

Enjoy the movie and may the force of NLP be with you!