Mining Gold from Big Data with Text Analytics

15 Feb 2012

Sunday’s New York Times featured a news analysis article about the age of big data, in which analysis and technology are increasingly applied to domains that formerly seemed removed from data crunching: political science, sports, advertising, public health, and more. Technology reporter Steve Lohr highlights “a drift toward data-driven discovery and decision-making.”

Although the article emphasizes number crunching, at Basis Technology we’ve seen this same “drift” across a number of industries as companies attempt to extract value from massive amounts of unstructured text data. We consider this trend to be a validation of our approach to text analytics. Our earliest product customers were web search engines like Lycos, Google, and Bing—the first online technologies to encounter “big data”—and we now work with companies monitoring tweets, blogs and other social media. These companies, and the government agencies we also work with, all deal with the problem of slicing and dicing oceans of text data to find useful tidbits (search engines and compliance) or to come to an aggregate understanding of the whole data set (business intelligence and social media monitoring).

Recently, we’ve seen businesses sit up and take notice of one tool in particular: entity extraction, the automatic extraction of people, places, organizations and other “significant” categories from text. This text analytics tool has been around a long time, but only now are we seeing a broad range of industries adopt it. In social media analysis, entities are mapped to sentiment (think of an entity like “Dunkin Donuts” being linked to the social media comments that mention it). In government intelligence, entity extraction may reveal trends and patterns based on the rise and ebb of entity mentions over time. In publishing, entities found in unstructured text can link disparate data sources that mention the same people, places or organizations.
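To make the idea concrete, here is a minimal sketch of entity extraction using a simple dictionary lookup. It is an illustration only, not how a production extractor works: real systems rely on statistical models and linguistic analysis instead of a fixed list. The `GAZETTEER` contents and the function names `extract_entities` and `count_entities` are assumptions invented for this example.

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping known names to entity types.
# A real extractor would not depend on a hand-written list like this.
GAZETTEER = {
    "Dunkin Donuts": "ORGANIZATION",
    "Thomson Reuters": "ORGANIZATION",
    "Davos": "LOCATION",
    "Steve Lohr": "PERSON",
}

# Try longer names first so a long name wins over a shorter one it contains.
_PATTERN = re.compile(
    "|".join(
        r"\b" + re.escape(name) + r"\b"
        for name in sorted(GAZETTEER, key=len, reverse=True)
    )
)


def extract_entities(text):
    """Return (name, entity type) pairs in order of appearance."""
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in _PATTERN.finditer(text)]


def count_entities(documents):
    """Aggregate entity mention counts across a collection of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(name for name, _ in extract_entities(doc))
    return counts
```

Calling `count_entities` on a batch of tweets or blog posts gives a rough picture of which entities are rising or falling in volume, and the per-document output of `extract_entities` is the hook for attaching sentiment or linking records across data sources.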

Lohr’s article quotes a January report by the World Economic Forum in Davos, Switzerland, which “declared data a new class of economic asset, like currency or gold.” As we’ve seen with our customers, though, having big data without an automated way to get through it all is like possessing a vein of gold embedded in a mountain. Text analytics makes it possible to aggregate and annotate existing information, link information repositories to one another, and provide a comprehensive view of the data to the end user.

This point was also highlighted by Andrew Jordan, the CTO and COO of the Accelus division of Thomson Reuters, in an interview with the BBC, where he describes their “Content Marketplace” initiative, which enables better access to data across their organization. Jordan describes this initiative as “[creating] something bigger than the sum of the parts.” We think that’s a good summary of the value of big data technologies.