Broadcasters hit the button for big data

Opinion on big data swings between those who dismiss it as just the latest marketing buzz phrase from profit-seeking vendors and those who proclaim it as heralding a major change in IT.

The truth, as it normally does, lies somewhere between the two. But, for broadcasters, the story is rather different in that the arrival of big data coincides with the transformation to IP file-based systems, cloud delivery and TV Everywhere. As such, it is less of an exaggeration to say that in the case of broadcasting, big data is ushering in a revolution as part of these wider trends that are totally changing the way content is processed and distributed.

The hype really surrounds the idea that the world has suddenly become submerged under great volumes of increasingly unstructured data that need radically new methods and tools to process, when, in fact, this has been a steady progression over the last two decades, ever since the emergence of the Internet into mainstream commerce with fast-growing public use. That growth created a mass of new unstructured data dwarfing, in volume, the better-ordered relational databases holding records such as customer bank account details: information capable of providing precise answers to structured queries, but with little scope for unexpected insights.

Broadcasters and pay-TV operators were no different, with their data largely confined to customer information and financials. Then the data warehouse burst onto the scene in the late 1990s, combining different internal sources of information, although at this stage still confined to structured data. With it came data mining, providing tools for divining subtle but commercially valuable correlations that humans could not readily spot because of the volumes involved.

One of the first widely quoted examples was how beer and diapers (nappies for UK readers) tended to turn up together in men's shopping baskets on the way home from work. This led U.S. convenience stores to locate diapers beside the beer counter instead of with other toiletries.

Big data extends the scope of data mining to embrace sources of unstructured data such as publicly available social network information on Facebook, Twitter and elsewhere. But the big step forward relates not to the data itself, but to the ability to process it rapidly in real time. This might not help the convenience store so much, but for broadcasters, pay-TV operators and advertisers, there is huge potential through instant targeting and recommendation. This is on top of traditional opportunities that evolved with the data warehouse to make informed decisions based on analysis of historical data in the days, weeks or months after it was captured.

The real-time capability has come primarily from developments in hardware and, to an extent, analytical tools, rather than anything directly to do with big data itself. The ability to bring the data together, and then mine it for interesting patterns or correlations, is the result of new massively parallel hardware.

Both Twitter and Facebook have deployed such systems to mine their vast amounts of data. Twitter especially can claim to have played a pivotal role in the big data movement with the development of its Storm distributed cluster architecture, designed specifically for processing large amounts of unstructured data coming from multiple sources and locations. Before Storm arrived, big-data analytics was ruled by batch processing systems, notably Hadoop, which has been widely deployed for numerous processing tasks involving complex unstructured data, from identifying recent sales trends for retailers to indexing the web.

But Hadoop required the data to be static and in one place, so it could not account for ongoing changes and could only support historical analysis. This was no use for Twitter which, more than any other social medium, generates information whose value ages very quickly, and which wanted to provide continuous live analysis of activity. For this reason, Twitter developed Storm to enable analysis of data held in distributed sources, and crucially in real time, so that it could report tweet trends as they occur and revise them continuously in the light of ongoing activity.
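To make the contrast with batch processing concrete, the sketch below wires up a small Storm topology using the current Apache Storm package names. A spout feeds a stream of hashtags (simulated here) into a bolt that keeps running counts, updating its totals tuple by tuple rather than waiting for a complete data set. The class names and the simulated feed are illustrative assumptions, not Twitter's production code.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class TrendTopology {

    // Hypothetical spout standing in for a live feed of hashtags.
    public static class HashtagSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Random random = new Random();
        private final String[] tags = {"#ibc", "#bigdata", "#tv", "#cloud"};

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        public void nextTuple() {
            // Emit one simulated hashtag per call; a real spout would read from the network.
            collector.emit(new Values(tags[random.nextInt(tags.length)]));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("hashtag"));
        }
    }

    // Bolt that keeps a running count per hashtag, updated as tuples stream in.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String tag = tuple.getStringByField("hashtag");
            counts.merge(tag, 1, Integer::sum);
            collector.emit(new Values(tag, counts.get(tag)));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("hashtag", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("tags", new HashtagSpout(), 1);
        // fieldsGrouping routes every occurrence of a given hashtag to the same
        // counting task, so totals stay consistent as the topology scales out.
        builder.setBolt("counts", new CountBolt(), 4).fieldsGrouping("tags", new Fields("hashtag"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("trend-topology", new Config(), builder.createTopology());
    }
}

The fieldsGrouping call is what allows this to run across many machines while the answers remain current: each hashtag always lands on the same counting task, and the counts are available the moment a tuple arrives rather than at the end of a batch run.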

This is relevant for broadcasting because, in September 2011, Twitter released Storm as open source, and since then the model has been adopted by vendors in the TV arena. NDS, now part of Cisco, has incorporated Storm into its cloud platform, called Solar, which was demonstrated for the first time at IBC 2012.

This enables pay-TV operators or broadcasters to collect and interrogate data replicated across multiple servers and take actions in real time, such as targeting ads, generating information or making recommendations on the basis of individual preferences, the content currently being watched or other factors. It can be used to tailor offers of content or advertised products to individual customers, with the ability to apply machine-learning algorithms to improve performance in the light of the responses. This harnesses one of the promises of big data: the ability to adapt a service on the basis of complex relationships within customer-relevant data, at a scale not possible by direct analysis.
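As a minimal, hypothetical illustration of the kind of per-viewer decision such a platform makes, the sketch below keeps a running tally of what one subscriber watches and picks the strongest genre when an ad slot or recommendation is due. The class and method names are illustrative assumptions, not taken from the Solar platform.

import java.util.HashMap;
import java.util.Map;

public class ViewerProfile {
    // Running count of viewing events per genre for one subscriber.
    private final Map<String, Integer> genreCounts = new HashMap<>();

    // Called for every viewing event as it streams in.
    public void recordView(String genre) {
        genreCounts.merge(genre, 1, Integer::sum);
    }

    // Called when a targeted ad or recommendation is needed right now.
    public String recommendGenre() {
        return genreCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("general");
    }

    public static void main(String[] args) {
        ViewerProfile profile = new ViewerProfile();
        profile.recordView("sport");
        profile.recordView("drama");
        profile.recordView("sport");
        System.out.println("Target next slot at: " + profile.recommendGenre()); // prints "sport"
    }
}

In a production system the tallies would be held across the distributed cluster and combined with many other signals, but the principle is the same: the profile is updated as events arrive, so the decision reflects behaviour up to the current moment.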

Although big data is certainly an overused buzz phrase, the associated capabilities will be vital for commercial success in broadcasting, and will only be achieved by migrating to cloud-based infrastructures that crunch all the relevant information.