The myth surrounding big data
A ‘personal data economy’ could be the greatest enabler of truly relevant, credible and integrated data to emerge from the online world in years; but it could also be its greatest challenge, says David Brennan, founder of Media Native.
I was fortunate enough to be invited to join the panel at MediaTel’s recent Connected Consumer Conference discussing the future of media metrics in the age of big data. As our first topic of debate, we were asked to consider what would become of ‘research’ – particularly in the form of panel data – as we come to know more and more about the consumer via their online behaviour.
Now, I know that binary positions have to be set to make a conference panel session interesting, especially when the organisers take the brave step of making the metrics panel last on the bill; it is a reflection of the increasing importance of data and metrics that there was almost a full audience for this most esoteric of subjects. Even so, I remain perplexed by this ‘winners and losers’ attitude to every aspect of media evolution.
Even though the panel was perfectly set up for an ‘analogue vs. digital’ battle, there was a slightly anti-climactic consensus that, actually, the future will be both: behavioural data and consumer panel data working together to understand what happened (online at least), who did it and, most importantly, how and why.
Refreshingly, there was not a single plaintive cry of “why rely on a panel of 5,500 when we have the data from millions?” In fact, BARB and the other consumer-based media metrics came out of the debate rather well, with an implicit recognition of their importance within the framework of the emerging media landscape.
In the run-up to the panel session, we had a presentation from Ryan Garner of GfK, who talked about the emergence of a personal data economy, where consumers understand the value of the data trail they leave and will be willing to make it more openly available if there is a fair exchange of value as a result.
As I said during the debate, this could be the greatest enabler of truly relevant, credible and integrated data to emerge from the online world in years; but it could also be its greatest challenge.
There are two drivers of this potentially major disruption to the data economy, which would place the consumer in control of the data they are prepared to make available. The first is the current climate around privacy and data sharing; legislation guaranteeing the individual greater control of the data they produce is already beginning to come into force.
The second is the cookie: its limitations as our primary behavioural tracking tool have been exposed as our online activities migrate to mobile devices and in-app content.
A recent conference addressing the latter issue, organised by the IAB, offered the intriguing possibility that consumers can be persuaded to allow tracking of individual mobile devices registered to them in order to help join the dots that have recently become even more stretched. So, the personal data economy is very much in its formative stages.
The two questions I posed at the Connected Consumer panel session were:
– What does this mean for the behavioural tracking industry?
Whatever happens, it is now a given that we can no longer have ‘census’ data from behavioural tracking. However the data is collected, it will almost certainly come from a sample of consumers – hopefully a very large sample, but a sample nonetheless – which will have to be modelled against the wider population. We will also need to understand how those who elect not to be part of this personal data economy can be represented by those who are.
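To make the modelling point concrete, here is a minimal sketch in Python of one common approach, post-stratification weighting, in which each opted-in consumer is weighted so that the sample's mix of groups matches that of the wider population. The age bands, population shares and sample counts are invented purely for illustration, and the function name is my own; a real currency would weight on many more variables, and would still have to test whether those who opt out behave like the opted-in members of the same group.

```python
# Hypothetical illustration of post-stratification weighting for an opt-in sample.
# All figures are invented for the example; they are not from any real study.

def post_stratification_weights(population_share, sample_counts):
    """Return one weight per group so the weighted sample matches the population mix."""
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total  # how large the group is within the sample
        weights[group] = population_share[group] / sample_share
    return weights

# Assumed population mix by age band (shares sum to 1).
population_share = {"16-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Assumed opt-in sample, skewed towards younger users.
sample_counts = {"16-34": 5000, "35-54": 3500, "55+": 1500}

weights = post_stratification_weights(population_share, sample_counts)

for group, weight in weights.items():
    print(f"{group}: weight {weight:.2f}")
```

In this made-up case, the over-represented 16-34 group is weighted down to 0.6, while the under-represented 55+ group is weighted up to roughly 2.3. The larger the weights, the more each individual respondent is being asked to stand in for others, which is exactly where the quality of the sample starts to matter more than its size.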
– My second question referred to the ‘exchange of value’ that would be required in order to persuade online users to allow collection of their personal data.
In a way, the principle is already well established with store cards, such as Tesco’s Clubcard and Sainsbury’s Nectar, although these are reportedly becoming less popular, and they have the advantage of being retailer-specific.
Although individual online players such as Amazon and Google can persuade users to part with a great deal of data in order to create efficiency in their transactions, could a more general, internet-wide exchange of data be perceived as appealing?
I guess it could be if a significant financial exchange were offered (although that would create its own potential biases in the data), but otherwise, I struggle to see what value could be offered to encourage active sign-up amongst a critical mass of consumers.
We would also need to see much greater auditing of the data if the metrics derived from online behaviour were to be fully integrated into the wider media research landscape.
At the moment, issues such as ad visibility (it is estimated that more than half of all online display ads never appear on screen), fragmented data collection and transaction auditing (more than 60% of internet traffic is non-human, a growth of more than 20% in just two years) are challenging the credibility of the data.
So, to those stargazers who believe these metrics should form the basis of our future media trading, as well as being integral to the communications planning process, these figures would suggest we have a long way to go before we match the accuracy, reliability and transparency of our existing (panel-based) metrics infrastructure.
Like most sensible media people (including my fellow panellists), I believe the future is based on the successful integration of research and analytics, and the industry currencies (e.g. BARB’s Project Dovetail, NRS PADD) appear to agree.
But that is not to say it will be a simple process, nor that it will be accomplished using the somewhat fragmented, uncontrolled and unaudited processes to which much of the ‘big data’ revolution has been subject so far.
In fact, all of the evidence suggests that the analytics world can learn far more from the world of sample-based research than vice versa. But then, all of the evidence suggests it will all be sample-based in the future. In which case, let’s hope that the size of the sample will be matched by the quality of the data it produces.