Data 101

January 18, 2017

What’s really going on with data?

Pricing is an incredibly powerful framework for looking at the world, because it is the offspring of supply and demand, those basic economic forces, and the cousin of value, a key concept underpinning both.

In this first of two posts, we will try to use the pricing lens to look at what is going on in the fast-paced world of data.

The age of Big Data

First up is a well-known fact: the amount of data is exploding exponentially. I’ve used this simple table about Global Internet Traffic from Cisco because it highlights how the data explosion has changed our very thinking about how we measure data.

We go from gigabytes per day in 1992, to gigabytes per HOUR in 1997, then gigabytes per SECOND over the next 5 years. And now, while consumers are still getting their heads around terabytes, data scientists like us are busy crunching petabytes, exabytes, zettabytes and yottabytes. And we’re looking forward to the age of Big Big Data when we can play with brontobytes and geopbytes.

Data costs are falling

The second fact I want to highlight is also relatively well known, although less so than the first: the cost of storing and processing data is getting lower, also exponentially, as can be seen from this graphic from KPCB’s 2016 Internet Trends Report.

Now, “exponential” is difficult to absorb, so let’s just put it this way – “things are changing really fast”.

Why so?

Not many people have made the link between the two trends, but what is really going on is pure economics 101: as the price (cost) of data goes down, the supply goes up. Simple.

What we call “elasticity” is, in this case, clearly visible. Since both are exponential processes, here we show the log of the cost of data against the log of its volume in a scatter plot (the trendline is not a linear regression).

Running a simple regression analysis between the two factors (the cost of data and its volume) clearly shows the link. This can be used to infer how much additional data to expect as its cost continues to fall. It is not much different, after all, from how we analyze the impact of pricing on retail demand.
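To make the idea concrete, here is a minimal sketch of that kind of regression: an ordinary least-squares fit of log volume on log cost, where the slope is the elasticity. The function names and the numbers are hypothetical and made up purely for illustration; they are not the Cisco or KPCB figures.

```python
import math

def fit_loglog(costs, volumes):
    """Least-squares fit of log(volume) on log(cost).

    Returns (elasticity, intercept), where elasticity is the slope.
    """
    xs = [math.log(c) for c in costs]
    ys = [math.log(v) for v in volumes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_volume(cost, slope, intercept):
    """Volume implied by the fitted log-log line at a given cost."""
    return math.exp(intercept + slope * math.log(cost))

# Hypothetical data: generated from volume = 1000 * cost ** -1.5,
# so the fit should recover an elasticity of about -1.5.
costs = [100.0, 10.0, 1.0, 0.1]
volumes = [1000.0 * c ** -1.5 for c in costs]

elasticity, intercept = fit_loglog(costs, volumes)
# Scenario: what volume does the model imply if cost falls to 0.01?
forecast = predict_volume(0.01, elasticity, intercept)
```

An elasticity of -1.5 would mean that a 1% fall in cost goes with roughly a 1.5% rise in volume. With real data the fit would be noisy, and, as with any regression, it says nothing by itself about the direction of causation.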

Now, what is not clear here is the direction of causation. Is the reduction in cost causing an explosion in volumes, by allowing for cheaper data to be created profitably? Or is the explosion in demand causing a reduction in supply cost?

Most likely a combination of both.

To the pricing researcher, what truly matters is the ability to run future scenarios and to have a valid explanatory model behind them. In fact, in the vast majority of retail analysis, elasticities are a lot harder to measure, and more volatile, than in this simple example. That is therefore our core focus.

‘Junk’ data is exploding

One thing is sure with regard to the data explosion: as the price of data goes down, its value, and therefore its quality, inevitably goes down too. The truly value-adding data most likely already existed, even when it was more expensive to store and process.

So all this new data is… well, mostly junk!

For example, it takes millions of Tweets (low value data) to gain a handful of useful insights (high value data).

Now, Tweets may be really important in some cases (for example, the police warning about a real-time danger) – but no one has yet earned a university degree by reading Tweets. On the other hand, reading an academic book (high value data, which takes plenty of effort to write) can lead to inspiration, learning, and a degree as well (high value outputs).

Since even poor-quality data leads to information with positive returns, I would expect the explosion of data to continue undisturbed until the ratio of the value of the information extracted to the cost of processing it has fallen to just above 1 – which is a long time away. Think of it in terms of the price and value of data: as long as the value extracted exceeds the price paid, it pays to create and process more.

Heading for a Singularity Event?

You may be asking whether this explosion in data has made the methods for studying data any more innovative.

No, not really.

What is really changing is simply that there is more data, and this won’t lead to a “Singularity event” any time soon. Today we are using the same old methods, which have found a new lease of life because of the huge amounts of data available to feed them.

But that’s a story for a different post.

The rise and fall of spam

Let’s go back to our pricing lens, to understand data and how even seemingly irreversible exponential trends may suddenly reverse.

A case of exponential growth suddenly reversing into exponential decay is the amount of spam email sent:

What has happened here? At first, the cost of sending emails was really low, and the return on that small cost was actually high – since getting even a tiny fraction of recipients to click on those emails was enough to return a multiple of the cost of delivery. Therefore, supply was high and cost was low – much as it is for data today.

In this case, unlike the growth in data, which is generally a positive, the growth in spam was unwanted. So new regulations were introduced, and new filters were used to prevent spam emails from reaching their targets. This increased the cost to the spammers and reduced their return, and so the “spam market” has found a new, lower equilibrium (thankfully), and will hopefully disappear in the future.
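The spam economics described above can be sketched as a simple expected-profit calculation. This is only an illustration: the function names and every number below are hypothetical assumptions, not measured spam statistics.

```python
def profit_per_email(cost_per_email, delivery_rate, response_rate, revenue_per_response):
    """Expected profit of sending one email.

    delivery_rate: share of emails that get past filters (0 to 1).
    response_rate: share of delivered emails that generate revenue (0 to 1).
    """
    expected_revenue = delivery_rate * response_rate * revenue_per_response
    return expected_revenue - cost_per_email

def break_even_delivery_rate(cost_per_email, response_rate, revenue_per_response):
    """Delivery rate at which expected revenue just covers the sending cost."""
    return cost_per_email / (response_rate * revenue_per_response)

# Hypothetical inputs, chosen only to illustrate the mechanism.
COST = 0.0001      # cost of sending one email
RESPONSE = 0.00001 # 1 in 100,000 delivered emails pays off
REVENUE = 50.0     # revenue from each paying response

# Before filters: every email is delivered, so sending is profitable.
profit_before = profit_per_email(COST, 1.0, RESPONSE, REVENUE)
# After filters: only 10% get through, so each email loses money.
profit_after = profit_per_email(COST, 0.1, RESPONSE, REVENUE)
# Below this delivery rate, sending spam stops paying.
threshold = break_even_delivery_rate(COST, RESPONSE, REVENUE)
```

Under these made-up inputs, spam stops paying once filters let through fewer than about one email in five. Regulation works on the other side of the same equation, by raising the effective cost per email.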

So, in the first part of this series, we have seen two examples of how the pricing lens can be used to look at the world in novel ways, with a particular focus on data.

In the second piece we will look at pricing and data focused on methods for analysis and extracting value – and go back to our point about Singularity.

Stay tuned!


About the author

Martin Luxton is a writer and content strategist who specializes in explaining how technology affects business and everyday life.

Big Data and Predictive Analytics are here to stay and we have only just begun tapping into their enormous potential.
