Big Data Forecasting
By Gary Angel
November 3, 2016
Forecasting is a foundational activity in analytics and is a fundamental part of everyone’s personal mental calculus. At the simplest level, we live and work constantly using the most basic forecasting assumption – that everything will stay the same. And even though people will throw around aphorisms of the “one constant is change” sort, the assumption that things will stay largely the same is far more often true. The keyword in that sentence, though, is “largely”. Because if things mostly do stay the same, they almost never stay exactly the same. Hence the art and science of forecasting lies in figuring out what will change.
There are two macro approaches to forecasting: trending and modelling. With trending, we forecast future measurements by projecting trends of past measurements. And because so many trends have significant variation and cyclical behaviors (seasonal, time-of-day, business, geological), trending techniques often incorporate smoothing.
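To make the trending idea concrete, here is a minimal sketch in Python. It assumes a trailing moving average as the smoothing step and a straight least-squares line as the trend; neither choice comes from the video, and the function names and sample measurements are made up for illustration.

```python
def moving_average(values, window):
    """Trailing moving average: one smoothed point per full window."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def project_linear_trend(values, steps_ahead=1):
    """Fit a least-squares line to values (x = 0, 1, 2, ...) and extend it."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical measurements: an upward drift plus a repeating 4-period cycle.
measurements = [10, 14, 9, 13, 12, 16, 11, 15, 14, 18, 13, 17]

# Averaging over one full cycle removes the cyclical swings.
smoothed = moving_average(measurements, window=4)

# Projects the next value of the smoothed series, not of the raw series.
next_smoothed_value = project_linear_trend(smoothed, steps_ahead=1)
```

Matching the window to the cycle length is what flattens the cycle here. Real data rarely has a cycle this clean, so treat the window size as something to tune.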
Though trending can often create very reliable forecasts, particularly when smoothed to reduce variation and cycles, there’s one thing it doesn’t do well – it doesn’t handle significant changes to the system dynamics.
When things change, trends can be broken (or accelerated). When a system has undergone significant change (or is likely to), modelling is often a better and more reliable forecasting technique. A model is designed to capture the true dynamics of the system rather than just its past trajectory.
Suppose our sales have declined for the past 14 months. In a trend, the expectation will be that sales will decline again in the 15th month. But if we decide to cut our prices or dramatically increase our marketing budget, that trend may not continue. A model could capture the impact of price or marketing on sales and potentially generate a much better prediction when one of the key system drivers is changed.
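The sales example can be sketched in a few lines of Python. Everything below is hypothetical: the prices, the sales figures, and the assumption that price is the only driver and acts linearly. It is meant to show the difference between the two forecasts, not how a production model would be built.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical history: price crept up for 14 months while sales fell.
months = list(range(1, 15))
prices = [10.0 + 0.5 * (m - 1) for m in months]
sales = [1000.0 - 40.0 * (p - 10.0) for p in prices]

# Trending: sales as a function of time, projected to month 15.
trend_intercept, trend_slope = fit_line(months, sales)
trend_forecast = trend_intercept + trend_slope * 15

# Modelling: sales as a function of the driver (price), evaluated at the
# price we plan to charge in month 15 after a price cut.
price_intercept, price_slope = fit_line(prices, sales)
planned_price = 12.0  # assumed price after the cut
model_forecast = price_intercept + price_slope * planned_price

print(f"trend forecast for month 15: {trend_forecast:.0f}")
print(f"model forecast at price {planned_price}: {model_forecast:.0f}")
```

On this made-up, noise-free data the trend keeps sliding to about 720 units, while the price-driven model predicts about 920 once the price is cut. The trend has no way to know the price is about to change; the model does, because price is one of its inputs.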
The video is a redux of a couple of recent speaking gigs – one on big data and predictive analytics and one on big data and forecasting. It focuses more on the forecasting side of things and explains how big data concepts impact forecasting – particularly from a modelling perspective.
Like each of my big data videos, it begins with a discussion of what big data is. If you’ve watched (or watch) either of the first two videos in the series (Big Data Beyond the Hype or Big Data and SQL), you don’t need to watch me reprise my definition of big data in the first half of Big Data and Forecasting. Just skip the first eight minutes. If you haven’t, I’d actually encourage you to check out one of those videos first as they provide a deeper dive into the definition of big data and why getting the right definition matters.
In the second half of the video, I walk through how “real” big data impacts forecasting and predictive problems. The video lays out three common big data forecasting scenarios: integrating textual data into prediction and forecasting systems, building forecasts at the individual level and then aggregating those predictions, and pattern-matching IoT and similar types of data sources as a prelude to analysis.
Each of these is interesting in its own right, though I think only the middle case truly adds anything to the discipline of forecasting. Text and IoT-type analytics are genuine big data problems that involve significant pattern-matching and that challenge traditional IT and statistical paradigms. But neither really generates new forecasting techniques.
However, building forecasts from individual patterns is a fairly fundamental change in the way forecasts get built. Instead of applying smoothing techniques or building models against aggregated data, big data approaches use individual patterns to generate a forecast for each record (customer/account/etc.). These forecasts can then be added up (or treated probabilistically) to generate macro-forecasts or forecasting ranges.
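Here is a rough sketch of that roll-up step in Python. It assumes each record already carries a purchase probability and an expected order value from some individual-level model (not shown), that customers behave independently, and that a normal approximation is good enough for the range. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CustomerForecast:
    customer_id: str
    purchase_probability: float  # chance this customer buys next period
    expected_order_value: float  # revenue if they do buy


def aggregate_forecasts(forecasts, z=1.96):
    """Roll individual forecasts up to a total and an approximate range.

    Treats each customer as an independent yes/no purchase, so the
    variance of the total is the sum of the per-customer variances.
    """
    expected_total = sum(
        f.purchase_probability * f.expected_order_value for f in forecasts
    )
    variance = sum(
        f.purchase_probability
        * (1 - f.purchase_probability)
        * f.expected_order_value ** 2
        for f in forecasts
    )
    half_width = z * math.sqrt(variance)
    low = max(0.0, expected_total - half_width)  # revenue cannot go negative
    return expected_total, low, expected_total + half_width


# Hypothetical per-customer forecasts.
forecasts = [
    CustomerForecast("c-001", 0.80, 120.0),
    CustomerForecast("c-002", 0.10, 900.0),
    CustomerForecast("c-003", 0.45, 60.0),
]
expected_total, low, high = aggregate_forecasts(forecasts)
print(f"expected revenue {expected_total:.0f}, range {low:.0f} to {high:.0f}")
```

Simply adding up the expected values gives the macro-forecast; carrying the per-record uncertainty along is what produces a range. With only a handful of records the normal approximation is crude, and simulating each customer's outcome many times would be a reasonable alternative.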
If you’ve got an interest in big data and forecasting problems, give it a listen. The full video is about 16 minutes split into two pretty equal halves (big data definition, big data forecasting).