Sunday, May 13, 2007

Predictive analytics

Like a cherry on top of the eMetrics Sunday Summit, I attended Eric Siegel's course on Predictive Analytics. The 200-page slide deck served as a good reference for the 10 topics covered in just two days. The first day was easy, covering the introduction and data preparation; the most interesting material came on the afternoon of the second day, when modeling and deployment were addressed.


Companies need to invest in the process of doing web analytics. To quote one of my favorite authors, Tom Davenport:
"At a time when firms in many industries offer similar products, business processes are among the last points of differentiation." (Tom Davenport, Competing on Analytics)
Basically, predictive analytics taps into the collective experience of an organization and puts that knowledge into action. There is no crystal ball here, just serious statistical modeling and business expertise put to work. The goal is to "predict better than the competition".

Killer apps

This topic is about the application of predictive analytics, not about predictive analytics applications (a subtle but important distinction!). Predictive analytics revolves around a problem-solving approach:
  1. Set the overall strategy and outline the initiative
  2. Define the predictive modeling approach
    1. Determine the prediction statement
    2. Find out which data is required (segments and predictors)
    3. Plan the deployment: how the model will be integrated or used
    4. Build the business case
  3. Evaluation: determine the KPIs, run A/B tests against a control group, and set a baseline method for comparison
  4. Identify the anticipated challenges and bottlenecks, both organizational and technical


Preparing the data amounts to 80% of the work... The prediction goal drives data preparation, which is itself part of a larger process.

Univariate analysis

"If I can't picture it, I can't understand it." (Albert Einstein)
The goal of univariate analysis is to see how well each predictor does on its own. At the same time, univariate analysis provides good insights and serves as a double check of the implementation logic. Individual predictors will later serve as baselines that the model must improve upon.
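To make "how well each predictor does alone" concrete, here is a minimal sketch, not taken from the course: for each predictor it computes the response rate within each of its values, plus the accuracy of simply predicting the majority outcome for each value (a one-rule style baseline). The data, the field names (`channel`, `returning`, `responded`) and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical campaign data: two candidate predictors and a binary outcome.
rows = [
    {"channel": "email", "returning": True, "responded": 1},
    {"channel": "email", "returning": False, "responded": 0},
    {"channel": "search", "returning": True, "responded": 1},
    {"channel": "search", "returning": False, "responded": 0},
    {"channel": "email", "returning": True, "responded": 1},
    {"channel": "search", "returning": True, "responded": 0},
]

def value_counts(rows, predictor, target):
    """Count [positives, total] of a binary target for each value of one predictor."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        stats = counts[row[predictor]]
        stats[0] += row[target]
        stats[1] += 1
    return counts

def response_rates(rows, predictor, target="responded"):
    """Response rate of the target within each value of the predictor."""
    counts = value_counts(rows, predictor, target)
    return {value: pos / total for value, (pos, total) in counts.items()}

def one_rule_accuracy(rows, predictor, target="responded"):
    """Accuracy when predicting the majority outcome for each predictor value."""
    counts = value_counts(rows, predictor, target)
    correct = sum(max(pos, total - pos) for pos, total in counts.values())
    return correct / len(rows)

for predictor in ("channel", "returning"):
    print(predictor, response_rates(rows, predictor), round(one_rule_accuracy(rows, predictor), 3))
```

In this toy data, `returning` alone is right five times out of six versus four out of six for `channel`, so it is the stronger single predictor and the baseline a multivariate model would have to beat.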


For those who know RFM (Recency, Frequency, Monetary), segmentation will be familiar territory: it is the process of slicing, dicing and clustering the data. This part covered segmentation, clustering, OLAP and data mining.
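As an illustration of RFM-style segmentation, here is a minimal sketch under stated assumptions: each customer is scored on recency, frequency and monetary value by rank, and the three scores are concatenated into a segment code. It uses terciles (scores 1 to 3) to keep the example small, whereas RFM is often done with quintiles (1 to 5). The customer data and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical customers: (days since last purchase, number of purchases, total spent).
customers = {
    "A": (5, 12, 900.0),
    "B": (8, 3, 120.0),
    "C": (200, 1, 35.0),
    "D": (10, 8, 400.0),
    "E": (90, 2, 60.0),
    "F": (25, 5, 250.0),
}

def quantile_scores(values, n_bins=3, higher_is_better=True):
    """Score each value 1..n_bins by rank; ties keep their original order (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_bins // len(values) + 1
    return scores

def rfm_codes(customers, n_bins=3):
    """Concatenate recency, frequency and monetary scores into a code such as '322'."""
    ids = list(customers)
    # Fewer days since the last purchase is better, so recency is scored in reverse.
    recency = quantile_scores([customers[c][0] for c in ids], n_bins, higher_is_better=False)
    frequency = quantile_scores([customers[c][1] for c in ids], n_bins)
    monetary = quantile_scores([customers[c][2] for c in ids], n_bins)
    return {c: f"{r}{f}{m}" for c, r, f, m in zip(ids, recency, frequency, monetary)}

codes = rfm_codes(customers)
segments = defaultdict(list)  # segment code -> customers in that segment
for customer, code in codes.items():
    segments[code].append(customer)
print(dict(segments))
```

Customers sharing a code (here C and E, both "111") form one segment; clustering and OLAP generalize the same idea to more dimensions and less rigid cut-offs.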


Starting from the premise that "history repeats itself", various modeling methods have been developed, some of which are better suited to specific conditions. Fancy terms like decision trees, rote learning, naive Bayes, linear regression, neural networks and genetic programming were described, along with pitfalls such as overlearning (also called over-fitting) and underlearning.
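Two of the methods named above can be contrasted in a few lines. This is a minimal sketch, not the course's material: a rote learner that memorizes past cases exactly, next to a small categorical naive Bayes with add-one (Laplace) smoothing. The training data and feature meanings (channel, returning visitor, basket size) are hypothetical. The rote learner is perfect on the cases it has seen but says nothing about a new combination, which is why memorizing history is not the same as learning from it.

```python
import math
from collections import Counter, defaultdict

class RoteLearner:
    """Memorizes every training example; predicts only for exact matches."""
    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, x):
        return self.table.get(tuple(x))  # None for any case never seen before

class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)              # feature index -> distinct values seen
        for x, label in zip(X, y):
            for i, v in enumerate(x):
                self.feature_counts[(i, label)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, x):
        total = sum(self.class_counts.values())

        def log_score(label):
            n = self.class_counts[label]
            score = math.log(n / total)  # log prior
            for i, v in enumerate(x):
                count = self.feature_counts[(i, label)][v]
                score += math.log((count + 1) / (n + len(self.values[i])))
            return score

        return max(self.class_counts, key=log_score)

# Hypothetical history: (channel, returning visitor, basket size) -> responded?
X_train = [
    ("email", "yes", "high"), ("email", "no", "low"),
    ("search", "yes", "high"), ("search", "no", "low"),
    ("banner", "yes", "low"), ("banner", "no", "low"),
]
y_train = [1, 0, 1, 0, 1, 0]

rote = RoteLearner().fit(X_train, y_train)
nb = NaiveBayes().fit(X_train, y_train)
new_case = ("email", "yes", "low")  # a combination never seen in training
print(rote.predict(new_case), nb.predict(new_case))
```

The rote learner returns `None` for the new case, while naive Bayes combines the evidence of each feature independently and still produces a prediction.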


This topic was all about "how well predictive analytics works". Lift curves were used to validate the model on test data, that is, data not used to build the model.
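A lift curve answers a simple question: if we contact only the top-scored fraction of cases, how much better is the response rate than contacting everyone? Here is a minimal sketch of a cumulative lift computation; the scores and outcomes are hypothetical test-set values, not results from the course. A lift of 1.0 means the model does no better than random selection.

```python
def lift_curve(scores, outcomes):
    """Cumulative lift at each depth of the list ranked by model score (highest first)."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    base_rate = sum(outcomes) / len(outcomes)  # response rate if we contacted everyone
    curve, hits = [], 0
    for k, (_, outcome) in enumerate(ranked, start=1):
        hits += outcome
        depth = k / len(ranked)            # fraction of the list contacted so far
        lift = (hits / k) / base_rate      # response rate in the top k vs. overall
        curve.append((depth, lift))
    return curve

# Hypothetical model scores and actual outcomes on held-out test data.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for depth, lift in lift_curve(scores, outcomes):
    print(f"top {depth:.0%}: lift {lift:.2f}")
```

With these numbers, the top 20% of the list responds at more than three times the overall rate, and the lift falls back to 1.0 once the whole list is contacted.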


Once the model is built and tested, predictive analytics is put to work. Deploying the predictive model yields recommendations for improvement; at the same time, new data is collected and fed back into the modeling engine, leading to further refinement of the model.


Predictive analytics is not an easy task; it requires planning and the right resources. It is a business activity, not an IT one: a wholly collaborative process driven by business needs and marketing expertise.


Predictive analytics allows for "per-customer" predictions and the tactical achievement of strategic marketing objectives. At the same time, the organizational process ensures that predictions are actionable and driven by business needs. Careful deployment of predictive analytics mitigates risks and eases performance tracking.

Note that the words "Web" and "Internet" appear nowhere here, and that's intentional. Although predictive analytics can easily be applied to the Web, it doesn't stop there: the concepts reach much further and can encompass a vast array of business activities. It was an interesting course; some participants found it difficult, while others felt it didn't go deep enough. To me, the course was valuable and could have lasted another day to dig further into the modeling, results and deployment aspects of predictive analytics.