Don't Confuse Correlation with Causation

Correlation is a measure of the degree to which two variables are linearly associated. The correlation coefficient can take any value between –1 and +1. A correlation of +1 indicates a perfect direct association, a correlation of 0 indicates no linear association and a correlation of –1 indicates a perfect inverse association.
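As a minimal sketch of what this measure computes, the Pearson correlation coefficient can be written in a few lines of plain Python. The function name pearson and the sample numbers are illustrative assumptions, chosen so that the three reference cases come out at +1, –1 and 0.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Illustrative data only.
x = [1, 2, 3, 4, 5]
direct = pearson(x, [2, 4, 6, 8, 10])     # perfect direct association
inverse = pearson(x, [10, 8, 6, 4, 2])    # perfect inverse association
no_assoc = pearson(x, [1, -1, 0, -1, 1])  # no linear association
print(direct, inverse, no_assoc)
```

Note that a coefficient near zero only rules out a linear association; two variables can still be related in a non-linear way.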

It is important not to confuse correlation with causation. Spurious correlation is a frequent flaw in forecasting models. The consequences are inaccurate forecasts and misleading strategic advice about how markets work.

Apart from chance, there are three reasons why two variables may display a significant correlation. One may cause the other; both may have a common cause; or both variables may be measures of the same underlying factor.

In developing a forecasting model, it is important to avoid spurious correlations. For example, one may observe a positive correlation between ice cream sales and the rate of plant growth, but neither causes the other: both respond to a common cause, temperature.

Sometimes the common cause is something like population or employment growth. Employment growth causes growth in both car sales and liquor sales – so car sales may be correlated with liquor sales. Before exploring relationships between variables, we need to exclude the effect of any possible common influence. In a time series context, this is often done by an appropriate degree of differencing. Factor analysis techniques can be employed with cross-sectional data.

Causality requires at least three conditions:

  1. significant correlation between the variables;
  2. temporal asymmetry (i.e. time precedence) between the variables: the cause should precede the effect (although people can act in the expectation that something is going to happen);
  3. elimination of any common causal variable.