Correlation is a measure of the degree to which two variables are associated. The correlation between two variables can take any value between –1 and +1. A correlation of +1 indicates a perfect direct (positive) association, a correlation of 0.0 indicates no linear association, and a correlation of –1 indicates a perfect inverse (negative) association.
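The sketch below computes the Pearson product-moment coefficient, the most widely used correlation measure. The text does not name a specific coefficient, so choosing Pearson is an assumption, and the helper name pearson_r is hypothetical. The third call shows why a value of 0 means no linear association rather than no relationship at all: y is an exact function of x there, yet the coefficient is 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect direct association)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect inverse association)
print(pearson_r(x, [4, 1, 0, 1, 4]))   # 0.0  (y = (x - 3)**2: related, but not linearly)
```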
It is important not to confuse correlation with causation. Spurious correlation is a frequent flaw in forecasting models. The consequences are inaccurate forecasts and misleading strategic advice about how markets work.
There are three reasons why two variables may display a significant correlation. One may cause the other; both may have a common cause; or both variables may be measures of the same underlying factor.
When developing a forecasting model, it is important to avoid spurious correlations. For example, ice cream sales and the rate of plant growth may be positively correlated, yet neither causes the other: both share a common cause, temperature.
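One way to check whether a common cause explains a correlation is the partial correlation, which measures the association between two variables after removing the linear effect of a third. The text does not prescribe this technique; it is offered here as an illustration. The data below are simulated under the assumption that temperature affects both series linearly and that neither series affects the other, and the helper names are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(42)
n = 2000
# Simulated (hypothetical) data: temperature drives both series independently.
temperature = [random.uniform(10, 35) for _ in range(n)]
ice_cream_sales = [5.0 * t + random.gauss(0, 10) for t in temperature]
plant_growth = [0.3 * t + random.gauss(0, 1) for t in temperature]

raw = pearson_r(ice_cream_sales, plant_growth)
controlled = partial_r(ice_cream_sales, plant_growth, temperature)
print(f"raw correlation:             {raw:.2f}")         # strongly positive
print(f"controlling for temperature: {controlled:.2f}")  # close to zero
```

The raw correlation is high only because both series track temperature; once temperature is held fixed, almost nothing is left.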
Sometimes the common cause is something like population or employment growth. Employment growth drives growth in both car sales and liquor sales, so car sales may be correlated with liquor sales. Before exploring relationships between variables, we need to exclude the effect of any possible common influence. In a time series context, this is often done by an appropriate degree of differencing, which removes trends the series share. Factor analysis techniques can be employed with cross-sectional data.
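A small simulation illustrates how differencing can remove a shared trend. It assumes, hypothetically, that employment grows along a steady linear trend and that car sales and liquor sales each follow employment plus their own independent noise. The series, coefficients and helper names are illustrative only, not data from the text.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def first_difference(series):
    """Period-to-period changes: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

random.seed(7)
n = 500
# Hypothetical common influence: employment rising along a linear trend.
employment = [1000 + 0.5 * t for t in range(n)]
# Each series follows employment plus its own independent noise.
car_sales = [2.0 * e + random.gauss(0, 3) for e in employment]
liquor_sales = [1.5 * e + random.gauss(0, 3) for e in employment]

levels_r = pearson_r(car_sales, liquor_sales)
diff_r = pearson_r(first_difference(car_sales), first_difference(liquor_sales))
print(f"correlation of levels:      {levels_r:.3f}")  # near +1
print(f"correlation of differences: {diff_r:.3f}")    # near 0
```

One difference suffices here because the shared influence is a linear trend. If the common driver also fluctuated from period to period, the differenced series would still be correlated, and another adjustment would be needed.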
Causality requires at least three conditions: