A: The main problem companies run into with correlation analysis is that people quickly assume a correlation indicates causation. A correlation only shows that two variables move together, possibly because a third factor drives both. Only proper testing can determine whether one is actually an independent variable driving the other as a dependent variable.
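To make the causation trap concrete, here is a minimal sketch on simulated data (all series and the hidden driver `z` are hypothetical). `x` and `y` never influence each other, yet they correlate strongly because both depend on `z`. Once `z` is regressed out of both (a partial correlation), the relationship essentially disappears. It uses only the Python standard library. The `pearson` and `residuals` helpers are written out here for illustration and are not any particular library's API.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def residuals(v, z):
    """Residuals of v after a simple least-squares regression on z."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = sum((a - mv) * (b - mz) for a, b in zip(v, z)) / sum((b - mz) ** 2 for b in z)
    return [a - mv - slope * (b - mz) for a, b in zip(v, z)]


rng = random.Random(42)
# Hypothetical hidden driver z; x and y each depend on z, never on each other.
z = [rng.gauss(0, 1) for _ in range(1000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)  # strong, even though x does not cause y
# Partial correlation: correlate what is left of x and y once z is regressed out.
r_xy_given_z = pearson(residuals(x, z), residuals(y, z))  # close to zero

print(f"r(x, y)     = {r_xy:.2f}")
print(f"r(x, y | z) = {r_xy_given_z:.2f}")
```

In real data the hidden driver is usually unknown or unmeasured, which is why a controlled test, rather than a correlation, is needed to establish causation.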
Another modern challenge of correlation analysis is that, with so much data available, many different variables or data sets may show similar, strong correlations with the same target data set. This can lead to some paralysis when deciding which variable to examine more closely later with multivariate analysis, because it isn't always immediately clear which correlated relationship will be the most beneficial to pursue. It's important to pick one variable that can stand in for the others, since many of them are not truly independent of one another.
For example, when looking at orders or purchases, you might see similar correlations between that variable and visits to a website or store, page views, and number of visitors. One of the challenges is making sure your teams understand that multiple data sets can correlate in a similar way simply because they are similar in nature. They might be collected at the same time or with the same frequency, or they may share some inherent relationship. Keep that relationship in mind when you compare different variables with similar correlation outcomes.
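A minimal sketch of this situation, using made-up daily traffic and order data: visitors, visits, and page views all correlate with orders in a similar way, but mostly because they are near-copies of one another. The greedy `pick_representatives` helper and its `threshold=0.9` cutoff are hypothetical illustrations of "choose one variable that stands in for the others", not a standard method. In practice the cutoff is a judgment call.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def pick_representatives(metrics, threshold=0.9):
    """Greedy sketch: keep a metric only if it is not highly correlated
    (|r| >= threshold) with any metric already kept."""
    kept = []
    for name, series in metrics.items():
        if all(abs(pearson(series, metrics[k])) < threshold for k in kept):
            kept.append(name)
    return kept


rng = random.Random(7)
days = 365
# Hypothetical daily data: every traffic metric is driven by the same audience.
visitors = [rng.gauss(1000, 150) for _ in range(days)]
visits = [v * 1.3 + rng.gauss(0, 40) for v in visitors]
page_views = [s * 4.0 + rng.gauss(0, 200) for s in visits]
orders = [s * 0.03 + rng.gauss(0, 5) for s in visits]

metrics = {"visitors": visitors, "visits": visits, "page_views": page_views}

# Each traffic metric correlates with orders in a similar way...
for name, series in metrics.items():
    print(f"r({name}, orders) = {pearson(series, orders):.2f}")

# ...because the metrics are nearly redundant with one another.
print("representative metrics:", pick_representatives(metrics))
```

Which metric is kept depends on the order in which they are checked. Ordering them by how directly they can be acted on, or by data quality, is one reasonable alternative.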
Companies can also run into problems with missing data. Say you're looking at the correlation between stock prices and sales over a specific time period. If data is suddenly missing for part of that period, or the two variables don't line up, the correlation can be badly distorted, especially if the missing values end up being treated as zeros, since a missing observation is not the same as a value of zero. To mitigate these problems, choose a time period or a set of observations with the right distribution, make sure the assumptions of your method match the underlying data, and apply the proper technique. When data is missing, exclude those observations rather than filling them with zeros. If you're working with time-based data, try to find an observation period in which the data was collected consistently.
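The effect of zero-filling is easy to see in a small simulation. The price and sales series below are hypothetical, and `None` marks a 20-day gap in the sales record. Replacing the gap with zeros drags the correlation far below its true level. Dropping the incomplete days (pairwise deletion) recovers it.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)
n = 200
# Hypothetical daily series: sales closely track the stock price.
price = [100 + rng.gauss(0, 5) for _ in range(n)]
sales = [2 * p + rng.gauss(0, 5) for p in price]

# Simulate a gap: sales were not recorded for 20 consecutive days.
for i in range(80, 100):
    sales[i] = None

# Problematic: missing observations silently become zeros.
sales_as_zero = [s if s is not None else 0.0 for s in sales]
r_zero_filled = pearson(price, sales_as_zero)

# Better: drop the days where either value is missing.
complete = [(p, s) for p, s in zip(price, sales) if p is not None and s is not None]
r_complete = pearson([p for p, _ in complete], [s for _, s in complete])

print(f"zeros in place of gaps: r = {r_zero_filled:.2f}")
print(f"missing days excluded:  r = {r_complete:.2f}")
```

Pairwise deletion assumes the gap is unrelated to the values that would have been recorded. If data went missing for a reason tied to price or sales, excluding it can still bias the result.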
Finally, a company may assume that because a correlation is statistically significant, there must be a strong association, but this is not always the case. A relationship can be statistically significant and still be fairly weak; with a large enough sample, even a very small correlation will pass a significance test. The significance test simply tests the null hypothesis that there is no relationship (a correlation of zero). Rejecting the null hypothesis supports the alternative hypothesis that some relationship exists, but it says nothing about how strong or important that relationship is. Be careful how you interpret association or correlation, because the correlation coefficient and statistical significance are two separate concepts.
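A minimal sketch of "significant but weak", on hypothetical simulated data with a tiny true effect and a large sample. It uses the standard t statistic for testing a zero correlation, t = r·√((n − 2)/(1 − r²)). As a simplifying assumption, it approximates the two-sided p-value with the normal tail, which is reasonable only because n is very large. A proper t-distribution p-value (for example from a statistics library) would be the more careful choice for small samples.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rng = random.Random(3)
n = 50_000
# Hypothetical data with a real but tiny linear effect of x on y.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.05 * xi + rng.gauss(0, 1) for xi in x]

r = pearson(x, y)
# t statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r * r))
# Assumption: for this many observations the t distribution is close to normal,
# so the two-sided p-value is approximated with the normal tail.
p_value = math.erfc(abs(t) / math.sqrt(2))

print(f"r = {r:.3f}, r^2 = {r * r:.4f}, t = {t:.1f}, p ~ {p_value:.1e}")
```

The p-value here is vanishingly small, yet r² shows that x accounts for well under 1% of the variation in y. The significance test and the strength of the association answer different questions.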