Multivariate analysis involves analyzing multiple variables (more than two) to identify any possible association among them.
Multivariate analysis offers a more complete examination of the data by looking at all possible factors.
Multivariate analysis can help companies predict future outcomes, improve efficiency, make decisions about policies and processes, correct errors, and gain new insights.
Multivariate analysis often builds on univariate (one variable) analysis and bivariate (two variable) analysis.
The more a company invests in ensuring quality data collection, the more likely the results of the multivariate analysis will be accurate.
John Bates is the director of product management for Predictive Marketing Solutions and for Analytics Premium for Adobe Marketing Cloud. His core responsibility is to develop the product roadmap for all advanced statistics, data mining, predictive modeling, machine learning, and text mining/natural language processing solutions within the products of Adobe's digital experience business unit.
Q: Why do companies use multivariate analysis?
A: There are many benefits for companies in conducting multivariate analysis. Doing so can help companies forecast future opportunities, risks, demand for products, etc. And that helps with investment strategies, business decisions, and setting expectations.
Operational efficiency is another way a business may strategically use multivariate analysis. Regression models, for example, can be used to optimize business processes. A factory manager can create a model to understand the impact of oven temperature on the shelf life of cookies baked in those ovens, or a call center can analyze relationships between the wait-time of callers and the number of complaints.
The information derived from multivariate analysis can also support data-driven decision making and eliminate guesswork with corporate policies and processes. Businesses often have large quantities of financial, operational, customer, and purchase data that help inform business decisions based on statistical significance, instead of intuition and gut feel. By relying on multivariate analysis, you can decrease your overall risk and chance of failure.
Multivariate analysis can also correct errors. Not only can regression modeling, for example, help support management decisions, but it can also help identify errors in judgment. A retail store manager may believe that extending shop hours will increase sales, but multivariate analysis or regression analysis may actually indicate that increased revenue might not be sufficient to support the rise in operating expenses due to longer working hours.
A company may also use multivariate analysis to gain new insights. They can uncover new customer targets or identify market patterns that exist during certain times of the year or hours of the day. Without analysis, those signals might be buried in a large collection of unorganized data.
Q: How does multivariate analysis differ from univariate and bivariate analysis?
A: Univariate analysis involves the analysis of one variable at a time. Usually, the objective is to describe the variable. An example of univariate analysis would be an examination of how many students graduated with a degree in computer science.
Bivariate analysis lives between univariate and multivariate analysis. It is a type of correlation analysis that examines the possible relationship between two variables. An example would be an analysis of the correlation between gender and graduation with a computer science degree.
Multivariate analysis goes one step further and analyzes the associations between at least three variables. For instance, multivariate analysis would be looking at the correlation or relationship between gender, graduation with a computer science degree, and country of residence.
Q: Why would a company choose multivariate analysis over univariate or bivariate?
A: Because you're accounting for multiple variables with multivariate analysis, you're uncovering the relative influence or impact of one variable on another set of variables. It gives you a better sense of reality.
A simple bivariate correlation model might predict that if a company spends ten times more money on marketing, they will see a five percent increase in sales. But with a multivariate analysis, there might be other constraints or factors that play into that, which would give you a more realistic prediction. You might see that the actual increase in sales wouldn’t be five percent, because you need to take into account the quality of the marketing spend, the channels, or the time of year.
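To make that contrast concrete, here is a minimal sketch in plain Python. The data set is invented for illustration: sales are constructed as 10 + 2 × spend + 5 × holiday, and marketing spend happens to be higher in holiday periods. The helper names (`solve_linear_system`, `fit_ols`) are hypothetical, and the fit is ordinary least squares via the normal equations, which is only one of several ways to estimate such a model.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(predictors, y):
    """Least-squares fit; returns [intercept, coef_1, coef_2, ...]."""
    rows = [[1.0] + [float(col[i]) for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented data: spend is higher during holiday periods (holiday = 1).
spend = [1, 2, 3, 4, 5, 6]
holiday = [0, 0, 0, 1, 1, 1]
sales = [10 + 2 * s + 5 * h for s, h in zip(spend, holiday)]

bivariate = fit_ols([spend], sales)              # sales explained by spend only
multivariate = fit_ols([spend, holiday], sales)  # spend and time of year together

print(f"effect of spend, spend-only model:     {bivariate[1]:.2f}")
print(f"effect of spend, controlling for season: {multivariate[1]:.2f}")
```

On this toy data the spend-only model credits marketing with roughly 3.29 units of sales per unit of spend, because it absorbs the holiday effect. Once time of year is included, the estimate drops to the 2 that was built into the data.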
There are cases where multivariate analysis may be unnecessary. If you're trying to get to something like an insight or do a simple forecast of a metric, you don't need multivariate analysis to do that. By looking at historical revenue data, for example, you can make a basic prediction for the next quarter or year. However, if you want to understand the levers you can pull and the factors that cause or influence those predictions, that's when you would want to expand to multivariate analysis.
It might be a smart option to do univariate or bivariate analysis first. You can start by trending data, which is a univariate analysis, to get the statistical mean and median. Once you have that information, then you can perform analysis to understand the relationship between that data and other variables.
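As a rough illustration of that sequence, the sketch below first summarizes a single metric and then checks its relationship with a second variable. The revenue and ad-spend figures are made up, and `pearson` is a hypothetical helper that computes the standard Pearson correlation coefficient.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Invented quarterly figures, for illustration only.
revenue = [100, 110, 125, 120, 140, 155]
ad_spend = [10, 12, 15, 13, 18, 20]

# Step 1: univariate summary of the metric on its own.
print("mean revenue:", mean(revenue))
print("median revenue:", median(revenue))

# Step 2: bivariate check of revenue against one other variable.
print("correlation with ad spend:", round(pearson(revenue, ad_spend), 3))
```

A strong correlation at the second step is a cue to bring in further variables, not evidence that ad spend alone drives revenue.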
Q: What are the different types of multivariate analysis?
A: There is a decision tree of multivariate techniques that can be used, and they depend on a number of factors. Typically, there are a few questions you ask yourself, and that determines what class of multivariate techniques you should focus on. The first question is whether the variables can be divided into independent and dependent classifications, because some techniques are built for data with that split and others for data without it. If the answer is yes, the next step is to identify how many variables are being treated as dependent versus independent, and how both types of variables are measured.
After asking yourself these kinds of questions, you’ll arrive at two families of techniques. One is the family of dependence methods, which includes options like multiple regression, conjoint analysis, multiple discriminant analysis, linear probability models, multivariate analysis of variance, structural equation modeling, and canonical correlation analysis. With each of these techniques, you’re making strong assumptions about the variables up front. With the other family of techniques, interdependence techniques, you’re looking at variables that can’t be classified as either dependent or independent, and you’re not making assumptions about the variables themselves. Examples of interdependence methods include factor analysis, cluster analysis, and correspondence analysis.
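The first branch of that decision tree can be written down as a tiny lookup. This is only a sketch: the function name `candidate_techniques` is hypothetical, the technique lists are the ones named in the answer above, and the later branches (how many dependent variables there are and how they are measured) are deliberately left out.

```python
# Technique families exactly as listed in the answer above.
DEPENDENCE_METHODS = [
    "multiple regression",
    "conjoint analysis",
    "multiple discriminant analysis",
    "linear probability models",
    "multivariate analysis of variance",
    "structural equation modeling",
    "canonical correlation analysis",
]
INTERDEPENDENCE_METHODS = [
    "factor analysis",
    "cluster analysis",
    "correspondence analysis",
]


def candidate_techniques(has_dependent_variables):
    """First question: can the variables be split into dependent and independent?"""
    if has_dependent_variables:
        return "dependence", DEPENDENCE_METHODS
    return "interdependence", INTERDEPENDENCE_METHODS


# Example: a customer segmentation problem with no outcome variable to predict.
family, techniques = candidate_techniques(has_dependent_variables=False)
print(family, "->", ", ".join(techniques))
```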
Q: What is the process of conducting multivariate analysis?
A: The process of conducting multivariate analysis depends on which techniques you're using and the objective of the multivariate analysis. Generally, when performing multivariate analysis, you’re trying to achieve one of five different objectives:
- Data reduction or structural simplification. Multivariate analysis focused on this objective helps data get as simplified as possible without sacrificing valuable information, which helps with interpretation.
- Sorting and grouping. When you have multiple variables, you may want to group similar ones together based on common characteristics. An example of a method used to achieve this objective would be cluster analysis.
- Investigation of dependence among variables. This is an exploratory analysis technique. You're trying to explore the data, and better understand the relationships between the variables of interest. For example, are they mutually independent? Are one or more variables dependent on the others?
- Prediction. You might want to predict a relationship between variables. For example, you may use past observations of other variables or current observations of one variable to make a prediction on an unknown variable. If clicks to your website go up by 10%, you might use that information to predict how many more sales you’re going to get, but in the context of other variables, like the time of year or marketing channel.
- Testing a hypothesis. You formulate specific statistical hypotheses in terms of the parameters of the population. You then test them under stated assumptions or prior convictions, measure the influence of a particular treatment on a sample, and infer the result for the population as a whole.
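To illustrate one of these objectives, sorting and grouping, here is a minimal k-means sketch, which is one common form of cluster analysis. It groups observations (customers) described by two variables. The customer data, the starting centroids, and the function name `kmeans` are all assumptions made for the example, and a real analysis would normally scale the variables and use an established library.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Invented customers: (visits per month, average order value).
customers = [(1, 20), (2, 22), (1, 25), (8, 80), (9, 85), (10, 78)]

# Starting centroids chosen by hand so the example is deterministic.
centroids, clusters = kmeans(customers, centroids=[(1, 20), (8, 80)])

for center, members in zip(centroids, clusters):
    print("segment centered at", center, "contains", members)
```

On this data the procedure separates occasional low-spend customers from frequent high-spend ones, which is the kind of grouping the objective describes.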
When it comes to the model-building process, there are a few steps you always have to follow. First, you have to define the research problem, objective, and the potential, and then map these to the multivariate technique that will be used. The next step is to develop the analysis plan. Are you going to be reviewing the initial analysis with the stakeholders? Are you going to do a univariate or bivariate analysis first?
You then want to evaluate the assumptions underlying the multivariate techniques themselves. Whichever multivariate technique you choose, there are certain model assumptions that need to be accounted for, like linearity, independence, or the shape of the independent variables. You may need to assess whether or not there are relationships between the independent variables that are undesirable, break some of the assumptions in the model, or are spurious correlations.
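One widely used check for undesirable relationships between independent variables is the variance inflation factor (VIF). The sketch below handles only the two-predictor case, where the VIF reduces to 1 / (1 − r²) with r the correlation between the two predictors. The data and function names are invented for illustration. A common rule of thumb treats values above roughly 5 to 10 as a warning sign, though the threshold is a convention and not a hard rule.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Invented predictors. Impressions track ad spend almost exactly; store hours do not.
ad_spend = [10, 12, 15, 13, 18, 20]
impressions = [101, 119, 152, 128, 181, 199]
store_hours = [8, 12, 9, 11, 10, 9]

print("VIF, ad spend vs impressions:", round(vif_two_predictors(ad_spend, impressions), 1))
print("VIF, ad spend vs store hours:", round(vif_two_predictors(ad_spend, store_hours), 2))
```

A very large value for the first pair suggests the two variables carry nearly the same information, so including both would make their individual coefficients unstable.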
Next, you want to estimate the multivariate model and assess its overall model fit. Then you will interpret the model, validate it, and put the results to use in line with your original objective.
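As a sketch of the fit and validation step, the example below scores an already-estimated model on data it has not seen. Everything here is assumed for illustration: the model coefficients in `predict_sales`, the holdout observations, and the choice of R² and root mean squared error (RMSE) as fit measures.

```python
from math import sqrt


def predict_sales(spend, holiday):
    """Hypothetical fitted model: intercept 10, 2 per unit of spend, 5 in a holiday period."""
    return 10 + 2 * spend + 5 * holiday


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Typical size of a prediction error, in the units of the outcome."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented holdout rows that were not used to estimate the model:
# (spend, holiday flag, observed sales).
holdout = [(2, 0, 15), (4, 1, 22), (5, 0, 21), (7, 1, 29)]

actual = [sales for _, _, sales in holdout]
predicted = [predict_sales(spend, holiday) for spend, holiday, _ in holdout]

print("holdout R^2: ", round(r_squared(actual, predicted), 3))
print("holdout RMSE:", round(rmse(actual, predicted), 3))
```

Scoring on held-out rows is a guard against a model that fits its training data well but generalizes poorly.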
Q: What is the advantage of multivariate analysis?
A: The main advantage is that multivariate analysis considers more than one factor. It looks at the various independent variables that influence the dependent variable.
The conclusions you draw from multivariate analysis are also more likely to be accurate. There will always be errors, but by taking into account all the possible variables that could be influencing your data, you are less likely to miss something and make an incorrect assumption.
Q: What are the disadvantages of multivariate analysis?
A: Multivariate analysis requires more complex computations to arrive at a satisfactory answer. And you have to make sure you have enough observations for all the variables you’re analyzing. All that data needs to be collected, tabulated, and understood, and so it needs to be cleaned. The sort of governance and prep required for multivariate analysis is typically much more complex and time consuming.
Q: What best practices can companies follow to ensure better results?
A: It’s important to remember the age-old saying: “Garbage in, garbage out.” By investing in high-quality and consistent data collection, you will get a more accurate analysis, and it becomes easier to scale up and build more and more models. You won’t have to reinvent the wheel each and every time you do analysis, or constantly check the quality of your underlying data.
Also, because companies continue to work with larger and larger quantities of data, there is a tendency to want to use more sophisticated techniques like neural networks or deep learning. These methods sacrifice interpretability. You might achieve some level of accuracy, but it might take more computation power and more time to arrive at the output. And that comes at a real cost. It also becomes harder to disseminate the potential insights or the interpretation of the output to the organization because of the added complexity of all of the different variables that need to be included.
There's a balance between completeness and potential value, and there’s not one answer or formula that works for every situation. There may be a different sweet spot for different departments or use cases.
Q: How will multivariate analysis change in the future?
A: In the past, multivariate analysis has been largely left to an actuary or statistician, but in the future, the model-building process will likely become more automated. We're already starting to see this with some software where you simply state an objective or metric and a definitive variable that you're trying to maximize or optimize in some way, and the software is able to quickly compute many different models simultaneously for immediate evaluation. This process will cut out a lot of that manual work and achieve the most accurate or best model for a given objective much more quickly.