Data Warehouse

Quick Definition: A data warehouse is a large data repository that pulls data from databases, transactional systems, or other sources of information.

Key Takeaways:

The following information was provided during an interview with Nate Smith, product marketing manager for Adobe Analytics.

What is a data warehouse?
What are the drawbacks of a data warehouse?
How has data warehousing evolved over time?
Are there types of industries or businesses that wouldn’t need to use a data warehouse? What are alternatives?
What are the differences among a data warehouse, a database, and a data lake?
How will data warehousing continue to evolve?

What is a data warehouse?

A data warehouse is a large central repository that pulls data aggregates from databases, transactional systems, or other data sources.

The purpose of storing so many data sets in a data warehouse is to analyze them so you can understand trends that are affecting the business — especially trends relating to customer behavior.

These insights, usually referred to as business intelligence, help teams across the organization make better business decisions.
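As a rough illustration of the idea above, the following minimal sketch loads rows from two separate sources into one central store and then queries a combined monthly trend. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for two separate source systems.
orders = [  # (customer_id, order_date, amount) from a transactional system
    ("c1", "2024-01-15", 40.0),
    ("c2", "2024-01-20", 60.0),
    ("c1", "2024-02-03", 25.0),
]
visits = [  # (customer_id, visit_date) from a web analytics source
    ("c1", "2024-01-14"),
    ("c2", "2024-01-19"),
    ("c1", "2024-02-01"),
    ("c3", "2024-02-10"),
]


def build_warehouse(orders, visits):
    """Load both sources into one in-memory SQLite database."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE fact_orders (customer_id TEXT, order_date TEXT, amount REAL)"
    )
    wh.execute("CREATE TABLE fact_visits (customer_id TEXT, visit_date TEXT)")
    wh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", orders)
    wh.executemany("INSERT INTO fact_visits VALUES (?, ?)", visits)
    wh.commit()
    return wh


def monthly_trend(wh):
    """Return (month, visits, revenue) rows that combine both sources."""
    query = """
        SELECT v.month, v.visits, COALESCE(o.revenue, 0.0)
        FROM (SELECT substr(visit_date, 1, 7) AS month, COUNT(*) AS visits
              FROM fact_visits GROUP BY month) AS v
        LEFT JOIN (SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
                   FROM fact_orders GROUP BY month) AS o
          ON o.month = v.month
        ORDER BY v.month
    """
    return wh.execute(query).fetchall()


warehouse = build_warehouse(orders, visits)
trend = monthly_trend(warehouse)
for month, visit_count, revenue in trend:
    print(month, visit_count, revenue)
```

A production warehouse would run on a dedicated platform and hold far more data, but the pattern is the same: many sources go in, and one query can look across them.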

What are the drawbacks of a data warehouse?

One of the largest drawbacks of data warehousing is that most data warehouses — and access to those data sets — are not democratized. This means that the data is not equally accessible for everybody in the organization.

For most use cases, a data analysis team or group is assigned ownership of the data, and they’re responsible for making reports to circulate throughout the company.

Some business users look at these reports as democratizing the data, but this is an incorrect comparison. A visualization of the data and the underlying data itself are two very different things.

Say your company’s marketing team wants to query engagement data from the past six months to improve your company’s customer journey. The team does not have authorized access, so it must go through the data analysis team, which may be too busy to even notice the request in the first place.

Without that data, your marketing team can’t get insights to make better business decisions on its own schedule, which could have a negative effect on your customer experience.
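For a sense of what such a request involves, here is a minimal sketch of a six-month engagement query. It assumes a hypothetical engagement table with channel and event_date columns, and it uses Python's built-in sqlite3 module in place of a real warehouse. The query itself is short; the obstacle described above is access, not complexity.

```python
import sqlite3


def engagement_last_six_months(conn, as_of):
    """Count engagement events per channel in the six months up to as_of."""
    query = """
        SELECT channel, COUNT(*) AS events
        FROM engagement
        WHERE event_date >= date(?, '-6 months') AND event_date <= ?
        GROUP BY channel
        ORDER BY channel
    """
    return conn.execute(query, (as_of, as_of)).fetchall()


# Hypothetical sample data; a real table would live in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE engagement (channel TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO engagement VALUES (?, ?)",
    [
        ("email", "2024-02-10"),
        ("email", "2024-05-01"),
        ("web", "2024-06-15"),
        ("web", "2023-11-30"),  # older than six months, so excluded
    ],
)

result = engagement_last_six_months(conn, "2024-07-01")
print(result)
```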

How has data warehousing evolved over time?

The concept of data warehousing has been around for decades, though early warehouses were much simpler and all of the data was stored on-premises.

As time has passed, data quality and the underlying technology for storing business data have improved, which has led to the better analytical components that shape data warehousing today.

Are there types of industries or businesses that wouldn’t need to use a data warehouse? What are alternatives?

In general, data warehousing is unnecessary for small- to medium-sized businesses, from a cost and personnel standpoint.

A data warehouse costs a large sum to set up, and it can require dozens of people to build it, keep it running, and optimize the data within it. That level of budget and staffing is impractical for smaller businesses.

But moving toward cloud-based storage opens up a lot of opportunities for small businesses to be able to store larger amounts of data.

And artificial intelligence is becoming more capable, which can reduce the need to pay for professional data scientists.

Data storage is now becoming much more about self-service. By adopting cloud storage for data warehousing, smaller businesses will gain access to technology that was previously out of reach.

What are the differences among a data warehouse, a database, and a data lake?

A data warehouse is higher on the pyramid of big data storage, in terms of size and function.

A database is typically used to capture raw data for real-time use. It’s also a smaller repository, and the types of data collected there are more specific.

A data lake is used to store data for later. You can throw whatever kind of data in any format into data lakes, and then, at some point in the future, you can extract value from that data.

A data warehouse is usually bigger than a database, and it is used to hold historical data, not real-time data. The best way to use a data warehouse is for connecting data across channels.
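The three roles described above can be contrasted in a small sketch. This is a simplified illustration that uses plain Python containers rather than real storage systems, and the function names and sample events are hypothetical.

```python
import json

# Database: holds the current state for real-time use; updates overwrite.
database = {}


def record_status(order_id, status):
    database[order_id] = status


# Data lake: keeps raw payloads in whatever format they arrive, for later use.
data_lake = []


def land_raw(payload):
    data_lake.append(payload)


# Data warehouse: keeps structured, historical rows for analysis.
warehouse = []


def load_history(order_id, status, day):
    warehouse.append({"order_id": order_id, "status": status, "day": day})


events = [
    ("o1", "placed", "2024-03-01"),
    ("o1", "shipped", "2024-03-03"),
]
for order_id, status, day in events:
    record_status(order_id, status)  # database keeps only the latest status
    land_raw(json.dumps({"id": order_id, "status": status, "day": day}))
    load_history(order_id, status, day)  # warehouse keeps every change

# The lake also accepts data that has no fixed structure at all.
land_raw("free-text support note about order o1")

print(database)        # current state only
print(len(warehouse))  # full history
print(len(data_lake))  # raw items waiting to be processed
```

After both events, the database shows only that the order has shipped, the warehouse keeps both status changes as history, and the lake holds three raw items in mixed formats.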

How will data warehousing continue to evolve?

The future of data warehousing is cloud-based, which has a lot of benefits. There is a lot less risk with keeping data in the cloud than there is with keeping it locally, not to mention all of the legal issues and regulatory requirements that arise with on-site data storage.

The cost is also significantly less to keep data in the cloud, freeing up huge chunks of budget that would be dedicated to on-site servers. Even smaller businesses will benefit from cloud-based data storage.

Another future aspect of data warehousing will be combining it with analytics in the cloud. Very large companies are already employing huge analytical sets that work with data warehouses.

A step further is the artificial intelligence components that are being built into data warehouses to help you use machine learning for business decision-making.
