A data warehouse or enterprise data warehouse (DW, DWH, or EDW) is a database
used for reporting and data analysis. It is a central repository of data created
by integrating data from one or more disparate sources. Data warehouses store
current as well as historical data and are used to create trend reports for
senior management, such as annual and quarterly comparisons.
The data stored in the warehouse are uploaded from the operational systems.
The data may pass through an operational data store for additional
operations before they are used in the DW for reporting.
The typical ETL-based data warehouse uses staging, data integration, and access
layers to house its key functions. The staging layer or staging database stores
raw data extracted from each of the disparate source data systems. The
integration layer integrates the disparate data sets by transforming the data
from the staging layer, often storing the transformed data in an operational
data store (ODS) database. The integrated data are then moved to yet another
database, often called the data warehouse database, where the data are arranged
into hierarchical groups, often called dimensions, and into facts and aggregate
facts. The combination of facts and dimensions is sometimes called a star
schema. The access layer helps users retrieve data.
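The layers described above can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only a minimal illustration under stated assumptions, not a reference design: the table names (stg_sales, int_sales, dim_product, fact_sales), the sample rows, and the cleaning rule (trim and upper-case product names) are all hypothetical, and a single in-memory database stands in for what would normally be separate staging, ODS, and warehouse databases.

```python
import sqlite3

def build_warehouse(raw_rows):
    """Load raw source rows through staging, integration, and star-schema layers."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Staging layer: raw data stored as extracted, with no cleaning applied.
    cur.execute(
        "CREATE TABLE stg_sales (source TEXT, product TEXT, sale_date TEXT, amount TEXT)"
    )
    cur.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?)", raw_rows)

    # Integration layer: reconcile product names and cast amounts to numbers.
    cur.execute("""
        CREATE TABLE int_sales AS
        SELECT source,
               UPPER(TRIM(product)) AS product,
               sale_date,
               CAST(amount AS REAL) AS amount
        FROM stg_sales
    """)

    # Warehouse layer: one dimension and one fact table (a very small star schema).
    cur.execute(
        "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT UNIQUE)"
    )
    cur.execute(
        "INSERT INTO dim_product (product) "
        "SELECT DISTINCT product FROM int_sales ORDER BY product"
    )
    cur.execute("""
        CREATE TABLE fact_sales AS
        SELECT d.product_key, s.sale_date, s.amount
        FROM int_sales s JOIN dim_product d ON d.product = s.product
    """)
    con.commit()
    return con

def sales_by_product(con):
    """Access layer: aggregate the fact table by a dimension attribute."""
    return con.execute("""
        SELECT d.product, SUM(f.amount)
        FROM fact_sales f JOIN dim_product d USING (product_key)
        GROUP BY d.product
        ORDER BY d.product
    """).fetchall()

# Hypothetical rows from two source systems that spell the same product differently.
RAW = [
    ("crm", " widget ", "2023-01-15", "10.0"),
    ("erp", "Widget", "2023-02-01", "5.5"),
    ("erp", "gadget", "2023-02-03", "7"),
]
warehouse = build_warehouse(RAW)
totals = sales_by_product(warehouse)
```

Here the two spellings of "widget" from different sources collapse into a single dimension row during integration, so the access-layer query reports one total per product.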
Benefits of a data warehouse:
- Maintain a copy of information from the source transaction systems.
- Gather data from multiple sources into a single database so a single query engine can be used to present data.
- Mitigate isolation-level lock contention in transaction processing systems caused by running large, long-running analysis queries against transaction processing databases.
- Maintain data history, even if the source transaction systems do not.
- Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
- Improve data quality by providing consistent codes and descriptions and by flagging or even fixing bad data.
- Present the organization's information consistently.
- Provide a single common data model for all data of interest regardless of the data's source.
- Restructure the data so that it makes sense to the business users.
- Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
- Add value to operational business applications, notably customer relationship management (CRM) systems.
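One of the benefits listed above, maintaining data history even if the source systems do not, can be illustrated with a short Python sketch. Everything here is hypothetical and chosen only for illustration: the source system is modeled as a dictionary holding just the current status per customer, and the warehouse as an append-only list of dated snapshot rows. Real warehouses use more elaborate techniques, so this should be read as a sketch of the idea rather than a recommended design.

```python
from datetime import date

# Hypothetical operational system: keeps only the current status per customer.
source_system = {"c1": "bronze"}

# Hypothetical warehouse table: append-only rows of (customer, status, load date).
history = []

def snapshot(source, warehouse_rows, load_date):
    """Append the source's current state without overwriting earlier rows."""
    for customer_id, status in sorted(source.items()):
        warehouse_rows.append((customer_id, status, load_date))

def status_as_of(warehouse_rows, customer_id, as_of):
    """Return the latest status loaded on or before as_of, or None if there is none."""
    matches = [
        (loaded, status)
        for cust, status, loaded in warehouse_rows
        if cust == customer_id and loaded <= as_of
    ]
    return max(matches)[1] if matches else None

snapshot(source_system, history, date(2023, 3, 31))   # end of first quarter
source_system["c1"] = "gold"                          # the source overwrites its old value
snapshot(source_system, history, date(2023, 6, 30))   # end of second quarter
```

After the second load the source system no longer knows that the customer was ever "bronze", but the warehouse can still answer that question for any date covered by a snapshot, which is what makes quarterly and annual comparisons possible.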