What is Data Warehouse?

Data warehouse is an emerging data repository architecture used to store a large amount of data, which can also be termed as the historical data, collected from a numerous sources for the purpose of analysis and development to further make an effective decision accordingly.

Features of Data Warehouse:
  • It is separate from an operational database.
  • Stores huge amount of  data, more historical then current data.
  • Queries are generally complex.
  • Does not require data to be highly accurate.
  • Integrates data from heterogenous systems.
  • Goal is to execute statistical queries and provide results which can influence decision making. 

Properties of Data Warehouse:
A data warehouse is a subject - oriented, integrated, time variant, and non - volatile collection of data in support of management decision making process.


Subject - Oriented:
Store data according to target specific subjects. 

Integrated:
Data may be distributed across heterogonous sources which have to be integrated.

Time Variant:
Data are stored to provide information from an historic prospective.
Example - Results of students in the last four years.

Non- Volatile:
It is separate from the Enterprise operational Database and hence is not subject to frequent modification.  

Advantages of Data Warehouse:
  1. Allows Business users to quickly access critical data from some sources all in one place.
  2. Provides consistent information on various cross - functional activities.
  3. Helps to integrate many sources of data to reduce stress on the production system.
  4. Helps to reduce total turnaround time for analysis and reporting.
  5. Allows users to access critical data from the number of sources in a single place, saving user's time of retrieving data from multiple sources.
  6. Data warehouses store a large amount of historical data, this helps users to analyze different time periods and trends to make future predictions.
Disadvantages of Data Warehouse:
  1. Not an ideal option for unstructured data.
  2. Creation and implementation of Data Warehouse is surely a time consuming process.
  3. Data Warehouse can be outdated relatively quickly.
  4. Difficult to make changes in data types and ranges, data source schema, indexes and queries.
  5. The Data Warehouse may seem easy but actually it is too complex for the average users.
  6. Despite best efforts at project management, Data Warehousing project scope will always increase.
  7. Organizations need to spend lots of their resources for training and implementation purpose.

Data Warehouse is also called Online Analytical Processing Systems(OLAP).
An OLAP system manages large amount of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity.

Granularity is the extent to which a system is broken down into smaller parts, either the system itself or its description or observation.

Some typical OLAP operations:
Roll-up (drill -up):
 - summarize the data.
 - by climbing up hierarchy or by dimension reduction.

Drill down (roll down):
 - reverse of roll - up.
 - from higher level summary to lower level summary or detailed data, or introducing new dimensions.

Slice and Dice:
 - Project and select.

Slice:
 - Slice operation performs a selection on one dimension of the given cube, resulting in a sub - cube.

Dice:
 - Dice operation defines a sub - cube by performing a selection on two or more dimensions.

Pivot (rotate):
 - reorient the cube, visualization.

Drill through:
 - through the bottom level of the cube to its backend relational tables.

Data Warehouse Usage:
  • Data warehouses and data marts are used in a wide range of applications.
  • Business executives use the data in data warehouses and data marts to perform data analysis and make strategic decisions.
  • Data warehouses are used extensively in banking and financial services, consumer goods and retail distribution sectors.

Data Warehouse Applications:

There are three kinds of data warehouse applications:
1. Information Processing
2. Analytical Processing
3. Data mining

Information Processing:
Supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs.

Analytical Processing:
 - It generally operates on historic data in both summarized and detailed forms.
 - Supports basic OLAP operations, slice and dice, drilling, pivoting.
 - The major strength of OALP over information processing is the multidimensional data analysis of data warehouse data.
 
Data Mining:
 - Knowledge discovery from hidden patterns.
 - Supports associations, constructing analytical models, performing classification and prediction.
 - Presenting the mining results using visualizations tools.




Comments

Popular Posts