Concept of Data warehouse Database
A single, complete and consistent storage of data which are obtained from a variety of different sources that are made available to end users in what they can understand and use in a business context is called Data warehouse. A data warehouse can be taken as subject-oriented, integrated, time-variant and non-volatile, a collection of data in support of management's decision-making process. Data is arranged and optimized in such a way that it provides the answer to questions from diverse functional areas. The data is organized and summarized by topic such as Sales / Marketing / Finance / Distribution etc. It focuses on modeling and analysis of data for decision makers. It excludes data which are not useful in decision support process. Data Warehouse is constructed by integrating multiple heterogeneous sources. Data pre-processing are applied to ensure consistency. The data warehouse is a centralized and consolidated database where the integrated data derives from the entire organization: Multiple Sources, Diverse Sources, Diverse Formats. The Data Warehouse represents the flow of time through time. It can contain projected data from statistical models. The data that is uploaded periodically from the time-dependent data is recomputed. It provides information from the historical perspective as that of past 5-10 years. Every key structure contains an element of time either implicitly or explicitly. Once the data is entered it is never removed. It represents the company's entire history. As time grows, the near term history is continually added to it. It must support terabyte databases and multiprocessors. It acts as the read-only database for data analysis and query processing. Data warehouse requires two operations in data accessing and they are the Initial loading of data and Access of data. Phases of data mining are data preparation, data analysis and classification, knowledge acquisition and prognosis. Industries have a large amount of operational data. The knowledge worker wants to turn this data into a useful information. This information is used to support strategic decision making. It is a platform for consolidated historical data for analysis. It stores data of good quality so that the knowledge-worker can make correct decisions. From a business perspective, it is the latest marketing weapon. It helps to keep customers by learning more about their needs. It is accepted as a valuable tool in the current competitive fast evolving world. Application area of the data warehouse is OLAP, DSS, and data mining. Data warehouse architecture is composed of data warehouse server, the online analytical processing server, and clients.
Summary
A single, complete and consistent storage of data which are obtained from a variety of different sources that are made available to end users in what they can understand and use in a business context is called Data warehouse. A data warehouse can be taken as subject-oriented, integrated, time-variant and non-volatile, a collection of data in support of management's decision-making process. Data is arranged and optimized in such a way that it provides the answer to questions from diverse functional areas. The data is organized and summarized by topic such as Sales / Marketing / Finance / Distribution etc. It focuses on modeling and analysis of data for decision makers. It excludes data which are not useful in decision support process. Data Warehouse is constructed by integrating multiple heterogeneous sources. Data pre-processing are applied to ensure consistency. The data warehouse is a centralized and consolidated database where the integrated data derives from the entire organization: Multiple Sources, Diverse Sources, Diverse Formats. The Data Warehouse represents the flow of time through time. It can contain projected data from statistical models. The data that is uploaded periodically from the time-dependent data is recomputed. It provides information from the historical perspective as that of past 5-10 years. Every key structure contains an element of time either implicitly or explicitly. Once the data is entered it is never removed. It represents the company's entire history. As time grows, the near term history is continually added to it. It must support terabyte databases and multiprocessors. It acts as the read-only database for data analysis and query processing. Data warehouse requires two operations in data accessing and they are the Initial loading of data and Access of data. Phases of data mining are data preparation, data analysis and classification, knowledge acquisition and prognosis. Industries have a large amount of operational data. The knowledge worker wants to turn this data into a useful information. This information is used to support strategic decision making. It is a platform for consolidated historical data for analysis. It stores data of good quality so that the knowledge-worker can make correct decisions. From a business perspective, it is the latest marketing weapon. It helps to keep customers by learning more about their needs. It is accepted as a valuable tool in the current competitive fast evolving world. Application area of the data warehouse is OLAP, DSS, and data mining. Data warehouse architecture is composed of data warehouse server, the online analytical processing server, and clients.
Things to Remember
- A single, complete and consistent storage of data which are obtained from a variety of different sources that are made available to end users in what they can understand and use in a business context is called Data warehouse.
- A data warehouse can be taken as subject-oriented, integrated, time-variant and non-volatile, the collection of data in support of management's decision-making process.
- Data is arranged and optimized in such a way that it provides the answer to questions from diverse functional areas. The data is organized and summarized by topic such as Sales / Marketing / Finance / Distribution etc.
- It focuses on modeling and analysis of data for decision makers. It excludes data which are not useful in decision support process.
- Data Warehouse is constructed by integrating multiple heterogeneous sources. Data pre-processing are applied to ensure consistency.
- The data warehouse is a centralized and consolidated database where the integrated data derives from the entire organization: Multiple Sources, Diverse Sources, Diverse Formats.
- The Data Warehouse represents the flow of time through time. It can contain projected data from statistical models.
- The data that is uploaded periodically from the time-dependent data is recomputed. It provides information from the historical perspective as that of past 5-10 years.
- Every key structure contains an element of time either implicitly or explicitly. Once the data is entered it is never removed. It represents the company's entire history.
- As time grows, the near term history is continually added to it. It must support terabyte databases and multiprocessors. It acts as the read-only database for data analysis and query processing.
- Data warehouse requires two operations in data accessing and they are the Initial loading of data and Access of data.
- Phases of data mining are data preparation, data analysis and classification, knowledge acquisition and prognosis. Industries have a large amount of operational data.
- The knowledge worker wants to turn this data into a useful information. This information is used to support strategic decision making. It is a platform for consolidated historical data for analysis.
- It stores data of good quality so that the knowledge-worker can make correct decisions.
- From a business perspective, it is the latest marketing weapon. It helps to keep customers by learning more about their needs. It is accepted as a valuable tool in the current competitive fast evolving world.
- Application area of the data warehouse is OLAP, DSS, and data mining.
- Data warehouse architecture is composed of data warehouse server, the online analytical processing server, and clients.
MCQs
No MCQs found.
Subjective Questions
No subjective questions found.
Videos
No videos found.

Concept of Data warehouse Database
Concept of Data Warehouse Database
A single, complete and consistent storage of data which are obtained from a variety of different sources that are made available to end users in what they can understand and use in a business context is called Data warehouse. A data warehouse can be taken as subject-oriented, integrated, time-variant and non-volatile, a collection of data in support of management's decision-making process.
Subject-oriented:
Data is arranged and optimized in such a way that it provides the answer to questions from diverse functional areas. The data is organized and summarized by topic such as Sales / Marketing / Finance / Distribution etc. It focuses on modeling and analysis of data for decision makers. It excludes data which are not useful in decision support process.
Integrated:
Data Warehouse is constructed by integrating multiple heterogeneous sources. Data pre-processing are applied to ensure consistency. The data warehouse is a centralized and consolidated database where the integrated data derives from the entire organization:
- Multiple Sources
- Diverse Sources
- Diverse Formats.
Time-variant:
The Data Warehouse represents the flow of time through time. It can contain projected data from statistical models. The data that is uploaded periodically from the time-dependent data is recomputed. It provides information from the historical perspective as that of past 5-10 years. Every key structure contains an element of time either implicitly or explicitly.
Non-volatile:
Once the data is entered it is never removed. It represents the company's entire history. As time grows, the near term history is continually added to it. It must support terabyte databases and multiprocessors. It acts as the read-only database for data analysis and query processing. Data warehouse requires two operations in data accessing and they are:
- Initial loading of data
- Access to data.
Rules of a Data Warehouse:
- Data Warehouse and Operational Environments are separated
- Data is integrated
- It contains historical data over a long period of time
- Data is a snapshot where it is captured at a given point in time
- Data is subject-oriented.
- Mainly read-only with periodic batch updates.
- Development Life Cycle has a data-driven approach versus the traditional process-driven approach.
- Data contains several levels of detail such as Current, Old, Lightly Summarized, Highly Summarized.
- The environment is characterized by Read-only transactions to very large data sets.
- It is a system that traces data sources, transformations, and storage
- Metadata is a critical component. For example: Source, transformation, integration, storage, relationships, history etc.
- It contains a chargeback mechanism for resource usage that enforces the optimal use of data by the end users.
Phases of Data Mining:
- Data Preparation: This phase is responsible for identifying the main data sets which are to be used by the data mining operation (usually the data warehouse).
- Data Analysis and Classification: In this phase, the data is studied to identify the common data characteristics or patterns such as data groupings, classifications, clusters, sequences, data dependencies, links or relationships, data patterns, trends, and deviation.
- Knowledge Acquisition: This phase uses the result of the Data Analysis and Classification phase. The data mining tool selects the appropriate modeling or the knowledge-acquisition algorithms which are listed below:
Neural Networks.
Decision Trees.
Rules Induction.
Genetic algorithms.
Memory-Based Reasoning. - Prognosis: This phase predicts the future behavior and forecasts the business outcomes. For example: 65% of customers who did not use a particular credit card in the past 6 months are 88% likely to cancel the account.
Need for Data Warehousing:
- Industries have a large amount of operational data. The knowledge worker wants to turn this data into a useful information. This information is used to support strategic decision making.
- It is a platform for consolidated historical data for analysis.
- It stores data of good quality so that the knowledge-worker can make correct decisions.
- From a business perspective, it is the latest marketing weapon. It helps to keep customers by learning more about their needs. It is accepted as a valuable tool in the current competitive fast evolving world.
Application Area of Data Warehouse
- OLAP (Online Analytical Processing) is a term which is used to describe the analysis of complex data from the data warehouse.
- DSS (Decision Support System) which is also known as EIS (Executive Information Systems) supports organization's leading decision makers for making complex and important decisions.
- Data Mining is used for knowledge discovery which is the process of searching data for unanticipated new knowledge.
Characteristics of a Data Warehouse:
- Subject Oriented - organized based on use
- Integrated - inconsistencies removed
- Non-volatile - stored in read-only format
- Time variant - data are normally time series.
- Summarized - in decision-usable format
- Large volume - data sets are quite large.
- Non-normalized - often redundant.
- Metadata - data about data are stored
- Data sources - comes from non-integrated sources
Data Warehouse Architecture
The architecture of data warehouse is composed of the following:
- Data Warehouse Server: It is almost always a relational DBMS and rarely flat files.
- Online Analytical Processing (OLAP) servers: This server is to support and operate on multidimensional data structures.
- Clients use to query and reporting tools, analysis tools, and data mining tools.
References:
- H.F.Korth and A. Silberschatz,"Database system concepts",McGraw Hill,2010
- A.K.Majumdar and p, Bhatt acharya,"Database Management Systems",Tata McGraw Hill,India,2004
- F.Korth, Henry. Database System Concepts. 6th edition.
Lesson
Advanced database Concepts
Subject
Computer Engineering
Grade
Engineering
Recent Notes
No recent notes.
Related Notes
No related notes.