Sunday 17 November 2013

Data Warehousing concepts


Data warehousing  Architecture


Data Warehouse Architecture

 Data Warehouse server – almost always a relational DBMS,rarely flat files

 OLAP servers – to support and operate on multi-dimensional data structures

 Clients – Query and reporting tools – Analysis tools – Data mining tools

Difference between OLTP V/s Data warehouse???













What is Data warehousing???
Definition:- 
A data warehouse is collection of data marts.

Represents historical data. DW is a relational Database which is specially designed for analysis Purpose rather then for transactional purpose.

-subject-oriented,

-integrated,

-time-variant,

-nonvolatile

collection of data in support of management’s decision making process.

Subject-oriented

 Data warehouse is organized around subjects such as sales, product, customer.

Integration

 Data Warehouse is constructed by integrating multiple heterogeneous sources.

 Data Preprocessing are applied to ensure consistency.

Time-variant

 Provides information from historical perspective

Eg. past 5-10 years.

 Which supports business users in analyzing the business and comparing the business with different time periods.

Ex:- Yearly, Half yearly, Quarterly, Monthly,weekly, Daily.

Nonvolatile

 Data once recorded cannot be updated.

 Data warehouse requires two operations in data
accessing– Initial loading of data – Access of data

Data mart

 Data mart is a subset of data warehouse which concentrates on a specific business

Ex:- Sales dept, Purchase dept, HR dept

Why a Data Warehouse?

 For analysis and decision support, end users require access to data captured and stored in an organization’s operational or production systems

 This data is stored in multiple formats, on multiple platforms, in multiple data
structures, with multiple names, and probably created using different business rules

Dimension Table

 Contains descriptive or textual.

 Relatively static(Occasionally data will be changed – SCD’s I, II, III).

 Hierarchical data.

 Joined to another tables – Fact table(Are joined to a fact table through foreign key reference).

Fact table

 Fact table contain measures and the key of the dimension table.

 Can hold large volume of data

 Are join to dimension table through foreign keys that reference primary keys in the
dimension tables.

Star Schema

 A Star schema is defined as a logical database design in which there will be a centrally located fact table which is surrounded by at least one or more dimension tables.

SnowFlake Schema

 Variant of star schema model.

 A single,large and central fact table and one or more tables for each dimension.

 Dimension tables are normalized i.e. split dimension table data into additional tables

Fact Constellation

 Multiple fact tables share dimension tables.

 This schema is viewed as collection of stars hence called galaxy schema or fact constellation.

 Sophisticated application requires such schema.

Refresh???
          Propagate updates on source data to the warehouse
          Issues:
        when to refresh

        how to refresh -- refresh techniques

When to Refresh???
          periodically (e.g., every night, every week) or after significant events
          on every update: not warranted unless warehouse data require  current data (up to the minute stock quotes)
          refresh policy set by administrator based on user needs and traffic

          possibly different policies for different sources 


Data Warehousing Tools

- ETL Tools
- Informatica 
- Data stage 
- Ab initio 
- SAS 
- MSBI

- Reporting tools 

- Cognos
- BO(Business objects)
- Micro strategy 

- MSBI

3 comments:

  1. Thank you for sharing wonderful information with us to get some idea about that content.
    Microstrategy Online Course
    Microstrategy Certification

    ReplyDelete
  2. I truly appreciate the time and work you put into sharing your knowledge. I found this topic to be quite effective and beneficial to me. Thank you very much for sharing. Continue to blog.

    Data Engineering Services 

    AI & ML Solutions

    Data Analytics Services

    Data Modernization Services

    ReplyDelete