Sunday, 17 November 2013

Data Warehousing concepts


Data warehousing  Architecture


Data Warehouse Architecture

 Data Warehouse server – almost always a relational DBMS,rarely flat files

 OLAP servers – to support and operate on multi-dimensional data structures

 Clients – Query and reporting tools – Analysis tools – Data mining tools

Difference between OLTP V/s Data warehouse???













What is Data warehousing???
Definition:- 
A data warehouse is collection of data marts.

Represents historical data. DW is a relational Database which is specially designed for analysis Purpose rather then for transactional purpose.

-subject-oriented,

-integrated,

-time-variant,

-nonvolatile

collection of data in support of management’s decision making process.

Subject-oriented

 Data warehouse is organized around subjects such as sales, product, customer.

Integration

 Data Warehouse is constructed by integrating multiple heterogeneous sources.

 Data Preprocessing are applied to ensure consistency.

Time-variant

 Provides information from historical perspective

Eg. past 5-10 years.

 Which supports business users in analyzing the business and comparing the business with different time periods.

Ex:- Yearly, Half yearly, Quarterly, Monthly,weekly, Daily.

Nonvolatile

 Data once recorded cannot be updated.

 Data warehouse requires two operations in data
accessing– Initial loading of data – Access of data

Data mart

 Data mart is a subset of data warehouse which concentrates on a specific business

Ex:- Sales dept, Purchase dept, HR dept

Why a Data Warehouse?

 For analysis and decision support, end users require access to data captured and stored in an organization’s operational or production systems

 This data is stored in multiple formats, on multiple platforms, in multiple data
structures, with multiple names, and probably created using different business rules

Dimension Table

 Contains descriptive or textual.

 Relatively static(Occasionally data will be changed – SCD’s I, II, III).

 Hierarchical data.

 Joined to another tables – Fact table(Are joined to a fact table through foreign key reference).

Fact table

 Fact table contain measures and the key of the dimension table.

 Can hold large volume of data

 Are join to dimension table through foreign keys that reference primary keys in the
dimension tables.

Star Schema

 A Star schema is defined as a logical database design in which there will be a centrally located fact table which is surrounded by at least one or more dimension tables.

SnowFlake Schema

 Variant of star schema model.

 A single,large and central fact table and one or more tables for each dimension.

 Dimension tables are normalized i.e. split dimension table data into additional tables

Fact Constellation

 Multiple fact tables share dimension tables.

 This schema is viewed as collection of stars hence called galaxy schema or fact constellation.

 Sophisticated application requires such schema.

Refresh???
          Propagate updates on source data to the warehouse
          Issues:
        when to refresh

        how to refresh -- refresh techniques

When to Refresh???
          periodically (e.g., every night, every week) or after significant events
          on every update: not warranted unless warehouse data require  current data (up to the minute stock quotes)
          refresh policy set by administrator based on user needs and traffic

          possibly different policies for different sources 


Data Warehousing Tools

- ETL Tools
- Informatica 
- Data stage 
- Ab initio 
- SAS 
- MSBI

- Reporting tools 

- Cognos
- BO(Business objects)
- Micro strategy 

- MSBI

Thursday, 3 May 2012

Etl testing interview questions

1.What is Data warehousing?
2.Difference b/w Data warehouse and Datamart?
3.What is surrogate key?
4.Difference b/w surrogate key and primary key?
4.Difference b/w OLTP v/s OLAP?
5.what is Dimension table?
6.What is Fact table?
7.what is Defect life cycle?
8.How to Export tescases in Defect traking tool(QC ALM)?
9.what is STLC and SDLC?
10.what are the basic secnarios for etl testing?
11.How to validate Flat file and Dimension Table?
12.How to validate different schemas data?
13)ETL Testing Approach you have used in your previous projects?
14)Materialized View and basic difference between Materialized View and Normal View?
14)Star Schema and Snow Flake Schema?
15)SCD and SCD Types?
16)Fact Table and its types ( Ex - Factless Fact Table)?
17)Partitioning in Oracle and its types?
18)Dimensions and Types of dimensions ( example -Conform dimensions , Role play dimensions)?
19)SQL query performance?
20)Rank in oracle?
21)Overview of Mercury Quality Center( if you have used it in your previous projects)?
22)Difference between materialised view and View?
23)Difference between View and Trigger?
24)Testing experiance as a manual tester?
25)Worked on any automated testing tool?
26)What is a Synoym?
27)What is a Fact Less Fact table?
28)What is a Bridge table?
29)Testing experiance of previous projects?
30)What is a serrogate Key?
31)Type 1 , Type2 and Type3 SCD?
32)About the PL/SQL work done in the previous project?
33)About defect raising tools?


-------------------------------------------------------------
------------------------------------------------------------------------------------

1) Difference between Simple view and Materialized view.

View does not store the data.

Explain types of view

Have explain the scenarios where we can update /insert /delete the data through the view

MV is snapshot / data replication of a table.

It stores the data. (physical existence) Explain the types of M view Need to refresh the view using one of the 3 tech fast, complete or force.

2) Difference between stored procedure and function.

Funtion must return a value. SP can or can return the value with the help of OUT parameter.

Function can call in select stmt but sp can't.

Function are complied at run time but SP are pre complied.

3) SCD type 2 and SCD type 3

SCD type2 maintain history . SCD type maintain the current and prev status of the record.

4) Factless Fact Table.
Table that without measurement. I have explain with example.

5) Blackbox testing and white box testing

Testing software without any knowledge of the inner workings, structure or language of the module being tested.

Testing in which the tester has knowledge of the inner workings, structure and language of the software, or at least its purpose.

6) How and when to log defect in QC.

When we identified data mismatch between source minus target tests script Data not available to test the scenarios Once find the bug then do the root cause analysis. Based on analysis raise the defect and assigned to the developer.

Then retest the defect and close after getting the expected result.

7) What will be the approach for writing test script.

Go through the doc SSD And FSD

Create the test case scenarios

Based on scenarios write the script

Review the script

Once the data get available execute the script .................

8) What is the difference between primary key and surrogate key??

1. Both keys contain unique value for a record in a table.
2. Primary keys are used in OLTP whereas surrogate keys are used in OLAP
schemas.
3. Primary keys hold some business meaning whereas surrogate does not hold
any business meaning.
4. Primary key may contain numeric as well as non-numeric values whereas
surrogate keys contain only (simple)numeric values.

Why do we use surrogate keys in OLAP rather then  using primary keys?

There are two main reasons for this:- 
1. Surrogate keys are simple numeric values, as simple as normal counting.
So most of the time they save storage space.
2. As surrogate keys are simple and short, it speed-up the join performance.
3. Best thing is that same pattern of surrogate keys can be used across all
the tables present in a star/schema.