In simple terms, the level of granularity defines the extent of detail. As an example, let us look at
geographical levels of granularity. We may analyze data at the levels of COUNTRY, REGION,
TERRITORY, CITY and STREET. In this case, we say the highest (most detailed) level of granularity is STREET.
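Data stored at the most detailed grain can always be rolled up to coarser levels, but not the other way round. The following is a minimal Python sketch of that roll-up, assuming hypothetical sales rows stored at STREET grain; the sample data and the `roll_up` helper are illustrative only.

```python
from collections import defaultdict

# Hierarchy from coarsest to finest grain, as listed in the text.
LEVELS = ["country", "region", "territory", "city", "street"]

# Hypothetical sales rows stored at the finest grain (STREET).
rows = [
    {"country": "US", "region": "West", "territory": "T1", "city": "Seattle", "street": "Pine St", "sales": 100},
    {"country": "US", "region": "West", "territory": "T1", "city": "Seattle", "street": "1st Ave", "sales": 50},
    {"country": "US", "region": "West", "territory": "T2", "city": "Portland", "street": "Oak St", "sales": 70},
]

def roll_up(rows, level):
    """Sum sales at the requested level by keeping only the hierarchy columns down to that level."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for r in rows:
        key = tuple(r[name] for name in LEVELS[:depth])
        totals[key] += r["sales"]
    return dict(totals)

print(roll_up(rows, "city"))
# {('US', 'West', 'T1', 'Seattle'): 150, ('US', 'West', 'T2', 'Portland'): 70}
print(roll_up(rows, "country"))  # {('US',): 220}
```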
A fact table contains the measurements, metrics, or facts of a business process. If your business process
is "Sales", then a measurement of this process, such as "monthly sales number", is captured
in the fact table. The fact table also contains the foreign keys for the dimension tables.
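A small SQLite sketch of such a fact table, using Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `sales_amount`) are hypothetical; the point is that the fact table holds foreign keys plus a numeric measure, and the "monthly sales number" comes from joining it to a dimension and aggregating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
-- Fact table: foreign keys to the dimensions plus a numeric measure.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_date    VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_product VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 11, 40.0), (2, 10, 60.0);
""")

# "Monthly sales number": join the fact table to the date dimension and aggregate.
monthly = conn.execute("""
SELECT d.month, SUM(f.sales_amount)
FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
GROUP BY d.month ORDER BY d.month
""").fetchall()
print(monthly)  # [('2024-01', 140.0), ('2024-02', 60.0)]

# A fact row pointing at a non-existent dimension row is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 10, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```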
A data warehouse is a repository of integrated information, available for queries and analysis. Data and
information are extracted from heterogeneous sources as they are generated. This makes it much
easier and more efficient to run queries over data that originally came from different sources. Typical
relational databases are designed for on-line transactional processing (OLTP) and do not meet the
requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are
designed differently than traditional relational databases.
Data modeling is probably the most labor-intensive and time-consuming part of the development
process. Why bother, especially if you are pressed for time? A common
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap indexes and/or the other
index types: clustered/non-clustered, unique/non-unique.
To my knowledge, SQL Server does not support bitmap indexes; only Oracle supports them.
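Bitmap indexes suit fact tables because their foreign-key columns usually have few distinct values, and a multi-condition filter reduces to bitwise operations. Here is a toy Python illustration of the idea, not how Oracle actually stores bitmap indexes; the row data and `build_bitmap_index` are hypothetical.

```python
# Hypothetical fact rows with two low-cardinality columns: (region, channel).
rows = [
    ("East", "web"), ("West", "store"), ("East", "store"),
    ("North", "web"), ("East", "web"),
]

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

region_idx = build_bitmap_index([r[0] for r in rows])
channel_idx = build_bitmap_index([r[1] for r in rows])

# WHERE region = 'East' AND channel = 'web' becomes a single bitwise AND.
match = region_idx["East"] & channel_idx["web"]
matching_rows = [i for i in range(len(rows)) if (match >> i) & 1]
print(matching_rows)  # [0, 4]
```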
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the
fact table, but not the others. For example, suppose
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes
sense to add it up for all accounts (what's the total current balance for all accounts in the bank?),
but it does not make sense to add it up through time (adding up all current balances for a given
account for each day of the month does not give us any useful information).
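A minimal Python sketch of the Current_Balance case, assuming a hypothetical daily snapshot of (account, day, balance). Summing across accounts on one day is meaningful; across time, common choices are the closing balance or the average balance instead of a sum. The helper names are illustrative only.

```python
# Hypothetical daily balance snapshots: (account, day, current_balance)
snapshots = [
    ("A", 1, 100), ("B", 1, 50),
    ("A", 2, 120), ("B", 2, 30),
]

def total_balance_on(day):
    """Additive across accounts: total balance in the bank on one day."""
    return sum(bal for _, d, bal in snapshots if d == day)

def closing_balance(account):
    """Across time, take the last balance instead of summing (100 + 120 would mean nothing)."""
    rows = sorted((d, bal) for a, d, bal in snapshots if a == account)
    return rows[-1][1]

def average_balance(account):
    """Another common way to summarize a balance over time."""
    bals = [bal for a, _, bal in snapshots if a == account]
    return sum(bals) / len(bals)

print(total_balance_on(2))   # 150
print(closing_balance("A"))  # 120
print(average_balance("A"))  # 110.0
```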
A factless fact table captures the many-to-many relationships between
dimensions, but contains no numeric or textual facts. They are often used to record events or
coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university
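To make the student attendance case concrete, here is a small SQLite sketch via Python's `sqlite3`. The schema and data are hypothetical; the fact table holds only foreign keys, and the "measure" is simply a count of event rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_class   (class_key INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT);
-- Factless fact table: only foreign keys, no numeric measure.
CREATE TABLE fact_attendance (
    student_key INTEGER REFERENCES dim_student(student_key),
    class_key   INTEGER REFERENCES dim_class(class_key),
    date_key    INTEGER REFERENCES dim_date(date_key)
);
INSERT INTO dim_student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dim_class   VALUES (100, 'Math'), (200, 'History');
INSERT INTO dim_date    VALUES (1, 'Mon'), (2, 'Tue');
INSERT INTO fact_attendance VALUES (1, 100, 1), (2, 100, 1), (1, 100, 2), (1, 200, 2);
""")

# Count attendance events per class.
per_class = conn.execute("""
SELECT c.title, COUNT(*) FROM fact_attendance f
JOIN dim_class c ON f.class_key = c.class_key
GROUP BY c.title ORDER BY c.title
""").fetchall()
print(per_class)  # [('History', 1), ('Math', 3)]
```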
Yes, it is correct to develop a data mart using an ODS. An ODS is used to store transaction
data for only a few days (less historical data), which is what a data mart requires, so it is correct to
develop a data mart using an ODS.
A fact table typically has two types of columns: those that contain numeric facts (often called
measurements), and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that
contain aggregated facts are often called summary tables. A fact table usually contains facts with the
same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be
aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts
cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the
dimensions
and not along others. An example of this is inventory levels, where you cannot tell what a level means
simply by looking at it.
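A short Python sketch of why averages are non-additive while sales are additive, using hypothetical per-store figures. A common remedy is to store the additive components (sales and transaction count) and derive the average only after aggregating.

```python
# Hypothetical per-store figures: (store, total_sales, num_transactions)
stores = [("S1", 1000.0, 10), ("S2", 300.0, 30)]

# Additive: sales can simply be summed across stores.
total_sales = sum(s for _, s, _ in stores)               # 1300.0

# Non-additive: per-store averages cannot be summed or naively averaged.
avgs = [s / n for _, s, n in stores]                      # [100.0, 10.0]
wrong_avg = sum(avgs) / len(avgs)                         # 55.0, misleading

# Derive the average from the additive components instead.
correct_avg = total_sales / sum(n for _, _, n in stores)  # 1300 / 40 = 32.5
print(total_sales, wrong_avg, correct_avg)
```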
Conventional Load:
Before the data is loaded, all table constraints are checked against the data.
Direct Load (faster loading):
All constraints are disabled and the data is loaded directly. Later, the data is checked against
the table constraints, and the bad data is not indexed.
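SQLite has no direct-path loader, so the following is only an analogy in Python's `sqlite3`, not Oracle SQL*Loader. The "conventional" path enforces the foreign key on every insert and rejects bad rows immediately; the "direct"-style path turns constraint checking off, bulk-inserts, and then finds bad rows afterwards with SQLite's `PRAGMA foreign_key_check`. Table names and data are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product(product_key), amount REAL);
    INSERT INTO dim_product VALUES (1), (2);
    """)
    return conn

batch = [(1, 10.0), (2, 5.0), (99, 7.0)]  # product_key 99 has no parent row

# Conventional-style: constraints checked row by row, bad rows rejected up front.
conv = make_db()
conv.execute("PRAGMA foreign_keys = ON")
rejected = []
for row in batch:
    try:
        conv.execute("INSERT INTO fact_sales VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

# Direct-style analogy: constraints off, bulk insert, then check the loaded data.
direct = make_db()
direct.execute("PRAGMA foreign_keys = OFF")
direct.executemany("INSERT INTO fact_sales VALUES (?, ?)", batch)
violations = direct.execute("PRAGMA foreign_key_check(fact_sales)").fetchall()

print(rejected)         # [(99, 7.0)]
print(len(violations))  # 1
```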
A cube can be stored on a single analysis server and then defined as a linked cube on other Analysis
servers. End users connected to any of these analysis servers can then access the cube. This
arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple
analysis servers. Linked cubes can be connected using TCP/IP or HTTP. To end users, a linked cube
looks like a regular cube.
Star Schema means:
A centralized fact table surrounded by different dimension tables.
Snowflake Schema means:
The same star schema, but with dimensions split further into other dimensions.
A star schema contains highly denormalized data.
A snowflake schema contains partially normalized data.
In a star schema, dimension tables cannot have parent tables,
but in a snowflake schema, dimension tables do have parent tables.
Why go for a star schema:
1) Fewer joins are needed
2) Simpler database design
3) Supports drill-up options
Why go for a snowflake schema:
Sometimes we need to provide separate dimensions derived from existing dimensions; in that case we go
for a snowflake schema.
Disadvantage of snowflake:
Query performance is lower because more joins are needed.
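A compact SQLite sketch (via Python's `sqlite3`) contrasting the two layouts for the same question, "sales by category". The schema is hypothetical: the star version stores the category inline in the product dimension, while the snowflake version moves it to a parent table, which costs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: denormalized product dimension with the category stored inline.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Snowflake: category split out into its own parent table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, name TEXT,
                               category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO dim_category VALUES (10, 'Tools'), (20, 'Toys');
INSERT INTO dim_product_snow VALUES (1, 'Widget', 10), (2, 'Gadget', 20);
INSERT INTO fact_sales VALUES (1, 5.0), (2, 3.0), (1, 2.0);
""")

# Star: one join from the fact table to the dimension.
star = conn.execute("""
SELECT p.category, SUM(f.amount) FROM fact_sales f
JOIN dim_product_star p ON f.product_key = p.product_key
GROUP BY p.category ORDER BY p.category""").fetchall()

# Snowflake: an extra join through the parent table for the same answer.
snow = conn.execute("""
SELECT c.category, SUM(f.amount) FROM fact_sales f
JOIN dim_product_snow p ON f.product_key = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category ORDER BY c.category""").fetchall()

print(star == snow, star)  # True [('Tools', 7.0), ('Toys', 3.0)]
```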
Basically, the fact table consists of the index keys of the dimension/lookup tables and the measures.
So whenever we have the keys in a table, that itself implies that the table is in normal form.
ODS stands for Operational Data Store.
The basic purpose of the scheduling tool in a DW application is to streamline the flow of data from
source to target at a specific time or based on some condition.
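The two kinds of trigger can be sketched as a simple "is the job due?" check. This is a hypothetical Python illustration, not the API of any particular scheduling tool; `should_run` and its parameters are invented for the example, and a real scheduler would also remember that a job already ran today and handle retries.

```python
from datetime import datetime, time

def should_run(now, scheduled_at=None, condition=None):
    """Decide whether a hypothetical source-to-target load is due.

    scheduled_at: a time of day; the job is due once `now` reaches it (time-based trigger).
    condition: a zero-argument callable, e.g. "has the source file arrived?" (condition-based trigger).
    """
    if scheduled_at is not None and now.time() >= scheduled_at:
        return True
    if condition is not None and condition():
        return True
    return False

print(should_run(datetime(2024, 1, 15, 2, 30), scheduled_at=time(2, 0)))  # True
print(should_run(datetime(2024, 1, 15, 1, 0), scheduled_at=time(2, 0)))   # False
print(should_run(datetime(2024, 1, 15, 1, 0), condition=lambda: True))    # True
```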
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the
system sequentially (like the IDENTITY property in SQL Server and a SEQUENCE in Oracle). They do not
describe anything. A primary key (here meaning a natural key) is a natural identifier for an entity. Its values are
entered manually by the user and uniquely identify each row, so there is no repetition of data.
If a column is made a primary key and the data type or length of that column later needs to change,
then all the foreign keys that depend on that primary key must be changed as well,
making the database unstable. Surrogate keys make the database more stable because they insulate
the primary and foreign key relationships from changes in data types and lengths.
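A small SQLite sketch (Python `sqlite3`) of that insulation. The names are hypothetical: `customer_key` is a system-generated surrogate (SQLite assigns `INTEGER PRIMARY KEY` values automatically), and `customer_code` is the natural key. When the source system lengthens its codes, only the dimension row changes; the fact table still points at the same surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,  -- surrogate, assigned by SQLite
    customer_code TEXT UNIQUE,          -- natural key from the source system
    name          TEXT
);
CREATE TABLE fact_orders (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL
);
""")
conn.execute("INSERT INTO dim_customer (customer_code, name) VALUES ('C-7', 'Acme')")
key = conn.execute("SELECT customer_key FROM dim_customer WHERE customer_code = 'C-7'").fetchone()[0]
conn.execute("INSERT INTO fact_orders VALUES (?, 25.0)", (key,))

# The source system changes its code format; only the dimension row is updated.
conn.execute("UPDATE dim_customer SET customer_code = 'CUST-0007' WHERE customer_key = ?", (key,))

row = conn.execute("""
SELECT d.customer_code, f.amount FROM fact_orders f
JOIN dim_customer d ON f.customer_key = d.customer_key""").fetchone()
print(row)  # ('CUST-0007', 25.0)
```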
Conformed dimensions are dimension tables in a star schema data mart that adhere to a common structure, and
therefore allow queries to be executed across star schemas. For example, the Calendar dimension is
commonly needed in most data marts. By making this Calendar dimension adhere to a single structure,
regardless of which data mart in your organization uses it, you can query by date/time from one data
mart to another.
A conformed dimension is a single, coherent view of the same piece of data throughout the
organization. The same dimension is used in all subsequent star schemas defined. This enables
reporting across the complete data warehouse in a simple format.
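A minimal "drill-across" sketch in SQLite via Python's `sqlite3`, assuming a hypothetical shared `dim_date` used by two marts (sales and shipments). Each fact table is aggregated separately by the conformed month attribute, and the results are then joined on that attribute; aggregating first avoids double counting when the two facts are joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed calendar dimension shared by two data marts.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales     (date_key INTEGER REFERENCES dim_date(date_key), revenue REAL);
CREATE TABLE fact_shipments (date_key INTEGER REFERENCES dim_date(date_key), units INTEGER);
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-01'), (3, '2024-02');
INSERT INTO fact_sales VALUES (1, 100.0), (3, 50.0);
INSERT INTO fact_shipments VALUES (2, 4), (3, 2);
""")

# Aggregate each mart by the shared month, then join the two result sets.
rows = conn.execute("""
WITH s AS (SELECT d.month, SUM(f.revenue) AS revenue FROM fact_sales f
           JOIN dim_date d USING (date_key) GROUP BY d.month),
     sh AS (SELECT d.month, SUM(f.units) AS units FROM fact_shipments f
            JOIN dim_date d USING (date_key) GROUP BY d.month)
SELECT s.month, s.revenue, sh.units FROM s JOIN sh USING (month) ORDER BY s.month
""").fetchall()
print(rows)  # [('2024-01', 100.0, 4), ('2024-02', 50.0, 2)]
```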
A degenerate dimension is a dimension key without a corresponding dimension table. Example:
In the PointOfSale Transaction fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number
Date Dimension corresponds to Date Key, Product Dimension corresponds to Product Key. In a
traditional parent-child database, POS Transaction Number would be the key to the transaction
header record that contains all the info valid for the transaction as a whole, such as the transaction
date and store identifier. But in this dimensional model, we have already extracted this info into other
dimensions. Therefore, POS Transaction Number looks like a dimension key in the fact table but does
not have a corresponding dimension table.
Hence, POS Transaction Number is a degenerate dimension.
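A small SQLite sketch (Python `sqlite3`) of the POS example. Column names and rows are hypothetical. The transaction number sits in the fact table with no dimension table behind it, yet it remains useful for grouping line items back into their original transaction (basket).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- pos_txn_number has no dimension table: it is a degenerate dimension.
CREATE TABLE fact_pos (
    date_key INTEGER, product_key INTEGER, store_key INTEGER, promotion_key INTEGER,
    pos_txn_number TEXT,
    sales_amount REAL
);
INSERT INTO fact_pos VALUES
    (1, 10, 5, 0, 'T1001', 3.0),
    (1, 11, 5, 0, 'T1001', 7.0),
    (1, 10, 5, 0, 'T1002', 3.0);
""")

# Group line items by transaction: item count and basket total.
baskets = conn.execute("""
SELECT pos_txn_number, COUNT(*), SUM(sales_amount)
FROM fact_pos GROUP BY pos_txn_number ORDER BY pos_txn_number
""").fetchall()
print(baskets)  # [('T1001', 2, 10.0), ('T1002', 1, 3.0)]
```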
Normally, surrogate keys are sequence values that keep increasing as new records are inserted into
the table. The standard data type is integer.
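Inside the database this job is done by an Oracle SEQUENCE or a SQL Server IDENTITY column. As a minimal stand-in, assuming keys are assigned in an ETL process instead, a thread-safe integer counter behaves the same way. `SurrogateKeySequence` is a hypothetical name used only for this sketch.

```python
import itertools
import threading

class SurrogateKeySequence:
    """Minimal stand-in for a database sequence: hands out increasing integers."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # keeps keys unique if several loader threads share it

    def next_key(self):
        with self._lock:
            return next(self._counter)

seq = SurrogateKeySequence()
dim_rows = {}
for natural_key in ["C-7", "C-9", "C-12"]:
    dim_rows[natural_key] = seq.next_key()
print(dim_rows)  # {'C-7': 1, 'C-9': 2, 'C-12': 3}
```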