2.6 Click the 'Next' button and select the 'Value' property of the child package variable.
How to Implement?
Designed SSIS package like:
How to Implement?
5. The data in the Flat File is as follows:
"132","Ramakrishna"," " ,"Hyderabad"
"132","Radhika","17","Vangara"
How do you remove the double quotes from the file in order to process the data?
In the Flat File Connection Manager Editor, enter the double quote character (") in the Text Qualifier text box:
The FailPackageOnFailure property needs to be set to True on a task to enlist that task in the checkpoint.
The checkpoint mechanism uses a text file to mark the point of package failure.
These checkpoint files are automatically created at the specified location upon package failure and are automatically deleted once the package ends with success.
10. How to execute an SSIS package from a stored procedure?
By using the xp_cmdshell command.
11. How to enable xp_cmdshell in SQL Server?
We can enable it either through T-SQL or through the SQL Server Surface Area Configuration tool.
-- To allow advanced options to be changed.
EXEC sp_configure 'show advanced options', 1
GO
-- To update the currently configured value for advanced options.
RECONFIGURE
GO
-- To enable the feature.
EXEC sp_configure 'xp_cmdshell', 1
GO
-- To update the currently configured value for this feature.
RECONFIGURE
GO
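With xp_cmdshell enabled, the package can then be launched from a stored procedure by shelling out to DTEXEC. A minimal sketch (the procedure name and package path are only placeholders):

CREATE PROCEDURE dbo.usp_RunMyPackage
AS
BEGIN
    -- Call dtexec through xp_cmdshell; the .dtsx path below is an example only.
    EXEC master..xp_cmdshell 'DTEXEC.EXE /F "C:\SSIS\MyPackage.dtsx"';
END
GO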
Drag and drop an 'Execute SQL Task'. Open the Execute SQL Task Editor and, in the 'Parameter Mapping' section, select the system variables as follows:
Create a table in the SQL Server database with the columns: PackageID, PackageName, TaskID, TaskName, ErrorCode, ErrorDescription.
For Example:
I have comma-separated values in a flat file with two columns (Code, Name). Code is an integer value and Name is a varchar(20) data type, as configured in the Flat File Connection Manager. Some of the codes in the flat file are characters, so the Flat File Source component will fail when reading those character values. However, I want to redirect the error data to a separate table.
Foreach ADO:
The ADO Enumerator enumerates rows in a table. For example, we can get the rows in an ADO recordset. The variable must be of the Object data type.
Foreach ADO.NET Schema Rowset:
The ADO.NET Enumerator enumerates schema information. For example, we can get the list of tables from a database.
Foreach File:
The File Enumerator enumerates files in a folder. For example, we can get all the files which have the *.txt extension in a Windows folder and its subfolders.
Foreach From Variable:
The Variable Enumerator enumerates objects that specified variables contain. Here the enumerated object is typically an array or a data table.
Foreach Item:
The Item Enumerator enumerates the collections. For example, we can enumerate the
names of executables and working directories that an Execute Process task uses.
Foreach Nodelist:
The Node List Enumerator enumerates the result of an XPath expression.
Foreach SMO:
The SMO Enumerator enumerates SQL Server Management Objects (SMO). For example,
we can get the list of functions or views in a SQL Server database.
Container Type / Description / Purpose of SSIS Container:

Foreach Loop Container
Description: Runs a Control Flow repeatedly using an enumerator.
Purpose: To repeat tasks for each element in a collection, for example retrieving files from a folder, running T-SQL statements that reside in multiple files, or running a command for multiple objects.

For Loop Container
Description: Runs a Control Flow repeatedly by checking a conditional expression (the same as a For loop in a programming language).
Purpose: To repeat tasks until a specified expression evaluates to false. For example, a package can send a different e-mail message seven times, one time for every day of the week.

Sequence Container
Description: Groups tasks as well as containers into Control Flows that are subsets of the package Control Flow.
Purpose: To group tasks and containers that must succeed or fail as a unit. For example, a package can group tasks that delete and add rows in a database table, and then commit or roll back all the tasks when one fails.
Success: Workflow will proceed when the preceding container executes successfully. Indicated in the control flow by a solid green line.
Failure: Workflow will proceed when the preceding container's execution results in a failure. Indicated in the control flow by a solid red line.
Completion: Workflow will proceed when the preceding container's execution completes, regardless of success or failure. Indicated in the control flow by a solid blue line.
Expression/Constraint with Logical AND: Workflow will proceed when the specified expression and the constraint both evaluate to true. Indicated in the control flow by a solid colored line along with a small 'fx' icon next to it. The color of the line depends on the logical constraint chosen (e.g. success = green, completion = blue).
Expression/Constraint with Logical OR: Workflow will proceed when either the specified expression or the logical constraint (success/failure/completion) evaluates to true. Indicated in the control flow by a dotted colored line along with a small 'fx' icon next to it. The color of the line depends on the logical constraint chosen (e.g. success = green, completion = blue).
Keep Identity: By default this setting is unchecked, which means the destination table (if it has an identity column) will generate identity values on its own. If you check this setting, the data flow engine will ensure that the source identity values are preserved and the same values are inserted into the destination table.
Keep Nulls: Again, by default this setting is unchecked, which means that if a NULL value comes from the source for a particular column, the default value will be inserted into the destination table (provided a default constraint is defined on the target column). If you check this option, the default constraint on the destination table's column will be ignored and the NULL from the source column will be preserved and inserted into the destination.
Table Lock: By default this setting is checked, and the recommendation is to leave it checked unless the same table is being used by some other process at the same time. It specifies that a table lock will be acquired on the destination table instead of multiple row-level locks, which could turn into a lock escalation problem.
Check Constraints: Again, by default this setting is checked, and the recommendation is to uncheck it if you are sure the incoming data is not going to violate the constraints of the destination table. This setting specifies that the data flow pipeline engine will validate the incoming data against the constraints of the target table. Unchecking this option will improve the performance of the data load.
#5 - Effect of Rows Per Batch and Maximum Insert Commit Size Settings:
Rows per batch:
The default value for this setting is -1, which specifies that all incoming rows will be treated as a single batch. You can change this default behavior and break all incoming rows into multiple batches. The only allowed value is a positive integer, which specifies the maximum number of rows in a batch.
Maximum insert commit size:
The default value for this setting is 2147483647 (the largest value for a 4-byte integer type), which specifies that all incoming rows will be committed once on successful completion. You can specify a positive value for this setting to indicate that a commit will be done for that number of records. Changing the default value for this setting puts overhead on the data flow engine to commit several times; that is true, but at the same time it relieves the pressure on the transaction log and tempdb to grow, especially during high-volume data transfers.
The above two settings are very important to understand in order to improve the performance of tempdb and the transaction log. For example, if you leave 'Maximum insert commit size' at its default, the transaction log and tempdb will keep on growing during the extraction process, and if you are transferring a high volume of data, tempdb will soon run out of space, and as a result your extraction will fail. So it is recommended that you set these to optimum values based on your environment.
The number of buffers created depends on how many rows fit into a buffer, and how many rows fit into a buffer depends on a few other factors. The first consideration is the estimated row size, which is the sum of the maximum sizes of all the columns from the incoming records. The second consideration is the DefaultBufferMaxSize property of the data flow task. This property specifies the default maximum size of a buffer. The default value is 10 MB, and its upper and lower boundaries are constrained by two internal properties of SSIS, MaxBufferSize (100 MB) and MinBufferSize (64 KB). This means the size of a buffer can be as small as 64 KB and as large as 100 MB. The third factor is DefaultBufferMaxRows, which is again a property of the data flow task; it specifies the default number of rows in a buffer. Its default value is 10,000.
If the estimated size exceeds the DefaultBufferMaxSize, then the engine reduces the number of rows in the buffer. For better buffer performance you can do two things.
First, you can remove unwanted columns from the source and set the data type of each column appropriately, especially if your source is a flat file. This will enable you to accommodate as many rows as possible in the buffer.
Second, if your system has sufficient memory available, you can tune these properties to have a small number of large buffers, which could improve performance. Beware: if you change the values of these properties to a point where page spooling (see Best Practices #8) begins, it will adversely impact performance. So before you set a value for these properties, first test thoroughly in your environment and then set the values appropriately.
Let's consider a scenario where the first component of the package creates an object, e.g. a temporary table, which is referenced by the second component of the package. During package validation, the first component has not yet executed, so no object has been created, causing a package validation failure when validating the second component. SSIS will throw a validation exception and will not start the package execution. So how will you get this package running in this common scenario?
Set the DelayValidation property to True on the second component (or on its task/container), so that validation is postponed until the component actually executes, by which time the temporary object exists.
SSIS provides a set of performance counters. Among them, the following few are helpful when you tune or debug your package:
Buffers in use
Flat buffers in use
Private buffers in use
Buffers spooled
Rows read
Rows written
Buffers in use, Flat buffers in use and Private buffers in use are useful for discovering memory leaks. During package execution we will see these counters fluctuating, but once the package finishes execution their values should return to what they were before the execution; otherwise, buffers have been leaked.
Buffers spooled has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping. In a case like this, set the Data Flow Task properties BLOBTempStoragePath and BufferTempStoragePath appropriately for maximum I/O bandwidth.
Buffers Spooled: The number of buffers currently written to the disk. If the data flow
engine runs low on physical memory, buffers not currently used are written to disk and
then reloaded when needed.
Rows read and Rows written show how many rows the entire Data Flow has
processed.
12. FastParse property
Fast Parse option in SSIS can be used for very fast loading of flat file data. It will speed
up parsing of integer, date and time types if the conversion does not have to be locale-
sensitive. This option is set on a per-column basis using the Advanced Editor for the flat
file source.
13. The Checkpoint feature helps in package restarting.
1. A data flow consists of the sources and destinations that extract and load data, the
transformations that modify and extend data, and the paths that link sources,
transformations, and destinations. The Data Flow task is the executable within the SSIS
package that creates, orders, and runs the data flow. Data Sources, Transformations,
and Data Destinations are the three important categories in the Data Flow.
2. Data Flow tasks move data, but they are also tasks in the control flow; as such, their success or failure affects how your control flow operates.
3. Data is moved and manipulated through transformations.
4. Data is passed between each component in the data flow.
DTEXECUI provides a graphical user interface that can be used to specify the various
options to be set when executing an SSIS package. You can launch DTEXECUI by
double-clicking on an SSIS package file (.dtsx). You can also launch DTEXECUI from a
Command Prompt then specify the package to execute.
2. Using the DTEXEC.EXE command line utility one can execute an SSIS package
that is stored in a File System, SQL Server or an SSIS Package Store. The syntax to
execute a SSIS package which is stored in a File System is shown below.
DTEXEC.EXE /F "C:\BulkInsert\BulkInsertTask.dtsx"
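For packages stored in SQL Server (MSDB) or the SSIS Package Store, the /SQL and /DTS options are used instead of /F; hedged examples (server and package names are placeholders):

DTEXEC.EXE /SQL "MyPackage" /SERVER "localhost"
DTEXEC.EXE /DTS "\File System\MyPackage" /SERVER "localhost"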
3. Test the SSIS package execution by running the package from BIDS:
-In Solution Explorer, right click the SSIS project folder that contains the package which
you want to run and then click properties.
- In the SSIS Property Pages dialog box, select Build option under the Configuration
Properties node and in the right side panel, provide the folder location where you want
the SSIS package to be deployed within the OutputPath. Click OK to save the changes in
the property page.
-Right click the package within Solution Explorer and select Execute Package option from
the drop down menu
Navigate to SQL Server Agent, then Proxies, in the SSMS Object Explorer and right-click to create a new proxy.
41. What is the use of Percentage Sampling transformation in SSIS?
The Percentage Sampling transformation is generally used for data mining. This transformation builds a random sample of a set of output rows by choosing a specified percentage of input rows. For example, if the input has 1000 rows and I specify 10 as the sampling percentage, then the transformation returns roughly 10% of the input rows, chosen at random.
42. What is the use of Term Extraction transformation in SSIS?
The Term Extraction transformation is used to extract nouns, noun phrases, or both nouns and noun phrases from English-only text. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can also be used to profile the content of a dataset.
43. What is Data Viewer and what are the different types of Data Viewers in
SSIS?
A Data Viewer allows viewing data at a point of time at runtime.
The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart
The Merge transformation combines two sorted datasets with the same column structure into a single output. The rows from each dataset are inserted into the output based on the values in their key columns.
The Merge transformation is similar to the Union All transformation. Use the Union All transformation instead of the Merge transformation in the following situations:
- The transformation inputs are not sorted.
- The combined output does not need to be sorted.
- The transformation has more than two inputs.
The Multicast transformation generates exact copies of the source data, meaning each recipient will have the same number of records as the source, whereas the Conditional Split transformation divides the source data based on defined conditions, and rows that match none of the defined conditions are routed to the default output.
The Bulk Insert Task is used to copy a large volume of data from a text file to a SQL Server destination.
46. Explain Audit Transformation ?
It allows you to add auditing information. Auditing options that you can add to
transformed data through this transformation are :
1. Execution Instance GUID: ID of the execution instance of the package
2. PackageID : ID of the package
3. PackageName : Name of the Package
4. VersionID : GUID version of the package
5. Execution StartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID: unique identifier of the data flow task that contains the audit transformation.
47. What are the possible locations to save an SSIS package?
1. File System: We can save the package to a physical location on a hard drive or on any shared folder with this option, and we should provide a fully qualified path to the stored package when using the File System option.
2. SQL Server: SSIS packages are stored in the MSDB database, in the sysssispackages table.
The SSIS Package Store is nothing but a combination of the SQL Server and File System deployment options, as you can see when you connect to SSIS through SSMS: it looks like a store which has categorized its contents (packages) into different categories based on the taste of its manager (which is you, the package developer). So don't mistake it for something different from the two types of package storage above.
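The packages deployed to MSDB can be listed with a simple query against the table mentioned above; a hedged sketch (column list may vary slightly by version):

SELECT name, createdate, vermajor, verminor
FROM msdb.dbo.sysssispackages
ORDER BY name;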
48. How to provide security to packages?
We can provide security to packages in 2 ways
1. Package encryption
2. Password protection
1. DonotSaveSensitive: any sensitive information is simply not written out to
the package XML file when you save the package.
2. EncryptSensitiveWithUserKey: encrypts sensitive information based on the
credentials of the user who created the package. It is the default value for
the ProtectionLevel property.
3. EncryptSensitiveWithPassword: requires you to specify a password in the
package, and this password will be used to encrypt and decrypt the sensitive
information in the package.
4. EncryptAllWithPassword: allows you to encrypt the entire contents of the SSIS
package with your specified password.
5. EncryptAllWithUserKey: allows you to encrypt the entire contents of the SSIS
package by using the user key.
6. Server Storage: allows the package to retain all sensitive information
when you are saving the package to SQL Server. SSIS packages are saved to
MSDB database of SQL Server.
You can change the Protection Level of deployed packages by using the
DTUTIL utility.
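As an illustration, dtutil can re-encrypt a package file with a password; a hedged sketch (file path and password are placeholders, and 3 is the EncryptAllWithPassword level in dtutil's documented level list):

DTUTIL /FILE "C:\SSIS\MyPackage.dtsx" /ENCRYPT FILE;"C:\SSIS\MyPackage.dtsx";3;MyStrongPassword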
49. How to track a variable in ssis?
OnVariableValueChanged: This event gets raised when value of the variable is changed.
1.Set the "EvaluateasExpression" property of the variable as True.
2.Set the "RaiseChangedEvent" property of the variable as True.
3.Create an event handler for the "OnVariableValueChanged" event for the container
in which the variable is scoped.
The FTP connection manager supports only anonymous authentication and basic
authentication. It does not support Windows Authentication.
Predefined FTP Operations:
Send Files, Receive File,
Create Local directory, Remove Local Directory,
Create Remote Directory, Remove Remote Directory
Delete Local Files, Delete Remote File
Custom Log Entries available on the FTP Task:
FTPConnectingToServer
FTPOperation
3. Flat File Connection Manager Changes: The Flat File connection manager now supports parsing files with embedded qualifiers. The connection manager also, by default, always checks for row delimiters to enable the correct parsing of files with rows that are missing column fields. The Flat File Source now supports a varying number of columns and embedded qualifiers.
REPLACENULL: You can use this function to replace NULL values in the first argument
with the expression specified in the second argument. This is equivalent to ISNULL in T-
SQL: REPLACENULL(expression, expression)
TOKEN: This function allows you to return a substring by using delimiters to separate a
string into tokens and then specifying which occurrence to
return: TOKEN(character_expression, delimiter_string, occurrence)
TOKENCOUNT: This function uses delimiters to separate a string into tokens and then
returns the count of tokens found within the string: TOKENCOUNT(character_expression,
delimiter_string)
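Hedged examples of these expressions (the column name and string literals below are purely illustrative):
REPLACENULL(MiddleName, "Unknown") returns "Unknown" when MiddleName is NULL, otherwise the MiddleName value.
TOKEN("SSIS-2012-Expressions", "-", 2) returns "2012" (the second token).
TOKENCOUNT("SSIS-2012-Expressions", "-") returns 3.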
6. Easy Column Remapping in Data Flow (Mapping Data Flow Columns): When modifying a data flow, column remapping is sometimes needed. SSIS 2012 maps columns by name instead of by ID, and it also has an improved remapping dialog.
7. Shared Connection Managers: You can create connection managers at the project level that can be shared by multiple packages in the project. A connection manager you create at the project level is automatically visible in the Connection Managers tab of the SSIS Designer window for all packages. When converting shared connection managers back to regular (package) connection managers, they disappear from all other packages.
8. Scripting Enhancements: The Script Task and Script Component now support .NET Framework 4.0, and breakpoints are supported in the Script Component.
9. ODBC Source and Destination: ODBC was not natively supported in SSIS 2008 (it could be accessed via ADO.NET); SSIS 2012 has a native ODBC source and destination.
10. Reduced Memory Usage by the Merge and Merge Join Transformations: The old SSIS Merge and Merge Join transformations, although helpful, used a lot of system resources and could be memory hogs. In 2012 these tasks are much more robust and reliable. Most importantly, they will not consume excessive memory when the multiple inputs produce data at uneven rates.
11. Undo/Redo: One thing that annoyed users in SSIS before 2012 was the lack of support for Undo and Redo. Once you performed an operation, you couldn't undo it. In SSIS 2012 we now have undo/redo support.
Script Task vs. Script Component:

Control Flow / Data Flow:
- Script Task: configured on the Control Flow tab of the designer and runs outside the data flow of the package.
- Script Component: configured on the Data Flow page of the designer and represents a source, transformation, or destination in the Data Flow task.

Purpose:
- Script Task: can accomplish almost any general-purpose task.
- Script Component: you must specify whether you want to create a source, transformation, or destination with the Script component.

Raising Results:
- Script Task: uses both the TaskResult property and the optional ExecutionValue property of the Dts object to notify the runtime of its results.
- Script Component: runs as a part of the Data Flow task and does not report results using either of these properties.

Raising Events:
- Script Task: uses the Events property of the Dts object to raise events. For example:
Dts.Events.FireError(0, "Event Snippet", ex.Message & ControlChars.CrLf & ex.StackTrace)
- Script Component: raises errors, warnings, and informational messages by using the methods of the IDTSComponentMetaData100 interface returned by the ComponentMetaData property. For example:
Dim myMetadata as IDTSComponentMetaData100
myMetaData = Me.ComponentMetaData
myMetaData.FireError(...)

Execution:
- Script Task: runs custom code at some point in the package workflow. Unless you put it in a loop container or an event handler, it only runs once.
- Script Component: also runs once, but typically it runs its main processing routine once for each row of data in the data flow.

Editor:
- Script Task: the Script Task Editor has three pages: General, Script, and Expressions. Only the ReadOnlyVariables, ReadWriteVariables, and ScriptLanguage properties directly affect the code that you can write.
- Script Component: the Script Transformation Editor has up to four pages: Input Columns, Inputs and Outputs, Script, and Connection Managers. The metadata and properties that you configure on each of these pages determine the members of the base classes that are autogenerated for your use in coding.

Interaction with the Package:
- Script Task: in the code written for a Script task, you use the Dts property to access other features of the package. The Dts property is a member of the ScriptMain class.
- Script Component: in Script component code, you use typed accessor properties to access certain package features such as variables and connection managers. The PreExecute method can access only read-only variables. The PostExecute method can access both read-only and read/write variables.

Using Variables:
- Script Task: uses the Variables property of the Dts object to access variables that are available through the task's ReadOnlyVariables and ReadWriteVariables properties. For example:
string myVar;
myVar = Dts.Variables["MyStringVariable"].Value.ToString();
- Script Component: uses typed accessor properties of the autogenerated base class, created from the component's ReadOnlyVariables and ReadWriteVariables properties. For example:
string myVar;
myVar = this.Variables.MyStringVariable;

Using Connections:
- Script Task: uses the Connections property of the Dts object to access connection managers defined in the package. For example:
string myFlatFileConnection;
myFlatFileConnection = (Dts.Connections["Test Flat File Connection"].AcquireConnection(Dts.Transaction) as String);
- Script Component: uses typed accessor properties of the autogenerated base class, created from the list of connection managers entered by the user on the Connection Managers page of the editor. For example:
IDTSConnectionManager100 connMgr;
connMgr = this.Connections.MyADONETConnection;
3. The Bulk Insert task uses the T-SQL BULK INSERT statement for speed when loading
large amounts of data.
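For comparison, the underlying T-SQL statement looks like the following (the table and file names are illustrative only):

BULK INSERT dbo.Customers
FROM 'C:\BulkInsert\Customers.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');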
58. Which services are installed during SQL Server installation?
SSIS
SSAS
SSRS
SQL Server (MSSQLSERVER)
SQL Server Agent Service
SQL Server Browser
SQL Full-Text
Offline: In this mode, the source database is detached from the source server after putting it in single-user mode, and copies of the .mdf, .ndf and .ldf files are moved to the specified network location. On the destination server, the copies are taken from the network location and then both databases are attached on the source and destination servers. This mode is faster, but a disadvantage of this mode is that the source database will not be available during the copy and move operation. Also, the person executing the package with this mode must be a sysadmin on both the source and destination instances.
Online: The task uses SMO to transfer the database objects to the destination server. In this mode, the database is online during the copy and move operation, but it will take longer as it has to copy each object from the database individually. The person executing the package with this mode must be either a sysadmin or the database owner of the specified databases.
A Script Component can be used for the designated task. The steps are as follows:
1. Drag and drop the Script Component onto the Data Flow and select the Script Component type as Transformation.
2. Double-click the Script Component.
3. In the Input Columns tab, select the columns which are to pass through the Script Component.
4. In the Inputs and Outputs tab, add a column with an integer data type.
65. Breakpoints in SSIS?
A breakpoint allows you to pause the execution of the package in BIDS during development or when troubleshooting an SSIS package. You can right-click on the task in the control flow, click on the 'Edit Breakpoints' menu item and, from the Set Breakpoints window, specify when you want execution to be halted/paused, for example on the OnPreExecute, OnPostExecute or OnError events. To toggle a breakpoint, delete all breakpoints or disable all breakpoints, go to the Debug menu and click on the respective menu item. You can even specify different conditions for hitting the breakpoint.
66. What is the DisableEventHandlers property used for?
SSIS packages, tasks and containers have a property called DisableEventHandlers. If you set this property to TRUE for a task or container, then all event handlers will be disabled for that task or container. If you set this property value to FALSE, then the event handlers will once again be executed.
69. How to pass property value at Run time?
A property value like connection string for a Connection Manager can be passed to the
package using package configurations.
70. How to skip first 5 lines in each Input flat file?
In the Flat file connection manager editor, Set the 'Header rows to skip' property.
71. Parallel processing in SSIS
To support parallel execution of different tasks in the package, SSIS uses 2 properties:
1. MaxConcurrentExecutables: defines how many tasks can run simultaneously, by specifying the maximum number of SSIS threads that can execute in parallel per package. The default is -1, which equates to the number of physical or logical processors + 2.
2. EngineThreads: is property of each DataFlow task. This property defines how many
threads the data flow engine can create and run in parallel. The EngineThreads property
applies equally to both the source threads that the data flow engine creates for sources
and the worker threads that the engine creates for transformations and destinations.
Therefore, setting EngineThreads to 10 means that the engine can create up to ten
source threads and up to ten worker threads.
72. How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column
to a different data type.
SSRS Interview Questions and
Answers
1. How do u implement Cascading parameter?
The list of values for one parameter depends on the value chosen in preceding
parameter.
Eg: Country --> State --> City
In this case, you want to pass variables dynamically, using an available value from the
source dataset. You can think of it like this:
http://servername/reportserver?%2fpathto
%2freport&rs:Command=Render&ProductCode=Fields!ProductCode.Value
The exact syntax in the "Jump to URL" (Fx) expression window will be:
="javascript:void(window.open('http://servername/reportserver?%2fpathto
%2freport&rs:Command=Render&ProductCode="+Fields!ProductCode.Value+"'))"
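STEP 1, defining the GetColor custom code (under Report Properties > Code), is implied before STEP 2 below; a minimal VB sketch of the kind of function typically used, with an assumed color palette:

Private colorPalette As String() = {"Green", "Blue", "Red", "Orange", "Gold"}
Private count As Integer = 0
Private mapping As New System.Collections.Hashtable()

Public Function GetColor(ByVal groupingValue As String) As String
    ' Return the same color every time a grouping value repeats.
    If mapping.ContainsKey(groupingValue) Then
        Return mapping(groupingValue)
    End If
    Dim c As String = colorPalette(count Mod colorPalette.Length)
    count = count + 1
    mapping.Add(groupingValue, c)
    Return c
End Function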
STEP2:
In the Pie Chart, select Series Properties and select the Fill option from left side.
Now write following expression in the Color expression:
=code.GetColor(Fields!Year.Value)
Now apply this function to the style property of an element on the report.
=code.StyleElement("TABLE_HEADER_TEXT")
If you want to apply dynamic styles to the report, then create tables in SQL Server and insert the style information into those tables.
Create a Dataset, specify the Stored Procedure.
example: =Fields!TABLE_HEADER_TEXT.Value
where TABLE_HEADER_TEXT is a value in the table.
Report Filter: This includes filtering after the source query has come back, on a data region (like the Tablix) or on a data grouping. When you implement a filter within the report and the report is re-executed with different parameter choices, the Report Server uses cached data rather than returning to the database server.
Report Parameters: Parameters are applied at the database level. The data will be fetched based on the parameters at the database level, using a WHERE condition in the query.
1. The total time to generate a report (RDL) can be divided into 3 elements:
Time to retrieve the data (TimeDataRetrieval).
Time to process the report (TimeProcessing)
Time to render the report (TimeRendering)
Total time = (TimeDataRetrieval) + (TimeProcessing) + (TimeRendering)
These 3 performance components are logged every time a deployed report is executed. This information can be found in the ExecutionLogStorage table in the ReportServer database.
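A hedged query against that table (column names as they appear in recent ReportServer catalogs; adjust for your version):

SELECT TOP (20)
       ReportID,
       TimeStart,
       TimeDataRetrieval,
       TimeProcessing,
       TimeRendering
FROM   dbo.ExecutionLogStorage
ORDER BY TimeStart DESC;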
2. Use the SQL Profiler to see which queries are executed when the report is generated.
Sometimes you will see more queries being executed than you expected. Every dataset
in the report will be executed. A lot of times new datasets are added during building of
reports. Check if all datasets are still being used. For instance, datasets for available
parameter values. Remove all datasets which are not used anymore.
3. Sometimes a dataset contains more columns than used in the Tablix\list. Use only
required columns in the Dataset.
4. The ORDER BY in the dataset may differ from the ORDER BY in the Tablix/list. You need to decide where the data will be sorted. It can be done within SQL Server with an ORDER BY clause, or by the Reporting Services engine. It is not useful to do it in both. If an index is available, use the ORDER BY in your dataset.
5. Use the SQL Profiler to measure the performance of all datasets (Reads, CPU and
Duration). Use the SQL Server Management Studio (SSMS) to analyze the execution plan
of every dataset.
6. Avoid datasets with result sets containing a lot of records, e.g. more than 1000 records. A lot of the time data is GROUPED in the report without a drill-down option. In that scenario, do the GROUP BY in your dataset instead. This will save a lot of data transfer from SQL Server and it will save the Reporting Services engine from having to group the result set.
7. Rendering of the report can take a while if the result set is very big. Look very critically at whether such a big result set is necessary. If details are used in only 5% of the situations, create another report to display the details. This will avoid the retrieval of all details in 95% of the situations.
12. I have a 'State' column in a report; display the States in bold whose State name starts with the letter 'A' (e.g. Andhra Pradesh, Assam should be in bold).
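One way to do this (a hedged sketch, assuming the field is named State) is an expression on the FontWeight property of the State textbox:

=IIF(Left(Fields!State.Value, 1) = "A", "Bold", "Normal")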
Shared datasets use only shared data sources, not embedded data sources.
To create a shared dataset, you must use an application that creates a shared dataset
definition file (.rsd). You can use one of the following applications to create a shared
dataset:
1. Report Builder: Use shared dataset design mode and save the shared dataset to a
report server or SharePoint site.
2. Report Designer in BIDS: Create shared datasets under the Shared Dataset folder in
Solution Explorer. To publish a shared dataset, deploy it to a report server or SharePoint
site.
Upload a shared dataset definition (.rsd) file. You can upload a file to the report server or
SharePoint site. On a SharePoint site, an uploaded file is not validated against the
schema until the shared dataset is cached or used in a report.
The shared dataset definition includes a query, dataset parameters including default
values, data options such as case sensitivity, and dataset filters.
18. How do you display partial text in bold format in a textbox in a report? (e.g. FirstName LastName, where "FirstName" should be in bold font and "LastName" should be in normal font.)
Use a Placeholder.
To avoid extra blank pages during export, the size of the body should be less than or equal to the size of the report page minus the margins. For example, for an A4 landscape page (29.7 cm x 21 cm) with 1.5 cm margins:
Set the width of the body to 26.7 cm (29.7 - 1.5 - 1.5).
Set the height of the body to 18 cm (21 - 1.5 - 1.5).
The first time a user clicks the link for a report configured to cache, the report execution
process is similar to the on-demand process. The intermediate format is cached and
stored in ReportServerTempDB Database until the cache expiry time.
If a user requests a different set of parameter values for a cached report, then the report processor treats the request as a new report executing on demand, but flags it as a second cached instance.
A report snapshot contains the query and layout information retrieved at a specific point in time. It executes the query and produces the intermediate format. The intermediate format of the report has no expiration time like a cached instance has, and is stored in the ReportServer database.
27. Subscription. Different types of Subscriptions?
Subscriptions are used to deliver the reports to either File Share or Email in response to
Report Level or Server Level Schedule.
There are 2 types of subscriptions:
1. Standard Subscription: Static properties are set for Report Delivery.
2. Data Driven Subscription: Dynamic Runtime properties are set for Subscriptions
Ad hoc reports: Ad hoc reporting allows end users to design and create reports on their own, provided with the data models.
It has 3 components: Report Builder, Report Model and Model Designer.
Use the 'Model Designer' tool to design 'Report Models' and then use the 'Report Builder' tool to generate reports.
Report Builder
- Windows Winform application for End users to build ad-hoc reports with the help of
Report models.
32. Explain the Report Model Steps.
1. Create the report model project
select "Report Model Project" in the Templates list
A report model project contains the definition of the data source (.ds file), the definition
of a data source view (.dsv file), and the report model (.smdl file).
2. Define a data source for the report model
3. Define a data source view for the report model
A data source view is a logical data model based on one or more data sources.
SQL Reporting Services generates the report model from the data source view.
4. Define a report model
5. Publish a report model to report server.
The <Query> element of RDL contains query or command and is used by the Report
Server to connect to the datasources of the report.
The <Query> element is optional in an RDLC file. This element is ignored by the Report Viewer control because the Report Viewer control does not perform any data processing in local processing mode, but instead uses data that the host application supplies.
You can provide control to the user by adding Interactive Sort buttons to toggle between
ascending and descending order for rows in a table or for rows and columns in a matrix.
The most common use of interactive sort is to add a sort button to every column header.
The user can then choose which column to sort by.
36. What is Report Builder
Windows Winform application for End users to build ad-hoc reports with the help of
Report models.
37. Difference between a Table report and a Matrix report
A Table report has a fixed number of columns and dynamic rows.
A Matrix report has dynamic rows and dynamic columns.
38. When to use Table, Matrix and List
1. Use a Table to display detail data, organize the data in row groups, or both.
2. Use a matrix to display aggregated data summaries, grouped in rows and columns,
similar to a PivotTable or crosstab. The number of rows and columns for groups is
determined by the number of unique values for each row and column groups.
3. Use a list to create a free-form layout. You are not limited to a grid layout, but can
place fields freely inside the list. You can use a list to design a form for displaying many
dataset fields or as a container to display multiple data regions side by side for grouped
data. For example, you can define a group for a list; add a table, chart, and image; and
display values in table and graphic form for each group value
42.How to Combine Datasets in SSRS (1 Dataset gets data from Oracle and
other dataset from Sql Server)
Using LookUP function, we can combine 2 datasets in SSRS.
In the following example, assume that a table is bound to a dataset that includes a field
for the product identifier ProductID. A separate dataset called "Product" contains the
corresponding product identifier ID and the product name Name.
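The expression, following the documented Lookup(source_expression, destination_expression, result_expression, dataset) signature, would look like this:
=Lookup(Fields!ProductID.Value, Fields!ID.Value, Fields!Name.Value, "Product")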
In the above expression, Lookup compares the value of ProductID to ID in each row of
the dataset called "Product" and, when a match is found, returns the value of the Name
field for that row.
The configuration settings of Report Manager and the Report Server Web service are
stored in a single configuration file (rsreportserver.config).
Report Manager is the web-based application included with Reporting Services that
handles all aspects of managing reports (deploying datasources and reports, caching a
report, subscriptions, snapshot).
44. Steps to repeat Table Headers in SSRS 2008?
1. Select the table
2. At the bottom of the screen, select a dropdown arrow beside column groups. Enable
"Advanced Mode" by clicking on it.
3. under Row Groups,select the static row and choose properties / press F4.
4. Set the following attributes for the static row or header row.
Set RepeatOnNewPage= True for repeating headers
Set KeepWithGroup= After
Set FixedData=True for keeping the headers visible.
45. How to add assemblies in SSRS
45. Report Extensions?
46. parent grouping, child grouping in SSRS
Open the data source dialog in Report Designer and select the 'Use single transaction when processing the queries' check box. Once selected,
datasets that use the same data source are no longer executed in parallel.
They are also executed as a transaction, i.e. if any of the queries fails to
execute, the entire transaction is rolled back.
Snowflake: The normalized form of a star schema is a snowflake schema. Dimension tables can be further broken down into sub-dimensions.
A dimension table will have one or more parent tables.
Hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the top hierarchy to the lowermost hierarchy.
It increases the number of joins and gives poorer performance when retrieving data.
In MOLAP, the structure of the aggregations along with the data values is stored in multidimensional format; it takes more space but less time for data analysis compared to ROLAP.
MOLAP offers faster query response and processing times, but has higher latency and requires an average amount of storage space. This storage mode leads to duplication of data, as the detail data is present in both the relational and the multidimensional storage.
In HOLAP, the aggregations are stored in the multidimensional model while the detail data remains in the relational model, which provides optimal usage of space.
This storage mode offers optimal storage space, query response time, latency and fast processing times.
3. Types of Dimensions
Regular: A dimension whose type has not been set to a special dimension type.
Time: A dimension whose attributes represent time periods, such as years, semesters, quarters, months, and days.
Organization: A dimension whose attributes represent organizational information, such as employees or subsidiaries.
Geography: A dimension whose attributes represent geographic information, such as cities or postal codes.
BillOfMaterials: A dimension whose attributes represent inventory or manufacturing information, such as parts lists for products.
Accounts: A dimension whose attributes represent a chart of accounts for financial reporting purposes.
Customers: A dimension whose attributes represent customer or contact information.
Products: A dimension whose attributes represent product information.
Scenario: A dimension whose attributes represent planning or strategic analysis information.
Quantitative: A dimension whose attributes represent quantitative information.
Utility: A dimension whose attributes represent miscellaneous information.
Currency: This type of dimension contains currency data and metadata.
Rates: A dimension whose attributes represent currency rate information.
Channel: A dimension whose attributes represent channel information.
Promotion: A dimension whose attributes represent marketing promotion information.
4. Types of Measures
Fully Additive Facts: These are facts which can be added across all the associated
dimensions. For example, sales amount is a fact which can be summed across different
dimensions like customer, geography, date, product, and so on.
Semi-Additive Facts: These are facts which can be added across only few dimensions
rather than all dimensions. For example, bank balance is a fact which can be summed
across the customer dimension (i.e. the total balance of all the customers in a bank at
the end of a particular quarter). However, the same fact cannot be added across the
date dimension (i.e. the total balance at the end of quarter 1 is $X million and $Y million
at the end of quarter 2, so at the end of quarter 2, the total balance is only $Y million
and not $X+$Y).
Non-Additive Facts: These are facts which cannot be added across any of the dimensions
in the cube. For example, profit margin is a fact which cannot be added across any of the
dimensions. For example, if product P1 has a 10% profit and product P2 has a 10%
profit then your net profit is still 10% and not 20%. We cannot add profit margins
across product dimensions. Similarly, if your profit margin is 10% on Day1 and 10% on
Day2, then your net Profit Margin at the end of Day2 is still 10% and not 20%.
Derived Facts: Derived facts are the facts which are calculated from one or more base
facts, often by applying additional criteria. Often these are not stored in the cube and
are calculated on the fly at the time of accessing them. For example, profit margin.
Factless Facts: A factless fact table is one which only has references (Foreign Keys) to
the dimensions and it does not contain any measures. These types of fact tables are
often used to capture events (valid transactions without a net change in a measure
value). For example, a balance enquiry at an automated teller machine (ATM). Though
there is no change in the account balance, this transaction is still important for analysis
purposes.
Textual Facts: Textual facts refer to the textual data present in the fact table, which is
not measurable (non-additive), but is important for analysis purposes. For example,
codes (i.e. product codes), flags (i.e. status flag), etc.
5. Types of relationships between dimensions and measuregroups.
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which in turn,is
joined to the fact table.
Many to many: The dimension table is joined to an intermediate fact table, and the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining:The target dimension is based on a mining model built from the source
dimension. The source dimension must also be included in the cube.
Fact table: The dimension table is the fact table.
6. Proactive caching
Proactive caching can be configured to refresh the cache (MOLAP cache) either on a pre-
defined schedule or in response to an event (change in the data) from the underlying
relational database. Proactive caching settings also determine whether the data is
queried from the underlying relational database (ROLAP) or is read from the outdated
MOLAP cache, while the MOLAP cache is rebuilt.
Proactive caching helps in minimizing latency and achieving high performance.
It enables a cube to reflect the most recent data present in the underlying database by
automatically refreshing the cube based on the predefined settings.
Lazy aggregations:
When we reprocess an SSAS cube, it actually brings new/changed relational data into the SSAS cube by reprocessing dimensions and measures. Partition indexes and aggregations might be dropped due to changes in related dimension data, so aggregations and partition indexes need to be reprocessed, and it might take more time to build the aggregations and partition indexes.
If you want to bring the cube online sooner, without waiting for the rebuilding of partition indexes and aggregations, then the lazy processing option can be chosen. The lazy processing option brings the SSAS cube online as soon as the dimensions and measures get processed; partition indexes and aggregations are built later as a background job.
Advantage: Lazy processing saves processing time as it brings the cube online as soon as measure and dimension data is ready.
Disadvantage: Users will see a performance hit while aggregations are being built in the background.
7. Partition processing options
Process Default: SSAS dynamically chooses from one of the following process options.
Process Full: Drops all object stores and rebuilds the objects. This option is used when a structural change has been made to an object, for example, when an attribute hierarchy is added, deleted, or renamed.
Process Update: Forces a re-read of data and an update of dimension attributes. Flexible
aggregations and indexes on related partitions will be dropped.
Process Add: For dimensions, adds new members and updates dimension attribute
captions and descriptions.
Process Data: Processes data only, without building aggregations or indexes. If there is data in the partitions, it will be dropped before re-populating the partition with source data.
Process Index: Creates or rebuilds indexes and aggregations for all processed partitions.
For unprocessed objects, this option generates an error.
Unprocess: Delete data from the object.
Process Structure: Drop the data and perform process default on all dimensions.
Process Clear: Drops the data in the object specified and any lower-level constituent
objects. After the data is dropped, it is not reloaded.
Process Clear Structure: Removes all training data from a mining structure.
When you build a cube, and you add dimensions to that cube, you create cube
dimensions: cube dimensions are instances of a database dimension inside a cube.
A database dimension can be used in multiple cubes, and multiple cube dimensions can
be based on a single database dimension
The Database dimension has only Name and ID properties, whereas a Cube dimension
has several more properties.
A database dimension is created once, whereas a cube dimension is a reference to a database dimension.
A database dimension exists only once, whereas multiple cube dimensions can be created from it, using the role-playing dimensions concept.
11. Importance of CALCULATE keyword in MDX script, data pass and limiting
cube space
Select this option to store the attribute member in the intermediate dimension that links the attribute in the reference dimension to the fact table in the MOLAP structure. This improves query performance, but increases the processing time and storage space.
If the option is not selected, only the relationship between the fact records and the intermediate dimension is stored in the cube. This means that Analysis Services has to derive the aggregated values for the members of the referenced dimension when a query is executed, resulting in slower query performance.
13. Partition processing and Aggregation Usage Wizard
Linked Dimensions can be used when the exact same dimension can be used across multiple cubes within an organization, like a Time dimension, Geography dimension, etc.
Here are some of the highlights of a Linked Dimension:
-More than one Linked Dimension can be created from a Single Database Dimension.
-These can be used to implement the concept of Conformed Dimensions.
-For an end user, a Linked Dimension appears like any other Dimension.
Degenerate Dimensions are commonly used when the Fact Table contains/represents
Transactional data like Order Details, etc. and each Order has an Order Number
associated with it, which forms the unique value in the Degenerate Dimension.
One of the common scenarios is when a Fact Table contains a lot of Attributes which are
like indicators, flags, etc. Using Junk Dimensions, such Attributes can be
removed/cleaned up from a Fact Table.
SCD: The Slowly Changing Dimension (SCD) concept is basically about how the data
modifications are absorbed and maintained in a Dimension Table.
The new (modified) record and the old record(s) are identified using some kind of a flag
like say IsActive, IsDeleted etc. or using Start and End Date fields to indicate the validity
of the record.
17. Parent Child Hierarchy, NamingTemplate property,
MemberWithLeafLevelData property
18. How will you keep measure in cube without showing it to user?
Now, if you pass the value [Date].[Calendar Year].&[2002] to the parameter P1, then it will run just like:
WHERE [Date].[Calendar Year].&[2002]
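In an SSRS MDX dataset this is typically done with StrToMember; a hedged sketch against the Adventure Works sample cube, where @P1 is the report parameter:

SELECT [Measures].[Internet Sales Amount] ON COLUMNS
FROM [Adventure Works]
WHERE STRTOMEMBER(@P1, CONSTRAINED)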
23. CASE (CASE, WHEN, THEN, ELSE, END) statement, IF THEN END IF, IS
keyword, HAVING clause
28. What do you understand by attribute relationship? what are the main
advantages in using attribute relationship?
An Attribute Relationship is a relationship between various attributes within a Dimension.
By default, every Attribute in a Dimension is related to the Key
Attribute.
30. What do you understand by rigid and flexible relationship? Which one is
better from performance perspective?
Rigid: Attribute Relationship should be set to Rigid when the relationship between those
attributes is not going to change over time. For example,
relationship between a Month and a Date is Rigid since a particular Date always belongs
to a particular Month like 1st Feb 2012 always belongs to Feb
Month of 2012. Try to set the relationship to Rigid wherever possible.
Flexible: Attribute Relationship should be set to Flexible when the relationship between
those attributes is going to change over time. For example, relationship between an
Employee and a Manager is Flexible since a particular Employee might work under one
manager during this year (time period) and under a different manager during next year
(another time period).
31. In which scenario, you would like to go for materializing dimension?
Reference dimensions let you create a relationship between a measure group and a
dimension using an intermediate dimension to act as a bridge between
them.
32. In dimension usage tab, how many types of joins are possible to form
relationship between measure group and dimension?
37. What do you understand by linked cube or linked object feature in SSAS?
38. How will you write back to dimension using excel or any other client tool?
39. What do you understand by dynamic named set (SSAS 2008)? How is i
different from static named set?
44. How will you implement data security for given scenario in analysis service
data?
"I have 4 cubes and 20 dimension. I need to give access to CEO, Operation
managers and Sales managers and employee.
1) CEO can see all the data of all 4 cubes.
2) Operation Managers can see only data related to their cube. There are four
operation managers.
3) Employees can see only certain dimension and measure groups data. (200
Employees) "
1.BIDS
In BIDS, from the Build menu select the Build option (or right-click on the project in the Solution Explorer).
The build process will create four XML files in the bin subfolder of the project folder:
.asdatabase - the main object definition file
.configsettings
.deploymentoptions
.deploymenttargets
2. Deploy
Deployment via BIDS will overwrite the destination database management settings, so it is not recommended for production deployment.
52. What are KPIs? How will you create KPIs in SSAS?
53. What are the main feature differences in SSAS 2005 and SSAS 2008 from
developer point of view?
MDX
1. Explain the structure of MDX query?
2. MDX functions?
MDX KPI Functions:
KPICurrentTimeMember, KPIGoal, KPIStatus, KPITrend
KPIValue, KPIWeight
MDX Metadata Functions:
Axis, Count (Dimension), Count (Hierarchy Levels), Count (Tuple)
Hierarchy, Level, Levels, Name,Ordinal, UniqueName
MDX Navigation Functions:
Ancestor, Ancestors, Ascendants, Children
Cousin, Current, CurrentMember, CurrentOrdinal
DataMember, DefaultMember, FirstChild, FirstSibling
IsAncestor, IsGeneration, IsLeaf, IsSibling
Lag, LastChild, LastSibling, Lead
LinkMember, LookupCube, NextMember, Parent
PrevMember, Properties, Siblings, UnknownMember
The difference between the two is the scope. Using WITH specifies the scope of the named set as the query, so as soon as the query finishes executing, that named set is gone. Using CREATE, the scope of the set is the MDX session, as long as you don't drop the set.
When defining your named set, you also have the option to specify when the named set
is evaluated using DYNAMIC or STATIC, as seen here:
A Dynamic Named Set respects the context of a query's subcube and the query's WHERE
clause and is evaluated at the time the query is executed.
A Static Named Set is evaluated at the time the cube is processed and will not respect
any subcube context and slicers in WHERE clause.
Example 1:
CREATE SET DateRange AS
[Date].[Calendar Year].&[2001] : [Date].[Calendar Year].&[2004]
SELECT [Measures].[Reseller Sales Amount] ON COLUMNS,
DateRange ON ROWS
FROM [Adventure Works]
Example 2:
WITH SET SouthEastUS AS
{[Geography].[State-Province].&[AL]&[US],
[Geography].[State-Province].&[FL]&[US],
[Geography].[State-Province].&[GA]&[US],
[Geography].[State-Province].&[SC]&[US]}
SELECT [Measures].[Reseller Sales Amount] ON COLUMNS,
SouthEastUS ON ROWS
FROM [Adventure Works]
5. How will you differentiate among level, member, attribute, hierarchy?
SELECT
NON EMPTY
{
[Measures].[Hits]
,[Measures].[Subscribers]
,[Measures].[Spam]
} ON COLUMNS
,{
[Geography].[Country].Children
} ON ROWS
FROM [Blog Statistics];
NONEMPTY():
The NonEmpty() function returns the set of tuples that are not empty from a specified set, based on the cross product of the specified set with a second set. Suppose we want to see all the measures related to countries which have a non-null value for Subscribers:
SELECT
{
[Measures].[Hits]
,[Measures].[Subscribers]
,[Measures].[Spam]
} ON COLUMNS
,{
NonEmpty
(
[Geography].[Country].Children
,[Measures].[Subscribers]
)
} ON ROWS
FROM [Blog Statistics];
A Dynamic Named Set respects the context of a query's subcube and the query's WHERE clause and is
evaluated at the time the query is executed.
A Static Named Set is evaluated at the time the cube is processed and will not respect
any subcube context and slicers in WHERE clause.
14. Difference between natural and unnatural hierarchy, attribute relationships
16. Write MDX for retrieving top 3 customers based on internet sales amount?
17. Write MDX to find current month's start and end date?
18. Write MDX to compare current month's revenue with last year same month
revenue?
19. Write MDX to find MTD(month to date), QTD(quarter to date) and YTD(year
to date) internet sales amount for top 5 products?
20. Write MDX to find count of regions for each country?
21. Write MDX to rank all the product category based on calendar year 2005
internet sales amount?
22. Write MDX to extract nth position tuple from specific set?
Syntax:
Index syntax: Set_Expression.Item(Index)
String syntax: Set_Expression.Item(String_Expression1 [ ,String_Expression2,...n])
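For example (a hedged sketch against the Adventure Works sample cube), Item(0) returns the first tuple of a set:

{[Date].[Calendar Year].&[2003], [Date].[Calendar Year].&[2004]}.Item(0)

And for question 16 above, a sketch using TopCount (the measure and hierarchy names assume the Adventure Works sample):

SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
TOPCOUNT([Customer].[Customer].[Customer].MEMBERS, 3, [Measures].[Internet Sales Amount]) ON ROWS
FROM [Adventure Works]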
24. What are the performance considerations for improving MDX queries?
26. Which one is better from a performance point of view: the NON EMPTY keyword
or the NONEMPTY function?
27. How will you find a performance bottleneck in any given MDX?
How do you eliminate quotes from being uploaded from a flat file
to SQL Server?
This can be done using the Text Qualifier property. In the SSIS package, on the Flat File
Connection Manager Editor, enter quotes into the Text qualifier field, then preview the data to
ensure the quotes are not included.
What are the different values you can set for CheckpointUsage
property ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) IfExists: If a checkpoint file exists in the place you specified for the CheckpointFilename
property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist,
the package will fail.
What is Data Viewer and what are the different types of Data
Viewers in SSIS?
A Data Viewer allows viewing data at a point in the data flow at runtime. If a data viewer is placed
before and after the Aggregate transform, we can see the data flowing into the transformation at
runtime and what it looks like after the transformation has occurred. The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart.
What is a package?
A discrete executable unit of work composed of a collection of control flow and other objects,
including data sources, transformations, process sequence, and rules, errors and event handling,
and data destinations.
What is a workflow in SSIS?
A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails and shell commands;
a workflow is created from the Workflow Manager.)
What is the diff between control flow Items and data flow Items?
The control flow is the highest-level control process. It allows you to manage the run-time
process activities of the data flow and other processes within a package.
When you want to extract, transform and load data within a package, you add an SSIS Data Flow
task to the package control flow.
When you run a package from within BIDS, it is built and temporarily deployed to a folder. By
default the package is deployed to the bin folder in the package's project folder, and you
can configure a custom folder for deployment. When the package's execution completes and is
stopped in BIDS, the deployed package is deleted; this is called Design Time
Deployment.
1. What is a package?
a) A discrete executable unit of work composed of a collection of control
flow and other objects, including data sources, transformations, process
sequence, and rules, errors and event handling, and data destinations.
Containers are useful for:
1. Grouping tasks so that you can disable a part of the package that is no
longer needed.
2. Narrowing the scope of a variable to a container.
3. Managing the properties of multiple tasks in one step by setting the
properties of the container.
iii. For Loop container: evaluates an expression and repeats its workflow
until the expression evaluates to false.
iv. Foreach Loop container: defines a control flow that repeats by using
an enumerator.
The Foreach Loop container repeats the control flow for each member of a
specified enumerator.
7. Connection manager:
a) It is a bridge between a package object and physical data. It provides a logical
representation of a connection at design time; the properties of the
connection manager describe the physical connection that Integration
Services creates when the package is run.
a) DTExecUI
1. Open the command prompt (Run -> type dtexecui -> press Enter).
2. The Execute Package Utility dialog box opens.
3. Click Execute to run the package.
Wait until the package has executed successfully.
b) DTExec utility
1. Open the command prompt window.
2. At the command prompt, type dtexec / followed by the DTS, SQL, or
File option and the package path, including the package name.
3. If the package encryption level is EncryptSensitiveWithPassword or
EncryptAllWithPassword, use the /Decrypt option to provide the password.
If no password is included, dtexec will prompt you for the password.
4. Optionally, provide additional command-line options.
5. Press Enter.
6. Optionally, view logging and reporting information before closing the
command prompt window.
c) Using SQL Server Management Studio to execute a package
1. In SSMS, right-click a package, and then click Run Package. The
Execute Package Utility opens.
2. Execute the package as described previously.
10. How can you handle errors with the help of logging in
SSIS?
a) Create an OnError event handler to which you add an Execute SQL Task
that logs the error.
11. What is a log file and how do you send the log file to a manager?
a) It is especially useful when the package has been deployed to the
production environment and you cannot use BIDS and VSA to debug the
package.
SSIS enables you to implement logging code through the Dts.Log method.
When the Dts.Log method is called in a script, the SSIS engine routes
the message to the log providers that are configured in the containing
package.
15. For error handling in a transformation, which option gives the better
performance: Fail Component, Redirect Row, or Ignore Failure?
a) Redirect Row provides better performance for error handling.
17. What is a Task?
a) An individual unit of work.
Logging can be done based on events; in SSIS there are 12 events that can
be logged at the task or package level. You can enable partial logging for one
task and enable much more detailed logging for other tasks.
The types of log providers are, for example:
SQL Server Profiler
Text file
SQL Server
Windows Event Log
XML file
23. What are Transformations?
A transformation is an object that generates, modifies, or passes data.
1. AGGREGATE: applies an aggregate function to grouped records and
produces new output records from the aggregated results.
2. AUDIT: adds the value of a system variable, such as machine
name or execution instance GUID, to a new output column.
3. CHARACTER MAP: makes string data changes, such as
changing data from lower case to upper case.
4. CONDITIONAL SPLIT: separates input rows into separate output data
pipelines based on the Boolean expressions configured for each output.
5. COPY COLUMN: adds a copy of a column to the transformation output; we can later
transform the copy, keeping the original for auditing purposes.
6. DATA CONVERSION: converts a column's data type to another data type.
7. DATA MINING QUERY: performs a data mining query against Analysis
Services.
8. DERIVED COLUMN: creates a new derived column calculated from an
expression.
9. EXPORT COLUMN: allows you to export a column from the data flow to
a file.
10. FUZZY GROUPING: performs data cleansing by finding rows that are
likely duplicates.
11. FUZZY LOOKUP: matches and standardizes data based on fuzzy logic,
e.g. transforming the name 'jon' to 'john'.
12. IMPORT COLUMN: reads data from a file and adds it into a data flow.
13. LOOKUP: performs a lookup of data to be used later in a transformation,
e.g. looking up a city based on a zip code. Typical uses:
1. Getting a related value from a table using a key column value
2. Updating a slowly changing dimension table
3. Checking whether records already exist in the table
14. MERGE: merges two sorted data sets into a single data set in a
single data flow.
15. MERGE JOIN: merges two sorted data sets into a single data set using a join.
16. MULTICAST: sends a copy of the data to one or more additional paths in the
workflow.
17. ROW COUNT: stores the row count from the data flow into a variable.
18. ROW SAMPLING: captures a sample of data from the data flow by
using a row count of the total rows in the data flow.
19. PERCENTAGE SAMPLING: captures a sample of data from the data flow
by using a percentage of the total rows in the data flow.
20. UNION ALL: merges multiple data sets into a single data set.
21. PIVOT: converts rows into columns.
22. UNPIVOT: converts columns into rows.
24. What is a Batch?
a) A batch is defined as a group of sessions. There are two types:
1. Parallel batch processing
2. Sequential batch processing
Comment: Differences between 2005 and 2008 are not very big, so 2005, 2008 or
2008 R2 experience is usually very similar. The big difference is with 2000, which
had DTS and is very different (SSIS was created from scratch).
Comment: This is a common term in the SSIS world which just means that you have
templates set up to perform routine tasks like logging, error handling,
etc. A "yes" answer would usually indicate an experienced person; a "no" answer is still fine
if your project is not very mission critical.
Comment: SSIS is in most cases used for data warehouses, so knowledge of data
warehouse design is very useful.
Comment: The thing is that most people who read good books usually have an
advantage over those who haven't, because they know what they know and they
know what they don't know (but they know it exists and is available).
Blogs and articles vary in quality, so best-practice articles are a big plus; conferences
can also be a plus.
Question: SSIS certifications?
Comment: This is a rather disappointing point for me. Qualifications are generally
welcome, but unfortunately many people simply cheat. Companies run courses
and then give out the questions and answers, or people find them on the internet. I've
met people who had certification but knew very little, I've met people who were very
experienced and knowledgeable without certification, and people who have done
certification for their own satisfaction and are experienced and knowledgeable. In
other words, be careful with certification. It is easy to get a misleading
impression, so make sure you ask the best questions for the position you can.
Question: How many different sources and destinations have you used?
Comment: It is very common to get all kinds of sources, so the more the person has worked with,
the better for you. Common ones are SQL Server, CSV/TXT flat files, Excel,
Access, Oracle and MySQL, but also Salesforce and web data scraping.
Comment: Some people use SSIS only to extract data and then go with stored
procedures only; they are usually missing the point of the power of SSIS, which
allows you to create "a flow" and apply certain rules at each step. This greatly
simplifies the ETL process, and simplicity is very good.
Comment: The Fast Load option. This option is not set by default, so most developers
know this answer, as otherwise the load is very slow.
Question: Give an example of handling data quality issues?
Comment: Data quality is almost always a problem and SSIS handles it very well.
Examples include importing customers from different sources where customer
names can be duplicates. For instance you can have as a company name: SQL
Server Business Intelligence, but also SQL Server BI, SQL Server BI LTD, SQL
Server BI Limited, or "inteligence" (with one "l"). There are different ways to handle
it. Robust and time consuming is to create a table with all possible scenarios and
update it after each load. You can also use Fuzzy Grouping, which is usually
easy to implement and will usually make very good decisions, but it is not 100%
accurate, so this approach has to be justified. Other typical quality issues are
nulls (missing values), outliers (dates like 2999, or typos like 50000 instead of
5000, especially important if someone is adjusting the value to get a bigger bonus)
and incorrect addresses, and these are either corrected during ETL, ignored, redirected for further manual
updates, or they fail the package, which for big processes is usually not practised.
Comment: This was one of the questions requested in the comments (at the bottom of
the page). This one is very important but also tricky. All SSIS developers have
a SQL Server background, and that is sometimes not very good if they use a SQL rather than
an SSIS approach.
Let's start with when you typically use stored procedures. This is for preparing tables (truncate),
audit tasks (usually part of an SSIS framework), getting configuration values for
loops, and a few other general tasks.
During the ETL extract you usually type simple SQL because it comes from other
sources, and over-complication (making it dynamic) is usually not a good choice,
because any change usually affects the package, which has to be updated as
well.
During the transformation phase (business rules, cleaning, core work) you should
use transformation tasks, not stored procedures! There are loads of tasks that
make the package much easier to develop, but a very important reason is also
readability, which matters a lot for other people who need to change the
package, and it obviously reduces the risk of making errors. Performance is usually
very good with SSIS as it is a memory/flow based approach. So when should you use stored
procedures for transformations? If you don't have strong SSIS developers or you
have performance reasons to do it. In some cases stored procedures can be much faster
(usually this only applies to very large datasets). Most important is to have reasons
why one approach is better for the situation.
Question: What is your approach for ETL with data warehouses (how
many packages do you develop during a typical load, etc.)?
Comment: This is a rather generic question. A typical approach (for me) when
building ETL is to have a package that extracts data per source, with source-specific
transformations (lookups, business rules, cleaning), and loads the data into a staging
table. Then a package does a simple merge from staging to the data warehouse
(stored procedure), or a package takes data from staging and performs extra
work before loading it into the data warehouse. I prefer the first one, and due to this
approach I occasionally consider having an extract phase (as well as a stage phase),
which gives me more flexibility with transformations (per source) and makes it
simpler to follow (not everything in one go). So to summarize, you usually have one
package per source and one package per data warehouse table destination.
There might be other valid approaches as well, so ask for reasons.
Comment: It is a free third-party component used rather frequently to output errors
into an XML field, which saves development time.
Q: Can you name five of the Perfmon counters for SSIS and the value they
provide?
SQLServer:SSIS Service object:
SSIS Package Instances
SQLServer:SSIS Pipeline object:
BLOB bytes read
BLOB bytes written
BLOB files in use
Buffer memory
Buffers in use
Buffers spooled
Flat buffer memory
Flat buffers in use
Private buffer memory
Private buffers in use
Rows read
Rows written
If you picture a data flow as a river, and transformation buffer usage as a dam in that
river, here is the impact of your transformation on your data flow.
A Non Blocking transformation is a dam that just lets the water spill over the top.
Other than perhaps a bit of a slowdown, the water (your data) proceeds on its way with
very little delay.
A Partially Blocking transformation is a dam that holds the water back until it
reaches a certain volume, then releases that volume of water downstream, and then
completely blocks the flow until that volume is reached again. Your data, in this case,
will stop, then start, then stop, then start, over and over until all the data has moved
through the transformation. The downstream transformations end up starved for data
during certain periods, and then flooded with data during other periods. Clearly your
downstream transformations will not be able to work as efficiently when this happens,
and your entire package will slow down as a result.
A Blocking transformation is a dam that lets nothing through until the entire
volume of the river has flowed into the dam. Nothing is left to flow from upstream, and
nothing has been passed downstream. Then once the transformation is finished, it
releases all the data downstream. Clearly for a large dataset this can be extremely
memory intensive. Additionally, if all the transforms in your package are just waiting for
data, your package is going to run much more slowly.
Generally speaking, if you can avoid Blocking and Partially Blocking transformations, your
package will simply perform better. If you think about it a bit, you will probably be able to
figure out which transformations fall into which category. Here is a quick list for your
reference:
Non Blocking
Audit
Character Map
Conditional Split
Copy Column
Data Conversion
Derived Column
Import Column
Lookup
Multicast
Percentage sampling
Row count
Row sampling
Script component
Partially Blocking
Data mining
Merge
Merge Join
Pivot/Unpivot
Term Extraction
Term Lookup
Union All
Blocking
Aggregate
Fuzzy Grouping
Fuzzy Lookup
Sort
Facts:
Sort is a fully blocking transformation.
A Merge transform requires sorted input, but a Union All does not; use a Union All when you
can.
The component has to acquire multiple buffers of data before it can perform its
processing. An example is the Sort transformation, where the component has to process
the complete set of rows in a single operation.
The component has to combine rows from multiple inputs. An example is the
Merge transformation, where the component has to examine multiple rows from each
input and then merge them in sorted order.
There is no one-to-one correspondence between input rows and output rows. An
example is the Aggregate transformation, where the component has to add a row to the
output to hold the computed aggregate values.
In Integration Services scripting and programming, you specify an asynchronous
transformation by assigning a value of 0 to the SynchronousInputID property of the
component's outputs. This tells the data flow engine not to send each row automatically
to the outputs. Then you must write code to send each row explicitly to the appropriate
output by adding it to the new output buffer that is created for the output of an
asynchronous transformation.
Note
Since a source component must also explicitly add each row that it reads from
the data source to its output buffers, a source resembles a transformation with
asynchronous outputs.
You create a package deployment utility for an Integration Services project by first
configuring the build process to create a deployment utility, and then building the project.
When you build the project, all packages and package configurations in the project are
automatically included. To deploy additional files such as a Readme file with the project,
place the files in the Miscellaneous folder of the Integration Services project. When the
project is built, these files are also automatically included.
You can configure each project deployment differently. Before you build the project and
create the package deployment utility, you can set the properties on the deployment
utility to customize the way the packages in the project will be deployed. For example,
you can specify whether package configurations can be updated when the project is
deployed. To access the properties of an Integration Services project, right-click the
project and click Properties.
1. In SQL Server Data Tools (SSDT), open the solution that contains the Integration
Services project for which you want to create a package deployment utility.
2. Right-click the project and click Properties.
3. In the Property Pages dialog box, click Deployment Utility.
4. To update package configurations when packages are deployed, set
AllowConfigurationChanges to True.
5. Set CreateDeploymentUtility to True.
6. Optionally, update the location of the deployment utility by modifying the
DeploymentOutputPath property.
7. Click OK.
8. In Solution Explorer, right-click the project, and then click Build.
9. View the build progress and build errors in the Output window
After you've gone through these steps, the next time you build your project it will create
the file (YourProjectName).SSISDeploymentManifest. This file is located in the same
folder as your packages, in the bin\Deployment folder.
If you run this file it will open the Package Installation Wizard, which will allow you to deploy
all the packages in the project to a desired location.
The package protection levels that encrypt packages by using passwords require that
you provide a password also. If you change the protection level from a level that does
not use a password to one that does, you will be prompted for a password.
Also, for the protection levels that use a password, Integration Services uses the Triple
DES cipher algorithm with a key length of 192 bits, available in the .NET Framework
Class Library (FCL).
Protection Levels
DontSaveSensitive: Suppresses the values of sensitive properties in the package when the package is
saved. This protection level does not encrypt; instead it prevents properties that are
marked sensitive from being saved with the package and therefore makes the sensitive
data unavailable to other users. If a different user opens the package, the sensitive
information is replaced with blanks and the user must provide the sensitive information.
When used with the dtutil utility (dtutil.exe), this protection level corresponds to the value
of 0.
EncryptAllWithUserKey: When used with the dtutil utility, this protection level corresponds to the value of 4.
EncryptSensitiveWithPassword: When used with the dtutil utility, this protection level corresponds to the value of 2.
EncryptSensitiveWithUserKey: When used with the dtutil utility, this protection level corresponds to the value of 1.
Note
For protection levels that use a user key, Integration Services uses DPAPI standards.
For more information about DPAPI, see the MSDN Library at
http://msdn.microsoft.com/library.
What is a Transformation?
A transformation simply means bringing in the data in a desired format. For example you
are pulling data from the source and want to ensure only distinct records are written to
the destination, so duplicates are removed. Another example is if you have
master/reference data and want to pull only related data from the source, and hence you
need some sort of lookup. There are around 30 transformation tasks available and this
can be extended further with custom-built tasks if needed.
What is a Task?
A task is very much like a method of any programming language which represents or
carries out an individual unit of work. There are broadly two categories of tasks in SSIS,
Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are
operational in nature except Data Flow tasks. Although there are around 30 control flow
tasks which you can use in your package you can also develop your own custom tasks
with your choice of .NET programming language.
These are the types of precedence constraints and the condition could be either a
constraint, an expression or both
o Success (next task will be executed only when the last task completed successfully) or
o Failure (next task will be executed only when the last task failed) or
o Complete (next task will be executed no matter the last task was completed or failed).
A container is a logical grouping of tasks which allows you to manage the scope of the
tasks together.
o For Loop Container: used when you want to have a repeating flow in a package.
o For Each Loop Container: used for enumerating each object in a collection; for
example a record set or a list of files.
Apart from the above mentioned containers, there is one more container called the
Task Host Container which is not visible from the IDE, but every task is contained in it
(the default container for all the tasks).
Variables can have different scope depending on where they are defined. For example,
you can have package-level variables which are accessible to all the tasks in the
package, and there can also be container-level variables which are accessible only to
those tasks that are within the container.
Similar to a source adaptor, the destination adapter indicates a destination in the Data
Flow to write data to. Again like the source adapter, the destination adapter also uses a
connection manager to connect to a target system and along with that you also specify
the target table and writing mode, i.e. write one row at a time or do a bulk insert as well
as several other properties.
Please note, the source and destination adapters can both use the same connection
manager if you are reading and writing to the same database.
o SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or
msdb..sysssislog table depending on the SQL Server version.
Please note, enabling event logging is immensely helpful when you are troubleshooting a
package, but it also incurs additional overhead on SSIS in order to log the events and
information. Hence you should only enable event logging when needed and only
choose the events which you want to log. Avoid logging all the events unnecessarily.
Supported: the container/task does not create a separate transaction, but if the
parent object has already initiated a transaction then it participates in it.
Isolation level dictates how two or more transactions maintain consistency and
concurrency when they are running in parallel.
Design time validation is performed when you are opening your package in BIDS
whereas run time validation is performed when you are actually executing the package.
Define early validation (package level validation) versus late validation (component level
validation).
When a package is executed, the package goes through the validation process. All of
the components/tasks of package are validated before actually starting the package
execution. This is called early validation or package level validation. During execution of
a package, SSIS validates the component/task again before executing that particular
component/task. This is called late validation or component level validation.
o The data flow pipeline engine manages the flow of data from source to destination and
in-memory transformations
o The SSIS object model is used for programmatically creating, managing and
monitoring SSIS packages
How is SSIS runtime engine different from the SSIS dataflow pipeline
engine?
The SSIS Runtime Engine manages the workflow of the packages during runtime, which
means its role is to execute the tasks in a defined sequence. As you know, you can
define the sequence using precedence constraints. This engine is also responsible for
providing support for event logging, breakpoints in the BIDS designer, package
configuration, transactions and connections. The SSIS Runtime engine has been
designed to support concurrent/parallel execution of tasks in the package.
The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the
package. It creates a dataflow pipeline by allocating in-memory structure for storing data
in-transit. This means, the engine pulls data from source, stores it in memory, executes
the required transformation in the data stored in memory and finally loads the data to the
destination. Like the SSIS runtime engine, the Dataflow pipeline has been designed to
do its work in parallel by creating multiple threads and enabling them to run multiple
execution trees/units in parallel.
o Partially Blocking Transformations do not block the output until a full read of the inputs
occur. However, they require new buffers/memory to be allocated to store the newly
created result-set because the output from these kind of transformations differs from the
input set. For example, Merge Join transformation joins two sorted inputs and produces
a merged output. In this case if you notice, the data flow pipeline engine creates two
input sets of memory, but the merged output from the transformation requires another
set of output buffers, because the structure of the output rows is different from that of the input
rows. This means the memory requirement for this type of transformation is higher than
for synchronous transformations, where the transformation is completed in place.
o Full Blocking Transformations, apart from requiring an additional set of output buffers,
also blocks the output completely unless the whole input set is read. For example, the
Sort Transformation requires all input rows to be available before it can start sorting and
pass the rows down to the output path. These kinds of transformations are the most
expensive and should be used only as needed. For example, if you can get sorted data
from the source system, use that instead of using a Sort transformation to sort the
data in transit/memory.
What is an SSIS execution tree and how can I analyze the execution
trees of a data flow task?
The work to be done in the data flow task is divided into multiple chunks, which are
called execution units, by the dataflow pipeline engine. Each represents a group of
transformations. The individual execution unit is called an execution tree, which can be
executed by separate thread along with other execution trees in a parallel manner. The
memory structure is also called a data buffer, which gets created by the data flow
pipeline engine and has the scope of each individual execution tree. An execution tree
normally starts at either the source or an asynchronous transformation and ends at the
first asynchronous transformation or a destination. During execution of the execution
tree, the source reads the data, then stores the data to a buffer, executes the
transformation in the buffer and passes the buffer to the next execution tree in the path
by passing the pointers to the buffers.
To see how many execution trees are created and how many rows are
stored in each buffer for an individual data flow task, you can enable logging of these
data flow task events: PipelineExecutionTrees, PipelineComponentTime,
PipelineInitialization, BufferSizeTuning, etc.
What is an SSIS Proxy account and why would you create it?
When we try to execute an SSIS package from a SQL Server Agent job, it fails with the
message "Non-SysAdmins have been denied permission to run DTS Execution job steps
without a proxy account." This error message is generated if the account under which the
SQL Server Agent service is running and the job owner is not a sysadmin on the
instance, or the job step is not set to run under a proxy account associated with the SSIS
subsystem.
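A minimal T-SQL sketch of setting up such a proxy (the credential name, Windows account, password and login below are placeholders, not values from this document):
-- 1. Create a credential that maps to a Windows account allowed to run the package
CREATE CREDENTIAL SSISProxyCredential
WITH IDENTITY = N'DOMAIN\SSISRunAccount', SECRET = N'StrongPasswordHere';
-- 2. Create a SQL Server Agent proxy based on that credential
EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'SSISProxy',
    @credential_name = N'SSISProxyCredential',
    @enabled = 1;
-- 3. Grant the proxy the right to run SSIS package job steps
EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'SSISProxy',
    @subsystem_name = N'SSIS';
-- 4. Optionally, allow a non-sysadmin job owner to use the proxy
EXEC msdb.dbo.sp_grant_login_to_proxy
    @proxy_name = N'SSISProxy',
    @login_name = N'DOMAIN\JobOwner';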
How can you configure your SSIS package to run in 32-bit mode on
64-bit machine when using some data providers which are not
available on the 64-bit platform?
In order to run an SSIS package in 32-bit mode the SSIS project property
Run64BitRuntime needs to be set to False. The default configuration for this property is
True. This configuration is an instruction to load the 32-bit runtime environment rather
than 64-bit, and your packages will still run without any additional changes. The property
can be found under SSIS Project Property Pages -> Configuration Properties ->
Debugging.
b) Integration Services object model: includes managed API for accessing Integration Services
tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: it saves the layout of packages,
runs packages, and provides support for logging, breakpoints, configuration, connections, and
transactions. The Integration Services run-time executables are the package, containers, tasks,
and event handlers that Integration Services includes, and custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.
Q4 How do you pass a property value at run time? How do you implement Package
Configurations?
A property value like the connection string for a Connection Manager can be passed to the package using
package configurations. Package Configurations provide different options like XML File, Environment
Variable, SQL Server Table, Registry Entry or Parent Package Variable.
Q10 What are the points to keep in mind for performance improvement of the package?
http://technet.microsoft.com/en-us/library/cc966529.aspx
Q11 You may get a question stating a scenario and then asking how you would
create a package for it, e.g. how would you configure a data flow task so that it can
transfer data to different tables based on the city name in a source table column?
b) Data has to be sorted before a Merge transformation, whereas Union All doesn't have any
condition like that.
Q14 You may get a question regarding what transformation X does. Lookup, Fuzzy Lookup
and Fuzzy Grouping transformations are my favourites.
Q15 How would you restart a package from the previous failure point? What are Checkpoints
and how can we implement them in SSIS?
When a package is configured to use checkpoints, information about package execution is written
to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the
package from the point of failure. If the package runs successfully, the checkpoint file is deleted,
and then re-created the next time that the package is run.
The fastest way to do an incremental load is by using a Timestamp column in the source table and
storing the last ETL timestamp. In the ETL process, pick all rows having a Timestamp greater than the
stored timestamp, so as to pick up only new and updated records.
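A minimal T-SQL sketch of this watermark pattern (the table and column names are illustrative, not from this document):
-- Watermark stored from the previous ETL run
DECLARE @LastETL datetime;
SELECT @LastETL = LastETLTimestamp FROM dbo.ETLControl WHERE TableName = 'Customer';
-- Pick up only the rows added or changed since the last load
SELECT CustomerID, CustomerName, ModifiedDate
FROM dbo.Customer
WHERE ModifiedDate > @LastETL;
-- After a successful load, move the watermark forward
UPDATE dbo.ETLControl
SET LastETLTimestamp = (SELECT MAX(ModifiedDate) FROM dbo.Customer)
WHERE TableName = 'Customer';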
Late-arriving dimensions are sometimes unavoidable because of a delay or an error in the dimension ETL,
or because of the ETL logic itself. To handle late-arriving facts, we can create a dummy dimension member
with the natural/business key and keep the rest of the attributes as null or default. As soon as the actual
dimension row arrives, the dummy member is updated with a Type 1 change. These are also known as
Inferred Dimensions (inferred members).