1. To select ALTERNATE records from a table. (EVEN NUMBERED)
select * from emp where rowid in (select decode(mod(rownum,2),0,rowid, null) from emp);
2. To select ALTERNATE records from a table. (ODD NUMBERED)
select * from emp where rowid in (select decode(mod(rownum,2),0,null ,rowid) from emp);
3. Find the 3rd MAX salary in the emp table.
select distinct sal from emp e1 where 3 = (select count(distinct sal) from emp e2 where e1.sal <=
e2.sal);
4. Find the 3rd MIN salary in the emp table.
select distinct sal from emp e1 where 3 = (select count(distinct sal) from emp e2 where e1.sal >= e2.sal);
5. Select FIRST n records from a table.
select * from emp where rownum <= &n;
6. Select LAST n records from a table
select * from emp minus select * from emp where rownum <= (select count(*) - &n from emp);
7. List dept no., Dept name for all the departments in which there are no employees in the
department.
select * from dept where deptno not in (select deptno from emp);
alternate solution: select * from dept a where not exists (select * from emp b where a.deptno = b.deptno);
alternate solution: select empno, ename, b.deptno, dname from emp a, dept b where a.deptno(+) = b.deptno and empno is null;
8. How to get 3 Max salaries ?
select distinct sal from emp a where 3 >= (select count(distinct sal) from emp b where a.sal <= b.sal)
order by a.sal desc;
9. How to get 3 Min salaries ?
select distinct sal from emp a where 3 >= (select count(distinct sal) from emp b where a.sal >= b.sal);
10. How to get nth max salaries ?
select distinct sal from emp a where &n = (select count(distinct sal) from emp b where a.sal <= b.sal);
11. Select DISTINCT RECORDS from emp table.
select * from emp a where rowid = (select max(rowid) from emp b where a.empno=b.empno);
12. How to delete duplicate rows in a table?
delete from emp a where rowid != (select max(rowid) from emp b where a.empno=b.empno);
13. Count the number of employees department-wise.
select count(EMPNO), b.deptno, dname from emp a, dept b where a.deptno(+)=b.deptno group by
b.deptno,dname;
14. Suppose there is annual salary information provided by the emp table. How to fetch the monthly salary of each and every employee?
select ename, sal/12 as monthly_sal from emp;
15. Select all records from the emp table where deptno = 10 or 40.
select * from emp where deptno in (10, 40);
16. Select all records from the emp table where deptno = 30 and sal > 1500.
select * from emp where deptno = 30 and sal > 1500;
19. Select all records where ename starts with ‘S’ and its length is 6 characters.
select * from emp where ename like 'S_____';
20. Select all records where ename may be any number of characters but it should end with ‘R’.
select * from emp where ename like '%R';
select * from emp where sal> any(select sal from emp where sal<3000);
select * from emp where sal> all(select sal from emp where sal<3000);
5. Select all the employees grouped by deptno, with sal in descending order.
select * from emp order by deptno, sal desc;
6. How can I create an empty table emp1 with the same structure as emp?
create table emp1 as select * from emp where 1 = 2;
8. Select all records where dept no of both emp and dept table matches.
select * from emp where exists(select * from dept where emp.deptno=dept.deptno)
9. If there are two tables emp1 and emp2, and both have common records. How can I fetch all the records but common records only once?
(Select * from emp) Union (Select * from emp1)
10. How to fetch only common records from two tables emp and emp1?
(Select * from emp) Intersect (Select * from emp1)
11. How can I retrieve all records of emp that are not present in emp2?
(Select * from emp) Minus (Select * from emp1)
12. Count the total sal deptno wise where more than 2 employees exist.
SELECT deptno, sum(sal) As totalsal
FROM emp
GROUP BY deptno
HAVING COUNT(empno) > 2;
*******************************************
Orchadmin Command -- useful for dataset
******************************************
1) Orchadmin describe -s <dataset name> ------ > to see the metadata of a dataset
3) Orchadmin dump -field <col1> -field <col2> <dataset name> ----> to see the data in particular columns
4) Dsrecords <dataset name> ----> to see the number of records in the dataset.
6) Orchadmin dump -part 4 <dataset name> -----> to see the nth partition data of a
dataset
7) Orchadmin dump -skip 4 <dataset name> -----> to skip n records from each partition
of a dataset
9) Orchadmin dump -p 5 <dataset name> -----> to display every 5th record from each partition of a dataset
10) Orchadmin dump -part 0 -n 99 -field customer big.ds-----> to dump the value of the
customer field of the first 99 records of partition 0 of big.ds.
************************
Sed command
************************
1) To skip 2nd record in a file--------> sed -n '2!p' test.txt or sed '2d' test.txt
3) To skip 3rd record to 8th record in a file -----> sed '3,8d' filename.txt
5) To get 3rd and 24th record of a file -----> sed -n '3p; 24p' filename.txt
6) To get first five records -------> sed -n '1,5p' test.txt or head -5 filename.txt
7) To get first and last record of a file -----> sed -n '1p; $p' filename.txt
8) To delete header and footer (or) first and last records of a file -----> sed -i '1d;$d'
filename.txt
10) to delete all occurrences of a word "hello" in one shot from a file -----> sed 's/hello//g'
filename.txt
11) To replace the word "bad" with "good" in a file-----> sed 's/bad/good/' < filename.txt
12) To replace all occurrences of the word "bad" with "good" in a file-----> sed
's/bad/good/g' < filename.txt
13) To replace the word "hello" with "hai" in the first 100 lines of a file -----> sed '1,100
s/hello/hai/' < filename.txt
14) To replace the word "hello" with "hai" in the from 100th line to last line in a file ---
sed '100, $ s/hello/hai/' < filename.txt
16) To print the lines that do not contain the word "run" in a file -----> sed -n '/run/!p' <
filename.txt
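The sed recipes above can be exercised end to end on a small throwaway file; the file name and its contents below are invented for the demo:

```shell
# Create a 5-line sample file to try the sed recipes on.
printf 'line1 hello\nline2 bad day\nline3 hello\nline4 only\nline5 run fast\n' > /tmp/sed_demo.txt

sed '2d' /tmp/sed_demo.txt          # skip the 2nd record
sed -n '1,3p' /tmp/sed_demo.txt     # print records 1 to 3
sed 's/bad/good/g' /tmp/sed_demo.txt  # replace every "bad" with "good"
sed -n '1p; $p' /tmp/sed_demo.txt   # first and last record
sed -n '/run/!p' /tmp/sed_demo.txt  # lines NOT containing "run"
```

Each command prints its result to stdout; add -i (as in recipe 8) only when you want to edit the file in place.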
*******************
Cut command
*******************
1) To get 1st and 2nd columns of a file-------cut -d ':' -f1,2 filename.txt (":" is the
delimiter)
2) To get 3rd and 6th columns of a file ------cut -d ':' -f3,6 filename.txt
4) To print the fields from 10th to the end of the line ----> cut -d ':' -f10- filename.txt
5) To display the third and fourth character from each line of a file ----> cut -c 3,4
filename.txt
6) To display the characters from 10 to 20 from each line of a file -----> cut -c 10-20
filename.txt
7) To display the first 10 characters from each line of a file -----> cut -c -10 filename.txt
8) To display from the 10th character to the end of the line in a file -----> cut -c 10- filename.txt
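A quick way to try the cut options above; the passwd-style sample lines are invented for the demo:

```shell
# Two ':'-delimited sample lines (passwd-style).
printf 'root:x:0:0:root:/root:/bin/bash\nguest:x:1001:1001:Guest:/home/guest:/bin/sh\n' > /tmp/cut_demo.txt

cut -d ':' -f1,2 /tmp/cut_demo.txt   # fields 1 and 2
cut -d ':' -f6- /tmp/cut_demo.txt    # fields 6 to end of line
cut -c 3,4 /tmp/cut_demo.txt         # characters 3 and 4
cut -c -4 /tmp/cut_demo.txt          # first 4 characters
```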
********************
SCP command
********************
1) To transfer an entire directory from one server to another server ------> scp -r <directory> prappal@edwist:<target path>
2) To transfer a particular file from one server to another server -----> scp <filename> prappal@edwist:<target path>
Note: “prappal” is the username on the other server and “edwist” is the server name.
*********************
Grep command
*********************
1) To search for a string inside a given file -----> grep 'poorna' filename.txt
2) To search for a string in all files of the current directory ----- > grep 'poorna' *
3) To search for a string in a directory with the subdirectories recursed ---- > grep -r 'poorna' *
4) To print the lines that do not contain the word "poorna" from a file -----> grep -v
'poorna' filename.txt
5) To count the no. of occurrences of a given string in a file -----> grep -c 'poorna' filename.txt (counts matching lines: a line containing the string more than once is still counted only once)
8) To remove empty lines from a given file -----> grep -v "^$" filename.txt > tempfilename.txt
mv tempfilename.txt filename.txt
9) To search for a 4-digit word in a file -----> grep "\<[0-9][0-9][0-9][0-9]\>" filename.txt
10) To search for lines having only three characters -----> grep "^...$" filename.txt
11) To display lines ending with “$” character in a given file -----> grep “\$$”
filename.txt
12) To print the lines which end with the word "end" in a file -----> grep 'end$'
filename.txt
13) To print the lines that start with the word "poorna" from a file ----> grep '^poorna'
filename.txt
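The grep recipes above can be checked against a small invented file (the \< \> word boundaries assume GNU grep):

```shell
# Sample file: one 'poorna' line, an empty line, a 3-char line, a 4-digit word.
printf 'poorna works\n\nthe end\n1234\nabc\n' > /tmp/grep_demo.txt

grep 'poorna' /tmp/grep_demo.txt                    # lines containing the string
grep -c 'poorna' /tmp/grep_demo.txt                 # count of matching lines
grep -v '^$' /tmp/grep_demo.txt                     # drop empty lines
grep '\<[0-9][0-9][0-9][0-9]\>' /tmp/grep_demo.txt  # 4-digit word
grep '^...$' /tmp/grep_demo.txt                     # lines of exactly 3 characters
grep 'end$' /tmp/grep_demo.txt                      # lines ending with "end"
```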
*************************
Other useful commands
*************************
1) To convert all the capital letters of a file into lower case -----> cat filename.txt | tr
"[A-Z]" "[a-z]"
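For the tr one-liner, input redirection does the same job without the extra cat; the sample text is invented for the demo:

```shell
printf 'HELLO World\n' > /tmp/tr_demo.txt
# Lowercase every capital letter.
tr '[A-Z]' '[a-z]' < /tmp/tr_demo.txt   # -> hello world
```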
Egrep: accepts more than one pattern for search. Also accepts patterns from a file.
Fgrep: accepts multiple patterns both from the command line and from a file, but does not accept regular expressions, only strings. It is faster than the other two and should be used when searching for fixed strings.
Datastage:
1) What are the types of Parallelism?
Pipeline Parallelism :
It is the ability for a downstream stage to begin processing a row as soon as an
upstream stage has finished processing that row (rather than processing one row
completely through the job before beginning the next row). In Parallel jobs, it is
managed automatically.
For example, consider a job (Source -> Transformer -> Target) running on a system having three processors:
The source stage starts running on one processor, reads the data from the source and
starts filling a pipeline with the read data.
At the same time, the target stage starts running on another processor, writes data to
the target as soon as the data is available.
Partitioning Parallelism:
Partitioning parallelism means that entire record set is partitioned into small sets and
processed on different nodes. That is, several processors can run the same job
simultaneously, each handling a separate subset of the total data.
For example if there are 100 records, then if there are 4 logical nodes then each node
would process 25 records each. This enhances the speed at which loading takes place.
a) Auto:
It chooses the best partitioning method depending on:
The mode of execution of the current stage and the preceding stage.
The number of nodes available in the configuration file.
b) Round robin:
Here, the first record goes to the first processing node, the second to the second processing node, and so on. This method is useful for redistributing the partitions of an input dataset that are not equal in size into approximately equal-sized partitions.
DataStage uses ‘Round robin’ when it partitions the data initially.
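Outside DataStage, round-robin dealing can be sketched in one line of awk; the 4-node count and record numbers are invented for illustration:

```shell
# Record i (1-based) goes to node (i-1) mod 4: nodes 0,1,2,3,0,1,2,3,...
seq 1 8 | awk '{ print "record", $1, "-> node", (NR - 1) % 4 }'
```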
c) Same:
It implements the Partitioning method same as the one used by the preceding stage.
The records stay on the same processing node; that is, data is not redistributed or
repartitioned. Same is considered as the fastest Partitioning method.
DataStage uses ‘Same’ when passing data between stages in a job.
d) Random:
It distributes the records randomly across all processing nodes and guarantees that
each processing node receives approximately equal-sized partitions.
e) Entire:
It distributes the complete dataset as input to every instance of a stage on every
processing node. It is mostly used with stages that create lookup tables for their input.
f) Hash:
It distributes all the records with identical key values to the same processing node so
as to ensure that related records are in the same partition. This does not necessarily
mean that the partitions will be equal in size.
When Hash Partitioning, hashing keys that create a large number of partitions should
be selected.
Reason: For example, if you hash partition a dataset based on a zip code field, where
a large percentage of records are from one or two zip codes, it can lead to bottlenecks
because some nodes are required to process more records than other nodes.
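A hash partitioner can be mimicked in the shell: hash the key, take it modulo the node count, and identical keys always land on the same node. Here cksum stands in for the real hash function and the 4-node count is an assumption for the demo:

```shell
# Identical keys (e.g. the two "CA" records) always map to the same node;
# partition sizes are NOT guaranteed to be equal.
for key in CA NY CA TX NY; do
  h=$(printf '%s' "$key" | cksum | cut -d ' ' -f1)
  echo "$key -> node $(( h % 4 ))"
done
```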
g) Modulus:
Partitioning is based on a key column modulo the number of partitions. The Modulus partitioner assigns each record of an input dataset to a partition of its output dataset as determined by a specified key field in the input dataset.
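Modulus partitioning is simply key mod N; the keys and the partition count N=3 below are invented for illustration:

```shell
# key mod 3 decides the partition: 101->2, 102->0, 103->1, 104->2, 105->0
printf '101\n102\n103\n104\n105\n' | awk '{ print "key", $1, "-> partition", $1 % 3 }'
```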
h) Range:
It divides a dataset into approximately equal-sized partitions, each of which contains
records with key columns within a specific range. It guarantees that all records with
same partitioning key values are assigned to the same partition.
Note: In order to use the Range partitioner, a range map has to be made using the ‘Write Range Map’ stage.
i) DB2:
Partitions an input dataset in the same way that DB2 would partition it.
For example, if this method is used to partition an input dataset containing update
information for an existing DB2 table, records are assigned to the processing node
containing the corresponding DB2 record.
OLTP stores current data, whereas OLAP stores current and historical data for analysis. OLTP queries are short transactional reads and writes and so respond very quickly, whereas OLAP queries typically scan large volumes of data across many tables and take longer to run.
4) What is the difference between star schema and snowflake schema?
In a star schema, dimension tables are denormalized, whereas in a snowflake schema, dimension tables are normalized.
7) If we check the preserve partitioning in one stage and if we don’t give any
partitioning method (Auto) in the next stage which partition method it will
use?
In this case, the partitioning method used by the preceding stage is used.
Preserve Partitioning indicates whether the stage wants to preserve the partitioning at the next stage of the job.
9) Why do we need the Sort stage besides the sort-merge collecting method and the perform-sort option in a stage's Advanced properties?
The Sort stage is used to perform more complex sort operations which are not possible using a stage's Advanced tab properties.
Many stages have an optional sort function via the partition tab. This means if you are
partitioning your data in a stage you can define the sort at the same time. The sort
stage is for use when you don't have any stage doing partitioning in your job but you
still want to sort your data, or if you want to sort your data in descending order, or if
you want to use one of the sort stage options such as "Allow Duplicates" or "Stable
Sort". If you are processing very large volumes and need to sort, you will find the Sort stage is more flexible than the partition tab sort.
10) Why do we need Filter, Copy and Column Export stages instead of the Transformer stage?
In parallel jobs we have specific stage types for performing specialized tasks. Filter,
copy, column export stages are operator stages. These operators are the basic
functional units of an orchestrate application. The operators in your Orchestrate
application pass data records from one operator to the next, in pipeline fashion. For
example, the operators in an application step might start with an import operator,
which reads data from a file and converts it to an Orchestrate data set. Subsequent
operators in the sequence could perform various processing and analysis tasks. The
processing power of Orchestrate derives largely from its ability to execute operators in
parallel on multiple processing nodes. By default, Orchestrate operators execute on all
processing nodes in your system. Orchestrate dynamically scales your application up
or down in response to system configuration changes, without requiring you to modify
your application. Thus using operator stages will increase the speed of data processing
applications rather than using transformer stages.
11) Describe the types of Transformers used in DataStage PX, and their uses.
Difference:
A Basic transformer compiles in "Basic Language" whereas a Normal Transformer
compiles in "C++".
Basic transformer does not run on multiple nodes whereas a Normal Transformer can
run on multiple nodes giving better performance.
Basic transformer takes less time to compile than the Normal Transformer.
Usage:
A basic transformer should be used in Server Jobs.
12) What will you do in a situation where somebody wants to send you a file and use that file as an input or reference and then run the job?
Use a Wait For File activity between the Job activity stages in the job sequence.
By using checkpoint information we can restart the sequence from the point of failure. If checkpoint information is enabled, reset the aborted job and run the sequence again.
14) What performance tunings have you done in your last project to increase the performance of slowly running jobs?
3. Use operator stages like Remove Duplicates, Filter, and Copy instead of the Transformer stage.
4. Sort the data before sending it to a Change Capture or Remove Duplicates stage.
5. Key columns should be hash partitioned and sorted before an aggregate operation.
15) What is the Change Capture stage? Which execution modes can be used for the comparison of data?
The Change Capture stage takes two input data sets, denoted before and after, and
outputs a single data set whose records represent the changes made to the before
data set to obtain the after data set.
The stage produces a change data set, whose table definition is transferred from the
after data set’s table definition with the addition of one column: a change code with
values encoding the four actions: insert, delete, copy, and edit. The preserve-
partitioning flag is set on the change data set.
The compare is based on a set of key columns; rows from the two data sets are assumed to be copies of one another if they have the same values in these key columns. You can also optionally specify change values: if two rows have identical key columns, you can compare the value columns in the rows to see if one is an edited copy of the other.
The stage assumes that the incoming data is key-partitioned and sorted in ascending
order. The columns the data is hashed on should be the key columns used for the data
compare. You can achieve the sorting and partitioning using the Sort stage or by using
the built-in sorting and partitioning abilities of the Change Capture stage.
We can use both Sequential as well as parallel modes of execution for change capture
stage.
The Peek stage is a Development/Debug stage. It can have a single input link and any
number of output links.
The Peek stage lets you print record column values either to the job log or to a
separate output link as the stage copies records from its input data set to one or more
output data sets, like the Head stage and the Tail stage.
The Peek stage can be helpful for monitoring the progress of your application or to
diagnose a bug in your application.
a. If the reference table has a huge amount of data then we go for Join, whereas if the reference table has a small amount of data then we go for Lookup.
b. Join performs all 4 types of joins (inner join, left-outer join, right-outer join and full-outer join), whereas Lookup performs inner join and left-outer join only.
c. Join does not have a reject link, whereas Lookup has a reject link.
DataStage is flexible about metadata. It can cope with the situation where metadata isn’t fully defined. You can define part of your schema and specify that, if your job encounters extra columns that are not defined in the metadata when it actually runs, it will adopt these extra columns and propagate them through the rest of the job. This is known as runtime column propagation (RCP).
This can be enabled for a project via the DataStage Administrator, and set for individual links via the Outputs page Columns tab for most stages, or in the Outputs page General tab for Transformer stages. You should always ensure that runtime column propagation is turned on.
RCP is implemented through a schema file.
The schema file is a plain text file that contains a record (or row) definition.
The Row Generator stage is a Development/Debug stage. It has no input links, and a
single output link.
The Row Generator stage produces a set of mock data fitting the specified metadata.
This is useful where we want to test our job but have no real data available to process.
Row Generator is also useful when we want processing stages to execute at least once in the absence of data from the source.
Stage Variable - An intermediate processing variable that retains its value during reads and does not pass its value into a target column.
Derivation - An expression that specifies the value to be passed on to the target column.
Constraint - A condition that is either true or false and that controls the flow of data on a link.
The order of execution is: stage variables -> constraints -> derivations.
21) Explain the types of Dimension tables?
Conformed Dimension: A dimension table connected to more than one fact table; the granularity defined in the dimension table is common across the fact tables.
Junk Dimension: A dimension table which contains only flags.
Monster Dimension: A dimension that changes rapidly is known as a Monster Dimension.
Degenerate Dimension: A dimension stored in the fact table itself; it is a line-item-oriented fact table design.
Sparse Lookup:
If the reference table has a larger amount of data than the primary table, then it is better to go for a sparse lookup.
Normal Lookup:
If the reference table has a smaller amount of data than the primary table, then it is better to go for a normal lookup.
In both cases the reference table should be Entire partitioned and the primary table should be hash partitioned.
24) Can we capture the duplicates in Datastage? If yes, how to do that?