You are on page 1of 5

Creating Table in HIVE :

hive> create external table Tri100(id int,name string,location varchar(30),sal int,Hike int)
>
>
>
>

row format delimited


fields terminated by ','
lines terminated by '\n'
stored as textfile location '/Jayanth/Jay/';

OK
Time taken: 0.688 seconds

Aggregated Functions and Normal Queries:

hive> select * from Tri100;


OK
1
rahul Hyderabad
30000
40000
2
Mohit Banglore
22000
25000
3
Rohan Banglore
33000
40000
4
Ajay Bangladesh
40000
45000
5
srujay Srilanka
25000
30000
Time taken: 0.184 seconds, Fetched: 5 row(s)

SUM

Returns the sum of the elements in the group or sum of the distinct values of the column in the group.

hive> select sum(sal) from Tri100;


OK
150000
Time taken: 17.909 seconds, Fetched: 1 row(s)

hive> select Sum(sal) from Tri100 where loccation='Banglore';


OK
55000
Time taken: 18.324 seconds, Fetched: 1 row(s)

Count

count(*) - Returns the total number of retrieved rows, including rows containing NULL values;
count(expr) Returns the number of rows for which the supplied expression is non-NULL;
count(DISTINCT expr[, expr]) Returns the number of rows for which the supplied expression(s) are
unique and non- NULL;

hive> select count(*) from Tri100;


OK
5
Time taken: 16.307 seconds, Fetched: 1 row(s)

hive> select count(distinct location) from Tri100;

OK
4
Time taken: 17.37 seconds, Fetched: 1 row(s)

hive> select count(*) from Tri100 where sal>30000;


OK
2
Time taken: 18.36 seconds, Fetched: 1 row(s)

hive> select count(location) from Tri100;


OK
5
Time taken: 17.338 seconds, Fetched: 1 row(s)

Average

Returns the average of the elements in the group or the average of the distinct values of the column in
the group.

hive> select avg(sal) from Tri100 where location='Banglore';


OK
27500.0
Time taken: 17.276 seconds, Fetched: 1 row(s)

hive> select avg(distinct sal) from Tri100;


OK
30000.0
Time taken: 17.276 seconds, Fetched: 1 row(s)

Minimum

Returns the minimum of the column in the group.

hive> select min(sal) from Tri100;


OK
22000
Time taken: 17.368 seconds, Fetched: 1 row(s)

Maximum

Returns the maximum of the column in the group.

hive> select max(sal) from Tri100;


OK
40000
Time taken: 17.267 seconds, Fetched: 1 row(s)

Variance

Returns the variance of a numeric column in the group.

hive> select variance(sal) from Tri100;


OK
3.96E7
Time taken: 17.223 seconds, Fetched: 1 row(s)

hive> select var_pop(sal) from Tri100;


OK
3.96E7
Time taken: 17.195 seconds, Fetched: 1 row(s)
Returns the unbiased sample variance of a numeric column in the group.

hive> select var_samp(sal) from Tri100;


OK
4.95E7
Time taken: 17.245 seconds, Fetched: 1 row(s)

Standard Deviation

Returns the Standard Deviation of a numeric column in the group.

hive> select stddev_pop(sal) from Tri100;


OK
6292.8530890209095
Time taken: 18.63 seconds, Fetched: 1 row(s)
Returns the unbiased sample Standard Deviation of a numeric column in the group.

hive> select stddev_samp(sal) from Tri100;


OK
7035.623639735144
Time taken: 17.299 seconds, Fetched: 1 row(s)

Covariance

Returns the population covariance of a pair of numeric columns in the group.

hive> select covar_pop(sal,Hike) from Tri100;


OK
4.4E7
Time taken: 18.888 seconds, Fetched: 1 row(s)
Returns the sample covariance of a pair of numeric columns in the group.

hive> select covar_samp(sal,Hike) from Tri100;


OK
5.5E7
Time taken: 18.302 seconds, Fetched: 1 row(s)

Correlation

Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group.

hive> select corr(sal,Hike) from Tri100;


OK
0.9514987095307504
Time taken: 17.514 seconds, Fetched: 1 row(s)

Percentile

Returns the exact pth percentile of a column in the group(does not work with floating point types).
P must be between 0 and 1. NOTE: A true percentile Percentile(BIGINT col,P)can only be computed for
INTEGER values. Use PERCENTILE_APPROX if you are input is non-integral.

hive> select percentile(sal,0) from Tri100;------------------------Output Gives Lower Value of table as P is


0.It takes lower value as 0%.
OK
22000.0
Time taken: 17.321 seconds, Fetched: 1 row(s)

hive> select percentile(sal,1) from Tri100; -----------------------Output Gives Higher Value of table as P is
1.It takes Higher value as 100%.
OK
40000.0
Time taken: 17.966 seconds, Fetched: 1 row(s)

hive> select percentile(sal,0.5) from Tri100;


OK
30000.0
Time taken: 17.368 seconds, Fetched: 1 row(s)

Histogram

Computes a histogram of a numeric column in the group using

b non-uniformly spaced bins.

The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and
heights. histogram_numeric(col, b)

hive> select histogram_numeric(sal,5) from Tri100;


OK
[{"x":22000.0,"y":1.0},{"x":25000.0,"y":1.0},{"x":30000.0,"y":1.0},{"x":33000.0,"y":1.0},
{"x":40000.0,"y":1.0}]
Time taken: 17.534 seconds, Fetched: 1 row(s)

Collections

Returns a set of objects with duplicate elements eliminated.

hive> select collect_set(Hike) from Tri100;

OK
[45000,40000,25000,30000]
Time taken: 18.29 seconds, Fetched: 1 row(s)
Returns a set of objects with duplicates(as of Hive 0.13.0)

hive> select collect_list(Hike) from Tri100;


OK
[40000,25000,40000,45000,30000]
Time taken: 17.217 seconds, Fetched: 1 row(s)

NTILE

This function divides an ordered partition into x groups called buckets and assigns a bucket number to
each row in the partition. This allows easy calculation of tertiles, quartiles, deciles, percentiles and
other common summary statistics. (As of Hive 0.11.0.).

hive> select name,Hike,NTILE(3) over (order by sal DESC) from Tri100;


OK
Ajay
45000 1
Rohan 40000 1
Rahul 40000 2
Srujay 30000 2
Mohit 25000 3
Time taken: 17.217 seconds, Fetched: 1 row(s)