Professional Documents
Culture Documents
hive> create external table Tri100(id int,name string,location varchar(30),sal int,Hike int)
>
>
>
>
OK
Time taken: 0.688 seconds
SUM
Returns the sum of the elements in the group or sum of the distinct values of the column in the group.
Count
count(*) - Returns the total number of retrieved rows, including rows containing NULL values;
count(expr) Returns the number of rows for which the supplied expression is non-NULL;
count(DISTINCT expr[, expr]) Returns the number of rows for which the supplied expression(s) are
unique and non- NULL;
OK
4
Time taken: 17.37 seconds, Fetched: 1 row(s)
Average
Returns the average of the elements in the group or the average of the distinct values of the column in
the group.
Minimum
Maximum
Variance
Standard Deviation
Covariance
Correlation
Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group.
Percentile
Returns the exact pth percentile of a column in the group(does not work with floating point types).
P must be between 0 and 1. NOTE: A true percentile Percentile(BIGINT col,P)can only be computed for
INTEGER values. Use PERCENTILE_APPROX if you are input is non-integral.
hive> select percentile(sal,1) from Tri100; -----------------------Output Gives Higher Value of table as P is
1.It takes Higher value as 100%.
OK
40000.0
Time taken: 17.966 seconds, Fetched: 1 row(s)
Histogram
The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and
heights. histogram_numeric(col, b)
Collections
OK
[45000,40000,25000,30000]
Time taken: 18.29 seconds, Fetched: 1 row(s)
Returns a set of objects with duplicates(as of Hive 0.13.0)
NTILE
This function divides an ordered partition into x groups called buckets and assigns a bucket number to
each row in the partition. This allows easy calculation of tertiles, quartiles, deciles, percentiles and
other common summary statistics. (As of Hive 0.11.0.).