
Experience: I have around 16 years of experience in the Business Intelligence domain, working as a Consultant, Architect/Lead and Manager, and around 9 years of experience in Project and Team Management (detail oriented, a people person and a good leader). I have worked with the SAP suite of BW tools (BI/BW/HANA/BOBJ/BPC), Teradata, QlikView and Oracle BI; ETL tools BODS and Informatica; and data visualization tools such as BEx, BOBJ (Webi, Dashboard Design, BO Analysis for Microsoft Office), SAP Lumira, QlikSense, Cognos and Power BI. I also have experience with big data environments such as Apache Hadoop and Cloudera, and with Predictive Analytics in R.
Descriptive Analytics, which uses data aggregation and data mining to provide insight into the past and answer: What has happened?
Predictive Analytics, which uses statistical models and forecasting techniques to understand the future and answer: What could happen?

I have extensive experience working in the Logistics, Manufacturing, Finance, Supply Chain Management, HR, CRM and FI-Controlling functional areas, developing operational, self-service and management reporting.
In my current project, my core responsibility is the implementation of SAP BW, SAP BW/HANA and HANA for MM, HR, Financial and CRM Loyalty Management reports, and the integration of the SAP HANA system with MDM, RPAS, SuccessFactors, Oracle, Hadoop (for device data and historical data) and R (association and clustering algorithms) for MAPS, space optimization, and customer and store clustering. I am also helping establish a COE for the SAP BI infrastructure (HID access system), as well as self-service master data reporting and master data quality (AMDP).
I was also involved in MAPS (Merchandise and Assortment Planning) as it was integrated with SAP.
That's a brief summary of my experience, and I would be happy to answer any questions you have for me.
PAL:

Clustering: k-means (used for space optimization and customer clustering; a minimal sketch follows this list)
Classification
Association
Propensity modelling
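
To make the clustering use case concrete, here is a minimal k-means sketch in Python using scikit-learn. It stands in for the PAL k-means function and assumes hypothetical customer features (visit frequency and average basket value).

```python
# Minimal k-means sketch for customer clustering, assuming hypothetical
# features (visits per month, average basket value); scikit-learn stands in
# for the HANA PAL k-means function here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
casual = np.column_stack([
    rng.poisson(4, 200),            # casual shoppers: few visits
    rng.normal(30, 8, 200),         # smaller baskets
])
loyal = np.column_stack([
    rng.poisson(12, 100),           # frequent shoppers
    rng.normal(80, 15, 100),        # higher-value baskets
])
X = StandardScaler().fit_transform(np.vstack([casual, loyal]))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(np.bincount(kmeans.labels_))   # cluster sizes
print(kmeans.cluster_centers_)       # centroids in scaled feature space
```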

Apriori
Association algorithm: put simply, the apriori principle states that if an itemset is infrequent, then all its supersets must also be infrequent (equivalently, every subset of a frequent itemset is frequent).
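
A minimal Python sketch of that pruning idea on a hypothetical handful of baskets (plain Python, not PAL): once an item such as milk is found to be infrequent, no candidate containing it is generated or counted.

```python
# Minimal sketch of apriori-style pruning: once an itemset is known to be
# infrequent, every superset of it can be skipped without counting.
from itertools import combinations

transactions = [                      # hypothetical mini basket data
    {"apple", "beer"}, {"apple", "rice"}, {"apple", "beer", "rice"},
    {"beer"}, {"apple"}, {"rice", "milk"},
]
min_support = 2 / len(transactions)   # itemset must appear in >= 2 baskets

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = {i for t in transactions for i in t}
frequent_1 = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
# Candidate pairs are built only from frequent 1-itemsets: supersets of an
# infrequent item (e.g. {"milk"}) are never generated or counted.
candidates = {a | b for a, b in combinations(frequent_1, 2) if len(a | b) == 2}
frequent_2 = {c for c in candidates if support(c) >= min_support}
print(sorted(map(sorted, frequent_2)))
```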

Support, Confidence and Lift

Both X and Y can be placed on the same shelf, so that buyers of one item would
be prompted to buy the other.
Promotional discounts could be applied to just one out of the two items.
Advertisements on X could be targeted at buyers who purchase Y.
X and Y could be combined into a new product, such as having Y in flavors of X.
While we may know that certain items are frequently bought together, the question is,
how do we uncover these associations?
Besides increasing sales profits, association rules can also be used in other fields. In
medical diagnosis, for instance, understanding which symptoms tend to be comorbid can
help to improve patient care and medicine prescription.

Definition
Association rules analysis is a technique to uncover how items are associated with each
other. There are three common ways to measure association.
Measure 1: Support. This says how popular an itemset is, as measured by the
proportion of transactions in which an itemset appears. In Table 1 below, the support of
{apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the
support of {apple, beer, rice} is 2 out of 8, or 25%.
Table 1. Example Transactions (eight transactions; the table itself is not reproduced here)
If you discover that sales of items beyond a certain proportion tend to have a significant
impact on your profits, you might consider using that proportion as your support
threshold. You may then identify itemsets with support values above this threshold as
significant itemsets.
Measure 2: Confidence. This says how likely item Y is purchased when item X is
purchased, expressed as {X -> Y}. This is measured by the proportion of transactions
with item X, in which item Y also appears. In Table 1, the confidence of {apple -> beer}
is 3 out of 4, or 75%.

One drawback of the confidence measure is that it might misrepresent the importance
of an association. This is because it only accounts for how popular apples are, but not
beers. If beers are also very popular in general, there will be a higher chance that a
transaction containing apples will also contain beers, thus inflating the confidence
measure. To account for the base popularity of both constituent items, we use a third
measure called lift.
Measure 3: Lift. This says how likely item Y is purchased when item X is purchased,
while controlling for how popular item Y is. In Table 1, the lift of {apple -> beer} is
1, which implies no association between items. A lift value greater than 1 means that
item Y is likely to be bought if item X is bought, while a value less than 1 means that
item Y is unlikely to be bought if item X is bought.
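
The three measures are simple to compute directly. Below is a minimal Python sketch using a hypothetical set of 8 baskets chosen so that the numbers match those quoted above; it is not necessarily the exact contents of Table 1.

```python
# Support, confidence and lift for a rule X -> Y, over a hypothetical set of
# 8 baskets chosen so the results line up with the values quoted in the text.
transactions = [
    {"apple", "beer", "rice"}, {"apple", "beer", "rice"}, {"apple", "beer"},
    {"apple"}, {"beer"}, {"beer"}, {"beer"}, {"rice"},
]
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

def confidence(x, y):                      # P(Y in basket | X in basket)
    return support(x | y) / support(x)

def lift(x, y):                            # confidence corrected for Y's popularity
    return confidence(x, y) / support(y)

X, Y = {"apple"}, {"beer"}
print(support(X))                          # 0.5  -> 4 of 8 baskets
print(support({"apple", "beer", "rice"}))  # 0.25 -> 2 of 8 baskets
print(confidence(X, Y))                    # 0.75 -> 3 of the 4 apple baskets
print(lift(X, Y))                          # 1.0  -> no association
```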
BFL (Business Function Library) is a library that delivers a set of commonly used financial algorithms.
To be able to utilize the BFL algorithms you need to install the AFL (Application
Function Library) that matches the revision number of your HANA box. The AFL delivers
BFL plus PAL (Predictive Analysis Library).

Forecasting Sales (Demand Planning):

For a product with high seasonal changes, the changes over the last 58 months seem erratic.
1) Determining baseline seasonality: average the sales over the overall period (year) to establish the mean; the seasonality is then determined as the actual sales per month less the mean for the overall year.
Gamma is the term used to capture changes in the seasonality. In short, we may have 14 more items sold in July than in the other months of the year, but this is also growing by 4 items each year. For example, in July 2001 we have sales of 115 items, which is 14 items more than the average for the other months of the year. But what if in July 2002 it increases to 119 items (+4) and in July 2003 to 123 items (+4)? We may then say that the gamma represents an increase of +4 (gamma is the smoothing factor for the seasonal component).
Exponential smoothing methods use alpha, beta and gamma.
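
As an illustration (not the SAP ECC implementation), here is a minimal triple exponential smoothing sketch in Python with statsmodels on a hypothetical 108-month series with a July peak; alpha, beta and gamma are fixed rather than optimized, and the parameter names assume a recent statsmodels version.

```python
# Minimal sketch of triple (Holt-Winters) exponential smoothing on a
# hypothetical monthly sales series; alpha, beta and gamma are the level,
# trend and seasonal smoothing factors described above.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
months = pd.date_range("2001-01-01", periods=108, freq="MS")   # 9 years
base = 100 + 0.3 * np.arange(108)                  # slow upward trend
seasonal = np.where(months.month == 7, 14, 0)      # July sells about 14 more
sales = pd.Series(base + seasonal + rng.normal(0, 2, 108), index=months)

model = ExponentialSmoothing(sales, trend="add", seasonal="add",
                             seasonal_periods=12)
# Fix the smoothing factors instead of optimizing them (SAP ECC-style defaults).
fit = model.fit(smoothing_level=0.2, smoothing_trend=0.1,
                smoothing_seasonal=0.3, optimized=False)

mse = ((fit.fittedvalues - sales) ** 2).mean()     # same MSE idea as below
print(round(mse, 1))
print(fit.forecast(12))                            # next 12 months
```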
Mean Square Error (MSE)
We now calculate the MSE as follows:
First, sum the (forecasted value - actual value)^2 over all forecasts, e.g. Jan 2002: (100.63 - 104)^2 = 11.357, plus Feb 2002: (101 - 100)^2 = 1.000 (this is done for all 108 months).
Second, divide the sum by the number of periods in the forecast (nine years of 12 months = 108). The MSE for our example is now 193.6. Our goal is to make the MSE as small as possible by changing the alpha, beta and gamma.
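
The same calculation in a few lines of Python, using only the two months quoted above; the remaining 106 forecast/actual pairs are not reproduced here, so the final number will differ from 193.6.

```python
# Minimal MSE check mirroring the two months quoted above.
forecast = [100.63, 101.0]   # Jan 2002, Feb 2002 forecasts from the text
actual   = [104.0, 100.0]    # corresponding actual sales

squared_errors = [(f - a) ** 2 for f, a in zip(forecast, actual)]
print(squared_errors)        # [11.3569, 1.0] -- matches 11.357 and 1.000

# Divide by the number of periods (108 in the full 9-year example).
mse = sum(squared_errors) / len(squared_errors)
print(round(mse, 3))
```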

FYI: the SAP ECC default/initial settings are: alpha = 0.2; beta =
0.1; delta = 0.3 and gamma = 0.3

Hadoop with HANA


1) Using BODS: unstructured data is processed by Apache Spark and then stored and used as a source for BODS via the Hive adapter.
Disadvantage: real time is not possible.
2) Using SDA/SDI:
3) SAP Smart Data Access (SDA) allows SAP HANA to connect and
virtually access data remotely without any need for data to move into
SAP HANA. In our case, the tables are consumed as virtual tables
and the SQL query is run directly in Hadoop.
4) SAP HANA Smart Data Integration (SDI) is the integrated component
of SAP HANA which allows for seamless integration with external
systems (here Hadoop) without the need for any separate,
heterogeneous, non-native tier between the source and SAP HANA.
Smart Data Integration is supported from SAP HANA SPS09 and
further enhanced in SAP HANA SPS10 making it one of the ideal
solutions for bringing real-time data from external systems. In our
case, SDI facilitates real-time replication of data from Hadoop to SAP
HANA. The data can either be pulled on demand when a query executes, or be
pushed automatically into SAP HANA when the data in Hadoop changes or gets
updated.
As of the latest version, SAP HANA supports Hive connector (using
JDBC), HDFS (using File adapter), SQL on top of Spark (using SAP
HANA Spark controller) and direct Hadoop (using ODBC).

Hive is an SQL-like interface used to query Hadoop.
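
As a sketch of how a Hive-backed virtual table might then be consumed, the hdbcli Python driver can query it like any other HANA table. The host, credentials, schema and table names below are placeholders, and the virtual table is assumed to have already been created against a Hive remote source.

```python
# Minimal sketch: query a Hive-backed SDA virtual table through SAP HANA using
# the hdbcli Python driver. Host, port, credentials, schema and table names
# are placeholders; the virtual table is assumed to already exist.
from hdbcli import dbapi

conn = dbapi.connect(
    address="hana.example.com",   # placeholder HANA host
    port=30015,                   # placeholder SQL port
    user="REPORT_USER",
    password="secret",
)
try:
    cur = conn.cursor()
    # The query is federated: HANA pushes it down to Hadoop/Hive via SDA.
    cur.execute(
        'SELECT DEVICE_ID, COUNT(*) FROM "BI"."V_DEVICE_EVENTS" '
        "GROUP BY DEVICE_ID ORDER BY 2 DESC LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```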

IoT: The Intel Retail Sensor Platform (Intel RSP) collects data from sensors in the store and from RFID tags on products. The RSP then feeds into the SAP Dynamic Edge Processing server, which monitors store data for critical events, derives insights based on business context for the store to take action, and stays in sync with the cloud.
Remote synchronization is useful for large datasets and trend analysis, but less useful for real-time actionable analytics at the store, which is supported by lightweight SAP products closer to the store:
1) Transaction Availability for Remote Sites (TARS), which allows ERP transactions even when the connection to the IT data center is slow
2) SAP SQL Anywhere for server, desktop, mobile and remote office
3) SAP Streaming Lite (SDS)
4) SAP HANA Remote Sync
Benefits:
1) Superior customer service
2) Detect item shrink more quickly and accurately than cycle counting
3) Drive real-time requests to restock shelves
4) Optimize checkout times

Machine Learning:
The newest frontier in machine learning involves demand forecasting and
related functions. Introduced three years ago, demand forecasting that
uses machine learning can link assortment, space, price and fulfillment into
a single plan, factoring in time of year, weather, and information on
competitive products, sell-through, customer traffic and demographics.
This kind of demand forecasting allows online and offline retailers to generate more
precise forecasts than traditional time-series approaches. This yields more
accurate inventory levels in stores, online and in warehouses. In-stock
positioning improves, markdowns are fewer and ROI is better.

The time series strategy, in contrast, employs just a handful of demand


factors (e.g., trend, seasonality and cycle) and is restricted to demand
history. Demand is analyzed only for a certain product, SKU, category,
demographic market or channel. The process uses single-dimension
algorithms, each of which analyzes demand based on data-limited
constraints. Hence, data must be manually cleansed and separated, making
the process longer and costlier.
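
To illustrate the contrast, here is a minimal feature-based forecasting sketch in Python with scikit-learn on a hypothetical weekly dataset (week of year, temperature, price, promotion). It is not any vendor's actual model; it only shows how external drivers can be folded into a single learned forecast rather than relying on demand history alone.

```python
# Minimal sketch of a feature-based (machine-learning) demand forecast on a
# hypothetical weekly dataset with seasonal, weather, price and promotion
# effects; meant only to contrast with a history-only time-series fit.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "week_of_year": rng.integers(1, 53, n),
    "avg_temp_c":   rng.normal(15, 10, n),
    "price":        rng.uniform(2.0, 4.0, n),
    "on_promo":     rng.integers(0, 2, n),
})
# Hypothetical demand: seasonality + weather + price/promotion effects + noise
df["units"] = (
    200
    + 40 * np.sin(2 * np.pi * df["week_of_year"] / 52)
    + 3 * df["avg_temp_c"]
    - 30 * df["price"]
    + 25 * df["on_promo"]
    + rng.normal(0, 10, n)
)

X, y = df.drop(columns="units"), df["units"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(round(mean_absolute_error(y_te, model.predict(X_te)), 1))
```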

Toolsgroup:
Most demand planning software can factor in seasonality, like the fact
that more ice cream is sold in summer than in winter. But
sometimes seasonality can become so extreme or complex that it is
not well suited to normal regression-analysis-based techniques.
For instance, at one customer, machine learning sifts through SKU-
locations to identify clusters of products with similar
seasonality profiles, recognizing more than 200 micro-climates
and their seasonal timing variations.
Social Sensing:
Weather and Macroeconomics

Finding the tables used by a transaction code


You can easily find all the tables accessed by a transaction code via SE49.

Supply the transaction code and click the Display button, and all the tables accessed by that
particular transaction code will appear.

Finding fields with Table Name

I'm working as a junior ABAPer, and I frequently encounter this situation: I'm given a
report either with fields from a structure or with no fields at all. I have the following doubts:

1) Whose responsibility is it to find out the fields, complete with table names:
the functional consultant's or the ABAPer's?
2) Is there some standard way of finding out these fields?

You can go through the tables in the Data Dictionary.


The tables which store information about structures and tables are as follows:

DD02L - table properties

DD02T - table texts

DD03L - field properties

DD03T - field texts


How to find data related to a structure? With GREAT difficulty, but here are some
ways (which will NOT always work! LOL) to find specific fields if you know the
structure's fieldname.

1) First, go to SE11 and enter your structure name. Go to the field that holds your data,
and double-click on the element name. Once inside the element, do a where-used list
for that element, searching tables only. Then go into each table and see if you can
find the one holding your data. There may sometimes be a huge number of tables
displayed, but a lot of them will be empty.

2) Get the name of the program behind your transaction, go to SE80, and enter the
program name. 99% of the time it will be part of a module pool and bring up the pool.
Go to the dictionary structures, and search each table there for the one holding your
data.

3) Open a new session with transaction ST05, select SQL Trace, click on the Trace
On button, and go back to your transaction (while leaving the session with ST05
open). Submit your transaction, and go back to the ST05 session. Click on the Trace
Off button, then select the List Trace button. Continue with standard selections, and a
BASIC TRACE LIST will appear. Search through the tables displayed under
ObjectName.
In ECC 6.0, you can get similar functionality through a function module.
Check the package SEST, function group SEA1 and function module
RS_PROGRAM_TABLES.
This FM provides the same result. Just provide the Object Type as 'T' (for a transaction code;
likewise 'P' for a program) and the Object Name as 'XD01' (the transaction code for Customer Master),
then execute. It will return all the tables used by this transaction.
