
Sanjay Natraj

Data Scientist / Hadoop Engineer


Chicago, IL - Email me on Indeed: indeed.com/r/sanjay-natraj/375b258f28552492
Willing to relocate: Anywhere
Authorized to work in the US for any employer

WORK EXPERIENCE

Data Scientist
Data Science - March 2016 to March 2016
Working on enhancements to Pig scripts to include more topics.
Working on the creation of Pig UDFs to process user cookie data.
Working on table interaction from Hive into Pig using HCatalog.
Maintain all the Java code created for Pig UDFs in a Git repository.
Use Maven to build, compile, and deploy code.
Created UDFs in Python for Hive queries.
Developed a POC in Spark to query the dataset.
Worked on data ingestion from Oracle to HDFS using Sqoop.
Designed the schema for HBase.
Implemented a Java API to fetch data from HBase based on the row key design.
Wrote Scala code to perform data analysis using RDDs.
Worked on complex data structures (arrays, structs, arrays of structs, maps, arrays of maps) in Hive.

Data Scientist
Data Science, EMC Corporation - Hopkinton, MA - June 2012 to August 2015
Worked with the CTO office on designing an analytics-driven recommendation engine for the healthcare industry.
Experience with Apache Hadoop components and related tools such as HDFS, MapReduce, HiveQL, HBase, Pig,
Sqoop, Oozie, Mahout, Cassandra, and MongoDB, and with Big Data and Big Data analytics.

EDUCATION

Custom UDFs
Secondary Namenode

ADDITIONAL INFORMATION
SKILLS
Languages: Java, J2EE, HTML5, CSS3, JavaScript, C, C++, R
Big Data Tools:
Hadoop distributions, MapReduce, YARN, Hive, HBase, Sqoop, Pig, Oozie, ZooKeeper, R,
RStudio, R Commander, MATLAB, Git, Sublime, SVN, JBoss Drools, Tomcat, ETL, Hadoop, Spark,
MapReduce, Pig, Hive, NoSQL, HDFS, Elasticsearch, TensorFlow.
IDEs / Tools / Frameworks: NetBeans, Eclipse, PuTTY, Cygwin, Git, Maven, JIRA, Jenkins, SoapUI
Databases: Oracle 9i/10g, SQL Server, MySQL

PROJECTS
Handwritten digit classification using an artificial neural network and logistic regression (Python)
Implemented a multilayer perceptron neural network using feedforward and backpropagation units and
evaluated its performance in classifying digits on a sufficiently large dataset obtained from MNIST.
Achieved 98% accuracy by tuning the regularization on the weights and the hyperparameters of
the neural network.
Also developed a multi-class logistic regression model for handwritten digit classification using
the same dataset. It reported an accuracy of 92.4%.
Convolutional Neural Network (CNN) in NLP using Google's TensorFlow (Python)
Implemented a CNN and tested its performance in classifying movie reviews on datasets acquired from
Rotten Tomatoes. Achieved an accuracy of 74%, with loss below 10%, by tuning model hyperparameters.
This can be improved by increasing training epochs and batch sizes over a very large dataset.
Time series forecasting of stocks using datasets from NASDAQ (R)
Computed time series forecasts of the data at the Center for Computational Research (CCR) using a linear
regression model, a Holt-Winters model, and an ARIMA model.
Evaluated the mean absolute error (MAE) of the three models and calculated the stocks with the minimum
price for all three models.
Comparative analysis of Pig, Hive, and MapReduce in stock volatility calculation using NASDAQ data (SQL, Java)
Implemented the business logic for calculating the least volatile stocks using Pig, Hive, and Java MapReduce.
Scaled the Pig, Hive, and MapReduce jobs incrementally over core sizes ranging from 1 to 48.
Compared the performance of Pig and Hive against MapReduce on markers such as
code complexity, running time, and ease of use.
Storage and analysis of Twitter data using Accumulo (Python, Java)
Acquired Twitter data for 30 NBA teams using the OAuth API and stored it in Accumulo.
Ran a MapReduce job on top of it to determine the popularity count of NBA teams in the data sample.
