Email: andrescastaneda535@gmail.com
Phone: 773-598-4686
PROFESSIONAL SUMMARY
Around 8 years of experience in IT as a Data Scientist with strong technical expertise, business experience, and
communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
Extensive experience in Retail Analytics for consumer goods and buyer behavior, developing Data
Mining, Analytics, and Machine Learning solutions to various business problems and generating data visualizations
using Python.
Expertise in transforming business requirements into analytical models, designing algorithms, building models,
developing data mining and reporting solutions that scale across a massive volume of structured and
unstructured data.
Experience in designing visualizations in Python and in publishing and presenting
dashboards and storylines on desktop platforms.
Hands-on experience in implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear
and Logistic Regression, SVM, Clustering, and Principal Component Analysis, with good knowledge of Recommender
Systems.
Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random
Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/ Predictive Analytics, Segmentation
methodologies, Regression-based models, Hypothesis testing, Factor analysis/ PCA, Ensembles.
Extracted data from various database sources such as Oracle and SQL Server.
Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and
dimensional database environments.
Hands-on experience in statistics to draw meaningful insights from data. Good at communication and
storytelling with data.
Utilize analytical applications/libraries like Pandas, NumPy, Scikit-Learn, Seaborn and Matplotlib to identify
trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical
findings into marketing strategies that drive value.
Hands-on experience with Databricks and PySpark utilities for classification, regression, clustering, and
dimensionality reduction.
Strong knowledge of data governance and experienced in input data analysis for deviations in raw data
processing.
Solid team player, team builder, and an excellent communicator.
Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data,
using a broad range of data science programming languages and big data tools including Python, Spark, SQL,
Scikit-Learn, Hadoop, RDD, RDBMS.
Technical proficiency in designing, cleaning, and preparing data, modeling, and building solutions for Data
Warehouse/Business Intelligence applications.
Experience working on both Windows and Linux platforms.
Flexible with Unix/Linux and Windows environments, working with operating systems like Ubuntu 13/14.
EDUCATION
Bachelor of Science, UVM Laureate University, Mexico (GPA 9.28/10.0)
TECHNICAL SKILLS
Programming: Python
Databases: Oracle, MongoDB, MS Access
Visualization: Seaborn, Bokeh, Matplotlib, GGplot
Software: MS Office (MS Excel, MS PowerPoint, MS Access)
Data Mining: Data reduction, Clustering, Classification, Anomaly detection, Text mining
Big Data Ecosystem: Hadoop, Spark
Machine Learning: Linear/Logistic regression, RFC, KNN, K-Means, Dimensionality reduction algorithms
BI Tools: Power BI, Tableau
PROFESSIONAL EXPERIENCE
Clients: Nestle, P&G, Coca-Cola, Pfizer, Bayer, PepsiCo, Hershey’s, Kellogg’s, Bimbo, Mars, and General Mills.
Data Scientist
Developed global data methods for advanced analytics across the Food, Beverages, Homecare, and Pharma categories.
Built data coding models.
Designed methods to process big data in production lines globally.
Designed category standards, normalized product coding, and innovative market segmentations for the consumer
goods industry across the globe.
Key Responsibilities:
Performed data parsing and data profiling from large volumes of varied data to learn about behavior with various
features based on transactional data, call center history data and customer personal profile, etc.
Processed the primary quantitative and qualitative market research and loaded the survey responses into the
database in preparation for data exploration
Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for
duplication, completeness, accuracy, and validity
Worked on data cleaning and ensured data quality, consistency, and integrity using NumPy and SFrame in Python
Developed solutions for market analysis covering product association, Share of Market, Neuroscience, and Consumer
Behavior
Applied Principal Component Analysis method in feature engineering to analyze high dimensional data
Applied various machine learning algorithms and statistical models (decision tree, lasso regression,
multivariate regression) to identify key features using the scikit-learn package in Python
Evaluated models using k-fold cross-validation and the log loss function
Ensured that the model had a low false positive rate and validated it by interpreting the ROC plot
Built repeatable processes in support of implementation of new features and other initiatives
Communicated and presented the results to the product development team to drive better decisions
Environment: Python 3.6, PySpark, Tableau, NumPy, Scikit-Learn, Seaborn, Matplotlib, FuzzyWuzzy, Bokeh