
ASSIGNMENT

Presented By
JIKKU VARUGHESE KORUTH
S3, MBA - Daksha
1. Applications of Big Data in Management:-
Big data has been a game changer for most industries over the last few years. According to
Wikibon, worldwide Big Data market revenues for software and services are projected to
increase from $42B in 2018 to $103B in 2027, a Compound Annual Growth Rate (CAGR) of
10.48%. The primary goal of Big Data applications is to help companies make more informed
business decisions by analyzing large volumes of data. Such data could include web server logs,
Internet clickstream data, social media content and activity reports, text from customer emails,
mobile phone call details and machine data captured by multiple sensors.

Organisations from different domains are investing in Big Data applications to examine large
data sets and uncover hidden patterns, unknown correlations, market trends, customer
preferences and other useful business information.

 Big Data Applications in Healthcare


 Big Data Applications in Manufacturing

 Big Data Applications in Media & Entertainment

 Big Data Applications in IoT

 Big Data Applications in Government

Healthcare

Technological innovations such as Big Data have greatly influenced the operations in the
healthcare industry. Analysis of big data has boosted the overall efficiency of healthcare service
delivery. Both physicians and patients can now track the progress of a patient’s treatment by
reviewing the patient’s history.

Healthcare facilities that have adopted Big Data oriented medical devices can now monitor the
behaviour of their current and previous clients in order to improve the delivery of the services
their customers need. For instance, through a smartwatch worn on the patient’s wrist, a physician
can easily analyze collected data about the patient’s heartbeat and temperature and prescribe
a medicine without visiting the patient. The technology behind nanobots, and how they can
manage to deliver oxygen to tissues or fight harmful germs in the body, is outstanding.

Manufacturing

Predictive manufacturing provides near-zero downtime and transparency. It requires an
enormous amount of data and advanced prediction tools to systematically turn that data into
useful information.

Major benefits of using Big Data applications in the manufacturing industry are:

 Product quality and defects tracking


 Supply planning

 Manufacturing process defect tracking

 Output forecasting

 Increasing energy efficiency

 Testing and simulation of new manufacturing processes

 Support for mass-customization of manufacturing

Media & Entertainment

Companies across the media and entertainment industry are facing new business models for the
way they create, market and distribute their content. This is driven by consumers’ demand to
search for and access content anywhere, any time, on any device.

Big Data provides actionable points of information about millions of individuals. Publishing
environments are now tailoring advertisements and content to appeal to consumers. These
insights are gathered through various data-mining activities. Big Data applications benefit the
media and entertainment industry by:
 Predicting what the audience wants
 Scheduling optimization

 Increasing acquisition and retention

 Ad targeting

Internet of Things (IoT)


Data extracted from devices provides a mapping of device inter-connectivity. Such mappings
have been used by various companies and governments to increase efficiency. IoT is also
increasingly adopted as a means of gathering sensory data, and this sensory data is used in
medical and manufacturing contexts.

Government
The use and adoption of Big Data within governmental processes allows efficiencies in terms of
cost, productivity, and innovation. In government use cases, the same data sets are often applied
across multiple applications, which requires multiple departments to work in collaboration.

Since government is active in virtually every domain, it plays an important role in driving Big
Data applications in each of them. Some of the major areas are:

Cyber security & Intelligence

The federal government launched a cyber security research and development plan that relies on
the ability to analyze large data sets in order to improve the security of U.S. computer networks.

The National Geospatial-Intelligence Agency is creating a “Map of the World” that can gather
and analyze data from a wide variety of sources such as satellite and social media data. It
contains a variety of data from classified, unclassified, and top-secret networks.

Crime Prediction and Prevention

Police departments can leverage advanced, real-time analytics to provide actionable intelligence
that can be used to understand criminal behaviour, identify crime/incident patterns, and uncover
location-based threats.
Pharmaceutical Drug Evaluation

According to a McKinsey report, Big Data technologies could reduce research and development
costs for pharmaceutical makers by $40 billion to $70 billion. The FDA and NIH use Big Data
technologies to access large amounts of data to evaluate drugs and treatments.

Scientific Research

The National Science Foundation has initiated a long-term plan to:

 Implement new methods for deriving knowledge from data


 Develop new approaches to education

 Create a new infrastructure to “manage, curate, and serve data to communities”.

Weather Forecasting

The NOAA (National Oceanic and Atmospheric Administration) gathers data every minute of
every day from land-, sea-, and space-based sensors. Every day, NOAA uses Big Data to analyze
and extract value from over 20 terabytes of data.

Tax Compliance

Big Data Applications can be used by tax organizations to analyze both unstructured and
structured data from a variety of sources in order to identify suspicious behavior and multiple
identities. This would help in tax fraud identification.

Traffic Optimization

Big Data helps in aggregating real-time traffic data gathered from road sensors, GPS devices and
video cameras. Potential traffic problems in dense areas can then be prevented by adjusting
public transportation routes in real time.
2. Data Visualization System

Data visualization is a general term that describes any effort to help people understand the
significance of data by placing it in a visual context. Patterns, trends and correlations that might
go undetected in text-based data can be exposed and recognized more easily with data visualization
software. Data visualization is the process of displaying data/information in graphical charts,
figures and bars. It is used as a means to deliver visual reporting to users on the performance,
operations or general statistics of an application, network, hardware or virtually any IT asset.

Importance of data visualization

Data visualization has become the de facto standard for modern business intelligence (BI). The
success of the two leading vendors in the BI space, Tableau and Qlik, both of which heavily
emphasize visualization, has moved other vendors toward a more visual approach in their
software. Virtually all BI software has strong data visualization functionality.

Data visualization tools have been important in democratizing data and analytics and making
data-driven insights available to workers throughout an organization. They are typically easier to
operate than traditional statistical analysis software or earlier versions of BI software. This has
led to a rise in lines of business implementing data visualization tools on their own, without
support from IT.

Data visualization software also plays an important role in big data and advanced analytics
projects. As businesses accumulated massive troves of data during the early years of the big data
trend, they needed a way to quickly and easily get an overview of their data. Visualization tools
were a natural fit.
Visualization is central to advanced analytics for similar reasons. When a data scientist is writing
advanced predictive analytics or machine learning algorithms, it becomes important to visualize
the outputs to monitor results and ensure that models are performing as intended. This is because
visualizations of complex algorithms are generally easier to interpret than numerical outputs.

Examples of data visualization

Data visualization tools can be used in a variety of ways. The most common use today is as a BI
reporting tool. Users can set up visualization tools to generate automatic dashboards that track
company performance across key performance indicators and visually interpret the results.

Many business departments implement data visualization software to track their own initiatives.
For example, a marketing team might implement the software to monitor the performance of an
email campaign, tracking metrics like open rate, click-through rate and conversion rate.
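
As a toy illustration of such campaign tracking, the sketch below plots these three metrics with
matplotlib; the figures are made up for the example, not real campaign data.

# Minimal sketch of the email-campaign tracking idea above, using
# matplotlib; the metric values are illustrative, not real data.
import matplotlib.pyplot as plt

metrics = ["Open rate", "Click-through rate", "Conversion rate"]
values = [22.5, 3.8, 1.2]  # hypothetical percentages for one campaign

fig, ax = plt.subplots()
ax.bar(metrics, values)
ax.set_ylabel("Rate (%)")
ax.set_title("Email campaign performance (illustrative data)")
for i, v in enumerate(values):
    ax.text(i, v + 0.3, f"{v}%", ha="center")  # label each bar with its value
plt.show()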

As data visualization vendors extend the functionality of these tools, they are increasingly being
used as front ends for more sophisticated big data environments. In this setting, data visualization
software helps data engineers and scientists keep track of data sources and do basic exploratory
analysis of data sets prior to or after more detailed advanced analyses.

3. Supply Chain Analytics


Supply Chain Analytics is helping to improve operational efficiency and effectiveness by
enabling data-driven decisions at strategic, operational and tactical levels. Supply chain analytics
is the application of mathematics, statistics, predictive modeling and machine-learning
techniques to find meaningful patterns and knowledge in order, shipment, transactional and
sensor data. An important goal of supply chain analytics is to improve forecasting and efficiency
and be more responsive to customer needs. For example, predictive analytics on point-of-sale
terminal data stored in a demand signal repository can help a business anticipate consumer
demand, which in turn can lead to cost-saving adjustments to inventory and faster delivery.
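
As a hedged illustration of this idea, the sketch below computes a naive moving-average demand
forecast from weekly point-of-sale totals using pandas; the sales figures are invented, and real
demand-sensing systems use far richer models.

# Naive demand forecast from point-of-sale data: a 3-week moving average.
# The weekly sales figures are made up for illustration.
import pandas as pd

pos_sales = pd.Series(
    [120, 135, 128, 150, 160, 155, 170, 165],
    index=pd.date_range("2024-01-01", periods=8, freq="W"),
    name="units_sold",
)

# Use the mean of the last 3 weeks as next week's forecast
forecast = pos_sales.rolling(window=3).mean().iloc[-1]
print(f"Forecast for next week: {forecast:.0f} units")
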
Supply chain analytics software

Supply chain analytics software is generally available in two forms: embedded in supply chain
software, or in a separate, dedicated business intelligence and analytics tool that has access to
supply chain data. Most ERP vendors offer supply chain analytics features, as do vendors of
specialized supply chain management software.

Some ERP and SCM vendors have begun applying complex event processing (CEP) to their
platforms for real-time supply chain analytics. Most ERP and SCM vendors have one-to-one
integrations but there is no standard. However, the Supply Chain Operations Reference (SCOR)
model provides standard metrics for comparing supply chain performance to industry
benchmarks. Ideally, supply chain analytics software would be applied to the entire chain, but in
practice, it is often focused on key operational subcomponents, such as demand planning,
manufacturing production, inventory management or transportation management. For example,
supply chain finance analytics can help identify increased capital costs or opportunities to boost
working capital; procure-to-pay analytics can help identify the best suppliers and provide
early warning of budget overruns in certain expense categories; and transportation analytics
software can predict the impact of weather on shipments.

4. Correlation
Correlation is a statistical measure that indicates the extent to which two or more variables
fluctuate together. A positive correlation indicates the extent to which those variables increase or
decrease in parallel; a negative correlation indicates the extent to which one variable increases as
the other decreases.

There are essentially two types of data you can work with when determining correlation:

Univariate data: In a simple setup we work with a single variable. We measure central tendency
to find a representative value of the data, dispersion to measure deviations around the central
tendency, skewness to measure the shape of the distribution, and kurtosis to measure the
concentration of the data at the central position. Data relating to a single variable is called
univariate data.
Bivariate data: It often becomes essential in our analysis to study two variables
simultaneously, for example the height and weight of a person, or age and blood pressure.
Statistical data on two characteristics of an individual, measured simultaneously, are termed
bivariate data.

In fact, there may or may not be any association between these bivariate data. Here the word
‘correlation’ means the extent of association between any pair of bivariate data, recorded on any
individual.
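
To make this concrete, the snippet below computes Pearson’s correlation coefficient for a small
made-up height/weight sample with NumPy; a value near +1 indicates a strong positive association.

# Pearson correlation on illustrative bivariate data (height vs. weight).
import numpy as np

height_cm = np.array([150, 160, 165, 170, 175, 180, 185])
weight_kg = np.array([50, 56, 61, 66, 70, 75, 82])

r = np.corrcoef(height_cm, weight_kg)[0, 1]  # off-diagonal of the 2x2 matrix
print(f"Pearson r = {r:.3f}")  # close to +1 => strong positive correlation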

Now let’s discuss the four types of correlation:

 Positive correlation

 Negative correlation

 Zero correlation

 Spurious correlation

 Positive correlation: If an increase in one of the two variables is accompanied by an
increase in the other, the two are said to be positively correlated.

For example, height and weight of a male or female are positively correlated.

 Negative correlation: If an increase in one of the two variables is accompanied by a
decrease in the other, the two are said to be negatively correlated.

For example, the price and demand of a commodity are negatively correlated. When the
price increases, the demand generally goes down.

 Zero correlation: If there is no clear-cut trend between the two variables, i.e. a change in
one does not guarantee a co-directional change in the other, the two are said to be
non-correlated, or to possess zero correlation.

For example, qualities like affection and kindness are in most cases non-correlated with
academic achievement, just as a person’s intellect is non-correlated with complexion.

 Spurious correlation: If the correlation is due to the influence of any other ‘third’ variable,
the data is said to be spuriously correlated.

For example, childhood clumsiness and “body control problems” have been reported as
being associated with adult obesity. One could argue that uncontrolled and clumsy
kids participate less in sports and outdoor activities, and that level of activity is the
‘third’ variable here. Most of the time it is difficult to identify the ‘third’ variable, and even
when that is achieved, it is even more difficult to gauge the extent of its influence on the
two primary variables.

5. Types of Decision Making Environment


Decisions are taken in different types of environment, and the type of environment also
influences the way a decision is made.

There are three types of environment in which decisions are made.

1. Certainty:

In this type of decision making environment, there is only one event that can take place.
Complete certainty is very difficult to find in most business decisions. However, in many
routine types of decisions, almost complete certainty can be observed. These decisions are
generally of very little significance to the success of the business.

2. Uncertainty:

In the environment of uncertainty, more than one event can take place and the decision
maker is completely in the dark regarding which event is likely to occur. The decision maker is
not in a position even to assign probabilities to the happening of the events.
Such situations generally arise in cases where the happening of the event is determined by
external factors. For example, demand for the product, moves of competitors, etc. are factors
that involve uncertainty.

3. Risk:

Under conditions of risk, more than one possible event can take place. However, the decision
maker has adequate information to assign a probability to the happening or non-happening of
each possible event. Such information is generally based on past experience.
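
A common way to formalize decisions under risk is the expected monetary value (EMV)
criterion: weight each payoff by its probability and pick the alternative with the highest
expected payoff. The sketch below uses hypothetical payoffs and probabilities.

# Decision under risk via expected monetary value (EMV).
# All probabilities and payoffs below are hypothetical.
probabilities = {"high_demand": 0.5, "medium_demand": 0.3, "low_demand": 0.2}

payoffs = {  # profit for each alternative under each possible event
    "expand_plant": {"high_demand": 100, "medium_demand": 40, "low_demand": -30},
    "status_quo":   {"high_demand": 50,  "medium_demand": 30, "low_demand": 10},
}

for alternative, row in payoffs.items():
    emv = sum(probabilities[event] * payoff for event, payoff in row.items())
    print(f"{alternative}: EMV = {emv:.1f}")
# expand_plant gives EMV 56.0 vs. 36.0 for status_quo, so it would be chosen.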

Virtually every decision in a modern business enterprise is based on the interplay of a number of
factors. New tools for analyzing such decision-making situations are being developed. These
tools include risk analysis, decision trees and preference theory.

Modern information systems help in using these techniques for decision making under conditions
of uncertainty and risk.

6. Cluster Analysis

Cluster analysis is a class of techniques used to classify objects or cases into relatively
homogeneous groups called clusters. Cluster analysis is also called classification analysis or numerical
taxonomy. In cluster analysis, there is no prior information about the group or cluster
membership for any of the objects.

Cluster Analysis has been used in marketing for various purposes, such as segmenting
consumers on the basis of the benefits sought from the purchase of a product. It can be used to
identify homogeneous groups of buyers.

Cluster analysis involves formulating a problem, selecting a distance measure, selecting a
clustering procedure, deciding the number of clusters, interpreting the cluster profiles and,
finally, assessing the validity of the clustering.

The variables on which the cluster analysis is to be done should be selected with past
research in mind. Selection should also be guided by theory, the hypotheses being tested, and the
judgment of the researcher. An appropriate measure of distance or similarity should be chosen;
the most commonly used measure is the Euclidean distance or its square.
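
For concreteness, the snippet below computes the Euclidean distance (and its square) between
two illustrative customer profiles that are already on comparable scales.

# Euclidean distance between two made-up observation vectors.
import numpy as np

customer_a = np.array([2.0, 5.0, 1.0])  # e.g., visits, spend score, returns
customer_b = np.array([3.0, 4.0, 2.0])

distance = np.linalg.norm(customer_a - customer_b)  # Euclidean distance
squared = np.sum((customer_a - customer_b) ** 2)    # its square
print(distance, squared)  # sqrt(3) ≈ 1.732 and 3.0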

Clustering procedures in cluster analysis may be hierarchical, non-hierarchical, or a two-step
procedure. A hierarchical procedure in cluster analysis is characterized by the development of a
tree-like structure. A hierarchical procedure can be agglomerative or divisive. Agglomerative
methods in cluster analysis consist of linkage methods, variance methods, and centroid methods.
Linkage methods in cluster analysis are comprised of single linkage, complete linkage, and
average linkage.

The non-hierarchical methods in cluster analysis are frequently referred to as K-means clustering.
The two-step procedure can automatically determine the optimal number of clusters by
comparing the values of model choice criteria across different clustering solutions. The choice
of clustering procedure and the choice of distance measure are interrelated. The relative sizes of
clusters in cluster analysis should be meaningful. The clusters should be interpreted in terms of
cluster centroids.
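
As a minimal illustration of K-means clustering, the sketch below uses scikit-learn on a tiny
made-up data set; in practice the number of clusters would be chosen more carefully (for
example with an elbow plot or silhouette scores).

# K-means on illustrative 2-D data with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [2, 3],    # one loose group of points
              [8, 8], [9, 10], [8, 9]])  # another loose group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Labels:   ", kmeans.labels_)          # cluster membership per point
print("Centroids:", kmeans.cluster_centers_)  # the two cluster centers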

Clustering is a fundamental modelling technique that is all about grouping. The steps involved
in clustering are broadly valid for all techniques.

Here are the steps for Cluster Analysis:

1. Choose the Right Variables – This step involves identifying which attributes are relevant and
how much each is worth. One must select variables that may be important for identifying and
understanding differences among groups of observations within the data.

2. Scaling the Data – Data samples from different sources may be recorded on different scales.
For example, in personal data, age may range from 0 to 100, weight from 40 to 180, and height
from 1 to 6 feet. When the variables in the analysis vary in range, the variable with the largest
range will have the greatest impact on the results, so the data must be scaled.

3. Calculate Distances – Once the variables are on comparable scales, calculate the distances
between observations; these distances drive the grouping.

A point to note is that each attribute may be on a different scale. Before forming any distance
equation, normalization must be considered so that all attributes and variables are brought to a
common scale. For example, if we analyze weather data sampled from India and the US, the
scales differ because one uses the metric system and the other uses US customary units. Our
objective is to bring them to the same standard, since the basic purpose of cluster analysis is to
calculate distances.
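
A simple way to put such variables on the same standard is z-score standardization, sketched
below on made-up age/weight/height records.

# Z-score standardization so no variable dominates the distance calculation.
import numpy as np

data = np.array([       # columns: age (years), weight (kg), height (feet)
    [25, 60, 5.4],
    [40, 80, 5.9],
    [60, 70, 5.2],
])

standardized = (data - data.mean(axis=0)) / data.std(axis=0)
print(standardized)  # each column now has mean 0 and unit variance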

Calculation of Distance between Points in a Cluster

Here, one objective can be to group similar points together into one cluster.

1) One way is to take the center of one cluster, find the center of the next group, and
calculate the distance between the centers.

2) Or take the closest points of the two groups and find the distance between them.

3) Or take the most distant points and find the distance between them.

Single linkage – tends to produce elongated clusters; it uses the shortest distance between a point
in one cluster and a point in the other cluster.

Complete linkage – the longest distance between a point in one cluster and a point in the other
cluster.

Average linkage – the average distance between each point in one cluster and each point in the
other cluster.

Centroid – the distance between the centroids (mean vectors over the variables) of the two
clusters.

Ward – combines the clusters that lead to the smallest within-cluster distance, measured as the
sum of squares over all variables.
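
The linkage choices above can be tried directly with SciPy’s hierarchical clustering; the sketch
below uses a small made-up data set, and changing the method argument switches between
single, complete, average, centroid, and Ward linkage.

# Hierarchical clustering with different linkage methods (SciPy).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [2, 2], [8, 8], [9, 9], [5, 5]])

Z = linkage(X, method="ward")  # also: "single", "complete", "average", "centroid"
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)  # cluster label assigned to each of the 5 points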
