This document discusses data mining, data warehousing, and business intelligence. It then summarizes the visualization of various retail laptop data metrics using Tableau. Specifically, it analyzes:
1) Retail price versus sales volume, finding that the most common prices fall between $450 and $500.
2) A direct correlation between hard disk size and price, with 80GB and 120GB having similar median prices, so the two categories can be combined.
3) No relationship between retail price and customer-store distance.
4) The highest- and lowest-selling store postcodes, with no outliers in sales volume across stores.
5) An inverse correlation between customer-store distance and sales volume, a trend that strengthens once outliers are removed.
Data mining: The process of applying querying and sorting techniques to large volumes of data in search of patterns. Businesses often use data mining to understand supply and demand patterns.
Data warehousing: The design of how data is stored across databases, and how those databases are linked, in order to support data mining.
Business intelligence: The process of converting large volumes of available data into actionable knowledge.
Relationship: Efficient data warehousing aids data mining, and both data mining and data warehousing serve as business intelligence tools.
PART 2 Visualization using Tableau:
1. Retail price vs. Sales volume
The price of a laptop ranges between $300 and $650. The most common price range is $450 to $500, indicated by two spikes in the center of the distribution.
2. Retail price vs. Hard disk size
We see a direct correlation between hard disk size and retail price, i.e., retail price increases as hard disk size increases. The relationship is weak, however, between the 80GB and 120GB hard disks: the median price for both sizes is $480. Hence, if we have to reduce the hard disk categories from four to three, we can combine the 80GB and 120GB hard disks into one category. Since the price points of the two categories are similar (median $480), combining them should not materially affect the sales figures, and it also helps maintain the profit margin.
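The category comparison above can be sketched in a few lines of Python. This is only an illustration: the rows below are made-up values chosen so that the 80GB and 120GB medians match the $480 reported above, the 40GB and 300GB sizes are placeholders (the report does not name the other two categories), and `median_price_by_group` is a hypothetical helper, not part of the Tableau workflow.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (hard_disk_gb, retail_price) rows, made up for illustration only.
# The 40GB and 300GB sizes are placeholders, not categories from the report.
rows = [(40, 400), (40, 420),
        (80, 470), (80, 480), (80, 490),
        (120, 475), (120, 480), (120, 495),
        (300, 560), (300, 600)]

def median_price_by_group(rows, merge=None):
    """Return {group label: median price}; `merge` maps a size to a shared label."""
    merge = merge or {}
    groups = defaultdict(list)
    for size, price in rows:
        groups[merge.get(size, f"{size}GB")].append(price)
    return {label: median(prices) for label, prices in groups.items()}

# Four separate categories, then 80GB and 120GB merged into one.
four = median_price_by_group(rows)
three = median_price_by_group(rows, merge={80: "80-120GB", 120: "80-120GB"})
print(four)
print(three)
```

If the medians of the two original groups agree, the merged group keeps roughly the same median, which is the argument for combining them.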
3. Retail price vs. Customer-Store distance
The graph above is a scatter plot of retail price against customer-store distance. No relationship is apparent between the two variables, so customer-store distance cannot be used as a predictor of retail price.
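A visual impression of "no relationship" can be backed by a correlation coefficient. The sketch below computes Pearson's r by hand; the distance and price values are made up for illustration and are not the report's data, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical customer-store distances and retail prices (illustration only).
distance = [1, 2, 3, 4, 5, 6]
price = [450, 500, 470, 470, 500, 450]

r = pearson_r(distance, price)
print(round(r, 3))
```

A value of r close to zero is consistent with distance carrying no linear information about price; it does not rule out a non-linear pattern, so the scatter plot remains the primary evidence.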
Scatter plot showing outliers
The scatter plot above highlights the outliers in customer-store distance.
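The report does not say how these outliers were identified. One common convention, assumed here purely for illustration, is the 1.5 × IQR rule used by standard box plots. The distances below are made-up values, and `iqr_outliers` is a hypothetical helper.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical customer-store distances (illustration only).
distances = [2, 3, 3, 4, 4, 5, 5, 6, 25]

outliers = iqr_outliers(distances)
print(outliers)
```

Different tools compute quartiles slightly differently, so borderline points may be flagged by one tool and not another.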
4. Sales volume vs. Store postcode
According to the bar chart above, the store that sells the most is SW1P 3AU and the store that sells the least is S1P 3AU.
Box plots for sales record of each store
From the box plot, we find no outliers in the sales volume of any store, since none of the values fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR (Q1: lower quartile, Q3: upper quartile, IQR: inter-quartile range).
5. Sales volume vs. Avg. customer-store distance (with outliers)
The scatter plot above suggests an inverse correlation between customer-store distance and sales volume: stores whose customers travel farther on average tend to sell less.
Sales volume vs. Avg. customer-store distance (without outliers)
The outliers are SW18 1NN, E7 8NW, CR7 8LE and KT2 5AU. Removing them reduces the p-value of the trend line to less than 0.0001.
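The effect of removing outliers on a trend line can be sketched as follows. The store labels and values are made up for illustration (they are not the report's postcodes or figures), and `slope_and_t` is a hypothetical helper. Instead of the p-value itself, the sketch reports the t-statistic of the slope; for a fixed number of points, a larger absolute t corresponds to a smaller p-value.

```python
from math import sqrt

def slope_and_t(points):
    """Least-squares slope and its t-statistic for a list of (x, y) pairs.

    Assumes at least three points that are not perfectly collinear.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    r = sxy / sqrt(sxx * syy)
    t = r * sqrt((n - 2) / (1 - r * r))  # t-statistic of the slope
    return sxy / sxx, t

# Hypothetical stores: label -> (avg. customer-store distance, sales volume).
stores = {"A": (1, 60), "B": (2, 55), "C": (3, 50), "D": (4, 44),
          "E": (5, 41), "F": (6, 35), "G": (2, 20), "H": (5, 70)}
outlier_labels = {"G", "H"}  # assumed to have been flagged beforehand

all_points = list(stores.values())
clean_points = [p for label, p in stores.items() if label not in outlier_labels]

slope_all, t_all = slope_and_t(all_points)
slope_clean, t_clean = slope_and_t(clean_points)
print(slope_all, t_all)
print(slope_clean, t_clean)
```

In this made-up example the slope is negative in both fits, but the absolute t-statistic grows sharply once the two outlying stores are dropped, which mirrors the p-value improvement described above.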