
TASK1: Create a student ARFF file and explore it in explorer application in WEKA GUI chooser.

AIM: To create a student ARFF file and explore it in the Explorer application of the WEKA GUI Chooser.
DESCRIPTION: We create a student relation with the attributes sid, name, branch, gender, m1, m2, m3, total, avg and class for the analysis of the student relation.

PROCEDURE:
Step1: Open Notepad, i.e. Start -> Run -> type notepad.
Step2: Type the following program in Notepad:

@relation student
@attribute sid numeric
@attribute name string
@attribute branch string
@attribute gender {male, female}
@attribute m1 numeric
@attribute m2 numeric
@attribute m3 numeric
@attribute total numeric
@attribute avg numeric
@attribute class {first, second, third, fail}
@data
101,nari,cse,male,90,90,80,170,60,first
102,sudha,cse,male,50,60,70,170,60,first
103,swathi,cse,female,70,60,70,170,60,first
105,chiru,cse,male,30,60,70,170,60,third
106,anadh,cse,male,70,60,70,170,60,first
108,asma,cse,female,42,60,70,170,60,second
107,thabasum,cse,female,56,75,80,160,55,first
109,laxmi,cse,female,47,34,70,160,70,first

Step3: Save the file as student.arff.
Step4: Open the WEKA GUI Chooser, i.e. Start -> Programs -> Weka 3.7.7 -> weka3.7.exe. The WEKA GUI Chooser window is displayed.

Step5: Select Explorer; the WEKA Explorer will then launch.
Step6: Load the training data set. We can load the dataset into WEKA by clicking the Open file button in the Preprocess interface and selecting the appropriate file, i.e. student.arff.
Step7: Once the data is loaded, WEKA recognizes the attributes and, during the scan of the data, computes some basic statistics on each attribute. The left panel shows the list of recognized attributes, while the top panel indicates the name of the base relation (table) and the current working relation.
Step8: Clicking on an attribute in the left panel shows the basic statistics for that attribute. For categorical attributes the frequency of each attribute value is shown, while for continuous attributes we obtain the minimum, maximum, mean and standard deviation.

Step9: The visualization in the bottom-right panel is shown in the form of a cross-tabulation across two attributes. Note: we can select another attribute using the dropdown list.
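The ARFF layout from Step2 and the per-attribute statistics from Step8 can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_arff` is a hypothetical helper (not part of WEKA), and the rows are copied from the listing in Step2 without checking the total and avg columns.

```python
from collections import Counter
from statistics import mean

def build_arff(relation, attributes, rows):
    """Return ARFF text; a nominal attribute is given as a list of its values."""
    lines = [f"@relation {relation}"]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ", ".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines.append("@data")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"row {row} does not have {len(attributes)} fields")
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"

attributes = [
    ("sid", "numeric"), ("name", "string"), ("branch", "string"),
    ("gender", ["male", "female"]),
    ("m1", "numeric"), ("m2", "numeric"), ("m3", "numeric"),
    ("total", "numeric"), ("avg", "numeric"),
    ("class", ["first", "second", "third", "fail"]),
]
rows = [
    (101, "nari", "cse", "male", 90, 90, 80, 170, 60, "first"),
    (102, "sudha", "cse", "male", 50, 60, 70, 170, 60, "first"),
    (103, "swathi", "cse", "female", 70, 60, 70, 170, 60, "first"),
    (105, "chiru", "cse", "male", 30, 60, 70, 170, 60, "third"),
    (106, "anadh", "cse", "male", 70, 60, 70, 170, 60, "first"),
    (108, "asma", "cse", "female", 42, 60, 70, 170, 60, "second"),
    (107, "thabasum", "cse", "female", 56, 75, 80, 160, 55, "first"),
    (109, "laxmi", "cse", "female", 47, 34, 70, 160, 70, "first"),
]

arff_text = build_arff("student", attributes, rows)

# Statistics of the kind the Preprocess panel reports for a selected attribute.
m1_values = [row[4] for row in rows]
m1_summary = (min(m1_values), max(m1_values), mean(m1_values))
gender_counts = Counter(row[3] for row in rows)
```

Writing `arff_text` to a file named student.arff would give a file that can be opened in Step6.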

OUTPUT:

CONCLUSION:

TASK2: Create a bankdataset.csv file and explore it in the Explorer application of the WEKA GUI Chooser. Also perform basic preprocessing operations on the data relation, such as removing an attribute and filtering the attributes.

ID  Age  Gender  Region     Income  Married  Children  Car  Save a/c  Current a/c  Mortgage  Pep
1   23   Female  Intercity  20000   No       0         Yes  No        Yes          No        Yes
2   45   Male    Rural      40000   Yes      1         No   Yes       Yes          No        No
3   34   Female  Intercity  15000   Yes      3         Yes  Yes       Yes          No        No
4   22   Male    Suburban   23000   Yes      1         Yes  No        No           Yes       Yes
5   31   Male    Rural      32000   Yes      2         Yes  No        No           Yes       Yes
6   36   Female  Town       25000   No       0         No   Yes       Yes          Yes       No
7   43   Male    Suburban   18000   Yes      3         No   Yes       No           Yes       No
8   27   Male    Rural      21000   Yes      2         Yes  No        Yes          No        No
9   40   Female  Town       42000   Yes      2         No   No        Yes          Yes       Yes
10  25   Male    Intercity  23000   No       0         No   Yes       No           No        Yes
11  26   Female  Suburban   25000   Yes      1         Yes  Yes       Yes          No        No
12  27   Male    Rural      32000   No       0         Yes  Yes       Yes          Yes       Yes
13  29   Male    Town       21000   No       0         No   Yes       No           Yes       Yes
14  30   Female  Intercity  27000   Yes      2         No   No        Yes          Yes       Yes

AIM: The data has to be preprocessed before the bank dataset is mined, i.e. the unique field has to be removed and the numeric data has to be converted to categorical (nominal) type.
PROCEDURE:
Step1: Create the bank_data.csv file in Excel and convert it into .arff format by using weka3.7->tools->arffviewer. Alternatively, open the created bank_data.csv directly with the Open file button in the Explorer window.
Step2: In the bank_data relation the id attribute is a unique attribute, so this attribute has to be removed.
Step3: Click on the Choose button in the Filter panel. This shows a popup window with a list of available filters; in it select weka->filters->unsupervised->attribute->Remove.
Step4: In the resulting dialog box enter the index of the attribute to be filtered out.
Step5: Here enter 1 as the index of the id attribute and make sure that the invertSelection option is set to false, then click OK. The filter box now displays Remove -R 1. Now press the Apply button.
Step6: Save this new relation (bank1.arff) with the Save option available at the top.
Step7: The numeric (continuous) attributes age, income and children have to be discretized, because association rule mining can only be performed on categorical data.

Step8: Now open the new relation named bank1.arff and perform the discretize operation.
Step9: Click the Choose option; it shows a popup window. In it select weka->filters->unsupervised->attribute->Discretize and click OK. Now click the text box, change the attributeIndices parameter to 6, keep the remaining parameters at their defaults, click OK and then click the Apply button.
Step10: Again choose the discretize operation, enter the attribute index as 1 for the age attribute and the number of bins as 3, click OK and click the Apply button.
Step11: Perform the same operation for the income attribute.
Step12: Change the range formats to the desired format.
Step13: Selecting or filtering attributes.
REMOVING AN ATTRIBUTE: When we need to remove an attribute, we can do this by using the attribute filters in WEKA. In the Filter panel, click on the Choose button. This shows a popup window with a list of available filters. Scroll down the list and select the weka.filters.unsupervised.attribute.Remove filter.
Step14:
a) Next click the text box immediately to the right of the Choose button. In the resulting dialog box enter the index of the attribute to be filtered out.
b) Make sure that the invertSelection option is set to false. Then click OK. In the filter box you will now see Remove -R 7.
c) Click the Apply button to apply the filter to this data. This removes the attribute and creates a new working relation.
d) Save the new working relation as an ARFF file by clicking the Save button on the top (button) panel (student.arff).
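The effect of the Remove filter can be sketched outside WEKA as dropping a column by its 1-based index. The helper name `remove_attributes` and its `invert` parameter are illustrative assumptions that mirror the dialog fields described above; this is not WEKA code.

```python
def remove_attributes(header, rows, indices, invert=False):
    """Drop the columns with the given 1-based indices (or keep only them if invert is True)."""
    drop = {i - 1 for i in indices}
    if invert:
        drop = set(range(len(header))) - drop
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows

header = ["ID", "Age", "Gender", "Region", "Income", "Married",
          "Children", "Car", "Save a/c", "Current a/c", "Mortgage", "Pep"]
rows = [
    [1, 23, "Female", "Intercity", 20000, "No", 0, "Yes", "No", "Yes", "No", "Yes"],
    [2, 45, "Male", "Rural", 40000, "Yes", 1, "No", "Yes", "Yes", "No", "No"],
]

# Equivalent in spirit to applying Remove -R 1 (drop the unique ID attribute).
new_header, new_rows = remove_attributes(header, rows, [1])
```

With `invert=True` the same call would keep only the ID column, which is why the procedure insists that the invert option stays false.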

DISCRETIZATION:
Sometimes association rule mining can only be performed on categorical data. This requires performing discretization on numeric or continuous attributes. In the following example let us discretize the age attribute by dividing its values into three bins (intervals).
1) First load the dataset into WEKA (student.arff).
2) Select the age attribute.
3) Activate the filter dialog box and select weka.filters.unsupervised.attribute.Discretize from the list.
4) To change the defaults for the filter, click on the box immediately to the right of the Choose button.
5) Enter the index of the attribute to be discretized. In this case the attribute is age, so we must enter 1, corresponding to the age attribute.
6) Enter 3 as the number of bins. Leave the remaining field values as they are.
7) Click the OK button.
8) Click Apply in the Filter panel. This results in a new working relation with the selected attribute partitioned into 3 bins.
9) Save the new working relation in a file called student-data-discretized.arff.
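As a rough illustration of what three bins mean for the age attribute, the sketch below applies plain equal-width binning to the ages from the bank table. It assumes the filter's default behaviour is equal-width binning with intervals closed on the right; the function names are hypothetical and the exact interval labels WEKA prints are not reproduced.

```python
def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 cut points that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def assign_bin(value, cut_points):
    """Return the 0-based bin index; a value equal to a cut point goes to the lower bin."""
    for index, cut in enumerate(cut_points):
        if value <= cut:
            return index
    return len(cut_points)

ages = [23, 45, 34, 22, 31, 36, 43, 27, 40, 25, 26, 27, 29, 30]
cuts = equal_width_cut_points(ages, 3)
bins = [assign_bin(age, cuts) for age in ages]
bin_counts = [bins.count(i) for i in range(3)]
```

For these ages the range 22 to 45 is split at roughly 29.67 and 37.33, so the three bins hold 7, 4 and 3 instances.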

OUTPUT:
Graph (Normal Database):

Graph (After Removing the Unique Field, i.e. Id):

Graph (Discretize on Age):

Report (Age):

Graph (Discretize on Income):

Report (Income):

CONCLUSION:

TASK3: Using a classification algorithm, predict a value for the play attribute. Use the J48 classifier algorithm to generate the tree and the possible rules.

Weather_Nominal.csv:
Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  No
Overcast  Cool         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No
Sunny     Mild         High      False  Yes
Sunny     Cool         Normal    True   Yes
Rainy     Mild         Normal    True   Yes

AIM: The database is to be mined to find simple rules for the Weather_Nominal relation with play as the output attribute. The instances are to be classified with respect to their attribute values, and the correctness of the predicted play attribute has to be checked.
PROCEDURE:
Step1: Create the database Weather_Nominal.csv using MS Excel.
Step2: Select Tools->ArffViewer->File->Open. Open the Weather_Nominal.csv file to convert it into .arff format. To convert, use the Save as option in the File menu.
Step3: Select the Explorer application in the WEKA GUI Chooser.
Step4: Select the Preprocess tab in the Explorer window and use the Open file button to open the Weather_Nominal.arff file.
Step5: Select the Classify tab and click the Choose option in the Classifier panel to select J48 under the trees group.
Step6: Various parameters can be specified by clicking the text box to the right of the Choose button, but we will run with the default setting J48 -C 0.25 -M 2, where:
- C is the confidence factor
- M is the minimum number of objects.

Step7: Under the Test options panel select the 10-fold Cross-validation radio button as our evaluation approach. To evaluate on the dataset, select play as the class attribute and then click the Start button to generate the model.

Step8: The evaluation statistics appear in the Classifier output panel once the model construction is completed.
Step9: This information can be viewed in a separate window by right-clicking in the Result list panel and selecting the option View in separate window from the popup menu.
Step10: The classification tree can be visualized by right-clicking in the Result list and selecting the Visualize tree option. It is the graphical representation of the tree.
Step11: To verify the correctness of the play values predicted by the classifier algorithm, right-click in the Result list panel and choose Visualize classifier errors. It displays the predictions graphically; they can be stored in .arff format by choosing the Save button and specifying a file name. Give the file name as Weather_predicted.
Step12: This predicted relation will have three new attributes:
1. Relation number (integer values)
2. Instance number (used for calculation purposes, starts from 0)
3. A predicted play column, inserted just before the actual play attribute.
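The pruned J48 tree reported in the OUTPUT section reads directly as a set of if-then rules. The following sketch transcribes that tree into a Python function; the function name and argument conventions are illustrative assumptions, and the tree is the one printed for the 14-instance weather.symbolic run, not a tree learned by this code.

```python
def predict_play(outlook, humidity, windy):
    """Apply the rules of the pruned tree: outlook first, then humidity or windy."""
    outlook = outlook.lower()
    if outlook == "overcast":
        return "yes"                      # outlook = overcast: yes
    if outlook == "sunny":
        # outlook = sunny: decided by humidity
        return "no" if humidity.lower() == "high" else "yes"
    if outlook == "rainy":
        # outlook = rainy: decided by windy
        return "no" if windy else "yes"
    raise ValueError(f"unknown outlook value: {outlook}")

example = predict_play("Sunny", "High", False)
```

Each root-to-leaf path corresponds to one rule, so the five leaves give five rules. Temperature does not appear in the tree and is therefore not an argument.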

OUTPUT:
=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     weather.symbolic
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===
J48 pruned tree
------------------
outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves  :  5
Size of the tree  :  8

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                         -0.0426
Mean absolute error                      0.4167
Root mean squared error                  0.5984
Relative absolute error                 87.5    %
Root relative squared error            121.2987 %
Coverage of cases (0.95 level)          78.5714 %
Mean rel. region size (0.95 level)      64.2857 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
               0.556    0.600    0.625      0.556   0.588      -0.043  0.633     0.758     yes
               0.400    0.444    0.333      0.400   0.364      -0.043  0.633     0.457     no
Weighted Avg.  0.500    0.544    0.521      0.500   0.508      -0.043  0.633     0.650

=== Confusion Matrix ===
 a b   <-- classified as
 5 4 | a = yes
 3 2 | b = no
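Several of the summary figures can be recomputed from the confusion matrix alone. The sketch below is a hand calculation in Python, not WEKA's evaluation code; the function name is hypothetical, and rows are taken as actual classes and columns as predicted classes, as in the matrix above.

```python
def evaluate(matrix, labels):
    """Return accuracy, kappa and per-class (recall, precision) from a confusion matrix."""
    size = len(labels)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(size)) / total
    per_class = {}
    chance = 0.0
    for i, label in enumerate(labels):
        actual = sum(matrix[i])                    # row total
        predicted = sum(row[i] for row in matrix)  # column total
        per_class[label] = (matrix[i][i] / actual, matrix[i][i] / predicted)
        chance += actual * predicted
    chance /= total ** 2                           # agreement expected by chance
    kappa = (accuracy - chance) / (1 - chance)
    return accuracy, kappa, per_class

accuracy, kappa, per_class = evaluate([[5, 4], [3, 2]], ["yes", "no"])
```

Here accuracy is 7/14 = 0.5, the recall (TP rate) for yes is 5/9 and its precision is 5/8, and kappa comes out as -4/94, which matches the reported -0.0426.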

CONCLUSION:

TASK4: Use the replace missing values algorithm to fill the Weather_Nominal.csv database table.

Weather_Nominal.csv (the Outlook column contains missing values, so only its recorded values are listed):
Outlook: Sunny, Overcast, Rainy, Rainy, Overcast, Rainy, Sunny, Rainy
Temperature: Hot, Hot, Hot, Mild, Cool, Cool, Mild, Hot, Mild, Mild, Cool, Mild
Humidity: High, High, High, High, Normal, Normal, High, Normal, High, High, Normal, Normal
Windy: False, True, False, False, False, True, True, False, True, False, True, True
Play: No, No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes, Yes

AIM: To fill the missing values in the Weather_Nominal.csv database table using the replace missing values algorithm.

PROCEDURE:
Step 1: Create the database Weather_Nominal.csv using MS Excel.
Step 2: Select Tools->ArffViewer->File->Open. Open the Weather_Nominal.csv file to convert it into .arff format. To convert, use the Save as option in the File menu.
Step 3: Select the Explorer application in the WEKA GUI Chooser.
Step 4: Go to Open file and select Weather_Nominal.csv. The file is displayed. Now go to Choose->weka->filters->unsupervised->attribute->ReplaceMissingValues and then click Apply.
Step 5: The missing values in the outlook attribute are now replaced by actual values (for a nominal attribute, the most frequent value).
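For a nominal attribute, filling missing values with the most frequent value (the mode) can be sketched as follows. The positions of the "?" markers in the example column are hypothetical, since the table above only preserves the recorded Outlook values; the helper name is also an assumption.

```python
from collections import Counter

def replace_missing_with_mode(values, missing="?"):
    """Replace every missing marker with the most frequent non-missing value."""
    present = [v for v in values if v != missing]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]

# Recorded Outlook values in their original order; "?" positions are made up for illustration.
outlook = ["Sunny", "?", "Overcast", "Rainy", "Rainy", "?",
           "Overcast", "?", "Rainy", "Sunny", "?", "Rainy"]
filled = replace_missing_with_mode(outlook)
```

Rainy occurs four times among the eight recorded values, so every "?" becomes Rainy in this sketch. A numeric attribute would be filled with the mean instead.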

OUTPUT: Missing values :

Replace missing values:

Report viewer:

CONCLUSION:

TASK5: Setting up a flow to load an ARFF file (batch mode) and perform a cross-validation using J48 (WEKA's C4.5 implementation).
AIM: To set up a flow to load an ARFF file (batch mode) and perform a cross-validation using J48 (WEKA's C4.5 implementation).
PROCEDURE:
Step: 1 Click on the DataSources tab and choose ArffLoader from the toolbar (the mouse pointer will change to a cross hairs).
Step: 2 Next place the ArffLoader component on the layout area by clicking somewhere on the layout (a copy of the ArffLoader icon will appear on the layout area).
Step: 3 Next specify an ARFF file to load by first right-clicking the mouse over the ArffLoader icon on the layout. A pop-up menu will appear. Select Configure under Edit in the list from this menu and browse to the location of your ARFF file.
Step: 4 Next click the Evaluation tab at the top of the window and choose the ClassAssigner (allows you to choose which column is to be the class) component from the toolbar. Place this on the layout.
Step: 5 Now connect the ArffLoader to the ClassAssigner: first right-click over the ArffLoader and select dataSet under Connections in the menu. A rubber band line will appear. Move the mouse over the ClassAssigner component and left-click; a red line labeled dataSet will connect the two components.
Step: 6 Next right-click over the ClassAssigner and choose Configure from the menu. This will pop up a window from which you can specify which column is the class in your data (last is the default).
Step: 7 Next grab a CrossValidationFoldMaker component from the Evaluation toolbar and place it on the layout. Connect the ClassAssigner to the CrossValidationFoldMaker by right-clicking over ClassAssigner and selecting dataSet from under Connections in the menu.

Step: 8 Next click on the Classifiers tab at the top of the window and scroll along the toolbar until you reach the J48 component in the trees section. Place a J48 component on the layout.

Step: 9 Connect the CrossValidationFoldMaker to J48 TWICE by first choosing trainingSet and then testSet from the pop-up menu for the CrossValidationFoldMaker.

Step: 10 Next go back to the Evaluation tab and place a ClassifierPerformanceEvaluator component on the layout. Connect J48 to this component by selecting the batchClassifier entry from the pop-up menu for J48.
Step: 11 Next go to the Visualization toolbar and place a TextViewer component on the layout. Connect the ClassifierPerformanceEvaluator to the TextViewer by selecting the text entry from the pop-up menu for ClassifierPerformanceEvaluator.
Step: 12 Now start the flow executing by selecting Start loading from the pop-up menu for ArffLoader. Depending on how big the data set is and how long cross-validation takes, you will see some animation from some of the icons in the layout (J48's tree will grow in the icon and the ticks will animate on the ClassifierPerformanceEvaluator). You will also see some progress information in the Status bar and Log at the bottom of the window. When finished you can view the results by choosing Show results from the pop-up menu for the TextViewer component.
Other cool things to add to this flow: connect a TextViewer and/or a GraphViewer to J48 in order to view the textual or graphical representations of the trees produced for each fold of the cross-validation (this is something that is not possible in the Explorer).
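The role of the CrossValidationFoldMaker, which hands one trainingSet and one testSet per fold to the classifier, can be illustrated with a plain index split. This is a simplified sketch: it assigns instances to folds round-robin and does not shuffle or stratify, which the real component may do, and `make_folds` is a hypothetical name.

```python
def make_folds(n_instances, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_instances) if i not in held_out]
        yield train, test

# 14 instances and 10 folds, as in a 10-fold run on a 14-instance relation.
splits = list(make_folds(14, 10))
```

Every instance lands in exactly one test fold, so each one is predicted once by a model that never saw it during training.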

OUTPUT:

Text viewer:

CONCLUSION:

TASK6: The Knowledge Flow can draw multiple ROC curves in the same plot window, something that the Explorer cannot do. In this example we use J48 and Random Forest as classifiers. AIM: To draw multiple ROC curves in the same plot window, something that the Explorer cannot do. In this example we use J48 and Random Forest as classifiers. PROCEDURE:
STEP1: Click on the DataSources tab and choose ArffLoader from the toolbar (the mouse pointer will change to a cross hairs).
STEP2: Next place the ArffLoader component on the layout area by clicking somewhere on the layout (a copy of the ArffLoader icon will appear on the layout area).
STEP3: Next specify an ARFF file to load by first right-clicking the mouse over the ArffLoader icon on the layout. A pop-up menu will appear. Select Configure under Edit in the list from this menu and browse to the location of your ARFF file.
STEP4: Next click the Evaluation tab at the top of the window and choose the ClassAssigner (allows you to choose which column is to be the class) component from the toolbar. Place this on the layout.
STEP5: Now connect the ArffLoader to the ClassAssigner: first right-click over the ArffLoader and select dataSet under Connections in the menu. A rubber band line will appear. Move the mouse over the ClassAssigner component and left-click; a red line labeled dataSet will connect the two components.
STEP6: Next right-click over the ClassAssigner and choose Configure from the menu. This will pop up a window from which you can specify which column is the class in your data (last is the default).
STEP7: Next choose the ClassValuePicker (allows you to choose which class label is to be evaluated in the ROC) component from the toolbar. Place this on the layout, right-click over ClassAssigner, select dataSet from under Connections in the menu and connect it with the ClassValuePicker.
STEP8: Next grab a CrossValidationFoldMaker component from the Evaluation toolbar and place it on the layout. Connect the ClassAssigner to the CrossValidationFoldMaker by right-clicking over ClassAssigner and selecting dataSet from under Connections in the menu.
STEP9: Next click on the Classifiers tab at the top of the window and scroll along the toolbar until you reach the J48 component in the trees section. Place a J48 component on the layout.
STEP10: Connect the CrossValidationFoldMaker to J48 TWICE by first choosing trainingSet and then testSet from the pop-up menu for the CrossValidationFoldMaker. Repeat these two steps with the Random Forest classifier.
STEP11: Next go back to the Evaluation tab and place a ClassifierPerformanceEvaluator component on the layout. Connect J48 to this component by selecting the batchClassifier entry from the pop-up menu for J48. Add another ClassifierPerformanceEvaluator for Random Forest and connect them via batchClassifier as well.
STEP12: Next go to the Visualization toolbar and place a ModelPerformanceChart component on the layout. Connect both ClassifierPerformanceEvaluators to the ModelPerformanceChart by selecting the thresholdData entry from the pop-up menu for each ClassifierPerformanceEvaluator.
STEP13: Now start the flow executing by selecting Start loading from the pop-up menu for ArffLoader.
STEP14: Depending on how big the data set is and how long cross-validation takes, you will see some animation from some of the icons in the layout. You will also see some progress information in the Status bar and Log at the bottom of the window.
STEP15: Select Show plot from the pop-up menu of the ModelPerformanceChart under the Actions section. The plot shows the two ROC curves generated from the UCI dataset credit-g, evaluated on the class label good.
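A ROC curve of the kind plotted by the ModelPerformanceChart is built from threshold data: instances are ranked by the predicted probability of the chosen class, and each threshold gives one (false positive rate, true positive rate) point. The sketch below shows that construction on a few made-up scores; the labels, scores and function names are hypothetical, and tied scores are assumed not to occur.

```python
def roc_points(labels, scores):
    """Return the ROC points obtained by lowering the threshold one instance at a time."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# 1 marks the class label picked for evaluation, 0 marks every other label.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc = area_under_curve(points)
```

Plotting the point lists of two classifiers on the same axes gives the kind of comparison this task produces. For the toy scores above the area is 8/9.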

OUTPUT:

CONCLUSION:

TASK7: Some classifiers, clusterers and filters in WEKA can handle data incrementally in a streaming fashion. Here is an example of training and testing naive Bayes incrementally. The results are sent to a TextViewer and predictions are plotted by a StripChart component.
AIM: To train and test naive Bayes incrementally in a streaming fashion, sending the results to a TextViewer and plotting the predictions with a StripChart component.
PROCEDURE:
Step: 1 Click on the DataSources tab and choose ArffLoader from the toolbar (the mouse pointer will change to a cross hairs).
Step: 2 Next place the ArffLoader component on the layout area by clicking somewhere on the layout (a copy of the ArffLoader icon will appear on the layout area).
Step: 3 Next specify an ARFF file to load by first right-clicking the mouse over the ArffLoader icon on the layout. A pop-up menu will appear. Select Configure under Edit in the list from this menu and browse to the location of your ARFF file.
Step: 4 Next click the Evaluation tab at the top of the window and choose the ClassAssigner (allows you to choose which column is to be the class) component from the toolbar. Place this on the layout.
Step: 5 Now connect the ArffLoader to the ClassAssigner: first right-click over the ArffLoader and select dataSet under Connections in the menu. A rubber band line will appear. Move the mouse over the ClassAssigner component and left-click; a red line labeled dataSet will connect the two components.
Step: 6 Next right-click over the ClassAssigner and choose Configure from the menu. This will pop up a window from which you can specify which column is the class in your data (last is the default).
Step: 7 Now grab a NaiveBayesUpdateable component from the bayes section of the Classifiers panel and place it on the layout.
Step: 8 Next connect the ClassAssigner to NaiveBayesUpdateable using an instance connection.
Step: 9 Next place an IncrementalClassifierEvaluator from the Evaluation panel onto the layout and connect NaiveBayesUpdateable to it using an incrementalClassifier connection.
Step: 10 Next place a TextViewer component from the Visualization panel on the layout. Connect the IncrementalClassifierEvaluator to it using a text connection.
Step: 11 Next place a StripChart component from the Visualization panel on the layout and connect IncrementalClassifierEvaluator to it using a chart connection.

Step: 12 Display the StripChart's chart by right-clicking over it and choosing Show chart from the pop-up menu. Note: the StripChart can be configured with options that control how often data points and labels are displayed.

Step: 13 Finally, start the flow by right-clicking over the ArffLoader and selecting Start loading from the pop-up menu.

OUTPUT:

Strip chart:

CONCLUSION:

TASK8: Generate association rules using the Apriori algorithm with the Bank.arff relation.
a) Set the minimum support range as 20% - 100%, the incremental decrease factor as 5% and the confidence factor as 80%, and generate 5 rules.
b) Set the minimum support as 10%, delta as 5% and the minimum lift as 150%, and generate 4 rules.

AIM: The given data set is to be mined to generate association rules with the given minimum support, using both confidence and lift analysis.

PROCEDURE:
Step 1 : Create the data set Bank.csv in Excel and convert it into .arff format.
Step 2 : Open the file Bank.arff using the Preprocess tab.
Step 3 : Apply preprocessing techniques to convert the numeric attributes to categorical type and to remove the unique-valued attribute.
Step 4 : In the Bank.arff relation the 3 attributes age, income and children are numeric, and id has unique values, so remove the id attribute and discretize the three attributes age, income and children.
Step 5 : a) The children attribute contains only 4 distinct values, so convert its data type from numeric to nominal.
b) Discretize the age and income attributes by selecting weka->filters->unsupervised->attribute->Discretize. Click on the text box, give the indices of both age and income and change the bins parameter to 3.
Step 6 : Press the Apply button. Now the dataset is preprocessed.
Step 7 : Click on the Associate tab and select the Apriori algorithm by pressing the Choose button. Click on the text box to change the parameters for the first set of association rules: lowerBoundMinSupport as 0.2, metricType as Confidence, minMetric as 0.8 and numRules as 5. Keep the defaults for the other parameters.
Step 8 : Now click on the Start button to run the program.
Step 9 : By right-clicking in the Result list panel the rules can be saved in a separate file.

Step 10 : Now we generate the rules with threshold values for support and lift, which combines the support, confidence and lift measures to obtain quality rules.

Step 11 : Click the text box to change the parameters: lowerBoundMinSupport as 0.1, metricType as Lift, minMetric as 1.5 and numRules as 4. The text box will then display "Apriori -N 4 -T 1 -C 1.5 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1".
Step 12 : Click on the Start button to generate the association rules and save the rules in a separate file.
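The delta parameter controls how the minimum support is lowered from the upper bound until enough rules are found or the lower bound is reached. The arithmetic below relates the reported number of cycles to the final minimum support; it is a sketch of that relationship only, with a hypothetical function name, and it does not reproduce the internals of WEKA's Apriori.

```python
def final_min_support(upper_bound, delta, cycles):
    """Minimum support after lowering the upper bound by delta once per cycle."""
    return round(upper_bound - cycles * delta, 2)

# Values from the runs in the OUTPUT section: -U 1.0, -D 0.05, 13 cycles, 14 instances.
min_support = final_min_support(1.0, 0.05, 13)
instances_needed = min_support * 14
```

With an upper bound of 1.0, a delta of 0.05 and 13 cycles this gives 0.35, the minimum support printed in both runs, and 0.35 of 14 instances is about 4.9, which the output reports as 5 instances.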

OUTPUT:
a)
=== Run information ===
Scheme:       weka.associations.Apriori -N 5 -T 0 -C 0.8 -D 0.05 -U 1.0 -M 0.2 -S -1.0 -c -1
Relation:     bank-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R1-weka.filters.unsupervised.attribute.Remove-R6-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R4
Instances:    14
Attributes:   10
              Age
              Gender
              Region
              Income
              Married
              Car
              Save a/c
              Current a/c
              Mortgage
              Pep

=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.35 (5 instances)
Minimum metric <confidence>: 0.8
Number of cycles performed: 13
Generated sets of large itemsets:
Size of set of large itemsets L(1): 17
Size of set of large itemsets L(2): 25

Best rules found:
1. Gender=Female 6 ==> Current a/c=Yes 6    <conf:(1)> lift:(1.56) lev:(0.15) [2] conv:(2.14)
2. Current a/c=No 5 ==> Gender=Male 5    <conf:(1)> lift:(1.75) lev:(0.15) [2] conv:(2.14)
3. Save a/c=No 6 ==> Married=Yes 5    <conf:(0.83)> lift:(1.3) lev:(0.08) [1] conv:(1.07)
4. Pep=No 6 ==> Married=Yes 5    <conf:(0.83)> lift:(1.3) lev:(0.08) [1] conv:(1.07)
5. Save a/c=No 6 ==> Pep=Yes 5    <conf:(0.83)> lift:(1.46) lev:(0.11) [1] conv:(1.29)

b)
=== Run information ===
Scheme:       weka.associations.Apriori -N 4 -T 1 -C 1.5 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     bank-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R1-weka.filters.unsupervised.attribute.Remove-R6-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R4
Instances:    14
Attributes:   10
              Age
              Gender
              Region
              Income
              Married
              Car
              Save a/c
              Current a/c
              Mortgage
              Pep

=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.35 (5 instances)
Minimum metric <lift>: 1.5
Number of cycles performed: 13
Generated sets of large itemsets:
Size of set of large itemsets L(1): 17
Size of set of large itemsets L(2): 25

Best rules found:
1. Gender=Male 8 ==> Current a/c=No 5    conf:(0.63) < lift:(1.75)> lev:(0.15) [2] conv:(1.29)
2. Current a/c=No 5 ==> Gender=Male 5    conf:(1) < lift:(1.75)> lev:(0.15) [2] conv:(2.14)
3. Gender=Female 6 ==> Current a/c=Yes 6    conf:(1) < lift:(1.56)> lev:(0.15) [2] conv:(2.14)
4. Current a/c=Yes 9 ==> Gender=Female 6    conf:(0.67) < lift:(1.56)> lev:(0.15) [2] conv:(1.29)
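The confidence, lift and leverage figures printed next to each rule can be checked by hand from the Gender and Current a/c columns of the bank table. The sketch below does this in plain Python; the helper name is hypothetical and the conviction measure is left out.

```python
gender = ("Female Male Female Male Male Female Male "
          "Male Female Male Female Male Male Female").split()
current = "Yes Yes Yes No No Yes No Yes Yes No Yes Yes No Yes".split()

def rule_metrics(antecedent, consequent):
    """Return (confidence, lift, leverage) for a rule given two boolean columns."""
    n = len(antecedent)
    a = sum(antecedent)                                   # instances matching the left side
    c = sum(consequent)                                   # instances matching the right side
    both = sum(1 for x, y in zip(antecedent, consequent) if x and y)
    confidence = both / a
    lift = confidence / (c / n)
    leverage = both / n - (a / n) * (c / n)
    return confidence, lift, leverage

# Rule: Gender=Female ==> Current a/c=Yes
female_yes = rule_metrics([g == "Female" for g in gender],
                          [v == "Yes" for v in current])
# Rule: Gender=Male ==> Current a/c=No
male_no = rule_metrics([g == "Male" for g in gender],
                       [v == "No" for v in current])
```

All six female customers have a current account, so the first rule has confidence 1 and lift 1 / (9/14), about 1.56. The second rule has confidence 5/8 and lift 1.75, in line with the listings above.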

CONCLUSION:

TASK9: Perform clustering using the k-means algorithm to cluster the customers in the bank dataset and characterize the customer segments.

Bank.csv:
ID  Age  Gender  Region     Income  Married  Children  Car  Save a/c  Current a/c  Mortgage  Pep
1   23   Female  Intercity  20000   No       0         Yes  No        Yes          No        Yes
2   45   Male    Rural      40000   Yes      1         No   Yes       Yes          No        No
3   34   Female  Intercity  15000   Yes      3         Yes  Yes       Yes          No        No
4   22   Male    Suburban   23000   Yes      1         Yes  No        No           Yes       Yes
5   31   Male    Rural      32000   Yes      2         Yes  No        No           Yes       Yes
6   36   Female  Town       25000   No       0         No   Yes       Yes          Yes       No
7   43   Male    Suburban   18000   Yes      3         No   Yes       No           Yes       No
8   27   Male    Rural      21000   Yes      2         Yes  No        Yes          No        No
9   40   Female  Town       42000   Yes      2         No   No        Yes          Yes       Yes
10  25   Male    Intercity  23000   No       0         No   Yes       No           No        Yes
11  26   Female  Suburban   25000   Yes      1         Yes  Yes       Yes          No        No
12  27   Male    Rural      32000   No       0         Yes  Yes       Yes          Yes       Yes
13  29   Male    Town       21000   No       0         No   Yes       No           Yes       Yes
14  30   Female  Intercity  27000   Yes      2         No   No        Yes          Yes       Yes

AIM: The given dataset is to be mined to build clusters of the customers in the bank dataset and to characterize the customer segments.
PROCEDURE:
Step1: Create bank.csv with at least twenty instances and convert it into .arff format.
Step2: Load the bank.arff relation by selecting application->explorer->preprocess->open file.
Step3: To perform clustering, select the Cluster tab in the Explorer and click on the Choose button.
Step4: Select the SimpleKMeans algorithm in the drop-down list.
Step5: Click on the text box to open the popup window for editing the clustering parameters.
Step6: In the popup window enter 5 as numClusters and leave the seed option as it is (the seed is used for random number selection).
Step7: In the Cluster mode panel select the Use training set radio button and click the Start button.
Step8: Right-click in the Result list panel to view the results in a separate window.
Step9: The result window shows the centroid of each cluster, statistics and the percentage of instances assigned to each cluster.
Step10: To understand the clustering, right-click in the Result list panel and select the option Visualize cluster assignments.
Step11: Test the visualization with sample data for the x-axis, y-axis and color dimensions. Choose the attributes for the x-axis and y-axis and select the gender attribute as the color dimension. The visualization will then show the distribution of males and females in each cluster.
Step12: The resulting dataset can be saved using the Save button in the visualization window. In the saved file a new attribute holding the assigned cluster is added, i.e. Cluster.
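Because the preprocessed bank relation contains only nominal attributes, each k-means iteration can be pictured as two moves: assign every instance to the centroid it disagrees with on the fewest attributes, then recompute each centroid as the per-attribute most frequent value. The sketch below shows one such iteration on a tiny made-up dataset. It rests on the assumption that nominal differences count as 0 or 1 and that nominal centroids are modes; the data, starting centroids and function names are hypothetical.

```python
from collections import Counter

def mismatch_distance(a, b):
    """Number of attributes on which two nominal instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def mode_centroid(rows):
    """Centroid of nominal rows: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))

def kmeans_step(rows, centroids):
    """One assignment plus update step; empty clusters keep their old centroid."""
    clusters = [[] for _ in centroids]
    for row in rows:
        nearest = min(range(len(centroids)),
                      key=lambda i: mismatch_distance(row, centroids[i]))
        clusters[nearest].append(row)
    new_centroids = [mode_centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, new_centroids

# Toy (Gender, Married) instances and two hypothetical starting centroids.
rows = [("Male", "Yes"), ("Male", "Yes"), ("Female", "No"),
        ("Female", "No"), ("Male", "No")]
clusters, centroids = kmeans_step(rows, [("Male", "Yes"), ("Female", "No")])
```

Repeating the step until the centroids stop changing gives the final clusters. The centroid rows in a result listing are then read as the typical customer of each segment.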

OUTPUT:
=== Run information ===
Scheme:       weka.clusterers.SimpleKMeans -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     bank-weka.filters.unsupervised.attribute.Remove-R1-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R1-weka.filters.unsupervised.attribute.Remove-R6-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R4
Instances:    14
Attributes:   10
              Age
              Gender
              Region
              Income
              Married
              Car
              Save a/c
              Current a/c
              Mortgage
              Pep
Test mode:    evaluate on training data

=== Clustering model (full training set) ===
KMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 32.0
Missing values globally replaced with mean/mode

Cluster centroids:
                            Cluster#
Attribute     Full Data   0          1           2           3          4
              (14)        (2)        (3)         (4)         (3)        (2)
==============================================================================
Age           '(young]'   '(young]'  '(senior)'  '(middle]'  '(young]'  '(young]'
Gender        Male        Female     Male        Female      Female     Male
Region        Intercity   Intercity  Suburban    Town        Intercity  Rural
Income        '(low]'     '(low]'    '(low]'     '(medium]'  '(low]'    '(low]'
Married       Yes         No         Yes         Yes         Yes        No
Car           Yes         Yes        No          No          Yes        Yes
Save a/c      Yes         No         Yes         No          Yes        Yes
Current a/c   Yes         Yes        No          Yes         Yes        Yes
Mortgage      Yes         No         Yes         Yes         No         Yes
Pep           Yes         Yes        No          Yes         No         Yes

Time taken to build model (full training data) : 0.03 seconds

=== Model and evaluation on training set ===
Clustered Instances
0      2 ( 14%)
1      3 ( 21%)
2      4 ( 29%)
3      3 ( 21%)
4      2 ( 14%)

CONCLUSION:
