SPAD7 Data Miner Guide PDF

22 quai gallieni - 92150 Suresnes - France
Tl : +33 1 57 32 60 60 - Fax : +33 1 57 32 62 00

spad@coheris.com www.coheris.com
Siret : 399 467 927 00105 - APE : 5829C
Register number training: 11-92-1522492

DATA MINER
GUIDE

Descriptive Statistics - Factorial Analyses - Clustering
Linear Models Discriminant Analyses
Scoring Decision Trees

Tl : +33 1 57 32 60 60 - Fax : +33 1 57 32 62 00
www.coheris.com
Siret : 399 467 927 00105 - APE : 5829C
Register number training : 11-92-1522492

Data Miner Guide

Copyright 1996, 2008 SPAD. All rights reserved.

For any further information about the SPAD software, training and consulting activities, please visit
us at www.coheris.com or contact us by email:

About E-mail
SPAD Software info-spad@coheris.com
SPAD Hot line support-spad@coheris.com
Training formation-spad@coheris.com
Consulting consulting-spad@coheris.com
Books publication-spad@coheris.com

For further information about the COHERIS Group offer (CRM, BI, Data Mining, Data Quality
Management, Merchandising Sfa), visit us at www.coheris.com
3
Table of contents
DESCRIPTIVE STATISTICS WITH SPAD 4
STATS - MARGINAL DISTRIBUTIONS, HISTOGRAMS 5
DEMOD AUTOMATIC CHARACTERIZATION OF A QUALITATIVE VARIABLE 16
DESCO - AUTOMATIC CHARACTERIZATION OF A CONTINUOUS VARIABLE 21
TABLE - CROSS TABLES 25
BIVAR - BIVARIATE ANALYSIS 28
FACTORIAL ANALYSES WITH SPAD 30
PCA - PRINCIPAL COMPONENT ANALYSIS 32
SCA - SIMPLE CORRESPONDENCE ANALYSIS 45
MCA - MULTIPLE CORRESPONDENCE ANALYSIS 50
CLUSTERING WITH SPAD 62
RECIP / SEMIS - CLUSTERING ON FACTORS SCORES 63
PARTI - DECLA - CUT OF THE TREE AND CLUSTERS DESCRIPTION 69
CLASS - MINER - CLUSTERS DESCRIPTION 78
ESCAL - STORING THE FACTORIAL AXES AND THE PARTITIONS 79
THE LINEAR MODEL AND ITS APPLICATIONS 80
REGRESSION AND ANALYSIS OF VARIABCE, GENERAL LINEAR MODEL 80
OPTIMAL REGRESSIONS RESEARCH 85
LOGISTIC REGRESSION 94
THE DISCRIMINANT AND ITS METHODS 105
FUWILD - OPTIMAL DISCRIMINANT ANALYSIS 105
DIS2GD - LINEAR DISCRIMINANT ANALYSIS BASED ON CONTINUOUS VARIABLES 117
DIS2GFP - LINEAR DISCRIMINANT ANALYSIS BASED ON PRINCIPAL FACTORS 126
DISCO - DISCRIMINANT ANALYSIS BASED ON QUALITATIVE VARIABLES 134
SCORE - SCORING FUNCTION 134
IDT 1 - INTERACTIVE DECISION TREE 1 154
IDT 2 - INTERACTIVE DECISION TREE 2 154

4

DESCRIPTIVE STATISTICS WITH SPAD

STATS : marginal distributions, histograms, matrix plot, box plot

DEMOD : automatic characterization of a qualitative variable

DESCO : automatic characterization of a continuous variable

TABLE : Crossed tables

BIVAR : Bivariate analysis

Descriptive Statistics with SPAD
5

STATS - MARGINAL DISTRIBUTIONS, HISTOGRAMS

This procedure supplies a rapid and automatic description of your nominal and
continuous variables.

The Survey.sba base is an opinion survey file, which will be used for this example. The file is
supplied with the application and installed automatically on your PC.

SET THE PARAMETERS FOR A METHOD

Before it can be executed, a method must have its parameters set.

To access the parameter settings of a method, right click on the method then on the Set the
method command or double-click on the method icon.

The rules for calculation and parameter settings of each of the methods are available on line.

The Cases, Weighting and Parameters tabs are available for almost all SPAD methods.

Cases: the Cases tab lets you select the cases used for the method
Weighting: the weighting tab allows you to adjust the distribution of the cases in the sample
Parameters: options and settings of the method
STATS - marginal distributions, Histograms
6
The Cases tab
The Cases tab lets you select the cases with one of the following methods:

All the available cases
One or more logical filters (selection criteria combined with AND/OR)
A name list of cases
A selection made in one or more intervals
Random draw
Apply a logical filter

In case of error, you can delete an expression from the filter by selecting the expression to discard,
and click on Delete.

The cases satisfying the filter are considered as active, while the others are supplementary.
Select the individuals from a list

C
Click on Logical filter
C
Select the chosen
variable
C
Click on the operator
C
Click on
Validate

Global Definition
of the filter
C
Click on the
operand
Select the chosen
method by List
Choose your cases in the Available list and
use the transfer buttons to select them.
Select the status of
the cases
7
Select cases by interval

You can save the definition of the selection made, by clicking on the Save button. This allows you
to re-use it later.

Do a Random Draw

This selection lets you apply the method to a sample before applying it to the entire SPAD base.
It also lets you, by executing the same method several times, after having taken the precaution to
change the number of preliminary request, to test the stability of the results of the method.
Indicate the number of preliminary
requests for the random draw. On
another execution of the selection, you
do not need to change the value of this
number unless you want to generate
different draws
Enter the percentage of the
draw by random, or the
sample size after the draw
Click on OK
C Select by interval as the
method of choice
C Select the status of the
cases
C Define the interval as a
function of its rank in the
Base SPAD
C Click on the arrow button
to move your choice to the
cases status window
Click on the Yes radio button
to run a random draw
Click on Define to set the
parameters for the
random draw
8
The Weighting tab
The weighting tab allows you to adjust the distribution of the cases in the sample:

According to a Weighting variable already in the file.
As a function of one or more theoretical percentages (calculation by adjustment).

Enter the theoretical percentage for each category and click on OK.

You can repeat this operation for another variable. In this way you get an adjustment as a function
of several variables with a simple weighting variable. This requires a calculation by successive
approximations, as shown in the window below:

Click on the options in the
first window, to access the
options window for the
weighting system.
In the case of calculation by
adjustment, in the available
variables window, choose the
variable serving to correct and
click on the button Define
Select the
weighting
type
For a category, enter
the theoretical
percentage and hit
Enter
You can use the options
by default, or change the
options for fitting
9
Attention: The weighting calculation in the weighting tab page for a method is temporary (the
weighting variable is not saved). This approach lets you make quick tests and also to measure the
influence of the weighting on the results of the method. When a satisfactory weighting variable has
been obtained, it is preferable to create a permanent weighting variable with the menu Tools
Weighting of the main menu (Data Management Manual, paragraph 4.3).

Then in the weighting tab of a method, we will select this variable as the weight variable.

The Marginal distributions tab

We select the categorical variables in the list below.

The Parameters button allows you to display or not the categories without any
respondent and to display or not the missing data as a new category.

The Statistics button displays summary statistics on the selected variables. For example,
select the Region where the respondent lives (V1), then click on the statistics button. A
window opens with statistics on the variable:

This statistics window shows for the categorical
variables: the count and percentage associated for
each category. For the continuous variables; the
statistic window shows the count, the mean, the
standard deviation, as well as the minimum and
maximum.
10
The Histograms - Categorization tab

This tab allows you to select continuous variables both for histograms/summary statistics
and for categorization (marginal distributions of the variables values)

The Parameters button allows you to set global or specific parameters for the histograms
characteristics such as the number of classes, the min and max bounds and the histogram
bar width.

You can also select continuous variables for categorization. As a result, each distinct value
is displayed with its frequency.
It is a preliminary step before splitting the continuous variable into classes.

It is not allowed to do both histograms and categorization for the same variable.

11
The Marginal distributions by categories tab

This tab is useful for variables that are based on the same categories. The categories of
theses variables must have the same labels and must be ranked in the same order (we can
check it with the marginal distributions tab).

The Parameters tab

This tab allows you to export the results into excel or not.

12
Once you have specified your request, then you validate the method by clicking on the
OK button.

RESULTS

Results are accessible in the Execution view or by right-clicking on the method and
choosing the Results command. Then, depending on the method, different choices are
available between the results editor, the Graphics gallery and Excel results.

The results editor

The Result Editor opens up in a new window.

The information list has a tree structure.
By clicking on you open a branch of the tree, and by clicking on you close a
branch of the tree. You can use the mouse to navigate through the tree.
By double clicking on the title, you display the relevant results in the new window.

The Layout option of the File menu allows you to customize results display on the screen.
The results can be printed or copied into your word processor, but they cannot be changed
in this editor.

13

THE RESULTS OF THE STATS METHOD

SUMMARY STATISTICS OF THE VARIABLES
MARGI NAL DI STRI BUTI ONS OF CATEGORI CAL VARI ABLES
- - - - - - - - COUNTS - - - - - - - -
ACTUAL %/ TOTAL %/ EXPR. HI STOGRAM OF WEI GHTS
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 . Region where the respondent lives
Rg1 - Par i s r egi on 56 17. 78 17. 78 *********
Rg2 - Par i s Basi n 51 16. 19 16. 19 ********
Rg3 - nor t h 24 7. 62 7. 62 ****
Rg4 - east 29 9. 21 9. 21 *****
Rg5 - west 45 14. 29 14. 29 *******
Rg6 - sout h- west 38 12. 06 12. 06 ******
Rg7 - cent er east 36 11. 43 11. 43 ******
Rg8 - medi t er r anean 36 11. 43 11. 43 ******
OVERALL 315 100. 00 100. 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2 . Urban area size ( number of inhabitants)
Agg1 - l ess t han 2000 84 26. 67 26. 67 *************
Agg2 - 2001 t o 5000 18 5. 71 5. 71 ***
Agg3 - 5001 t o 10000 18 5. 71 5. 71 ***
Agg4 - 10001 t o 20000 12 3. 81 3. 81 **
Agg5 - 20001 t o 50000 23 7. 30 7. 30 ****
Agg6 - 50001 t o 100000 18 5. 71 5. 71 ***
Agg7 - 100001 t o 200000 28 8. 89 8. 89 *****
Agg8 - mor e t han 200000 68 21. 59 21. 59 **********
Agg9 - par i s, par i s. aggl o 46 14. 60 14. 60 *******
OVERALL 315 100. 00 100. 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
3 . Sex of respondent
Sex1 - mal e 138 43. 81 43. 81 *********************
Sex2 - f emal e 177 56. 19 56. 19 **************************
OVERALL 315 100. 00 100. 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

MARGI NAL DI STRI BUTI ONS CATEGORI ZED VARI ABLES
- - - - - - - - - - - COUNTS - - - - - - - - - - - -
ACTUAL %/ TOTAL %/ EXPR. %CUM. HI STOGRAM OF WEI GHTS
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
14
26 . Number of persons in a housing
1. 000 38 12. 06 12. 06 12. 06 ******
2. 000 90 28. 57 28. 57 40. 63 *************
3. 000 69 21. 90 21. 90 62. 54 **********
4. 000 71 22. 54 22. 54 85. 08 **********
5. 000 34 10. 79 10. 79 95. 87 *****
6. 000 7 2. 22 2. 22 98. 10 *
7. 000 4 1. 27 1. 27 99. 37 *
8. 000 2 0. 63 0. 63 100. 00 *
OVERALL 315 100. 00 100. 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
28 . Number of children
0. 000 70 22. 22 22. 22 22. 22 **********
1. 000 67 21. 27 21. 27 43. 49 **********
2. 000 94 29. 84 29. 84 73. 33 *************
3. 000 54 17. 14 17. 14 90. 48 ********
4. 000 9 2. 86 2. 86 93. 33 **
5. 000 11 3. 49 3. 49 96. 83 **
6. 000 2 0. 63 0. 63 97. 46 *
7. 000 2 0. 63 0. 63 98. 10 *
8. 000 2 0. 63 0. 63 98. 73 *
9. 000 4 1. 27 1. 27 100. 00 *
OVERALL 315 100. 00 100. 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SUMMARY STATI STI CS OF CONTI NUOUS VARI ABLES
TOTAL COUNT : 315
TOTAL WEI GHT : 315. 00
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +
| NUM . LABEL COUNT WEI GHT | MEAN STD. DEV. | MI NI MUM MAXI MUM | MI N. 2 MAX. 2 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +
| 4 . Age of r espondent 315 315. 00 | 43. 756 16. 581 | 18. 000 86. 000 | 19. 000 83. 000 |
| 41 . Fami l y, chi l dr en : i 315 315. 00 | 6. 651 1. 062 | 1. 000 7. 000 | 2. 000 6. 000 |
| 42 . Wor k, pr of essi on : i 315 315. 00 | 5. 956 1. 544 | 1. 000 7. 000 | 2. 000 6. 000 |
| 43 . Fr ee t i me, r el ax: i m 315 315. 00 | 5. 295 1. 454 | 0. 000 7. 000 | 1. 000 6. 000 |
| 44 . Fr i ends, acquai nt anc 315 315. 00 | 5. 190 1. 424 | 1. 000 7. 000 | 2. 000 6. 000 |
| 45 . Rel at i ves, br ot her s, 315 315. 00 | 5. 629 1. 436 | 1. 000 7. 000 | 2. 000 6. 000 |
| 46 . Rel i gi on : i mpor t anc 315 315. 00 | 3. 241 2. 022 | 0. 000 7. 000 | 1. 000 6. 000 |
| 47 . Pol i t i c, pol i t i cal l 315 315. 00 | 3. 111 1. 770 | 0. 000 7. 000 | 1. 000 6. 000 |
| 50 . St at e benef i t s : ave 283 283. 00 | 533. 795 926. 899 | 0. 000 5100. 000 | 15. 000 4980. 000 |
| 51 . Sal ar y of t he r espon 267 267. 00 | 4408. 547 4575. 339 | 0. 000 40000. 000 | 300. 000 24000. 000 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +

HI STOGRAMS OF CONTI NUOUS VARI ABLES
VARI ABLE 4 : Age of respondent
LOW. LI MI T| MEAN | WEI GHT| HI STOGRAM ( BETWEEN 16. 00 I NCLUDED AND 88. 00 EXCLUDED,
BAR I NTERVAL WI DTH = 2. 00)
- - - - - - - - - - +- - - - - - - - - - +- - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
16. 00 | 20. 93 | 28 | XXXXXXXXXXXXXX
24. 00 | 27. 85 | 68 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
32. 00 | 35. 31 | 58 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
40. 00 | 43. 35 | 37 | XXXXXXXXXXXXXXXXXX
48. 00 | 52. 08 | 39 | XXXXXXXXXXXXXXXXXXX
56. 00 | 59. 06 | 33 | XXXXXXXXXXXXXXXX
64. 00 | 67. 09 | 33 | XXXXXXXXXXXXXXXX
72. 00 | 74. 71 | 14 | XXXXXXX
80. 00 | 82. 20 | 5 | XX
+- - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| | OVERALL | HI STOGRAM |
| | ( FROM 18. 00 TO 86. 00) | ( FROM 16. 00 TO 88. 00) |
+- - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| WEI GHT | 315. 00 | 315. 00 |
| MEAN | 43. 756 | 43. 756 |
| STD. DEV. | 16. 581 | 16. 440 |
+- - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
WEI GHTS OF REMAI NI NG CASES : STRI CTLY LESS THAN . . . . . 16. 00 : 0. 00
GREATER THAN OR EQUAL TO 88. 00 : 0. 00

15

MARGI NAL DI STRI BUTI ONS OF GROUPED VARI ABLES
COMMAND NUMBER 1
- - - - - - - - COUNTS - - - - - - - -
ACTUAL %/ TOTAL %/ EXPR.
DI STRI BUTI ON OF ANSWER : yes
FOR VARI ABLES
Have you r ecent l y been ner vous 155. 00 49. 21 49. 21
Have you r ecent l y had backaches 149. 00 47. 30 47. 30
Have you r ecent l y had headaches 115. 00 36. 51 36. 51
Have you r ecent l y been depr essed 50. 00 15. 87 15. 87
DI STRI BUTI ON OF ANSWER : no
FOR VARI ABLES
Have you r ecent l y been depr essed 265. 00 84. 13 84. 13
Have you r ecent l y had headaches 200. 00 63. 49 63. 49
Have you r ecent l y had backaches 166. 00 52. 70 52. 70
Have you r ecent l y been ner vous 160. 00 50. 79 50. 79

DEMOD Automatic Characterization of a qualitative variable
16

DEMOD AUTOMATIC CHARACTERIZATION OF A
QUALITATIVE VARIABLE

This extremely powerful procedure provides the automatic characterization of any
categorical variable.
This is the IDEAL procedure to find out everything about a variable in one question. The
well-structured outputs form comprehensive study reports.

One can characterize either each category of a variable, or globally the variable itself. All
the elements available (active and illustrative) may participate in the characterization: the
categorical variables of the categorical variables, the categorical variables themselves, and
the continuous variables.

The following table summarizes all the capabilities of the DEMOD procedure:

Elements to characterize Characterizing elements
Groups of cases (defined by the categories of the
variable to characterize)

We describe each category with all its significant characterizing
elements.
categories
categorical variables
continuous variables
The categorical variable to characterize

We cross the variable with all the characterizing elements and
display only the elements that are dependant from the variable
to characterize.
categories
categorical variables

A group of cases is defined by a category of the variable to characterize. We have as much
groups of cases as the number of categories of the variable to characterize.

Double-click on the demod icon in order to access the settings of the method.

17
THE VARIABLES TAB

The scrolling menu allows you to select the variables to characterize and the characterizing
elements.

In this example, the variable to characterize is V8 The family is the only place where you
feel well. All the other variables whether categorical or continuous are selected as
characterizing elements.

18
THE PARAMETERS TAB

This tab allows you to modify the default parameters for the DEMOD method.

Once you have set the parameters, then you validate the method by clicking on the OK
button and run the chain.

19
THE DEMOD RESULTS

THE DEMOD-5 EXCEL SHEET

% of category in group :
Frequency of the category in the group divided by the frequency of the group

% of category in set:
Frequency of the category in the population

% of group in category:
Frequency of the group in the category divided by the frequency of category

Test-value:
When the test-value is greater than zero, it means that the category is over-
represented in the group. The category is under-represented if the test-value is
negative. By default, SPAD displays only characterizing elements with a test-value
greater equal than 1.96 (i.e. a probability equal to 0.025 for an unilateral test).

Probability:
The probability evaluates the scale of the difference between the percentage of the
category in the group and the percentage of the category in the population. Lower is
the probability, more significant is the difference and greater is the test-value related
to this probability (the test-value is the fractile of the normal law that corresponds to
the same probability).

Weight:
Weight of the cases in the category
Characterisation by categories of groups of
The family is the only place where you feel well
Group: Yes (Count: 230 - Percentage: 73.02)
Variable label
Caracteristic
categories
% of
category in
group
% of
category in
set
% of group
in category
Test-value Probability Weight
Marital status married 78,26 70,79 80,72 4,55 0,000 223
Do you watch TV every day 62,61 55,87 81,82 3,83 0,000 176
Opinion about marriage indissoluble 31,30 25,71 88,89 3,79 0,000 81
Are you worried about the risk of a nuclear plant accident a lot 32,61 28,25 84,27 2,76 0,003 89
Do you have children yes 81,30 77,14 76,95 2,68 0,004 243
Are you worried about the risk of a road accident a lot 40,87 36,51 81,74 2,55 0,005 115
Educational level of the respondent primary school 20,43 17,14 87,04 2,50 0,006 54
Current situation of the respondent retired people 20,43 17,14 87,04 2,50 0,006 54
Are you worried about the risk of a mugging a lot 33,04 29,21 82,61 2,38 0,009 92
Do you think the society needs to change I do not know 11,30 9,21 89,66 2,01 0,022 29
Current situation of the respondent unemployed person 5,22 7,30 52,17 -2,02 0,022 23
Are you worried about the risk of a mugging not at all 23,04 26,35 63,86 -2,02 0,022 83
Current situation of the respondent student 2,17 3,81 41,67 -2,06 0,020 12
Educational level of the respondent technical and GCSE 3,48 5,40 47,06 -2,10 0,018 17
Marital status cohabitation 3,04 5,08 43,75 -2,30 0,011 16
Do you have work-personal life problems yes 20,43 24,13 61,84 -2,33 0,010 76
Urban area size (number of inhabitants) more than 200000 17,83 21,59 60,29 -2,46 0,007 68
Your opinion on the life conditions in the future improving a lot 3,91 6,67 42,86 -2,81 0,002 21
Do you watch TV quite often 19,57 24,13 59,21 -2,90 0,002 76
Marital status single 9,57 13,33 52,38 -2,93 0,002 42
Do you have children no 17,39 21,90 57,97 -2,96 0,002 69
Opinion about marriage dissolved if agreem 30,87 36,19 62,28 -3,07 0,001 114
Are you worried about the risk of a road accident a little 15,65 20,32 56,25 -3,13 0,001 64
Educational level of the respondent more high school 9,13 13,65 48,84 -3,49 0,000 43
20
THE DEMOD-13 EXCEL SHEET

Category mean:
Weighted mean of the variable in the category

Overall mean:
Weighted mean of the category in the overall population

Interpretation:
One can see that the Age of respondent is the most characterizing continuous
variable of the group who answered yes to the question The family is the only
place where you feel well .
This group is significantly older than the average respondent with an average age of
46 years old, compared to 43.75 years old for the overall population.

Characterisation by continuous variables of categories of
The family is the only place where you feel well
Yes (Weight = 230.00 Count = 230 )
Characteristic variables
Category
mean
Overall
mean
Category Std.
deviation
Overall Std.
deviation
Test-value Probability
Age of respondent 46,100 43,756 16,752 16,581 4,12 0,000
Religion : importance given 3,383 3,241 2,081 2,022 2,04 0,021
Relatives, brothers, sisters ... : importance given 5,726 5,629 1,380 1,436 1,98 0,024
Salary of the respondent 4044,990 4408,550 3690,140 4575,340 -2,09 0,018
No (Weight = 83.00 Count = 83 )
Category
mean
Overall
mean
Category Std.
deviation
Overall Std.
deviation
Salary of the respondent 5377,780 4408,550 6311,000 4575,340 2,10 0,018
Number of children 1,542 1,860 1,772 1,671 -2,02 0,022
Age of respondent 36,855 43,756 13,971 16,581 -4,41 0,000
21

DESCO - AUTOMATIC CHARACTERIZATION OF A
CONTINUOUS VARIABLE

This procedure provides the statistical characterization of one or more continuous
variables by:
The other continuous variables, with the support of correlations.
The categories of the categorical variables, by comparison of means.
The categorical variables themselves, with the help of Fisher's statistic.

THE VARIABLES TAB

A continuous variable can be characterized with the other variables whether categorical or
continuous, called characterizing variables.

The scrolling menu allows you to select the variables to characterize and the characterizing
elements.

DESCO - Automatic Characterization of a continuous variable
22
THE PARAMETERS TAB

The parameter Minimum relative weight of charactering elements is useful if you do
not want to display characterizing categories whose the frequency in the population is
lower than 2% (threshold by default).

Display the categories whose the
related probabilities are lower
equal than 0.025. It corresponds
to a test-value of 1.96.
23
THE DESCO RESULTS

CHARACTERISATION OF CONTINUOUS VARIABLES
DESCRI PTI ON OF : Salary of the respondent
DESCRI PTI ON BY CATEGORI ES
OF CONTI NUOUS VARI ABLE : Salary of the respondent
ON 267. 0 ACTI VE CASES MEAN = 4408. 547
STD. DEV. = 4575. 339
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - +
| TEST PROB. | MEAN STD. DEV. | CATEGORI ES | VARI ABLE LABEL | WEI GHT |
| VALUE | | | | |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - +
| 8. 16 0. 000 | 7060. 53 4921. 82 | yes, f ul l t i me | At t he moment , do you have a pr of essi onal act i vi t y | 114. 00 |
| 7. 58 0. 000 | 6496. 32 4736. 16 | empl oyed | Cur r ent si t uat i on of t he r espondent | 136. 00 |
| 7. 28 0. 000 | 6617. 07 4883. 30 | no | Have you been unempl oyed dur i ng t he l ast t wel ve mont hs | 123. 00 |
| 6. 69 0. 000 | 6533. 19 5486. 12 | mal e | Sex of r espondent | 117. 00 |
| 4. 60 0. 000 | 6452. 63 5414. 05 | no | Do you have wor k- per sonal l i f e pr obl ems | 76. 00 |
| 4. 25 0. 000 | 6698. 25 6784. 83 | qui t e of t en | Do you wat ch TV | 57. 00 |
| 3. 73 0. 000 | 6331. 15 3880. 83 | yes | Do you have wor k- per sonal l i f e pr obl ems | 61. 00 |
| 3. 47 0. 000 | 6797. 37 6049. 03 | mor e hi gh school | Educat i onal l evel of t he r espondent | 38. 00 |
| 3. 35 0. 000 | 4860. 06 4834. 30 | no | Have you r ecent l y been depr essed | 217. 00 |
| 3. 18 0. 001 | 5291. 85 5418. 67 | no | Have you r ecent l y been ner vous | 135. 00 |
| 3. 10 0. 001 | 6950. 00 5579. 71 | yes | Do you have a pi ano | 28. 00 |
| 2. 89 0. 002 | 6529. 41 5935. 61 | yes | Do you have a second house | 34. 00 |
| 2. 88 0. 002 | 6330. 00 7536. 22 | yes | Do you have a vi deo- t ape | 40. 00 |
| 2. 65 0. 004 | 5937. 26 6786. 27 | Par i s r egi on | Regi on wher e t he r espondent l i ves | 51. 00 |
| 2. 43 0. 008 | 5179. 34 5246. 40 | a l ot | Has t he r espondent been i nt er est ed by t he sur vey | 117. 00 |
| 2. 17 0. 015 | 6906. 67 4638. 46 | a l ot bet t er | Your opi ni on on t he evol ut i on of t he dai l y per sonal l i f e | 15. 00 |
| 2. 10 0. 018 | 5377. 78 6311. 00 | No | The f ami l y i s t he onl y pl ace wher e you f eel wel l | 72. 00 |
| - 2. 01 0. 022 | 3301. 51 2735. 77 | qui t e agr ee | Per sons l i ke me of t en f eel al one | 55. 00 |
| - 2. 09 0. 018 | 4044. 99 3690. 14 | Yes | The f ami l y i s t he onl y pl ace wher e you f eel wel l | 193. 00 |
| - 2. 14 0. 016 | 3769. 06 3573. 01 | a l ot | Ar e you wor r i ed about t he r i sk of havi ng a ser i ous i l l ness | 125. 00 |
| - 2. 23 0. 013 | 3196. 12 3440. 69 | a l ot wor se | Your opi ni on on t he evol ut i on of Fr ench peopl e l i f e l evel | 56. 00 |
| - 2. 47 0. 007 | 3319. 48 2735. 76 | a l ot | Ar e you wor r i ed about t he r i sk of a nucl ear pl ant acci dent | 77. 00 |
| - 2. 54 0. 006 | 1971. 43 1864. 75 | unempl oyed per son | Cur r ent si t uat i on of t he r espondent | 21. 00 |
| - 2. 57 0. 005 | 760. 00 1356. 61 | st udent | Cur r ent si t uat i on of t he r espondent | 10. 00 |
| - 2. 66 0. 004 | 2606. 41 3255. 77 | a l ot wor se | Your opi ni on on t he evol ut i on of t he dai l y per sonal l i f e | 39. 00 |
| - 2. 86 0. 002 | 3726. 34 3277. 03 | ever y day | Do you wat ch TV | 155. 00 |
| - 2. 88 0. 002 | 4069. 97 3721. 48 | no | Do you have a vi deo- t ape | 227. 00 |
| - 2. 89 0. 002 | 4099. 07 4253. 85 | no | Do you have a second house | 233. 00 |
| - 3. 10 0. 001 | 4110. 81 4346. 66 | no | Do you have a pi ano | 239. 00 |
| - 3. 18 0. 001 | 3505. 18 3271. 07 | yes | Have you r ecent l y been ner vous | 132. 00 |
| - 3. 35 0. 000 | 2449. 00 2373. 53 | yes | Have you r ecent l y been depr essed | 50. 00 |
| - 3. 49 0. 000 | 2263. 04 2043. 80 | no qual i f i cat i ons | Educat i onal l evel of t he r espondent | 46. 00 |
| - 4. 36 0. 000 | 832. 14 1563. 89 | I have never wor ked | At t he moment , do you have a pr of essi onal act i vi t y | 28. 00 |
| - 4. 85 0. 000 | 2691. 10 3397. 40 | no | At t he moment , do you have a pr of essi onal act i vi t y | 103. 00 |
| - 6. 54 0. 000 | 488. 54 1396. 02 | housewi f e w/ o pr of . | Cur r ent si t uat i on of t he r espondent | 48. 00 |
| - 6. 69 0. 000 | 2751. 33 2742. 02 | f emal e | Sex of r espondent | 150. 00 |
| - 7. 28 0. 000 | 2311. 41 3196. 29 | mi ssi ng cat egor y | Do you have wor k- per sonal l i f e pr obl ems | 130. 00 |
| - 7. 28 0. 000 | 2311. 41 3196. 29 | mi ssi ng cat egor y | Have you been unempl oyed dur i ng t he l ast t wel ve mont hs | 130. 00 |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - +
| | 4408. 55 4575. 34 | OVERALL | 267. 00 |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - +

DESCRI PTI ON BY CATEGORI CAL VARI ABLES
OF VARI ABLE : Salary of the respondent
+- - - - - - - - - - - - +- - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - +
| TEST- VALUE | PROBA. | NUM . VARI ABLE LABEL | DEN. DEG. FREE. | FI SHER|
+- - - - - - - - - - - - +- - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - +
| 8. 56 | 0. 000 | 5 . Cur r ent si t uat i on of t he r espondent | 261 | 21. 44|
| 8. 48 | 0. 000 | 18 . At t he moment , do you have a pr of essi onal act i vi t y | 263 | 31. 95|
| 7. 50 | 0. 000 | 20 . Have you been unempl oyed dur i ng t he l ast t wel ve mont hs | 264 | 35. 01|
| 7. 28 | 0. 000 | 19 . Do you have wor k- per sonal l i f e pr obl ems | 264 | 32. 89|
| 6. 98 | 0. 000 | 3 . Sex of r espondent | 265 | 53. 58|
| 3. 48 | 0. 000 | 7 . Educat i onal l evel of t he r espondent | 258 | 3. 87|
| 3. 47 | 0. 000 | 33 . Do you wat ch TV | 263 | 6. 57|
| 3. 38 | 0. 001 | 24 . Have you r ecent l y been depr essed | 265 | 11. 69|
| 3. 21 | 0. 001 | 23 . Have you r ecent l y been ner vous | 265 | 10. 50|
| 3. 12 | 0. 002 | 16 . Do you have a pi ano | 265 | 9. 94|
| 2. 90 | 0. 004 | 17 . Do you have a second house | 265 | 8. 58|
| 2. 89 | 0. 004 | 15 . Do you have a vi deo- t ape | 265 | 8. 50|
| 2. 04 | 0. 021 | 52 . Has t he r espondent been i nt er est ed by t he sur vey | 264 | 3. 92|
| 1. 92 | 0. 054 | 21 . Have you r ecent l y had headaches | 265 | 3. 74|
| 1. 77 | 0. 039 | 30 . Your opi ni on on t he evol ut i on of t he dai l y per sonal l i f e | 261 | 2. 38|
| 1. 56 | 0. 059 | 25 . Ar e you sat i sf i ed of your heal t h | 263 | 2. 51|
| 1. 33 | 0. 092 | 40 . Ar e you wor r i ed about t he r i sk of a nucl ear pl ant acci dent | 263 | 2. 16|
| 1. 31 | 0. 189 | 29 . Do you r egul ar l y i mpose r est r i ct i ons | 265 | 1. 73|
| 1. 24 | 0. 107 | 8 . The f ami l y i s t he onl y pl ace wher e you f eel wel l | 264 | 2. 24|
| 1. 12 | 0. 132 | 1 . Regi on wher e t he r espondent l i ves | 259 | 1. 61|
| 1. 07 | 0. 143 | 39 . Ar e you wor r i ed about t he r i sk of umempl oyment | 263 | 1. 82|
| 1. 03 | 0. 151 | 35 . The comput er sci ence di f f usi on i s. . . | 263 | 1. 78|
| 1. 02 | 0. 154 | 34 . Do you t hi nk t he soci et y needs t o change | 264 | 1. 86|
| 0. 92 | 0. 179 | 49 . Per sons l i ke me of t en f eel al one | 263 | 1. 64|
| 0. 89 | 0. 186 | 31 . Your opi ni on on t he evol ut i on of Fr ench peopl e l i f e l evel | 260 | 1. 48|
| 0. 86 | 0. 194 | 36 . Ar e you wor r i ed about t he r i sk of havi ng a ser i ous i l l ness| 263 | 1. 58|
| 0. 79 | 0. 428 | 22 . Have you r ecent l y had backaches | 265 | 0. 63|
| 0. 78 | 0. 217 | 11 . Ar e you sat i sf i ed of your housi ng | 263 | 1. 49|
| 0. 65 | 0. 257 | 37 . Ar e you wor r i ed about t he r i sk of a muggi ng | 263 | 1. 35|
| 0. 45 | 0. 327 | 13 . Occupat i on st at us of housi ng | 262 | 1. 16|
| 0. 22 | 0. 412 | 27 . Do you have chi l dr en | 264 | 0. 88|
| 0. 13 | 0. 446 | 38 . Ar e you wor r i ed about t he r i sk of a r oad acci dent | 263 | 0. 89|
| 0. 10 | 0. 459 | 6 . Mar i t al st at us | 262 | 0. 91|
| 0. 08 | 0. 469 | 9 . Opi ni on about mar r i age | 263 | 0. 85|
| - 0. 15 | 0. 561 | 32 . Your opi ni on on t he l i f e condi t i ons i n t he f ut ur e | 261 | 0. 79|
| - 0. 21 | 0. 585 | 12 . Ar e you sat i sf i ed of your dai l y l i f e | 263 | 0. 65|
DESCO - Automatic Characterization of a continuous variable
24
| - 0. 23 | 0. 591 | 14 . The housi ng expenses ar e f or you | 260 | 0. 77|
| - 0. 53 | 0. 702 | 10 . Housekeepi ng wor ks, t ake car e of chi l dr en. . . | 263 | 0. 47|
| - 0. 59 | 0. 724 | 2 . Ur ban ar ea si ze ( number of i nhabi t ant s) | 258 | 0. 66|
| - 0. 64 | 0. 740 | 48 . Your opi ni on on t he j ust i ce r unni ng i n 1986 | 261 | 0. 55|
+- - - - - - - - - - - - +- - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - +

TOTAL COUNT 315
TOTAL WEI GHT 315. 00
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - +
| NUM . I DEN - LABEL COUNT WEI GHT | MEAN STD. DEV. | MI NI MUM MAXI MUM |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - +
| 4 . Age - Age of r espondent 267 267. 00 | 43. 61 16. 88 | 18. 00 83. 00 |
| 26 . Nbpr - Number of per sons i n 267 267. 00 | 3. 04 1. 43 | 1. 00 8. 00 |
| 28 . Nbef - Number of chi l dr en 267 267. 00 | 1. 85 1. 69 | 0. 00 9. 00 |
| 41 . Fami - Fami l y, chi l dr en : i 267 267. 00 | 6. 65 1. 07 | 1. 00 7. 00 |
| 42 . Tr av - Wor k, pr of essi on : i 267 267. 00 | 5. 90 1. 57 | 1. 00 7. 00 |
| 43 . Loi s - Fr ee t i me, r el ax: i m 267 267. 00 | 5. 30 1. 43 | 0. 00 7. 00 |
| 44 . Ami s - Fr i ends, acquai nt anc 267 267. 00 | 5. 18 1. 41 | 1. 00 7. 00 |
| 45 . Par t - Rel at i ves, br ot her s, 267 267. 00 | 5. 63 1. 44 | 1. 00 7. 00 |
| 46 . Rel i - Rel i gi on : i mpor t anc 267 267. 00 | 3. 15 1. 96 | 1. 00 7. 00 |
| 47 . Pol i - Pol i t i c, pol i t i cal l 267 267. 00 | 3. 15 1. 79 | 1. 00 7. 00 |
| 50 . Pr Fm- St at e benef i t s : ave 244 244. 00 | 583. 10 966. 04 | 0. 00 5100. 00 |
| 51 . Sal r - Sal ar y of t he r espon 267 267. 00 | 4408. 55 4575. 34 | 0. 00 40000. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - +

CORRELATI ONS WI TH CONTI NUOUS VARI ABLES
OF VARI ABLE : Salary of the respondent
+- - - - - - - - - - - - +- - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - +
| TEST- VALUE | PROB. | CORRELATI ON | NUM . VARI ABLE LABEL | WEI GHT |
+- - - - - - - - - - - - +- - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - +
| 99. 90 | 0. 000 | 1. 000 | 51 . Sal ar y of t he r espondent | 267. 000 |
| - 2. 53 | 0. 006 | - 0. 162 | 50 . St at e benef i t s : aver age mont hl y amount | 244. 000 |
+- - - - - - - - - - - - +- - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - +

25

TABLE - CROSS TABLES

With this procedure, you can obtain in one go an unlimited number of tables for members,
means or frequencies.

THE TABLES TAB

This tab allows you to define the cross tables to create.

The tables cells can display weights, % raw, % column, average and standard deviation
depending on the parameters and settings.

The scrolling menu allows you to define the cross tables you want to display with or
without supplementary information such as mean or frequency related to another
variable.

If a variable appears in the Means column, each cell of the cross table will display the
weighted average corresponding to the cases of the cell.

If a variable appears in the Frequencies column, each cell of the cross table will display
the weighted sum of the values of the variable for the cases of the cell.

By clicking on local filter, you can define a specific filter for each command.
TABLE - Cross tables
26
THE PARAMETERS TAB

27
THE TABLE RESULTS

CROSS-TABS
LI ST OF COMMANDS
COMMAND 1
TABLE 1 BY ROW : 9 . Opi ni on about mar r i age
BY COLUMN : 3 . Sex of r espondent
COMMAND 2
TABLE 2 BY ROW : 9 . Opi ni on about mar r i age
BY COLUMN : 3 . Sex of r espondent
MEANS OF : 4 . Age of r espondent
LI ST OF CROSS- TABS
TABLE 1 BY ROW : Opinion about marriage TOTAL WEI GHT: 315.
BY COLUMN : Sex of respondent
WEI GHT | mal e | f emal e | OVERALL
COLUMN PERC. | | |
ROWPERC. | | |
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 41 | 40 | 81
i ndi ssol ubl e | 29. 71 | 22. 60 | 25. 71
| 50. 62 | 49. 38 | 100. 00
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 39 | 69 | 108
di ssol ved ser i ous pb | 28. 26 | 38. 98 | 34. 29
| 36. 11 | 63. 89 | 100. 00
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 50 | 64 | 114
di ssol ved i f agr eem | 36. 23 | 36. 16 | 36. 19
| 43. 86 | 56. 14 | 100. 00
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 8 | 4 | 12
I do not know | 5. 80 | 2. 26 | 3. 81
| 66. 67 | 33. 33 | 100. 00
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 138 | 177 | 315
OVERALL | 100. 00 | 100. 00 | 100. 00
| 43. 81 | 56. 19 | 100. 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
KHI 2 = 6. 67 / 3 DEGREES OF FREEDOM / 0 EXPECTED FREQUENCI ES LESS THAN 5
PROB. ( KHI 2 > 6. 67 ) = 0. 083 / TEST- VALUE = 1. 38
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TABLE 2 BY ROW : Opinion about marriage TOTAL WEI GHT: 315.
BY COLUMN : Sex of respondent
MEANS OF : Age of respondent
WEI GHT | mal e | f emal e | OVERALL
MEAN | | |
STD. DEV. | | |
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 41 | 40 | 81
i ndi ssol ubl e | 45. 829 | 48. 325 | 47. 062
| 17. 234 | 17. 084 | 17. 206
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 39 | 69 | 108
di ssol ved ser i ous pb | 43. 000 | 46. 362 | 45. 148
| 14. 739 | 18. 260 | 17. 148
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 50 | 64 | 114
di ssol ved i f agr eem | 41. 300 | 38. 484 | 39. 719
| 15. 442 | 14. 330 | 14. 893
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 8 | 4 | 12
I do not know | 50. 250 | 41. 250 | 47. 250
| 15. 618 | 8. 842 | 14. 377
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - -
| 138 | 177 | 315
OVERALL | 43. 645 | 43. 842 | 43. 756
| 16. 007 | 17. 015 | 16. 581

BIVAR - Bivariate Analysis
28

BIVAR - BIVARIATE ANALYSIS

The BIVAR procedure lets you characterize a sample from the viewpoint of two particular
continuous variables (AXES variables or base variables). The sample can be described by
categorical variables and by other continuous variables.

THE VARIABLES TAB

With this tab, the SPAD user selects the two continuous variables for the bivariate
analysis.

It is possible to include in the analysis some supplementary variables (whether continuous
or categorical).

The graph editor of the BIVAR method is the same that is used for factorial analyses.
The capabilities of the graph editor will be described in the section Factorial analyses.

29

BIVAR - Bivariate Analysis
30

FACTORIAL ANALYSES WITH SPAD

PCA : Principal Component Analysis (PCA)

SCA : Simple Correspondence Analysis (SCA)

MCA : Multiple Correspondence Analysis (MCA)

DEFAC : Factors description

SPAD provides the main techniques in multidimensional exploratory analysis, combined
with procedures for clustering. One area of application concerns the processing of large-
scale surveys in market research and socio-economic research.

The main applications of factorial analyses are: (1) to reduce the number of dimensions
and (2) to detect structure in the relationships between variables. Therefore, factor analysis
is applied as a data reduction or structure detection method.

Factorial Analyses with SPAD
31
VOCABULARY

Active Variables Variables used to perform the factorial analysis

Supplementary variables Variables that are not used to perform the original analysis
but used to illustrate the main results of the analysis.

Contribution Criteria that measures the contribution of an element
(category, variable, frequency or case) to the inertia (total
inertia, dimensions inertia)

Cosines Criteria that measures the quality of representation of an
element (category, variable, case or frequency) for each
dimension.

Axes, factors, dimensions These terms correspond to the factors computed or
extracted by the analysis. Consecutive factors are
uncorrelated or orthogonal to each other. Factors are
consecutively extracted by maximizing the remaining
variability in the active data.

PCA - Principal Component Analysis
32

PCA - PRINCIPAL COMPONENT ANALYSIS

This method performs the principal component analysis of a sample of cases described
with continuous variables. The analysis can be performed on original variables or normed
variables (centered and normalized) whether the active variables are on the same scale or
not.
It is possible to introduce supplementary elements such as: cases, other continuous
variables or categorical variables.

Import the Sba dataset Cars.sba.
Drag and drop the PCA method on the Cars dataset as follows.

The two goals of the analysis are:

Capture the main interrelationships between correlated variables in small number
of summary characteristics: dimension reduction

Identify automobile models with similar attributes: Useful step for developing
clustering or classification model

The dataset contains measurements on 6 variables for 24 models: cubic capacity, power,
speed, weight, length and width.

Due to strong differences in measurement scales, we will perform a PCA on normed
variables.
KIDEN
Cubic
capacity Power Speed Weight Length Width
Honda civic 1396 90 174 850 369 166
Peugeot 205 Rallye 1294 103 189 805 370 157
Seat Ibiza SX I 1461 100 181 925 363 161
Citron AX Sport 1294 95 184 730 350 160
Renault 19 1721 92 180 965 415 169
Fiat Tipo 1580 83 170 970 395 170
Peugeot 405 1769 90 180 1080 440 169
Renault 21 2068 88 180 1135 446 170
Citron BX 1769 90 182 1060 424 168
Opel Omega 1998 122 190 1255 473 177
Peugeot 405 Break 1905 125 194 1120 439 171
Ford Sierra 1993 115 185 1190 451 172
33
Renault Espace 1995 120 177 1265 436 177
Nissan Vanette 1952 87 144 1430 436 169
VW Caravelle 2109 112 149 1320 457 184
Audi 90 Quattro 1994 160 214 1220 439 169
BMW 530i 2986 188 226 1510 472 175
Rover 827i 2675 177 222 1365 469 175
Renault 25 2548 182 226 1350 471 180
BMW 325iX 2494 171 208 1600 432 164
Ford Scorpio 2933 150 200 1345 466 176
Fiat Uno 1116 58 145 780 364 155
Peugeot 205 1580 80 159 880 370 156
Ford Fiesta 1117 50 135 810 371 162

The matrix plot, performed with the STATS method, is a good overview of the pair wise
relationships between variables.

34

The SETTING OPTIONS

THE VARIABLES TAB

This tab allows the SPAD user to define the following elements:

Active continuous variables
Supplementary continuous variables
Supplementary categorical variables

In our example, we select all the available continuous variables as active. We do not have
any more available variable for supplementary information.

35
THE CASES TAB

The Cases tab allows you to define the role of the cases in the analysis.

The cases retained are the ACTIVE cases, those not retained are called ILLUSTRATIVES
or SUPPLEMENTARY. By using the selections by list or interval, we can also define the
ABANDONNED cases (which are neither active nor illustrative).

All the calculations that lead to the factorial planes, to the hierarchical classification tree
and to the final partitions are carried out only on the active cases. The illustrative cases
may be projected onto the factorial planes constructed, and re-assigned during the
partition into classes, of which they are the closest or form a missing data class.

The cases abandoned are completely ignored in the calculations and affected automatically
to a missing data class in the partitions.

If you conduct many analyses on a particular sub-population, it may be preferable to
create a BASE corresponding this one. To do this, use the Recoding chain in the Tools
menu.

In the Cars example, we select all the cases as active.

THE PARAMETERS TAB

NORMED PCA AND NOT NORMED PCA
Cases coordinates are not
displayed by default.
36

Normed PCA means that all the active variables are previously centered and standardized
by SPAD. The consequence is that all the variables are assigned the same contribution to
the overall inertia.
When the PCA is not normed (only centered), the distance between the variable and the
origin is equal to the variance of the variable.

Most of the time, it is advised to perform a normed analysis in order to assign the same
importance to each active variable. It is particularly recommended when the
measurements scales are different.

In our example, we can see that the measurements scales are strongly different. Thus, we
will perform a normed PCA.

RETAINED COORDINATES

The number of retained coordinates is useful for the methods that follow the PCA in the
chain. These methods can be DEFAC (factors description) and RECIP/SEMIS (clustering).

37
THE PCA RESULTS

PRINCIPAL COMPONENTS ANALYSIS
TOTAL COUNT : 24 TOTAL WEI GHT : 24. 00
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +
| NUM . I DEN - LABEL COUNT WEI GHT | MEAN STD. DEV. | MI NI MUM MAXI MUM |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +
| 1 . CYLI - Cubi c capaci t y 24 24. 00 | 1906. 13 516. 79 | 1116. 00 2986. 00 |
| 2 . PUI S - Power 24 24. 00 | 113. 67 37. 97 | 50. 00 188. 00 |
| 3 . VI TE - Speed 24 24. 00 | 183. 08 24. 68 | 135. 00 226. 00 |
| 4 . POI D - Wei ght 24 24. 00 | 1123. 33 243. 20 | 730. 00 1600. 00 |
| 5 . LONG - Lengt h 24 24. 00 | 421. 58 40. 47 | 350. 00 473. 00 |
| 6 . LARG - Wi dt h 24 24. 00 | 168. 83 7. 49 | 155. 00 184. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +
CORRELATI ON MATRI X
| CYLI PUI S VI TE POI D LONG LARG
- - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CYLI | 1. 00
PUI S | 0. 86 1. 00
VI TE | 0. 69 0. 89 1. 00
POI D | 0. 90 0. 77 0. 51 1. 00
LONG | 0. 86 0. 69 0. 53 0. 86 1. 00
LARG | 0. 71 0. 55 0. 36 0. 70 0. 86 1. 00
- - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The linear correlation coefficient points out the intensity of the relationship between two
continuous variable. The coefficient correlation ranges from 1 to 1. The closer the
correlation coefficient is to +1 or -1, the more closely the two variables are related.

TEST- VALUES MATRI X
- - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CYLI | 99. 99
PUI S | 6. 35 99. 99
VI TE | 4. 19 7. 06 99. 99
POI D | 7. 14 4. 99 2. 74 99. 99
LONG | 6. 42 4. 14 2. 90 6. 40 99. 99
LARG | 4. 34 3. 05 1. 86 4. 25 6. 41 99. 99
- - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This matrix is related to the previous one. SPAD translates the test of correlation in terms
of test-value. In this example, the higher is the test-value, the more closely are the two
variables. We can consider that a test-value lower than 2 means no linear relationship
between the two variables.

EI GENVALUES
COMPUTATI ONS PRECI SI ON SUMMARY : TRACE BEFORE DI AGONALI SATI ON. . 6. 0000
SUM OF EI GENVALUES. . . . . . . . . . . . 6. 0000
HI STOGRAM OF THE FI RST 6 EI GENVALUES
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| NUMBER | EI GENVALUE | PERCENTAGE | CUMULATED | |
| | | | PERCENTAGE | |
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 | 4. 6173 | 76. 96 | 76. 96 | ********************************************************************************|
| 2 | 0. 8788 | 14. 65 | 91. 60 | **************** |
| 3 | 0. 3035 | 5. 06 | 96. 66 | ****** |
| 4 | 0. 1055 | 1. 76 | 98. 42 | ** |
| 5 | 0. 0732 | 1. 22 | 99. 64 | ** |
| 6 | 0. 0216 | 0. 36 | 100. 00 | * |
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

In the second column (Eigenvalue) above, we find the variance on the new factors that
were successively extracted. In the third column, these values are expressed as a percent of
the total variance. As we can see, factor 1 accounts for 77 percent of the variance, factor 2
for 15 percent, and so on. As expected, the sum of the eigenvalues is equal to the number
38
of variables. The third column contains the cumulative variance extracted. The variances
extracted by the factors are called the eigenvalues. This name derives from the
computational issues involved.

Eigenvalues and the Number-of-Factors Problem
Now that we have a measure of how much variance each successive factor extracts, we can
return to the question of how many factors to retain. By its nature this is an arbitrary
decision. However, there are some guidelines that are commonly used, and that, in
practice, seem to yield the best results.
The Kaiser criterion. First, we can retain only factors with eigenvalues greater than 1. In
essence this is like saying that, unless a factor extracts at least as much as the equivalent of
one original variable, we drop it. This criterion was proposed by Kaiser (1960), and is
probably the one most widely used. In our example above, using this criterion, we would
retain 1 factor (principal component).
The scree test. A graphical method is the scree test first proposed by Cattell (1966). We can
plot the eigenvalues shown above in a simple line plot.

0,0
1,0
2,0
3,0
4,0
5,0
1 2 3 4 5 6

Cattell suggests to find the place where the smooth decrease of eigenvalues appears to
level off to the right of the plot. To the right of this point, presumably, one finds only
"factorial scree" -- "scree" is the geological term referring to the debris which collects on the
lower part of a rocky slope. According to this criterion, we would probably retain 1 or 2
factors in our example.

RESEARCH OF I RREGULARI TI ES ( THI RD DI FFERENCES)
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| I RREGULARI TY | I RREGULARI TY | |
| BETWEEN | VALUE | |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 - - 2 | - 2785. 86 | **************************************************** |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
RESEARCH OF I RREGULARI TI ES ( SECOND DI FFERENCES)
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 - - 2 | 3163. 20 | **************************************************** |
| 2 - - 3 | 377. 34 | ******* |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
39
ANDERSON'S LAPLACE I NTERVALS
WI TH 0.95 THRESHOLD
+- - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| NUMBER | LOWER LI MI T EI GENVALUE UPPER LI MI T |
+- - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 | 1. 9486 4. 6173 7. 2860 |
| 2 | 0. 3709 0. 8788 1. 3868 |
| 3 | 0. 1281 0. 3035 0. 4789 |
| 4 | 0. 0445 0. 1055 0. 1665 |
| 5 | 0. 0309 0. 0732 0. 1154 |
+- - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
LENGTH AND RELATI VE POSI TI ON OF I NTERVALS
1 . . . . . . . . . . . . . . . . . *- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - *.
2 . . . *- - - - - - - - +- - - - - - - - *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 . *- - +- - *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 *+* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5 +*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Third and second differences as well as Andersons laplace intervals are other guidelines
to help the SPAD User to choose the number of dimensions to retain for further analyses.

LOADI NGS OF VARI ABLES ON AXES 1 TO 5
ACTI VE VARI ABLES
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
VARI ABLES | LOADI NGS | VARI ABLE- FACTOR CORRELATI ONS | NORMED EI GENVECTORS
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I DEN - SHORT LABEL | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CYLI - Cubi c capaci t y| 0. 96 0. 01 - 0. 15 0. 04 - 0. 23 | 0. 96 0. 01 - 0. 15 0. 04 - 0. 23 | 0. 45 0. 01 - 0. 27 0. 11 - 0. 84
PUI S - Power | 0. 90 0. 38 - 0. 02 - 0. 16 0. 04 | 0. 90 0. 38 - 0. 02 - 0. 16 0. 04 | 0. 42 0. 41 - 0. 03 - 0. 49 0. 15
VI TE - Speed | 0. 75 0. 62 0. 20 0. 08 0. 04 | 0. 75 0. 62 0. 20 0. 08 0. 04 | 0. 35 0. 66 0. 37 0. 26 0. 13
POI D - Wei ght | 0. 91 - 0. 18 - 0. 35 - 0. 06 0. 11 | 0. 91 - 0. 18 - 0. 35 - 0. 06 0. 11 | 0. 42 - 0. 19 - 0. 63 - 0. 18 0. 42
LONG - Lengt h | 0. 92 - 0. 30 0. 05 0. 22 0. 07 | 0. 92 - 0. 30 0. 05 0. 22 0. 07 | 0. 43 - 0. 32 0. 10 0. 69 0. 26
LARG - Wi dt h | 0. 80 - 0. 48 0. 34 - 0. 14 - 0. 02 | 0. 80 - 0. 48 0. 34 - 0. 14 - 0. 02 | 0. 37 - 0. 51 0. 62 - 0. 42 - 0. 06
- - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For normed PCA, correlations (variable factor) and loadings are equivalent.
Apparently, the first factor is generally more highly correlated with the variables than the
second factor. This is to be expected because, as previously described, these factors are
extracted successively and will account for less and less variance overall.

Normed eigen vectors are the coefficients that describe the linear relationship between the
active normed variables and the factors: in this example, we have:

... 35 . 0
) (
) (
42 . 0
) (
) (
45 . 0 1 +

=
PUIS STDEV
PUIS Mean PUIS
CYLI STDEV
CYLI Mean CYLI
Factor

Note:
SPAD does not print out neither the contributions nor the cosinus for the active variables.
However, it is possible to calculate them this way:

) , ( ) , ( j Loading j Cos = for a normed PCA
) , ( ) , ( j n Correlatio j Cos = for both normed and not normed PCA
and
) , ( ) , ( j nVector NormedEige j on Contributi =

40
FACTOR SCORES, CONTRI BUTI ONS AND SQUARED COSI NES OF CASES
AXES 1 TO 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - +
| CASES | FACTOR SCORES | CONTRI BUTI ONS | SQUARED COSI NES |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - |
| I DENTI FI ER REL. WT. DI STO | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| Honda ci vi c 4. 17 4. 59 | - 2. 01 0. 32 0. 50 - 0. 44 - 0. 10 | 3. 6 0. 5 3. 4 7. 6 0. 6 | 0. 88 0. 02 0. 05 0. 04 0. 00 |
| Peugeot 205 Ral l ye 4. 17 7. 37 | - 2. 25 1. 49 0. 14 0. 09 0. 19 | 4. 6 10. 6 0. 3 0. 3 2. 1 | 0. 69 0. 30 0. 00 0. 00 0. 00 |
| Seat I bi za SX I 4. 17 4. 73 | - 1. 92 0. 94 - 0. 06 - 0. 36 0. 00 | 3. 3 4. 2 0. 1 5. 0 0. 0 | 0. 78 0. 19 0. 00 0. 03 0. 00 |
| Ci t r on AX Spor t 4. 17 8. 78 | - 2. 60 1. 29 0. 47 - 0. 32 - 0. 15 | 6. 1 7. 9 3. 0 4. 0 1. 2 | 0. 77 0. 19 0. 02 0. 01 0. 00 |
| Renaul t 19 4. 17 0. 92 | - 0. 78 - 0. 16 0. 48 0. 20 - 0. 12 | 0. 6 0. 1 3. 1 1. 6 0. 8 | 0. 66 0. 03 0. 25 0. 04 0. 01 |
| Fi at Ti po 4. 17 2. 18 | - 1. 30 - 0. 43 0. 43 - 0. 22 - 0. 10 | 1. 5 0. 9 2. 5 2. 0 0. 6 | 0. 77 0. 09 0. 08 0. 02 0. 00 |
| Peugeot 405 4. 17 0. 71 | - 0. 30 - 0. 46 0. 21 0. 58 0. 16 | 0. 1 1. 0 0. 6 13. 1 1. 4 | 0. 12 0. 30 0. 06 0. 47 0. 04 |
| Renaul t 21 4. 17 0. 96 | 0. 15 - 0. 64 0. 01 0. 67 - 0. 21 | 0. 0 1. 9 0. 0 17. 8 2. 5 | 0. 02 0. 42 0. 00 0. 47 0. 05 |
| Ci t r on BX 4. 17 0. 54 | - 0. 52 - 0. 20 0. 17 0. 40 0. 04 | 0. 2 0. 2 0. 4 6. 2 0. 1 | 0. 50 0. 07 0. 06 0. 29 0. 00 |
| Opel Omega 4. 17 3. 25 | 1. 45 - 0. 79 0. 51 0. 31 0. 42 | 1. 9 3. 0 3. 5 3. 7 10. 0 | 0. 64 0. 19 0. 08 0. 03 0. 05 |
| Peugeot 405 Br eak 4. 17 0. 55 | 0. 57 0. 13 0. 39 0. 15 0. 19 | 0. 3 0. 1 2. 0 0. 9 2. 1 | 0. 58 0. 03 0. 27 0. 04 0. 07 |
| For d Si er r a 4. 17 0. 82 | 0. 70 - 0. 43 0. 14 0. 30 0. 16 | 0. 4 0. 9 0. 3 3. 5 1. 4 | 0. 60 0. 23 0. 02 0. 11 0. 03 |
| Renaul t Espace 4. 17 1. 77 | 0. 86 - 0. 87 0. 20 - 0. 44 0. 13 | 0. 7 3. 6 0. 5 7. 7 0. 9 | 0. 42 0. 43 0. 02 0. 11 0. 01 |
| Ni ssan Vanet t e 4. 17 4. 73 | - 0. 11 - 1. 69 - 1. 33 - 0. 05 0. 24 | 0. 0 13. 6 24. 4 0. 1 3. 3 | 0. 00 0. 61 0. 38 0. 00 0. 01 |
| VWCar avel l e 4. 17 7. 58 | 1. 14 - 2. 39 0. 21 - 0. 69 - 0. 06 | 1. 2 27. 1 0. 6 18. 7 0. 2 | 0. 17 0. 75 0. 01 0. 06 0. 00 |
| Audi 90 Quat t r o 4. 17 3. 43 | 1. 39 1. 10 0. 19 - 0. 03 0. 48 | 1. 7 5. 7 0. 5 0. 0 13. 0 | 0. 56 0. 35 0. 01 0. 00 0. 07 |
| BMW530i 4. 17 15. 98 | 3. 88 0. 85 - 0. 35 - 0. 04 - 0. 30 | 13. 6 3. 4 1. 7 0. 1 5. 1 | 0. 94 0. 04 0. 01 0. 00 0. 01 |
| Rover 827i 4. 17 10. 52 | 3. 15 0. 75 0. 13 0. 05 - 0. 13 | 8. 9 2. 7 0. 2 0. 1 0. 9 | 0. 94 0. 05 0. 00 0. 00 0. 00 |
| Renaul t 25 4. 17 12. 39 | 3. 39 0. 57 0. 71 - 0. 23 0. 07 | 10. 4 1. 5 6. 9 2. 1 0. 3 | 0. 93 0. 03 0. 04 0. 00 0. 00 |
| BMW325i X 4. 17 8. 92 | 2. 20 1. 17 - 1. 59 - 0. 24 0. 32 | 4. 4 6. 5 34. 6 2. 3 6. 0 | 0. 54 0. 15 0. 28 0. 01 0. 01 |
| For d Scor pi o 4. 17 8. 28 | 2. 74 - 0. 15 - 0. 19 0. 13 - 0. 83 | 6. 8 0. 1 0. 5 0. 6 39. 1 | 0. 91 0. 00 0. 00 0. 00 0. 08 |
| Fi at Uno 4. 17 14. 29 | - 3. 73 0. 03 - 0. 50 0. 19 0. 01 | 12. 6 0. 0 3. 5 1. 4 0. 0 | 0. 97 0. 00 0. 02 0. 00 0. 00 |
| Peugeot 205 4. 17 7. 70 | - 2. 60 0. 46 - 0. 72 0. 12 - 0. 39 | 6. 1 1. 0 7. 1 0. 6 8. 4 | 0. 88 0. 03 0. 07 0. 00 0. 02 |
| For d Fi est a 4. 17 12. 99 | - 3. 49 - 0. 87 - 0. 13 - 0. 11 - 0. 03 | 11. 0 3. 6 0. 2 0. 5 0. 1 | 0. 94 0. 06 0. 00 0. 00 0. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +

DISTO : the distance between the case and the center of gravity of the overall sample. This
is helpful to determine the Average cars, (close to the center of gravity) and the more
specific ones that are far from the center of gravity.

41
THE FACTORIAL GRAPH EDITOR

To access the factorial graph editor, click on this icon .

To create a new factorial graph, select Graph - New , the following window
appears:

The preselection step allows you to select the different elements to display in the graph:

Active or supplementary cases
Active or supplementary variables

If you forget to select an element, you have to create a new graph and redo the
preselection.

THE TOOL BAR OF THE GRAPH EDITOR

Points Total Delete Cancel
selection Unselection the labels the ghosts

Factors Framing Write Set
selection selection the labels as ghost

42
Information Vertical Correlation
on points symmetric view circle

Refresh Horizontal
symmetric view

SAVE A GRAPH

Internal save is dependent on the chain.
In the case of a re-execution of the chain, or the deletion by the user of the results of the
chain, these internal saves are deleted.
This type of save uses the commands:
Save
Save as internal save of the graphics menu.

When you save in internal format, you give a TITLE to the saved graphic.
Later you can reload this save with the command Open Internal save graphics menu.

The utility of the Save in Internal Format is that all the functions of the annotations and
properties of the factorial planes remain available.

The save in archive format is a save, which is independent of the chain.

This type of save is made using the command Save as Save archive on the graphics
menu.
When saving in archive format, you give a NAME to the graphic saved with the obligatory
extension .GFA.

Later, you can recover this save with the command Open -Save archive in the Graphics
menu.
This save is independent of the chain. Some formats are no longer possible in this type of
save, in particular the formatting of cases.

The editor for the factorial planes also lets you save the graphics in .BMP or .PCX format.
These images can then be inserted into a word processor document.
The EMF Metafile format gives the best image quality.
This type of Save is made with the command Save as - Screen Image BMP/PCX.
43
GENERAL PRINCIPLES

The construction of a graphic after an analysis requires the following general principles:

Go to the New Graphics Menu, which opens the pre-selections Dialogue Box.
For a single analysis, you can open several graphics at once through the Graphics Menu
and make different pre-selections. All the graphics you create can be saved in an internal
or the archive format.

To modify your graph, apply the following rule:

Select the points with the tool bar or the selection menu
Format them with the format menu
Deselect to see the effect of the embellishments.

IMPORTANT
To manipulate (move, change etc.) the labels and the texts on a graphic, enlarge the frame.
For this you have to be in standard mode, that is: no selection mode button is highlighted,
and the status bar is empty.

44

45

SCA - SIMPLE CORRESPONDENCE ANALYSIS

This procedure performs a simple correspondence analysis (SCA) on a contingency table
or a table with non-negative numbers.

Simple correspondence analysis is a powerful statistical tool for the graphical analysis of
contingency tables.

The result of a simple correspondence analysis is a two-dimensional graphical
representation of the association between rows and columns of the table.
The plot contains a point for each row and each column of the table. Rows with similar
patterns of counts produce points that are close together, and columns with similar
patterns of counts produce points that are close together.

Simple correspondence analysis analyzes a contingency table made up of one or more
column variables and one or more row variables.

To illustrate this method, consider the following dataset, a typical two-dimensional
contingency table. The data deal with the perception of different kinds of alcohol.

Select the SPAD dataset ALCOOL.SBA and import it.

PASTIS WHISKY MARTINI SUZE VODKA GIN MALIBU BEER
Like the taste 49 50 42 18 25 23 25 59
With friends 83 83 76 60 69 68 69 74
To relax oneself 61 61 51 32 38 39 39 72
Become expensive 60 88 42 41 75 70 61 19
Refreshing 78 22 18 19 17 19 14 80
Not elegant 26 11 13 17 13 11 13 29
Friendly product 64 64 56 34 45 42 46 68
Good before meals 88 79 85 64 45 46 37 41
Good during the day 24 21 12 10 13 12 13 85
Good during evening 7 61 12 11 53 50 48 54
For all year long 83 87 85 79 83 82 80 90
Liked by youngs 45 77 36 16 65 69 76 89
Good for guests 88 92 87 60 70 67 67 81
Oldy, not trendy 12 4 13 38 5 6 8 7
As well for men as for women 50 62 69 43 49 51 61 60
Close to me 38 41 27 11 16 18 17 49
By habits 36 30 24 16 19 19 17 40
Make snobish 3 35 9 8 28 25 21 4
We can mix it 43 87 29 32 82 80 43 40
For night life / bars / nightclubs 12 91 27 16 84 81 72 67
SCA - Simple correspondence analysis
46
The SETTING OPTIONS

THE COLUMNS TAB

Active frequencies: all

THE ROWS TAB

This tab is exactly similar to the Cases tabs available for the descriptive statistics
methods.

47
THE PARAMETERS TAB

In order to display the rows
results in excel sheets, click
on the Options button
and select Yes
SCA - Simple correspondence analysis
48
THE SCA RESULTS

SIMPLE CORRESPONDENCE ANALYSIS
EI GENVALUES
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 | 0. 0664 | 49. 37 | 49. 37 | ******************************************************************************** |
| 2 | 0. 0449 | 33. 34 | 82. 72 | ******************************************************* |
| 3 | 0. 0124 | 9. 24 | 91. 96 | *************** |
| 4 | 0. 0069 | 5. 14 | 97. 09 | ********* |
| 5 | 0. 0029 | 2. 18 | 99. 27 | **** |
| 6 | 0. 0008 | 0. 63 | 99. 90 | ** |
| 7 | 0. 0001 | 0. 10 | 100. 00 | * |
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
COORDI NATES, CONTRI BUTI ONS OF FREQUENCI ES ON AXES 1 TO 5
ACTI VE FREQUENCI ES
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| FREQUENCI ES | COORDI NATES | CONTRI BUTI ONS | SQUARED COSI NES |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - |
| I DEN LABEL REL. WT DI STO | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| PAST - PASTI S 13. 12 0. 17 | - 0. 36 - 0. 05 0. 16 0. 11 - 0. 04 | 26. 3 0. 6 26. 5 23. 5 8. 3 | 0. 76 0. 01 0. 14 0. 07 0. 01 |
| WHI S - WHI SKY 15. 83 0. 05 | 0. 19 0. 02 0. 09 - 0. 02 0. 09 | 8. 4 0. 1 9. 7 0. 6 39. 5 | 0. 67 0. 01 0. 15 0. 00 0. 14 |
| MART - MARTI NI 11. 23 0. 11 | - 0. 17 - 0. 21 0. 09 - 0. 17 0. 00 | 4. 9 10. 5 7. 2 49. 7 0. 0 | 0. 26 0. 38 0. 07 0. 28 0. 00 |
| SUZE - SUZE 8. 63 0. 30 | - 0. 22 - 0. 43 - 0. 24 0. 05 0. 04 | 6. 3 35. 6 40. 7 3. 2 3. 9 | 0. 16 0. 62 0. 20 0. 01 0. 00 |
| VODK - VODKA 12. 35 0. 10 | 0. 30 0. 00 - 0. 01 0. 06 0. 00 | 16. 8 0. 0 0. 0 7. 2 0. 0 | 0. 94 0. 00 0. 00 0. 04 0. 00 |
| GI N - GI N 12. 13 0. 08 | 0. 28 0. 00 - 0. 01 0. 06 - 0. 01 | 14. 3 0. 0 0. 1 5. 9 0. 7 | 0. 94 0. 00 0. 00 0. 04 0. 00 |
| MALI - MALI BU 11. 42 0. 07 | 0. 21 0. 02 - 0. 06 - 0. 07 - 0. 11 | 7. 9 0. 1 3. 0 8. 7 45. 9 | 0. 67 0. 00 0. 05 0. 08 0. 17 |
| BI ER - BEER 15. 30 0. 23 | - 0. 26 0. 39 - 0. 10 - 0. 02 0. 02 | 15. 2 53. 1 12. 7 1. 1 1. 7 | 0. 28 0. 67 0. 04 0. 00 0. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
COORDI NATES, CONTRI BUTI ONS AND SQUARED COSI NES OF CASES
AXES 1 TO 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| CASES | COORDI NATES | CONTRI BUTI ONS | SQUARED COSI NES |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - |
| I DENTI FI ER REL. WT. DI STO | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| Li ke t he t ast e 4. 02 0. 08 | - 0. 21 0. 10 0. 12 - 0. 08 0. 06 | 2. 6 0. 9 4. 4 3. 9 5. 2 | 0. 55 0. 13 0. 18 0. 09 0. 05 |
| Wi t h f r i ends 8. 04 0. 01 | - 0. 04 - 0. 10 0. 00 - 0. 01 - 0. 04 | 0. 1 1. 9 0. 0 0. 2 3. 9 | 0. 09 0. 79 0. 00 0. 01 0. 10 |
| To r el ax onesel f 5. 43 0. 03 | - 0. 14 0. 04 0. 04 - 0. 04 0. 02 | 1. 6 0. 2 0. 7 1. 1 0. 7 | 0. 79 0. 07 0. 06 0. 06 0. 02 |
| Become expensi ve 6. 30 0. 12 | 0. 25 - 0. 19 0. 09 0. 11 - 0. 03 | 5. 7 5. 0 4. 2 10. 1 1. 8 | 0. 51 0. 30 0. 07 0. 09 0. 01 |
| Ref r eshi ng 3. 69 0. 48 | - 0. 56 0. 30 0. 07 0. 25 - 0. 07 | 17. 5 7. 3 1. 5 33. 0 6. 9 | 0. 66 0. 19 0. 01 0. 13 0. 01 |
| Not el egant 1. 84 0. 14 | - 0. 32 0. 03 - 0. 12 0. 11 - 0. 08 | 2. 9 0. 0 2. 0 3. 0 3. 9 | 0. 76 0. 01 0. 10 0. 08 0. 05 |
| Fr i endl y pr oduct 5. 79 0. 01 | - 0. 10 0. 00 0. 05 - 0. 04 - 0. 01 | 0. 8 0. 0 1. 2 1. 6 0. 2 | 0. 67 0. 00 0. 18 0. 13 0. 01 |
| Good bef or e meal s 6. 70 0. 14 | - 0. 18 - 0. 30 0. 11 - 0. 03 0. 06 | 3. 1 13. 0 6. 7 0. 8 8. 5 | 0. 23 0. 64 0. 09 0. 01 0. 03 |
| Good dur i ng t he day 2. 62 0. 69 | - 0. 43 0. 66 - 0. 25 - 0. 04 0. 11 | 7. 2 25. 1 13. 0 0. 5 10. 6 | 0. 26 0. 63 0. 09 0. 00 0. 02 |
| Good dur i ng eveni ng 4. 09 0. 25 | 0. 40 0. 26 - 0. 12 - 0. 01 0. 03 | 10. 0 6. 0 5. 1 0. 0 0. 9 | 0. 66 0. 27 0. 06 0. 00 0. 00 |
| For al l year l ong 9. 24 0. 02 | - 0. 02 - 0. 11 - 0. 08 - 0. 01 - 0. 03 | 0. 1 2. 7 4. 2 0. 3 3. 7 | 0. 02 0. 60 0. 26 0. 01 0. 05 |
| Li ked by youngs 6. 53 0. 09 | 0. 17 0. 22 - 0. 02 - 0. 03 - 0. 09 | 2. 8 7. 0 0. 2 0. 7 17. 5 | 0. 33 0. 55 0. 01 0. 01 0. 09 |
| Good f or guest s 8. 45 0. 02 | - 0. 06 - 0. 10 0. 03 - 0. 04 - 0. 01 | 0. 5 1. 7 0. 7 2. 2 0. 2 | 0. 23 0. 57 0. 07 0. 11 0. 00 |
| Ol dy, not t r endy 1. 28 1. 41 | - 0. 46 - 0. 84 - 0. 68 0. 11 0. 08 | 4. 1 20. 2 47. 5 2. 3 2. 9 | 0. 15 0. 50 0. 33 0. 01 0. 00 |
| As wel l f or men as f or w 6. 15 0. 03 | - 0. 01 - 0. 09 - 0. 02 - 0. 14 - 0. 06 | 0. 0 1. 2 0. 3 16. 3 6. 6 | 0. 00 0. 28 0. 02 0. 59 0. 10 |
| Cl ose t o me 3. 00 0. 11 | - 0. 22 0. 19 0. 13 - 0. 05 0. 10 | 2. 2 2. 3 4. 1 1. 0 9. 6 | 0. 42 0. 30 0. 15 0. 02 0. 08 |
| By habi t s 2. 78 0. 05 | - 0. 21 0. 08 0. 06 0. 02 0. 03 | 1. 8 0. 4 0. 8 0. 2 0. 6 | 0. 80 0. 11 0. 06 0. 01 0. 01 |
| Make snobi sh 1. 84 0. 40 | 0. 61 - 0. 09 0. 03 0. 02 0. 09 | 10. 4 0. 4 0. 1 0. 1 4. 6 | 0. 95 0. 02 0. 00 0. 00 0. 02 |
| We can mi x i t 6. 02 0. 13 | 0. 31 - 0. 03 0. 03 0. 16 0. 07 | 8. 6 0. 1 0. 5 22. 3 11. 4 | 0. 72 0. 01 0. 01 0. 19 0. 04 |
| For ni ght l i f e / bar s / 6. 21 0. 23 | 0. 44 0. 18 - 0. 07 - 0. 02 0. 01 | 17. 9 4. 4 2. 7 0. 3 0. 1 | 0. 84 0. 14 0. 02 0. 00 0. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +

49

The following graph has been designed with the SPAD Amado procedure.
Using the SCA results, rows and columns are ranked by decreasing first factor
coordinates. It gives a visual structure to the table. The width of a column is proportional
to its frequency.

28
84
53
82
75
65
49
83
69 70
45
38
45
25
19 16 13 13
5
17
25
81
50
80
70 69
51
82
68 67
42 39
46
23 19 18
11 12
6
19
21
72
48
43
61
76
61
80
69 67
46
39 37
25
17 17 13 13
8
14
35
91
61
87 88
77
62
87 83
92
64 61
79
50
30
41
11
21
4
22
9
27
12
29
42
36
69
85
76
87
56
51
85
42
24 27
13 12 13
18
8
16
11
32
41
16
43
79
60 60
34 32
64
18 16
11
17
10
38
19
4
67
54
40
19
89
60
90
74
81
68 72
41
59
40
49
29
85
7
80
3
12
7
43
60
45
50
83 83
88
64 61
88
49
36 38
26 24
12
78
VODKA
GIN
MALIBU
WHISKY
MARTINI
SUZE
BEER
PASTIS
M
a
k
e

s
n
o
b
i
s
h

F
o
r

n
i
g
h
t

l
i
f
e

/

b
a
r
s

/

n
i
g
h
t
c
l
u
b
s

G
o
o
d

d
u
r
i
n
g

e
v
e
n
i
n
g

W
e

c
a
n

m
i
x

i
t

B
e
c
o
m
e

e
x
p
e
n
s
i
v
e

L
i
k
e
d

b
y

y
o
u
n
g
s

A
s

w
e
l
l

f
o
r

m
e
n

a
s

f
o
r

w
o
m
e
n

F
o
r

a
l
l

y
e
a
r

l
o
n
g

W
i
t
h

f
r
i
e
n
d
s

G
o
o
d

f
o
r

g
u
e
s
t
s

F
r
i
e
n
d
l
y

p
r
o
d
u
c
t

T
o

r
e
l
a
x

o
n
e
s
e
l
f

G
o
o
d

b
e
f
o
r
e

m
e
a
l
s

L
i
k
e

t
h
e

t
a
s
t
e

B
y

h
a
b
i
t
s

C
l
o
s
e

t
o

m
e

N
o
t

e
l
e
g
a
n
t

G
o
o
d

d
u
r
i
n
g

t
h
e

d
a
y

O
l
d
y
.

n
o
t

t
r
e
n
d
y

R
e
f
r
e
s
h
i
n
g

MCA - Multiple Correspondence Analysis
50

MCA - MULTIPLE CORRESPONDENCE ANALYSIS

The multiple correspondence analysis extends the simple correspondence analysis
properties to n-way tables.
The procedure requires more than 2 active categorical variables, observed on a set of cases.
As well as for the other factorial analyses, it is possible to add some supplementary
elements such as illustrative cases, illustrative continuous or categorical variables.

We will perform the MCA on the ASPI1000.SBA dataset.

VARIABLES DESCRIPTION OF THE ASPI1000.SBA DATASET

ACTIVE CATEGORICAL VARIABLES - 7 VARIABLES - 28 CATEGORIES
11 . Gender ( 2 categories )
29 . Do you own securities ? ( 2 categories )
39 . Urban area size (number of inhabitants) ( 5 categories )
49 . Job category ( 5 categories )
51 . Diploma in 5 categories ( 5 categories )
52 . Occupation status of housing in 4 categories ( 4 categories )
53 . Age in 5 categories ( 5 categories )

SUPPLEMENTARY CATEGORICAL VARIABLES - 35 VARIABLES - 152 CATEGORIES
All available categorical variables

SUPPLEMENTARY CONTINUOUS VARIABLES - 8 VARIABLES
All available continuous variables

51
The SETTING OPTIONS

THE VARIABLES TAB

52
THE PARAMETERS TAB

C Random assignment of active categories inferior to (in %)
To assure the robustness of the analysis, it may be useful, on the definition of the
axes of the analysis, to take into account only the categorical variables of a sufficient
weight.
For each question, the cases concerned by a weak total weight category will be
assigned at random to one of the other categories of the variable with a sufficient
weight in the question considered. This cleaning operation allows the data table to
conserve its completely disjunctive property.

The parameter PCMIN fixes the percentage of the total weight of the active cases
below which a category is considered to have a weight too weak. If all the cases
have the weight 1, PCMIN is the percentage of the number of active cases below
which a category will be broken down.

If all the categories for a question (or all except one) have too weak weight, the
question itself will be made illustrative for the calculation of the axes.
The default value (2%) is suitable for most analyses. If the parameter is set to 0.0,
only the categories with a null weight will be eliminated.

C Retained coordinates
The number of retained coordinates is useful for the methods that follow the MCA
in the chain. These methods can be DEFAC (factors description) and RECIP/SEMIS
(clustering).
By default, cases coordinates
are not displayed.
C
C
53
THE MCA RESULTS

MULTIPLE CORRESPONDENCE ANALYSIS

ELI MI NATI ON OF ACTI VE CATEGORI ES WI TH SMALL WEI GHTS

THRESHOLD ( PCMI N) : 2. 00 % WEI GHT: 20. 00
BEFORE CLEANI NG : 7 ACTI VE QUESTI ONS 28 ASSOCI ATE CATEGORI ES
AFTER CLEANI NG : 7 ACTI VE QUESTI ONS 28 ASSOCI ATE CATEGORI ES
TOTAL WEI GHT OF ACTI VE CASES : 1000. 00

MARGI NAL DI STRI BUTI ONS OF ACTI VE QUESTI ONS
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CATEGORI ES | BEFORE CLEANI NG | AFTER CLEANI NG
I DENT LABEL | COUNT WEI GHT | COUNT WEI GHT HI STOGRAM OF RELATI VE WEI GHTS,
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
11 . Gender
masc - mal e | 469 469. 00 | 469 469. 00 *****************************
f mi - gender | 531 531. 00 | 531 531. 00 ********************************
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
29 . Do you own some secur i t i es ?
vmo1 - Yes | 121 121. 00 | 121 121. 00 ********
vmo2 - No | 879 879. 00 | 879 879. 00 *************************************************
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
39 . Ur ban ar ea si ze ( number of i nhabi t ant s)
agg1 - Lower t han 2. 000 | 83 83. 00 | 83 83. 00 *****
agg2 - 2. 000 - 20. 000 | 87 87. 00 | 87 87. 00 ******
agg3 - 20. 000 - 100. 000 | 175 175. 00 | 175 175. 00 ***********
agg4 - gr eat er t han 100. 000 | 329 329. 00 | 329 329. 00 ********************
agg5 - Par i s | 326 326. 00 | 326 326. 00 ********************
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
49 . J ob cat egor y
emp1 - Wor ker | 263 263. 00 | 263 263. 00 ****************
emp2 - Empl oyee | 335 335. 00 | 335 335. 00 *********************
emp3 - Manager | 229 229. 00 | 229 229. 00 **************
emp4 - Ot her | 48 48. 00 | 48 48. 00 ==RAND. ASSI GN. ==
49_ - mi ssi ng cat egor y | 125 125. 00 | 125 125. 00 ********
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
51 . Di pl oma i n 5 cat egor i es
di e1 - No one | 189 189. 00 | 189 189. 00 ************
di e2 - CEP | 321 321. 00 | 321 321. 00 ********************
di e3 - BEPC- BE- BEPS | 158 158. 00 | 158 158. 00 **********
di e4 - Bac - Br evet sup. | 182 182. 00 | 182 182. 00 ***********
di e5 - Uni ver si t y | 150 150. 00 | 150 150. 00 **********
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
52 . Occupat i on st at us of housi ng i n 4 cat egor i es
sl o1 - homeowner | 120 120. 00 | 120 120. 00 ********
sl o2 - owner | 290 290. 00 | 290 290. 00 ******************
sl o3 - t enant | 523 523. 00 | 523 523. 00 ********************************
sl o4 - f r ee housi ng, ot her | 67 67. 00 | 67 67. 00 *****
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
53 . Age i n 5 cat egor i es
agc1 - Lower t han 25 yo | 150 150. 00 | 150 150. 00 **********
agc2 - 25 t o 34 yo | 284 284. 00 | 284 284. 00 ******************
agc3 - 35 t o 49 yo | 209 209. 00 | 209 209. 00 *************
agc4 - 50 t o 64 yo | 188 188. 00 | 188 188. 00 ************
agc5 - 65 yo and mor e | 169 169. 00 | 169 169. 00 ***********
- - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

54

EI GENVALUES
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 | 0. 2703 | 9. 46 | 9. 46 | ******************************************************************************** |
| 2 | 0. 2369 | 8. 29 | 17. 75 | *********************************************************************** |
| 3 | 0. 2084 | 7. 29 | 25. 05 | ************************************************************** |
| 4 | 0. 1922 | 6. 73 | 31. 77 | ********************************************************* |
| 5 | 0. 1846 | 6. 46 | 38. 23 | ******************************************************* |
| 6 | 0. 1578 | 5. 52 | 43. 76 | *********************************************** |
| 7 | 0. 1534 | 5. 37 | 49. 13 | ********************************************** |
| 8 | 0. 1493 | 5. 23 | 54. 35 | ********************************************* |
| 9 | 0. 1441 | 5. 04 | 59. 40 | ******************************************* |
| 10 | 0. 1398 | 4. 89 | 64. 29 | ****************************************** |
| 11 | 0. 1326 | 4. 64 | 68. 93 | **************************************** |
| 12 | 0. 1300 | 4. 55 | 73. 48 | *************************************** |
| 13 | 0. 1284 | 4. 49 | 77. 97 | ************************************** |
| 14 | 0. 1222 | 4. 28 | 82. 25 | ************************************* |
| 15 | 0. 1070 | 3. 74 | 86. 00 | ******************************** |
| 16 | 0. 1015 | 3. 55 | 89. 55 | ******************************* |
| 17 | 0. 0954 | 3. 34 | 92. 89 | ***************************** |
| 18 | 0. 0821 | 2. 87 | 95. 76 | ************************* |
| 19 | 0. 0748 | 2. 62 | 98. 38 | *********************** |
| 20 | 0. 0462 | 1. 62 | 100. 00 | ************** |
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
RESEARCH OF I RREGULARI TI ES ( THI RD DI FFERENCES)
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 5 - - 6 | - 27. 77 | **************************************************** |
| 14 - - 15 | - 10. 42 | ******************** |
| 17 - - 18 | - 6. 67 | ************* |
| 13 - - 14 | - 5. 44 | *********** |
| 10 - - 11 | - 3. 77 | ******** |
| 2 - - 3 | - 3. 66 | ******* |
| 8 - - 9 | - 1. 53 | *** |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
RESEARCH OF I RREGULARI TI ES ( SECOND DI FFERENCES)
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 5 - - 6 | 22. 31 | **************************************************** |
| 2 - - 3 | 12. 28 | ***************************** |
| 14 - - 15 | 9. 83 | *********************** |
| 3 - - 4 | 8. 62 | ********************* |
| 1 - - 2 | 4. 94 | ************ |
| 10 - - 11 | 4. 67 | *********** |
| 11 - - 12 | 0. 90 | *** |
| 8 - - 9 | 0. 81 | ** |
| 6 - - 7 | 0. 40 | * |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

Irregularity 2
nd
diff between 5 and 6 = [ (
7

6
) (
6

5
) ] * 1000

The two tables below are the equivalent of the scree test (or Cattel test).
This procedure detects the main irregularities in the graph and ranks them by decreasing
importance.

55
LOADI NGS, CONTRI BUTI ONS AND SQUARED COSI NES OF ACTI VE CATEGORI ES
AXES 1 TO 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| CATEGORI ES | LOADI NGS | CONTRI BUTI ONS | SQUARED COSI NES |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - |
| I DEN - LABEL REL. WT. DI STO | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| 11 . Gender |
| masc - mal e 6. 70 1. 13 | - 0. 29 0. 08 0. 43 - 0. 47 - 0. 25 | 2. 1 0. 2 6. 0 7. 6 2. 3 | 0. 07 0. 01 0. 16 0. 19 0. 06 |
| f mi - gender 7. 59 0. 88 | 0. 26 - 0. 07 - 0. 38 0. 41 0. 22 | 1. 8 0. 2 5. 3 6. 7 2. 0 | 0. 07 0. 01 0. 16 0. 19 0. 06 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - CUMULATED CONTRI BUTI ON = 3. 9 0. 3 11. 2 14. 4 4. 3 +- - - - - - - - - - - - - - - - - - - - - - - - - - +
| 29 . Do you own some secur i t i es ? |
| vmo1 - Yes 1. 73 7. 26 | 0. 69 1. 46 - 0. 25 - 0. 23 0. 06 | 3. 1 15. 5 0. 5 0. 5 0. 0 | 0. 07 0. 29 0. 01 0. 01 0. 00 |
| vmo2 - No 12. 56 0. 14 | - 0. 10 - 0. 20 0. 03 0. 03 - 0. 01 | 0. 4 2. 1 0. 1 0. 1 0. 0 | 0. 07 0. 29 0. 01 0. 01 0. 00 |
| 39 . Ur ban ar ea si ze ( number of i nhabi t ant s) |
| agg1 - Lower t han 2. 000 1. 19 11. 05 | - 1. 06 0. 83 - 1. 06 0. 75 - 0. 06 | 5. 0 3. 4 6. 4 3. 5 0. 0 | 0. 10 0. 06 0. 10 0. 05 0. 00 |
| agg2 - 2. 000 - 20. 000 1. 24 10. 49 | - 0. 55 0. 26 0. 28 0. 80 - 0. 61 | 1. 4 0. 3 0. 5 4. 2 2. 5 | 0. 03 0. 01 0. 01 0. 06 0. 04 |
| agg3 - 20. 000 - 100. 000 2. 50 4. 71 | - 0. 27 0. 07 - 0. 17 0. 07 - 0. 12 | 0. 7 0. 1 0. 3 0. 1 0. 2 | 0. 02 0. 00 0. 01 0. 00 0. 00 |
| agg4 - gr eat er t han 100. 000 4. 70 2. 04 | - 0. 04 - 0. 40 0. 05 - 0. 22 - 0. 27 | 0. 0 3. 2 0. 0 1. 2 1. 9 | 0. 00 0. 08 0. 00 0. 02 0. 04 |
| agg5 - Par i s 4. 66 2. 07 | 0. 60 0. 08 0. 24 - 0. 22 0. 52 | 6. 2 0. 1 1. 3 1. 2 6. 7 | 0. 18 0. 00 0. 03 0. 02 0. 13 |
| 49 . J ob cat egor y |
| emp1 - Wor ker 3. 94 2. 62 | - 0. 88 - 0. 47 0. 54 - 0. 66 - 0. 20 | 11. 2 3. 6 5. 6 8. 9 0. 8 | 0. 29 0. 08 0. 11 0. 17 0. 01 |
| emp2 - Empl oyee 4. 91 1. 91 | - 0. 19 - 0. 20 - 0. 38 0. 67 0. 63 | 0. 6 0. 8 3. 5 11. 4 10. 5 | 0. 02 0. 02 0. 08 0. 23 0. 21 |
| emp3 - Manager 3. 44 3. 15 | 0. 80 0. 89 0. 74 0. 02 - 0. 14 | 8. 2 11. 4 9. 0 0. 0 0. 4 | 0. 21 0. 25 0. 17 0. 00 0. 01 |
| 49_ - mi ssi ng cat egor y 1. 99 6. 19 | 0. 80 - 0. 12 - 1. 41 - 0. 38 - 0. 91 | 4. 7 0. 1 18. 9 1. 5 9. 0 | 0. 10 0. 00 0. 32 0. 02 0. 13 |
| 51 . Di pl oma i n 5 cat egor i es |
| di e1 - No one 2. 70 4. 29 | - 0. 70 - 0. 23 - 0. 23 - 0. 93 0. 34 | 5. 0 0. 6 0. 7 12. 1 1. 7 | 0. 12 0. 01 0. 01 0. 20 0. 03 |
| di e2 - CEP 4. 59 2. 12 | - 0. 80 0. 08 0. 05 0. 29 - 0. 07 | 10. 9 0. 1 0. 1 2. 0 0. 1 | 0. 30 0. 00 0. 00 0. 04 0. 00 |
| di e3 - BEPC- BE- BEPS 2. 26 5. 33 | 0. 23 - 0. 62 - 0. 17 0. 47 0. 56 | 0. 4 3. 7 0. 3 2. 6 3. 8 | 0. 01 0. 07 0. 01 0. 04 0. 06 |
| di e4 - Bac - Br evet sup. 2. 60 4. 49 | 0. 93 - 0. 06 - 0. 32 0. 26 - 0. 95 | 8. 3 0. 0 1. 3 0. 9 12. 6 | 0. 19 0. 00 0. 02 0. 01 0. 20 |
| di e5 - Uni ver si t y 2. 14 5. 67 | 1. 23 0. 84 0. 73 - 0. 26 0. 27 | 12. 1 6. 4 5. 5 0. 8 0. 8 | 0. 27 0. 13 0. 10 0. 01 0. 01 |
| 52 . Occupat i on st at us of housi ng i n 4 cat egor i es |
| sl o1 - homeowner 1. 71 7. 33 | - 0. 31 - 0. 06 0. 85 1. 02 - 1. 30 | 0. 6 0. 0 5. 9 9. 2 15. 7 | 0. 01 0. 00 0. 10 0. 14 0. 23 |
| sl o2 - owner 4. 14 2. 45 | - 0. 44 1. 00 - 0. 51 - 0. 07 - 0. 01 | 3. 0 17. 6 5. 2 0. 1 0. 0 | 0. 08 0. 41 0. 11 0. 00 0. 00 |
| sl o3 - t enant 7. 47 0. 91 | 0. 27 - 0. 51 0. 15 - 0. 15 0. 33 | 2. 0 8. 2 0. 8 0. 9 4. 4 | 0. 08 0. 28 0. 03 0. 03 0. 12 |
| sl o4 - f r ee housi ng, ot her 0. 96 13. 93 | 0. 34 - 0. 25 - 0. 50 - 0. 33 - 0. 20 | 0. 4 0. 3 1. 2 0. 6 0. 2 | 0. 01 0. 00 0. 02 0. 01 0. 00 |
| 53 . Age i n 5 cat egor i es |
| agc1 - Lower t han 25 yo 2. 14 5. 67 | 0. 81 - 0. 98 - 0. 89 - 0. 68 - 0. 80 | 5. 2 8. 7 8. 2 5. 2 7. 4 | 0. 12 0. 17 0. 14 0. 08 0. 11 |
| agc2 - 25 t o 34 yo 4. 06 2. 52 | 0. 35 - 0. 45 0. 63 0. 47 0. 41 | 1. 9 3. 4 7. 8 4. 8 3. 7 | 0. 05 0. 08 0. 16 0. 09 0. 07 |
| agc3 - 35 t o 49 yo 2. 99 3. 78 | - 0. 33 0. 36 0. 41 0. 41 - 0. 69 | 1. 2 1. 6 2. 5 2. 6 7. 6 | 0. 03 0. 03 0. 05 0. 04 0. 12 |
| agc4 - 50 t o 64 yo 2. 69 4. 32 | - 0. 51 0. 30 - 0. 42 0. 21 0. 25 | 2. 6 1. 0 2. 3 0. 6 0. 9 | 0. 06 0. 02 0. 04 0. 01 0. 01 |
| agc5 - 65 yo and mor e 2. 41 4. 92 | - 0. 34 0. 84 - 0. 32 - 0. 93 0. 59 | 1. 0 7. 2 1. 2 10. 8 4. 6 | 0. 02 0. 14 0. 02 0. 17 0. 07 |

P.REL : relative weight of the category .
P.REL = ( n
q
* 100 ) / ( n * Q ) where n
q
is the weight of the category, n the overall weight
and Q the number of active variables.
For example, for the male category, P.REL = ( 469 * 100 ) / ( 1000 * 7 ) = 6.70 .

DISTO : distance between the category and the center of gravity. This criteria depends on
the weight of the category. The formula is the following :
d (j,G) = ( n / n
j
) 1 where n
j
is the weight of the category j and n the overall weight

LOADI NGS AND TEST- VALUES OF CATEGORI ES
AXES 1 TO 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| CATEGORI ES | TEST- VALUES | LOADI NGS | |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - |
| I DEN - LABEL COUNT ABS. WT | 1 2 3 4 5 | 1 2 3 4 5 | DI STO. |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 11 . Gender |
| masc - mal e 469 469. 00 | - 8. 6 2. 3 12. 8 - 13. 9 - 7. 5 | - 0. 29 0. 08 0. 43 - 0. 47 - 0. 25 | 1. 13 |
| f mi - gender 531 531. 00 | 8. 6 - 2. 3 - 12. 8 13. 9 7. 5 | 0. 26 - 0. 07 - 0. 38 0. 41 0. 22 | 0. 88 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 29 . Do you own some secur i t i es ? |
| vmo1 - Yes 121 121. 00 | 8. 1 17. 1 - 2. 9 - 2. 7 0. 7 | 0. 69 1. 46 - 0. 25 - 0. 23 0. 06 | 7. 26 |
| vmo2 - No 879 879. 00 | - 8. 1 - 17. 1 2. 9 2. 7 - 0. 7 | - 0. 10 - 0. 20 0. 03 0. 03 - 0. 01 | 0. 14 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 39 . Ur ban ar ea si ze ( number of i nhabi t ant s) |
| agg1 - Lower t han 2. 000 83 83. 00 | - 10. 1 7. 9 - 10. 1 7. 2 - 0. 6 | - 1. 06 0. 83 - 1. 06 0. 75 - 0. 06 | 11. 05 |
| agg2 - 2. 000 - 20. 000 87 87. 00 | - 5. 4 2. 5 2. 7 7. 8 - 5. 9 | - 0. 55 0. 26 0. 28 0. 80 - 0. 61 | 10. 49 |
| agg3 - 20. 000 - 100. 000 175 175. 00 | - 3. 9 1. 1 - 2. 4 1. 0 - 1. 7 | - 0. 27 0. 07 - 0. 17 0. 07 - 0. 12 | 4. 71 |
| agg4 - gr eat er t han 100. 000 329 329. 00 | - 0. 9 - 8. 8 1. 0 - 4. 8 - 6. 0 | - 0. 04 - 0. 40 0. 05 - 0. 22 - 0. 27 | 2. 04 |
| agg5 - Par i s 326 326. 00 | 13. 2 1. 8 5. 2 - 4. 9 11. 3 | 0. 60 0. 08 0. 24 - 0. 22 0. 52 | 2. 07 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 49 . J ob cat egor y |
| emp1 - Wor ker 263 263. 00 | - 16. 1 - 9. 7 10. 7 - 12. 6 - 3. 5 | - 0. 86 - 0. 51 0. 57 - 0. 67 - 0. 18 | 2. 80 |
| emp2 - Empl oyee 335 335. 00 | - 3. 6 - 5. 0 - 8. 5 15. 2 14. 2 | - 0. 16 - 0. 22 - 0. 38 0. 68 0. 63 | 1. 99 |
| emp3 - Manager 229 229. 00 | 14. 6 14. 9 13. 2 0. 2 - 2. 1 | 0. 85 0. 86 0. 77 0. 01 - 0. 12 | 3. 37 |
| emp4 - Ot her 48 48. 00 | - 5. 2 5. 3 - 3. 5 - 0. 2 - 3. 3 | - 0. 73 0. 75 - 0. 50 - 0. 03 - 0. 47 | 19. 83 |
| 49_ - mi ssi ng cat egor y 125 125. 00 | 11. 4 - 2. 4 - 16. 6 - 5. 0 - 10. 9 | 0. 96 - 0. 20 - 1. 39 - 0. 42 - 0. 91 | 7. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
56
| 51 . Di pl oma i n 5 cat egor i es |
| di e1 - No one 189 189. 00 | - 10. 8 - 3. 5 - 3. 5 - 14. 2 5. 3 | - 0. 70 - 0. 23 - 0. 23 - 0. 93 0. 34 | 4. 29 |
| di e2 - CEP 321 321. 00 | - 17. 4 1. 8 1. 2 6. 3 - 1. 5 | - 0. 80 0. 08 0. 05 0. 29 - 0. 07 | 2. 12 |
| di e3 - BEPC- BE- BEPS 158 158. 00 | 3. 1 - 8. 5 - 2. 3 6. 5 7. 7 | 0. 23 - 0. 62 - 0. 17 0. 47 0. 56 | 5. 33 |
| di e4 - Bac - Br evet sup. 182 182. 00 | 13. 9 - 0. 9 - 4. 8 3. 8 - 14. 1 | 0. 93 - 0. 06 - 0. 32 0. 26 - 0. 95 | 4. 49 |
| di e5 - Uni ver si t y 150 150. 00 | 16. 4 11. 2 9. 7 - 3. 5 3. 6 | 1. 23 0. 84 0. 73 - 0. 26 0. 27 | 5. 67 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 52 . Occupat i on st at us of housi ng i n 4 cat egor i es |
| sl o1 - homeowner 120 120. 00 | - 3. 6 - 0. 7 9. 9 11. 9 - 15. 2 | - 0. 31 - 0. 06 0. 85 1. 02 - 1. 30 | 7. 33 |
| sl o2 - owner 290 290. 00 | - 8. 9 20. 2 - 10. 3 - 1. 4 - 0. 2 | - 0. 44 1. 00 - 0. 51 - 0. 07 - 0. 01 | 2. 45 |
| sl o3 - t enant 523 523. 00 | 9. 0 - 16. 9 5. 1 - 5. 0 10. 9 | 0. 27 - 0. 51 0. 15 - 0. 15 0. 33 | 0. 91 |
| sl o4 - f r ee housi ng, ot her 67 67. 00 | 2. 8 - 2. 1 - 4. 3 - 2. 8 - 1. 7 | 0. 34 - 0. 25 - 0. 50 - 0. 33 - 0. 20 | 13. 93 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 53 . Age i n 5 cat egor i es |
| agc1 - Lower t han 25 yo 150 150. 00 | 10. 7 - 13. 0 - 11. 8 - 9. 1 - 10. 6 | 0. 81 - 0. 98 - 0. 89 - 0. 68 - 0. 80 | 5. 67 |
| agc2 - 25 t o 34 yo 284 284. 00 | 7. 0 - 8. 9 12. 6 9. 5 8. 1 | 0. 35 - 0. 45 0. 63 0. 47 0. 41 | 2. 52 |
| agc3 - 35 t o 49 yo 209 209. 00 | - 5. 3 5. 9 6. 7 6. 6 - 11. 2 | - 0. 33 0. 36 0. 41 0. 41 - 0. 69 | 3. 78 |
| agc4 - 50 t o 64 yo 188 188. 00 | - 7. 7 4. 6 - 6. 4 3. 1 3. 9 | - 0. 51 0. 30 - 0. 42 0. 21 0. 25 | 4. 32 |
| agc5 - 65 yo and mor e 169 169. 00 | - 4. 8 12. 0 - 4. 5 - 13. 2 8. 4 | - 0. 34 0. 84 - 0. 32 - 0. 93 0. 59 | 4. 92 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 1 . The f ami l y i s t he onl y pl ace wher e you f eel wel l |
| f bi 1 - Yes 561 561. 00 | - 14. 5 4. 4 - 3. 6 0. 6 0. 5 | - 0. 40 0. 12 - 0. 10 0. 02 0. 02 | 0. 78 |
| f bi 2 - No 431 431. 00 | 14. 6 - 4. 5 3. 6 - 0. 4 - 0. 8 | 0. 53 - 0. 16 0. 13 - 0. 02 - 0. 03 | 1. 32 |
| 1_ - mi ssi ng cat egor y 8 8. 00 | - 0. 3 0. 6 - 0. 4 - 1. 0 1. 5 | - 0. 11 0. 20 - 0. 13 - 0. 34 0. 54 | 124. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 2 . Opi ni on about weddi ng |
| Mar 1 - i ndi ssol ubl e 231 231. 00 | - 7. 9 4. 1 - 3. 3 - 3. 2 0. 3 | - 0. 46 0. 23 - 0. 19 - 0. 19 0. 02 | 3. 33 |
| Mar 2 - di ssol ved ser i ous pb 342 342. 00 | - 1. 8 3. 4 - 1. 8 3. 2 - 0. 7 | - 0. 08 0. 15 - 0. 08 0. 14 - 0. 03 | 1. 92 |
| Mar 3 - di ssol ved i f agr eem 387 387. 00 | 8. 7 - 6. 4 4. 7 - 0. 2 0. 2 | 0. 35 - 0. 25 0. 19 - 0. 01 0. 01 | 1. 58 |
| Mar 4 - I do not know 39 39. 00 | - 0. 3 - 1. 3 0. 0 - 0. 4 0. 5 | - 0. 05 - 0. 21 0. 00 - 0. 06 0. 08 | 24. 64 |
| 2_ - mi ssi ng cat egor y 1 1. 00 | 0. 8 0. 8 - 0. 8 0. 1 - 0. 4 | 0. 79 0. 77 - 0. 81 0. 09 - 0. 42 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 3 . Housekeepi ng wor ks, t ake car e of chi l dr en. . . |
| Mn1 - onl y women do i t 42 42. 00 | - 3. 5 - 0. 6 - 0. 9 - 0. 9 - 0. 4 | - 0. 52 - 0. 08 - 0. 14 - 0. 14 - 0. 06 | 22. 81 |
| Mn2 - usual l y t he women 336 336. 00 | - 2. 4 4. 9 - 1. 4 - 2. 3 2. 0 | - 0. 11 0. 22 - 0. 06 - 0. 10 0. 09 | 1. 98 |
| Mn3 - men and women 599 599. 00 | 3. 6 - 4. 3 2. 1 2. 9 - 2. 1 | 0. 09 - 0. 11 0. 05 0. 07 - 0. 05 | 0. 67 |
| Mn4 - I do not know 19 19. 00 | 0. 7 - 0. 3 - 2. 1 - 0. 7 0. 1 | 0. 15 - 0. 07 - 0. 47 - 0. 15 0. 02 | 51. 63 |
| 3_ - mi ssi ng cat egor y 4 4. 00 | 0. 2 - 1. 3 1. 2 - 0. 9 1. 9 | 0. 11 - 0. 64 0. 62 - 0. 43 0. 93 | 249. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 4 . Ar e you sat i sf i ed of your dai l y l i f e |
| Cad1 - a l ot 259 259. 00 | - 0. 8 5. 3 - 3. 4 1. 7 - 0. 9 | - 0. 04 0. 28 - 0. 18 0. 09 - 0. 05 | 2. 86 |
| Cad2 - enough 549 549. 00 | - 0. 9 0. 1 1. 2 0. 1 0. 2 | - 0. 03 0. 00 0. 03 0. 00 0. 00 | 0. 82 |
| Cad3 - a l i t t l e 145 145. 00 | 1. 9 - 4. 8 1. 3 - 1. 3 1. 1 | 0. 14 - 0. 37 0. 10 - 0. 10 0. 08 | 5. 90 |
| Cad4 - not at al l 46 46. 00 | 0. 6 - 3. 3 2. 0 - 1. 6 - 0. 3 | 0. 08 - 0. 47 0. 29 - 0. 23 - 0. 04 | 20. 74 |
| 4_ - mi ssi ng cat egor y 1 1. 00 | 0. 4 1. 5 0. 7 - 0. 6 - 0. 1 | 0. 35 1. 52 0. 72 - 0. 56 - 0. 12 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 5 . The envi r onment al pr ot ect i on and mai nt enance i s. . . |
| env1 - ver y i mpor t ant 657 657. 00 | 8. 0 0. 0 1. 8 0. 8 - 1. 4 | 0. 18 0. 00 0. 04 0. 02 - 0. 03 | 0. 52 |
| env2 - qui t e i mpor t ant 298 298. 00 | - 7. 1 - 0. 1 - 0. 7 0. 3 0. 6 | - 0. 34 0. 00 - 0. 04 0. 02 0. 03 | 2. 36 |
| env3 - not i mpor t ant 36 36. 00 | - 3. 0 - 0. 1 - 2. 8 - 1. 7 2. 4 | - 0. 49 - 0. 01 - 0. 46 - 0. 27 0. 39 | 26. 78 |
| env4 - not at al l i mpor t ant 7 7. 00 | - 0. 4 0. 1 - 0. 1 - 2. 5 - 0. 3 | - 0. 16 0. 05 - 0. 02 - 0. 94 - 0. 13 | 141. 86 |
| 5_ - mi ssi ng cat egor y 2 2. 00 | 0. 3 0. 8 - 0. 1 - 0. 3 - 0. 5 | 0. 20 0. 59 - 0. 10 - 0. 24 - 0. 37 | 499. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 6 . Do sci ent i f i c di scover i es amel i or at e t he qual i t y of l i f e ? |
| sci 1 - Yes, a l i t t l e 509 509. 00 | - 1. 9 - 0. 2 0. 3 0. 5 - 0. 6 | - 0. 06 0. 00 0. 01 0. 02 - 0. 02 | 0. 96 |
| sci 2 - Yes, a l ot 383 383. 00 | 3. 1 1. 8 - 1. 2 0. 8 - 0. 3 | 0. 12 0. 07 - 0. 05 0. 03 - 0. 01 | 1. 61 |
| sci 3 - Not at al l 105 105. 00 | - 1. 6 - 2. 3 1. 3 - 2. 0 1. 6 | - 0. 15 - 0. 22 0. 12 - 0. 19 0. 15 | 8. 52 |
| 6_ - mi ssi ng cat egor y 3 3. 00 | - 1. 1 - 1. 5 0. 1 - 0. 8 - 0. 6 | - 0. 65 - 0. 89 0. 07 - 0. 49 - 0. 36 | 332. 33 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 7 . Ar e you sat i sf i ed of your heal t h |
| Snt 1 - a l ot 267 267. 00 | 3. 8 0. 3 0. 4 - 1. 1 - 2. 1 | 0. 20 0. 02 0. 02 - 0. 06 - 0. 11 | 2. 75 |
| Snt 2 - sat i sf i ed 600 600. 00 | - 2. 7 0. 4 0. 4 2. 0 2. 3 | - 0. 07 0. 01 0. 01 0. 05 0. 06 | 0. 67 |
| Snt 3 - a l i t t l e 115 115. 00 | - 0. 6 - 0. 8 - 1. 1 - 1. 3 - 1. 2 | - 0. 05 - 0. 07 - 0. 09 - 0. 12 - 0. 10 | 7. 70 |
| Snt 4 - not at al l 18 18. 00 | - 1. 1 - 0. 5 - 0. 2 - 0. 3 1. 3 | - 0. 25 - 0. 11 - 0. 05 - 0. 06 0. 30 | 54. 56 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 8 . Evol ut i on of your dai l y l i f e f or t he l ast 10 year s |
| Ft r 1 - i mpr ovi ng a l ot 102 102. 00 | 1. 7 0. 9 0. 5 1. 8 - 0. 8 | 0. 16 0. 08 0. 04 0. 17 - 0. 08 | 8. 80 |
| Ft r 2 - i mpr ovi ng a l i t t l e 316 316. 00 | - 1. 2 - 1. 5 1. 8 4. 2 - 0. 9 | - 0. 05 - 0. 07 0. 08 0. 20 - 0. 04 | 2. 16 |
| Ft r 3 - t he same 250 250. 00 | 0. 8 2. 3 - 2. 6 - 3. 0 - 2. 1 | 0. 05 0. 12 - 0. 14 - 0. 16 - 0. 11 | 3. 00 |
| Ft r 4 - a l i t t l e wor se 190 190. 00 | - 2. 2 0. 3 0. 8 - 2. 1 3. 7 | - 0. 14 0. 02 0. 05 - 0. 14 0. 24 | 4. 26 |
| Ft r 5 - a l ot wor se 114 114. 00 | 0. 3 - 0. 1 1. 3 - 0. 1 1. 6 | 0. 03 - 0. 01 0. 12 - 0. 01 0. 14 | 7. 77 |
| Ft r 6 - I do not know 26 26. 00 | 2. 9 - 4. 0 - 3. 2 - 1. 9 - 2. 3 | 0. 55 - 0. 78 - 0. 61 - 0. 36 - 0. 45 | 37. 46 |
| 8_ - mi ssi ng cat egor y 2 2. 00 | - 0. 7 - 0. 3 - 1. 2 - 1. 8 - 1. 0 | - 0. 47 - 0. 23 - 0. 83 - 1. 30 - 0. 73 | 499. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 9 . Your opi ni on on t he j ust i ce r unni ng i n 1986 |
| J us1 - ver y wel l 13 13. 00 | 0. 0 1. 7 - 2. 2 - 2. 1 0. 2 | 0. 01 0. 47 - 0. 60 - 0. 57 0. 06 | 75. 92 |
| J us2 - qui t e wel l 243 243. 00 | - 0. 8 3. 4 - 0. 1 - 0. 2 - 0. 8 | - 0. 05 0. 19 - 0. 01 - 0. 01 - 0. 04 | 3. 12 |
| J us3 - qui t e bad 398 398. 00 | 0. 6 - 1. 0 - 1. 7 1. 2 - 1. 8 | 0. 02 - 0. 04 - 0. 06 0. 05 - 0. 07 | 1. 51 |
| J us4 - ver y bad 256 256. 00 | 1. 3 - 2. 9 3. 9 - 1. 3 1. 1 | 0. 07 - 0. 16 0. 21 - 0. 07 0. 06 | 2. 91 |
| J us5 - I do not know 65 65. 00 | - 3. 3 0. 5 - 2. 1 0. 0 0. 6 | - 0. 40 0. 05 - 0. 26 0. 00 0. 07 | 14. 38 |
| J us6 - do not answer 25 25. 00 | 2. 2 - 0. 1 - 0. 4 1. 9 3. 4 | 0. 43 - 0. 02 - 0. 09 0. 37 0. 68 | 39. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 10 . Do you t hi nk t he soci et y needs t o change |
| Soc1 - yes 759 759. 00 | 1. 8 - 4. 8 3. 1 - 0. 3 - 0. 4 | 0. 03 - 0. 08 0. 05 - 0. 01 - 0. 01 | 0. 32 |
| Soc1 - no 170 170. 00 | - 0. 6 4. 4 - 2. 3 0. 9 - 0. 6 | - 0. 04 0. 31 - 0. 16 0. 06 - 0. 04 | 4. 88 |
| Soc1 - I do not know 71 71. 00 | - 2. 1 1. 5 - 1. 7 - 0. 8 1. 7 | - 0. 24 0. 17 - 0. 20 - 0. 09 0. 19 | 13. 08 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 12 . Educat i onal l evel of t he r espondent |
| di p1 - No one 189 189. 00 | - 10. 8 - 3. 5 - 3. 5 - 14. 2 5. 3 | - 0. 70 - 0. 23 - 0. 23 - 0. 93 0. 34 | 4. 29 |
| di p2 - CEP 321 321. 00 | - 17. 4 1. 8 1. 2 6. 3 - 1. 5 | - 0. 80 0. 08 0. 05 0. 29 - 0. 07 | 2. 12 |
| di p3 - BEPC- BE- BEPS 158 158. 00 | 3. 1 - 8. 5 - 2. 3 6. 5 7. 7 | 0. 23 - 0. 62 - 0. 17 0. 47 0. 56 | 5. 33 |
| di p4 - Bac 162 162. 00 | 13. 2 - 1. 7 - 5. 1 3. 7 - 14. 0 | 0. 95 - 0. 12 - 0. 37 0. 26 - 1. 01 | 5. 17 |
| di p5 - br evet sup. 20 20. 00 | 3. 4 2. 2 0. 2 0. 9 - 2. 1 | 0. 76 0. 48 0. 05 0. 19 - 0. 46 | 49. 00 |
| di p6 - Uni ver si t y 142 142. 00 | 15. 8 10. 9 9. 8 - 3. 5 3. 3 | 1. 23 0. 85 0. 76 - 0. 27 0. 26 | 6. 04 |
| di p7 - ot her 8 8. 00 | 3. 7 2. 2 0. 6 - 0. 2 1. 4 | 1. 31 0. 77 0. 21 - 0. 07 0. 50 | 124. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 13 . What do you t hi nk about publ i c nur ser i es |
| cr e1 - ver y sat i sf yi ng 139 139. 00 | 1. 6 - 3. 6 1. 5 1. 8 2. 3 | 0. 13 - 0. 28 0. 12 0. 14 0. 18 | 6. 19 |
| cr e2 - qui t e sat i sf yi ng 386 386. 00 | 1. 8 2. 8 1. 8 0. 7 - 0. 7 | 0. 07 0. 11 0. 07 0. 03 - 0. 03 | 1. 59 |
| cr e3 - not ver y sat i sf yi ng 242 242. 00 | 1. 6 0. 4 - 0. 4 - 1. 3 - 1. 5 | 0. 09 0. 02 - 0. 02 - 0. 08 - 0. 08 | 3. 13 |
| cr e4 - not at al l sat i sf . 92 92. 00 | - 0. 9 - 1. 8 - 0. 8 - 1. 1 0. 8 | - 0. 09 - 0. 18 - 0. 08 - 0. 11 0. 08 | 9. 87 |
| cr e5 - does not know 139 139. 00 | - 5. 8 0. 6 - 2. 7 - 0. 1 0. 0 | - 0. 45 0. 05 - 0. 21 - 0. 01 0. 00 | 6. 19 |
| 13_ - mi ssi ng cat egor y 2 2. 00 | 2. 0 0. 5 - 1. 6 - 0. 9 - 1. 4 | 1. 40 0. 37 - 1. 11 - 0. 66 - 0. 98 | 499. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
57
| 14 . What do you t hi nk about at - home mot her s |
| cr e1 - ver y sat i sf yi ng 786 786. 00 | - 6. 8 2. 9 - 4. 5 3. 5 - 0. 3 | - 0. 11 0. 05 - 0. 07 0. 06 - 0. 01 | 0. 27 |
| cr e2 - qui t e sat i sf yi ng 129 129. 00 | 6. 0 - 1. 7 1. 9 - 1. 8 - 0. 7 | 0. 50 - 0. 14 0. 16 - 0. 14 - 0. 06 | 6. 75 |
| cr e3 - not ver y sat i sf yi ng 35 35. 00 | 2. 7 - 1. 5 2. 8 - 1. 5 1. 3 | 0. 45 - 0. 25 0. 47 - 0. 25 0. 21 | 27. 57 |
| cr e4 - not at al l sat i sf . 20 20. 00 | 2. 8 - 1. 0 2. 4 - 1. 3 1. 0 | 0. 63 - 0. 22 0. 53 - 0. 29 0. 21 | 49. 00 |
| cr e5 - does not know 29 29. 00 | - 1. 0 - 1. 5 2. 1 - 2. 3 0. 2 | - 0. 19 - 0. 27 0. 38 - 0. 43 0. 03 | 33. 48 |
| 14_ - mi ssi ng cat egor y 1 1. 00 | 0. 8 0. 8 - 0. 8 0. 1 - 0. 4 | 0. 79 0. 77 - 0. 81 0. 09 - 0. 42 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 16 . Do you l i ke your l andscape vi ew |
| Log1 - a l ot 516 516. 00 | - 4. 8 4. 7 - 4. 1 2. 2 0. 0 | - 0. 15 0. 14 - 0. 13 0. 07 0. 00 | 0. 94 |
| Log2 - enough 296 296. 00 | 3. 2 - 0. 4 3. 7 - 0. 3 - 0. 5 | 0. 16 - 0. 02 0. 18 - 0. 01 - 0. 02 | 2. 38 |
| Log3 - a l i t t l e 82 82. 00 | 1. 3 - 2. 6 1. 8 - 1. 0 1. 5 | 0. 14 - 0. 27 0. 19 - 0. 10 0. 16 | 11. 20 |
| Log4 - not at al l 104 104. 00 | 1. 7 - 4. 7 - 0. 4 - 2. 0 - 0. 5 | 0. 15 - 0. 44 - 0. 03 - 0. 19 - 0. 05 | 8. 62 |
| 16_ - mi ssi ng cat egor y 2 2. 00 | 1. 0 0. 3 0. 1 - 2. 4 0. 1 | 0. 69 0. 23 0. 05 - 1. 68 0. 05 | 499. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 17 . Do you own a di sh washi ng machi ne ? |
| l av1 - Yes 211 211. 00 | 4. 6 7. 4 1. 0 2. 9 - 6. 0 | 0. 28 0. 45 0. 06 0. 18 - 0. 37 | 3. 74 |
| l av2 - Not 789 789. 00 | - 4. 6 - 7. 4 - 1. 0 - 2. 9 6. 0 | - 0. 07 - 0. 12 - 0. 02 - 0. 05 0. 10 | 0. 27 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 18 . Do you own a col or TV ? |
| t co1 - Yes 373 373. 00 | - 2. 5 3. 8 - 0. 6 0. 4 0. 2 | - 0. 10 0. 16 - 0. 02 0. 02 0. 01 | 1. 68 |
| t co2 - Not 624 624. 00 | 2. 6 - 3. 7 0. 5 - 0. 4 - 0. 4 | 0. 06 - 0. 09 0. 01 - 0. 01 - 0. 01 | 0. 60 |
| 18_ - mi ssi ng cat egor y 3 3. 00 | - 1. 0 - 0. 3 0. 8 0. 1 1. 0 | - 0. 59 - 0. 17 0. 45 0. 08 0. 59 | 332. 33 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 20 . Occupat i on st at us of housi ng |
| Occ1 - homeowner 120 120. 00 | - 3. 6 - 0. 7 9. 9 11. 9 - 15. 2 | - 0. 31 - 0. 06 0. 85 1. 02 - 1. 30 | 7. 33 |
| Occ2 - owner 290 290. 00 | - 8. 9 20. 2 - 10. 3 - 1. 4 - 0. 2 | - 0. 44 1. 00 - 0. 51 - 0. 07 - 0. 01 | 2. 45 |
| Occ3 - t enant 523 523. 00 | 9. 0 - 16. 9 5. 1 - 5. 0 10. 9 | 0. 27 - 0. 51 0. 15 - 0. 15 0. 33 | 0. 91 |
| Occ4 - f r ee housi ng 58 58. 00 | 2. 5 - 2. 2 - 3. 3 - 2. 6 - 0. 7 | 0. 32 - 0. 28 - 0. 42 - 0. 33 - 0. 09 | 16. 24 |
| Occ5 - ot her 9 9. 00 | 1. 3 - 0. 2 - 3. 2 - 1. 1 - 2. 9 | 0. 44 - 0. 06 - 1. 05 - 0. 38 - 0. 96 | 110. 11 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 21 . The housi ng expenses ar e f or you |
| Dp1 - uni mpor t ant 113 113. 00 | 0. 1 2. 8 - 4. 0 - 2. 1 0. 9 | 0. 01 0. 24 - 0. 36 - 0. 19 0. 08 | 7. 85 |
| Dp2 - wi t hout bi g pr obl em 444 444. 00 | - 2. 0 2. 6 1. 9 - 0. 4 - 1. 6 | - 0. 07 0. 09 0. 07 - 0. 01 - 0. 06 | 1. 25 |
| Dp3 - a bi g pr obl em 352 352. 00 | 1. 1 - 2. 9 1. 9 2. 8 1. 9 | 0. 05 - 0. 12 0. 08 0. 12 0. 08 | 1. 84 |
| Dp4 - a ver y bi g pr obl em 55 55. 00 | 0. 2 - 3. 0 1. 1 0. 1 0. 6 | 0. 03 - 0. 39 0. 14 0. 01 0. 07 | 17. 18 |
| Dp5 - do not f ace wi t h 6 6. 00 | - 0. 2 1. 2 0. 8 - 0. 8 - 0. 1 | - 0. 10 0. 47 0. 32 - 0. 32 - 0. 03 | 165. 67 |
| Dp6 - I do not know 22 22. 00 | 2. 0 - 1. 2 - 4. 1 - 1. 6 - 2. 3 | 0. 42 - 0. 25 - 0. 86 - 0. 34 - 0. 48 | 44. 45 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 22 . Ar e you embar assed wi t h t he noi se ? |
| br u1 - a l i t t l e 196 196. 00 | 1. 7 0. 4 0. 6 - 0. 7 - 0. 2 | 0. 11 0. 03 0. 04 - 0. 05 - 0. 02 | 4. 10 |
| br u2 - a l ot 197 197. 00 | 2. 8 - 3. 3 0. 9 - 3. 8 1. 8 | 0. 18 - 0. 21 0. 06 - 0. 25 0. 11 | 4. 08 |
| br u3 - not at al l 606 606. 00 | - 3. 6 2. 3 - 1. 2 3. 7 - 1. 3 | - 0. 09 0. 06 - 0. 03 0. 09 - 0. 03 | 0. 65 |
| 22_ - mi ssi ng cat egor y 1 1. 00 | - 1. 0 0. 9 - 1. 1 0. 4 1. 0 | - 0. 99 0. 92 - 1. 15 0. 36 0. 95 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 23 . Do you par t i ci pat e t o t he envi r onment al pr ot ect i on ? |
| df 1 - Yes 126 126. 00 | 6. 5 0. 9 1. 9 - 1. 3 - 3. 6 | 0. 54 0. 07 0. 16 - 0. 11 - 0. 30 | 6. 94 |
| df 2 - No 874 874. 00 | - 6. 5 - 0. 9 - 1. 9 1. 3 3. 6 | - 0. 08 - 0. 01 - 0. 02 0. 02 0. 04 | 0. 14 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 24 . Last j ob |
| cs01 - manoeuvr e 13 13. 00 | - 3. 1 - 1. 6 1. 7 - 1. 6 - 0. 9 | - 0. 87 - 0. 44 0. 46 - 0. 43 - 0. 26 | 75. 92 |
| cs02 - ouvr i er spci al i s 98 98. 00 | - 9. 1 - 5. 6 5. 6 - 6. 0 - 1. 6 | - 0. 87 - 0. 54 0. 53 - 0. 58 - 0. 16 | 9. 20 |
| cs03 - ouvr i er qual i f i 152 152. 00 | - 11. 3 - 6. 7 8. 0 - 10. 0 - 2. 6 | - 0. 85 - 0. 50 0. 60 - 0. 75 - 0. 19 | 5. 58 |
| cs04 - empl oy de commer ce 39 39. 00 | - 0. 6 - 2. 6 - 2. 0 5. 5 4. 7 | - 0. 09 - 0. 41 - 0. 31 0. 86 0. 74 | 24. 64 |
| cs05 - aut r e empl oy qual . 68 68. 00 | 2. 3 - 3. 9 - 2. 5 7. 2 5. 1 | 0. 27 - 0. 45 - 0. 30 0. 84 0. 59 | 13. 71 |
| cs06 - aut r e emp. non qual . 91 91. 00 | - 1. 5 - 2. 4 - 3. 8 6. 3 6. 5 | - 0. 15 - 0. 24 - 0. 38 0. 63 0. 65 | 9. 99 |
| cs07 - per sonnel de ser vi ce 70 70. 00 | - 2. 8 - 1. 9 - 4. 6 5. 7 5. 6 | - 0. 32 - 0. 22 - 0. 53 0. 65 0. 65 | 13. 29 |
| cs08 - cont r ema t r e 14 14. 00 | - 1. 9 1. 5 - 0. 9 0. 2 0. 7 | - 0. 49 0. 39 - 0. 24 0. 04 0. 20 | 70. 43 |
| cs09 - ar t i san 18 18. 00 | - 2. 3 - 0. 4 - 1. 2 3. 0 3. 3 | - 0. 55 - 0. 09 - 0. 28 0. 70 0. 78 | 54. 56 |
| cs10 - pet i t commer cant 35 35. 00 | - 2. 8 1. 1 - 2. 6 3. 5 3. 7 | - 0. 47 0. 19 - 0. 43 0. 58 0. 62 | 27. 57 |
| cs11 - cadr e moyen 135 135. 00 | 9. 1 6. 6 7. 9 2. 3 - 3. 5 | 0. 73 0. 53 0. 64 0. 19 - 0. 28 | 6. 41 |
| cs12 - pat r on i ndus. commer . 10 10. 00 | 2. 2 4. 4 1. 8 - 1. 9 0. 7 | 0. 70 1. 38 0. 57 - 0. 61 0. 22 | 99. 00 |
| cs13 - pr of essi on l i br al e 15 15. 00 | 3. 9 6. 7 3. 0 - 0. 5 0. 5 | 0. 99 1. 72 0. 76 - 0. 14 0. 12 | 65. 67 |
| cs14 - cadr e supr i eur 69 69. 00 | 9. 2 10. 8 9. 0 - 1. 8 0. 8 | 1. 07 1. 25 1. 05 - 0. 21 0. 09 | 13. 49 |
| cs15 - expl oi t ant agr i col e 32 32. 00 | - 7. 2 5. 5 - 4. 6 0. 8 - 2. 5 | - 1. 25 0. 95 - 0. 80 0. 13 - 0. 44 | 30. 25 |
| cs16 - sal ar i agr i col e 0 0. 00 | 0. 0 0. 0 0. 0 0. 0 0. 0 | 0. 00 0. 00 0. 00 0. 00 0. 00 | 0. 00 |
| cs17 - aut r e act i f 13 13. 00 | 1. 2 1. 2 1. 2 - 1. 0 - 1. 9 | 0. 34 0. 32 0. 34 - 0. 27 - 0. 51 | 75. 92 |
| cs99 - i nconnu 3 3. 00 | 0. 2 0. 7 - 1. 7 - 1. 4 - 0. 9 | 0. 12 0. 43 - 0. 97 - 0. 78 - 0. 50 | 332. 33 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 25 . Does your j ob expose you t o heal t h r i sk ? |
| t r a1 - Lot s of r i sks 108 108. 00 | - 4. 7 - 2. 1 4. 5 - 0. 9 - 3. 7 | - 0. 43 - 0. 19 0. 40 - 0. 08 - 0. 33 | 8. 26 |
| t r a2 - Few r i sks 192 192. 00 | - 2. 1 - 1. 5 7. 1 0. 5 - 1. 5 | - 0. 14 - 0. 09 0. 46 0. 03 - 0. 10 | 4. 21 |
| t r a3 - No r i sk 276 276. 00 | 2. 5 - 2. 5 5. 1 6. 8 3. 9 | 0. 13 - 0. 13 0. 26 0. 35 0. 20 | 2. 62 |
| 25_ - mi ssi ng cat egor y 424 424. 00 | 2. 4 4. 7 - 13. 1 - 6. 0 0. 0 | 0. 09 0. 17 - 0. 48 - 0. 22 0. 00 | 1. 36 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 26 . Do you have wor k- per sonal l i f e pr obl ems |
| con1 - yes 229 229. 00 | 2. 8 - 1. 6 7. 7 3. 2 1. 1 | 0. 16 - 0. 09 0. 45 0. 18 0. 06 | 3. 37 |
| con2 - no 338 338. 00 | - 5. 0 - 3. 4 6. 7 3. 3 - 0. 9 | - 0. 22 - 0. 15 0. 29 0. 15 - 0. 04 | 1. 96 |
| 26_ - mi ssi ng cat egor y 433 433. 00 | 2. 4 4. 6 - 12. 9 - 5. 8 - 0. 1 | 0. 09 0. 17 - 0. 47 - 0. 21 0. 00 | 1. 31 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 27 . Have you r ecent l y been ner vous |
| ner 1 - Yes 273 273. 00 | 2. 6 - 1. 7 0. 6 1. 7 0. 8 | 0. 13 - 0. 09 0. 03 0. 09 0. 04 | 2. 66 |
| ner 2 - No 726 726. 00 | - 2. 6 1. 7 - 0. 6 - 1. 8 - 0. 8 | - 0. 05 0. 03 - 0. 01 - 0. 04 - 0. 02 | 0. 38 |
| 27_ - mi ssi ng cat egor y 1 1. 00 | 0. 4 0. 3 - 0. 5 1. 2 0. 8 | 0. 35 0. 30 - 0. 53 1. 23 0. 77 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 28 . Have you r ecent l y been depr essed |
| t a1 - Yes 122 122. 00 | 2. 5 - 1. 9 0. 1 0. 6 0. 7 | 0. 21 - 0. 16 0. 01 0. 05 0. 06 | 7. 20 |
| t a2 - No 874 874. 00 | - 2. 1 1. 8 - 0. 2 - 0. 4 - 0. 6 | - 0. 03 0. 02 0. 00 - 0. 01 - 0. 01 | 0. 14 |
| 28_ - mi ssi ng cat egor y 4 4. 00 | - 1. 7 0. 4 0. 5 - 0. 9 - 0. 1 | - 0. 87 0. 18 0. 27 - 0. 45 - 0. 05 | 249. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 30 . Do you own r eal est at e pr oper t i es ? |
| vi m1 - Yes 81 81. 00 | 2. 5 8. 6 - 3. 8 - 0. 3 - 1. 6 | 0. 27 0. 92 - 0. 41 - 0. 03 - 0. 18 | 11. 35 |
| vi m2 - No 918 918. 00 | - 2. 6 - 8. 6 3. 6 0. 2 1. 9 | - 0. 02 - 0. 08 0. 03 0. 00 0. 02 | 0. 09 |
| 30_ - mi ssi ng cat egor y 1 1. 00 | 0. 7 0. 7 1. 6 0. 8 - 2. 2 | 0. 69 0. 66 1. 63 0. 78 - 2. 18 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 31 . Do you r egul ar l y i mpose r est r i ct i ons |
| r st 1 - Yes 569 569. 00 | 1. 5 - 6. 3 1. 3 0. 8 1. 6 | 0. 04 - 0. 17 0. 04 0. 02 0. 04 | 0. 76 |
| r st 2 - No 414 414. 00 | - 1. 3 6. 4 - 1. 4 - 1. 0 - 1. 6 | - 0. 05 0. 24 - 0. 05 - 0. 04 - 0. 06 | 1. 42 |
| 31_ - mi ssi ng cat egor y 17 17. 00 | - 0. 6 - 0. 1 0. 6 0. 6 0. 1 | - 0. 15 - 0. 01 0. 15 0. 13 0. 02 | 57. 82 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
58
| 32 . Your opi ni on on t he evol ut i on of Fr ench peopl e l i f e l evel |
| Fr 1 - a l ot bet t er 78 78. 00 | - 1. 6 4. 0 - 1. 6 0. 4 0. 3 | - 0. 17 0. 43 - 0. 17 0. 05 0. 03 | 11. 82 |
| Fr 2 - a l i t t l e bet t er 321 321. 00 | - 0. 1 3. 9 - 2. 4 1. 7 - 3. 8 | 0. 00 0. 18 - 0. 11 0. 08 - 0. 18 | 2. 12 |
| Fr 3 - i t i s t he same 159 159. 00 | - 1. 6 - 1. 9 0. 7 0. 4 0. 7 | - 0. 11 - 0. 14 0. 05 0. 03 0. 05 | 5. 29 |
| Fr 4 - a l i t t l e wor se 276 276. 00 | 0. 1 - 1. 7 2. 0 - 1. 3 1. 3 | 0. 00 - 0. 09 0. 10 - 0. 07 0. 07 | 2. 62 |
| Fr 5 - a l ot wor se 108 108. 00 | 3. 2 - 3. 1 3. 4 - 0. 8 2. 6 | 0. 29 - 0. 29 0. 31 - 0. 08 0. 24 | 8. 26 |
| Fr 6 - I do not know 57 57. 00 | - 0. 1 - 1. 8 - 2. 7 - 1. 0 0. 3 | - 0. 02 - 0. 23 - 0. 35 - 0. 12 0. 04 | 16. 54 |
| 32_ - mi ssi ng cat egor y 1 1. 00 | 0. 9 - 1. 4 - 0. 7 0. 5 0. 3 | 0. 94 - 1. 43 - 0. 75 0. 48 0. 25 | 999. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 33 . Do you i nvi t e some f r i ends f or di nner ? |
| bou1 - Of t en 606 606. 00 | 8. 1 - 0. 6 2. 3 4. 5 - 2. 9 | 0. 21 - 0. 02 0. 06 0. 11 - 0. 07 | 0. 65 |
| bou2 - Rar el y 274 274. 00 | - 5. 0 0. 2 0. 5 - 1. 9 1. 1 | - 0. 26 0. 01 0. 02 - 0. 10 0. 06 | 2. 65 |
| bou3 - Never 120 120. 00 | - 5. 4 0. 6 - 4. 0 - 4. 1 2. 9 | - 0. 46 0. 05 - 0. 35 - 0. 35 0. 25 | 7. 33 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 34 . Ar e you a member of a r el i gi ous associ at i on ? |
| asc1 - yes 69 69. 00 | 1. 8 6. 2 - 1. 2 0. 6 - 1. 4 | 0. 21 0. 72 - 0. 14 0. 07 - 0. 16 | 13. 49 |
| asc2 - no 931 931. 00 | - 1. 8 - 6. 2 1. 2 - 0. 6 1. 4 | - 0. 02 - 0. 05 0. 01 - 0. 01 0. 01 | 0. 07 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 35 . Do you wat ch TV |
| Tl 1 - ever y day 419 419. 00 | - 12. 1 2. 6 - 5. 0 - 1. 7 1. 3 | - 0. 45 0. 10 - 0. 19 - 0. 06 0. 05 | 1. 39 |
| Tl 2 - qui t e of t en 226 226. 00 | 1. 5 0. 2 1. 9 0. 6 0. 5 | 0. 09 0. 01 0. 11 0. 04 0. 03 | 3. 42 |
| Tl 3 - not ver y of t en 231 231. 00 | 7. 4 - 1. 7 3. 2 1. 8 - 3. 2 | 0. 43 - 0. 10 0. 18 0. 11 - 0. 19 | 3. 33 |
| Tl 4 - never 124 124. 00 | 6. 9 - 2. 0 1. 0 - 0. 6 1. 5 | 0. 58 - 0. 17 0. 09 - 0. 05 0. 13 | 7. 06 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 36 . I n or der t o change t he soci et y, do we need. . . |
| cha1 - Pr ogr essi ve r ef or ms 490 490. 00 | - 0. 5 - 0. 4 - 1. 7 1. 5 - 2. 4 | - 0. 01 - 0. 01 - 0. 06 0. 05 - 0. 08 | 1. 04 |
| cha2 - Radi cal changes 258 258. 00 | 2. 4 - 4. 2 5. 5 - 1. 4 2. 0 | 0. 13 - 0. 23 0. 30 - 0. 07 0. 11 | 2. 88 |
| cha3 - does not know 29 29. 00 | - 0. 4 - 0. 5 - 1. 8 - 2. 4 1. 7 | - 0. 07 - 0. 09 - 0. 34 - 0. 44 0. 31 | 33. 48 |
| 36_ - mi ssi ng cat egor y 223 223. 00 | - 1. 9 5. 1 - 3. 0 0. 6 0. 1 | - 0. 11 0. 30 - 0. 18 0. 03 0. 01 | 3. 48 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 38 . Ar e you a member of at l eas one associ at i on ? |
| ass1 - yes 536 536. 00 | 3. 3 4. 6 1. 2 1. 9 - 5. 8 | 0. 10 0. 14 0. 03 0. 06 - 0. 17 | 0. 87 |
| ass2 - no 464 464. 00 | - 3. 3 - 4. 6 - 1. 2 - 1. 9 5. 8 | - 0. 11 - 0. 16 - 0. 04 - 0. 06 0. 20 | 1. 16 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 40 . Age and gender |
| enq1 - mal e LE t han 38 yo 101 101. 00 | 4. 1 - 1. 2 0. 4 1. 3 - 1. 4 | 0. 39 - 0. 11 0. 04 0. 12 - 0. 13 | 8. 90 |
| enq2 - mal e GT 38 yo 35 35. 00 | - 3. 1 0. 0 - 0. 6 2. 4 - 3. 3 | - 0. 52 0. 00 - 0. 10 0. 39 - 0. 55 | 27. 57 |
| enq3 - f emal e LE 38 yo 526 526. 00 | 7. 0 1. 3 3. 9 - 2. 6 6. 3 | 0. 21 0. 04 0. 12 - 0. 08 0. 19 | 0. 90 |
| enq4 - f emal e GT 38 yo 338 338. 00 | - 8. 8 - 0. 6 - 4. 1 1. 0 - 4. 5 | - 0. 39 - 0. 03 - 0. 18 0. 04 - 0. 20 | 1. 96 |
| enq* - unknown 0 0. 00 | 0. 0 0. 0 0. 0 0. 0 0. 0 | 0. 00 0. 00 0. 00 0. 00 0. 00 | 0. 00 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 41 . Usual bedt i me ? |
| dod1 - 21h. or bef or e 73 73. 00 | - 6. 4 2. 1 - 2. 9 - 3. 5 2. 6 | - 0. 72 0. 24 - 0. 32 - 0. 39 0. 29 | 12. 70 |
| dod2 - bet ween 21h - 22h. 270 270. 00 | - 7. 5 0. 6 - 0. 8 0. 7 0. 3 | - 0. 39 0. 03 - 0. 04 0. 04 0. 02 | 2. 70 |
| dod3 - bet ween 22h - 23h. 443 443. 00 | 1. 2 - 1. 3 - 0. 5 3. 2 - 1. 1 | 0. 04 - 0. 05 - 0. 02 0. 11 - 0. 04 | 1. 26 |
| dod4 - bet ween 23h - 24h. 134 134. 00 | 8. 1 0. 7 3. 0 - 1. 2 - 0. 5 | 0. 65 0. 06 0. 24 - 0. 10 - 0. 04 | 6. 46 |
| dod5 - af t er mi dni ght 63 63. 00 | 6. 1 - 1. 8 0. 7 - 2. 0 - 0. 7 | 0. 74 - 0. 22 0. 08 - 0. 24 - 0. 08 | 14. 87 |
| dod6 - var i abl e 11 11. 00 | 0. 1 - 1. 3 2. 4 0. 3 0. 3 | 0. 02 - 0. 38 0. 72 0. 10 0. 09 | 89. 91 |
| 41_ - mi ssi ng cat egor y 6 6. 00 | 1. 9 2. 1 - 0. 8 - 1. 9 0. 3 | 0. 75 0. 87 - 0. 31 - 0. 76 0. 13 | 165. 67 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 43 . Has t he r espondent been i nt er est ed by t he sur vey |
| I nt 1 - a l ot 332 332. 00 | - 2. 8 1. 9 - 3. 3 3. 3 - 4. 9 | - 0. 13 0. 08 - 0. 15 0. 15 - 0. 22 | 2. 01 |
| I nt 2 - enough 542 542. 00 | 1. 0 - 1. 7 1. 1 - 1. 6 2. 2 | 0. 03 - 0. 05 0. 03 - 0. 05 0. 06 | 0. 85 |
| I nt 3 - a l i t t l e or not 124 124. 00 | 2. 3 0. 0 3. 1 - 2. 2 3. 8 | 0. 19 0. 00 0. 26 - 0. 18 0. 32 | 7. 06 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 44 . I deal number of chi l dr en ? |
| enf 0 - No one 51 51. 00 | 0. 4 - 0. 4 1. 5 - 1. 9 0. 4 | 0. 06 - 0. 06 0. 20 - 0. 26 0. 06 | 18. 61 |
| enf 1 - one 39 39. 00 | 0. 5 - 1. 3 1. 1 - 0. 1 2. 5 | 0. 07 - 0. 20 0. 17 - 0. 02 0. 39 | 24. 64 |
| enf 2 - t wo 431 431. 00 | - 2. 4 - 3. 1 1. 3 0. 9 1. 9 | - 0. 09 - 0. 11 0. 05 0. 03 0. 07 | 1. 32 |
| enf 3 - t hr ee 393 393. 00 | 0. 1 2. 4 - 2. 0 1. 3 - 1. 5 | 0. 00 0. 10 - 0. 08 0. 05 - 0. 06 | 1. 54 |
| enf 4 - f or and mor e 86 86. 00 | 3. 5 2. 4 - 0. 8 - 2. 3 - 2. 7 | 0. 36 0. 25 - 0. 08 - 0. 24 - 0. 28 | 10. 63 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| 54 . J ob i n 7 cat egor i es |
| csp1 - ex. agr . - ar t - commer 95 95. 00 | - 6. 4 5. 3 - 4. 3 3. 4 2. 5 | - 0. 62 0. 52 - 0. 42 0. 33 0. 25 | 9. 53 |
| csp2 - pr of . l i b. - cad. sup. 84 84. 00 | 10. 1 12. 8 9. 5 - 1. 9 0. 9 | 1. 06 1. 34 1. 00 - 0. 20 0. 09 | 10. 90 |
| csp3 - ouvr i er s 263 263. 00 | - 16. 1 - 9. 7 10. 7 - 12. 6 - 3. 5 | - 0. 86 - 0. 51 0. 57 - 0. 67 - 0. 18 | 2. 80 |
| csp4 - empl oys 198 198. 00 | 0. 1 - 5. 5 - 5. 3 11. 7 10. 2 | 0. 01 - 0. 35 - 0. 34 0. 74 0. 65 | 4. 05 |
| csp5 - cont r ema - cad. moy. 149 149. 00 | 8. 2 6. 8 7. 3 2. 3 - 3. 1 | 0. 62 0. 52 0. 55 0. 17 - 0. 24 | 5. 71 |
| csp6 - per sonnel de ser vi ce 70 70. 00 | - 2. 8 - 1. 9 - 4. 6 5. 7 5. 6 | - 0. 32 - 0. 22 - 0. 53 0. 65 0. 65 | 13. 29 |
| csp7 - aut r es 16 16. 00 | 1. 2 1. 4 0. 4 - 1. 5 - 2. 1 | 0. 30 0. 34 0. 09 - 0. 37 - 0. 51 | 61. 50 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +

CORRELATI ONS BETWEEN CONTI NUOUS VARI ABLES AND FACTORS
AXES 1 A 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| VARI ABLES | SUMMARY STATI STI CS | CORRELATI ONS |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| NUM . ( I DEN) SHORT LABEL | COUNT ABS. WT MEAN ST. DEV. | 1 2 3 4 5 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 15 . ( r i ng) Engi neer annual sal a | 806 806. 00 8478. 73 3668. 95 | - 0. 04 0. 06 0. 04 - 0. 01 0. 05 |
| 19 . ( r med) Doct or annual sal ar y | 713 713. 00 19383. 85 12608. 83 | - 0. 05 0. 12 0. 02 0. 03 - 0. 06 |
| 37 . ( ge ) Age | 1000 1000. 00 42. 68 17. 50 | - 0. 40 0. 55 - 0. 14 - 0. 21 0. 28 |
| 42 . ( nr ep) Number of mi ssi ng va | 1000 1000. 00 4. 05 4. 19 | - 0. 20 0. 12 - 0. 20 - 0. 08 0. 12 |
| 45 . ( f i n) End of st udy age | 997 997. 00 17. 29 3. 88 | 0. 69 0. 13 0. 24 0. 05 - 0. 11 |
| 46 . ( r sou) Sal ar y wi shes | 915 915. 00 7244. 48 4756. 78 | 0. 26 0. 21 0. 15 0. 03 - 0. 09 |
| 47 . ( r mi n) Resour ces est i mat i on | 897 897. 00 5561. 89 2423. 40 | 0. 19 - 0. 01 0. 14 - 0. 08 0. 14 |
| 48 . ( vaca) Summer hol i days i n n | 1000 1000. 00 18. 31 19. 37 | 0. 38 0. 02 0. 03 - 0. 06 - 0. 07 |
| 50 . ( POND) Wei ght | 1000 1000. 00 1. 00 0. 09 | - 0. 47 0. 27 - 0. 11 0. 25 - 0. 22 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
59

DEFAC - Factors description

This procedure provides help on the interpretation of the factors extracted from a factor
analysis.
A factor can be described quickly and clearly by the most significant elements. These may
be cases, categorical variables, continuous variables or frequencies, and used as active or
illustrative elements in the preceding analysis.

THE DESCRIPTION COMMAND TAB

60
THE PARAMETERS TAB

THE DEFAC RESULTS

INTERPRETATION TOOLS FOR FACTORIAL AXES
PRINTOUT ON FACTOR 1
BY ACTIVE CATEGORIES
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| I DEN. | T. VALUE | CATEGORY LABEL | VARI ABLE LABEL | WEI GHT | NUMBER |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| di e2 | - 17. 40 | CEP | Di pl oma i n 5 cat egor i es | 321. 00 | 1 |
| emp1 | - 16. 14 | Wor ker | J ob cat egor y | 263. 00 | 2 |
| di e1 | - 10. 75 | No one | Di pl oma i n 5 cat egor i es | 189. 00 | 3 |
| agg1 | - 10. 11 | Lower t han 2. 000 | Ur ban ar ea si ze ( number of i nhabi t ant s) | 83. 00 | 4 |
| sl o2 | - 8. 88 | owner | Occupat i on st at us of housi ng i n 4 cat egor i es | 290. 00 | 5 |
| masc | - 8. 62 | mal e | Gender | 469. 00 | 6 |
| vmo2 | - 8. 14 | No | Do you own some secur i t i es ? | 879. 00 | 7 |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| M I D D L E A R E A |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| sl o3 | 8. 98 | t enant | Occupat i on st at us of housi ng i n 4 cat egor i es | 523. 00 | 22 |
| agc1 | 10. 73 | Lower t han 25 yo | Age i n 5 cat egor i es | 150. 00 | 23 |
| 49_ | 11. 44 | mi ssi ng cat egor y | J ob cat egor y | 125. 00 | 24 |
| agg5 | 13. 22 | Par i s | Ur ban ar ea si ze ( number of i nhabi t ant s) | 326. 00 | 25 |
| di e4 | 13. 86 | Bac - Br evet sup. | Di pl oma i n 5 cat egor i es | 182. 00 | 26 |
| emp3 | 14. 64 | Manager | J ob cat egor y | 229. 00 | 27 |
| di e5 | 16. 38 | Uni ver si t y | Di pl oma i n 5 cat egor i es | 150. 00 | 28 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| vmo2 | - 17. 08 | No | Do you own some secur i t i es ? | 879. 00 | 1 |
| sl o3 | - 16. 86 | t enant | Occupat i on st at us of housi ng i n 4 cat egor i es | 523. 00 | 2 |
| agc1 | - 13. 04 | Lower t han 25 yo | Age i n 5 cat egor i es | 150. 00 | 3 |
| emp1 | - 9. 65 | Wor ker | J ob cat egor y | 263. 00 | 4 |
| agc2 | - 8. 90 | 25 t o 34 yo | Age i n 5 cat egor i es | 284. 00 | 5 |
| agg4 | - 8. 83 | gr eat er t han 100. 000 | Ur ban ar ea si ze ( number of i nhabi t ant s) | 329. 00 | 6 |
| di e3 | - 8. 54 | BEPC- BE- BEPS | Di pl oma i n 5 cat egor i es | 158. 00 | 7 |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| agc3 | 5. 86 | 35 t o 49 yo | Age i n 5 cat egor i es | 209. 00 | 22 |
| agg1 | 7. 86 | Lower t han 2. 000 | Ur ban ar ea si ze ( number of i nhabi t ant s) | 83. 00 | 23 |
| agc5 | 11. 97 | 65 yo and mor e | Age i n 5 cat egor i es | 169. 00 | 25 |
| vmo1 | 17. 08 | Yes | Do you own some secur i t i es ? | 121. 00 | 27 |
| sl o2 | 20. 24 | owner | Occupat i on st at us of housi ng i n 4 cat egor i es | 290. 00 | 28 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
61
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 49_ | - 16. 60 | mi ssi ng cat egor y | J ob cat egor y | 125. 00 | 1 |
| f mi | - 12. 80 | gender | Gender | 531. 00 | 2 |
| agc1 | - 11. 84 | Lower t han 25 yo | Age i n 5 cat egor i es | 150. 00 | 3 |
| sl o2 | - 10. 29 | owner | Occupat i on st at us of housi ng i n 4 cat egor i es | 290. 00 | 4 |
| agg1 | - 10. 05 | Lower t han 2. 000 | Ur ban ar ea si ze ( number of i nhabi t ant s) | 83. 00 | 5 |
| emp2 | - 8. 50 | Empl oyee | J ob cat egor y | 335. 00 | 6 |
| agc4 | - 6. 40 | 50 t o 64 yo | Age i n 5 cat egor i es | 188. 00 | 7 |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| sl o1 | 9. 87 | homeowner | Occupat i on st at us of housi ng i n 4 cat egor i es | 120. 00 | 24 |
| emp1 | 10. 73 | Wor ker | J ob cat egor y | 263. 00 | 25 |
| masc | 12. 80 | mal e | Gender | 469. 00 | 27 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

62

CLUSTERING WITH SPAD

HAC / MIXED : clustering on factors scores

PARTI - DECLA : tree cut and cluster description

CLASS - MINER : clusters characterization

ESCAL : Storing the factorial axes and the partitions

The term cluster analysis encompasses a number of different algorithms and methods for
grouping objects (cases) of similar kind into respective categories. A general question
facing researchers in many areas of inquiry is how to organize observed data into
meaningful structures, that is, to develop taxonomies.
In other words cluster analysis is an exploratory data analysis tool which aims at sorting
different objects into groups in a way that the degree of association between two objects is
maximal if they belong to the same group and minimal otherwise.

Note that the above discussions refer to clustering algorithms and do not mention
anything about statistical significance testing. The point here is that, unlike many other
statistical procedures, cluster analysis methods are mostly used when we do not have any
a priori hypotheses, but are still in the exploratory phase of our research. In a sense, cluster
analysis finds the "most significant solution possible."
Clustering with SPAD
63

RECIP / SEMIS - CLUSTERING ON FACTORS
SCORES

JUSTIFICATION FOR THE USE OF FACTORS SCORES

The RECIP/SEMIS procedure allows the SPAD user to perform a cluster analysis on
factors scores.

It is equivalent to perform a cluster analysis on a range of p variables than on the p factors
scores extracted from the factorial analysis.
Indeed, by transforming the original variables into factors, without reducing the number
of dimensions, despite their extraction ranked in decreasing variance explained, we do not
loose any information. Mathematically, it is simply a rotation of the original space.

However, it is interesting to consider a smaller factorial space with q dimensions (with q
lower than p) and perform a cluster analysis on this q first factorial scores. This way, we
focus on the most interesting part of the information (in that sense that the q factors
capture the main part of the overall variability) and we exclude the noisy remaining
information captured by the last factors.

In general, the data reduction by selecting the first q factors provides better and more
robust clustering.

The factors to retain for the cluster analysis are the ones that engender a smaller space
where the point cloud is stable.
In general, we retain a little bit more than the half of the factors (for MCA), even if a scree
appears after few factors on the eigenvalues graph.

In the parameters tab of this method, you can set the number of factors to retain for the
cluster analysis (10 by default).

Working with factorial scores means that whatever the factorial analysis performed, the
cluster analysis will always be processed on quantitative data.
The single distance, the Euclidean distance, will be used to measure the resemblance
between cases, and one aggregating criteria, Ward, will be used to calculate the difference
between two disjoints groups of cases.

RECIP / SEMIS - Clustering on factors scores
64
THE CLUSTERING ALGORITHMS

The clustering algorithms available in the SPAD software are the Hierarchical
Agglomerative Clustering (HAC or AHC, RECIP in SPAD) that provides a partitions
hierarchy, and the k-means method that provides a single partition.

A combining use of these two methods (mixed clustering) enables the consolidation of the
partition and increase its stability (SEMIS).

These two methods present the following disadvantage :
HAC provides a large number of partitions within one has to be chosen : it is not
always simple to select the most significant cut in the clustering tree. Moreover, the
clustering tree is not an optimal tree because the partition produced at a certain
level depends on the one produced at the previous step.
With the k-means method, the number of clusters has to be set by the user before
performing the analysis and the partition produced depends on the initial position of
the centrod clusters.

In order to compensate these disadvantages and to try to approach the optimal partition if
it exists, we can combine the use of the HAC and the k-means method : this is the purpose
of the mixed clustering method, called SEMIS in SPAD.

A first use combining theses two methods is the following : we perform K-means with a
large number of centrods and then we build a hierarchical tree by aggregating
successively the large number of clusters provided by the K-means method.
However, this method is relatively unstable on small size samples.
It is advised to choose HAC for sample size lower than 10 000 cases. For larger samples,
the mixed clustering method reduces a lot time processing and produces stable partitions.

65

In the CORMU parameters, modify the number of retain coordinates : 14.

THE RECIP / SEMIS parameters

The HAC algorithm (RECIP)

C Coordinates used for aggregation
With this parameter, the SPAD User defines the number of factors to retain to perform
the cluster analysis. This choice depends on the study of the eigenvalues in the
previous factorial analysis.
In our example, we use the 14 first factors.

C
66
The mixed clustering method (SEMIS)

C Starting partition
Three procedures are available .
- The first one consists in searching stable clusters by crossing many partitions
provided by centrods randomly selected.
The item Number defines the number of partitions (2 by default) and Size
determines the number of centrods for each partition.
- The others produce a single partition based on N centrods chosen by the SPAD
User or randomly selected.

C
67
THE HIERARCHY EDITOR

To access the hierarchy editor, double click on this icon .

THE CLUSTERING TREE

This tree is the graphical display of the partitions hierarchy.
The interest of it is to suggest graphically the number of clusters that exist in the dataset.
We can cut the tree where the gap is the highest.

7% 8% 7% 11% 14% 16% 37%
7
7% 8% 7% 48% 30%
5
7% 9% 8% 7% 8% 11% 8% 8% 14% 20%
10

THE TOOL BAR OF THE HIERARCHY EDITOR

Display / delete Delete Display the cuts
labels node number of the tree

Display Display Vertical or
Node number aggregation criteria horizontal tree

68
CURVE OF THE LEVEL INDEXES

Edit - Curve of the level indexes

The level index is the gain of inter-cluster inertia obtained by subdividing one node into
two nodes.
The larger bar corresponds the cut of the tree into two clusters.

69

PARTI - DECLA -
CUT OF THE TREE AND CLUSTERS DESCRIPTION

The PARTI procedure constructs partitions by pruning an aggregation tree. The procedure
creates the partitions requested by the user or by an automatic search for the best
partitions, by possibly improving them by iteration on mobile centers (consolidation). The
partitions created in this way will then be characterized automatically.

The DECLA procedure lets you describe the partitions determined by the PARTI
procedure.

We can define either each cluster of a partition, or globally the partition itself. All the
elements available (actives and illustrative) may participate in the characterization:
categories of categorical variables, categorical variables themselves, continuous variables,
the frequencies and the factorial axes.

THE PARAMETERS OF THE PARTI-DECLA METHOD

THE CHOICE OF PARTITIONS TAB

PARTI - DECLA -
Cut of the tree and clusters description
70
THE PARTITIONING PARAMETERS TAB

THE PARTITIONS CHARACTERIZATION TAB

See the DEMOD method.

71
THE PARTI-DECLA RESULTS

BUILDING UP PARTITIONS
DETERMI NI NG THE BEST PARTI TI ONS
RESEARCH OF I RREGULARI TI ES
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1993- - 1994| - 39. 99 | **************************************************** |
| 1990- - 1991| - 19. 79 | ************************** |
| 1995- - 1996| - 17. 70 | ************************ |
+- - - - - - - - - - - - - - +- - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
LI ST OF THE BEST 3 PARTI TI ON BETWEEN 3 AND 10 CLUSTERS
1 - PARTI TI ON I N 7 CLUSTERS

CUT "b" OF THE TREE I NTO 7 CLUSTERS
CLUSTERS FORMATI ON ( ON ACTI VE CASES)
SUMMARY DESCRI PTI ON
+- - - - - - - - - +- - - - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - +
| CLUSTER | COUNT | WEI GHT | CONTENT |
+- - - - - - - - - +- - - - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - +
| bb1b | 106 | 106. 00 | 1 TO 5 |
| bb2b | 375 | 375. 00 | 6 TO 23 |
| bb3b | 70 | 70. 00 | 24 TO 27 |
| bb4b | 79 | 79. 00 | 28 TO 32 |
| bb5b | 67 | 67. 00 | 33 TO 36 |
| bb6b | 141 | 141. 00 | 37 TO 42 |
| bb7b | 162 | 162. 00 | 43 TO 50 |
+- - - - - - - - - +- - - - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - +
LOADI NGS AND TEST- VALUES BEFORE CONSOLI DATI ON
AXES 1 A 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| CLUSTERS | TEST- VALUES | LOADI NGS | |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - |
| I DEN - LABEL COUNT ABS. WT. | 1 2 3 4 5 | 1 2 3 4 5 | DI STO. |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| CUT " b" OF THE TREE I NTO 7 CLUSTERS |
| |
| bb1b - Cl ust er 1 / 7 106 106. 00 | 3. 7 - 8. 7 - 0. 3 2. 3 6. 9 | 0. 18 - 0. 39 - 0. 01 0. 09 0. 27 | 0. 83 |
| bb2b - Cl ust er 2 / 7 375 375. 00 | - 15. 2 - 7. 2 2. 3 - 6. 6 4. 2 | - 0. 32 - 0. 14 0. 04 - 0. 12 0. 07 | 0. 22 |
| bb3b - Cl ust er 3 / 7 70 70. 00 | - 10. 6 7. 0 - 9. 4 6. 8 0. 6 | - 0. 63 0. 39 - 0. 49 0. 35 0. 03 | 1. 75 |
| bb4b - Cl ust er 4 / 7 79 79. 00 | - 6. 3 2. 4 3. 6 8. 2 - 4. 9 | - 0. 35 0. 13 0. 18 0. 39 - 0. 23 | 1. 57 |
| bb5b - Cl ust er 5 / 7 67 67. 00 | 2. 8 - 2. 1 - 4. 3 - 2. 8 - 1. 7 | 0. 17 - 0. 12 - 0. 23 - 0. 15 - 0. 09 | 1. 98 |
| bb6b - Cl ust er 6 / 7 141 141. 00 | 12. 2 - 1. 6 - 1. 9 2. 9 - 11. 2 | 0. 49 - 0. 06 - 0. 07 0. 10 - 0. 38 | 0. 75 |
| bb7b - Cl ust er 7 / 7 162 162. 00 | 15. 4 13. 0 5. 8 - 4. 8 3. 5 | 0. 58 0. 46 0. 19 - 0. 15 0. 11 | 0. 73 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
CLUSTERI NG CONSOLI DATI ON
AROUND CENTERS OF THE 7 CLUSTERS ACHI EVED BY 10 I TERATI ONS WI TH MOVI NG CENTERS
BETWEEN- CLUSTERS I NERTI A I NCREASE
+- - - - - - - - - - - +- - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - +
| I TERATI ON | TOTAL I NERTI A | I NTER- CLUSTERS| RATI O |
| | | I NERTI A | |
+- - - - - - - - - - - +- - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - +
| 0 | 2. 35008 | 0. 77272 | 0. 32881 |
| 1 | 2. 35008 | 0. 82435 | 0. 35078 |
| 2 | 2. 35008 | 0. 82613 | 0. 35153 |
| 3 | 2. 35008 | 0. 82630 | 0. 35160 |
| 4 | 2. 35008 | 0. 82630 | 0. 35160 |
+- - - - - - - - - - - +- - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - +
STOP AFTER I TERATI ON 4. RELATI VE I NCREASE OF BETWEEN- CLUSTER I NERTI A
WI TH RESPECT TO THE PREVI OUS I TERATI ON I S ONLY 0. 000 %.
PARTI - DECLA -
72
I NERTI A DECOMPOSI TI ON
COMPUTED ON 14 AXES.
+- - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +
| | I NERTI AS | COUNTS | WEI GHTS | DI STANCES |
| I NERTI AS | BEFORE AFTER | BEFORE AFTER | BEFORE AFTER | BEFORE AFTER |
+- - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +
| | | | | |
| BETWEEN CLUSTERS | 0. 7727 0. 8263 | | | |
| | | | | |
| WI THI N CLUSTER | | | | |
| | | | | |
| CLUSTER 1 / 7 | 0. 1299 0. 1731 | 106 128 | 106. 00 128. 00 | 0. 8283 0. 8028 |
| CLUSTER 2 / 7 | 0. 6116 0. 5710 | 375 358 | 375. 00 358. 00 | 0. 2191 0. 2551 |
| CLUSTER 3 / 7 | 0. 0930 0. 0945 | 70 72 | 70. 00 72. 00 | 1. 7521 1. 7687 |
| CLUSTER 4 / 7 | 0. 1233 0. 1336 | 79 82 | 79. 00 82. 00 | 1. 5661 1. 5452 |
| CLUSTER 5 / 7 | 0. 1293 0. 1293 | 67 67 | 67. 00 67. 00 | 1. 9831 1. 9831 |
| CLUSTER 6 / 7 | 0. 2054 0. 2180 | 141 149 | 141. 00 149. 00 | 0. 7483 0. 7707 |
| CLUSTER 7 / 7 | 0. 2849 0. 2043 | 162 144 | 162. 00 144. 00 | 0. 7286 0. 9060 |
| | | | | |
| TOTAL I NERTI A | 2. 3501 2. 3501 | | | |
+- - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - +
RATI O I NTER I NERTI A / TOTAL I NERTI A) : BEFORE . . 0. 3288
AFTER . . 0. 3516
LOADI NGS AND TEST- VALUES AFTER CONSOLI DATI ON
AXES 1 A 5
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| CLUSTERS | TEST- VALUES | LOADI NGS | |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - |
| I DEN - LABEL COUNT ABS. WT. | 1 2 3 4 5 | 1 2 3 4 5 | DI STO. |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
| CUT " b" OF THE TREE I NTO 7 CLUSTERS |
| |
| bb1b - Cl ust er 1 / 7 128 128. 00 | 3. 8 - 8. 6 - 1. 8 4. 1 8. 0 | 0. 16 - 0. 35 - 0. 07 0. 15 0. 28 | 0. 80 |
| bb2b - Cl ust er 2 / 7 358 358. 00 | - 16. 0 - 6. 4 1. 9 - 8. 1 4. 0 | - 0. 35 - 0. 13 0. 04 - 0. 15 0. 07 | 0. 26 |
| bb3b - Cl ust er 3 / 7 72 72. 00 | - 10. 7 8. 3 - 9. 3 6. 5 - 0. 1 | - 0. 63 0. 46 - 0. 48 0. 32 0. 00 | 1. 77 |
| bb4b - Cl ust er 4 / 7 82 82. 00 | - 5. 8 2. 7 2. 9 7. 9 - 5. 6 | - 0. 32 0. 14 0. 14 0. 37 - 0. 25 | 1. 55 |
| bb5b - Cl ust er 5 / 7 67 67. 00 | 2. 8 - 2. 1 - 4. 3 - 2. 8 - 1. 7 | 0. 17 - 0. 12 - 0. 23 - 0. 15 - 0. 09 | 1. 98 |
| bb6b - Cl ust er 6 / 7 149 149. 00 | 13. 3 - 1. 2 - 2. 6 2. 4 - 11. 6 | 0. 52 - 0. 04 - 0. 09 0. 08 - 0. 38 | 0. 77 |
| bb7b - Cl ust er 7 / 7 144 144. 00 | 15. 1 11. 5 9. 4 - 4. 1 4. 3 | 0. 61 0. 43 0. 33 - 0. 14 0. 14 | 0. 91 |
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +
CLUSTERS REPRESENTATI VES
CLUSTER 1/ 7
COUNT: 128
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| RK | DI STANCE | I DENT. | | RK | DI STANCE | I DENT. | | RK | DI STANCE | I DENT. |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 51034| 0980 | | 2| 0. 56936| 0091 | | 3| 0. 58376| 0485 |
| 4| 0. 58376| 0619 | | 5| 0. 62658| 0368 | | 6| 0. 62658| 0897 |
| 7| 0. 63989| 0704 | | 8| 0. 66465| 0184 | | 9| 0. 66465| 0232 |
| 10| 0. 66465| 0238 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
CLUSTER 2/ 7
COUNT: 358
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 66989| 0459 | | 2| 0. 80053| 0043 | | 3| 0. 80753| 0322 |
| 4| 0. 86366| 0393 | | 5| 0. 86366| 0450 | | 6| 0. 86366| 0780 |
| 7| 0. 86366| 0540 | | 8| 0. 86366| 0460 | | 9| 0. 90535| 0082 |
| 10| 0. 91404| 0593 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
CLUSTER 3/ 7
COUNT: 72
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 58799| 0741 | | 2| 0. 60470| 0940 | | 3| 0. 61735| 0639 |
| 4| 0. 61735| 0788 | | 5| 0. 69764| 0789 | | 6| 0. 70722| 0758 |
| 7| 0. 78494| 0766 | | 8| 0. 78494| 0806 | | 9| 0. 82442| 0742 |
| 10| 0. 82442| 0946 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
CLUSTER 4/ 7
COUNT: 82
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 74814| 0156 | | 2| 0. 98976| 0575 | | 3| 1. 01170| 0730 |
| 4| 1. 07622| 0569 | | 5| 1. 12107| 0721 | | 6| 1. 12879| 0148 |
| 7| 1. 12879| 0660 | | 8| 1. 12879| 0715 | | 9| 1. 14287| 0566 |
| 10| 1. 14460| 0360 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
CLUSTER 5/ 7
COUNT: 67
73
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 97554| 0358 | | 2| 1. 10787| 0130 | | 3| 1. 12353| 0328 |
| 4| 1. 27382| 0288 | | 5| 1. 27888| 0825 | | 6| 1. 29654| 0165 |
| 7| 1. 30224| 0828 | | 8| 1. 30330| 0302 | | 9| 1. 30330| 0326 |
| 10| 1. 34956| 0208 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
CLUSTER 6/ 7
COUNT: 149
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 52061| 0062 | | 2| 0. 52061| 0240 | | 3| 0. 55153| 0419 |
| 4| 0. 55153| 0611 | | 5| 0. 66158| 0991 | | 6| 0. 70375| 0286 |
| 7| 0. 70767| 0251 | | 8| 0. 75757| 0497 | | 9| 0. 77031| 0377 |
| 10| 0. 78869| 0242 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
CLUSTER 7/ 7
COUNT: 144
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
| 1| 0. 54714| 0141 | | 2| 0. 58623| 0007 | | 3| 0. 60549| 0243 |
| 4| 0. 63791| 0200 | | 5| 0. 64338| 0025 | | 6| 0. 72304| 0172 |
| 7| 0. 72691| 0004 | | 8| 0. 74024| 0006 | | 9| 0. 74024| 0352 |
| 10| 0. 74024| 0343 | | | | | | | | |
+- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - ++- - - +- - - - - - - - - - - +- - - - - - - - +
DI STANCE'S MATRI X BETWEEN CLUSTERS
| 1 2 3 4 5 6 7
- - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 | 0. 000
2 | 1. 134 0. 000
3 | 1. 701 1. 443 0. 000
4 | 1. 628 1. 402 1. 856 0. 000
5 | 1. 752 1. 608 1. 990 1. 984 0. 000
6 | 1. 327 1. 183 1. 746 1. 637 1. 703 0. 000
7 | 1. 383 1. 247 1. 820 1. 702 1. 770 1. 283 0. 000
- - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| 1 2 3 4 5 6 7

PARTI - DECLA -
74
DESCRIPTION OF: CUT "b" OF THE TREE INTO 7 CLUSTER

The characterizing elements are classified by order of importance with the help of a
statistical criterion (test-value) with which is associated a probability : the larger the test-
value is, the lower the probability, the better the element is defined.

In the case of the description of the classes by the categories of the categorical variables, an
option allows to sort the characterizing categories by decreasing test-values, or by
percentages.

CLUSTERS CHARACTERISATION BY ACTIVE CATEGORIES
CHARACTERISATION BY CATEGORIES OF CLUSTERS OR CATEGORIES
OF CUT "b" OF THE TREE INTO 7 CLUSTERS
Cluster 1 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
T. VALUE PROB. - - - - PERCENTAGES - - - - CHARACTERI STI C WEI GHT
GRP/ CAT CAT/ GRP GLOBAL CATEGORI ES OF VARI ABLES
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
12. 80 Cl ust er 1 / 7 128
24. 52 0. 000 81. 01 100. 00 15. 80 BEPC- BE- BEPS Di pl oma i n 5 cat egor i es 158
4. 73 0. 000 17. 59 71. 88 52. 30 t enant Occupat i on st at us of housi ng i n 4 cat egor i es 523
3. 10 0. 001 18. 31 40. 63 28. 40 25 t o 34 yo Age i n 5 cat egor i es 284
3. 08 0. 001 17. 61 46. 09 33. 50 Empl oyee J ob cat egor y 335
2. 85 0. 002 20. 67 24. 22 15. 00 Lower t han 25 yo Age i n 5 cat egor i es 150
- 2. 04 0. 021 8. 73 15. 63 22. 90 Manager J ob cat egor y 229
- 2. 27 0. 012 8. 97 20. 31 29. 00 owner Occupat i on st at us of housi ng i n 4 cat egor i es 290
- 2. 33 0. 010 2. 08 0. 78 4. 80 Ot her J ob cat egor y 48
- 2. 72 0. 003 3. 61 2. 34 8. 30 Lower t han 2. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 83
- 3. 01 0. 001 5. 92 7. 81 16. 90 65 yo and mor e Age i n 5 cat egor i es 169
- 3. 28 0. 001 6. 22 10. 16 20. 90 35 t o 49 yo Age i n 5 cat egor i es 209
- 3. 81 0. 000 0. 00 0. 00 6. 70 f r ee housi ng, ot her Occupat i on st at us of housi ng i n 4 cat egor i es 67
- 4. 49 0. 000 0. 00 0. 00 8. 70 2. 000 - 20. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 87
- 6. 27 0. 000 0. 00 0. 00 15. 00 Uni ver si t y Di pl oma i n 5 cat egor i es 150
- 7. 06 0. 000 0. 00 0. 00 18. 20 Bac - Br evet sup. Di pl oma i n 5 cat egor i es 182
- 7. 22 0. 000 0. 00 0. 00 18. 90 No one Di pl oma i n 5 cat egor i es 189
- 10. 07 0. 000 0. 00 0. 00 32. 10 CEP Di pl oma i n 5 cat egor i es 321
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cluster 2 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
35. 80 Cl ust er 2 / 7 358
14. 73 0. 000 68. 54 61. 45 32. 10 CEP Di pl oma i n 5 cat egor i es 321
12. 34 0. 000 67. 68 49. 72 26. 30 Wor ker J ob cat egor y 263
11. 58 0. 000 73. 02 38. 55 18. 90 No one Di pl oma i n 5 cat egor i es 189
6. 09 0. 000 49. 24 45. 25 32. 90 gr eat er t han 100. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 329
5. 94 0. 000 56. 00 27. 37 17. 50 20. 000 - 100. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 175
5. 32 0. 000 38. 68 94. 97 87. 90 No Do you own some secur i t i es ? 879
4. 33 0. 000 50. 89 24. 02 16. 90 65 yo and mor e Age i n 5 cat egor i es 169
- 2. 58 0. 005 30. 06 27. 37 32. 60 Par i s Ur ban ar ea si ze ( number of i nhabi t ant s) 326
- 3. 64 0. 000 22. 67 9. 50 15. 00 Lower t han 25 yo Age i n 5 cat egor i es 150
- 3. 91 0. 000 10. 42 1. 40 4. 80 Ot her J ob cat egor y 48
- 4. 20 0. 000 19. 20 6. 70 12. 50 mi ssi ng cat egor y J ob cat egor y 125
- 5. 32 0. 000 14. 88 5. 03 12. 10 Yes Do you own some secur i t i es ? 121
- 12. 22 0. 000 0. 00 0. 00 15. 80 BEPC- BE- BEPS Di pl oma i n 5 cat egor i es 158
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
75
Cluster 3 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
7. 20 Cl ust er 3 / 7 72
21. 01 0. 000 86. 75 100. 00 8. 30 Lower t han 2. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 83
9. 61 0. 000 20. 34 81. 94 29. 00 owner Occupat i on st at us of housi ng i n 4 cat egor i es 290
8. 27 0. 000 50. 00 33. 33 4. 80 Ot her J ob cat egor y 48
4. 40 0. 000 12. 77 56. 94 32. 10 CEP Di pl oma i n 5 cat egor i es 321
2. 13 0. 017 7. 85 95. 83 87. 90 No Do you own some secur i t i es ? 879
- 2. 13 0. 017 2. 48 4. 17 12. 10 Yes Do you own some secur i t i es ? 121
- 2. 22 0. 013 2. 40 4. 17 12. 50 mi ssi ng cat egor y J ob cat egor y 125
- 3. 10 0. 001 3. 04 11. 11 26. 30 Wor ker J ob cat egor y 263
- 7. 38 0. 000 0. 00 0. 00 32. 90 gr eat er t han 100. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 329
- 7. 81 0. 000 1. 34 9. 72 52. 30 t enant Occupat i on st at us of housi ng i n 4 cat egor i es 523
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cluster 4 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
8. 20 Cl ust er 4 / 7 82
22. 73 0. 000 94. 25 100. 00 8. 70 2. 000 - 20. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 87
3. 15 0. 001 16. 67 24. 39 12. 00 homeowner Occupat i on st at us of housi ng i n 4 cat egor i es 120
2. 17 0. 015 11. 38 40. 24 29. 00 owner Occupat i on st at us of housi ng i n 4 cat egor i es 290
- 7. 94 0. 000 0. 00 0. 00 32. 90 gr eat er t han 100. 000 Ur ban ar ea si ze ( number of i nhabi t ant s) 329
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cluster 5 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6. 70 Cl ust er 5 / 7 67
21. 82 0. 000 100. 00 100. 00 6. 70 f r ee housi ng, ot her Occupat i on st at us of housi ng i n 4 cat egor i es 67
- 3. 65 0. 000 0. 00 0. 00 12. 00 homeowner Occupat i on st at us of housi ng i n 4 cat egor i es 120
- 6. 51 0. 000 0. 00 0. 00 29. 00 owner Occupat i on st at us of housi ng i n 4 cat egor i es 290
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cluster 6 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
14. 90 Cl ust er 6 / 7 149
25. 06 0. 000 80. 77 98. 66 18. 20 Bac - Br evet sup. Di pl oma i n 5 cat egor i es 182
5. 87 0. 000 27. 95 42. 95 22. 90 Manager J ob cat egor y 229
4. 67 0. 000 30. 40 25. 50 12. 50 mi ssi ng cat egor y J ob cat egor y 125
4. 23 0. 000 27. 33 27. 52 15. 00 Lower t han 25 yo Age i n 5 cat egor i es 150
3. 52 0. 000 20. 86 45. 64 32. 60 Par i s Ur ban ar ea si ze ( number of i nhabi t ant s) 326
2. 27 0. 012 22. 50 18. 12 12. 00 homeowner Occupat i on st at us of housi ng i n 4 cat egor i es 120
- 2. 57 0. 005 10. 75 24. 16 33. 50 Empl oyee J ob cat egor y 335
- 3. 49 0. 000 6. 51 7. 38 16. 90 65 yo and mor e Age i n 5 cat egor i es 169
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PARTI - DECLA -
76
Cluster 7 / 7
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
14. 40 Cl ust er 7 / 7 144
24. 37 0. 000 88. 67 92. 36 15. 00 Uni ver si t y Di pl oma i n 5 cat egor i es 150
11. 52 0. 000 40. 17 63. 89 22. 90 Manager J ob cat egor y 229
7. 36 0. 000 26. 69 60. 42 32. 60 Par i s Ur ban ar ea si ze ( number of i nhabi t ant s) 326
5. 76 0. 000 33. 88 28. 47 12. 10 Yes Do you own some secur i t i es ? 121
- 5. 40 0. 000 6. 27 14. 58 33. 50 Empl oyee J ob cat egor y 335
- 5. 76 0. 000 11. 72 71. 53 87. 90 No Do you own some secur i t i es ? 879
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

THE GRAPH EDITOR

77

CLASS - MINER - Clusters description
78

CLASS - MINER - CLUSTERS DESCRIPTION

This procedure lets you describe the partitions created by the PARTI procedure with the
variables that did not participate in the analysis.

We can thus select variables by themes and evaluate their characterizing power on the
partitions constructed (typologies). The parameter settings and the outputs are identical to
those of the DECLA procedure of the PARTI-DECLA icon.

Characteristic elements are classified by order of importance with the help of a statistical
criterion (test-value) with which is associated a probability: the higher the level of the test-
value, and the weaker the probability, the more strongly the element is characterized.

79

ESCAL - STORING THE FACTORIAL AXES AND THE
PARTITIONS

Regression and Analysis of Variabce,
General Linear Model
80

THE LINEAR MODEL AND ITS APPLICATIONS

REGRESSION AND ANALYSIS OF VARIABCE,
GENERAL LINEAR MODEL

OBJECT

The general purpose of this procedure, called VAREG, is to learn more about the
relationship between several independent or predictor variables and a dependent
continuous variable.

VAREG allows you to perform least squares adjustement models with a constant term. It
can be used for many different analyses including:

Simple regression
Multiple regression
Analysis of variance
Analysis of covariance

VAREG enables you to specify interactions (crossed effects) up to the 3
rd
order. Each
regression coefficient is associated with the null test, which is valid in the classical context
where the random term is assumed to be generated by a Laplace-Gauss law.
The REPEATED statement enables you to specify effects in the model that represent
repeated measurements on the same experimental unit for the same response.
The VAREG procedure generates automatically a rule file that allows you to create a new
data set (with the Deployment Archiving\Archiving\Predcitions method) containing
the input dataset in addition to predicted values and residuals.
The treatment of missing data is handled by the parameters.

OUTPUTS

Summary statistics on the variables of the model are output: Marginal distributions of the
categorical variables; mean, standard deviation, minimum and maximum of the
continuous variables. The method supplies the identification of the coefficients of the
model: coefficient of the continuous (endogenous) variables, the categories of the factors
and of the eventual interactions. Subsequently it is possible to output the matrix of the
variances and covariance, or the correlations matrix.
The Linear Model and its applications
81

The procedure prints the coefficients, the estimation of their standard deviation, the
corresponding Students statistic, as well as the critical probability and the associated test
value. Also shown are the sum of the squares of the deviations, the multiple correlation
coefficient, and the estimate of the common factor variance of the residuals. Finally, the
test of simultaneous nullity of all the coefficients (test of an endogenous "y" constant) is
provided.

In the case of an analysis of variance, you also get the sum of the squared deviations
according to their source (residual, criteria or interaction) as well as Fishers statistics, the
critical probabilities and the associated test values. In the case of repeated observations,
the repeatability variance is displayed, as well as the estimates obtained including the
variance.

DEFINE A MODEL

The interface allows you to specify the definition of one or more models. The functions of
the CTRL, SHIFT keys are standard.

1. In the Selection list choose the TYPE of the variable(s) you want to define
82

2. Then in the Variables Available list, select one or more variables of the TYPE, and
confirm your choices with the transfer button. A double click on a variable
confirms the choice.

To delete a variable, or an interaction, of the model under construction, select it in the list
of models and confirm with the transfer button .

3. Save a model
Once that you have specified at least one endogenous variable and one exogenous
variable, click on the "Validate" Button to add the model to the Model list.

Delete a model
Select the model from the list and click on "Delete" button.

Change a model
Select the model in the list and click on "Modify" button.

PARAMETERS

The VAREG parameters allow you to handle missng data and to specify wether
measurments are repeated or not. With the printout parameters, you can specify the
desired ouputs.

83

Missing data handling for continuous variables (LSUPR)
Possible values : Deleted case / Mean imputation

If LSUPR = Deleted case, the cases presenting the missing data for one of the
variables of the model (endogenous or exogenous) will be eliminated from the
analysis.
If LSUPR = Mean imputation, the missing exogenous data will be replaced by the
corresponding variable.

Warning : if LSUPR = Mean imputation, the endogenous variable must not have any
missing data.

Missing data handling categorical variables (LZERO)
Possible values: Re-coded / Deleted case

If LZERO = Re-coded, the missing values will treated as a normal state.
If LZERO = Deleted case, the cases with missing data will be eliminated.

Treatment with repetitions (LREP)

Possible values : No (there are no repetitions) / Repetitions in sequence / Repetitions
in disorder

This parameter concerns the treatment of experiment plans.
When there are repetitions, the variance of the observations may be estimated on the
repetitions of observations, rather than on the whole of the observations. It is not
necessary that the number of repetitions is the same everywhere.

Choose LREP = Repetitions in sequence if the repetitions are one under the others
in the data table lines.
Choose LREP = Repetitions in confusion if the repetitions are unordered

Output Parameters

Summary statistics on the variables in the model (LSTAT)
Possible values : Yes / No

If LSTAT = Yes, one obtains marginal distributions for the categorical variables of the
model, as well as the various statistics concerning the continuous variables : mean,
standard deviation, minimum and maximum.

Printout of the covariances matrix (LMAT)
Possible values :

No (No output)
Variances, covariance (Output the variance covariance matrix)
84
Correlations (Output the correlations matrix)

File for Excel application
Possible values: Yes / No
If LEXCE = YES, you will have available on output an ASCII delimited file, which can be
directly imported into Excel application

Variables labels (LABEL)
Possible values : short / long

If LABEL = short, we use 4 characters for categorical variable label and 20 for
continuous variable label.

If LABEL = long, we use 60 characters for categorical variable and continuous
variable label.

85

OPTIMAL REGRESSIONS RESEARCH

General principles

This procedure selects the N best adjustments for a regression. The selection criterion
can be the R2, the adjusted R2 or Mallow's Cp.

Let N be the number of the best adjustments requested, and P be the number of explicative
(exogenous) variables for the model. The procedure shows the N best adjustments for all
sizes of the models, from 1 to P-1 variables (the adjustment with the P variables is unique).

The procedure supplies the criterion value (R2, adjusted R2 or the Cp), Fishers F
associated with R2, the critical probability associated with this F, and the corresponding
test value.

The list of the variables of the model is then shown with the estimated coefficients, the
nullity tests, the critical probability and the associated test value. Finally, a diagram
representing the evolution of the criterion as a function of the number of variables in the
models shows a quick summary of the selections.

For the R2 criterion, all the printed selections are optimal. For the other two criteria, the
selections are not always optimal (the adjusted R2 and Mallows Cp vary in a non-
monotone way as a function of the number of variables). A non-optimal selection is
identified if the procedure does not show the coefficients of the variables (only the names
of the variables and the value of the criterion are shown). In this case the selected
adjustment, if it is not optimal for the criterion, is, nonetheless better than the adjustments
that were not calculated.

Reference:
Selection algorithm is a transcription of the algorithm "leaps and bounds" from Furnival
& Wilson. (Technometrics, 174, Vol.16, pp.499-511).

Data
This dataset corresponds to the perception that has 100 companies of their furnishers.
Criteria are the following:
Delivery time
Price index
Price flexibility
Perceived quality
Service quality
Commercial image
Product quality
Satisfaction

The main goal is to find the best model explaining Satisfaction by a subset of the other
items.
Optimal Regressions Research
86

Id
Company
Size
Delivery
delay
Price
Index
Price
Flexibility
Perceived
Quality
Service
Quality
Commercial
Image
Product
Quality
1 <50 employees 4,1 0,6 6,9 4,7 2,4 2,3 5,2
2 >=50 employees 1,8 3 6,3 6,6 2,5 4 8,4
3 >=50 employees 3,4 5,2 5,7 6 4,3 2,7 8,2
4 >=50 employees 2,7 1 7,1 5,9 1,8 2,3 7,8
5 <50 employees 6 0,9 9,6 7,8 3,4 4,6 4,5
6 >=50 employees 1,9 3,3 7,9 4,8 2,6 1,9 9,7
7 <50 employees 4,6 2,4 9,5 6,6 3,5 4,5 7,6
8 >=50 employees 1,3 4,2 6,2 5,1 2,8 2,2 6,9
9 <50 employees 5,5 1,6 9,4 4,7 3,5 3 7,6
10 >=50 employees 4 3,5 6,5 6 3,7 3,2 8,7
11 <50 employees 2,4 1,6 8,8 4,8 2 2,8 5,8
12 <50 employees 3,9 2,2 9,1 4,6 3 2,5 8,3
13 >=50 employees 2,8 1,4 8,1 3,8 2,1 1,4 6,6
14 <50 employees 3,7 1,5 8,6 5,7 2,7 3,7 6,7
15 <50 employees 4,7 1,3 9,9 6,7 3 2,6 6,8
16 <50 employees 3,4 2 9,7 4,7 2,7 1,7 4,8
17 <50 employees 3,2 4,1 5,7 5,1 3,6 2,9 6,2
18 <50 employees 4,9 1,8 7,7 4,3 3,4 1,5 5,9
19 <50 employees 5,3 1,4 9,7 6,1 3,3 3,9 6,8
20 <50 employees 4,7 1,3 9,9 6,7 3 2,6 6,8
21 <50 employees 3,3 0,9 8,6 4 2,1 1,8 6,3
22 <50 employees 3,4 0,4 8,3 2,5 1,2 1,7 5,2
23 <50 employees 3 4 9,1 7,1 3,5 3,4 8,4
24 >=50 employees 2,4 1,5 6,7 4,8 1,9 2,5 7,2
25 <50 employees 5,1 1,4 8,7 4,8 3,3 2,6 3,8
26 <50 employees 4,6 2,1 7,9 5,8 3,4 2,8 4,7
27 >=50 employees 2,4 1,5 6,6 4,8 1,9 2,5 7,2
28 <50 employees 5,2 1,3 9,7 6,1 3,2 3,9 6,7
29 <50 employees 3,5 2,8 9,9 3,5 3,1 1,7 5,4
30 >=50 employees 4,1 3,7 5,9 5,5 3,9 3 8,4
31 >=50 employees 3 3,2 6 5,3 3,1 3 8
32 <50 employees 2,8 3,8 8,9 6,9 3,3 3,2 8,2
33 <50 employees 5,2 2 9,3 5,9 3,7 2,4 4,6
34 >=50 employees 3,4 3,7 6,4 5,7 3,5 3,4 8,4
35 >=50 employees 2,4 1 7,7 3,4 1,7 1,1 6,2
36 >=50 employees 1,8 3,3 7,5 4,5 2,5 2,4 7,6
37 >=50 employees 3,6 4 5,8 5,8 3,7 2,5 9,3
38 <50 employees 4 0,9 9,1 5,4 2,4 2,6 7,3
39 >=50 employees 0 2,1 6,9 5,4 1,1 2,6 8,9
40 >=50 employees 2,4 2 6,4 4,5 2,1 2,2 8,8
41 >=50 employees 1,9 3,4 7,6 4,6 2,6 2,5 7,7
42 <50 employees 5,9 0,9 9,6 7,8 3,4 4,6 4,5
43 <50 employees 4,9 2,3 9,3 4,5 3,6 1,3 6,2
44 <50 employees 5 1,3 8,6 4,7 3,1 2,5 3,7
45 >=50 employees 2 2,6 6,5 3,7 2,4 1,7 8,5
46 <50 employees 5 2,5 9,4 4,6 3,7 1,4 6,3
47 <50 employees 3,1 1,9 10 4,5 2,6 3,2 3,8
48 >=50 employees 3,4 3,9 5,6 5,6 3,6 2,3 9,1
49 <50 employees 5,8 0,2 8,8 4,5 3 2,4 6,7
50 <50 employees 5,4 2,1 8 3 3,8 1,4 5,2
51 <50 employees 3,7 0,7 8,2 6 2,1 2,5 5,2
52 >=50 employees 2,6 4,8 8,2 5 3,6 2,5 9
53 >=50 employees 4,5 4,1 6,3 5,9 4,3 3,4 8,8
54 >=50 employees 2,8 2,4 6,7 4,9 2,5 2,6 9,2
55 <50 employees 3,8 0,8 8,7 2,9 1,6 2,1 5,6
56 <50 employees 2,9 2,6 7,7 7 2,8 3,6 7,7
87

88
Fuwil 3 - Excel sheet output

Missing data handling for exogenous variables
Missing values are replaced by general means
Variable label Mean
Number of missing
values
Delivery Time 3,515 0
Price Index 2,364 0
Price Flexibility 7,894 0
Perceived Quality 5,248 0
Service Quality 2,916 0
Commercial Image 2,665 0
Product Quality 6,971 0
Usage Index 46,100 0

R criteria
Curve of R according to the number of variables

The following graph displays the evolution of the R criteria according to the number of
variables entered in the model. Higher is this criteria, better is the model.
But as this criterion increases automatically by entering new variables in the model, we
must evaluate the relative gain of adding each new variable. We will see further criteria
that penalize the R for each new entered variable: the adjusted R adjusted and the
Mallow C(p).
By looking at the graph below, we see that the R increases significantly up to 3 variables.
The next variables are redundant and do not bring any more information that could
improve significantly the model.
The R can be interpreted as the part of the variance explained by the model. It takes its
values between 0 and 1.
Curve of R2 accordind to the number of variables
Value of R2
Number of
model's
variables
0.45 0.48 0.52 0.55 0.59 0.62 0.66 0.69 0.73 0.77 0.80
1
2
3
4
5
6
7
8

89
1 var

This output presents the 3 best adjustments with one exogenous variable.

Adjustments with 1 variable + constant DDL(Student) = 98
Adjustment 1 (Full printout)
R**2 = 0.5051
Fisher = 100.0162
Probability = 0.0000
Test-Value = 8.283
Variable label Coefficient Student Probability Test-Value
Usage Index 0,0676 10,00 0,000 8,28

R**2 = 0.4233
Fisher = 71.9390
Test-Value = 7.327
Delivery Time 0,4215 8,48 0,000 7,33

R**2 = 0.3985
Fisher = 64.9139
Test-Value = 7.040
Service Quality 0,7189 8,06 0,000 7,04

The number of degrees of freedom is 98.

The first adjustment is the best one, with an R of 0.5051 ; meaning that the variance
explained by the model represents 50,51 % of the total variance.

The Fisher statistic corresponds to the global validation of the model. This statistics
follows a fisher distribution with one and 98 degrees of freedom. Its value of 100.02
corresponds to a p-value lower than 1/10000 (0.0000). Thus, the model is acceptable. This
p-value is also expressed as a test-value: 8.283 here.

The Coefficient column presents the estimation of the coefficient of the variable Usage
Index: the model can be written: Satisfaction Index = constant + 0.0676 x Usage Index

The Student column tests the nullity of the coefficient for the concerned variable: this
statistic follows a student distribution with 98 degrees of freedom. Its value of 10
corresponds to a p-value lower than 1/10000 (0.0000). The coefficient is significantly
different than zero.

This probability is also expressed in test value. Since the model gets only one explanatory
variable, the test value of the coefficient is the same than the global model.
90
6 vars

Adjustments with 6 variables + constant DDL(Student) = 93
R**2 = 0.8009
Fisher = 62.3410
Test-Value = 11.408
Delivery Time 0,3061 8,10 0,000 7,03
Price Index 0,2446 5,95 0,000 5,47
Price Flexibility 0,2912 7,99 0,000 6,95
Perceived Quality 0,4324 7,39 0,000 6,54
Commercial Image -0,1978 2,35 0,021 2,31
Product Quality -0,0470 1,49 0,139 1,48

R**2 = 0.7993
Fisher = 61.7159
Test-Value = 11.376
Delivery Time 0,0777 1,49 0,140 1,47
Service Quality 0,4536 5,87 0,000 5,40
Product Quality -0,0417 1,33 0,188 1,32

R**2 = 0.7973
Fisher = 60.9833
Test-Value = 11.338
Price Index -0,0624 1,14 0,256 1,14
Service Quality 0,5884 7,93 0,000 6,91
Product Quality -0,0453 1,42 0,159 1,41

The three adjustments listed above have 6 exogenous variables.

For the first adjustment, we can see that the variable Product Quality has a coefficient
non significantly different than zero to the usual threshold of 5%.

Finally, the best adjustment is obtained with 6 exogenous variables. It is confirmed by the
following graphs.

But since one coefficient is not significantly different than zero, we may choose the model
with 5 variables:
91

Adjustments with 5 variables + constant DDL(Student) = 94
R**2 = 0.7961
Fisher = 73.4081
Test-Value = 11.506
Delivery Time 0,3247 9,05 0,000 7,65
Price Index 0,2291 5,73 0,000 5,29

The R adjusted criterion
Curve of R adjusted according to the number of explanatory variables

The R adjusted criterion is based on the standard R, but it imposes a penalty for each
additional explanatory variable that is used to build the model. To increase this criterion,
the entry of a new variable needs to be sufficient (if the variable is redundant with the
ones already included in the model, the criterion decreases).

The graph below shows that the best models have to be found in the ones with 5 or 6
explanatory variables.
Curve of R2 ajusted accordind to the number of variables
Value of R2 ajusted
Number
of model's
variables
0.44 0.48 0.51 0.55 0.58 0.62 0.65 0.68 0.72 0.75 0.79
1
2
3
4
5
6
7
8

92
The Mallows CP criterion
Curve of Mallows CP according to the number of explanatory variables

Lower is this criterion, better is the adjustment. We get the same results than with the
previous criterions, the best models have 5 or 6 variables.
Curve of Mallows Cp accordind to the number of variables
Value of Mallows Cp
Number
of
model's
variables
0.0 0.1 0.3 0.4 0.5 0.7 0.8 0.9 1.1 1.2 1.3
1
2
3
4
5
6
7
8

93
Formulas of the criterions R, R adjusted and Mallows Cp

1. R :
The coefficient of determination R2 (which takes values in the range 0 to 1) is a measure
of the proportion of the total variation that is associated with the regression process:

1
SSE
R
SST
=
SSE : Error Sum of Squares
SST : Total Sum of Squares.

2. R adjusted :
additional explanatory variable that is used to build the model.
( 1)(1 )
1
( )
n R
R
n p

=

n : the number of observations,
p : the number of variables used for the model plus one.

3. Mallows CP - C(p) :
The Mallows C(p) is positively related to the error (SSE) and the number of
explanatory variables in the model :a model with a lot of variables or with a high error
will be penalized by this criterion.

( ) 2
SSE
C p p n
SST
= +

References:

Furnival, G.M. and Wilson, R.W. (1974), Regression by Leaps and Bounds
Technometrics, 16, 499 -511.

Logistic Regression
94

LOGISTIC REGRESSION

Logistic regression is a model used for prediction of the probability of occurrence of an
event by fitting data to a logistic curve. It makes use of several predictor variables that
may be either numerical or categorical.
Binomial (or binary) logistic regression is a form of regression which is used when the
dependent Y is a dichotomy and the independents are of any type X1, X2,..., Xp.

LOGIT INTRODUCTION

The logistic regression means to explain the probability of a binary event. This probability
cannot be explained by a traditional regression model using the least squares method.
Thus, we perform a qualified LOGIT transformation whose process uses the generalized
linear model and establishes a method based on the research of maximum likelihood.

If P is the probability that we are trying to explain, the P/(1-P) ratio must be defined as
ODDS and the magnitude that is finally explained corresponds to this ODDS logarithm.

We want to explain ( )
1 2
1/ , P Y X X =
Thus: ( ) ( )
1 2 1 2
1/ , 2/ , 1 P Y X X P Y X X = + = =
The logit of the probability P is the logarithm of the quotient :
1
P
P

( ) Logit Log
1
P
P
P

=

(1)
Graphical representation of the P logit

0 1/2 1 P
.
1

P
P
Log

95
LOGISTIC MODEL WITH BINARY EXPLANATORY VARIABLES

The model can be written:
0 1 1 2 2
Log
1
P
X X
P

= + +

(2)
The logit of the probability is a linear function of the explanatory variables but the
probability itself is a non linear function. Indeed, according to (2)
( )
( )
0 1 1 2 2
0 1 1 2 2
exp
1 exp
X X
P
X X

+ +
=
+ + +

The model (2) is an additive model for binary categorical exogenous variables (coded 0 or
1). The models with categorical exogenous variables with more than 2 categories and with
crossed effects are presented further.

LOGISTIC MODEL WITH CATEGORICAL EXOGENEOUS VARIABLES
WITH MORE THAN 2 CATEGORIES

A categorical variable with no hierarchy in the categories needs to be recoded before its
introduction in the model into many binary variables (0/1), well known under the name
of design variables.
We introduce as much design variables as categories.
But the following problem appears: the k design variables are not independent because
their sum makes 1 whatever the individual.
A simple solution is to eliminate one of the design variables. The category not introduced
in the model has a zero coefficient by convention. We can consider that it represents the
reference situation.
Mathematically, the choice of the reference category has no importance.
We can for example choose as reference the modal category (category with the largest
count).
Consider Y as the dependent variables with 2 categories 1 and 2. Consider Z as a
categorical variable with 4 categories corresponding to the race of the individual.
Z = 1: White
Z = 2: Black
Z = 3: Hispanic
Z = 4: Others
If we choose the White category as reference, the D matrix is the following:
The three columns of D (D2, D3, D4) correspond to the coding of Z into design variables
that will be introduced in the model.
Logistic Regression
96
Table 1
D Matrix construction
RACE (categories) D2 D3 D4
White (1)
Black (2)
Hispanic (3)
Others (4)
0
1
0
0
0
0
1
0
0
0
0
1

The logistic model is then written this way:
( )
( )
( )
( )
{
0
2
0 3
4
1/ 1 1 0 0 0
1/ 2 1 1 0 0
Logit
1/ 3 1 0 1 0
1/ 4 1 0 0 1
D D
P Y Z
P Y Z
P Y Z
P Y Z
= =
= =
= +
= =
= =
14243

Thus, the explanatory variable Z with k categories is transformed into (k-1) design
variables, notated d
u
. If the first category is the reference, the logit is written:
( )
0
2
Logit 1/
k
u u
u
P Y Z d
=
= = +

For example, we obtain
( )
0 2 2 3 3 4 4
Logit 1/ P Y Z d d d = = + + +

with
d
u
= 1 if Z = u
d
u
= 0 otherwise

97
LOGISTIC REGRESSION WITH SPAD

Iterations number :
Specifies the maximum number of iterations to perform.
By default, Iterations number=25. If convergence is not attained in n iterations, the
displayed output created by the procedure contain results that are based on the last
maximum likelihood iteration.

Seuil alpha pour les tests (en %) :
Sets the level of significance for (100 )% confidence intervals for regression
parameters or odds ratios. The value must be between 0 and 100. By default, is
equal 5%.
Logistic Regression
98
Parameterization method for categorical variables
Consider a model with one categorical variable A with four categories, 1, 2, 5, and 7.

Comparison to mean
Three columns are created to indicate group membership of the nonreference
categories. For the reference category, all three design variables have a value of -1. For
instance, if the reference category is 7 (REF='7'), the design matrix columns for A are as
follows.
Comparison to mean Coding
Design Matrix
A A1 A2 A5
1 1 0 0
2 0 1 0
5 0 0 1
7 -1 -1 -1

Parameter estimates of a categorical variable main effects using the Comparison to
mean coding scheme estimate the difference in the effect of each nonreference
category compared to the average effect over all 4 categories.

GLM
Four columns are created to indicate group membership. The design matrix columns
for A are as follows.
GLM Coding
Design Matrix
A A1 A2 A5 A7
1 1 0 0 0
2 0 1 0 0
5 0 0 1 0
7 0 0 0 1

As in ANOVA, the last category coefficient is fixed to 0. Parameter estimates of a
categorical variable main effects using the GLM coding scheme estimate the difference
in the effects of each category compared to the last category.

Comparison to a reference
Three columns are created to indicate group membership of the nonreference
categories. For the reference category, all three design variables have a value of 0. For
99
instance, if the reference level is 7 (REF='7'), the design matrix columns for A are as
follows.
Comparison to a Reference Coding
Design Matrix
A A1 A2 A5
1 1 0 0
2 0 1 0
5 0 0 1
7 0 0 0

Parameter estimates of a categorical variable main effects using the Comparison to a
reference coding scheme estimate the difference in the effect of each nonreference
category compared to the effect of the reference category.

Variable selections :

The selection options are available only if the model contains simple factors (no
interaction).

No selection
The model is estimated with all the input variables, this is the default option.

Forward
The procedure first estimates parameters for factors forced into the model. These
factors are the intercepts and the first n explanatory factors in the model statement,
where n is the number specified by the Number of variables in initial model (n is
zero by default). Next, the procedure computes the score chi-square statistic for each
factor not in the model and examines the largest of these statistics. If it is significant at
the Threshold (%) for the variables entry in model level, the corresponding factor is
added to the model. Once a factor is entered in the model, it is never removed from the
model. The process is repeated until none of the remaining effects meet the specified
level for entry or until the Number of variables in final model value is reached.

Backward
Parameters for the complete model as specified in the model statement are estimated
unless the Number of variables in initial model option is specified. In that case, only
the parameters for the intercepts and the first n explanatory factors in the model
statement are estimated, where n is the Number of variables in initial model. Results
of the Wald test for individual parameters are examined. The least significant factor
that does not meet the Threshold (%) for a variable to stay in the model is removed.
Once a factor is removed from the model, it remains excluded. The process is repeated
until no other factor in the model meets the specified level for removal or until the
Number of variables in final model value is reached. Backward selection is often less
Logistic Regression
100
successful than forward or stepwise selection because the full model fit in the first step
is the model most likely to result in a complete or quasi-complete separation of
response values.

Stepwise
This option is similar to the FORWARD option except that factors already in the model
do not necessarily remain. Factors are entered into and removed from the model in
such a way that each forward selection step may be followed by one or more backward
elimination steps. The stepwise selection process terminates if no further factor can be
added to the model or if the factor just entered into the model is the only factor
removed in the subsequent backward elimination.

EXAMPLE BASED ON THE CREDIT.SBA DATASET

Response variable
1 . Type of client 2 CATEGORIES

Categorical explanatory variables:
2 . Age of client 4 CATEGORIES
3 . Family Situation 4 CATEGORIES
4 . Seniority 5 CATEGORIES
5 . Salary domiciliation 2 CATEGORIES
6 . Size of savings 4 CATEGORIES
7 . Profession 3 CATEGORIES
8 . Average outstanding 3 CATEGORIES
9 . Average transactions 4 CATEGORIES
10 . Number of withdrawals 3 CATEGORIES
11 . Overdraft 2 CATEGORIES
12 . Checkbook 2 CATEGORIES

101

Logistic Regression
102

REGRESSION LOGISTIQUE
MODEL PRESENTATION
MODEL DEFINITION
================
RESPONSE VARI ABLE . . . . . . . . . . . . . . . : Type of cl i ent
NUMBER OF RESPONSE LEVELS . . . . . . . : 2
NUMBER OF OBSERVATI ONS . . . . . . . . . . : 468
LI NK FUNCTI ON . . . . . . . . . . . . . . . . . . . : BI NARY LOGI T
OPTI MI ZATI ON TECHNI QUE . . . . . . . . . . : FI SHER' S SCORI NG
RESPONSE PROFILE
================
VARI ABLE RESPONSE : Type of cl i ent
==========================
ORDER RESPONSE FREQUENCY
- - - - - - - - - - - - - - - - - - - - - - - - - -
1 Good 237
2 Bad 231
==========================
PROBABI LI TY MODELED I S: Type of cl i ent = Good
DESCRITIVE STATISTICS FOR EXPLANATORY VARIABLES
===============================================
FREQUENCY DISTRIBUTION OF CATEGORICAL VARIABLES
========================================================================
Type of cl i ent
- - - - - - - - - - - - - - - - - -
VARI ABLE VALUE Good Bad TOTAL
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Seni or i t y 1 year or l ess 66 133 199
Fr om1 t o 4 year s 19 28 47
Fr om4 t o 6 year s 42 27 69
Fr om6 t o 12 year s 44 22 66
Over 12 year s 66 21 87
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Sal ar y domi ci l i at i on Sal . domi ci l i at ed 204 112 316
Sal . not domi ci l . 33 119 152
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Si ze of savi ngs No savi ng 169 201 370
Less t han 10 KF 34 24 58
Fr om10 t o 100 KF 26 6 32
Mor e t han 100 KF 8 0 8
THI S VARI ABLE I S PARTI ALLY NESTED I N THE RESPONSE VARI ABLE!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Pr of essi on execut i ve 51 26 77
empl oyee 127 110 237
ot her 59 95 154
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Age of cl i ent Less t han 23 year s 31 57 88
Fr om23 t o 40 year s 71 79 150
Fr om40 t o 50 year s 68 54 122
Over 50 year s 67 41 108
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Fami l y Si t uat i on Si ngl e 80 90 170
Mar r i ed 129 92 221
Di vor ced 24 37 61
Wi dow 4 12 16
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Aver age out st andi ng Less t han 2 KF 19 79 98
Fr om2 t o 5 KF 168 140 308
Mor e t han 5 KF 50 12 62
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Aver age t r ansact i ons Less t han 10 KF 44 110 154
Fr om10 t o 30 KF 32 39 71
Fr om30 t o 50 KF 82 47 129
Mor e t han 50 KF 79 35 114
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Number of wi t hdr awal s Less t han 40 113 58 171
Fr om40 t o 100 87 74 161
Mor e t han 100 37 99 136
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Over dr af t Aut hor i zed 83 119 202
For bi dden 154 112 266
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Checkbook Aut hor i zed 231 184 415
For bi dden 6 47 53
========================================================================
NB : TO ALLOWCALCULATI ONS ONE CASE WI TH OPPOSI TE WAS AFFECTED WI TH EACH
LEVEL CAUSE OF PARTI AL NESTI NG!
103
RESULTS ABOUT THE MODEL
FITTING OF MODEL
CONVERGENCE CRI TERI OM ( . 1E- 07) SATI SFI ED
================================================
I NTERCEPT I NTERCEPT AND
ONLY COVARI ATES
================================================
AKAI KE CRI TERI OM 650. 752 460. 104
SCHWARZ CRI TERI OM 654. 900 567. 964
- 2 LOG ( L) 648. 752 408. 104
================================================
TESTING GLOBAL NULL HYPOTHESIS : BETA = 0
======================================================
CHI - SQUARE DF PROB > KHI 2
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
LI KELI HOOD RATI O 240. 6475 25 < 0. 0001
WALD 119. 1086 25 < 0. 0001
======================================================
TYPE III ANALYSIS OF EFFECTS
=======================================================
EFFECT DF WALD CHI - SQU PROB > CHI SQ
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Seni or i t y 4 23. 2572 0. 0001
Sal ar y domi ci l i at i on 1 25. 9650 < 0. 0001
Si ze of savi ngs 3 0. 6047 0. 8953
Pr of essi on 2 2. 3555 0. 3080
Age of cl i ent 3 8. 0984 0. 0440
Fami l y Si t uat i on 3 12. 6296 0. 0055
Aver age out st andi ng 2 6. 4046 0. 0407
Aver age t r ansact i ons 3 8. 0692 0. 0446
Number of wi t hdr awal s 2 21. 1787 < 0. 0001
Over dr af t 1 0. 2441 0. 6213
Checkbook 1 15. 6171 < 0. 0001
=======================================================
ANALYSIS OF MAXIMUM LIKELIHOOD ESTIMATES
==================================================================================================
PARAMETER DF ESTI MATE STAND. ERROR WALD CHI - SQU. PROB > CHI 2 EXP( ESTI M. )
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I nt er cept 1 - 1. 3248 0. 5152 6. 6123 0. 0101 0. 2659
Seni or i t y 1 1 - 1. 0047 0. 2304 19. 0143 < 0. 0001 0. 3662
2 1 - 0. 1850 0. 3369 0. 3016 0. 5829 0. 8311
3 1 0. 7539 0. 3165 5. 6730 0. 0172 2. 1252
4 1 0. 0304 0. 3123 0. 0094 0. 9226 1. 0308
Sal ar y domi ci l i at i on 1 1 0. 7396 0. 1451 25. 9650 < 0. 0001 2. 0950
Si ze of savi ngs 1 1 0. 0430 0. 5466 0. 0062 0. 9374 1. 0439
2 1 0. 2895 0. 4440 0. 4250 0. 5145 1. 3357
3 1 0. 0220 0. 5631 0. 0015 0. 9688 1. 0223
Pr of essi on 1 1 0. 3516 0. 2681 1. 7197 0. 1897 1. 4213
2 1 - 0. 0442 0. 1853 0. 0570 0. 8113 0. 9567
Age of cl i ent 1 1 - 0. 7262 0. 2822 6. 6230 0. 0101 0. 4838
2 1 - 0. 0130 0. 2101 0. 0039 0. 9505 0. 9870
3 1 0. 4832 0. 2242 4. 6423 0. 0312 1. 6212
Fami l y Si t uat i on 1 1 0. 9222 0. 2983 9. 5593 0. 0020 2. 5147
2 1 0. 2492 0. 2639 0. 8918 0. 3450 1. 2830
3 1 - 0. 6348 0. 3555 3. 1889 0. 0741 0. 5300
Aver age out st andi ng 1 1 - 0. 8553 0. 3446 6. 1612 0. 0131 0. 4252
2 1 0. 0486 0. 2946 0. 0272 0. 8690 1. 0498
Aver age t r ansact i ons 1 1 - 0. 5518 0. 2245 6. 0422 0. 0140 0. 5759
2 1 - 0. 1342 0. 2564 0. 2741 0. 6006 0. 8744
3 1 0. 1469 0. 2183 0. 4527 0. 5010 1. 1582
Number of wi t hdr awal s 1 1 0. 9794 0. 2213 19. 5817 < 0. 0001 2. 6629
2 1 0. 0606 0. 1804 0. 1127 0. 7371 1. 0624
Over dr af t 1 1 - 0. 0660 0. 1336 0. 2441 0. 6213 0. 9361
Checkbook 1 1 1. 0448 0. 2644 15. 6171 < 0. 0001 2. 8427
==================================================================================================
Logistic Regression
104
ODDS RATIO ESTIMATES
=========================================================================
EFFECT ESTI MATE CONFI DENCE LI MI TS *
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Seni or i t y 1 VS 5 0. 244 0. 109 0. 548
2 VS 5 0. 554 0. 200 1. 538
3 VS 5 1. 417 0. 535 3. 755
4 VS 5 0. 687 0. 263 1. 798
Sal ar y domi ci l i at i on 1 VS 2 4. 389 2. 485 7. 752
Si ze of savi ngs 1 VS 4 1. 488 0. 101 22. 004
2 VS 4 1. 904 0. 150 24. 208
3 VS 4 1. 457 0. 126 16. 898
Pr of essi on 1 VS 3 1. 933 0. 816 4. 577
2 VS 3 1. 301 0. 745 2. 271
=========================================================================
=========================================================================
EFFECT ESTI MATE CONFI DENCE LI MI TS *
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Age of cl i ent 1 VS 4 0. 374 0. 146 0. 962
2 VS 4 0. 764 0. 350 1. 668
3 VS 4 1. 255 0. 585 2. 690
Fami l y Si t uat i on 1 VS 4 4. 300 0. 851 21. 734
2 VS 4 2. 194 0. 455 10. 579
3 VS 4 0. 906 0. 166 4. 960
Aver age out st andi ng 1 VS 3 0. 190 0. 041 0. 882
2 VS 3 0. 469 0. 114 1. 922
Aver age t r ansact i ons 1 VS 4 0. 336 0. 154 0. 732
2 VS 4 0. 510 0. 219 1. 188
3 VS 4 0. 676 0. 325 1. 404
Number of wi t hdr awal s 1 VS 3 7. 534 3. 164 17. 939
2 VS 3 3. 006 1. 419 6. 366
Over dr af t 1 VS 2 0. 876 0. 519 1. 479
Checkbook 1 VS 2 8. 081 2. 867 22. 779
=========================================================================
* 95%WALD CONFI DENCE LI MI TS
CONFUSION MATRIX
FREQUENCI ES
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| ESTI M Good Bad | TOTAL
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
OBSERV Good | 191 45 | 236
Bad | 38 194 | 232
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TOTAL | 229 239 | 468
ROWPERCENTAGES
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
OBSERV Good | 80. 932 19. 068 | 100. 000
Bad | 16. 379 83. 621 | 100. 000
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TOTAL | 48. 932 51. 068 | 100. 000
COLUMN PERCENTAGES
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
OBSERV Good | 83. 406 18. 828 | 50. 427
Bad | 16. 594 81. 172 | 49. 573
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TOTAL | 100. 000 100. 000 | 100. 000
CLASSI FI CATI ON
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| CLASS. WELL BAD | TOTAL
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
OBSERV Good | 80. 932 19. 068 | 100. 000
Bad | 83. 621 16. 379 | 100. 000
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TOTAL | 82. 265 17. 735 | 100. 000

The Discriminant and its methods
105

THE DISCRIMINANT AND ITS METHODS

FUWILD - OPTIMAL DISCRIMINANT ANALYSIS

Purpose

This method is the branch and bound algorithm of Furnival and Wilson (1974).
The FUWILD procedure selects the N ''best'' adjustments for the linear discriminant
analysis. The selection criteria could be the R2, the adjusted R2 or the Cp of Mallows.

If N is the number of the best adjustments required and P is the number of explanatory
variables of the model, the procedure calculates the N best adjustments for all sizes of
models from 1 to P-1 variables (the adjustment with the P variables is unique).

The procedure supplies the value of the criterion (R2, R2 adjusted or Cp), Fisher's F
associated with R2, the critical probability associated with this F, and the corresponding
test value.

The list of the variables of the model is then presented with the estimated coefficients, the
null tests, the critical probability and the associated test value. Finally, a diagram
representing the evolution of the criterion as a function of the number of the variables in
the models supplies a quick summary of the selections.

Dataset

The dataset is extracted from a survey where 100 respondents judge their suppliers. The
criteria are :
Delivery time
Prices level
Prices flexibility
Image
Services
Commercial image
Product quality

About the suppliers, we know the size of the company in two classes: more or less than 50
employees.

The goal of the analysis is to study the differences between these two classes.
FUWILD - Optimal Discriminant Analysis
106
ID
Delivery
Time
Prices
Level
Prices
Flexibility
Image Services
Commercial
Image
Product
Quality
Supplier's
Company Size
1 4,1 0,6 6,9 4,7 2,4 2,3 5,2 <50 employees
2 1,8 3 6,3 6,6 2,5 4 8,4 >=50 employees
3 3,4 5,2 5,7 6 4,3 2,7 8,2 >=50 employees
4 2,7 1 7,1 5,9 1,8 2,3 7,8 >=50 employees
5 6 0,9 9,6 7,8 3,4 4,6 4,5 <50 employees
6 1,9 3,3 7,9 4,8 2,6 1,9 9,7 >=50 employees
7 4,6 2,4 9,5 6,6 3,5 4,5 7,6 <50 employees
8 1,3 4,2 6,2 5,1 2,8 2,2 6,9 >=50 employees
9 5,5 1,6 9,4 4,7 3,5 3 7,6 <50 employees
10 4 3,5 6,5 6 3,7 3,2 8,7 >=50 employees
11 2,4 1,6 8,8 4,8 2 2,8 5,8 <50 employees
12 3,9 2,2 9,1 4,6 3 2,5 8,3 <50 employees
13 2,8 1,4 8,1 3,8 2,1 1,4 6,6 >=50 employees
14 3,7 1,5 8,6 5,7 2,7 3,7 6,7 <50 employees
15 4,7 1,3 9,9 6,7 3 2,6 6,8 <50 employees
16 3,4 2 9,7 4,7 2,7 1,7 4,8 <50 employees
17 3,2 4,1 5,7 5,1 3,6 2,9 6,2 <50 employees
18 4,9 1,8 7,7 4,3 3,4 1,5 5,9 <50 employees
19 5,3 1,4 9,7 6,1 3,3 3,9 6,8 <50 employees
20 4,7 1,3 9,9 6,7 3 2,6 6,8 <50 employees
21 3,3 0,9 8,6 4 2,1 1,8 6,3 <50 employees
22 3,4 0,4 8,3 2,5 1,2 1,7 5,2 <50 employees
23 3 4 9,1 7,1 3,5 3,4 8,4 <50 employees
24 2,4 1,5 6,7 4,8 1,9 2,5 7,2 >=50 employees
25 5,1 1,4 8,7 4,8 3,3 2,6 3,8 <50 employees
26 4,6 2,1 7,9 5,8 3,4 2,8 4,7 <50 employees
27 2,4 1,5 6,6 4,8 1,9 2,5 7,2 >=50 employees
28 5,2 1,3 9,7 6,1 3,2 3,9 6,7 <50 employees
29 3,5 2,8 9,9 3,5 3,1 1,7 5,4 <50 employees
30 4,1 3,7 5,9 5,5 3,9 3 8,4 >=50 employees
31 3 3,2 6 5,3 3,1 3 8 >=50 employees
32 2,8 3,8 8,9 6,9 3,3 3,2 8,2 <50 employees
33 5,2 2 9,3 5,9 3,7 2,4 4,6 <50 employees
34 3,4 3,7 6,4 5,7 3,5 3,4 8,4 >=50 employees
35 2,4 1 7,7 3,4 1,7 1,1 6,2 >=50 employees
36 1,8 3,3 7,5 4,5 2,5 2,4 7,6 >=50 employees
37 3,6 4 5,8 5,8 3,7 2,5 9,3 >=50 employees
38 4 0,9 9,1 5,4 2,4 2,6 7,3 <50 employees
39 0 2,1 6,9 5,4 1,1 2,6 8,9 >=50 employees
40 2,4 2 6,4 4,5 2,1 2,2 8,8 >=50 employees
41 1,9 3,4 7,6 4,6 2,6 2,5 7,7 >=50 employees
42 5,9 0,9 9,6 7,8 3,4 4,6 4,5 <50 employees
43 4,9 2,3 9,3 4,5 3,6 1,3 6,2 <50 employees
44 5 1,3 8,6 4,7 3,1 2,5 3,7 <50 employees
45 2 2,6 6,5 3,7 2,4 1,7 8,5 >=50 employees
46 5 2,5 9,4 4,6 3,7 1,4 6,3 <50 employees
47 3,1 1,9 10 4,5 2,6 3,2 3,8 <50 employees
48 3,4 3,9 5,6 5,6 3,6 2,3 9,1 >=50 employees
49 5,8 0,2 8,8 4,5 3 2,4 6,7 <50 employees
50 5,4 2,1 8 3 3,8 1,4 5,2 <50 employees

107

108
Fuwil 4

The Fuwil 4 excel sheet gives the main statistics of each class regarding to the

The column Within-group mean displays the means of each explanatory variable
respectively for the group 1 and 2. By default, the group 1 is the first category (in the list)
of the endogenous variable. In this example, the group 1 concerns small suppliers (< 50
employees) and the group 2 bigger suppliers (50 or more employees)
The column General mean displays the mean of each variable observed on the total set.

Missing data handling for exogenous variables
Missing values are replaced by within-groups means
Group Variable label
Within-group
mean
General mean
Number of
missing
values
1 Delivery Time 4,192 3,515 0
1 Prices Level 1,948 2,364 0
1 Prices Flexibility 8,622 7,894 0
1 Image 5,213 5,248 0
1 Services 3,050 2,916 0
1 Commercial Image 2,692 2,665 0
1 Product Quality 6,090 6,971 0
2 Delivery Time 2,500 3,515 0
2 Prices Level 2,988 2,364 0
2 Prices Flexibility 6,803 7,894 0
2 Image 5,300 5,248 0
2 Services 2,715 2,916 0
2 Commercial Image 2,625 2,665 0
2 Product Quality 8,293 6,971 0

This table is useful to detect the variables with the main average differences between the
class and the overall sample.
For example, the class number 2 (suppliers with more than 50 employees), obtains an
average quality score of 8.293, while the class number 1 obtains a score of 6.090.
The Image variable does not differentiate small suppliers than bigger ones.

With the DEMOD procedure (Descriptive statistics), we would get these results :

109

110
The R Criterion
Curve of R according to the number of explanatory variables

This graph displays the evolution of the R criterion according to the number of
explanatory variables included in the model. Higher is the R, better is the adjustment.
The R increases automatically with the number of explanatory variables.
Therefore, it is recommended to find a compromise between the best R and the smallest
model in terms of explanatory variables.
Some other criterions are available in the parameters tab such as : R adjusted, Mallows
CP.

The graph below shows that the R increases until the entry of the 4
th
explanatory variable;
then adding some other variables do not increase the R and the adjustments quality :
these variables are redundant.
The R can be interpreted as the part of variance explained by the linear discriminant
function. It goes from 0 to 1.
Curve of R2 accordind to the number of variables
Number of
model's
variables
Value of R2
7
6
5
4
3
2
1
0.43 0.45 0.48 0.50 0.53 0.55 0.57 0.60 0.62 0.65 0.67

The excel sheets 1 var to 7 vars display the 3 best adjustments regarding to the R for
models from 1 to 7 explanatory variables.
111
1 var

This table lists the 3 best adjustments (R) with one single explanatory variable.

Adjustments with 1 variable + constante DDL(Student) = 98
R**2 = 0.4680
Fisher = 86.2000
Test-Value = 7.845
Product Quality -0,4337 9,28 0,000 7,85
R**2 = 0.4173
Fisher = 70.1912
Test-Value = 7.258
Prices Flexibility 0,4683 8,38 0,000 7,26
R**2 = 0.3977
Fisher = 64.7156
Test-Value = 7.032
Delivery Time 0,4799 8,04 0,000 7,03

The number of degrees of freedom is 98.

The first adjustment is the best one, with a R of 0.468 ; this means that the between group
variance (between the two classes) represents 46.8% of the overall variance. A model that
is unable to differentiate the two classes is given a 0 R.

The Fisher statistic corresponds to the global model validation.
Higher is the between group variance, higher is the Fisher statistic. This criterion follows a
Fisher law with 1 and 98 degrees of freedom.
The 86.2 Fisher statistic corresponds to a probability lower than 1/10000 (0.0000).
The model is acceptable. This probability is converted into a test-value, here 7.85.

The column coefficient contains the estimation of the coefficient Product Quality: the
function discriminant D is written : D = constant 0.4337 x Product Quality.

The Student column test the nullity of the coefficient Product Quality : this statistic
follows a student law with 98 degrees of freedom; the 9.28 value corresponds to a
probability lower than 1/10000 (0.0000). The coefficient is significantly different from 0.

The probability is also converted into a test value, here we obtain 7.85. As the model
contains one single explanatory variable, test values of the coefficient and the overall
quality adjustment are equal.
112
6 vars

The two following adjustments contain both 6 explanatory variables.

Adjustments avec 6 variables + constante DDL(Student) = 93
R**2 = 0.6718
Fisher = 31.7290
Test-Value = 9.210
Delivery Time 0,3005 1,12 0,264 1,12
Prices Level 0,1242 0,45 0,656 0,44
Services -0,2308 0,45 0,657 0,44
Commercial Image 0,1516 1,85 0,067 1,83
Product Quality -0,2812 5,90 0,000 5,42
R**2 = 0.6716
Fisher = 31.6987
Test-Value = 9.207
Delivery Time 0,1863 3,27 0,002 3,17
Prices Level 0,0070 0,11 0,910 0,11
Image -0,0328 0,37 0,711 0,37
Product Quality -0,2790 5,87 0,000 5,40
R**2 = 0.6716
Fisher = 31.6925
Test-Value = 9.206
Delivery Time 0,1844 2,35 0,021 2,31
Image -0,0317 0,36 0,722 0,36
Services 0,0029 0,02 0,980 0,02
Product Quality -0,2779 5,88 0,000 5,41

For the first adjustment, the variables Prices Flexibility and Product Quality are the
only ones significant to 5% (probability that the related coefficient is null, lower than 5%).

113
3 Vars

Finally, we should search the best adjustments in the models with 3 or 4 explanatory
variables, where all the coefficients are significant and the models test values are the
highest.

R**2 = 0.6591
Fisher = 61.8789
Test-Value = 9.660
Delivery Time 0,2031 3,64 0,000 3,51
Product Quality -0,2592 5,79 0,000 5,35
R**2 = 0.6392
Fisher = 56.6932
Test-Value = 9.378
Services 0,2206 2,68 0,009 2,63
Product Quality -0,3097 7,12 0,000 6,36
R**2 = 0.6338
Fisher = 55.3919
Test-Value = 9.303
Product Quality -0,3323 7,46 0,000 6,61

114
The R adjusted criterion
Curve of R adjusted according to the number of explanatory variables

additional explanatory variable that is used to build the model. To increase this criterion,
the entry of a new variable needs to be sufficient (if the variable is redundant with the
ones already included in the model, the criterion decreases).

The graph below shows that the best models have to be found in the ones with 3 or 4
Curve of R2 ajusted accordind to the number of variables
Number of
model's
variables
Value of R2 ajusted
7
6
5
4
3
2
1
0.42 0.45 0.47 0.49 0.52 0.54 0.56 0.59 0.61 0.63 0.66

4 vars

The firs adjustment with 4 explanatory variables is the following:

R2AJ = 0.6574
Fisher = 48.4911
Test-Value = 9.612
Delivery Time 0,1840 3,28 0,001 3,18
Product Quality -0,2788 6,13 0,000 5,61

The R adjusted is about 0.6574; very close to the standard R of 0.6711. The explanatory
variables are meaningful, thus the penalty related to the R adjusted is very small.
115
The Mallows CP criterion
Curve of Mallows CP according to the number of explanatory variables

Lower is this criterion, better is the adjustment. We get the same results than with the
previous criterions, the best models have 3 or 4 variables.
Curve of Mallows Cp accordind to the number
of variables
Number
of
model's
variables
Value of Mallows Cp
7
6
5
4
3
2
1
0.00 0.05 0.11 0.16 0.22 0.27 0.32 0.38 0.43 0.49 0.54

4 vars

C(P) = 2.2916
Fisher = 48.4607
Test-Value = 9.610
Delivery Time 0,1840 3,28 0,001 3,18
Product Quality -0,2788 6,13 0,000 5,61

116
Formulas of the criterions R, R adjusted and Mallows Cp

4. R :
The coefficient of determination R2 (which takes values in the range 0 to 1) is a measure
of the proportion of the total variation that is associated with the regression process:

1
SSE
R
SST
=
SSE : Error Sum of Squares
SST : Total Sum of Squares.

5. R adjusted :
additional explanatory variable that is used to build the model.
( 1)(1 )
1
( )
n R
R
n p

=

n : the number of observations,
p : the number of variables used for the model plus one.

6. Mallows CP - C(p) :
The Mallows C(p) is positively related to the error (SSE) and the number of
explanatory variables in the model :a model with a lot of variables or with a high error
will be penalized by this criterion.

( ) 2
SSE
C p p n
SST
= +

References :

Furnival, G.M. and Wilson, R.W. (1974), Regression by Leaps and Bounds
Technometrics, 16, 499 -511.

117

DIS2GD - LINEAR DISCRIMINANT ANALYSIS BASED ON
CONTINUOUS VARIABLES

This procedure executes a linear discriminant analysis with two groups on continuous
variables, using Fisher's classical method.
The procedure provides bootstrap estimates of the bias and the precision of the principal
results of the discrimination : coefficients, case classification probabilities, and global
percentage classifications. It allows the modification of the costs and a priori probabilities
of classification in the groups. It manages base, test and anonymous cases.

The procedure outputs in advance the descriptive statistics on the variables of the model
in each of the two groups. The discriminant analysis results follow: classification tables,
discriminant function, results of the equivalent regression, and output of assignment to
cases.

If a bootstrap validation is required, the results of the discrimination are output again with
the bootstrap estimates. In particular, the bias and the precision of the global classifications
are shown facing the direct classifications. For anonymous cases, the procedure calculates
the bootstrap probability of their assignment.

If an evaluation of the case tests is required, the procedure will output the results of the
discrimination for these cases. If the assignment of anonymous cases is requested, only the
display of the assignments is shown.

The procedure can archive the rules for the discriminant function so that they can be
applied later on another file with the same structure.

DIS2GD - Linear Discriminant Analysis based on
118

119
Dis2g 3

The following table describes the differences observed between the two classes, regarding
to the input explanatory variables.

Analyse discriminante linaire sur l'chantillon DE BASE
Description des chantillons
Libell de la variable
G1 :
< 50 salaris [ 60]
G2 :
>= 50 salaris [ 40]
T de
Student
Probabilit
Dlais de livraison 8.045 0.000
Moyenne 4.192 2.500
Ecart-type 1.029 1.006
Minimum 2.100 0.000
Maximum 6.100 4.900
Flexibilit des prix 8.378 0.000
Moyenne 8.622 6.803
Minimum 5.100 5.000
Maximum 10.000 8.500
Qualit du produit 9.284 0.000
Moyenne 6.090 8.293
Minimum 3.700 6.200
Maximum 8.500 10.000

The first group G1 corresponds to the suppliers with less than 50 employees. There are 60
in the sample.
The second group G2 corresponds to the suppliers with 50 or more employees, there are
40.

SPAD displays the means, standard deviations, minima and maxima for each explanatory
variable by group.

The Student T column corresponds to the test that the two means of the two groups are
equal for each explanatory variable. We reject this hypothesis for the three variables
because the associated probabilities are lower than 1/10000.

The product quality is perceived as significantly higher for the suppliers with more than
50 employees (average score of 8.29 against 6.09).
Reversely, delivery times and prices flexibility are better for smaller suppliers.
120
Dis2g 4

This table displays all the correlation matrices associated with the discriminant analysis.

Correlation matrix
Correlation matrix on group 1 : < 50 employees (Cont = 60)
Delivery
Time
Prices
Flexibility
Product
Quality
Delivery Time 1,00
Prices Flexibility 0,32 1,00
Product Quality -0,17 0,04 1,00
Correlation matrix on group 2 : >= 50 employees (Cont = 40)
Delivery
Time
Prices
Flexibility
Product
Quality
Delivery Time 1,00
Prices Flexibility -0,12 1,00
Product Quality 0,07 -0,16 1,00
Within-group common correlation
Delivery
Time
Prices
Flexibility
Product
Quality
Delivery Time 1,00
Product Quality -0,09 -0,01 1,00
Total correlation
Delivery
Time
Prices
Flexibility
Product
Quality
Delivery Time 1,00
Product Quality -0,48 -0,45 1,00

The first two correlation matrices display the correlations between explanatory variables
inside each group. For example, the correlation between delivery time and prices
flexibility is 0.32 for the group 1 and 0.12 for the group 2.
These two matrices allow us to determine redundancies between explanatory variables:
this is not the case in this example.
121
Dis2g 6

Classification table of the discriminant analysis

Result of the FISHER linear discriminant analysis on sample: TRAIN
Table of groups counts
Assignment
group: < 50
employees
Assignment
group: >= 50
employees
Total
Original group: < 50 employees 50 10 60
Original group: >= 50 employees 4 36 40
Classification table (counts and percentages)
Well
classified
Misclassified Total
83,33 16,67 100,00
90,00 10,00 100,00
Total 86 14 100
86,00 14,00 100,00

The adjustment presents a good classification rate on the current set: 50 of the 60 small
suppliers and 36 of 40 big suppliers, respectively 83% and 90%.

Globally, the good classification rate is 86% = (50+36)/100.

122
Dis2g 9

This table displays the characteristics of the linear discriminant function :

Linear discriminante function
R2 = 0.65913 Fisher = 61.87877 Probability = 0.0000
D2 (Mahalanobis) = 7.89599 T2 (Hotelling) = 189.50369 Probability = 0.0000
Variable label
Correlations
with D.L.F.
(Threshold =
0.201)
D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
T de Student
(regression)
Probability
Delivery Time 0,632 1,191760 0,203073 0,0558 3,6373 0,0004
Prices Flexibility 0,648 1,390700 0,236972 0,0521 4,5482 0,0000
Product Quality -0,686 -1,521000 -0,259174 0,0448 5,7880 0,0000
CONSTANT -3,774790 -0,777758 0,5981 1,3005 0,1966

The R is 0.659; it means that the between group variance (that expresses the differences
between the two groups) represents 65.9% of the total variance.

The Fisher statistic corresponds to the global model validation.
Higher is the between group variance, higher is the Fisher statistic. This criterion follows a
Fisher law with 1 and 96 degrees of freedom.
The 61.87 Fishers statistic corresponds to a probability lower than 1/10000 (0.0000).
The model is acceptable.

D is the Mahalanobis distance between the two groups. This distance takes into account
the relationships between explanatory variables (the common correlation matrix).

The T of Hotelling is a generalization of the Student test when we have more than one
explanatory variable. It tests the hypothesis that all the means are equal.
In this example, T of Hotelling is 189.503 ; the associated probability is lower than 1/1000:
differences between means are significant.

For each explanatory variable, SPAD displays its correlation with the F.L.D. (Linear
Discriminant Function). The threshold of 0.201 corresponds to the limit where we consider
a correlation as significant (the threshold is given in absolute value).
The correlations between each explanatory variable and the linear discriminant function
are significant and quite close: the linear discriminant function is a good well-balanced
compromise between these three variables.

The F.L.D. coefficients give the model equation: therefore the best linear combination of
the 3 explanatory variables to separate the two groups is the following:
S1(x) = 1.191 x Delivery Time + 1.39 x Prices Flexibility 1.52 x Product Quality 3.77.

This equation gives high scores to suppliers that provide good delivery times and prices flexibility
(group 1, < 50 employees), and low scores for suppliers that have good quality products (group 2,
>= 50 employees).
123
Of course, the following equation is equivalent to the previous one but inverses the sign of
scores :
S2(x) = - 1.191 x Delivery Time - 1.39 x Prices Flexibility + 1.52 x Product Quality
+ 3.77.
The suppliers hierarchy is not modified.

The regression' coefficients column is redundant with the discriminant function
coefficients column : they are proportional.
Linear discriminant analysis based on two groups is a particular case of multiple regressions.

This equation : S3(x) = - 0.203 x Delivery Time + 0.236 x Prices Flexibility
0.299 x Product Quality 0.777
Is still equivalent to the two previous ones.

The Students T and the associated probabilities are calculated from the regression
coefficients, but are valid for the discriminant function coefficients because of the
proportionality.
The Students T are the rate between the regression coefficient and their standard
deviation: for example, 3.63 = 0.203 / 0.558.

Thus, we can see that our three coefficients are significant at 1% but not the constant term.

124
BOOTSTRAP Estimations: Dis2g - 12 and Dis2g - 13

SPAD provides a Bootstrap validation for all its discriminant functions : the purpose is to
simulate by resampling several samples to calculate for each one an adjustment. In this
example, we have chosen 250 samples.

At the end, we obtain 250 estimations for the classification table and for the coefficients of
the linear discriminant function.

The good classification and misclassification rates are calculated as an average of the 250
estimations. It is the same for the coefficients.

Dis2g - 12

Discriminant analysis by bootstrap estimations: 250 random samples
Classification table (Counts and percentages)
Training
sample - Well
classified
Training
sample -
Misclassified
Bootstrap -
Well
classified
Bootstrap -
Misclassified
Total
Original group: < 50 employees 50,00 10,00 49,53 10,47 60,00
83,33 16,67 82,55 17,45 100,00
Original group: >= 50 employees 36,00 4,00 35,78 4,22 40,00
90,00 10,00 89,45 10,55 100,00
Total 86,00 14,00 85,31 14,69 100,00
86,00 14,00 85,31 14,69 100,00

Dis2g 13

Bootstrap estimations for linear discriminant function
Variable label
Correlations
with D.L.F.
(Mean)
Standard
deviation
D.L.F
coefficients
(Mean)
Standard
deviation
Mean /
Standard
deviation
Delivery Time 0,637 0,051 1,296 0,379 3,418
Prices Flexibility 0,648 0,064 1,500 0,513 2,924
Product Quality -0,691 0,038 -1,633 0,327 4,996
CONSTANT -4,163 4,680 0,889

125
Dis2g 11

In this excel sheet, SPAD displays for each case their observed group, their assigned
group, the probability of being assigned to this group by the model and their discriminant
score.

The column Group of origin gives for each case has to be compared with the Group
assignment column. If the model is right, SPAD prints '=='.

The Fisher function or score is calculated by the model with the following equation:
S(x) = 1.191 x Delivery Time + 1.39 x Prices Flexibility 1.52 x Product Quality 3.77.

For example, for the case n79, the score 7.767 is calculated this way : (Delivery Time 1.00,
Prices Flexibility 7.1, and Product Quality 9.9)
-7.767 = 1.197 x 1.00 + 1.39 x 7.1 1.52 x 9.9 3.77.

Cases are listed by decreasing scores. Thus the case n79 gets the lowest score and
therefore the highest probability of assignment to the group 2 (50 and more employees).
Reversely, cases with high scores have higher probability of assignment to the group 1
(lower than 50 employees).

For each case, SPAD calculates the probabilities to be assigned to each group and assigns
the case to the group with the highest probability. The indifference point (equal
probabilities for the two groups) corresponds here to a zero Fisher score; it does not
appear in this example

The assignment probability is obtained from the Fisher Score (S(x)):

)) ( exp( 1
)) ( exp(
) / (
1
x S
x S
x G P
+
= and then ) / ( 1 ) / (
1 2
x G P x G P =

Sample: TRAINING
List of group assignments and related probabilities
Case identifier
Original
group
Assignment
Assignment
probability
Fisher
function
Individu n 79 >=50 == 1,000 -7,767
Individu n 39 >=50 == 1,000 -7,716
Individu n 65 >=50 == 1,000 -7,661
Individu n 93 <50 >=50 0,877 -1,962

Individu n 88 <50 >=50 0,873 -1,932
Individu n 84 <50 >=50 0,848 -1,720
Individu n 87 >=50 <50 0,640 0,577

Individu n 13 >=50 <50 0,687 0,788
Individu n 85 >=50 <50 0,690 0,802
Individu n 25 <50 == 1,000 8,623

Individu n 42 <50 == 1,000 9,763
Individu n 5 <50 == 1,000 9,882

DIS2GFP - Linear Discriminant Analysis
based on Principal Factors
126

DIS2GFP - LINEAR DISCRIMINANT ANALYSIS
BASED ON PRINCIPAL FACTORS

General principles

This procedure outputs a linear discriminant analysis with two groups on the factorial
coordinates from a NOT NORMED principal components analysis using the classical
Fisher method.

It provides bootstrap estimates of the bias and the precision of the principal results of the
discrimination: coefficients, case classification probabilities, global classification
percentages. It also allows the modification of the a priori costs and probabilities of the
classification in the groups. It provides the management of the base cases, of the test cases
and of the anonymous cases.

The procedure offers a print preview of the descriptive statistics of the model variables in
each of the two groups. Next the results of the discriminant analysis are shown:
classification tables, discriminant function, and output of the assignment of cases.

The decision rule is finally expressed as a function of the original variables. The results of
the regression equivalent are only indicative, since the classical hypotheses of normality
are meaningless in this context.

If a bootstrap validation is requested, the results of the discrimination are repeated with
the bootstrap estimates. In particular, the bias and the precision of the global classifications
are shown with the direct classifications. For anonymous cases, the procedure calculates
their bootstrap probability assignment.

If an evaluation of the test cases is required, the procedure outputs the results of the
discrimination relative to these cases. If the assignment of anonymous cases is required,
only the assignments are output.

The procedure can archive the rules for the discriminant function so they can be applied
later to another file of the same structure.

127

128
Dis2g 1

This first excel sheet displays the studied model: the variable to explain is the same than in
the previous methods (Suppliers company size), the explanatory variables are the
principal factors get from the principal component analysis based on all the continuous
variables available in the dataset except Satisfaction index.
By default, SPAD assigns the prefix F and the number corresponding to each factor.
F1 , F2 , . We ordered SPAD to process this analysis on the 7 factors, that is to say
99.99% of the total inertia

Model : V8=F1+F2+F3+F4+F5+F6+F7
Variable number Variable label
8 Supplier's Company Size
1 F 1
2 F 2
3 F 3
4 F 4
5 F 5
6 F 6
7 F 7

EI GENVALUES
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| 1 | 81. 8822 | 91. 04 | 91. 04 | *********************************************|
| 2 | 4. 0759 | 4. 53 | 95. 58 | **** |
| 3 | 1. 4053 | 1. 56 | 97. 14 | ** |
| 4 | 1. 2298 | 1. 37 | 98. 51 | ** |
| 5 | 0. 7842 | 0. 87 | 99. 38 | * |
| 6 | 0. 3903 | 0. 43 | 99. 81 | * |
| 7 | 0. 1617 | 0. 18 | 99. 99 | * |
| 8 | 0. 0081 | 0. 01 | 100. 00 | * |
+- - - - - - - - +- - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

129
Dis2g 6 : Classification Table

Assignment
group: < 50
employees
Assignment
group: >= 50
employees
Total
Well
classified
Misclassified Total
90,00 10,00 100,00
100,00 0,00 100,00
Total 94 6 100
94,00 6,00 100,00

The adjustment presents good classification rate on this sample: it assigns correctly 54 of
60 small suppliers and all the big ones, respectively 90% and 100%.

The global good classification rate is 94% = (54+40)/100.

Comparison with the model of the previous chapter

We can notice that this model obtains better results than the previous one that only used
three predictors (Delivery time, Prices flexibility and Quality product).

Classification table of the previous model :

Assignment
group: < 50
employees
Assignment
group: >= 50
employees
Total
Well
classified
Misclassified Total
83,33 16,67 100,00
90,00 10,00 100,00
Total 86 14 100
86,00 14,00 100,00

In our current model, we have kept all the available information (almost all the
explanatory variables and all the factors), it is normal to get better results.
130
Dis2g 9 : results of the model based on principal factors

Axe label
Correlations
with D.L.F.
(Threshold =
0.201)
D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
T de Student
(regression)
Probability
F 1 -0,380 -0,291099 -0,041896 0,0062 6,7769 0,0000
F 2 -0,651 -2,234360 -0,321575 0,0277 11,6055 0,0000
F 3 0,240 1,403290 0,201965 0,0472 4,2798 0,0000
F 4 0,036 0,226999 0,032670 0,0504 0,6477 0,5188
F 5 0,028 0,221504 0,031879 0,0632 0,5046 0,6150
F 6 0,278 3,090510 0,444793 0,0895 4,9676 0,0000
F 7 -0,101 -1,747540 -0,251510 0,1391 1,8078 0,0739
CONSTANT 1,009960 0,000000 0,0559 0,0000 1,0000

The R is 0.7120 ; it means that the between group variance represents 71.20 % of the total
variance.
The Fisher statistic is 32.50 corresponding to a probability lower than 1/10000 (0.0000).
Thus, the model is accepted.

All the statistics displayed in the above table are described in the previous section page 19.

We can see that the factors 4 and 5 present coefficient none significantly different from
zero (probabilities 0.5188 and 0.6150). The factor 7 presents also a probability greater than
0.05.

The coefficients of the Linear Discriminant Function give the following equation:
S1(x) = - 0.291 x F1 - 2.23 x F2 + 1.40 x F3 + 0.226 x F4 + 0.221 x F5 + 3.09 x F6
- 0.14 x F7 + 1.0099.

Dis2g 10 : Fisher linear discriminant function rebuilt,
starting from original variables

This excel sheet is the most interesting for the user because it displays the equation model
based on the original variables and no longer on the principal factors.
Thus, we find the variables Delivery Time , Prices flexibility with strong positive
coefficients.

To understand the coefficients, we have to remember that the equation opposites the two
groups by giving high scores to the small suppliers and low scores to the bigger ones. By
default, SPAD always gives high scores to the first category (in the list) of the endogenous
variable.

131
Remark : coefficients calculation on the original variables

SPAD displays in the table below the linear discriminant function based on original
variables; it has been calculated from the linear discriminant function based on the
principal factors. We know that each principal factor is a linear combination of the original
variables.
The coefficients of these combinations are available in the PCA outputs in the column
called Normed eigenvectors.

Normed eigenvectors
Label variable Axis 1 Axis 2 Axis 3 Axis 4 Axis 5 Axis 6 Axis 7
Delivery Time -0,10 -0,28 0,26 -0,09 -0,74 0,36 -0,04
Prices Level -0,01 0,48 0,03 -0,47 0,35 0,49 -0,07
Prices Flexibility -0,09 -0,40 -0,25 0,49 0,33 0,65 -0,01
Image -0,03 0,26 0,69 0,41 0,11 0,07 0,52
Services -0,06 0,11 0,16 -0,29 -0,18 0,42 -0,03
Commercial Image -0,02 0,15 0,40 0,31 0,07 -0,04 -0,85
Product Quality 0,04 0,65 -0,45 0,43 -0,42 0,12 0,02
Frequency of use -0,99 0,07 -0,06 -0,02 0,03 -0,12 0,01

The factor 1 can be calculated this way :
F1 = -0.10 x "Delivery Time" -0.01 x Prices level" - 0.9 x "Prices flexibility"
- 0.03 x "Image" -0.06 x "Services" 0.02 x "Commercial Image"
+ 0.04 x "Product Quality" 0.99 x "Frequency of use".

and so on for all the factors. Starting from these equations, SPAD can assign a
coefficient to each original variable.

FISHER linear function rebuilt starting original variables
Variable label Category label
D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
T de Student
(regression)
Probability
Delivery Time 2,018560 0,290515 0,0588 4,9366 0,0000
Prices Level 0,590870 0,085039 0,0573 1,4851 0,1410
Image -0,198512 -0,028570 0,0831 0,3437 0,7319
Services 1,257900 0,181039 0,0430 4,2083 0,0001
Commercial Image 1,674080 0,240937 0,1206 1,9979 0,0487
Product Quality -1,764960 -0,254017 0,0453 5,6085 0,0000
Frequency of use -0,327111 -0,047079 0,0130 3,6132 0,0005
CONSTANTE -9,065840 -1,450130

The linear discriminant function equation is the following :

D1 = 2.02 x Delivery Time + 0.59 x Prices level + 2.77 x Prices Flexibility
0.20 x Image + 1.26 x Services + 1.67 x Commercial Image 1.76 x Product Quality
0.33 x Frequency of use 9.07.

The variables Image et Prices Level are not significant (respective probabilities of
0.7319 et 0.141). The small contribution of the variable Image is not surprising, we get
132
the same result than the ones obtained with the automatic characterization. (see table
below)
About the variable Prices level , it is surprising to find it not significant in the model
while it appears significant in the automatic characterization.
This is due to the correlations existing between the explanatory variables: the prices level
is related to the variables Delivery Time , Prices Flexibility These variables tend to
reduce the specific effect due to the prices level.

Characterisation by continuous variables of categories of
Supplier's Company Size
< 50 employees (Weight = 60.00 Count = 60 )
Category
mean
Overall mean
Category Std.
deviation
Overall Std.
deviation
Prices Flexibility 8,622 7,894 1,154 1,380 6,43 0,000
Delivery Time 4,192 3,515 1,029 1,314 6,27 0,000
Frequency of use 48,767 46,100 8,724 8,944 3,63 0,000
Services 3,050 2,916 0,584 0,747 2,18 0,014
Commercial Image 2,692 2,665 0,859 0,767 0,42 0,336
Image 5,213 5,248 1,281 1,126 -0,38 0,354
Prices Level 1,948 2,364 1,018 1,190 -4,26 0,000
Product Quality 6,090 6,971 1,282 1,577 -6,81 0,000
>= 50 employees (Weight = 40.00 Count = 40 )
Category
mean
Overall mean
Category Std.
deviation
Overall Std.
deviation
Product Quality 8,293 6,971 0,918 1,577 6,81 0,000
Prices Level 2,988 2,364 1,156 1,190 4,26 0,000
Image 5,300 5,248 0,838 1,126 0,38 0,354
Commercial Image 2,625 2,665 0,601 0,767 -0,42 0,336
Services 2,715 2,916 0,905 0,747 -2,18 0,014
Frequency of use 42,100 46,100 7,690 8,944 -3,63 0,000
Delivery Time 2,500 3,515 1,006 1,314 -6,27 0,000
Prices Flexibility 6,803 7,894 0,879 1,380 -6,43 0,000

133
Simplified Model : dis2g - 9 et dis2g - 10

We modify our previous model by keeping only the significant principal factors:1, 2, and 3
et 6.
The results are listed below :
Axe label
Correlations
with D.L.F.
(Threshold =
0.201)
D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
T de Student
(regression)
Probability
F 1 -0,380 -0,279138 -0,041896 0,0062 6,7436 0,0000
F 2 -0,651 -2,142560 -0,321575 0,0278 11,5484 0,0000
F 3 0,240 1,345630 0,201965 0,0474 4,2588 0,0000
F 6 0,279 2,963530 0,444793 0,0900 4,9432 0,0000
CONSTANT 0,951688 0,000000 0,0562 0,0000 1,0000

Factors are orthogonal, then the Student statistics do not change except the rounding error
: we keep the same hierarchy and the same relative importance of the factors.
The new linear discriminant function is now written :
S1(X) = - 0.27 x F1 - 2.14 x F2 +1.34 x F3 + 2.96 x F6 + 0.95.

D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
T de Student
(regression)
Probability
Delivery Time 2,043930 0,306772 0,0353 8,6795 0,0000
Prices Level 0,472917 0,070980 0,0461 1,5395 0,1272
Image 0,572879 0,085983 0,0340 2,5320 0,0131
Services 1,261200 0,189292 0,0390 4,8564 0,0000
Commercial Image 0,103654 0,015557 0,0196 0,7920 0,4304
Product Quality -1,669530 -0,250579 0,0300 8,3423 0,0000
Frequency of use -0,297403 -0,044637 0,0128 3,4890 0,0007
CONSTANTE -8,387220 -1,401670

We find the same opposition between the characteristic variables of the small suppliers
(Delivery Time and Prices Flexibility) and the bigger ones (Product Quality).

The variable Commercial Image is still not significant, but the variable Image
becomes significant. Moreover, its positive coefficient indicates a characteristic of the small
companies. However, it is recommended to interpret this result with care, because the
automatic characterization shows that small suppliers have a lower image score than the
big ones (average of 5.21 compared to 5.3). This is due to the correlations existing between
variables. Working on a restricted number of factors was not sufficient to erase them.
Finally, by eliminating non significant variables, principal factors, or variables whose the
coefficients sign is not coherent, we get back to the model of the previous chapter with the
following variables Delivery Time , Prices Flexibility and Product Quality .
Even if it discriminates less good than the other models studied in this chapter, we may
keep this one because of its coherence regarding to the relative contributions and effects
signs.
DISCO - Discriminant Analysis
based on Qualitative variables
134

DISCO - DISCRIMINANT ANALYSIS
BASED ON QUALITATIVE VARIABLES

SCORE - SCORING FUNCTION

With SPAD, building a scoring function requires the following steps:

- Firstly, we determine the most discriminant variables regarding to the endogenous
variables (The DEMOD and MSMOD procedures)
- Then, we perform a Multiple Correspondence Analysis (MCA) on the qualitative
variables selected.
- We perform a linear discriminant analysis based on the factorial coordinates extracted
from the Multiple Correspondence Analysis.
- Then, we rebuilt the discriminant function starting from the original qualitative
variables.
- We normalize the coefficients of each explanatory category to get only zero or positive
scores. The maximum score is defined by the user (100, 1000).
- Then, each case is assigned a score regarding to its profile.

NB : The steps 2 and 3 are implemented in the DISCO procedure of the scoring chain.

The SPAD scoring method performs a multiple correspondence analysis for the following
reasons :

- The linear discriminant analysis is a method that requires only input continuous
variables.
- The MCA transforms qualitative variables into continuous factorial coordinates that
can be used for the discriminant analysis.
- The factorial coordinates are orthogonal and we are liberated from the multicolinearity
problems.
- At last, the factorial coordinates selection optimizes the results.

135
To illustrate the scoring methodology of SPAD, we use the CREDIT.SBA dataset.

The main goal is to discriminate bad customers from good customers (Target variable
GOOD_BAD) regarding to all their bank and sociodemographic profiles.
The final goal is of course to build decision rules to assign to new customers.

Extraction from the dataset CREDIT.SBA :

GOOD_BAD AGE MARITAL SENIORITY SALARY SAVINGS JOB
GOOD GE 50 years Single GT 12 years SALARY AT THE BANK No saving Employee
GOOD LT 23 years Single LE 1 year SALARY AT THE BANK No saving Employee
BAD GE 23 LT 40 years Widowed GT 6 LT 12 years SALARY AT THE BANK No saving Employee
GOOD GE 23 LT 40 years Divorced GT 1 LE 4 years SALARY AT THE BANK LT 10 KF Sav. Employee
GOOD LT 23 years Single GT 6 LT 12 years NO SALARY No saving Employee
GOOD GE 23 LT 40 years Single LE 1 year SALARY AT THE BANK No saving Employee
GOOD GE 50 years Married GT 6 LT 12 years SALARY AT THE BANK No saving Executive
GOOD GE 50 years Married GT 12 years SALARY AT THE BANK No saving Executive
GOOD GE 40 LT 50 years Single GT 1 LE 4 years SALARY AT THE BANK No saving Employee
GOOD GE 50 years Single GT 4 LE 6 years SALARY AT THE BANK No saving Employee
GOOD GE 50 years Married GT 12 years SALARY AT THE BANK No saving Employee
GOOD GE 40 LT 50 years Married LE 1 year NO SALARY LT 10 KF Sav. Executive
GOOD GE 23 LT 40 years Single GT 4 LE 6 years NO SALARY No saving Other
GOOD GE 23 LT 40 years Married GT 6 LT 12 years SALARY AT THE BANK No saving Employee
GOOD GE 40 LT 50 years Divorced GT 4 LE 6 years NO SALARY LT 10 KF Sav. Executive
BAD GE 40 LT 50 years Divorced GT 6 LT 12 years SALARY AT THE BANK No saving Employee
GOOD GE 50 years Single GT 12 years SALARY AT THE BANK No saving Other
BAD GE 50 years Widowed GT 12 years SALARY AT THE BANK No saving Other

The dataset CREDIT.SBA has 468 cases and 12 qualitative variables.
QUALITATIVE VARIABLES
1. GOOD_BAD ( 2 categories )
2. AGE ( 4 categories )
3. MARITAL STATUS ( 4 categories )
4. SENIORITY ( 5 categories )
5. SALARY ( 2 categories )
6. SAVINGS ( 4 categories )
7. JOB ( 3 categories )
8. CHECKING ACCOUNT ( 3 categories )
9. AVERAGE TRANSACTIONS ( 4 categories )
10. WITHDRAWALS ( 3 categories )
11. NEGATIVE ACCOUNT BALANCE ( 2 categories )
12. CHEQUE AUTHORIZATION ( 2 categories )

Before building the scoring function, it is recommended to start by descriptive statistics
such as the STATS, DEMOD and MSMOD procedures.
SCORE - Scoring Function
136
THE SCORING FAVOURITE

We create a new chain using the Predefined Chain command from the general Chain
Menu.

In the favourites tab, select the Scoring rubric and double click on
Discriminant analysis on categorical variables and scoring.

SPAD displays the following methods in the diagram.

Import the dataset credit.sda by using the
SPAD Data Archive File import
method.

Icons methods are grey because you have to
configure them.

The SCORING parameters will be
defined by default.

137
DISCO PARAMETERS

The configuration of the DISCO procedure start by defining the model to build : the
endogenous variable and the qualitative exogenous variables :

The model is the following :

V1 = V2 + + V12

In this Model tab, we can
specify the real model, i.e. built
on the factorial coordinates
extracted from the MCA.

To proceed, click on the button
Calculation Options

We decide to build the complete
model, with all the factorial
coordinates.

Click on OK to go back the
Model tab et again Ok to
finish the Disco configuration.

Run the methods.
138
Right-click on the discriminant method icon to access the results.

Starting by the complete model allows us to keep in a second step only the significant
factors.

We visualize and select the
factorial axes that really
discriminate the target variable.

To do this, we use the ratio
Coefficient/StDev, that can be
interpreted as a Students T.

We could keep all the axes with
an absolute ratio greater than
1.96.

139
DISCO RESULTS

Linear Discriminant Function

Model
V1=F1+F2+F3+F4+F5+F6+F7+F8+F9+F10+F11+F12+F13+F14+F15+F16+F17+F18+F19+F20
+F21+F22+F23+F24+F25

Linear discriminant function
Axis label
Correlations
with D.L.F.
(Threshold
= 0.093)
D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
Ratio
Coefficient /
ST.
Deviation
F 1 -0,475 -3,228700 -0,950022 0,0729 -13,0262
F 2 0,290 2,342510 0,689267 0,0867 7,9474
F 3 0,104 0,897833 0,264181 0,0925 2,8551
F 4 0,170 1,532160 0,450828 0,0967 4,6611
F 5 -0,007 -0,072457 -0,021320 0,1057 -0,2018
F 6 -0,057 -0,571836 -0,168259 0,1077 -1,5617
F 7 -0,022 -0,227015 -0,066797 0,1099 -0,6076
F 8 0,061 0,641800 0,188845 0,1130 1,6705
F 9 0,139 1,515070 0,445797 0,1173 3,8017
F 10 -0,045 -0,502921 -0,147981 0,1192 -1,2411
F 11 0,004 0,051269 0,015086 0,1224 0,1233
F 12 -0,028 -0,319744 -0,094082 0,1237 -0,7605
F 13 -0,030 -0,356309 -0,104841 0,1279 -0,8197
F 14 -0,070 -0,847106 -0,249255 0,1300 -1,9170
F 15 0,045 0,567041 0,166848 0,1350 1,2364
F 16 0,002 0,023938 0,007043 0,1359 0,0518
F 17 -0,017 -0,219652 -0,064631 0,1405 -0,4599
F 18 -0,105 -1,389350 -0,408807 0,1425 -2,8691
F 19 0,049 0,676453 0,199041 0,1487 1,3381
F 20 -0,008 -0,119744 -0,035234 0,1546 -0,2279
F 21 -0,074 -1,071810 -0,315374 0,1553 -2,0303
F 22 0,024 0,367523 0,108141 0,1624 0,6659
F 23 0,068 1,151150 0,338719 0,1819 1,8622
F 24 -0,061 -1,190570 -0,350316 0,2089 -1,6768
F 25 0,019 0,608556 0,179063 0,3351 0,5343
CONSTANT 0,018039 0,000000 0,0364 0,0000

The factors with a ratio whose the absolute value is greater than 1,96 are displayed in bold.
These factors are to be included in the optimal model.

To build this model, we need to return to the Disco configuration and click on the button
Calculation Options .

140
NEW CONFIGURATION OF THE DISCO METHOD

We have specified the
optimal model to use for
building the discriminant
function and in a second
step the scoring function.

The optimal model is built
with the following
factors :

F1 F4, F9, F18 et F21.

We have to re-run the
chain.

Now that the optimal model is available, we want to partition the dataset into two subsets
: one to perform the analysis, the other one to confirm and validate the analysis. This part
is called validation. We talk about Learning set and Testing set or test-cases in the
following tab.

In this example, we
choose to select randomly
25 % of the cases to test
the model based on the 75
% remaining cases.

Validation is very useful
for testing that the model
does not overfit the data
and has a good predictive
power.

141
THE DISCO RESULTS

To measure the prediction performance of the model, we read the following classification
table :
Result of the FISHER linear discriminant analysis on sample: TRAINING
Assignment
group:
GOOD
Assignment
group: BAD
Total
Original group: GOOD 150 28 178
Original group: BAD 35 138 173
Well
classified
Misclassified Total
84,27 15,73 100,00
79,77 20,23 100,00
Total 288 63 351
82,05 17,95 100,00

Result of the FISHER linear discriminant analysis on sample: TEST
Assignment
group:
GOOD
Assignment
group: BAD
Total
Well
classified
Misclassified Total
84,75 15,25 100,00
63,79 36,21 100,00
Total 87 30 117
74,36 25,64 100,00

On the TRAINING SET, 82.05 % of the cases are well classified.
On the TESTING SET, 74.36 % of the cases are well classified.

The built model presents a good predictive power on both sets. It does not overfit the
training set and looks reproductible.

Another validation way would be to use bootstrapping.

142

Linear discriminant function
Axis label
Correlations
with D.L.F.
(Threshold =
0.093)
D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
Ratio
Coefficient /
ST. Deviation
F 1 -0,475 -3,070890 -0,950022 0,0733 -12,9600
F 2 0,290 2,228010 0,689267 0,0872 7,9070
F 3 0,104 0,853949 0,264181 0,0930 2,8406
F 4 0,170 1,457270 0,450828 0,0972 4,6374
F 9 0,139 1,441010 0,445797 0,1179 3,7824
F 18 -0,105 -1,321450 -0,408807 0,1432 -2,8545
F 21 -0,074 -1,019430 -0,315374 0,1561 -2,0200
CONSTANT 0,015909 0,000000 0,0366 0,0000

D.L.F.
coefficients
Regression
coefficients
Standard
deviation
(Regression)
Ratio
Coefficient /
ST. Deviation
Age of client Less than 23 years -4,413170 -1,365270 0,4690 -2,9112
From23 to 40 years 1,944030 0,601412 0,2750 2,1873
From40 to 50 years 1,169940 0,361938 0,2642 1,3698
Over 50 years -0,425731 -0,131706 0,3968 -0,3319
Family Situation Single 0,009629 0,002979 0,3449 0,0086
Married 1,427100 0,441492 0,2180 2,0249
Divorced -3,502810 -1,083640 0,1992 -5,4407
Widow -6,459620 -1,998370 0,9437 -2,1177
Seniority 1 year or less -6,138460 -1,899020 0,3039 -6,2495
From1 to 4 years -8,631250 -2,670200 0,3826 -6,9787
From4 to 6 years 9,017780 2,789770 0,5665 4,9243
From6 to 12 years 2,082220 0,644162 0,4131 1,5592
Over 12 years 9,972050 3,084990 0,6592 4,6798
Salary domiciliation Sal. domiciliated 4,923760 1,523230 0,1359 11,2049
Sal. not domicil. -10,236200 -3,166720 0,2826 -11,2049
Size of savings No saving -1,401220 -0,433488 0,0864 -5,0190
Less than 10 KF 3,659600 1,132150 0,2840 3,9865
From10 to 100 KF 6,438820 1,991940 0,6526 3,0524
More than 100 KF 12,519100 3,872970 1,1221 3,4515
Profession executive 3,238490 1,001870 0,3962 2,5287
employee 2,657760 0,822213 0,1786 4,6033
other -5,709430 -1,766290 0,2101 -8,4074
Average outstanding Less than 2 KF -12,409300 -3,839000 0,4235 -9,0644
From2 to 5 KF 2,567540 0,794304 0,1589 4,9987
More than 5 KF 6,859870 2,122200 0,4772 4,4468
Average transactions Less than 10 KF -3,598390 -1,113210 0,3180 -3,5007
From10 to 30 KF -0,489471 -0,151425 0,2069 -0,7318
From30 to 50 KF 1,643420 0,508415 0,4045 1,2571
More than 50 KF 3,306170 1,022810 0,2583 3,9605
Number of withdrawals Less than 40 6,128640 1,895980 0,2382 7,9587
From40 to 100 -0,076000 -0,023512 0,2219 -0,1060
More than 100 -7,615890 -2,356080 0,3085 -7,6361
Overdraft Authorized -1,481820 -0,458423 0,3886 -1,1795
Forbidden 1,125290 0,348125 0,2951 1,1795
Checkbook Authorized 1,654080 0,511713 0,0658 7,7814
Forbidden -12,951800 -4,006810 0,5149 -7,7814
CONSTANT 0,015909 0,000000

143
THE SCORING FUNCTION

The SCORE procedure transforms the FLD coefficients by using the two following rules :

Minimum coefficient for each variable : for each categorical variable, the smallest
coefficient is set to the value zero. The minimum score possible for a case is zero. It is
obtained for a case who, for each variable, presents the assigned category to zero.

Maximum possible of the score function : the value for the maximum possible score is
chosen by the user (for example 1000). This maximum corresponds to the sum of the
largest transformed coefficients for each variable.

The score attributed to a case is obtained by adding the transformed coefficients associated
with the categories of the case. The transformed score function classifies the cases in the
same way as the initial discriminant function.

144
THE SCORE CONFIGURATION

Click OK and run the method.

Parameter to modify
eventually, for assigning a
target category to low
To tick for creating a file
containing the Decision
Rules to be applied on new
145
THE SCORING RESULTS
Coefficients of the Discriminant and Score functions
Age of client
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
Less than 23 years -4,413 0,00
From23 to 40 years 1,944 49,66
From40 to 50 years 1,170 43,62
Over 50 years -0,426 31,15
Family Situation
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
Single 0,010 50,54
Married 1,427 61,61
Divorced -3,503 23,10
Widow -6,460 0,00
Seniority
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
1 year or less -6,138 19,47
From1 to 4 years -8,631 0,00
From4 to 6 years 9,018 137,88
From6 to 12 years 2,082 83,69
Over 12 years 9,972 145,33
Salary domiciliation
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
Sal. domiciliated 4,924 118,43
Sal. not domicil. -10,236 0,00
Size of savings
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
No saving -1,401 0,00
Less than 10 KF 3,660 39,54
From10 to 100 KF 6,439 61,25
More than 100 KF 12,519 108,75
Profession
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
executive 3,238 69,90
employee 2,658 65,37
other -5,709 0,00
Average outstanding
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
Less than 2 KF -12,409 0,00
From2 to 5 KF 2,568 117,00
More than 5 KF 6,860 150,53
Average transactions
Category label
Linear
discriminant
function
coefficient
Score
function
coefficient
Less than 10 KF -3,598 0,00
From10 to 30 KF -0,489 24,29
From30 to 50 KF 1,643 40,95
More than 50 KF 3,306 53,94
Number of withdrawals
146
OPTIMAL SCORING PILOT

Double-clicking on the following icon opens the Optimal Scoring Pilot interface.
Click on New in the File menu to display the graph below.

The user can define a rate called the Classification Error tolerance Abbreviated as CET in
the parameters Tab of the Score method. In this example, we chose 10% by default.
This rate supports the calculation of regions on the score function scale :

The low boundary 528 has been chosen to assign 10.0% of the real good customers to the
weak scores group (misclassified) and the high boundary 655 to assign 9.7% of the real bad
to high scores group (misclassified).

These boundaries are shrinkable if the user wants to modify the misclassification rates.

Three regions are displayed on the graph :

A "green" region, which corresponds to the high scores (here the category GOOD), where
one expects to find the majority of the GOOD customers. In this region, a miss-classified
case is a BAD customer assigned to GOOD because of its high score. The boundary is
calculated to contain a rate of miss-classified cases that does not exceed the CET.
In this example: 10.0% of the real BAD customers are assigned to this region and 62.4% of the
real GOOD are well assigned.

147
A "red" region of low scores containing most of the cases in BAD category - and therefore
correctly classified - and a percentage not exceeding the CET of cases of GOOD and
therefore miss-classified.
In this example : 64.5% of the real BAD are well assigned and 9.7% of the real GOOD are
misclassified.

An intermediary Orange region between the boundaries of the red and green regions,
where group assignment is left undecided. This region of indecision shrinks when the user
increases the CET.
In this example : 25.5% of the real BAD and 27.8% of the real GOOD are assigned to the
Orange region.

Sometimes, it is not necessary to keep this intermediary region, for direct marketing
campaigns. Then, by clicking on the Single score checkbox, we keep only two regions (Red
and Green) and one single boundary.

Modifying the boundaries by using the scores table
This part of the user interface allows us to modify manually the CET and
therefore the boundaries.

The Data view
This view is not available when the number of cases is greater than 10 000.

148
The fields of the data view are described below :

Identifier :
The cases identifier truncated to 40 characters.
Weight :
The weight defined in the Weighting Tab of the DISCO method; 1 by default.
Sample :
The set assignment for each case : learning or test set.
Score :
The score calculated for each case.
Group :
The original group (G1 or G2) of the case.
Assign. :
Displays the group assignment determined by the model (G1, NC or G2).
NC means that the case is not assigned, or assigned to the orange region.
Err. G1 - Error group 1 :
If the case belongs to the group 1 (G1) and is assigned to :
the group 1, no error
the orange zone, error coded (x)
the group 2, error coded (xx)
Err. G2 - Error group 2 :
If the case belongs to the group 2 (G2) and is assigned to:
the group 2, no error
the orange zone, error coded (x)
the group 1, error coded (xx)

Sort the data by a field:
Clicking on any field name allows us to :
- sort the data in increasing order,
- sort the data in decreasing order,
- return to the initial order.

The case profile :
By clicking on any case in the Data view, it is possible to :
- locate the case on the previous graph, the case is displayed in red with its
identifier, press Escape to return to the Data view
- display its profile in a condensed view
- display its questionnaire and scores associated, the cases categories and
associated scores are blue. We know its original group and its assignment.
149

Interactive simulations, after clicking on Questionnaire and score :
It is possible to simulate new score by clicking on the chosen categories. They
become red.

150
Density curves

This graph draws respectively the density curves of the real BAD and the real GOOD
customers.
151
Lift or Gain Curve

Horizontal axis : % of the scored cases selected, sorted by decreasing scores
Vertical axis : % of the target category captured by the selection

The optimal curve is the one in grey where the selection captured the entire target
category and only the target category.

152
ROC Curve (Receiver Operating Curve)

Sensitivity : Percentage of the target category captured (GOOD classified as GOOD)
Specificity : Percentage of the other category well-classified
1-Specificity : Percentage of the other category misclassified in the target category
(BAD misclassified as GOOD)

Closer is the curve to the upper left part of the graph, better is the separation between the
two categories of the target variables.
When the densities are equal, the ROC curve is confounded with the diagonal of the
square.

153
APPLY THE SCORING FUNCTION TO A NEW DATASET

Firstly, you need to archive the model rules using the method Predictive model rules file from
the Deployment Archiving\Archiving rubric. Connect this new method to the scoring method
as follows:

- Give a name to the rule file and specify its location
- Import the new dataset on which you want to apply the model you archived.
- Connect the method Predictive model deployment from the Deployment
Archiving\Deployment rubric and configure it.
- Run this new method and check the data view.

IDT 1 - Interactive Decision Tree 1
154

IDT 1 - INTERACTIVE DECISION TREE 1

IDT 2 - INTERACTIVE DECISION TREE 2

The IDT procedure produces decision trees from a data set. It is a discriminant procedure
for predicting the values of a categorical variable (variable to explain, with K groups) from
a set of explanatory variables that may be categorical, ordinal or continuous.
The IDT procedure gives the user a choice of three well-established methods in Data
Mining: CHAID, C4.5 and C&RT. The model produced by the method is a Decision Tree,
which can be evaluated with a test sample or by crossed validation. The procedure
includes additional information that lets you refine the results: integration of adjustment
with the a priori group inclusion probabilities, and the introduction of a cost matrix for
incorrect assignment.

The IDT procedure lets the user interactively manipulate the decision tree produced by the
method: pruning from the root, interactive segmentation of a node, and description of the
properties of a segmentation. The procedure also offers a fully interactive mode, in which
the construction of the tree is entirely based on the user's ideas. Several supporting tools (a
list of the best segmentations, descriptive statistics et al.) lets you choose the tree which
best corresponds to the problem to be solved.

At all stages of the design conceived by the user, it is possible to output the reports in
HTML format: on the complete decision tree, or locally on each node including a subset of
the database analyzed.

To illustrate this method, we use the same dataset than for the scoring function: the Credit
English.sba dataset.

155
MARGINAL DISTRIBUTIONS OF THE CATEGORICAL VARIABLES

GOOD_BAD JOB
Categories label Counts % Categories label Counts %
GOOD 237 50,64 Executive 77 16,45
BAD 231 49,36 Employee 237 50,64
Other 154 32,91
Overall 468 100,00
Overall 468 100,00
AGE
Categories label Counts % CHECKIN ACCOUNT
LT 23 years 88 18,80 Categories label Counts %
GE 23 LT 40 years 150 32,05 LT 2KF Account 98 20,94
GE 40 LT 50 years 122 26,07 GE 2 LT 5KF Account 308 65,81
GE 50 years 108 23,08 GE 5KF Account 62 13,25
Overall 468 100,00 Overall 468 100,00
MARITAL AVERAGE TRANSACTIONS
Single 170 36,32 LT 10 KF Trans. 154 32,91
Married 221 47,22 GE 10 LT 30 KF Trans 71 15,17
Divorced 61 13,03 GE 30 LT 50 KF Trans 129 27,56
Widowed 16 3,42 GE 50 KF Trans. 114 24,36
Overall 468 100,00 Overall 468 100,00
SENIORITY WITHDRAWALS
LE 1 year 199 42,52 LT 40 With. 171 36,54
GT 1 LE 4 years 47 10,04 GE 40 LT 100 With. 161 34,40
GT 4 LE 6 years 69 14,74 GE 100 With. 136 29,06
GT 6 LT 12 years 66 14,10
GT 12 years 87 18,59 Overall 468 100,00
Overall 468 100,00 NEGATIVE ACCOUNT BALANCE
Categories label Counts %
SALARY Allowed 202 43,16
Categories label Counts % Not allowed 266 56,84
SALARY AT THE BANK 316 67,52
NO SALARY 152 32,48 Overall 468 100,00
Overall 468 100,00 CHEQUE AUTHORIZATION
Categories label Counts %
SAVINGS CHEQUE OK 415 88,68
Categories label Counts % NO CHEQUE 53 11,32
No saving 370 79,06
LT 10 KF Sav. 58 12,39 Overall 468 100,00
GE 10 LT 100 KF Sav. 32 6,84
GE 100 KF Sav. 8 1,71
Overall 468 100,00

156
IDT 1

The IDT1 procedure prepares the data for the construction of the tree (Procedure IDT2). In
particular, it handles the missing data of the selected variables. The procedure outputs a
report of the treatment of the missing data.

You also have available by default an automatic characterization of the variable to
discriminate by the set of the selected explanatory variables.

This characterization gives you a better selection of the explanatory variables; for example,
by removing all of those that have no connection with the variable to discriminate.

157
IDT 2

The IDT2 procedure constructs an initial segmentation tree as a function of the chosen
method (CHAID, C&RT, C4.5) and the associated parameters.

After the execution of the procedure, to the right of the method you have available a
graphical icon to handle the initial tree.

The CHAID algorithm

The acronym CHAID stands for Chi-squared Automatic Interaction Detector. It is one of
the oldest tree classification methods originally proposed by Kass (1980; according to
Ripley, 1996, the CHAID algorithm is a descendent of THAID developed by Morgan and
Messenger, 1973). CHAID will "build" non-binary trees (i.e., trees where more than two
branches can attach to a single root or node), based on a relatively simple algorithm that is
particularly well suited for the analysis of larger datasets.

Chi-squared Automatic Interaction Detector derives from the basic algorithm that is used
to construct (non-binary) trees, which for classification problems (when the dependent
variable is categorical in nature) relies on the Chi-square test to determine the best next
split at each step. Specifically, the algorithm proceeds as follows:

Preparing predictors. The first step is to create categorical predictors out of any
continuous predictors by dividing the respective continuous distributions into two
categories. For categorical predictors, the categories (classes) are "naturally" defined.

Merging categories. The next step is to cycle through the predictors to determine for each
predictor the pair of (predictor) categories that is least significantly different with respect
to the dependent variable; for classification problems (where the dependent variable is
categorical as well), it will compute a Chi-square test (Pearson Chi-square). If the
respective test for a given pair of predictor categories is not statistically significant as
defined by an alpha-to-merge value, then it will merge the respective predictor categories
and repeat this step (i.e., find the next pair of categories, which now may include
previously merged categories). If the statistical significance for the respective pair of
predictor categories is significant (less than the respective alpha-to-merge value), then
(optionally) it will compute a Bonferroni adjusted p-value for the set of categories for the
respective predictor.

Selecting the split variable. The next step is to choose the split the predictor variable with
the smallest adjusted p-value, i.e., the predictor variable that will yield the most significant
split; if the smallest (Bonferroni) adjusted p-value for any predictor is greater than some
alpha-to-split value, then no further splits will be performed, and the respective node is a
terminal node.

Continue this process until no further splits can be performed (given the alpha-to-merge
and alpha-to-split values).

158

The CHAID algorithm that is particularly well suited for the analysis of larger datasets
and for a first exploration of the data.

159
The CART algorithm (written CR-T in SPAD because of the copyright)

C&RT builds classification and regression trees for predicting continuous dependent
variables (regression) and categorical predictor variables (classification). The classic C&RT
algorithm was popularized by Breiman et al. (Breiman, Friedman, Olshen, & Stone, 1984;
see also Ripley, 1996).

For classification, the CART algorithm uses the Gini impurity criterion. It is based on
squared probabilities of membership for each target category in the node. It reaches its
minimum (zero) when all cases in the node fall into a single target category.

CART is based on a decade of research, assuring stable performance and reliable results.
CART's proven methodology is characterized by:

Reliable pruning strategy CART algorithm considers that no stopping rule could be
relied on to discover the optimal tree, so CART integrates the notion of over-growing trees
and then pruning back; this idea, fundamental to CART, ensures that important structure
is not overlooked by stopping too soon.

Powerful binary-split search approach - CART's binary decision trees are more sparing
with data and detect more structure before too little data are left for learning. Other
decision-tree approaches use multi-way splits that fragment the data rapidly, making it
difficult to detect rules that require broad ranges of data to discover.

Automatic self-validation procedures - In the search for patterns in databases it is
essential to avoid the trap of "over fitting" or finding patterns that apply only to the
training data. CART's embedded test disciplines ensure that the patterns found will hold
up when applied to new data. Further, the testing and selection of the optimal tree are an
integral part of the CART algorithm.

In addition, CART accommodates many different types of modeling problems by
providing a unique combination of automated solutions:
- surrogate splitters intelligently handle missing values;
- adjustable misclassification penalties help avoid the most costly errors

The classification and regression trees (C&RT) algorithms are generally aimed at achieving
the best possible predictive accuracy. Operationally, the most accurate prediction is
defined as the prediction with the minimum costs. The notion of costs was developed as a
way to generalize, to a broader range of prediction situations, the idea that the best
prediction has the lowest misclassification rate.
In most applications, the cost is measured in terms of proportion of misclassified cases, or
variance. In this context, it follows, therefore, that a prediction would be considered best if
it has the lowest misclassification rate or the smallest variance. The need for minimizing
costs, rather than just the proportion of misclassified cases, arises when some predictions
that fail are more catastrophic than others, or when some predictions that fail occur more
frequently than others.

160

161
The C4.5 algorithm

C4.5 belongs to a succession of decision tree learners that trace their origins back to the
work of Hunt and others in the late 1950s and early 1960s (Hunt 1962). Its immediate
predecessors were ID3 (Quinlan 1979).
C4.5 uses the gain ratio to select the attribute whose value will be tested to decide how the
original training set will be partitioned. The gain ratio is a refinement of the information
gain, used in C4.5s precursor ID3.

This method presents two particularities:

- As for the CART algorithm, C4.5 integrates the notion of over-growing trees and then
pruning back. The good size of the tree is determined by pruning. In a first step, the
tree is completely developed with the information gain criterion, and then pruned in
order to minimize the misclassification rate.
- If the splitter of a parent node is a categorical variable, each category of the splitter
becomes a child node even if some of them are null.

C4.5 built large trees with lots of leaves. The calculations time is a bit longer.

162
Partitioning

By random sampling:
If you did not select a variable Samples definition in the IDT1 method, the choice of
the samples is done by random sampling. You have to define the percentages for the
learning set, the testing set and the pruning set (for CART only).

The parameter Random sampling initialization allows you to define different
samples from the same size.

By categories value:
If you have chosen a variable Samples definition in the IDT1 method, the categories
are listed in the window to be assigned a specific sample: learning, testing and pruning
for CART only.

163
IDT2 parameters

Type of analysis:

By default: Automatic

Automatic:
The tree is growth automatically with respect to the stopped criterions defined by the user.

Automatic and crossed validation:
The tree is growth automatically and the procedure evaluates the error by crossed validation.
In this case, you have to define the number of division or subsets to be used for the crossed-
validation.

Interactive :
The tree is not growth at all. In the graphic interface, the user can develop it manually.

164
Thresholds:

Minimum count for cutting (splitting) a segment (node): (by default: 5)
This parameter defines the minimum count for splitting a node. Behind this threshold,
no further split can be performed.
By increasing this parameter, one reduces the size of the tree.

Admissible count: (by default: 1)
This parameter defines the minimum count for a leave after a split.
By increasing this parameter, one reduces the size of the tree.

N umber of tree level: (by default: 10)
This parameter defines the depth of the tree.
By decreasing this parameter, one reduces the size of the tree.

Specification threshold: (by default: 0.9)
This parameter defines the threshold from where we consider a node to belong to a
single target category. Then, no further split can be performed.
When target categories present real unbalanced weights, it is recommended to choose 1.
By decreasing this parameter, one reduces the size of the tree.

Configure the method and run it. Right-click on the IDT2 method, choose the Results
command and click on Interactive decision tree Editor.

165
INTERACTIVE DECISION TREE EDITOR

Results windows

The tool for viewing the Decision Tree produced by the IDT procedure comprises several
windows grouped together in a tabs page. They correspond to several different levels of
information relative to the model constructed.

166
View the data

In a grid, this window shows all the data under analysis. Used together with the
information window on the roots, it lets you follow the path of a case in the Decision Tree.

You can copy the contents of the grid to the clipboard and then paste it into a spreadsheet
application.

The data grid has the format "Cases x Variables:
the leftmost column in grey corresponds to the identifier of individuals;
the next column contains the variable to explain;
the following columns represent the explanatory variables;
the column furthest on the right indicates the weight associated with each case.

167
View the Decision Tree

This window offers a graphical representation of the Decision Tree. You can adjust the
display scale with the Zoom in/Zoom out commands, or by clicking on the corresponding
icons in the Toolbar.

The tree is presented horizontally, starting from the root on the left and moving towards
the terminal node leaves on the right. Each node offers the distribution of the estimated
conditional probability of the predicted variable in absolute (real elements), and relative
(percentages). In the upper right of the window, a caption is available associating
categorical variables with the color codes used. Attention! If an adjustment is requested,
the tool shows the adjusted estimated probabilities. In the upper part of the node there is
shown the decision rule (variable -- operator -- value) related to the creation of a leaf node.

By clicking on a terminal node in the Tree, it is possible to obtain additional information,
supplied on the right side of the window: the full path from the root to the active terminal
leaf node, and the relevance of the candidate variables for the segmentation. The latter
may be sorted according to the name of the variable, or according to the value of the
quality of the segmentation (click on the List's header).

You can also explore further the subset of individuals circumscribed by the terminal node,
or control interactively your analysis.

168
Information on the roots

When you click on a specific node, you can carry out an in-depth analysis by shifting to
the Local exploration window via the Local exploration menu, or by clicking on the
corresponding tab.

Path information and the relevance of the variables are repeated.

It is also possible to view individual elements present in the selected node, together with
their values for each variable in the analysis. Note that for each root there corresponds a
conclusion assigned by the method: the individuals that do not correspond to this
conclusion are shown in red.

Finally, it is possible to go deeper into the analysis of the root by requesting, for each
variable, in the lower part of the window, descriptive statistics on the whole set of
individuals (the root of the Tree).

169
Information on the Decision Tree

This Window lets you judge the quality of the Decision Tree. The Window is divided into
several areas:

Characteristics of the Tree: shows the properties of the Decision Tree produced by the
method, as well as the number of nodes in the tree, the number of terminal node leaves
and its maximum depth. Also shown is the size of the sample used for training, for the
test and, if required, for the pruning.

Impact of the attributes: shows the role of each attribute in the elaboration of the Tree.
The value indicated represents the weighted mean of the impact of each attribute on all
the segmentation candidates. Less importance is given to the impacts measured on the
lower parts of the Tree.

Confusion Matrix: lists the confrontation between the predictions of the tree and the
values observed on the dependent variable to predict. The matrix may be measured on
the training sample, on the test sample, or in crossed validation. These last two options
are active if they were requested during the parameter setting for the procedure.

Profile: presents the current confusion matrix in the form of a row profile (e.g. to
measure sensitivities), or in the form of a column profile (to measure the specifics).

170
Explore and modify the Decision Tree

The originality of the IDT procedure rests largely on the fact that the user can explore and
edit the Tree: either by changing the tree produced by the induction procedure, or by
constructing it from zero using their expert knowledge.
Several tools are made available to the user, allowing them to set the properties of the
node segmentations, while letting them prune the parts of the Tree that are of little
interest.
The operators available that can be applied, either on the set of the nodes (leaves, more
generally), or on a previously selected node.

Operations on a node in the Tree
By right clicking on a node, you have access the context menu. According to the status of
the node (leaf or internal node), different options are available. The options let the user
specify precisely the Tree which will be best suited for the current analysis.

Two main operators are available: Prune for a node within the tree, and Segment for a leaf
node.

Prune a sub-Tree
Pruning a sub-tree consists in deleting the nodes and leaves located beneath a node that
has been previously selected. This operation is necessary when we consider that the
corresponding sub-tree does not add anything to the active analysis; or when we want to
manually induce another segmentation starting from the selected node.

Warning! This operation is only possible on the internal nodes of the Tree.
Procedure
1. Select the node from which you wish to start the pruning process
2. Right click and the Prune menu is available, if the node is not a leaf
3. Click on the Prune menu

171
Segment a Tree node
For each tree node, we have the list of candidate variables for segmentation, with their
respective impacts. At your convenience, you can sort these variables by name or by
relevance so as to recover the variables of interest.

A first originality of the IDT procedure is the ability given to the user to introduce the
segmentation that they seem the most pertinent, either by following suggestions of the
IDT method, or by choosing themselves the segmentation variable.

A second very useful innovation is the possibility given to the user to themselves change
the properties of a segmentation, by letting them introduce the discretization limit for a
continuous variable. For example, the method proposes, for a segmentation according to
age, to set the limit at the 17.5 age level. On the basis of their personal knowledge of the
problem, the user may decide to change this value and manually set a limit of 18 years,
corresponding to adulthood.

Segmentation is impossible in three specific cases:
the terminal node is not a leaf. In other words, it has already been segmented, and
already has child nodes;
the node is empty and there are no cases on the node;
The node is pure, which means that a single attribute of the variable to predict is
attached to the node, and in this case the decision rule is unambiguous, so it is
pointless to take the analysis any further.

Attention! In this setting the rules for halting the expansion of the tree are deactivated
(e.g., minimum elements on the node, specialization threshold etc.).

172
Change the properties of a segmentation
IDT lets the user select the variable the most relevant for a segmentation. It also lets the
user change the properties of the segmentation they have selected.

According to the type of the variable in play in the segmentation, the procedure changes:
change the discrete/continuous threshold for the continuous variables
change the re-grouping of the categorical variables (nominal or ordinals)

Procedure
The procedure for changing the properties of the segmentation has a part in common with
the procedure for manual segmentation.

1. Select the leaf node you want to segment
2. Right click on the leaf node - So that the Segment with... menu is active, the leaf node
must be a leaf and the segmentation must be possible.
3. In the dialogue box which appears, we see the list for explanatory variables candidates
and the segmentations they propose. The sort function of the variables respects the sort
function requested in the Decision Tree window.
4. To change the properties of the variable under selection, you must click on the Change
button
5. Depending on the type of the variable, one of two dialogue boxes appear:
for the continuous variables
6. the dialogue box indicates the variable on which we are working, and offers the
discretization limit used up until now
7. the user must then enter a new threshold. Attention! The edit area does not
accept numerical values, and the decimal point character is the full-stop.
8. now validate your new threshold by clicking on the OK button

for the categorical variables (nominal -- ordinals)
6. The dialogue box indicates, in the list on the left, the sub-trees (leaves coming
from the segmentation procedure) are output, and in the list on the right, the
levels available for developing the sub-trees
7. to change the content of a sub-tree, all its contents (the levels of the explanatory
variable) must be passed to the list on the right with the help of the ">>" button,
then transfer them, by default, to another sub-tree, with the help of the "<<"
button
8. you can add or delete a sub-tree with the help of the "+" et "-" buttons
9. when the changes have been completed, you must validate the new
segmentation with the help of the OK button
173

174
Edit a Tree by levels

The user can examine various options on each node. They also have the ability to
interactively edit the Decision Tree while moving through the hierarchical structure of the
nodes.

In this context, the procedure will carry out the operation requested on all the leaves
situated at the lowest level of the tree.

Two types of operation are available:
Move up one level: the procedure prunes all the nodes situated on the penultimate
level of the tree.
Go down one level: for each leaf situated on the last level of the tree, the procedure
looks for the most efficient segmentation. Warning! The rules for stopping the
expansion of the tree are deactivated at this level.

Procedure
According to the operation requested, click on the Operations menu -- Go up one level or
Operations -- Go down one level

Continue with an Automatic Analysis

At all stages during the exploration and editing of the Tree, the user has the option to
request the procedure to continue the construction of the model automatically; using the
options specified when the method's parameters were set. Users can, for example, identify
the segmentation which they find the most interesting on the tree root, then ask the
application to automatically continue the search for the best tree following this first cut.

All the options selected are active in this context, in particular the rules for halting the
expansion of the Tree.

Procedure
Make sure the Decision Tree is selected
Click on the menu for: Operations -- Automatic analysis
175
Save and Backup procedures

On the first execution of the procedure, the Decision Tree is saved with the title of Default
Analysis. The Tree is shown by default when the IDT procedure starts up.

You can freely edit and change the Tree supplied by default. The results can then be saved
in two different ways:

you can either save the Tree by overwriting the previous version,
or save a new version of the Tree for the same problem with a suitable title.

At anytime you can reload into IDT a decision tree that you have saved. The different
versions are identified by the titles you have given them.

Warning! Changing the analysis parameters automatically deletes all the saves carried
out for the problem analyzed. If you want to save your work permanently, you are
advised to use reports or exports.

Save the current tree
On the execution of the IDT procedure, a tree with the title Default Analysis is
automatically created. This is the tree shown when the IDT procedure starts up. The user
can personalize this tree, and save the results of their changes permanently.

In general, it is possible to save any tree on which the user is working.
Procedure
1. Click on the File Save menu
2. IDT deletes the old version of the tree and replaces it with the new one.

Save a new version of the Tree
When working on an analysis, the user may wish to work in parallel on several different
scenarios corresponding to multiple trains of thought: you therefore have the option to
save individual versions of the Tree with different names.
Procedure
The user wants to save a new version of the Tree, from which they have pruned a branch.

1. Proceed with pruning a part of the Tree
2. Then click on the File -- Save as... menu
3. A dialogue box appears, asking the user to give a new title to this new version of
the Tree. This operation is obligatory, since the different versions are
distinguished by their titles.
4. Click on the OK button

Warning! by clicking on File -- Save, the user overwrites the version in memory.
176
Load a Decision Tree
At any time in IDT, the user can load into memory a previously saved version of the tree.
For each version of the tree there is a title assigned by the user.

Procedure
1. Click on the menu File -- Open
2. A box lists the different versions associated with the current problem
3. Select the analysis you want by clicking on its title
4. And confirm by clicking on OK

Export Rules
Any Decision Tree may be transformed into a rules base without loss of information. A
rule is a path leading from the root to the given terminal leaf node. The conclusion
associated with the rule corresponds to the conclusion associated with the leaf node.

The rule is therefore of the form: If condition Then conclusion

IDT produces the list of rules associated with a tree in HTML format. Additional
information is provided: the support for the rule which corresponds to the number of
individuals concerned by the rule; this is an indicator of the reliability of the rule. The
confidence of the rule indicates the percentage of individuals correctly classified by the
rule. This is an indicator of the precision of the rule.
IDT produces also SQL rules and SPAD rules to be assigned to new datasets.

Procedure
Click on the menu File -- Export Rules in HTML format, or SPAD or SQL
Whether your previous choice, a dialog box appears in which you can enter the
name of the file generated and its location, or an SQL editor to generate Select or
Update rules.
Then click on the Save button

SPAD7 Data Miner Guide PDF

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SPAD7 Data Miner Guide PDF

Uploaded by

Copyright:

Available Formats

22 quai gallieni - 92150 Suresnes - France

Tl : +33 1 57 32 60 60 - Fax : +33 1 57 32 62 00

The Discriminant and its methods

Individu n 93 <50 >=50 0,877 -1,962

Individu n 87 >=50 <50 0,640 0,577

Individu n 25 <50 == 1,000 8,623

You might also like