Multi-label classification is discussed in [6] and [7], mostly for large collections of text documents. Particularly interesting is a technique consisting in creating new composite labels for each association of multiple labels assigned to an instance.
This paper extends concepts found in the recent literature by introducing a version of bigram words that may have a gap of one word. These features are used in decision strategies based on a constrained application of the cosine similarity and on a new definition of soft theme density location in a conversation.
3. Application task and features

The application task concerns a customer care service (CCS). The purpose of the application is to monitor the effectiveness of a call centre by evaluating proportions of problem items and solutions. Application-relevant information is described in the application requirements, focussing on an informal description of useful speech analytics that are not completely defined. The requirements and other application documentation essentially describe the types of problems a CCS agent is entitled to solve.
This paper proposes a new approach for automatically annotating dialogues between an agent and a customer with one or more application theme labels belonging to the set C defined as follows:
C := { itinerary, lost and found, time schedules, transportation card, traffic state, fine, special offers }.
Given a pair (d, c), where d ∈ X is a spoken dialogue in the corpus X described by a vector v_d of features, and c ∈ C is a class corresponding to a theme described by a vector v_c of features of the same type as v_d, two classification methods, indicated as γ1 and γ2, are proposed for multiple theme classification.
With the purpose of increasing the performance of automatic multiple theme hypothesization, bigrams were added to the lexicon of 7,217 words, with the possibility of also having distance bigrams made of pairs of words separated by a maximum of two words. With this addition, the feature set increases to 160,433 features. In order to avoid data sparseness effects, a reduced feature set V was obtained by selecting features based on their purity and coverage.
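For illustration, this feature extraction can be sketched as follows (Python; a schematic reconstruction, not the system's code, and the function name is hypothetical):

```python
def dialogue_features(tokens, max_gap=1):
    """Extract unigrams, bigrams and distance bigrams from a
    tokenized dialogue; a distance bigram pairs two words with
    at most `max_gap` intervening words (a gap of one word)."""
    feats = set(tokens)  # unigram features
    for i, w in enumerate(tokens):
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens):
                feats.add((w, tokens[j]))  # gap == 0: plain bigram
    return feats

# ("lost", "card") is produced as a distance bigram, skipping "my":
print(dialogue_features("i lost my card".split()))
```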
Purity of a feature t is defined with the Gini criterion as follows:

$$ G(t) = \sum_{c \in C} P(c|t)^2 = \sum_{c \in C} \left( \frac{df_c(t)}{df_T(t)} \right)^2 \qquad (1) $$

where df_T(t) is the number of dialogues of the train set T containing term t, and df_c(t) is the number of dialogues of the train set containing term t in dialogues annotated with theme c.
A score w_c(t) is introduced for feature t in the entire train-set collection of dialogues discussing theme c. It is computed as follows:

$$ w_c(t) = df_c(t) \cdot idf^2(t) \cdot G^2(t) \qquad (2) $$

where idf(t) is the inverse document frequency for feature t.
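The two feature-selection quantities can be computed as in the following sketch (Python; illustrative only, with made-up counts, and assuming the usual logarithmic idf):

```python
import math

def gini_purity(df_c, df_T):
    """G(t) = sum_c (df_c(t) / df_T(t))^2, as in eq. (1)."""
    return sum((n / df_T) ** 2 for n in df_c.values())

def theme_score(theme, df_c, df_T, n_dialogues):
    """w_c(t) = df_c(t) * idf(t)^2 * G(t)^2, as in eq. (2)."""
    idf = math.log(n_dialogues / df_T)
    return df_c.get(theme, 0) * idf ** 2 * gini_purity(df_c, df_T) ** 2

# A feature occurring in 50 train dialogues, 40 of them labelled "fine":
df_c = {"fine": 40, "itinerary": 6, "traffic state": 4}
print(theme_score("fine", df_c, df_T=50, n_dialogues=1000))
```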
4. Decision strategies

The score sc(d, c) of a theme c for a dialogue d is obtained by applying the cosine similarity to the vectors v_d and v_c:

$$ sc(d, c) = \frac{\sum_{t \in V} w_d(t)\, w_c(t)}{\sqrt{\sum_{t \in V} w_d^2(t)}\, \sqrt{\sum_{t \in V} w_c^2(t)}} \qquad (3) $$

where w_d(t) is a score for feature t in dialogue d.
Let γ1(d) be the set of themes discussed in dialogue d. A first decision rule for automatically annotating dialogue d with a theme class label c is:

$$ c \in \gamma_1(d) \iff sc(d, c) > \alpha \cdot sc(d, \hat{c}) \qquad (4) $$

where ĉ := arg max_{c′ ∈ C} sc(d, c′), and α ∈ [0; 1] is an empirical parameter whose value is estimated by experiments on the development set.
If the score of ĉ is too low, then the application of the above rule is not reliable. To overcome this problem, the following additional rule is introduced:

$$ c \in \gamma_1(d) \iff sc(d, c) > v \cdot \sum_{c' \in C} sc(d, c') \qquad (5) $$

where v ∈ [0; 1] is another parameter whose value is estimated by experiments on the development set.
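Rules (4) and (5) together yield the following sketch of γ1 (Python; treating the two rules as a conjunction is an assumption based on rule (5) being introduced as additional):

```python
def gamma1(scores, alpha=0.69, v=0.16):
    """Hypothesize every theme whose score exceeds a fraction alpha
    of the best score (rule 4) and a fraction v of the total score
    (rule 5); `scores` maps each theme c to sc(d, c)."""
    best = max(scores.values())
    total = sum(scores.values())
    return {c for c, s in scores.items()
            if s > alpha * best and s > v * total}

scores = {"itinerary": 0.82, "time schedules": 0.70, "fine": 0.11}
print(gamma1(scores))  # {'itinerary', 'time schedules'}
```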
4.1. Parameter estimation

The values of the parameters α and v have been estimated using 20 subsets of 98 dialogues each, belonging to the development set and selected with the same proportion of single- and multiple-theme dialogues as in the development set. In order to estimate the optimal values α̂ and v̂ of these two parameters, the following decision rule has been applied:

$$ (\hat{\alpha}, \hat{v}) = \arg\max_{(\alpha, v) \in [0;1]^2} \sum_{i=1}^{20} F\left(\gamma_1^{(\alpha, v)}, D_i\right) \qquad (6) $$

where F(γ1^(α,v), D_i) is the F-score (defined in the following subsection) computed using model γ1 with the estimated values of α and v on the subset D_i ⊂ D.
The optimal value of α, the proportion of the highest score required for assigning themes to a dialogue, has been evaluated to α̂ = 0.69, and the optimal value of v, the threshold on the sum of the scores, to v̂ = 0.16.
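In practice, equation (6) can be solved by an exhaustive grid search, as in the sketch below (Python; the f_score callback and the 0.01 grid step are assumptions, since the search procedure is not specified):

```python
import itertools

def estimate_parameters(subsets, f_score, step=0.01):
    """Return the (alpha, v) pair maximizing the summed F-score of
    gamma1 over the development subsets D_1..D_20, as in eq. (6);
    f_score(alpha, v, D) evaluates gamma1 on one subset D."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    return max(itertools.product(grid, repeat=2),
               key=lambda av: sum(f_score(av[0], av[1], D) for D in subsets))
```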
4.2. Performance measures

The proposed approaches have been evaluated following procedures discussed in [7], with measures used in Information Retrieval (IR) and with accuracy, as defined in the following for a corpus X and a decision strategy γ. Of particular importance is the F-score: based on this measure, and as mentioned in subsection 6.3, it is possible to find the best trade-off between precision and recall by rejecting some of the dialogues.

Recall:

$$ R(\gamma, X) = \frac{1}{|X|} \sum_{d \in X} \frac{|\gamma(d) \cap L(d)|}{|L(d)|} \qquad (7) $$

where L(d) indicates the set of theme labels annotated for conversation d.

Precision:

$$ P(\gamma, X) = \frac{1}{|X|} \sum_{d \in X} \frac{|\gamma(d) \cap L(d)|}{|\gamma(d)|} \qquad (8) $$

F-score:

$$ F(\gamma, X) = \frac{2\, P(\gamma, X)\, R(\gamma, X)}{P(\gamma, X) + R(\gamma, X)} \qquad (9) $$

Accuracy:

$$ A(\gamma, X) = \frac{1}{|X|} \sum_{d \in X} \frac{|\gamma(d) \cap L(d)|}{|\gamma(d) \cup L(d)|} \qquad (10) $$
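Equations (7)-(10) translate directly into code; the sketch below (Python, with hypothetical variable names) computes all four measures for set-valued theme hypotheses:

```python
def evaluate(gamma, L):
    """Recall, precision, F-score and accuracy of eqs. (7)-(10);
    `gamma` and `L` map each dialogue d to its hypothesized and
    annotated theme-label sets, respectively."""
    n = len(L)
    R = sum(len(gamma[d] & L[d]) / len(L[d]) for d in L) / n
    P = sum(len(gamma[d] & L[d]) / len(gamma[d]) for d in L) / n
    A = sum(len(gamma[d] & L[d]) / len(gamma[d] | L[d]) for d in L) / n
    F = 2 * P * R / (P + R) if P + R else 0.0
    return R, P, F, A

L = {"d1": {"itinerary"}, "d2": {"fine", "traffic state"}}
gamma = {"d1": {"itinerary"}, "d2": {"fine"}}
print(evaluate(gamma, L))  # (0.75, 1.0, 0.857..., 0.75)
```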
[Figure 1-a, 1-b: thematic density as a function of the position in the dialogue, with curves for the themes HORR, ITNR and TARF.]

5. Automatic annotation based on thematic densities

5.1. Thematic density