
Writer Identification in Music Score Documents without Staff-Line Removal


Anirban Jyoti Hati
Dept. of EEE
Birla Institute of Technology
Pilani, India
anirban006@gmail.com
Partha Pratim Roy
CVPR Unit
Indian Statistical Institute
Kolkata, India
2partharoy@gmail.com

Umapada Pal
CVPR Unit
Indian Statistical Institute
Kolkata, India
umapada@isical.ac.in

Abstract: Writer identification from musical scores is a challenging task. A few works on writer identification in musical sheets have been published in the literature, but to the best of our knowledge all of them were performed after removing the staff lines from the musical scores. In this paper, we propose a symbol-independent writer identification framework for music scores that uses HMMs and does not remove staff lines. The writing style of each writer is modelled using sliding-window-based LGH features. To identify the writer of an input musical sheet, all music lines are fed to writer-specific HMMs, and each model returns a log-likelihood score for the given input. These log-likelihood scores are compared, and the writer corresponding to the maximum score is taken as the identified writer of the test sample. Next, a page-level log-likelihood score is computed for writer identification on each page sample. We compare our proposed approach with a writer identification system based on Gaussian Mixture Models (GMMs) on the CVC-MUSCIMA dataset. The results obtained from an experiment on 50 writers show that the HMM-based approach outperforms the GMM-based approach.
Keywords: Writer Identification, Music Score Documents, Gaussian Mixture Model, Hidden Markov Model.

I. INTRODUCTION
Writer identification of musical sheets is inevitably more challenging than that of text documents, because a music sheet contains fewer musical notes than a text page contains text symbols. Generally, in a music page, notes are written on staff (stave) lines. Music sheets also contain other symbols such as clefs, accidentals, time signatures, dynamics and text (see Fig. 1). There are only a few works on writer identification for music scores. In all of them, the music sheets were subjected to staff-line removal and other pre-processing, which eases the identification of the writer. To the best of our knowledge, we are the first to propose writer identification in musical pages without removing staff lines or performing any kind of pre-processing.
According to our literature survey, the first proposal for writer identification from music scores was given by Bruder et al. [1]. They extracted features from a collection of music scores, defined a tree structure for each feature to cluster the feature set, and used a K-NN method for the purpose. Fornes et al. [4] proposed a writer identification scheme for musical scores after removing staff lines. Experiments were performed on 175 music lines from seven writers (each writer contributed 25 music lines), and a writer identification rate of 95% was achieved. Later, Fornes et al. [5] proposed a writer identification method based on textural features: Gabor features and gray-level co-occurrence matrix features were extracted from the images after staff-line removal, and a K-NN classifier was used for classification. Gordo et al. [6] proposed a bag-of-visual-terms method to identify the writer of old music scores from symbol analysis. Recently, Gordo et al. [16] proposed a writer identification method based on a bag of notes, extracted after staff-line removal from the music document image.


Fig. 1. Example of a musical sheet showing handwritten
music-symbols along with staff-lines.

A writer identification competition on music scores was organized at ICDAR 2011 [8], in which music sheets without staff lines were considered. In this competition, Hassaïne and Al-Maadeed (PRIP02) proposed three methods: (i) edge-based directional probability distribution features [10], (ii) grapheme features [11], and (iii) a combination of the edge- and grapheme-based features. They reported a 77% writer identification rate when combining both features. Djeddi et al. (TUA03) proposed five methods: (i) a 5-nearest-neighbour classifier with the city-block distance metric, (ii) a Support Vector Machine (one-against-one), (iii) a Support Vector Machine (one-against-all), (iv) a multilayer perceptron, and (v) a combination of the four previous classifiers. They reported 76% accuracy using method (iii).
In this paper, we propose a symbol-independent writer identification framework for music scores that uses HMMs without removing staff lines. A flowchart of our algorithm is presented in Fig. 2. The music page is first segmented into music-score lines that contain both staff lines and music symbols. Next, a local gradient histogram (LGH) feature is extracted from each music-score line to capture the writing style. These features are used to construct an HMM for each writer. To identify the writer of an unknown music sheet, the input image is segmented into music lines, and these lines are fed to each of the HMMs. Each HMM returns a log-likelihood score for its writer, and the writer of the music-score line image is decided based on these scores. Finally, a page-level score is computed from the line-based scores, and the writer is identified for that page sample. We compare our HMM-based writer identification approach with GMMs and show that the HMM-based approach outperforms the GMM-based one.
The rest of the paper is organized as follows. Section II explains the feature extraction process for music sheet images. Section III explains the writer identification approach using HMMs. We demonstrate the performance of the proposed algorithm on the CVC-MUSCIMA dataset in Section IV. Finally, conclusions and future work are presented in Section V.


Fig. 2. Block diagram of the proposed framework
II. FEATURE EXTRACTION
To extract features from music lines, we first segment the music-score lines from the musical document. Next, a sliding-window-based feature extraction algorithm is applied. These processes are described in the following subsections.
Music-Score Line Segmentation: Each music page is subjected to score-line segmentation. For this task, the music pages are first eroded with a suitable structuring element to remove all musical symbols. Erosion followed by dilation (i.e., a morphological opening) makes the staff lines stand out. After these morphological operations, noise elimination methods are used to remove unwanted noise. The variation of intensity in the resulting image then gives the positions of the staff lines, which helps us segment the score lines from the original music page. Figure 3 shows the score lines segmented from a page.
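The segmentation step above can be illustrated with a minimal sketch. It assumes a boolean ink mask (True = ink), even though the paper itself avoids binarization, and it omits the noise-elimination step; the function names and the parameters `min_run`, `row_frac`, `max_gap` and `margin` are hypothetical choices for illustration, not values from the paper. The opening with a horizontal line element keeps only long horizontal runs (staff lines), and the row-wise profile of what remains gives the staff positions.

```python
import numpy as np


def horizontal_opening(ink: np.ndarray, min_run: int) -> np.ndarray:
    """Binary opening (erosion then dilation) with a 1 x min_run horizontal
    structuring element: keeps only ink pixels on a horizontal run of at least
    min_run pixels, which removes note heads, stems and other symbols."""
    out = np.zeros(ink.shape, dtype=bool)
    for r, row in enumerate(ink):
        c, w = 0, row.size
        while c < w:
            if not row[c]:
                c += 1
                continue
            start = c
            while c < w and row[c]:
                c += 1
            if c - start >= min_run:
                out[r, start:c] = True
    return out


def score_line_bands(ink, min_run=50, row_frac=0.5, max_gap=20, margin=10):
    """Hypothetical helper: (top, bottom) row ranges, one per stave."""
    staff = horizontal_opening(ink, min_run)
    profile = staff.sum(axis=1)                      # staff-ink count per row
    rows = np.flatnonzero(profile >= row_frac * ink.shape[1])
    bands = []
    if rows.size == 0:
        return bands
    start = prev = rows[0]
    for r in rows[1:]:
        if r - prev > max_gap:                       # large vertical gap: next stave
            bands.append((start, prev))
            start = r
        prev = r
    bands.append((start, prev))
    h = ink.shape[0]
    return [(int(max(0, a - margin)), int(min(h, b + margin + 1))) for a, b in bands]


# Toy page: two staves of five lines each, plus a stem and a note head.
page = np.zeros((200, 300), dtype=bool)
for top in (20, 120):
    for k in range(5):
        page[top + 8 * k, :] = True                  # staff lines 8 px apart
page[15:60, 100:103] = True                          # stem crossing the first stave
page[30:34, 150:160] = True                          # note head
print(score_line_bands(page))                        # [(10, 63), (110, 163)]
```

Each returned band would then be cropped from the original (unbinarized) page, so the staff lines stay in the score-line image as the method requires.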




Fig. 3. Result of score line segmentation from musical sheet
shown in Fig. 1.

LGH Feature: We have used an off-the-shelf Local Gradient Histogram (LGH) feature [13] in our proposed system. In this approach, a sliding window traverses the image from left to right to produce a sequence of overlapping sub-images. Each window is subdivided into 4 × 4 (4 rows and 4 columns) regular cells, and a histogram of gradient orientations is calculated from all pixels in each cell.
For feature extraction, a Gaussian filter is first applied to the sub-image I(x, y) to obtain the smoothed image S(x, y). Next, the horizontal and vertical gradient components $G_x$ and $G_y$ of S(x, y) are determined as follows:
$$G_x(x, y) = S(x+1, y) - S(x-1, y)$$
$$G_y(x, y) = S(x, y+1) - S(x, y-1)$$
Then, the gradient magnitude m and direction θ are obtained for each pixel with coordinates (x, y) as
$$m(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2} \quad \text{and} \quad \theta(x, y) = \tan^{-1}\frac{G_y(x, y)}{G_x(x, y)}$$
The gradient field vector $(G_x, G_y)$ is quantized into an L-bin histogram, where each bin corresponds to a particular octant of the angular space. The histogram is formed by adding m(x, y) to the bin indicated by the quantized θ(x, y). For example, concatenating the 16 histograms of 8 angular bins gives a 128-dimensional feature vector for each sliding-window position.
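As a sketch only, one LGH window descriptor following the equations above might be computed as below. Assumptions not taken from the paper: the value of `sigma`, the edge padding in the Gaussian filter, the use of `arctan2` so that the angle covers all eight octants of the full circle, and x as the column index and y as the row index. The function names are hypothetical. With 4 × 4 cells and 8 bins the descriptor has 128 values, matching the example in the text.

```python
import numpy as np


def gaussian_smooth(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian filter (edge padding); returns an array of img's shape."""
    radius = max(1, int(round(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    h, w = img.shape
    p = np.pad(img, radius, mode="edge")
    tmp = sum(k[i] * p[:, i:i + w] for i in range(k.size))      # horizontal pass
    return sum(k[i] * tmp[i:i + h, :] for i in range(k.size))   # vertical pass


def lgh_window(window: np.ndarray, cells=(4, 4), n_bins=8, sigma=1.0) -> np.ndarray:
    """Concatenated magnitude-weighted orientation histograms of one window."""
    s = gaussian_smooth(window.astype(float), sigma)             # S(x, y)
    gx = np.zeros_like(s)
    gy = np.zeros_like(s)
    gx[:, 1:-1] = s[:, 2:] - s[:, :-2]   # G_x = S(x+1, y) - S(x-1, y), x = column
    gy[1:-1, :] = s[2:, :] - s[:-2, :]   # G_y = S(x, y+1) - S(x, y-1), y = row
    mag = np.hypot(gx, gy)                                      # m(x, y)
    theta = np.mod(np.arctan2(gy, gx), 2 * np.pi)               # direction in [0, 2*pi)
    bins = np.minimum((theta * n_bins / (2 * np.pi)).astype(int), n_bins - 1)
    feats = []
    for rows in np.array_split(np.arange(s.shape[0]), cells[0]):
        for cols in np.array_split(np.arange(s.shape[1]), cells[1]):
            idx = np.ix_(rows, cols)
            # add m(x, y) to the bin of the quantized theta(x, y) in this cell
            feats.append(np.bincount(bins[idx].ravel(), weights=mag[idx].ravel(),
                                     minlength=n_bins))
    return np.concatenate(feats)


window = np.random.default_rng(0).random((32, 24))   # toy grey-level window
print(lgh_window(window).shape)                      # (128,)
```

Changing `n_bins` to 16 gives 16 × 16 = 256 values per window, which is the dimension reported as best in the experiments.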

III. WRITER IDENTIFICATION

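A. Line-Level Writer Identification: The writer identification system is built from HMM-based recognizers of handwritten music lines; an HMM is used to identify the writer of each segmented music-score line.
To identify the writer, we create an HMM for each writer category. For a classifier of C categories, we choose the model that best matches the observation among C HMMs $\lambda_m = \{A_m, B_m, \pi_m\}$ (transition matrix, emission distributions and initial-state distribution), where $m = 1, \ldots, C$ and $\sum_{m=1}^{C} P(\lambda_m) = 1$. That is, when an observation sequence O of unknown category is given, we calculate $P(\lambda_m \mid O)$ for each HMM $\lambda_m$ and select $\lambda_{c^*}$, where
$$c^* = \arg\max_{m} P(\lambda_m \mid O), \qquad P(\lambda_m \mid O) = \frac{P(O \mid \lambda_m)\, P(\lambda_m)}{P(O)}$$
Here, P(O) is the density function of O irrespective of the category and is computed by
$$P(O) = \sum_{m=1}^{C} P(O \mid \lambda_m)\, P(\lambda_m)$$
The term $P(O \mid \lambda_m)$ is called the likelihood function of O given $\lambda_m$, and $P(\lambda_m)$ is called the marginal or prior probability of $\lambda_m$. The standard solution, performed by the Viterbi algorithm, computes the probability $P(O \mid \lambda_m)$ that the sequence O was generated by $\lambda_m$; strictly, Viterbi decoding approximates this likelihood by the probability of the single most likely state sequence.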
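A minimal sketch of this decision rule follows. For brevity it assumes a single diagonal Gaussian per state instead of the Gaussian mixtures used in the experiments, a hypothetical fixed left-to-right transition matrix instead of trained parameters, and equal priors $P(\lambda_m)$, so the arg max reduces to the largest log-likelihood. The Viterbi recursion keeps, for every state, the log-probability of the best path ending there. All function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(X, mean, var):
    """Per-frame log N(x_t; mean, diag(var)) for X of shape (T, D)."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var)) + np.sum((X - mean) ** 2 / var, axis=1))


def viterbi_loglik(X, log_pi, log_A, means, variances):
    """log-probability of the best state path: the Viterbi approximation of log P(O | lambda)."""
    log_b = np.stack([log_gauss_diag(X, m, v) for m, v in zip(means, variances)], axis=1)
    delta = log_pi + log_b[0]                                   # shape (S,)
    for t in range(1, len(X)):
        # best predecessor for each state, then emit frame t
        delta = np.max(delta[:, None] + log_A, axis=0) + log_b[t]
    return float(np.max(delta))


def left_to_right_hmm(means, variances, stay=0.6):
    """Hypothetical left-to-right HMM: start in state 0, stay or move one state right."""
    S = len(means)
    A = np.zeros((S, S))
    for s in range(S):
        A[s, s] = stay if s + 1 < S else 1.0
        if s + 1 < S:
            A[s, s + 1] = 1.0 - stay
    pi = np.zeros(S)
    pi[0] = 1.0
    with np.errstate(divide="ignore"):                          # log(0) = -inf is intended
        return np.log(pi), np.log(A), np.asarray(means, float), np.asarray(variances, float)


def identify_writer(X, writer_models):
    """Line-level decision: argmax_m log P(O | lambda_m), i.e. equal priors."""
    scores = [viterbi_loglik(X, *model) for model in writer_models]
    return int(np.argmax(scores)), scores


# Two toy 4-state writer models over 2-D features.
models = [left_to_right_hmm([[0.0, 0.0]] * 4, [[1.0, 1.0]] * 4),
          left_to_right_hmm([[5.0, 5.0]] * 4, [[1.0, 1.0]] * 4)]
line = np.full((10, 2), 5.0)                    # toy sequence near writer 1's means
print(identify_writer(line, models)[0])         # 1
```

With unequal priors one would add $\log P(\lambda_m)$ to each score before taking the arg max; $P(O)$ is common to all writers and can be ignored.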

B. Page-Level Writer Identification: The recognizer returns a log-likelihood score from each writer-specific HMM for a given test line. Let the log-likelihood scores of a line image be $S = \{S_1, S_2, \ldots, S_N\}$ for N writers. The probabilities $P = \{P_1, P_2, \ldots, P_N\}$ of the writer choices are then calculated by
$$P_i = \exp(S_i)$$
According to these probability scores, the writers are ranked as $R = \{R_1, R_2, \ldots, R_N\}$. Fig. 4 shows the distribution of normalized probability scores (NPS) for all four lines of the music page in Fig. 1; we show NPS to visualize the probability distribution more clearly. The writer ID of Fig. 1 is 9. We observe that the HMM ranks the correct writer first for lines 1, 3 and 4, but for line 2 some other writers (i.e., 6, 15, 25 and 35) are wrongly ranked above the true writer.
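Exponentiating raw log-likelihoods as in $P_i = \exp(S_i)$ underflows in floating point, because line log-likelihoods are typically large negative numbers. A common remedy, sketched below as an assumption rather than the authors' stated procedure, is to subtract $\max_i S_i$ before exponentiating and then normalize. This rescales all $P_i$ by the same factor, so ratios and ranks are unchanged. The function name is hypothetical.

```python
import numpy as np


def probabilities_and_ranks(log_likelihoods):
    """P_i proportional to exp(S_i), normalized to sum to 1, plus ranks R_i (1 = best)."""
    s = np.asarray(log_likelihoods, dtype=float)
    p = np.exp(s - s.max())           # shift by max(S) to avoid underflow; ratios unchanged
    p /= p.sum()
    order = np.argsort(-p, kind="stable")        # writer indices, most probable first
    ranks = np.empty(len(p), dtype=int)
    ranks[order] = np.arange(1, len(p) + 1)
    return p, ranks


print(np.exp(np.array([-1000.0, -1001.0])))      # naive exp: both entries underflow to 0.0
p, r = probabilities_and_ranks([-1000.0, -1001.0, -1003.0])
print(r)                                         # [1 2 3]
```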

Fig. 4. NPS distribution for music score lines 1 to 4. (For a better view of the image, see the soft copy of the PDF version.)

To resolve such line-wise confusion and identify the true writer of a music page, we assign weights $W = \{W_1, W_2, \ldots, W_N\}$ to the writers according to their ranks $R_i$. The next step is to calculate the final scores of the writers. For a music page with m score lines, the final score $F_i$ of the i-th writer is estimated as
$$F_i = \sum_{j=1}^{m} \left[ P_{ij} \times W_{ij} \right]$$
where $P_{ij}$ and $W_{ij}$ are the probability score and the weight of the i-th writer for the j-th line, respectively. We tried the weight functions listed in Table I and found that the inverted distance function gives the best result. A detailed analysis of the results is given in Table II.
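The page-level fusion $F_i = \sum_j P_{ij} W_{ij}$ can be sketched as follows. This is an illustration under assumptions: the inverted distance weight is taken to be $1/R_i$ (rank 1 = best writer; a constant numerator would not change the arg max), the per-line probabilities are max-shifted and normalized, and all function names are hypothetical.

```python
import numpy as np


def line_probs_and_ranks(loglik_line):
    """Normalized probabilities P_i ~ exp(S_i) and ranks R_i (1 = best) for one line."""
    s = np.asarray(loglik_line, dtype=float)
    p = np.exp(s - s.max())
    p /= p.sum()
    ranks = np.empty(len(p), dtype=int)
    ranks[np.argsort(-p, kind="stable")] = np.arange(1, len(p) + 1)
    return p, ranks


def inverted_distance(ranks):
    return 1.0 / ranks                       # assumed form W = 1 / R


def uniform(ranks):
    return np.ones(len(ranks))               # W = K with K = 1


def page_writer(loglik_lines, weight_fn=inverted_distance):
    """loglik_lines: m lines x N writers. Returns (argmax_i F_i, F)."""
    F = np.zeros(np.shape(loglik_lines)[1])
    for line in loglik_lines:
        p, ranks = line_probs_and_ranks(line)
        F += p * weight_fn(ranks)            # F_i += P_ij * W_ij
    return int(np.argmax(F)), F


# Toy page: 4 lines, 3 writers; line 2 prefers writer 1, the others writer 0.
lines = [[-10, -12, -15], [-13, -11, -14], [-9, -12, -13], [-10, -13, -11]]
print(page_writer(lines)[0])                 # 0
```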

Table I: Functions for weighted sum calculation
Decay function               Description
Uniform function             $W_i = K$
Inverted distance            $W_i = K / R_i$
Inverted distance squared    $W_i = K / R_i^2$
Exponential decay            $W_i = \exp(-R_i)$
Sinusoidal                   $W_i = \sin[\,\cdots(5)\,]$
Linear negative slope        $W_i = (-n + N)$

The writer with the maximum final score, $\max(F_1, F_2, \ldots, F_N)$, is taken as the identified writer of that page. Figure 5 shows the normalized final scores of all writers, where the inverted distance function is used to compute the page-level score.


Fig. 5. Normalized final scores of all writers for the music page shown in Fig. 1. Blue indicates scores obtained with the weighted function and red indicates scores obtained with the uniform function. (For a better view of the image, see the soft copy of the PDF version.)

IV. EXPERIMENTAL RESULTS
Dataset: We performed our experiments on musical sheets from the CVC-MUSCIMA dataset [6, 8]. This dataset consists of 1,000 music pages written by 50 different writers; each writer contributed 20 music pages, and the writing styles of different writers differ clearly. Adult, expert musicians from different geographic locations were chosen to create this dataset, which ensures a mature and distinct writing style for every writer. Here, we define the dataset in a constrained way [16], in which the training pieces of a given fold are the same for every writer. The whole dataset is divided into two parts, one for training and the other for testing.

As mentioned earlier, the images are subjected to music-score line segmentation, and all segmented score lines are fed directly to the feature extraction process. We did not apply any other pre-processing steps such as binarization, noise removal, staff-line removal or text removal. In the following, we first present the line-level writer identification performance using HMMs. Next, we discuss the improvement obtained by page-level identification. Finally, we compare the page-level identification results with a GMM-based approach.

Line-Wise Identification Performance: From each score-line image, the LGH feature is extracted using a sliding window. The fixed-width window slides through the image with a 50% overlap ratio, and at each position it computes the feature by dividing the region into n rectangular cells. The sequences of feature vectors extracted from the training images of a particular writer were used to train the HMM for that writer. Thus, 50 HMMs were created for the 50 writers in the dataset. During testing, the feature sequence is extracted from each query image, and the HMMs generate a log-likelihood score for each writer.
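The sliding-window step can be sketched as a generator of window positions. The 50% overlap comes from the text and the default width of 34 pixels from the best setting reported in the tuning experiments; the function names and the toy feature function in the example are hypothetical stand-ins for the LGH descriptor, and trailing columns that do not fill a whole window are simply dropped here, which is an assumption.

```python
import numpy as np


def window_starts(width, win=34, overlap=0.5):
    """Left edges of full windows of `win` columns with the given overlap ratio."""
    step = max(1, int(round(win * (1.0 - overlap))))    # 50% overlap -> step of win/2
    return list(range(0, width - win + 1, step))


def feature_sequence(line_img, feature_fn, win=34, overlap=0.5):
    """Observation sequence O = (o_1, ..., o_T) for one score-line image."""
    starts = window_starts(line_img.shape[1], win, overlap)
    if not starts:
        raise ValueError("score line is narrower than one window")
    return np.stack([feature_fn(line_img[:, x:x + win]) for x in starts])


# Toy stand-in feature: mean intensity of 4 horizontal strips of the window.
def strip_means(w):
    return np.array([s.mean() for s in np.array_split(w, 4, axis=0)])


img = np.random.default_rng(0).random((60, 100))
print(window_starts(100))                            # [0, 17, 34, 51]
print(feature_sequence(img, strip_means).shape)      # (4, 4)
```

The resulting (T, D) array is what a writer-specific HMM would be trained on or would score.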

We tuned parameters such as the feature dimension and the window width. The feature dimension depends on the number of histogram cells and angular bins. With 16 orientations (T = 16), we obtain a 256-dimensional feature vector from each window position. We also tested other feature dimensions (i.e., 128 and 512). Figures 6(a) and 6(b) present the writer identification accuracy for varying feature dimension and window size, respectively. The highest line-wise identification accuracy, 59.52%, was obtained with a window size of 34 and a feature dimension of 256.

(a)
(b)
Fig. 6. Identification accuracy for different (a) feature dimensions and (b) window sizes. (For a better view of the image, see the soft copy of the PDF version.)

When training the HMMs, parameters such as the number of states and the number of Gaussians govern the model architecture. Fig. 7 presents a detailed analysis of performance as a function of the number of Gaussians and the number of states. Based on these experiments, we chose 64 Gaussians and 4 states for each model.

(a)
(b)
Fig. 7. Writer identification performance against (a) the number of Gaussians and (b) the number of states. (For a better view of the image, see the soft copy of the PDF version.)
Page-Wise Identification Performance: A music page contains multiple score lines. Some of them carry more information for correct writer identification and ease the task, whereas others make it more challenging. To identify the page-level writer, we applied the different weight (decay) functions explained in Section III-B. Table II presents the performance with the different weight functions; the inverted distance function gives the best identification rate. Page-level writer identification improves significantly over line-level identification (see Figs. 6-9), using the same parameter set (i.e., window size, feature dimension, number of states and number of Gaussians) as for the line-level experiments.
Table II: Accuracy of different weight functions for combining line-wise identification scores.
Weight function              Accuracy (%)
Uniform function             77.78
Inverted distance            84.13
Inverted distance squared    83.33
Exponential decay            82.54
Sinusoidal                   82.54
Linear negative slope        80.16

Fig. 8 shows the writer identification results for different numbers of top choices. Here, Top N means that the true writer is among the N best hypotheses. Note that with the top 6 choices, the page-level identification rate reaches 100%, while the line-level rate with the same 6 choices is 85.76%.
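The Top-N measure can be computed as below. This is a sketch with a hypothetical function name, where `scores` holds one row of writer scores per test sample (line or page) and a higher score means a more likely writer.

```python
import numpy as np


def top_n_accuracy(scores, true_writers, n):
    """Percentage of samples whose true writer is among the n highest-scoring writers."""
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores, axis=1, kind="stable")[:, :n]    # n best writers per sample
    hits = [t in row for t, row in zip(true_writers, top)]
    return 100.0 * np.mean(hits)


# Toy example: 3 samples, 4 writers.
S = [[0.9, 0.05, 0.03, 0.02],     # true writer 0: ranked 1st
     [0.2, 0.5, 0.25, 0.05],      # true writer 2: ranked 2nd
     [0.1, 0.2, 0.3, 0.4]]        # true writer 0: ranked 4th
y = [0, 2, 0]
print([round(top_n_accuracy(S, y, n), 2) for n in (1, 2, 3, 4)])
```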

Fig. 8. Writer identification accuracy with different top choices. (For a better view of the image, see the soft copy of the PDF version.)
Error Analysis: Although our methodology offers good overall accuracy, the identification process is still susceptible to errors caused by the presence of staff lines, the small number of music symbols on some pages, regions that carry no useful information, and confusion between some music pages and those of other writers. Fig. 9 shows the similar writing styles of writers 17 and 42, which lead to misrecognition of the music pages of writer 17.


(a)

(b)
Fig. 9. Examples of images (a) from writer 17 and (b) from writer 42, between which we observed high confusion.

To identify the writers that are more prone to misidentification (Fig. 10), we use the following measure:
$$\text{Error}(\%) = 100 \times \frac{E - O}{E}$$
where E (expected accuracy) is the accuracy corresponding to successful identification of all writers (the ideal case) and O (observed accuracy) is the page-wise identification accuracy. Fig. 10 presents the error analysis of the 50 writers in the dataset. Writers 1, 4, 17 and 34 have the highest error percentage, 67%. The overall error percentage is 15.86%.
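As a worked example of the error measure (a sketch; the function name and the example writer with three test pages are hypothetical), a writer for whom one of three pages is identified correctly has O = 33.33% and therefore an error of about 67%.

```python
def error_percent(expected, observed):
    """Error(%) = 100 * (E - O) / E, with E and O given as accuracies in percent."""
    if expected <= 0:
        raise ValueError("expected accuracy E must be positive")
    return 100.0 * (expected - observed) / expected


# Hypothetical writer: 1 of 3 test pages identified correctly.
print(round(error_percent(100.0, 100.0 / 3), 2))   # 66.67
```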



Fig. 10. Error in page wise identification for different writers

Comparison with GMM-based Approach: We compared our HMM-based approach with a Gaussian Mixture Model (GMM) based approach. A GMM [12] is used here to model each music writer: the distribution of the feature vectors extracted from a person's handwriting is modelled by a Gaussian mixture density. For a D-dimensional feature vector x, the mixture density for a specific writer is defined as
$$p(x \mid \lambda) = \sum_{i=1}^{M} w_i\, p_i(x) \tag{1}$$
The density is a weighted linear combination of M unimodal Gaussian densities $p_i(x)$, each parameterized by a $D \times 1$ mean vector $\mu_i$ and a $D \times D$ covariance matrix $C_i$. The parameters of a writer's density model are denoted as $\lambda = \{w_i, \mu_i, C_i\}$, $i = 1, \ldots, M$, where the mixture weights $w_i$ sum to one. We use diagonal covariance matrices [14] in this paper. During decoding, the feature vectors $X = \{x_1, \ldots, x_T\}$ are assumed to be independent, and the log-likelihood of a model for a sequence of feature vectors X is defined as
$$\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(x_t \mid \lambda)$$
where $p(x_t \mid \lambda)$ is computed according to Equation (1).

In the GMM-based experiment, we retained the same parameter setup. First, LGH features are extracted from the music-line images and fed to the GMMs for writer identification. Next, the page-level writers are determined by the previously explained algorithm. We obtained less than 50% accuracy with the GMM-based method.

To measure the scalability of the system, we show in Fig. 11 how identification performance depends on the number of writers. HMM works very well for smaller numbers of writers: for up to 7 writers, the accuracy using HMM was 100%, and the line-level accuracy exceeds 80% with 7 writers. With GMM, performance falls as the number of writers increases, which indicates that HMM is the better choice. The advantage of GMMs over HMMs is their shorter training time; GMMs are less complex, as they consist of only one state and one output distribution function. There is a significant decrease in accuracy for more than 10 writers.



Fig. 11. Scalability of the HMM- and GMM-based writer identification approaches with an increasing number of writers. (For a better view of the image, see the soft copy of the PDF version.)
Comparison with Other Systems: Among writer identification results on the CVC-MUSCIMA dataset, the best result obtained in the ICDAR competition was 77%. Recently, Gordo et al. [16] reported 99.7% using bag-of-notes features extracted from segmented music symbols. All these existing works rely on staff-line removal, which is not always possible for noisy, tampered or torn music pages. Our proposed approach achieved 84.13% accuracy without removing staff lines.

V. CONCLUSION AND FUTURE WORK
We have presented a novel approach for writer identification that does not remove staff lines or perform any pre-processing other than music-line segmentation. The methodology is generic and can be used for other datasets. The results show that the HMM-based approach performs better than the GMM-based one on music score images. For better identification performance, differentiating the writers based on their log-likelihood scores is very important; in this context, the inverted distance function performs better than the other weight functions. In future work, we plan to investigate the effects of different types of noise. Performance may also be improved by combining other features with LGH.

REFERENCES
[1] I. Bruder, T. Ignatova, and L. Milewski, "Integrating knowledge components for writer identification in a digital archive of historical music scores," in Proc. of the Joint ACM/IEEE Conference on Digital Libraries, 2004, pp. 397.
[2] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, 39:1-38, 1977.
[3] H. Melin, J. Koolwaaij, J. Lindberg, and F. Bimbot, "A comparative evaluation of variance flooring techniques in HMM based speaker verification," in Proc. of the 5th Int. Conf. on Spoken Language Processing, 1998, pp. 2379-2382.
[4] A. Fornes, J. Llados, G. Sanchez, and H. Bunke, "Writer identification in old handwritten music scores," in Proc. of the International Workshop on Document Analysis Systems, 2008, pp. 347-353.
[5] A. Fornes, J. Llados, G. Sanchez, and H. Bunke, "On the use of textural features for writer identification in old handwritten music scores," in Proc. of the International Conference on Document Analysis and Recognition, 2009, pp. 996-1000.
[6] A. Gordo, A. Fornes, E. Valveny, and J. Llados, "A bag of notes approach to writer identification in old handwritten musical scores," in Proc. of the International Workshop on Document Analysis Systems, 2010, pp. 247-254.
[7] A. Fornes, J. Llados, G. Sanchez, X. Otazu, and H. Bunke, "A combination of features for symbol-independent writer identification in old music scores," International Journal on Document Analysis and Recognition, vol. 13, 2010, pp. 243-259.
[8] A. Fornes, A. Dutta, A. Gordo, and J. Llados, "The ICDAR 2011 music scores competition: staff removal and writer identification," in Proc. of the International Conference on Document Analysis and Recognition, 2011, pp. 1511-1515.
[9] C. Hertel and H. Bunke, "A set of novel features for writer identification," in Audio- and Video-Based Biometric Person Authentication (AVBPA), 2003, pp. 679-687.
[10] S. Al-Maadeed, E. Mohammed, and D. Al Kassis, "Writer identification using edge-based directional probability distribution features for Arabic words," in International Conference on Computer Systems and Applications (AICCSA), 2008, pp. 582-590.
[11] S. Al-Maadeed, A.-A. Al-Kurbi, A. Al-Muslih, R. Al-Qahtani, and H. Al Kubisi, "Writer identification of Arabic handwriting documents using grapheme features," in Int'l Conf. on Computer Systems and Applications (AICCSA), 2008, pp. 923-924.
[12] A. Schlapbach and H. Bunke, "Off-line writer identification using Gaussian mixture models," in Proc. of the International Conference on Pattern Recognition.
[13] J. R. Serrano and F. Perronnin, "Handwritten word-spotting using hidden Markov models and universal vocabularies," in Proc. of the International Conference on Pattern Recognition.
[14] H. Melin, J. Koolwaaij, J. Lindberg, and F. Bimbot, "A comparative evaluation of variance flooring techniques in HMM based speaker verification," in Proc. of the 5th Int. Conf. on Spoken Language Processing, 1998, pp. 2379-2382.
[15] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, 10:19-41, 2000.
[16] A. Gordo, A. Fornes, and E. Valveny, "Writer identification in handwritten musical scores with bag of notes," Pattern Recognition, 46, 2013, pp. 1337-1346.
