Definition of non-traditional attribution methods
A well-known example of non-traditional attribution is that carried out by Mosteller and
Wallace (1964) on the authorship of the disputed Federalist Papers.
Table 1: Article frequency in a range of newspaper articles and email messages

         News    Email
the      0.074   0.044
a        0.023   0.022
an       0.004   0.004

From a sample of text in the author's collection

Table 2: First and second person pronoun frequency in a range of newspaper
articles and email messages

         News    Email
I        0.00    0.04
you      0.00    0.02
he/she   0.01    0.01

From a sample of text in the author's collection
From the tables above we see that first and second person pronouns are much more
densely distributed in emails than in news articles, where they are virtually
non-existent, while determiners are used more densely in news articles than in
emails. Clearly, the most common function words are not uniformly distributed
across text types.
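The kind of comparison shown in the tables can be sketched in a few lines of code. The two miniature samples below are hypothetical stand-ins for the author's news and email collections, used only to show how such relative frequencies are computed:

```python
from collections import Counter

def relative_frequencies(text, targets):
    """Relative frequency of each target word among all tokens in text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {w: counts[w] / total for w in targets}

# Hypothetical miniature samples standing in for news and email text;
# the real tables were computed from the author's own collection.
news = "the minister said the report was an error in the policy"
email = "I hope you can send me the file when you get a chance"

for label, sample in [("News", news), ("Email", email)]:
    freqs = relative_frequencies(sample, ["the", "a", "an", "i", "you"])
    print(label, {w: round(f, 3) for w, f in freqs.items()})
```

On realistic sample sizes the same calculation reproduces the pattern noted above: determiners dominate in news, personal pronouns in email.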
The three assumptions commonly found in non-traditional studies, therefore, seem to
be open to challenge: that we each have a linguistic fingerprint, that the main obstacle
to finding this ‘fingerprint’ is one of sample size, and that we can find tokens which
are independent of text type or topic.
Authorship markers
Mathematical models
In its early stages authorship attribution was an offshoot of literary analysis: the quest
was to know which particular authors were worth reading, first from a moral and then an
aesthetic perspective. We could only know the answer to this if we could recognise
the style of the 'principal' authors of the time. A parallel imperative was the quest for
authority: on whose authority was such-and-such a principle declared to be true? If we
could not state for certain the author of the principle, then the importance of the
principle was open to devaluation. Later, authorship became a kind of parlour game
for under-occupied intellectuals: who 'wrote' the Bible? Was Shakespeare the author
of the plays and poems attributed to him? More recently, authorship has become the
concern of linguists, particularly those engaged in forensic work. The authorship of
texts involved in criminal investigations needs to be ascertained, frequently as a
matter of some urgency. In the current security-conscious climate, these questions
have become even more important. If the authors of certain types of terrorist
document can be identified, it is asserted, this will assist in the defeat of terrorism
(e.g. Abbasi and Chen, 2005). Given the current political climate, such assertions can
be likened to offering a desert dweller the promise of an everlasting water supply,
and are, in the view of this author, somewhat irresponsible.
These myths need to be explored and, where necessary, exploded. To this end, in this
research programme I will be making the following specific claims:
(i) that authorship attribution can only be understood as an artefact of
authorship, which is itself a construct derived from author, and that this in
turn is a social construct. All this implies the need to understand the
philosophical and historical significance of the notion of author and the
history of authorship methods;
(ii) that authors vary, for identifiable reasons, and that this variation can be
measured;
(iii) that no particular type of authorship marker has automatic superiority over any
other set of authorship markers: almost any set of markers can be used
depending on text type/s, quantity of texts available for study, and the topics in
those texts;
(iv) that the particular statistical method used to quantify error rates is trivial:
almost any appropriate standard statistical method, provided it is properly and
honestly carried out, is useful. There is thus no need to develop new methods
or make existing ones more complex or opaque;
(v) that any authorship attribution must be undertaken with an understanding of,
and preferably training in, linguistics or possibly psychology or some related
field such as anthropology;
(vi) that non-traditional authorship attribution, provided it is undertaken with care
by suitably trained linguists or those working under linguistic supervision
within a linguistic framework, and provided that the necessary precautions are
observed with regard to statements of probability, is not an inherently difficult
task, and can often be accomplished satisfactorily;
(vii) finally, that authors do not have a linguistic fingerprint, though they have
some core features which, however, vary – for a number of reasons, which I
will specify.
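Claim (iv) can be illustrated with a standard off-the-shelf statistic. The sketch below computes a chi-squared statistic of homogeneity over function-word counts from two texts; the counts are invented purely for illustration and are not taken from the author's data:

```python
# Minimal chi-squared test of homogeneity on a contingency table of
# function-word counts. The counts below are invented for illustration.

def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: two texts; columns: counts of 'the', 'a', 'an'.
counts = [[74, 23, 4],
          [44, 22, 4]]
print(round(chi_squared(counts), 2))
```

The point is not this particular statistic: any appropriate standard method, applied honestly, would serve, which is precisely why there is no need to invent new or more opaque ones.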