A Thematic Categorization of Thomas Hardy's Prose Fiction
An exploratory multivariate analysis approach
Abstract
From the beginning of the twentieth century to the present, critical debate about Hardy's prose fiction has been shaped by questions of generic categorization and thematic classification. Almost all work on the thematic classification of Thomas Hardy's prose writings, however, is theoretically driven. That is, the critic selects the classification criteria on the basis of some critical theory or framework (e.g. formal, biographical/historical, moral, Victorian, anti-Victorian, feminist, psychoanalytic, postcolonial, philosophical/religious, sociological/anthropological), supported by personal knowledge and evaluation of the texts. Moreover, many existing accounts follow the stereotyped classifications of what is called the Hardy Critical Industry. By this I mean that many of Hardy's critics are willing to agree with conventional, well-known criticisms of Hardy regardless of any critical presuppositions about what they are supposed to agree or disagree with. So, in spite of the great number of thematic reviews of Hardy's prose work, there is neither consensus among his commentators nor an objective study that adopts reliable empirical methods. Given the limitations of these methods, the research question in this article is: can an objective and conceptually useful classification, based on empirical evidence abstracted from Thomas Hardy's prose fiction texts, be found? To address this question, the study proposes vector space classification (VSC) for classifying the novels and short stories of Thomas Hardy thematically on the basis of lexical frequency representations of those texts. VSC is implemented by applying exploratory multivariate analysis techniques to perform a document ranking in which cluster information is used within a graph-based framework.
The rationale for adopting multivariate analysis is that the proposed classification is concerned with grouping texts with identical or similar themes into distinct sets, which makes the analysis a multivariate data problem in the first place. The core objective of this exploratory research is therefore to make some preliminary progress towards developing a thematic classification that addresses the limitations of traditional philological methods.
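To make the idea of a lexical frequency representation concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes each text is reduced to a vector of relative word frequencies over a shared vocabulary and that texts are compared by cosine distance; the tokenizer, the function names, and the choice of cosine distance are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_matrix(texts):
    """Return (vocabulary, rows), where each row holds the relative word
    frequencies of one text over the vocabulary shared by all texts."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*counts))
    rows = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            raise ValueError("every text must contain at least one word token")
        rows.append([c[w] / total for w in vocab])
    return vocab, rows


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical profiles, 1 for no shared words."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical placeholder strings, not quotations from Hardy.
toy_texts = [
    "heath storm fate heath village",
    "village fate storm heath heath",
    "city trade letter city money",
]
vocab, rows = frequency_matrix(toy_texts)
distances = [[cosine_distance(u, v) for v in rows] for u in rows]
```

In this representation word order is discarded, so two texts with the same vocabulary profile are at distance zero regardless of how the words are arranged.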
Our approach to this objective can be outlined as follows. The study takes the form of a case study, with an in-depth analysis of multivariate statistical techniques, particularly cluster analysis and principal components analysis, and of their feasibility in generating an empirical thematic classification of Hardy's prose fiction. Cluster analysis performs the classification itself. It is a multivariate statistical technique for finding relatively homogeneous clusters of cases on the basis of proximity measures, and it encompasses a number of methods, including hierarchical cluster analysis, whose purpose is to sort objects into distinct groups such that members of one group are similar to each other and distant from members of the other groups. Hierarchical cluster analysis is first used to measure the semantic relatedness of the selected texts of Thomas Hardy, with the purpose of generating an automated, objective classification of these works. The result is an illustrated set of analyses that show how the texts are related to each other thematically. To validate the results, Principal Components Analysis (PCA) is used to reformulate the lexical frequency data into a reduced set of uncorrelated variables, which is then reanalyzed to determine whether the cluster trees based on the two matrices are morphologically equivalent. Taken together, cluster analysis and PCA provide an integrated framework for the thematic classification of literary texts. The results are that the 62 texts involved in the study, which represent all the novels and short stories of Hardy, fall into three clearly defined thematic groups, and that these groups correlate to some extent with bibliographical, textual, and critical findings associated with the texts.
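The hierarchical clustering step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the study's implementation: it uses average linkage and Euclidean distance (the abstract does not say which linkage or proximity measure was used), it stops when a requested number of groups remains, and the toy vectors are hypothetical stand-ins for frequency profiles. The PCA validation step is not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerative_clusters(points, k, distance=euclidean):
    """Average-linkage agglomerative clustering.

    Starts with every point in its own cluster and repeatedly merges the
    two clusters with the smallest mean pairwise distance until k remain.
    Returns a list of clusters, each a sorted list of point indices.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Pairwise distances are computed once and reused for every merge.
    d = [[distance(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs (average linkage).
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if best is None or score < best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    return [sorted(c) for c in clusters]


# Hypothetical two-dimensional profiles standing in for text vectors.
toy_vectors = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
groups = agglomerative_clusters(toy_vectors, k=3)
```

Recording the order of merges, instead of stopping at a fixed number of groups, would give the full cluster tree that the validation step compares across the two data matrices.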
It can be concluded that computational methods such as cluster analysis can usefully supplement philological methods in the thematic classification of Thomas Hardy's novels and short stories, and can do so in objective, replicable ways.