Potential Project Description for 2011-12


Title: Statistical exploration of stylistic variation in the original and translations of the novels of O.E. Rolvaag
Domain Expert: Solveig Zempel (Norwegian)

Much research in stylometry, the use of statistics in the analysis of literary style, has been devoted to the identification or characterization of authorship. This study will use many of the techniques of stylometry to explore stylistic characteristics in the novels of O.E. Rolvaag. It will first determine the stylistic characteristics of the novels in their original Norwegian, then look for significant differences between the earlier and the later novels, and finally turn to the English translations to see how the style in English differs from the original and what, if any, stylistic differences emerge that might be due to the different translators of each of the novels.

There are a number of steps to this project.

1. Create the corpus. Scan the Norwegian novels and the English translations to create a corpus of digitized texts.
2. Determine the most appropriate encoding scheme and encode (annotate) the texts. I am hopeful that some, if not most, of this can be done automatically for both the Norwegian and the English texts. Searching for appropriate software is part of the project.
3. Use the annotated texts for both quantitative and qualitative comparisons. This includes, for example, the use of concordances, keyword-in-context (KWIC) listings, word frequencies, and collocations, along with a variety of statistical techniques to describe and explore the stylistic characteristics of each individual text and to compare the stylistic characteristics of the various texts.
4. Write up the results.

For some examples of this type of analysis, see:

D.L. Hoover: Multivariate analysis and the study of style variation. http://llc.oxfordjournals.org/content/18/4/341.short This paper investigates style variation . . . using multivariate analysis, specifically, cluster analysis of the frequencies of frequent words.

D.L. Hoover and Shervin Hess: An exercise in non-ideal authorship attribution: the mysterious Maria Ward. http://llc.oxfordjournals.org/content/24/4/467.abstract?sid=229e7a60-f6f5-420c-b549-ce49b1147903 Investigates three texts, along with similar texts by other authors, using cluster analysis, Delta analysis, t-testing, and PCA.

J.F. Burrows: Modal verbs and moral principles: an aspect of Jane Austen's style. http://llc.oxfordjournals.org/content/1/1/9.short Differing frequency-patterns in the modal auxiliary verbs show statistically significant differentiations among Jane Austen's characters, between dialogue and narrative, and between different modes of narrative.

J. Rybicki: Burrowing into Translation: Character Idiolects in Henryk Sienkiewicz's Trilogy and its Two English Translations. http://llc.oxfordjournals.org/content/21/1/91.abstract The method used was Burrows's technique of multivariate analysis of correlation matrices of relative frequencies of the most frequent words in the dialogue.

Many other articles of this kind appear in journals such as Literary and Linguistic Computing, Computers and the Humanities, and Style.
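
As a rough illustration of the word-frequency and KWIC work described in step 3, the following Python sketch computes relative frequencies of the most frequent words and a simple keyword-in-context listing. It is only a minimal sketch under stated assumptions: the function names are hypothetical, the two sample sentences are invented placeholders rather than text from the novels, and the tokenizer is deliberately naive. The actual project would use whatever software and encoding scheme are chosen in step 2.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    The pattern matches runs of Unicode letters, so Norwegian
    characters such as ae-ligature, o-slash and a-ring are kept.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def most_frequent_words(token_lists, n):
    """Return the n most frequent words across all the given texts."""
    combined = Counter()
    for tokens in token_lists:
        combined.update(tokens)
    return [word for word, _ in combined.most_common(n)]


def relative_frequencies(tokens, vocabulary):
    """Return each vocabulary word's share of the text's tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: counts[word] / total for word in vocabulary}


def kwic(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` words on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines


# Invented placeholder sentences (assumption: NOT taken from the novels).
norwegian = tokenize("Og det var natt, og det var dag.")
english = tokenize("And it was night, and it was day.")

# Frequency profile of each text over its own most frequent words.
for label, tokens in [("Norwegian", norwegian), ("English", english)]:
    vocabulary = most_frequent_words([tokens], 3)
    print(label, relative_frequencies(tokens, vocabulary))

# Concordance lines for one keyword.
for line in kwic(norwegian, "natt", width=2):
    print(line)
```

Frequency profiles of this kind, computed for each novel and each translation, are the sort of input that the cluster analysis and PCA in the articles above work from.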