cpetter@uvic.ca | About the Authors (Petter, Roberts, Rose)
CHWP A.35, publ. August 2005. © Editors of CHWP 2005. [Jointly published with TEXT Technology, forthcoming, McMaster University.]
KEYWORDS / MOTS-CLÉS: Electronic text, TEI corpus, XML database, eXist, XPath, JavaScript, Laura Riding, expatriate English writers, Spanish Civil War. / Texte électronique, corpus TEI, base de données XML, eXist, XPath, JavaScript, Laura Riding, écrivains anglais expatriés, la guerre civile espagnole.
Sections:
1. Academic Development
2. Diary Content (C. Petter & L. Roberts)
3. TEI Development (C. Petter et al.)
4. Mark-up (L. Roberts)
5. Technical Implementation: Web Development (S. Rose)
5.1. Web Prototype
5.2. XSL Templates and XHTML
5.3. XML Indexing System
5.4. eXist XML Database
Works Cited
Robert Graves' 1935-39 diary is part of the prized Graves collection in the U Victoria Libraries' Special Collections (see the inventory of the collection online). The diary has been much used by biographers and scholars, but until recently it has remained inaccessible to a wider readership. The Robert Graves Trust in Oxford owns the copyright to the diary, and in 2001 it agreed to allow the U Victoria Libraries to publish the diary as both an electronic and a print edition. Beryl Graves, Robert’s widow, transcribed the diary, and a copy of her transcription was sent to the University of Victoria. In addition, the Trust has allowed U Victoria to scan Karl Goldschmidt’s annotated version of the transcript. (Goldschmidt was Graves' longtime secretary.) Generously, William Graves, Robert's eldest son by Beryl, also offered to contribute notes that he kept on the Deyá portions of the diary.
Robert Graves (1895-1985) is a major twentieth-century English poet, novelist and essayist. After surviving the First World War and subsequent shell shock, he studied at Oxford and began to publish poetry. He married Nancy Nicholson in 1918 and they had four children. In 1926 he met Laura Riding, the American poet, whose poetry he had admired from afar. She became a dominant influence in his life and work. In the late twenties they began an intense working relationship which lasted for over ten years. They founded the Seizin Press together, and in 1929 they moved to Deyá, Majorca. The books that made Graves famous--Goodbye to All That, I, Claudius and Claudius the God--were written in this period. So were many poems, essays and prose pieces, which, notably in the Epilogue series edited by Riding, document their theories concerning writing and the role of poetry, as well as their life together and that of the little coterie of writers and artists they gathered around them. The diary, in its detached recording of day-to-day activities, contains evidence of the disciplined nature of their partnership. Graves regularly submitted his work to Riding’s strict editorial scrutiny: if it did not pass, he would patiently revise it until it was approved. Graves also critiqued Riding’s work, which suggests that their collaboration at this time was mutually beneficial. This was a productive period for both writers.
The diary begins in Deyá in 1935. The Graves/Riding relationship has started to deteriorate, but there are few overt signs of this tension in Graves’ diary entries, and they continue their writing collaborations. They move to England in 1936 with the advent of the Spanish Civil War, to France in 1938, and finally in 1939 to the U.S., where both their partnership and the diary end. Graves remarried after the war and moved back to Majorca, where he settled with Beryl Graves and his second family of four children. Robert Graves died in Deyá, Majorca in 1985, and his wife Beryl remained there until her death in 2003.
The diary project team consists of Elizabeth Grove-White (U Victoria English Department), who will be responsible for the introductory material; Chris Petter (U Victoria Library), project manager of the text encoding and annotation; Linda Roberts, M.A., who is doing the mark-up of the diary (so far Feb.-Dec. 1935); and Spencer Rose, Humanities Computing and Media Centre, who has been transforming the marked-up text for display on the web (see below). Elizabeth and Chris have applied for an SSHRC grant to complete the mark-up of the diary in 2004-2005. They hope to publish it in time for the 2006 International Graves Conference.
Work began in 2002 when Chris Petter was granted a six-month study leave from Special Collections. The diary was digitized and an index was created which linked each image file title to the date of the entry. Chris traveled to the University of New Brunswick Text Centre, where Lisa Charlong spent the best part of a week tutoring him on the TEI (Text Encoding Initiative) and various XML tools that could be used to mark up the diary. Lisa had also arranged for a programmer to create day divisions in the monthly text files. Next, Chris traveled to Oxford, England on a British Council Grant. Here, with help from Sebastian Rahtz, a TEI corpus DTD (Document Type Definition) was created for the diary using Pizza Chef. Sebastian also contributed style sheets with which to view the XML files together with the digitized text. Chris was then able to copy and paste the monthly text files of the diary into a template formed by the DTD. Using Notetab Light, and trial copies of both XMetaL and XML Spy, the files were converted to XML and then validated. Another copy of the diary transcription, with detailed annotation by Graves’ personal secretary Karl Goldschmidt, was scanned. While in Oxford, Chris also set up Access databases in which the annotations on the names, places and titles mentioned in the diary could be stored. These would be used to store Karl Goldschmidt’s annotations and William Graves’ notes. Much had thus been accomplished during the study leave, but very little of the text had actually been marked up, and only a few diary entries could be viewed with transcription and digitized image simultaneously accessible. At this point references to the image files had to be inserted into the text manually, which was extremely labour-intensive. It was only later that Spencer Rose solved this vexing problem by converting to XML the cross-tab index file which linked the digitized images with the dates. (See charts, below.)
While in London, Chris visited Willard McCarty to seek his editorial advice on the principles that the mark-up should follow. Willard advised Chris to keep the mark-up as simple as possible so as to allow the reader an uncluttered view of the text, while alerting her to Graves’ emendations and erasures and simultaneously allowing access to the manuscript images so that the cruxes and enclosures can be examined. These principles will be discussed further by Linda Roberts.
A guiding principle of the Graves diary mark-up procedure is to approximate the original document as closely as possible, so that the character of Graves’ ‘diary style’ is preserved along with its content. Fortunately, XML (eXtensible Markup Language), with its capacity to convey emendations such as deletions (crossed out) and supralinear additions, allows us to produce an authentic version which reflects to some extent the immediacy of the diary mss. It has been necessary to work constantly with the mss in order to identify and correct any points at which the transcript diverges from the copy text, including paragraphing, spelling and punctuation. Any exceptions will be accounted for in the editorial notes.
The mark-up process involves the following:
- inserting tags for names, places, titles, foreign words, emendations, notes and editorial comments
- replacing special characters (such as accented letters and currency signs) with the required entity codes
- doing research for editorial notes and for the databases when required (many of the diary notes have been provided by Karl Goldschmidt--Graves’ personal secretary--and by William Graves)
- adding names, places and titles to their respective databases as they occur in the diary. These are also added to the Notetab library file, a reference list which includes the codes connecting each item with its database and which enhances the efficiency of the mark-up
- checking the validity of the marked-up text with an XML editor (we are using XML Spy), and making the necessary corrections.
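The tagging and checking steps listed above can be illustrated with a minimal sketch. It is not the project's actual workflow: the entry text, element and attribute choices, and key codes (pe001, pl001, ti001) are invented for illustration and are not taken from the diary or its DTD. The check here only tests well-formedness with Python's standard library; validation against the TEI DTD, as done in XML Spy, would need a validating parser.

```python
import xml.etree.ElementTree as ET

# Invented sample entry (NOT diary text); tags and keys are hypothetical.
SAMPLE_ENTRY = """\
<div type="entry" n="1935-02-22">
  <p>Walked to <name type="place" key="pl001">Sample Village</name> with
  <name type="person" key="pe001">A. Person</name>. Revised
  <title key="ti001">Sample Title</title>
  <del>twice</del><add place="supralinear">three times</add>.
  <foreign lang="es">hasta luego</foreign>
  <note resp="ed">Invented editorial note.</note></p>
</div>
"""


def check_entry(xml_text):
    """Return (is_well_formed, info).

    On success, info maps a reference kind (the element's type attribute,
    or its tag name when no type is given) to the database keys found.
    On failure, info holds the parser's error message.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, {"error": str(err)}

    keys = {}
    for element in root.iter():
        key = element.get("key")
        if key is not None:
            kind = element.get("type", element.tag)
            keys.setdefault(kind, []).append(key)
    return True, keys


ok, keys = check_entry(SAMPLE_ENTRY)
print(ok, keys)
```

Collecting the keys in this way gives a quick list of the names, places and titles in an entry that should have matching records in the reference databases.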
The Graves Diary project is in part a development of an earlier experimental XML transcription project undertaken by Undine Bruckner and Martin Holmes from the Humanities Computing and Media Centre in the summer of 2001. Among other things, this trial project demonstrated the possibility of transforming XML documents into presentable XHTML "on-the-fly" by making use of client-side XSL engines and XML data parsers. The present prototype also makes use of client-side XML processing, but has been expanded to accommodate more complex XML mark-up. These XML processing capabilities have become standard in advanced web browser software. Despite a strong preference for server-side XML parsing among web developers, client-side XML processing has some desirable features: it offloads processing from the server to the client, it allows remote access to XML files, and it is portable (for example, onto a CD). However, unlike server-side XML-to-XHTML transformations, client-side processing depends on the web browser's ability to parse and render XML using XSL stylesheets, which, until recently, had been an unstable feature of standards-compliant browsers.
The web interface design involved two phases. The first was to build a static web display that allowed for easy browsing of the diary text. The second, still unfinished, phase would allow users to run complex search queries against the Graves XML documents. For this prototype, the interface design involved a number of separate components. The static components of the site design include the general web design, and the XHTML layout and styling using XSL rendering and Cascading Style Sheets. The dynamic components use JavaScript for client-side interactivity. These components are brought together to form web-enabled documents.
The Graves Diary XML documents strictly conform to the Text Encoding Initiative guidelines, and therefore use standard tags and attributes to describe the accidental and substantive changes of the text. Although these standardized tags ensure the cross-platform interchange of information, raw XML mark-up is not suitable as a reading interface. Fortunately, XML is easily rendered using the eXtensible Stylesheet Language, or XSL, the so-called "stylesheets of XML." At the core of the interface design was a series of XSL templates used to transform the XML into well-formed, standards-compliant XHTML documents. These templates are designed to "match" specific XML tags and attributes and format their contents. The stylesheet templates were therefore broken down according to the XML document structure, and each template treats the substantive and accidental elements separately. The attention to detail in the XML mark-up is reflected in the XSL-transformed representation, so that the diary's wide range of styling features, all encoded using TEI entities, is reproduced in the transformed XHTML document. The interactive features of the interface, including image scans and spot-of-reference pop-ups, along with other dynamic display elements, were developed using client-side JavaScript. JavaScript has so far been employed only experimentally for user interactivity, a task to which it is well suited; however, JavaScript functions are to a certain degree browser-dependent and therefore not platform-independent.
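The idea of templates that "match" tags and format their contents can be sketched as follows. This is not XSLT and not the project's stylesheets: Python's standard library has no XSL engine, so the sketch imitates template matching with a plain lookup table. The tag-to-XHTML mapping and the CSS class names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical "templates": each matched tag maps to an XHTML wrapper.
TEMPLATES = {
    "p": ("<p>", "</p>"),
    "del": ('<span class="del">', "</span>"),   # deletion, e.g. struck through via CSS
    "add": ('<span class="add">', "</span>"),   # supralinear addition
    "foreign": ("<em>", "</em>"),
}


def render(element):
    """Recursively turn an element into an XHTML string.

    Tags without a template contribute their text but no wrapper,
    roughly like a default rule that passes text content through.
    """
    open_tag, close_tag = TEMPLATES.get(element.tag, ("", ""))
    parts = [open_tag, escape(element.text or "")]
    for child in element:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))  # text following the child
    parts.append(close_tag)
    return "".join(parts)


# Invented sample sentence, not diary text.
source = ('<p>Revised <del>twice</del>'
          '<add place="supralinear">thrice</add> &amp; posted.</p>')
print(render(ET.fromstring(source)))
```

Keeping the mapping in one table mirrors the design described above, in which each kind of element has its own template and the presentation can change without touching the marked-up text.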
The Graves Diary contains numerous letters, poems, photographs, clippings and other enclosures that are components of the transcription. As with each diary entry, each enclosure has a separate digital scan that is indexed in the Graves XML documents. Each entry and enclosure also contains numerous biographical, geographical and bibliographical references that link to an external reference database. Because of this complex cross-indexing of media, reference information and enclosures, an important design issue for this project was choosing an indexing system that could link these components in a coherent display.
For this we created two modular XML index files: one file cross-indexes the collection of digital image scans of the diary (including enclosures) with the main diary files; the other lists reference entries identified with reference locations in the diary text. Both of these external index files originated in other file formats and needed to be transformed into XML documents. The enclosure and digital image scan cross-tab index was first created as a Microsoft Excel spreadsheet and was transformed into an XML file using a specially designed VBScript macro. Similarly, the diary reference entries were transformed into XML via VBScript from their original format as three Microsoft Access databases. These XML files could then be included with the diary mark-up in the XSL templates, and they also made the creation of image and file index displays straightforward.
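A cross-tab-to-XML conversion of this kind might look like the sketch below. The project used a VBScript macro on an Excel spreadsheet. This sketch instead uses Python on CSV text, and its column names (date, image, kind), file names and element names (imageIndex, day, image) are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export of the cross-tab index; rows are invented.
SAMPLE_CSV = """\
date,image,kind
1935-02-22,img_0001.jpg,entry
1935-02-22,img_0002.jpg,enclosure
1935-02-23,img_0003.jpg,entry
"""


def build_image_index(csv_text):
    """Group image rows by date into an <imageIndex> element tree."""
    root = ET.Element("imageIndex")
    days = {}  # date string -> <day> element already created for it
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = days.get(row["date"])
        if day is None:
            day = ET.SubElement(root, "day", {"date": row["date"]})
            days[row["date"]] = day
        ET.SubElement(day, "image", {"file": row["image"], "kind": row["kind"]})
    return root


index = build_image_index(SAMPLE_CSV)
print(ET.tostring(index, encoding="unicode"))
```

With the images grouped by date, a display template can look up every scan for a given entry, enclosures included, without image references being typed into the diary text by hand.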
A future phase of this interface project is to make the transcribed Graves Diary documents searchable online. For this, the Open Source native XML database system eXist (http://exist-db.org/) has shown a promising start, with at least a proof of concept established in a working prototype. The eXist search engine uses an extended XPath query syntax to identify elements in a document. XPath is an established syntax for addressing parts of an XML document, and it is integral to XSL in that it selects the elements of XML documents for stylesheet transformations. eXist's enhanced querying includes basic XPath expressions to search through the nodal structure of the XML document, but it is also capable of full-text keyword searches, queries on the proximity of search terms, and queries that make use of regular expressions. Analyses of nodal relationships (e.g. parent-child relations between elements) are also possible with eXist. For a wide range of XPath expressions, the eXist search engine uses stored index files that reference the structure of the XML document. Information can then be retrieved without accessing the collection's documents directly, which improves the speed and efficiency of information retrieval.
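The two kinds of query described here, structural and textual, can be illustrated with a small sketch. It does not use eXist or its query extensions. It relies on the limited XPath subset in Python's standard library and on an ordinary regular-expression scan. The sample document, element names and keys are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Invented two-entry sample; not diary text.
DOC = ET.fromstring("""\
<text>
  <div type="entry" n="1935-02-22"><p>Letter from
    <name type="person" key="pe001">A. Person</name> about proofs.</p></div>
  <div type="entry" n="1935-02-23"><p>Corrected proofs; walked to
    <name type="place" key="pl001">Sample Village</name>.</p></div>
</text>
""")


def entries_mentioning(doc, key):
    """Structural query: dates of entries containing a name with this key."""
    return [
        div.get("n")
        for div in doc.findall("./div[@type='entry']")
        if div.find(f".//name[@key='{key}']") is not None
    ]


def entries_matching(doc, pattern):
    """Textual query: dates of entries whose full text matches a regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        div.get("n")
        for div in doc.findall("./div[@type='entry']")
        if regex.search("".join(div.itertext()))
    ]


print(entries_mentioning(DOC, "pe001"))
print(entries_matching(DOC, r"\bproofs\b"))
```

Unlike this sketch, which scans every entry on each query, an indexed database such as eXist can answer many structural queries from its stored indexes, as described above.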
Graves, Richard Perceval. Robert Graves: The Years with Laura 1926-1940. London: Weidenfeld and Nicolson, 1990.
Meier, Wolfgang. eXist: An Open Source Native XML Database. Darmstadt University of Technology.