CHWP A.5 | Bear, "'The Lady of May' and the Rhetoric of Electronic Text"
Among the many texts appearing online are those that first appeared in print and have held enough attraction for generations of readers to become what are called classics. As the Internet grows, questions arise as to how best to represent classic texts online. Some, of whom Michael Hart of Project Gutenberg is the best known (Neuman 1991: 365), have advocated using "pure vanilla" ASCII, the lowest common denominator of text encoding, so that no one need be left behind while those with more buying power move on to more expensive and complex technology. Others feel that no effort should be spared in marking up texts for research, an approach that may require more sophisticated technology to use but offers the best chance of producing new knowledge. In the forefront of this latter movement are the proponents of TEI, the Text Encoding Initiative (Neuman 1991: 367). TEI is an application of the Standard Generalized Markup Language (SGML), a markup scheme that permits in-depth analysis of the parts of a text. SGML is currently the best approach for providing scholarly electronic editions to those few scholars doing computer-based analyses of texts, but may be "overkill" for the production of popular editions. Fortunately, there is a middle way between these extremes, offering much of the simplicity of ASCII with a glimpse of the power of SGML: HTML (HyperText Markup Language). HTML is an application of SGML designed for delivering hyperlinked documents, graphics, sound files, and motion pictures over a network such as the Internet. Users who have never otherwise attempted computer programming have discovered the ease of working in HTML. There are now more than fifty million Web-accessible documents (HotBot 1996), of which over five thousand are classic texts traditionally presented in codex book form, ranging from Homer's Odyssey to James Joyce's Ulysses (Ockerbloom 1996).
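The position of HTML between plain ASCII and richer markup can be illustrated with a minimal sketch in Python. The sample line, its markup, and the names `TextOnly` and `strip_markup` are assumptions chosen for illustration; they are not drawn from any edition discussed here. The sketch shows that a lightly marked-up passage can be reduced to plain ASCII simply by discarding its tags, so an HTML edition need not exclude readers limited to the lowest common denominator.

```python
from html.parser import HTMLParser

# Hypothetical sample: one line of Hamlet with light HTML markup.
HTML_LINE = "<p><i>Hamlet.</i> To be, or not to be, that is the question:</p>"


class TextOnly(HTMLParser):
    """Collect character data and ignore every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)


def strip_markup(html: str) -> str:
    """Return the text of an HTML fragment with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


plain = strip_markup(HTML_LINE)
print(plain)
```

The reverse operation is not possible: once the tags are gone, the distinction between the speaker's name and the speech is lost, which is the kind of information that markup exists to preserve.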
With software of a new type called a web browser, one can now consult a rapidly expanding library of texts in ways not previously possible. A text-only browser such as Lynx, when combined with speech software, can read online text to a visually impaired user. Browsers have search capability, so that each instance of a given word or phrase in a given work may be located and studied in context; every online edition is thus also a concordance. Selected portions of longer works in the public domain, such as an act of Hamlet or a chapter of Lord Jim, can easily be downloaded, reformatted, printed out, and used in class packets. Thus, although text read from a monitor is not as legible as text read from paper, electronic text is useful enough to drive a movement to provide such access. As noted above, however, there is disagreement on how to go about this.
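The sense in which a searchable edition doubles as a concordance can be sketched as a simple keyword-in-context listing. The following Python is a minimal illustration only; the function name `kwic`, the context width, and the sample passage are assumptions, and a browser's own search feature works differently in detail.

```python
import re


def kwic(text: str, keyword: str, width: int = 30):
    """Return each whole-word occurrence of keyword with surrounding context."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters on either side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


# Hypothetical sample passage (opening of Hamlet's soliloquy).
SAMPLE = (
    "To be, or not to be, that is the question: "
    "Whether 'tis nobler in the mind to suffer "
    "The slings and arrows of outrageous fortune"
)

for line in kwic(SAMPLE, "to", width=20):
    print(line)
```

Each output line places one occurrence of the search word in brackets with its neighbouring text on either side, which is the traditional layout of a printed concordance entry.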
At Cornell University, the Library of Congress, and elsewhere (McClung 1997), experiments are going forward in presenting the original pages of classic print editions in electronic facsimile, much as has been done through microfilm technology. Such images may contain visual information, such as marginalia in the handwriting of previous owners of the scanned print copy, which cannot without significant effort be represented in TEI or HTML. Page images, however, have their own drawbacks. A single page scan takes from ten to a hundred times as much storage as the same page stored as text, and is accordingly slow to transport over a network. Also, an image does not easily support text searches, though dual editions are in production that will do so. For the time being, then, for networked access, SGML offers the best choice for a scholarly edition, ASCII is still suitable for the widest possible dissemination, and HTML, with its increasingly diverse options for presentation design, offers a solution for attractively formatted teaching editions.
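The practical effect of the ten-to-hundred-fold difference in size can be estimated with a short calculation. The figures below are assumptions for illustration only and do not come from the experiments cited: a page of plain text is taken to be 2,000 bytes, and the network link is taken to be a 28,800 bit-per-second dial-up connection, with protocol overhead ignored.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


# Assumed figures for illustration only (not from the article):
TEXT_PAGE_BYTES = 2_000   # one page stored as plain text
LINK_BPS = 28_800         # a dial-up modem link

# 1x is the text itself; 10x and 100x span the range given for page scans.
for factor in (1, 10, 100):
    seconds = transfer_seconds(TEXT_PAGE_BYTES * factor, LINK_BPS)
    print(f"{factor:>3}x text size: {seconds:6.1f} s")
```

Under these assumptions a page of text arrives in well under a second, while a scan at the upper end of the range takes most of a minute, which suggests why facsimile editions are harder to browse over a slow connection.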