[CHWP Titles] [CHC 2005]

Monkeying Around with Text

Terry Butler

University of Alberta

Terry.Butler@UAlberta.ca | http://www.arts.ualberta.ca/~tbutler

CHWP C.1, publ. January 2007. © Editors of CHWP 2007.



KEYWORDS / MOTS-CLÉS: text comparison, random text generation, monkeys theorem, Shakespeare, Defoe, Dickens, the Hardy Boys, abstracting, infinite numbers / comparaison de texte, génération aléatoire de texte, théorème du singe, Shakespeare, Defoe, Dickens, Hardy Boys, contraction de texte, nombres infinis


1. Monkeys
2. Difference in the Text
3. Masterpieces of World Literature
4. Robinson Crusoe
5. The Hardy Boys
6. Conclusion
Bibliography and Works Cited


1. Monkeys

If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. (Eddington, 1928, p. 72)

So wrote Arthur Eddington, the gifted British physicist and science popularizer, in his 1927 Gifford Lectures, The Nature of the Physical World. This conceit (which Eddington advanced only as a foil, to show how much less likely still was spontaneous order in a simple physical system) has become a hardy perennial; but Eddington was not its originator, not by a long shot. I will take us on a guided tour of sightings of it, in order to frame a question which is important for textual criticism: when is one text the same as another? What degree of difference is acceptable, and in what contexts?

Eddington may well have met these monkeys through the French scientist Émile Borel, who wrote (Borel, 1913):

... Let us imagine that a million monkeys have been trained to strike the keys of a typewriter at random and that, under the supervision of illiterate foremen, these typist monkeys work eagerly ten hours a day at a million typewriters of various kinds. …

Elsewhere Borel refers to the probability that "thousands of monkeys, randomly typing on typewriters, will reproduce exactly the contents of the National Library" (Borel, 1950, p. 105; translation Borel, 1965, p. 60). This probability is expressed numerically as 10 raised to a negative power of more than one trillion (written out as a decimal, more than a trillion zeros would follow the decimal point): a number which, because it is not zero, implies that the monkeys would eventually succeed given unlimited time, yet whose reciprocal (roughly, the number of attempts they would need) utterly dwarfs any present conception of the size of the universe, or of the length of time it might continue to exist.
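Numbers of this size follow from simple arithmetic. If each keystroke is independent and each of k keys is equally likely, the chance that n keystrokes reproduce one particular n-character text is (1/k)^n, so its power of ten is -n log10 k. The Python sketch below assumes, purely for illustration, a 30-key machine and a collection of one trillion characters; these are not Borel's figures, and the function name is hypothetical.

```python
import math

def log10_chance(text_length: int, keys: int) -> float:
    """Base-10 log of the chance that `text_length` independent, uniformly
    random keystrokes on a `keys`-key machine reproduce one given text:
    log10((1 / keys) ** text_length) = -text_length * log10(keys)."""
    return -text_length * math.log10(keys)

# Illustrative assumptions only: 30 keys, one trillion characters of text.
exponent = log10_chance(10**12, 30)
print(f"probability is about 10^({exponent:.4g})")
```

With these assumptions the exponent comes to about -1.48 trillion, which is why such probabilities are quoted as powers of ten rather than written out.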

This audience will be familiar with post-Eddington adaptations: "The Library of Babel" by Jorge Luis Borges; Isaac Asimov's "The Monkey's Finger" (1953); and "Epicac" by Kurt Vonnegut Jr., in his 1968 short story collection Welcome to the Monkey House.

My first encounter with the underlying idea came when I read "The Universal Library", a story by Kurd Lasswitz anthologized in Clifton Fadiman's Fantasia Mathematica. Written in 1901, the story teases us with the mental construct of a library which contains all possible works (texts of finite length with every possible character in every possible position), then delightfully blows it up by showing that the works would fill a volume far greater than the known universe (Lasswitz, translation 1958):

"What?" said Mrs. Wallhusen. "You say everything will be in that library? The complete works of Goethe? The Bible? the works of all the classical philosophers?" "Yes, and with all the variations of wording that nobody has thought up yet. You'll find the long lost works of Tacitus and their translations into all living and dead languages. … all forgotten and undelivered speeches in all parliaments, the official version of the Universal Declaration of Peace, the history of the subsequent wars …"

With computers, we can translate this vastness of bulk into the intractable time it would take to generate such a series of works. You can watch the monkeys at work (writing Shakespeare, of course) at: http://user.tninet.se/~ecf599g/aardasnails/java/Monkey/webpages/
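A toy version of such a simulator can be sketched in a few lines of Python; it is an illustration, not the code of the site above. It assumes a 27-key keyboard (lowercase letters and space), types a fixed number of random keys, and reports the longest opening stretch of a target line that turns up anywhere in the output. The keyboard and function name are hypothetical.

```python
import random
import string

# Hypothetical 27-key keyboard: lowercase letters plus the space bar.
KEYS = string.ascii_lowercase + " "

def longest_prefix_typed(target: str, keystrokes: int, seed: int = 0) -> int:
    """Type `keystrokes` random keys; return the length of the longest
    opening stretch of `target` found anywhere in what was typed."""
    rng = random.Random(seed)
    typed = "".join(rng.choice(KEYS) for _ in range(keystrokes))
    best = 0
    # If a prefix occurs, every shorter prefix occurs too, so grow it greedily.
    while best < len(target) and target[:best + 1] in typed:
        best += 1
    return best

print(longest_prefix_typed("to be or not to be", 1_000_000))
```

Because a particular string of n characters turns up on average only about once in every 27^n keystrokes, a million keystrokes will typically reproduce only the first four or so characters of the line (27^4 is 531,441; 27^5 is 14,348,907).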

My own initiation into humanities computing came through the delightful Computer Recreations puzzles in Scientific American. Brian Hayes, in "A Progress Report on the Fine Art of Turning Literature into Drivel" (Hayes, 1983), showed us how to build a random text generator and, even more interestingly, how to create random texts inflected by the style of a particular author.
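Generators of this kind commonly rest on an order-k Markov model of a sample text: each new character is drawn from the characters that follow the previous k characters somewhere in the author's writing. The Python sketch below is a minimal illustration of that technique, not Hayes's own program; the function names are hypothetical, and the sample is the opening of A Christmas Carol.

```python
import random
from collections import defaultdict

def build_model(text: str, order: int) -> dict[str, list[str]]:
    """Map every run of `order` characters in `text` to the characters that
    follow it (repeats kept, so common continuations weigh more)."""
    model: dict[str, list[str]] = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(text: str, order: int, length: int, seed: int = 0) -> str:
    """Random text in which every run of order + 1 characters occurs in `text`."""
    if order < 1:
        raise ValueError("order must be at least 1")
    rng = random.Random(seed)
    model = build_model(text, order)
    out = text[:order]                      # start from the sample's opening
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:                   # context seen only at the very end
            break
        out += rng.choice(followers)
    return out

sample = ("Marley was dead: to begin with. There is no doubt whatever about "
          "that. Old Marley was as dead as a door-nail.")
print(generate(sample, 2, 80, seed=1))
```

Low orders drift toward gibberish that merely has the letter frequencies of the sample; higher orders reproduce longer and longer stretches of the source verbatim, which is where the flavour of the author comes from.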

But the monkeys (and their typewriters) are merely the modernist guise under which this famous trope now appears. We can trace it back to the 17th-century English divine Archbishop Tillotson (Tillotson, 1719, p. 10):

[In Answer to the Epicurean System, he argues] "How often might a Man, after he had jumbled a Set of Letters in a Bag, fling them out upon the Ground before they would fall into an exact Poem, yea or so much as make a good Discourse in Prose? And may not a little Book be as easily made by Chance, as this great Volume of the World?"

And the monkeys' precursors were spotted by Lemuel Gulliver in the Grand Academy of Lagado, when he saw the Literary Engine (Gulliver's Travels III:V; Swift, 1960, p. 148):

The Professor then desired me to observe, for he was going to set his Engine at Work. The Pupils at his Command took each of them hold of an Iron Handle, whereof there were fourty fixed round the Edges of the Frame, and giving them a sudden turn, the whole Disposition of the Words was entirely changed. He then commanded six and thirty of the Lads to read the several Lines softly as they appeared upon the Frame; and where they found three or four Words together that might make part of a Sentence, they dictated to the four remaining Boys who were Scribes.

Modern variants of the trope often task the monkeys with producing the works of Shakespeare, who so often acts (as in this case) as a strange attractor for cultural extremes. But Shakespeare himself is not uninvolved, as he mentions the famous random text generator of classical times (Titus Andronicus 4.1.103-108):

Titus: ... And, come, I will go get a leaf of brass,
And with a gad of steel will write these words,
And lay it by: the angry northern wind
Will blow these sands, like Sibyl's leaves, abroad,
And where's your lesson, then? Boy, what say you?

Dante says the same (Paradiso 33:64-66):

Even thus the snow is in the sun unsealed,
Even thus upon the wind in the light leaves
Were the soothsayings of the Sibyl lost.

Harry Thurston Peck (Peck, 1898) explains: "The most famous Sibyl in antiquity was that of Cumae in Campania … whom Virgil represents as being visited by Aeneas".

In Book III of the Aeneid, the seer Helenus tells Aeneas to whom he must go for guidance (Aeneid 3:441-452):

Arriv'd at Cumae, when you view the flood
Of black Avernus, and the sounding wood,
The mad prophetic Sibyl you shall find,
Dark in a cave, and on a rock reclin'd.
She sings the fates, and, in her frantic fits,
The notes and names, inscrib'd, to leafs commits.
What she commits to leafs, in order laid,
Before the cavern's entrance are display'd:
Unmov'd they lie; but, if a blast of wind
Without, or vapors issue from behind,
The leafs are borne aloft in liquid air,
And she resumes no more her museful care,
Nor gathers from the rocks her scatter'd verse,
Nor sets in order what the winds disperse.
Thus, many not succeeding, most upbraid
The madness of the visionary maid,
And with loud curses leave the mystic shade.

2. Difference in the Text

For me, the most interesting aspect of random texts is not producing them (we have the services of that modern monkey, the computer, to do it for us), but dealing with the results. Who is going to read these texts? Or, an even more urgent problem, who is going to proof-read them?

Here is how it might go, courtesy of Bob Newhart (Newhart, 1960):

You know the idea … if you put an infinite number of monkeys, at an infinite number of typewriters, they would type all the great books.  Now, they are going to type a lot of gibberish, too.  So they would have to hire guys to check the monkeys to see if they were turning out anything worthwhile.  … Look, I've got something: "To be or not to be … that is the gezortenblatt …".

In thinking about the deformation of texts, I decided to start with the most extreme case: what is the maximum amount of difference we will tolerate and still grant that two texts are (in some meaningful way) "the same"?

In doing so, I am responding (in part) to suggestions made in Peter Shillingsburg's "Polymorphic, Polysemic, Protean, Reliable, Electronic Texts", where he says "every new embodiment of a literary text is a new, additional, and altered embodiment of it". His thinking about texts takes us away from the naive position that there is a single, ideal, encoded digital representation of any literary work; rather, we can treat the digital text as an opportunity to access and manipulate the text, and thereby learn something about both the digital instantiation and the work that it embodies.

So I started by tackling pairs of texts that claim to be "the same work", but are vastly different in scale. What are the characteristics of a short summary or précis (of a literary text) which make it adequate or acceptable?

The measures of adequacy for non-fiction précis and abstracts are practical. A good précis is one which serves the information needs of the searcher: it either guides the searcher to the longer text when that text meets those needs, or provides the sought-after information itself. These criteria are not (normally) relevant to the reading of literature.

A more useful handle on the problem came from the discussion of equivalence in translation studies: a translation can be said to have reached its goal if it achieves effects equivalent to those of the original. These effects can, of course, include the characteristics which interest us in literature.

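I have time to talk only briefly about three kinds of texts, and to lay them out schematically along two axes. The texts, taken in turn below, are plot summaries (Magill's Masterpieces of World Literature), simplified retellings (of Robinson Crusoe), and a rewritten edition (of a Hardy Boys mystery).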

3. Masterpieces of World Literature

These familiar summaries and commentaries, originally produced by Frank Magill under the title Masterplots, are widely used précis of world classics.

I chose to focus on English-language prose texts, so that the summaries would differ from their originals in length but not also in language and genre.

I will use Dickens's A Christmas Carol as my first test case. How can we relate the summary (at 6,500 characters, only 4% as long) to its original?

[Figure: Dickens]

I am bringing up a display environment which I developed to help us look at, and think about, these cases. The idea is to display the two texts in relation to each other. There are two approaches: paragraph by paragraph, or word by word.

The longer text, A Christmas Carol, is strung out like Christmas lights along the top; the rectangular icons:

  1. suggest the relative length of each paragraph

  2. hint at the text by a mouse hover

  3. and give access to the text, with a click.

Below it is Magill's summary.  The two texts are aligned by a simple edit-distance algorithm: for each paragraph of the second text, which paragraph of the first text is it least unlike?  (A constraint is applied to prevent these really short texts from being strewn across the original text with no reference to order.)  And again, access to the text is provided (paragraph numbers, hover, and click).
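The alignment step is described above only in outline. The Python sketch below shows one way it might be implemented, under assumptions of this example rather than details of the display: paragraphs are compared as lowercase word sequences; because a plain edit distance between a short summary paragraph and a much longer original paragraph mostly measures their difference in length, the sketch swaps in a "fitted" (semi-global) edit distance, in which words before and after the best-matching stretch of the original paragraph cost nothing; and the ordering constraint is enforced by dynamic programming, so that matched paragraph indices never decrease. All names are hypothetical.

```python
import re

# A sketch only: not the code behind the display described above.

def words(paragraph: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", paragraph.lower())

def fitted_distance(short: list[str], long: list[str]) -> int:
    """Word-level edit distance from `short` to its best-matching stretch of
    `long`; words of `long` outside that stretch cost nothing."""
    prev = [0] * (len(long) + 1)                 # free start anywhere in `long`
    for i, w in enumerate(short, start=1):
        cur = [i] + [0] * len(long)
        for j, v in enumerate(long, start=1):
            cur[j] = min(prev[j] + 1,            # drop a word of `short`
                         cur[j - 1] + 1,         # drop a word of `long`
                         prev[j - 1] + (w != v)) # keep or substitute
        prev = cur
    return min(prev)                             # free end anywhere in `long`

def align(summary: list[str], original: list[str]) -> list[int]:
    """For each summary paragraph, the index of the original paragraph it is
    least unlike, keeping the indices in non-decreasing (story) order."""
    if not summary or not original:
        return []
    cost = [[fitted_distance(words(s), words(o)) for o in original]
            for s in summary]
    best, back = [cost[0]], []
    for row_cost in cost[1:]:
        row, ptr = [], []
        run_min, run_arg = float("inf"), 0
        for j, c in enumerate(row_cost):
            if best[-1][j] < run_min:            # cheapest match at or before j
                run_min, run_arg = best[-1][j], j
            row.append(c + run_min)
            ptr.append(run_arg)
        best.append(row)
        back.append(ptr)
    j = min(range(len(original)), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):                   # recover earlier matches
        j = ptr[j]
        path.append(j)
    return path[::-1]

original = [
    "Marley was dead, to begin with.",
    "Scrooge knew he was dead.",
    "The Ghost of Christmas Past appeared.",
    "Tiny Tim did not die.",
]
summary = ["Marley is dead.", "A ghost of Christmas past appears.", "Tiny Tim lives."]
print(align(summary, original))  # [0, 2, 3]
```

The toy paragraphs are loosely modelled on the story and are neither Dickens's nor Magill's text. Summing raw distances, as here, lets long summary paragraphs dominate the total; normalizing each distance by the length of its summary paragraph is one alternative.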

This visualization is very rudimentary; I look forward to working with my colleagues at Alberta who participate in the Experimental Reading Workshop to improve it.

But already we can see interesting processes at work:

  1. the general for the specific: adjectives such as "frightened", "skimpy", "generous" cover one or more specific actions which Dickens presents without editorial comment.

  2. character and action predominate: mood-setting and the varied voices of the narrator and the characters are lost.

This summary tells us what A Christmas Carol is about, but is likely not (in any meaningful way) equivalent in its effects to the original.

4. Robinson Crusoe

With Robinson Crusoe, we have an even more extreme contrast. The Masterplots summary is less than 1% of the total length of the novel; what strategies does it adopt in order to try to provide an adequate précis?

[Figure: Dickens]

If we look at two other representations of this well-known text, we see other results. The "retelling" of the story in words of one syllable (Defoe / Godolphin 1882) provides an interesting starting point. First, the text is again considerably shorter than the original, so the processes of selection and summary are at work.

Second, the constraint of one-syllable words actually forces the author into some awkward and hard-to-follow constructions.
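How far a retelling keeps to such a constraint can be checked roughly by machine. The Python sketch below is an illustration only: it assumes a crude rule of thumb (count groups of vowel letters, treating y as a vowel, and discount a silent final e after a consonant except in "-le"), and the function names are hypothetical. English spelling defeats any rule this simple, so its output is a list of candidates for a human reader to confirm, not a verdict on the text.

```python
import re

def rough_syllables(word: str) -> int:
    """Estimate syllables as groups of vowel letters (y included), less one
    for a silent final 'e' after a consonant (but not in '-le')."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if count > 1 and re.search(r"[^aeiouy]e$", w) and not w.endswith("le"):
        count -= 1
    return max(count, 1)

def polysyllables(text: str) -> list[str]:
    """Words of `text` that the heuristic judges to have several syllables."""
    return [w for w in re.findall(r"[A-Za-z']+", text) if rough_syllables(w) > 1]

print(polysyllables("Robinson Crusoe was a man of York"))  # ['Robinson', 'Crusoe']
```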

[Figure: Crusoe 1]

A modern attempt, a retelling for beginning readers of English, makes for easier reading, but its vocabulary of 600 words again imposes some challenging constraints on the author (Defoe / Taylor 2000).
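A vocabulary limit of this kind is easy to test mechanically, given the word list. The Python sketch below assumes such a list is available as a set of lowercase words; the five-word list used here is made up for illustration and is not the Penguin Readers list, and the function name is hypothetical. A real check would also need to map inflected forms (went, going) onto their headwords, which this sketch does not attempt.

```python
import re

def outside_vocabulary(text: str, allowed: set[str]) -> list[str]:
    """Words of `text` missing from `allowed`, in order of first appearance
    (lowercased; punctuation dropped)."""
    missing: list[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in allowed and word not in missing:
            missing.append(word)
    return missing

# A made-up five-word list standing in for a real graded-reader vocabulary.
allowed = {"the", "ship", "went", "down", "in"}
print(outside_vocabulary("The ship went down in the storm.", allowed))  # ['storm']
```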

(Unfortunately I cannot at present graphically display the original together with all three of these précis.) But you will have noticed that they do in fact cluster around certain episodes in the original, and even around particular words and phrases. In this way, we can see the emergence of "hot spots" in the text: the areas which attract our attention (as being memorable or characteristic of the work). The hot spots also attract the choice of chapter headings, running heads, and illustrations (which are other forms of abstraction and summary, not all of them deriving directly from the author).
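One way such hot spots might be made visible is sketched below in Python, under stated assumptions; it is not part of the display described above. Each paragraph of the original is scored by the number of condensed versions that share at least one exact three-word sequence with it (trigrams are an assumption of the sketch, chosen to avoid counting trivial pairs such as "of the"). The sample sentences are made-up toy paragraphs loosely modelled on Defoe, and the function names are hypothetical. Because exact word overlap misses paraphrase, the scores understate how much attention a version pays to a passage.

```python
import re

def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """The set of n-word sequences in `text` (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(*(words[k:] for k in range(n))))

def hot_spots(original: list[str], versions: list[str], n: int = 3) -> list[int]:
    """For each original paragraph, how many condensed versions share at
    least one n-word sequence with it."""
    version_grams = [ngrams(v, n) for v in versions]
    scores = []
    for para in original:
        grams = ngrams(para, n)
        scores.append(sum(1 for vg in version_grams if grams & vg))
    return scores

original = ["I was born in the year 1632, in the city of York.",
            "I saw the print of a man's naked foot on the shore.",
            "The weather was fair."]
versions = ["He saw the print of a man's foot.",
            "One day I saw the print of a foot on the sand."]
print(hot_spots(original, versions))  # [0, 2, 0]
```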

5. The Hardy Boys

I haven't time for much discussion of my Hardy Boys examples. But calling up two texts with the same title from this series of adventure stories illustrates my other limiting case. Here is a visualization of The House on the Cliff: same title, same premise, but rewritten 60 years after it first appeared.

[Figure: Crusoe 2]

Here, as the paragraph-by-paragraph display makes clear, are two works which are "the same" but are really two different texts. The 1959 rewrite (revised again in 1987) completely changed the story: not just different character names, new figures, and new plot elements, but also a reordering of activities and changed motivations (Dixon 1927, 1959).

Discussion in the popular press has emphasized the supposed "dumbing down" of the original series, written by a Canadian, Leslie McFarlane, under the pen name of Franklin W. Dixon. But a comparison like this can also expose other strategies at work, including both a modern avoidance of certain kinds of cultural stereotypes and a wilful wallowing in other kinds.

These two texts, visualized here as "poles apart" throughout their length, are "the same" text only in special bibliographic and cultural senses.

6. Conclusion

I hope that this approach, and possibly this textual tool, may be useful as an aid to exploring the nature of electronic texts and the evidence they make available to us. They can further our interest in, and deepen our appreciation of, the tractable but also splendidly unpredictable commodity at our disposal when literary works are "re-formed" into electronic texts.

Bibliography and Works Cited

Asimov, I. (1953, Feb). "The Monkey's Finger". Startling Stories. pp. 77-83.

Borel, É. (1965). Elements of the Theory of Probability. Translation of Éléments de la théorie des probabilités, 1950. Translated by John Freund. Prentice Hall.

Borel, É. (1913). "Mécanique Statistique et Irréversibilité". J. Phys. 5e série, vol. 3. pp. 189-196. [Reference given without specific page number on "Parable of the Monkeys" webpage, http://www.angelfire.com/in/hypnosonic/Parable_of_the_Monkeys.html. Accessed: 2005 Dec 13.]

Borges, J. L. (n.d.). "The Library of Babel". English translation. Available online at: http://jubal.westnet.com/hyperdiscordia/library_of_babel.html. Accessed: 2005 Mar 28.

Dante. (1939). Paradiso. Translated by John D. Sinclair. Oxford University Press.  

Defoe, D. (1972). Robinson Crusoe. Oxford World Classics. Oxford University Press.

Defoe, D. (2000). Robinson Crusoe. Retold by Nancy Taylor. Penguin Readers Level 2. Pearson Educational. 

Defoe, D. (1882). Robinson Crusoe in words of one syllable. Retold by Mary Godolphin [pseudonym]. McLoughlin Bros.

Dickens, C. (1843). A Christmas Carol. Chapman & Hall.

Dixon, F. W. [pseudonym]. (1987). The House on the Cliff. Grosset & Dunlap.

Eddington, A. (1928). The Nature of the Physical World. The Gifford Lectures for 1927. Cambridge University Press.

Hayes, B. (1983). "Computer Recreations: A Progress Report on the Fine Art of Turning Literature into Drivel." Scientific American 249.3. pp 18-28.

Infinite Monkey Theorem. (n.d.) Available on-line at: http://www.absoluteastronomy.com/encyclopedia/I/In/Infinite_monkey_theorem.htm

Lasswitz, K. (1958). "The Universal Library." (Ley, Willy, translator) in Fantasia Mathematica; Being a Set of Stories, Together With a Group of Oddments and Diversions, All Drawn From the Universe of Mathematics. Clifton Fadiman, editor and compiler. Simon and Schuster. pp 237-43.

Magill, F. N. (1952). Masterpieces of World Literature in Digest Form. [Originally published as Masterplots.] Harper & Brothers.

The Monkey Shakespeare Simulator. (n.d.) Available on-line at: user.tninet.se/~ecf599g/aardasnails/java/Monkey/webpages/

Newhart, B. (1960). "An Infinite Number of Monkeys."  The Button Down Mind Strikes Back. Bob Newhart Vol. 2. WEP 6116. Warner Bros.

Peck, H. T. (1898). "Sibyl". Harper's Dictionary of Classical Literature and Antiquities.

Poundstone, W. (1985).   The Recursive Universe. William Morrow & Co.

Shakespeare, W. (1936). Titus Andronicus. In The Complete Works of William Shakespeare. Garden City Books.

Shillingsburg, P. L. (1993). "Polymorphic, Polysemic, Protean, Reliable, Electronic Texts." Palimpsest: Editorial Theory in the Humanities. Editors George Bornstein and Ralph G. Williams.  University of Michigan Press.

Swift, J. (1960). Gulliver's Travels. In Gulliver's Travels and Other Writings. Edited by Louis A. Landa. Houghton Mifflin.

Tillotson. (1719). Maxims and Discourses Moral and Devine: Taken from the Works of Arch-Bishop Tillotson, and Methodiz'd and Connected, College Pamphlets 5, London. pp 10-11.

Virgil. (1909). Aeneid. Translated by John Dryden. Harvard Classics. P. F. Collier & Son.

Vonnegut, K., Jr. (1968). "Epicac" in Welcome to the Monkey House. Delacorte. pp. 268-75.