Imaginary Text

Text Processing Techniques & Traditions

JMax · February 14, 2024 · DH, history, typography
[Image: The Lindisfarne Gospels – the Lindisfarne Matthew. It says ‘Liber,’ but it’s more than that: a good starting point for this course.]

This coming June, at the Digital Humanities Summer Institute (DHSI), I’ll be teaching Text Processing Techniques and Traditions. I taught this course at DHSI from 2015 to 2019, until the pandemic interrupted things, and I have been meaning ever since to rewrite the course and offer an updated, reconsidered version of it – I’m pleased to finally be doing so this year!

My approach in tpt&t is to zoom in on the continuities between print culture and modern computing, rather than seeing these as distinct lineages separated by a conceptual break. Text processing, which I argue begins with the advent of print in the 15th century, is the lens for this treatment, and typesetting in particular is a thread that runs through the whole history. Typesetting, which has aesthetic roots in scribal practice going back almost to classical times, runs straight into the history of computing – from the early 1970s right up to today – and underpins a whole bunch of the ways in which we conceive of text in various media: what text is, how we create and develop it, how we manipulate it, and how we receive it. It is a useful figure for talking about computing cultures in a humanist context. The things humanists have been doing with text since the Renaissance (or earlier) are not so different from what humanists do with text today, as they tag it, mine it, build models on it, and, of course, prepare it for circulation.

A theoretical motif in this course, which I attempted to articulate in a book chapter bearing the same name as my DHSI course,[1] is the textual imaginary: the idea that in order to do anything meaningful with text – in the sense of processing, converting, mining, presenting – we need a mental model of the text and its structure. But there are many ways of conceiving of the structure of a text. ‘Lines of text set on standard-sized pages’ is a model that has lasted from scribal culture right through to desktop publishing. ‘Texts as sequences of characters,’ delimited by logical ‘lines,’ underpins the Unix paradigm in computing and is bedrock to our idea of texts as files. XML (and SGML before it) posits that texts are “ordered hierarchies of content objects,” the so-called OHCO thesis.[2] All of these things are – or at least can be – true simultaneously, and I make the case that fluency with text processing is the ability to see these different facets of textuality at the same time, or at least to switch between them as needed. Armed with some good software tools, we can do this easily; as a practical outcome, that’s what my course is about.
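To make those facets concrete, here’s a minimal sketch (in Python, purely illustrative – the sample text and markup are my own invention, not anything from the course) of one short text read through three of these models: as a flat character sequence, as newline-delimited Unix-style lines, and as an OHCO-style XML hierarchy.

```python
# One short text, viewed through three different textual imaginaries.
# Illustrative sketch only; the sample text and markup are invented.
import xml.etree.ElementTree as ET

raw = "Liber generationis\nIesu Christi"

# 1. Text as a sequence of characters (the file-on-disk view).
chars = list(raw)
print(len(chars), "characters")       # 31 characters, newline included

# 2. Text as logical lines (the Unix pipeline view).
lines = raw.split("\n")
print(len(lines), "lines:", lines)    # 2 lines

# 3. Text as an ordered hierarchy of content objects (the OHCO/XML view).
doc = ET.fromstring(
    "<page><line>Liber generationis</line>"
    "<line>Iesu Christi</line></page>"
)
for line in doc.findall("line"):
    print(line.tag, "->", line.text)
```

The point of the sketch is that nothing about the bytes changes between views; only the mental model we bring to them does.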

On a more abstract level, though, it’s about cultivating an appreciation of our inherited (or ‘received’) conceptual models about what text is and thus what we might be able to do with it. Over the decades, layers and layers of tools have built up around a small set of fundamentals, and as time goes on, we can wind up focusing on the instrumental use of top-level tools, taking the more fundamental layers of these ideas as given or unchangeable. And that’s where I hope my telling the histories of these ideas and practices – going back to medieval scribal culture, through the print age, and directly into computing cultures – helps provide some handles on what’s going on inside the software.

See the DHSI website for more info on courses, registration, and whatnot.


  1. Maxwell, J. W. (2020) “Text Processing Techniques & Traditions: Why the History of Computing Matters to DH.” In Doing Digital Humanities 2: A Companion Volume. ed. by Richard J. Lane, Raymond Siemens, and Constance Crompton. Routledge. ↩︎

  2. See DeRose, Steven J., David G. Durand, Elli Mylonas, and Allen H. Renear (1990) “What Is Text, Really?” Journal of Computing in Higher Education 1.2: 3–26. https://link.springer.com/article/10.1007/BF02941632 Reprinted (1997) in ACM SIGDOC Asterisk Journal of Computer Documentation 21.3: 1–24. ↩︎