Another fantastic day at DHOXSS 2018. This morning’s session with Elizabeth Wickes stressed the importance of minimum viable curation: the idea of accessing data within the constraints of a real-world situation. The scenario related very well to my line of work: a supervisor presents a resource, from which the research student needs to examine and extract relevant data. In this case the resource was a far from ideal static webpage. Digging into the page’s source code in Chrome, we identified encoded data within the HTML relating to a collection of letters. This led us to use a Base64 decoder to produce a more human-readable text, which could then be rendered into HTML, with some work on sanitization, so that the page displayed in fully human-readable form. All in all, it demonstrates how, with a bit of digging, humanities researchers can extract quite a lot from what appears on the surface to be very little.
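For anyone wanting to retrace those steps outside the browser, here is a minimal Python sketch of the general idea. It is not the code from the session: the sample letter, the `data-payload` attribute name and the function names are all my own inventions for illustration, and a real page would almost certainly store its encoded data differently.

```python
import base64
import html
import re

# Hypothetical sample data: the letter text and the "data-payload"
# attribute are invented for illustration, not taken from the workshop page.
SAMPLE_LETTER = "Dear Sir, <b>I write</b> regarding the manuscript & its provenance."
ENCODED = base64.b64encode(SAMPLE_LETTER.encode("utf-8")).decode("ascii")
PAGE_SOURCE = f'<div class="letter" data-payload="{ENCODED}"></div>'


def extract_payloads(page_source):
    """Find Base64 strings stored in the (assumed) data-payload attribute."""
    return re.findall(r'data-payload="([A-Za-z0-9+/=]+)"', page_source)


def decode_payload(payload):
    """Turn one Base64 string back into readable text."""
    return base64.b64decode(payload).decode("utf-8")


def render_safely(text):
    """Escape HTML special characters so the text displays literally."""
    return "<p>" + html.escape(text) + "</p>"


letters = [decode_payload(p) for p in extract_payloads(PAGE_SOURCE)]
rendered = [render_safely(t) for t in letters]

for paragraph in rendered:
    print(paragraph)
```

Escaping everything is the bluntest form of sanitization; it keeps the decoded text safe to display, at the cost of showing any markup inside the letters as literal characters rather than formatting.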

After the break, we had a session with Neil Jefferies of the Bodleian Libraries. We looked at contextual data modelling, which entailed first of all examining knowledge models: how we go about making sense of the world and accommodating uncertainty (something that the humanities do quite a lot, but computer science does not do so well). We reviewed different data models, some of which we saw yesterday, including tabulation, trees and graphs. One of our key tasks was to try to identify the explicit and implicit entities mentioned in a very simple statement relating to a performance in another language of Shakespeare’s Romeo and Juliet. Easy, right? Not so much. Depending on our research aims, we might want to examine more or less information about the performance, the text, the author, and the longer provenance of the work.

A second task looked at a particular object, in this case a manuscript, and how we might move through different levels of certainty in the knowledge model. This often leads to the more static attributes of the object being fewer in number and the more dynamic attributes more numerous, yet the latter are often the things that researchers in the humanities seize on. This is why linked data approaches are increasingly important for researchers: they help to encode the relations between subjects and objects in a multi-dimensional way.
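To make the subject and object idea a little more concrete, here is a rough Python sketch of statements stored as subject, predicate, object triples, each tagged with a level of certainty. Everything in it is hypothetical: the manuscript, the people, the predicate names and the three certainty labels are my own placeholders, not the model used in the session, and a real project would more likely use RDF and an established vocabulary.

```python
from collections import namedtuple

# One statement: subject --predicate--> object, plus how certain we are.
Triple = namedtuple("Triple", ["subject", "predicate", "obj", "certainty"])

# Hypothetical data: identifiers, predicates and certainty labels are invented.
TRIPLES = [
    Triple("ms:1", "has_material", "parchment", "observed"),
    Triple("ms:1", "has_leaf_count", "120", "observed"),
    Triple("ms:1", "copied_by", "person:scribe_a", "attributed"),
    Triple("ms:1", "owned_by", "person:collector_b", "attributed"),
    Triple("ms:1", "annotated_by", "person:reader_c", "conjectured"),
    Triple("person:collector_b", "corresponded_with", "person:reader_c", "attributed"),
]


def statements_about(entity, triples):
    """Return every triple in which the entity appears as subject or object."""
    return [t for t in triples if entity in (t.subject, t.obj)]


def group_by_certainty(triples):
    """Group triples under their certainty label."""
    groups = {}
    for t in triples:
        groups.setdefault(t.certainty, []).append(t)
    return groups


manuscript_groups = group_by_certainty(statements_about("ms:1", TRIPLES))
for certainty, group in manuscript_groups.items():
    print(certainty, [(t.predicate, t.obj) for t in group])
```

Because a person can be the object of one statement and the subject of another, the same list can be read from the manuscript outwards or from a reader or collector inwards, which is roughly what I take "multi-dimensional" to mean here.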

After lunch, Elizabeth took us through her learning strategy for acquiring technical skills. Her ‘psychology first’ approach is, I think, a very good one. Too often we believe that acquiring technical skills is akin to using a new computer, or finding our way around a new piece of software, but entering the realm of programming languages, for instance, is a fundamental shift in our linguistic outlook. What really impressed me about Elizabeth’s approach was the observation that many of the ways of writing a programme resemble academic writing practices that we have long internalised and treat as second nature. Take, for example, the approach to writing a programme (right).

There is parity here, as well as in the way that code is edited and revised, so even when we think that the language is completely alien to us, there are ways of parsing each line to find out where there may be errors (e.g. grammatical, syntactical). In short, from this model of corrections in our own academic (natural) language, we can begin to deal with (if not fully understand) what it is that computer languages are doing.

On the matter of language, the final session today was a paper by Professor Janet B. Pierrehumbert on social dimensions of the lexicon. Although I am not a specialist in computational linguistics, it was very helpful to see the different methods, analyses and results of reaction time experiments for recognition of phonetic variants, such as in dialects, and also around morphology. One of the fascinating conclusions of her experiments has been around the strong influence of gender as a cue in the way that word forms are generated, followed closely by ethnicity and age. Visual cues (in this case, whether an interlocutor is facing forward or in profile) had very little effect at all. As someone who is interested in the visual, I found this very intriguing.