Digital Humanities at Oxford Summer School

Today’s challenge was definitely one of bandwidth as there was a lot to take in! This morning involved working with both GitHub and Python. GitHub is something that I am planning to use, mostly for teaching materials in the digital humanities. Some of the feedback that we had from the Arts and Humanities in the Digital Age programme hinted at the need to understand the ways that researchers log versions of their projects, and provide others with access to the code and data that they are using. We managed to test out making changes to files, and therefore generating the version metadata, both online and through GitHub’s desktop application.

Python was a real treat because so many digital humanities researchers are finding that they need to deal with the programming language at some stage. The learning curve is steep, as Elizabeth Wickes noted. The analogy used today was the ‘bus tour’ of Python, so we were along for the ride - not in the driving seat. This was quite important to remember as it is very easy to get frustrated by a lack of vocabulary with which to take command of the data we want to work with. However, there were a few useful principles I have taken away from the experience:

Python can be a useful way of managing large datasets that might be difficult to open and analyse. The readline() method, for instance, reads a file one line at a time, which can really help when working through a large file.
The concept of variables - names that are used to store values, with each value having a data type.
The concept of ‘slicing’ using square brackets and the start-stop-step approach, which depends on inclusive and exclusive positions: the start position is included but the stop position is not, e.g. a slice starting at position 7 begins with the item at position 7, but if I wanted to include the item at position 16, I would need to set the stop position to 17.
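
As a rough illustration of these three principles (my own sketch rather than the code used in the session), the example below reads a file-like object line by line, stores values in variables, and slices a list. The sample text and the list of numbers are made up for the example, and io.StringIO stands in for a real file opened with open().

```python
import io

# Stand-in for a large text file (hypothetical content). io.StringIO behaves
# like an open file, so readline() works as it would on open("some_file.txt").
sample_file = io.StringIO("line one\nline two\nline three\n")

# readline() returns one line per call and remembers where it stopped,
# so the whole file never has to be loaded at once.
first_line = sample_file.readline()
second_line = sample_file.readline()

# A variable is a name that stores a value; here the value is a list
# holding the integers 0 to 19, so each item equals its own position.
items = list(range(20))

# Slicing is [start:stop:step]. The start is inclusive and the stop is
# exclusive, so reaching position 16 means setting the stop to 17.
selected = items[7:17]

# Adding a step of 2 takes every second item from the same range.
every_other = items[7:17:2]

print(first_line.strip())
print(selected)
print(every_other)
```

Because each item in this list is equal to its position, the printed slices make it easy to see which positions were included and which were left out.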

We also covered the role of loops, which you might want to use once you have the items in a collection or list and want to act on those items one at a time. For instance, displaying the sender and recipient of each identifiable letter in a collection of letters, along with the reference number of each letter or of the folder that the letter is found in.
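
A loop over a collection of letters might look something like the sketch below. The names, reference numbers and field names (sender, recipient, reference) are invented for illustration and do not come from a real collection or from the session materials.

```python
# Hypothetical letter records; a real collection would be read from a file.
letters = [
    {"sender": "A. Smith", "recipient": "B. Jones", "reference": "MS 1/1"},
    {"sender": "B. Jones", "recipient": "C. Brown", "reference": "MS 1/2"},
    {"sender": "C. Brown", "recipient": "A. Smith", "reference": "MS 2/1"},
]


def describe(letter):
    """Build a one-line summary of a single letter record."""
    return f"{letter['sender']} to {letter['recipient']} ({letter['reference']})"


# The for loop visits the letters one at a time, in order,
# and collects a summary line for each.
summaries = []
for letter in letters:
    summaries.append(describe(letter))

for summary in summaries:
    print(summary)
```

Keeping the formatting in a small function means the loop itself stays short, and the same summary could be reused if the records were later loaded from a spreadsheet or database.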

In my free writing session today I thought about how Python might exist in a workflow. I am a bit torn at the moment on a possible dataset for tomorrow’s session. This would involve downloading the .txt files of the full text of several 19th-century antiquarian texts and examining the subscribers’ lists to see if there are any particular patterns in the names and locations of individuals who subscribed to these publications. The big issue is that the data is far from clean - the text that has been transcribed automatically via optical character recognition (OCR) has a lot of errors, which will clearly need a lot of work to correct. Do I work first on the structure of the raw text file in Python, and then start to do some work on correcting errors in Open Refine? Would there be value in then exporting the data into SQLite Studio, and working with different but related tables of data? All food for thought tonight.

This afternoon’s session was led by David Tomkins, Curator of Digital Research Data from the Bodleian Library. We went through the digital repository process with an example form from Oxford’s Research Archive. This was a helpful activity in thinking through the minimum information that institutions expect to receive from researchers to ensure that the research data is deposited effectively. There were a lot of questions about the rights of access and permission to provide and allow access to certain data. In my line of work, one of the biggest issues is that of image permissions. If a researcher performs image analysis, and therefore can provide a numerical dataset, what value is this without, say, the thumbnails of the images themselves? A lot of these dependencies need to be thought through quite carefully.

This afternoon’s lecture was fascinating: a neuroscience project into ‘change blindness’ led by Professor Chrystalina Antoniades. This involved an examination of museum artefacts viewed by participants in real space and on screens. Interestingly, aside from some differences, with participants noticing more changes in colour on screen and more changes in design in objects in real space, overall there was little significant difference. This has implications for the teaching of art history, I think, in that much undergraduate teaching is still performed in the classroom, and independent study is now likely to be done through digital images. I had long expected that this would mean that students would not be able to notice where there were subtle differences between objects: ‘You need to see the real thing!’ I would often say to people - perhaps this is not the case.

Some more work is needed here on the viewing distance (which seems to have an effect on the way our brains interpret a three-dimensional space), and possibly on the influence of subject matter and how this interferes with the participant’s sensitivity to changes in their visual field, but it was nevertheless a really helpful study.