The Importance of Understanding Data

Researchers in the arts and humanities have spent a great deal of time recording cultural phenomena, as well as writing about them. Historical research in particular has often been portrayed as a painstaking process of reading, transcribing, editing and publishing. Indeed, many of the records series beloved of today’s researchers still exist in print form, and remain essential because they have been indexed (an equally painstaking process) by professionals within the discipline.

Many postgraduate researchers who embark upon their own projects aim to acquire information, but what they are doing, in essence, is creating data. Every word transcribed into a digital text can be recorded and potentially reused. However, and this is the key point, these data (names, dates, places, things) are essentially only human-readable, which is why so few humanities researchers would recognise that they are collecting or creating data at all. To us, as humans, it is just information.

Understanding what constitutes data is important, not because it makes a humanities research project sound more ‘scientific’, but because of what others can do with your research once you have finished. A folder of 500 photographs of hitherto unpublished manuscripts, carefully transcribed, is useful only to the extent that another person is willing to examine those 500 photographs and transcriptions one by one. Add metadata recording each image’s dimensions, its creator, and references to the collection and institution the original document came from, and the same 500 photographs suddenly become far more useful: you can ask, for example, how many of them were taken from a single collection.
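
To make the point about metadata concrete, here is a minimal sketch in Python of answering the "how many photographs came from a single collection?" question once each image carries a few metadata fields. The field names, file names and collection names below are hypothetical and chosen for illustration; a real project would more likely follow an established schema such as Dublin Core.

```python
from collections import Counter

# Hypothetical metadata records, one per photograph. Field names and values
# are illustrative only, not a recommended or standard schema.
photographs = [
    {"file": "img_001.jpg", "width_px": 4000, "height_px": 3000,
     "creator": "A. Researcher", "institution": "County Record Office",
     "collection": "Estate Papers", "reference": "EP/1/1"},
    {"file": "img_002.jpg", "width_px": 4000, "height_px": 3000,
     "creator": "A. Researcher", "institution": "County Record Office",
     "collection": "Estate Papers", "reference": "EP/1/2"},
    {"file": "img_003.jpg", "width_px": 3000, "height_px": 4000,
     "creator": "A. Researcher", "institution": "University Library",
     "collection": "Parish Letters", "reference": "PL/7"},
]

def count_by(records, field):
    """Count how many records share each value of the given metadata field."""
    return Counter(record[field] for record in records)

# Tally photographs per collection, most frequent first.
per_collection = count_by(photographs, "collection")
for collection, total in per_collection.most_common():
    print(f"{collection}: {total}")
```

With the three sample records this prints `Estate Papers: 2` followed by `Parish Letters: 1`. The same `count_by` helper answers other questions (photographs per institution, per creator) without anyone reopening the images themselves.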

Transcribed text that is marked up effectively, so that the structure of the original document can be discerned by both humans and machines, is far easier to search. Likewise, markup of entities such as personal names or place names might allow patterns to emerge across texts that would be largely impossible to see through manual analysis.
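
As an illustration of how entity markup lets a machine find connections across texts, the Python sketch below indexes which transcriptions mention each marked-up name. The `persName` and `placeName` element names are borrowed from the TEI Guidelines, but the documents are simplified assumptions: the TEI namespace and header are left out, and the letters and names are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two short, invented transcriptions with TEI-style entity markup.
# Namespaces and the TEI header are omitted to keep the sketch short.
texts = {
    "letter_1": "<p><persName>John Smith</persName> wrote from "
                "<placeName>York</placeName> to <persName>Mary Jones</persName>.</p>",
    "letter_2": "<p><persName>Mary Jones</persName> travelled to "
                "<placeName>York</placeName> in the spring.</p>",
}

def index_entities(documents, tag):
    """Map each marked-up entity name to the set of documents it appears in."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for element in root.iter(tag):
            # Collapse internal whitespace so spacing differences don't split names.
            name = " ".join("".join(element.itertext()).split())
            index[name].add(doc_id)
    return index

people = index_entities(texts, "persName")
places = index_entities(texts, "placeName")

# Entities occurring in more than one text hint at links between documents.
for name, docs in sorted(people.items()):
    if len(docs) > 1:
        print(name, sorted(docs))
```

Here only Mary Jones appears in both letters, so the loop prints `Mary Jones ['letter_1', 'letter_2']`. Across hundreds of transcriptions the same index would surface recurring people and places that a reader working text by text could easily miss.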

The more I have learned about making research data available and usable, and about documenting the processes that create that data, the more important I believe it is that future postgraduate researchers are able to explore and critique underlying datasets, not simply evaluate critically the interpretive methods and arguments in humanities research. It is for this reason that I am developing new training sessions that introduce these concepts, along with practices for improving the creation and sustainability of research data.