Text Analysis on Antiquarian Sources

Over the last year, building on the excellent training from the Oxford Digital Humanities Summer School on data preparation, I have been working up a conference paper given a few years back for the Monumental Brass Society. A niche interest for historians, archaeologists and art historians, that is true, but I realised that the methods of extracting and structuring data from digitized publications from the nineteenth century provided an opportunity to get a sense of how frequently these particular objects (monumental brasses) featured in a local journal over time, namely Norfolk Archaeology.

Why this exercise might be helpful for those in historical disciplines is because it affords the comparison of the frequency of key terms across several volumes (in this case 12, spanning about fifty years) with the occasional indexes that were issued by societies. An index can be useful when getting a sense of a topic in a publication, but rarely does it list every reference to an object, person or place. The indexer has to be selective, filtering out the ‘noise’ to allow the meaningful ‘signal’ that can be found in the text for that particular readership.

The results of my work on OCR texts can be found on GitHub. They were obtained relatively quickly using the raw text files from www.archive.org, then fed through Voyant Tools to using the term ‘brass*’, and then some manually checking for false positives (e.g. when used adjectivally for another object, such as a ‘brass lectern’).

Subscribers lists in antiquarian publications also provide a rich seam to mine for people. The work on this antiquarian community provides some preliminary data for further work on scholarly networks. You can read about this in the ‘antiquarian networks’ project folder. If you are a postgraduate research student and you are interested in these techniques, do feel free to get in touch.