Extracting Relevant Journal Articles from E-mail Alerts

About three years ago I set up a number of e-mail alert subscriptions to Taylor and Francis online. These were for a range of titles in education, and I was interested in new publications on doctoral education. Chances are you do the same, with your own criteria.

For a while, this strategy worked quite well. When I was working through my Masters in Higher Education Practice, I set aside time each week to go through the alerts (normally numbering >30 e-mail messages), which were automatically sent to a specific folder in Microsoft Outlook. All was well.

Until recently. I scrolled down through my folders, and thought ‘I really must check the journal alerts. I’ll bet there are some things in there to follow up on.’ There was. Over 550 messages(!) To give you an idea of how long this would take to wade through, it used to take me about 10-15 minutes a week to work through the alerts manually as there were often false positives and some cross-posting alerts. Scale that up and we’re talking four and a half hours of trawling through e-mail alerts. There are ways of speeding this up, of course, using Outlook’s search and filter options, and working through each message in turn. The problem with this was that I couldn’t easily export the titles and URLs from the messages into a single file.

So, I decided to set myself a challenge using Python, which is a pretty intuitive and very flexible programming language. There are various ways of extracting text direct from Outlook messages (.msg format), so if you are more confident with Python, then check out Matt Walker’s msg-extractor on GitHub. If you are pretty much a novice (like me) then some scripts to read and write plain text, which I learnt thanks to the wonderful David King at the Open University in Milton Keynes, might be of use. All errors are my own. Comments welcome.

A few words of warning. These scripts work fairly well on alert messages from Taylor and Francis, and they were customised for that format of e-mail message. If you are using a different alert service, you may need to think about where the relevant terms are located, and whether or not you are likely to capture the URLs if they are on a different line.