Not waving but drowning. Reflections on data curation in the age of information overload

It’s common to hear colleagues and friends bemoan the huge number of emails, texts and other messages with which we are bombarded on a daily basis. And if you go away for a week’s holiday, the state of your inbox when you return is likely to induce panic. The alternative – to check your phone while you’re away – is compelling, but then what’s the point of getting away from it all? Adding in all the other information that comes our way over the course of a 24-hour period, is it any surprise that terms like ‘information anxiety’, ‘infobesity’, and even ‘infoxication’ are becoming more commonplace?

According to Statista, approximately 281 billion emails were sent and received worldwide every day in 2018, and this figure is projected to increase to over 347 billion by 2023 [1]. Being out of the office is no refuge, as the trend is towards mobile usage: in December 2018, 43% of emails were opened on mobile devices, with webmail accounting for 39% and old-fashioned desktop clients like Outlook for 18%. No doubt the popularity of mobile instant messaging among the younger generation is driving similar growth there.

What about the creation and consumption of data within science? According to a study by Bornmann and Mutz published in 2014, global scientific output is growing at a rate of 8–9% each year, equating to a doubling roughly every 9 years [2].
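The link between an annual growth rate and a doubling time can be checked with a few lines of Python. This is only a minimal sketch that assumes constant compound growth; the function name `doubling_time` is illustrative and is not taken from the cited study.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double under constant compound growth.

    Assumption: output grows by the same fraction every year, so
    (1 + rate) ** years == 2, which gives years = ln(2) / ln(1 + rate).
    """
    return math.log(2) / math.log(1 + annual_growth_rate)


# The two ends of the 8-9% range quoted in the text.
for rate in (0.08, 0.09):
    print(f"{rate:.0%} per year -> doubles in about {doubling_time(rate):.1f} years")
```

Under that assumption, growth of 8% a year gives a doubling time of about 9 years and 9% gives about 8 years, which is consistent with the "roughly every 9 years" quoted above.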

Clinical studies

In healthcare, we have seen a huge rise in the number of clinical studies registered. Again according to Statista, as of October 2019 there were over 300,000 clinical studies registered worldwide, compared to just over 2,100 in the year 2000 [3]. No doubt some of this can be explained by the tightening of regulatory requirements, but even so, the number of trials ongoing and planned by pharma is dizzying.

Numbers for the growth of scientific meetings are more difficult to come by, but in this author’s experience there are (or were, pre-COVID) many more opportunities to attend scientific get-togethers than there were a decade ago. Conference subject matter has also become increasingly narrow, not to say niche. The growth in rare disease congresses is testament to this. All these meetings need content, that is, data to be shared as abstracts, presentations and posters that the authors hope will eventually make it into the literature. No surprise then that an estimated 1 million papers pour into the PubMed database each year.

Keeping up to speed

So where does this leave members of the medical profession, and by extension, their servants within pharma and med comms? Who has the time to read and digest even a fraction of the terabytes of data that are new-minted every year? Even within a specialty or sub-specialty the numbers seem overwhelming. Indeed, Fraser and Dunstan, writing back in 2010, found that even in a narrow sub-specialty such as cardiac imaging, trainees reading 40 papers a day five days a week would take over 11 years to bring themselves up to speed [4]. And in the meantime a further 82,000 relevant papers would have been published, requiring another 8 years!
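The reading arithmetic is easy to reproduce. The sketch below uses the 40 papers a day and five days a week quoted above; the 52 working weeks a year (no holidays) and all the names in the code are assumptions made for illustration, not figures from the cited paper.

```python
PAPERS_PER_DAY = 40    # from the text
DAYS_PER_WEEK = 5      # from the text
WEEKS_PER_YEAR = 52    # assumption: the reader never takes a week off


def papers_per_year() -> int:
    """Papers one reader gets through in a year at the quoted pace."""
    return PAPERS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR


def papers_read_in(years: float) -> float:
    """Total papers read after a given number of years."""
    return years * papers_per_year()


def years_to_read(papers: int) -> float:
    """Years needed to read a backlog of the given size."""
    return papers / papers_per_year()


print(f"Papers read per year: {papers_per_year():,}")
print(f"Papers read in 11 years: {papers_read_in(11):,.0f}")
print(f"Years to read a further 82,000 papers: {years_to_read(82_000):.1f}")
```

On these assumptions a trainee reads 10,400 papers a year, so 11 years corresponds to a backlog of a little over 114,000 papers, and the further 82,000 papers take just under 8 years, in line with the figures quoted above.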

Various authors have suggested a multiplicity of coping strategies over the years. An obvious one is journal clubs, where members bring what they think are the most important papers to the table. Critical reviews, such as those published by the Cochrane Collaboration, or even blogs, social media and pharma news websites such as PMLiVE, can help point the intrepid reader in the right direction. And of course, the pre-COVID face-to-face conference was a key gateway to greater knowledge.

Identify relevant data

And what of our industry clients, who face the same pressures as their HCP counterparts? Here med comms agencies have an important role to play, providing customers with services to help them safely navigate this hulking maze of information. Many agencies offer such services.

Porterhouse’s own Intelligence Hub is a tailored surveillance programme that seeks to identify relevant data and turn it into concise, visual summaries that deliver value (not volume), alongside key, actionable insights.

Perhaps the current pandemic will stunt the growth of the literature for the next few months or years. It has certainly slowed down recruitment into clinical trials, which might eventually have a downstream effect. Nevertheless, it would be safe to assume that it will be a temporary hiatus. With the increasing popularity of virtual meetings and the endless opportunities they offer, coping strategies, data curation, and perhaps artificial intelligence will be needed to help us determine what data is important to us, and what is not.



  1. Last accessed September 2020.
  2. Bornmann L, Mutz R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. arXiv:1402.4578 [cs.DL].
  3. Last accessed September 2020.
  4. Fraser AG, Dunstan FD. On the impossibility of being an expert. BMJ 2010; 341: c6815.

Author: Brian Parsons, Co-founder and Joint Managing Director, Porterhouse Medical Group