During an infectious disease outbreak, data can show how the disease is spreading and who is most at risk. Previous outbreaks, such as the 2014-2016 Ebola outbreak in West Africa, have highlighted the need for frameworks that improve the speed and quality of available data, both to develop diagnostics and treatments and to inform control measures. As COVID-19 has spread globally, there has been a substantial increase in the sharing of COVID-related “open data”: data that is available for everyone to access, use and share.
We have seen several successful examples of data being shared and used to support the response to COVID-19. For example, a crucial first step in identifying the pathogen and designing a diagnostic test is sequencing the viral genome. In the longer term, sequencing also tells us more about strain variation. For COVID-19, the first genome of the virus, SARS-CoV-2, was sequenced by Chinese researchers and shared through the Global Initiative on Sharing All Influenza Data (GISAID) on January 10th, 2020. As of June 2020, genomic data from 2,729 COVID-19 samples have been uploaded to GISAID.
We have also seen statistics such as the number of cases, deaths and tests carried out shared through regular televised briefings and social media. These data have been compiled and made available in usable formats by many research groups; one example is the dataset curated by Our World in Data, which draws on figures compiled by the European Centre for Disease Prevention and Control (ECDC).
We have also seen novel uses of data generated from smartphones. Both Google and Apple have released anonymised location data, illustrating dramatic decreases in activity as countries implemented lockdowns. These data sources have been used to build complex models evaluating the effectiveness of measures, and to compare economic impacts.
All these data are undoubtedly useful, but is more work needed to ensure that we know how the data are collected and how they can be used most appropriately?
For researchers, for example: if you are using Google Mobility data to assess whether an intervention has been effective, are the data drawn from a representative sample of the population, or are you introducing bias into your analysis? For researchers and society: have the users consented to their data being used in this way? Could any of the data identify an individual? How can we protect people’s anonymity?
Please do share your thoughts on this, and have a look at these additional resources discussing this topic.