The Potential Role Of Open Data In Mitigating The COVID-19 Pandemic: Challenges And Opportunities
The scale and diffuse impact of the global 2019 novel coronavirus (COVID-19) pandemic are unprecedented in our lifetime. As of October 23, 2020, less than a year into the pandemic, there have been more than 41.7 million cases and more than one million deaths globally. Without a vaccine or widespread access to treatment, the primary population-focused COVID-19 mitigation strategies are behavioral interventions such as restricting population mobility and encouraging good hygiene, including wearing face coverings and washing hands frequently.
There is one tool for the COVID-19 response that was not as robust in past pandemics: open data. For about 15 years, a “quiet open data revolution” has led to the widespread availability of governmental data that are publicly accessible, available in multiple formats, free of charge, and with unlimited use and distribution rights. The underlying logic of open data’s value is that diverse users, including researchers, practitioners, journalists, application developers, entrepreneurs, and other stakeholders, will synthesize the data in novel ways to develop new insights and applications. Specific products have included tools that give the public information about their health care providers and facilities, analyses spotlighting issues such as high variation in the cost of medical procedures between facilities, and the integration of food safety inspection reports into Yelp to help the public make informed decisions about where to dine. The expectation is that these activities will in turn empower health care consumers and improve population health.
Here, we describe several use cases in which open data have already been used in the global COVID-19 response. We highlight major challenges to using these data and provide recommendations on how to foster a robust open data ecosystem to ensure that open data can be leveraged in both this pandemic and future public health emergencies.
Use Cases Of Open Data In The Global COVID-19 Response
One set of open data use cases is the curation of data from diverse sources to better visualize the scope of the pandemic and to support information exchange. Researchers at Johns Hopkins University synthesized publicly available data from across the world into COVID-19 data dashboards displaying domestic and global trends in cumulative incidence, recoveries, and deaths, as well as supplemental interactive visualizations on topics such as flattening the curve and the availability of state-specific COVID-19 data by race. The New York Times has published highly interactive domestic and global narrative data visualizations on the location of hotspots, local trajectories, deaths, and other outcomes. These narrative visualizations end with layperson summaries of current knowledge about the virus and how readers can reduce their risk. In Italy, the Io Conto civic platform allows directors and employees at public hospitals to report positive cases and other pandemic-related data, which has led to the provision of municipal-level data that were previously not accessible. These examples illustrate how non-governmental users, including academics, journalists, and the civic tech community, have creatively leveraged open data to facilitate our understanding of the pandemic, communicate risk to the public, promote quick analysis by researchers, and enhance data quality efforts. The role of “open solutions” in facilitating research and information on the virus has been highlighted by both the United Nations Educational, Scientific, and Cultural Organization (UNESCO) and the Organization for Economic Co-operation and Development (OECD).
A second set of use cases has been the creation of mobile applications to empower consumers to make data-informed decisions on how to adjust their retail behaviors to reduce their personal risk. Early in the pandemic, South Korea disclosed information on places where infected persons visited, including the geocoded locations of retail shops and religious facilities. Using government data, private-sector software developers created popular mobile applications such as Corona 100m and Corona Map that send push notifications to users about the location of newly infected cases and their recent movements; for example, Corona 100m alerts users when they are within 100 meters of a location where an infected person has recently visited. Many other countries are following suit by implementing or developing their own “Corona apps.” The Taiwanese government’s publicly available real-time data on face mask availability have been used by developers to create applications that allow users to know where masks are in supply and peak shopping times, with goals of reducing anxiety about mask shortages and limiting crowding in stores.
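At its core, a 100-meter alert like the one described for Corona 100m is a distance check between the user’s position and each disclosed location. The sketch below is a minimal illustration under stated assumptions, not the app’s actual implementation: it assumes the government data provide latitude/longitude pairs, uses the haversine great-circle formula with a spherical Earth of mean radius 6,371 km, and the helper names `haversine_m` and `nearby_exposures` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# Assumption: spherical Earth with mean radius 6,371 km (expressed in meters).
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_exposures(
    user: Tuple[float, float],
    visited: Iterable[Tuple[str, float, float]],
    radius_m: float = 100.0,
) -> List[str]:
    """Hypothetical helper: names of disclosed locations within radius_m of the user.

    `visited` is an assumed layout of (place name, latitude, longitude) tuples.
    """
    lat, lon = user
    return [
        name
        for name, vlat, vlon in visited
        if haversine_m(lat, lon, vlat, vlon) <= radius_m
    ]
```

With illustrative coordinates, a place about 55 meters north of the user would be returned, while one about 220 meters away would not. A production app would also filter by how recently the infected person visited, which this sketch omits.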
Challenges To Using Open Data To Improve COVID-19 Population Health Outcomes
In normal circumstances, government open data activities (de-identifying, reformatting, publishing, and promoting the use of existing government data) require considerable financial and staff resources and expertise, and these investments must be ongoing to ensure sustainability. Some have argued that government-led initiatives to encourage the development of mobile COVID-19 tracking applications are expensive, with an unclear return on investment. These costs are especially difficult to bear given the limited resources that stem from chronic under-investment in public health.
A second challenge is data quality, timeliness, completeness, and availability. Cases have been under-reported, particularly where test availability is limited. Early research estimated that in March and April 2020, deaths officially reported as related to COVID-19 captured only two-thirds of the excess deaths. Reporting is also biased: racial and ethnic minorities are less likely to be tested in the US, and testing rates in Italy are lower among undocumented persons from Africa and the Middle East. Without routine random testing, artifacts may appear, such as large spikes after the launch of community testing initiatives and other targeted testing programs. Mortality data are also collected and defined differently across jurisdictions; for example, England’s updated COVID-19 mortality classification is estimated to have lowered the number of deaths through mid-August by 12.8 percent, and different European Union countries still use different methodologies for measuring the death toll. Evaluating excess mortality can sidestep problems with the reliability of COVID-19 death classifications, but mortality data become available only after a lag, and provisional data may be incomplete.
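Excess mortality is commonly estimated as observed all-cause deaths minus the deaths expected from a historical baseline, and comparing that figure with officially reported COVID-19 deaths yields the kind of “share captured” estimate cited above. The sketch below is a minimal illustration under a loud assumption: the baseline is a simple per-week average of prior years, whereas published analyses often fit models that adjust for trends and seasonality. All function names are hypothetical.

```python
from typing import List, Sequence


def expected_deaths(prior_years: Sequence[Sequence[int]]) -> List[float]:
    """Baseline: for each week, the mean death count across prior years.

    Assumption for illustration only; each inner sequence is one prior year's
    weekly all-cause death counts, aligned week by week.
    """
    return [sum(week) / len(week) for week in zip(*prior_years)]


def excess_deaths(observed: Sequence[int], prior_years: Sequence[Sequence[int]]) -> float:
    """Sum over weeks of observed all-cause deaths minus the baseline expectation."""
    baseline = expected_deaths(prior_years)
    if len(baseline) != len(observed):
        raise ValueError("observed counts and baseline must cover the same weeks")
    return sum(o - e for o, e in zip(observed, baseline))


def share_captured(reported_covid_deaths: int, excess: float) -> float:
    """Fraction of excess deaths accounted for by officially reported COVID-19 deaths."""
    if excess <= 0:
        raise ValueError("no excess mortality to compare against")
    return reported_covid_deaths / excess
```

A share well below 1 suggests that official counts miss some pandemic-related deaths, although not every excess death is necessarily caused by infection itself. Because the baseline depends on complete historical and current counts, the reporting lag noted above directly limits how soon such an estimate can be made.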
For countries such as South Korea and Singapore that disclosed granular data, additional challenges have been privacy concerns and increased social stigma, which can in turn discourage community members from being tested. There is a serious risk of re-identification, and some academic researchers and human rights activists have raised privacy and civil liberty concerns about the amount of detailed information released and how governments might use it (for example, in South Korea, released information includes age, gender, time, and the names of businesses frequented, including whether a toilet was used and whether a mask was worn). These data have already led to individual harms, such as the suspension of the Uber accounts of two drivers in Mexico who gave a ride to an infected patient, along with the accounts of their recent passengers; internet mobs in South Korea using the data to re-identify and harass individuals; and the public outing of an Australian doctor by the health minister. Beyond individual harms, the ability to re-identify individuals has increased stigma toward minority groups. In South Korea, knowledge that cases have been linked to the Shincheonji Church of Jesus has led to discrimination against church members. Similarly, an outbreak traced to a customer at an LGBTQ nightclub who became a “super spreader” led to stigma against the LGBTQ community.
Finally, as with any publicly available data and published statistical reports, it is important to provide sufficient context and metadata (information about the data, such as a codebook, data collection methods, coverage, limitations, and so forth) to ensure that end users interpret the data correctly. One commonly cited metadata specification is the Dublin Core Metadata Standards. As with the anti-vaccine movement, conspiracy theories have downplayed the death toll and severity of the pandemic, including a recent Twitter repost by President Donald Trump of an incorrect interpretation of a Centers for Disease Control and Prevention report on the COVID-19 death toll. Although this episode was not specifically linked to open data, it reinforces the need to give data users complete information about the data and to monitor online conspiracy theories on an ongoing basis.
Recommendations For Enhancing The Continued Potential Of Open Data For Pandemic Response
Advocate For Governmental Transparency Including Release Of Open Data
First and foremost, it is critical to continue advocating for government transparency. The White House’s recent mandate that US hospitals send data on COVID-19 hospitalizations and equipment to a new federal database at the Department of Health and Human Services, developed by a private contractor, rather than to the existing National Healthcare Safety Network run by the Centers for Disease Control and Prevention, has raised concerns about the transparency of the new data collection process and its potential impact on data availability. Data transparency has also been a concern at the US state level.
In Brazil, President Jair Bolsonaro ordered mortality data removed from the government website and was then forced by the Supreme Court to resume daily publication. In Mexico, citizens not only mistrust government data but also believe that their inaccuracy is part of a political strategy.
Continue To Invest In Open Data Infrastructure
Beyond a culture of government transparency, continued investments in open data infrastructure are needed. Successful open data efforts require developing leadership, governance procedures such as open data handbooks and processes to ensure that de-identification standards have been met, ensuring the development and publication of comprehensive metadata, committing resources for start-up and ongoing maintenance, and making the publication of open data a routine government function. This can be particularly challenging for public health agencies to prioritize during the current pandemic, given chronic underfunding and workforce shortages; pressures for public health officials to resign; and unprecedented demand on staff to conduct surveillance, contact tracing, and other activities related to the pandemic.
Enhance The Quality Of Open Health Data
More broadly, the pandemic can be an opportunity to renew attention on enhancing the quality, timeliness, and completeness of government-produced health data, by strengthening existing information management practices. In the European Union, interoperability guidelines were published for mobile contact tracing applications so that residents do not need multiple apps to report positive tests or receive alerts. In the US, national guidelines were developed to certify COVID-19 deaths to promote uniform reporting. However, there are persistent gaps in data quality and standardization across mortality data from local medical examiners and coroners across the US. Chronic challenges persist in data collection, curation, and publication; public agencies lack technical skills, data infrastructure, efficient information sharing and integration, and effective release of data in open and machine-readable formats. For example, sometimes governments publish COVID-19 dashboards, figures, and analysis without allowing users to access the underlying data, thus preventing them from scrutinizing the data, epidemiological models, and other behavioral predictions used for decision making.
Ensure Data Privacy
Another priority area of data infrastructure is developing techniques and rules for data de-identification. Although de-identification techniques have advanced considerably, critical gaps remain in tools, standards, and assessment techniques. Managing re-identification risk is further complicated because the balance between privacy and transparency differs across countries and must be tailored to the local context.
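One widely used way to assess re-identification risk before release is k-anonymity: every combination of quasi-identifiers (attributes that are not names but can single someone out, such as age, gender, and district) must be shared by at least k records. The sketch below is a minimal illustration, not a complete de-identification standard: the field names, the 10-year age bands, and the choice of k are hypothetical, and k-anonymity alone does not prevent disclosure when everyone in a group shares the same sensitive value.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: each released case is a dict of field name -> value.
Record = Dict[str, str]


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39' (one common generalization step)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def risky_groups(
    records: Sequence[Record], quasi_ids: Sequence[str], k: int
) -> List[Tuple[str, ...]]:
    """Return quasi-identifier combinations shared by fewer than k records.

    Records in these groups are the easiest to re-identify and would need
    further generalization or suppression before publication.
    """
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [combo for combo, n in counts.items() if n < k]


def is_k_anonymous(records: Sequence[Record], quasi_ids: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination appears in at least k records."""
    return not risky_groups(records, quasi_ids, k)
```

For example, a release containing a single case with a rare combination of age band, gender, and district fails the check for k = 2, flagging that record for coarser generalization or suppression. Choosing k, and which fields count as quasi-identifiers, is exactly the country-specific balance between privacy and transparency described above.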
Cultivate The Open Data Ecosystem
Finally, ensuring that open data can achieve their full potential requires cultivating an open data ecosystem that engages diverse data users in leveraging data for new purposes. Publishing freely accessible data on the web is insufficient; proactive policies are necessary to stimulate a mutually interdependent set of actors with different roles and functions. In our examples, journalists, academics, and software developers were able to use open data on COVID-19 creatively to generate value. As the pandemic continues to expand, enhanced access to high-quality publicly available data has the potential to generate additional solutions to help mitigate the COVID-19 pandemic and future health risks.
A Call To Action
The continued release and maintenance of high-quality open data is challenging, particularly during an unprecedented public health emergency when chronically underfunded public health agencies are addressing other acute needs in their communities. However, rapidly evolving knowledge about the pandemic and the politicized nature of COVID-19 highlight the need to prioritize government transparency that includes the release of open data. Investment in the ongoing development and release of open health data about the pandemic has the potential to generate new solutions from diverse users, start to rebuild public trust in government, and strengthen open data efforts in the long term.
*** This article has been archived for your research. The original version from Health Affairs can be found
***