Mapping disease with big data

By Gagandeep Kang

Outbreaks of infectious diseases threaten lives, our health and the economy. In a world where people make multiple contacts at work, at home and during leisure activities, live in densely built environments, are changing the global ecology and travel frequently, outbreaks arise and spread rapidly. Recent multi-country viral outbreaks such as Ebola and Severe Acute Respiratory Syndrome (SARS) claimed thousands of lives and cost billions of dollars.

In theory, when diseases are predictable, health systems can be designed to manage them. For example, if hospitals know the seasonality of influenza, pneumonia or diarrhoea, they can plan for the surge in admissions, ensuring that beds and staff are available when needed. In India, this does not happen for beds because hospital bed occupancy is already high, but planning can still ensure that supplies and drugs are available.

But in a public health-care system that is already stretched, recognition of and response to newly emerging diseases are slow and frequently inadequate in the early stages of an outbreak. Once the outbreak has spread and is more widely recognised (especially if there is political pressure), all available resources are brought to bear on it. This grossly disrupts routine health-care services, causing unrecognised damage that can affect the structure of the system and the delivery of health care well beyond the outbreak.

As a country of 1.3 billion people, with marked inequity in health care, dense urban populations, frequent contact with domestic and wild animals, extensive internal migration, a large diaspora, international air links and a warm climate, we are uniquely exposed to the threat of both indigenous and imported infectious diseases.

Hits and misses

Recognising the importance of surveillance, the Ministry of Health and Family Welfare set up an Integrated Disease Surveillance Programme (IDSP) at the National Centre for Disease Control that uses district and State-level systems to report weekly on outbreaks of disease across India. Considerable effort has gone into building the system and increasing its capacity to generate actionable data. Yet years after its launch, the bulk of surveillance reports continue to be syndromic, with less than a third of outbreaks laboratory confirmed. Of the 40 to 50 outbreaks reported each week, the most common are of diarrhoeal disease and food poisoning. Media scanning and analysis are part of the tracking system, but there are lacunae: the IDSP did not report cases of chikungunya in Delhi in September 2016, even when reports filled the columns of national newspapers.

A parallel system for infectious disease surveillance has come from the National Polio Surveillance Project (NPSP), which built a network of 40,000 reporting points made up of individual health-care providers, facilities and organisations that report cases of limb weakness or paralysis to identify suspected cases of polio. After polio eradication in India, the NPSP has shifted its focus to other infectious diseases and tracks outbreaks of vaccine-preventable diseases such as measles, rubella and diphtheria. However, differences in focus and methods between the IDSP and the NPSP, even for diseases that both programmes track, produce discrepant data that can be difficult to reconcile.

Using data

This is not a problem unique to India. Recognising and investigating outbreaks requires investment in structures and systems whose use is unpredictable, which makes them unappealing to policy makers. There has been great interest in using data that is gathered incidentally to recognise outbreaks early or to predict the potential for outbreaks. For example, Google Flu Trends, launched in 2008, aggregated Google search queries to provide estimates of influenza trends in 25 countries. When it was initially compared with the tracking systems of the U.S. Centers for Disease Control and Prevention (CDC), it predicted the influenza season two weeks before the CDC. However, it failed to predict the 2013 flu season and was halted in 2015. Subsequent analysis showed that big data analytics requires caution and that algorithms must be revised over time. Nonetheless, harnessing collated data sources for the public good is a key challenge, and meeting it will require rapidly responsive collaborations between industry, governments and academia that share data while protecting individual privacy and respecting autonomy.

The velocity, variety and volume of big data defined by time and location are a resource that is currently underutilised in India, because we have not yet built the collaborative systems needed for the analysis and inference that public health requires. In fact, the chikungunya outbreak in Delhi was picked up by ProMED-mail and by HealthMap, an online aggregation system. The power of social networks and other digital sources to generate signals for dynamic disease mapping and control is an untapped opportunity, not only for identifying outbreaks quickly but also, potentially, for rapid disease mitigation. The response to the 2015 Chennai floods, in which people came together on social media to provide on-the-ground support at a finely resolved spatial scale, showed what such networks can do. Planning for the use of such networks could and should be a key strategy in our disaster preparedness plans, whether for infectious disease outbreaks or other emergencies.

Gagandeep Kang is Executive Director, Translational Health Science and Technology Institute, Faridabad, Haryana.

The article was originally published in The Hindu on December 25, 2016, and has been reproduced with permission from the author.