
Data, data, everywhere, but where’s the stuff that matters?

Effective data — and not personal data — can adequately inform policy making, and it is only effective policies that can eventually tackle the large problems and help create a more equitable society

August 06, 2021 / 12:08 PM IST

Our lives are data. Everything we do, everywhere we go, and everything we buy is data. Data is the new oil, the new gold, our Prime Minister told the world, and India can give the world the most data at the cheapest of prices. He’s right, to be sure.

Be it large tech companies such as Google and Facebook, or governments that need Aadhaar linked to everything from tax returns to health IDs, data from Indians is powering businesses like never before. But what is this data? Are we, perchance, too focussed on only one kind of data — personal data?

Large tech companies collect data mostly to better predict consumer behaviour. If they know what we are going to do, they can be there waiting with the things and services we are likely to buy. A study found that women feel least attractive after they have been crying or ill, and on Monday mornings; and so digital marketers were advised that those are the best times to sell them beauty products and clothes.

How would companies know you've been ill or crying? Chats, location information, phone calls, and social media posts tell them everything they need to know. The so-called ‘data exhaust’ of our everyday lives fuels whole industries of data brokers, tech companies, machine learning, artificial intelligence, etc. The collection, collation, and analysis of such data has prompted the framing of (thus far unimplemented) privacy laws and several court battles in India.

Then there is another kind of data. Consider the plight of migrant workers in 2020. After the sudden and brutal lockdown was imposed that year, several thousand migrant workers found themselves unable to survive in locked-down cities, and began to walk homewards to the villages they came from. Many died of heat, thirst, and hunger, or in road and rail accidents. It was a national crisis of vast proportions. Yet when asked how many migrants had lost jobs or had died in the lockdown, the government had no compunction in reporting that 'no such data is maintained'.


A very important data repository in India has been the National Crime Records Bureau (NCRB), which collects and collates data about various crimes from every district in the country. It contains year-by-year data on everything from murders and robberies to traffic-related prosecutions, crimes against women, etc. The low importance placed on data such as that collected by the NCRB was made evident when the government decided to merge the NCRB with the Bureau of Police Research and Development (BPRD) in 2017. It was suggested at the time that the merger would improve the efficiency of the NCRB, but it is important to remember that the BPRD and the NCRB had very different mandates. While that decision has since been rolled back, what hasn’t changed is how governments undermine good data collection practices.

Take, for instance, data related to farmer suicides, one of the important sets of data that this repository contains. It is no longer a secret that farmer suicides, like agrarian distress, have been steadily increasing across India for the past several years. The number of farmer suicides in the country (counted since 1995) crossed the 300,000 mark in 2014. P Sainath has shown how data related to farmer suicides has since been massaged to make the problem look less severe than it is.

In 2014, the government introduced changes in the methodology that moved some of the suicide numbers to other columns to reduce the official number of farmer suicides. Tactics such as reclassifying those working in agriculture (labourer, tenant, cultivator, etc.), or not counting women as farmers, fudge the collection of data and serve to minimise the issue of farmer suicides.

But the most glaring failure of data collection currently is in COVID-19-related data. The death figures are an example. Some estimates have suggested that India's official COVID-19 death count (close to 400,000) is an undercount by hundreds of thousands. As people died on the streets waiting for hospital beds and oxygen, one of the large lapses was the complete failure to record deaths due to COVID-19.

India has also fared poorly at disease surveillance during the pandemic, in spite of the much-vaunted Aarogya Setu and other technological interventions. The ‘track test isolate’ strategy fails when the first step itself is compromised. It is this failure that has now led the virus to run riot in rural areas where healthcare services and surveillance are, for all practical purposes, broken.

Vaccination has been another area in which we failed to collect good data. According to Gagandeep Kang, one of India’s leading virologists, India had administered over 330 million doses of vaccine by July, but has not collected data on how effectively they are working. ‘We are wasting information,’ she said.

Another important data and disease surveillance tool is genome sequencing, which helps establish the impact of new strains of the virus.

It turns out that genome sequencing was conducted for only about 1 percent of the positive samples collected between January and March.

All of these are examples of bad data collection and surveillance practices in sectors that really require them. Collecting and analysing such data would not be in violation of individual privacy (if done right), would help formulate better national and sub-national policies, and would help prepare better for future disasters.

The catch, perhaps, is that this kind of data is not necessarily profitable. Personal data that can be used for behaviour prediction and advertising is highly lucrative, and there is, therefore, a rush to gather it and use it. These other kinds of data, which are vitally important for issues such as disease surveillance and reducing agrarian distress, are not immediately money-making. This myopic view, which sees only immediate monetary profit as useful gain, allows other, important issues to fall by the wayside. It undermines the understanding that effective data about the things that matter can adequately inform policy making; and it is only effective policies that can eventually tackle the large problems (alleviating poverty, saving farmers from desperation and suicide, preventing another pandemic, etc.) and help create a more equitable society.

Vidya Subramanian is a Research Affiliate at the South Asia Institute, Harvard University. Twitter: @Vidyas42. Views are personal and do not represent the stand of this publication.
first published: Aug 6, 2021 09:14 am
