Excerpts from former UIDAI chief Nandan Nilekani’s address on ‘Data as the oil of the 21st century: India’s response’ at the Delhi Economics Forum on July 22
Why would Google develop self-driving cars? Google first had web search, search led to local search, which led to maps, maps led to Street View, and Street View led to data on everything around it. In the meantime, machine learning became powerful. It started identifying objects on the road, and all this led to self-driving cars.
Fundamentally, when you have a massive data pool, you can enter other markets very easily.
For example, when Amazon recently bought Whole Foods, the share prices of retailers dropped in the US. The traditional retailers said, “Oh my god! Amazon, which is an online retailer, is buying a physical retailer… what will happen to us?”
The historic businesses didn’t collect data at the level of detail these new businesses do. Data is a new kind of resource, which we need to give out very strategically. It has become more important because of certain developments in the last 15 years.
When did data become a source of business? Essentially, it started with Google monetising search data so that it could sell ads, and then Facebook did the same with social media. They used data to target customers, understanding exactly what a person prefers and selling consumers what they want. Between 2000 and 2010, by and large, data was used for monetisation.
The big thing in the last six years is the use of data for Artificial Intelligence (AI). As an idea AI has been there for 40 years. It goes back to the Turing Machine. In the last few years there was so much data that we were not able to do a lot of things with it.
An important breakthrough was ‘deep learning’, which uses layers of neural networks to quickly use data to figure out something. A good example of deep learning is image recognition. In 2011, it had a 30 percent error rate, while a human being gets it right 95 percent of the time. But from 2011 onwards, deep learning has been applied to image recognition… and now, as of last year, the error rate is less than five percent – better than human beings.
These properties are being used to make self-driving cars. Many aspects of learning are becoming AI-driven. The fact that we have AI offers a huge competitive advantage. It adds value to the data. So data is becoming precious not only because it helps you make money but also because it helps you automate, do all kinds of new things and get into new markets. So data is really the oil of the 21st century.
Two examples show how data disrupts at scale: digital advertising in the US and payments in China. In the US, Google and Facebook have captured 71 percent of the total digital spend. Newspapers there are going out of business and television is facing a threat. What’s important is that, of the growth in digital ad spending in 2015-16, 89 percent was captured by the two platforms! And the two companies didn’t even exist 15 years back.
In China, there has been a rise in digital payments. China’s mobile payments are $5.5 billion – 15 times that of the US. They have done an amazing job of using QR codes. In China, you can just scan a QR code with a smartphone and make the payment.
However, these payments are dominated by two companies: Alipay, part of the Alibaba Group, and WeChat, which belongs to Tencent. Between them, these two companies hold 90 percent of the total market share in China.
Two companies in the US dominate digital advertising and two in China dominate payments. It is a good example of how data has this central property: the more data you have, the better you get.
Now if you combine data with AI you get both scale and speed. Again, I’ll give you one example of the US and one of China.
In the US, look at Netflix. Ten years back, it was providing content to consumers on DVD. In 2007, the same year Apple launched the iPhone, Netflix launched a video streaming service. And today Netflix has over 100 million customers watching videos on its service globally. The difference is, Netflix has data on ‘who’ is watching, ‘what’ they are watching, ‘how’ they are watching, ‘when’ they are watching and ‘what’ they like. You can use this data to make better programmes.
Initially, Netflix was not in the content business; it was a distribution company. It was selling DVDs and CDs. However, in 2013 Netflix started making its own content. The first show was House of Cards and then it went on to make many other shows.
Interestingly, if you look at this year’s Emmy Awards, Netflix got 93 nominations. It’s incredible! HBO, the granddaddy of this business, which produces Game of Thrones, had 110 nominations. This is the power of a company which has so much data about its users.
One of the inventions of Netflix is ‘post-play’, designed for binge watching. When you finish watching an episode you can start watching the next one. If you are watching a movie then at the end another movie will begin, which they think you may like based on the data. Netflix’s slogan is ‘more viewing less sleep’. They say they only compete with sleep.
Similarly, in China, Alibaba is a big guy in payments. It has developed an app-based money market fund, ‘Yu’e Bao’, and has told consumers to “just keep loose money with us and keep the rest with the bank”. Within four years it has become the world’s largest money market fund, with about $165 billion. It is not a bank, but a product that Alibaba created and was able to sell to its customers by using their data.
It is an example of how fast this new world is changing, where data and AI allow you to scale up dramatically and enter new markets. Data and AI are causing disruption at large scale and speed.
We see platforms that accumulate user data rapidly, disrupt industries, wield disproportionate influence and create silos, because data is their asset. This leads to ‘data domination’. It’s already there. The world is just waking up to this challenge.
Currently, the EU is very active. It is bringing in a sweeping privacy law, the General Data Protection Regulation. It recently imposed a big fine on Google for promoting its own retail products. The UK has asked its top banks to open up data so that there can be competition.
Japan has announced some tough anti-monopoly rules on data. China’s cyber security laws are very strict and state that you have to keep your data within the country. Apple recently announced a deal wherein it agreed to store all data related to Chinese residents within the country’s geography.
All over the world this is becoming a huge issue. Therefore, it is important that India has a strategic position on data. This is not a technology problem, but a policy one. The most important resource of the 21st century is data. You need to have a strategy.
There are three different dimensions related to it. One may be called ‘data colonisation’ – more and more of your data is offshore, essentially unaccountable to your country’s laws, which happens when you use internet services based in other countries. Second is privacy. A nine-judge constitution bench of the Supreme Court is deliberating on it, so I won’t comment. Third is the ‘winner takes all’ behaviour of data, which leads to monopoly.
Public policy here is so confusing because there are multiple dimensions. There is a competition dimension. There is a privacy dimension and a national security dimension, wherein lies the problem of how you create domestic champions. But it is important that you have strategic positions on data.
China has made a huge commitment to data and AI. Currently, China and the US are the two big leaders in AI. Now what can India do which is different?
We suggest ‘inverting’ this data ownership. Instead of data being owned by companies, intermediaries or the government, users own their data. Can we invert the pyramid so that the data belongs to individuals? India is in a position to do this.
Inversion of data means instead of data being used to sell things to you, it is used to empower the user. Is there a policy, technology and infrastructure solution where data can empower each user for his or her own benefit?
When you invert data ownership, you take your data from various data producers and use that to reap benefits for yourself. This is basically giving data back to the people.
If you can’t trust the government, industry or anyone else, the best thing is to give data back to the people. Give people control over their data.
India has the technological infrastructure for every user, be it an individual or a small business, to be able to get his or her data from any source in real-time. But what is the objective here?
One, it prevents colonisation. Here I’m not using the term in a protectionist sense. I mean that when data is collected in India by anyone, be it an Indian or an international company or the government, I should have the right to claim it back. And, by the way, none of these guys give it back.
This has to be decided by Indian law. It will apply to Indian and foreign companies alike. That has to be part of your policy and law, and that will help in getting the data back from anyone.
Second, inverted data defends privacy. Today, private companies collect data with the user’s consent, but fundamentally you have to take it (the internet service requesting access to your data) or leave it. With inverted data, I can negotiate terms with my service provider on what they do with my data. And that again defends privacy because I have control over my own data. And this again could be done by a law for services provided in India, whether by an Indian provider or a global provider.
Another important thing is that data portability of this type enables competition and innovation. Once my data is portable, for the same reasons as in the case of mobile number portability (MNP) and banking, I should be able to get my data from the service provider and give it to someone else if I wish.
Fundamentally, if you invert data and give it back to the users, you address three challenges in one shot: colonisation, privacy, and the creation of competition and innovation.
This is called data democracy. Data is empowering when it is in the hands of people.
And the right time is now because India is adopting technologies at an unprecedented scale. We have close to 300 million Jan-Dhan accounts, 1.16 billion people have Aadhaar, and 350 million people have smartphones.
There is a product called ‘e-sign’. It uses the Aadhaar number to sign a document. This was designed by my former colleagues Ram Sevak Sharma (present TRAI chairman) and Pramod Verma (at the UIDAI). Since its inception, 20 million e-signatures have been done.
With GSTN coming in, e-sign [usage] will go through the roof. DigiLocker has around 4.8 million users, and 6.9 million documents are stored with 17 departments. The UPI, launched a year back, is the fastest growing digital platform, with over 10 million transactions in the month of June.
E-KYC, used to electronically open a bank account, get a SIM card, get an insurance policy or get a pension, has already crossed the 350 million mark.
Aadhaar authentications rose from 800 million to a billion in a month. It’s a serious volume. You have national class systems.
We also have the Aadhaar Payment Bridge (APB), which connects your Aadhaar with the bank account… there are half a billion entries.
We are running the world’s largest direct transfer system, which has done 2 billion transactions worth nine billion dollars. It’s not on paper, it’s real, it’s working, it’s happening at scale.
India will go from being data poor to data rich. And the rate at which data is growing is faster than the GDP growth.
India has unique digital infrastructure in two ways. Aadhaar, DigiLocker, UPI and BHIM together form what we call India Stack, which enables transactions at low cost. At the same time, we have some major platforms that are going live for business – GSTN and FASTag, an Aadhaar for vehicles. Around 18 percent of the toll transactions are happening on FASTag.
Then there is the Bharat Bill Payment System, which essentially converts all bills into paperless bills. These are massive, national-scale systems, all of which are going to generate a lot of data. So suddenly we have digital infrastructure for consumers, and we have infrastructure for businesses.
The other important thing we have is the unique identification system. Because of online fraud we need two-factor authentication, for which you typically use a card and a PIN. India is the only country where I can use the entire JAM (Jan-Dhan accounts, Aadhaar, mobile) infrastructure for authentication. Aadhaar itself provides biometric authentication, and mobile phone authentication happens using an OTP. And thanks to UPI we have an interoperable PIN infrastructure.
An important thing to realise is that we can’t use the western model here. The west became economically rich and then data rich. How do you think India becomes data rich before it becomes economically rich? The businesses that will emerge will be those that allow users to use their data to improve their lives. It is a fundamental shift that you have to make. For example, businesses can use this data to get credit. Now, with a digital footprint, they can go to the lender and ask for credit. GST data will be highly beneficial for business taxpayers. Businesses can obtain copies of invoices from the GSTN and share them with the lender. This way, data will democratise credit.
We are on the verge of ensuring that instead of credit going to corporate houses, which are deeply in debt, it goes to the millions of businesses that have never got credit from the formal economy. Similarly, we can use the electronic toll collection (ETC) data, wherein the trucker can take a loan by showing the number of trips he made last month. We have a chance to eliminate knowledge asymmetry as individuals migrate to the formal economy.
Although you can argue that we have data from NSSO and CSO, you don’t have that data in real-time. With national level systems in place, you have data in real-time. When you have one million invoices a month in the GST system, you know not only the business [value], but also the product code. Every line item in the GST has something called an HSN code (harmonised system nomenclature) and every service has a SAC (services accounting code). So whether it is a hair dye or a haircut, you have a code for that. You can get the data every Friday. You can review it.
From the ETC system you have real-time data on truck movement, and by combining ETC with the e-bill under GST you know the product being transported. So we know not only what is in the truck, because of the HSN code, but also how much it weighs.
If you go to the NDSAP (National Data Sharing and Accessibility Policy), you know exactly how the economy is becoming cashless. You know how many transactions took place that day on UPI or BHIM. So suddenly from cash to cashless, whether it is goods or trucks, you have real-time data.
So what specific recommendations do we have?
Unlock all public databases. In other words, I should be able to pull out my income tax return or GST details. This is also in line with existing laws like the RTI Act, which requires public authorities to provide information to the public proactively. This is an extension of that, where the private information of the consumer will be given back very easily with his or her consent. We can do this proactively for the government databases and other things.
The same thing needs to be required by the regulator in each field. The RBI mandates that banks provide mutual funds transactions to the customer on his request. And most important is that you need to create a law – a data empowerment law. The bigger issue is how we get Indians to use their data.
Therefore, we need to have a law under which whoever collects data, be it an Indian or a global company or the government, must share it with consumers and businesses. This is a fundamental thing.
These things involve no money. They can be done immediately and can put the economy on a cycle of growth using data.
We need to solve our problems by empowering users through data, removing knowledge asymmetry at scale and setting the cycle of innovation in motion.
(The article appears in the August 16-31, 2017 issue of Governance Now)