Not yet, but government and industry are trying to make the cornucopia of information on the net available in Indian languages
Taru Bhatia | January 9, 2017 | New Delhi
The title of a 2016 report by the Internet and Mobile Association of India (IAMAI) is a bit of a misnomer: the report is called ‘Proliferation of Indian Languages on Internet’, but these languages account for hardly 0.1 percent of internet content. The internet, as the Chinese and speakers of other languages would agree, is dominated by English, which accounts for as much as 56 percent of the content.
This has serious consequences for India, where barely 10 percent of the people can use English. “We are leaving 90 percent of the people behind,” says Rajat Moona, director-general of the Centre for Development of Advanced Computing (C-DAC), a government research and development organisation. “We’ll not be able to move ahead without them.”
What’s worth noting, though, is that, despite the scarcity of local online content, the country has witnessed a sharp rise in internet usage in rural parts. The number of people with mobile internet access reached 87 million in December 2015 – a 99 percent growth over the previous year, according to the IAMAI report. Around 75 percent of new internet user growth is expected to come from rural India, and these users will prefer content in local languages, says a Nasscom-Akamai Technologies 2016 report. The growth in internet usage in non-metros and rural areas is largely due to people buying more mobile phones; though not everyone is going for smartphones, even simple phones these days allow internet usage. Data from the Telecom Regulatory Authority of India (September 2016) indicates growth in wireless subscriptions in Haryana (4.04%), Odisha (2.67%), West Bengal (2.39%) and Assam (2.33%). Delhi, on the other hand, lagged behind, with only a 0.93% increase in subscribers.
“When you look at usage in local languages, the number is growing 45 percent year-on-year,” says Rakesh Deshmukh, co-founder and CEO of Indus OS, which developed the eponymous operating system in 12 Indian languages. “With initiatives like providing language support, this number will certainly grow in terms of rural versus urban population.”
Recognising this shift, the government, in September, mandated that manufacturers ensure that cellphones support all 22 official Indian languages. Besides, they should allow typing in English, Hindi and the user’s choice of one more Indian language. Implementation is expected from July 2017. But what use is that capability without online content in a language the user knows? “Having content in local languages is an important objective of Digital India,” says Ajay Kumar, additional secretary, ministry of electronics and information technology (MeitY). “Unless we have that part in place, most people will remain excluded.”
Hindi, Ahomia anyone?
As the medium of international communication, English dominates the internet and is likely to continue doing so. Matching its share of content will be impossible, but translation is a way out. Technology Development for Indian Languages (TDIL), a Rs 50 crore-100 crore government programme, is working on machine-translation technology, for both text-to-text and text-to-speech. But machine translation so far is of patchy quality. “It has to be supplemented with human endeavour, and that’s a costly affair,” says Kumar. “But once technology improves, human effort can be minimised.”
C-DAC, for instance, is crowdsourcing the text-to-text translation of central government websites to volunteers. The Rs 15 crore project began in 2013, and so far, 60 websites have been translated. The focus largely remains on Hindi, though. “Some 100 websites have been identified to be made available in Indian languages. But right now we are talking only about Hindi. We are in the process of translating into other Indian languages, for which we have to identify volunteers,” says Moona.
Public contributors register on C-DAC’s localisation project management system application and suggest word meanings. There are also C-DAC-appointed members who log in and contribute. A specialist group then goes through the growing corpus, validates the contributions and finalises choices. “Crowdsourced data helps us create a parallel corpus [of words and meanings] that will improve machine translation,” says Moona. Government agencies – and private players too – are free to use the corpus: the IRCTC and Yatra portals of the railways, Bank of Maharashtra, and Snapdeal are already using it.
“The quality of translation really depends upon the data resource you create, so that the machine can understand every use of the word and what it means in each context. This is a huge task, but as the corpus you build gets better, machine translation will keep on improving,” says Kumar. However, there’s no deadline yet to make all government websites available in Indian languages.
Business has already seen an opportunity and is preparing for the expected surge from rural India. A recent Google ad shows a bespectacled man in shirt and trousers reading a newspaper at a railway station. From among a group of workers squatting a little away comes a voice reading out the very headline the man is scanning. He turns around, irritated, only to have all the headlines read out one after the other. It turns out that one of the workers is getting all his news updates on his smartphone via Google in Hindi. The internet giant’s initiative is one of many that online businesses are taking to attract more users and eventually profit from them.
Snapdeal, an e-commerce platform, has developed a user interface that supports 11 regional languages, including Hindi, Telugu, Gujarati, Tamil and Marathi. “A significant part of Snapdeal’s users come from tier-2 and tier-3 cities and the multilingual interface, developed on the basis of feedback from buyers and sellers, gives us better access to a larger audience and enables everyone across India to explore and transact without any language constraints,” says a Snapdeal spokesperson in an email reply.
Digital wallet companies, gaining popularity post demonetisation, are also seeking to expand their footprint through regional outreach: Paytm, which leads the market with 150 million users, is set to launch its application in ten Indian languages. “Our goal is to make payments and commerce more inclusive, and this new feature will help us expand the market to include users who would prefer their native languages,” says Deepak Abbot, senior vice-president of Paytm. MobiKwik, another digital wallet, has also customised its application for Indian languages. “Apps in regional languages will help them understand the wallet user interface better and form the habit of using wallets,” says Mrinal Sinha, MobiKwik’s chief operating officer.
Content-based startups are recognising that localisation will gain them a following. In Shorts, an app that compiles and distributes news, was launched in English in 2013. Sensing that growth lay in regional languages, Azhar Iqubal, its founder, decided to add a Hindi version in 2015. Today, the app has been downloaded five million times, with the Hindi version accounting for more than 10 percent of those downloads. “As we see it, Hindi is going to be the big player in regional languages. Looking at its growth rate, we see our Hindi users overtaking English users,” he says. “I have been getting requests for applications in other regional languages too. In 2017, we will probably explore other languages.”
But the most popular websites continue to use English as the main language. Nikhil Pahwa, founder of MediaNama, a mobile and digital news portal, says content development in regional languages is held up because there’s no revenue model. “The kind of money local content developers are making is far less than what English content developers are making. Advertising is less because fewer people are advertising in local languages on the internet,” he adds. IAMAI notes that of the Rs 179 crore digital advertisement market, only five percent goes into local language ads. By 2020, though, with growth in local content online, that share is expected to rise to 30 percent.
The government’s e-bhasha initiative, under MeitY, is soon to be declared a mission mode project (MMP), which means it will be fast-tracked. “Once that happens, it will be about best practices for web developers to follow. There will be guidelines on how localisation of content is to be achieved and how mobile platforms have to be developed,” says a C-DAC spokesperson.
The hard graft
One stumbling block for content developers and users alike is the non-availability of user-friendly keyboards (and fonts) in Indian languages. Virtual keyboards are being used to bridge the gap, but they can be cumbersome. “Most Indian languages have 55 characters or more. They have many half-letters, diphthongs and so on. How does one type them? Combining them with a smartphone ecosystem is a challenge. At Indus OS, we have largely addressed the issue,” says Deshmukh. But he connects this to the problem of content development: “When you build a technology in Indic languages, lack of content becomes a roadblock. We need a lot of data and information in other languages as well.”
Another is the Sisyphean endeavour of updating translations and following up on changes that websites make from time to time. “Most of the government websites are dynamic – which means they keep changing their content frequently. It’s not as if you translate once and the work is done. It requires continuous updates and continuous management,” says Moona. He also mentions websites that use language as advertisers practise it, with indifferent grammar and wayward punctuation, which is especially difficult to machine-translate. Artificial intelligence and self-learning programmes are expected to reach a level of sophistication which will make human intervention unnecessary. But till then, crowdsourcing is the only way out.
To overcome the hurdle posed by widespread illiteracy, the government partnered with Indus OS in 2015 to develop text-to-speech technology for smartphones for eight languages that in future can work even without an internet connection. Indus OS already has a wide range of applications in English and 12 Indian languages. It has also teamed up with five mobile makers, including Micromax, Karbonn and Intex, to make the software available. But for this to succeed, memory and operating speeds have to be optimised for low-cost smartphones.
Beyond all divides
The trend towards internet usage in regional languages has been recognised as an opportunity by business. The government has to equally recognise that policy must be geared not only to ease things for business, but also to ensure an equitable and inclusive internet for people from across all languages and, needless to say, classes.
(The story appears in the January 1-15, 2016 issue)