Digitizing of government databases due to RTI, e-governance and UID will create huge amounts of data


Samir Sachdeva | April 2, 2011


Anand Naik, director systems engineering, Symantec India

Anand Naik is director, systems engineering, at Symantec India. In an exclusive interview with Governance Now, he speaks about the issues the government sector faces in storage, data management, disaster recovery and data protection. He discusses the key challenges involved and how Symantec can help address them. Edited excerpts from the interview with Samir Sachdeva:

What are the key drivers that are driving the growth of data in private and government sectors?

According to IDC, the digital universe is growing at a rate of 60% and is projected to reach nearly 1,800 exabytes this year, a 10-fold increase over the past five years. An IDC study found that the worldwide volume of digital data grew by 62% between 2008 and 2009 to nearly 800,000 petabytes. It also predicts that the amount of digital information created annually will grow by a factor of 44 from 2009 to 2020. This kind of data growth is a big challenge for organizations, since buying more storage is a drain on enterprise budgets. Ninety-five percent of this data is unstructured, and both the private and government sectors are suffering from this problem.

Companies hold on to data because either they are required to or because it makes good business sense. Information is a valuable asset for any organization, just like its employees or IP. It is very difficult to predict what kind of information will be required in the future; hence, most companies tend to save everything. The kind of policies a company has in place also has a huge impact on its information management policy. Regulation and compliance is another important driver for information growth.

For example, the recent directive issued by the Home Ministry to all telecom operators makes it mandatory for them to store all text messages for six months. This will create mountains of data for the telecom operators, considering that Indians generate between 130 and 150 billion SMSs a month. Over a period of six months, the volume of data will be huge. The recent financial crisis has also created the need for strong financial reporting. Compliance with these standards will see companies experiencing a massive increase in the data they need to deal with.
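As a rough back-of-the-envelope check on the SMS figures above, the sketch below multiplies the quoted monthly volume by the six-month retention period. The per-message size of 200 bytes (a 160-character text plus some metadata) is an assumption made purely for illustration; it is not a figure from the interview.

```python
# Rough estimate of how much SMS data a six-month retention rule implies.
# BYTES_PER_SMS is an assumed figure (text plus metadata), not from the interview.
BYTES_PER_SMS = 200
RETENTION_MONTHS = 6


def retained_messages(per_month, months=RETENTION_MONTHS):
    """Number of messages held once the retention window is full."""
    return per_month * months


def retained_terabytes(per_month, months=RETENTION_MONTHS,
                       bytes_per_sms=BYTES_PER_SMS):
    """Raw storage in terabytes (10**12 bytes), before indexes or replicas."""
    return retained_messages(per_month, months) * bytes_per_sms / 10**12


low, high = 130 * 10**9, 150 * 10**9  # monthly range quoted in the interview
for label, volume in (("low", low), ("high", high)):
    print(label, retained_messages(volume), "messages,",
          retained_terabytes(volume), "TB")
```

At the quoted volumes this means 780 to 900 billion messages retained at any one time. The storage figure depends entirely on the assumed message size, and indexes, replicas and backups would multiply it further.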

Various government initiatives also have contributed to the growth of data in this sector. An example of this is the digitization of all land records, which will generate huge amounts of unstructured data in the government sector. The $1.5 billion digitization drive has been initiated in a bid to ensure transparency and secure conclusive land titles.

The problem with managing unstructured data is that it is typically very scattered and has no assigned owner. In addition, IT teams struggle with identifying old or irrelevant data, allocating storage to the appropriate business unit or department, and understanding data usage and consumption trends. There is an urgent need to implement the right technology to handle critical, semi-structured and unstructured information.

What implication will the data growth have on government processes? Which are the key IT projects which are leading to this data growth and in which way?

The government has a focused growth roadmap and has therefore taken aggressive steps towards streamlining government processes and digitizing relevant information. The digitizing of government databases due to acts like the Right to Information, the rollout of numerous e-governance projects and initiatives like the UID are expected to create huge amounts of data in the government sector.

Some of the key initiatives in this direction, namely the UID project, mission-mode projects, state data centers and new applications within the states under the National e-Governance Plan (NeGP), will drive the need for new IT infrastructure.

The land records digitization project, started in 2009, is one example of an initiative generating unstructured data in the government sector. The $1.5 billion digitization drive has been initiated in a bid to ensure transparency and secure conclusive land titles.

The UID project is another such example. The government aims to cover half of India’s population with UID numbers by 2014. This will involve photographing a staggering 600 million Indians, scanning 1.2 billion irises, collecting six billion fingerprints and recording 600 million addresses.
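The UID totals quoted above are consistent with one photograph, two iris scans, ten fingerprints and one address per person. The short sketch below simply reproduces that arithmetic; the per-person counts are inferred from the quoted totals and are not stated in the interview.

```python
# Per-person capture counts inferred from the totals quoted in the interview.
PEOPLE = 600_000_000
IRISES_PER_PERSON = 2
FINGERPRINTS_PER_PERSON = 10


def enrolment_totals(people):
    """Total records of each kind for a given number of enrolled people."""
    return {
        "photographs": people,
        "irises": people * IRISES_PER_PERSON,
        "fingerprints": people * FINGERPRINTS_PER_PERSON,
        "addresses": people,
    }


totals = enrolment_totals(PEOPLE)
for item, count in totals.items():
    print(f"{item}: {count:,}")
```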


How will you differentiate between structured and unstructured data?

Data that grows in a predictable manner and follows a definite pattern is structured. Unstructured data is everything structured data is not. It can take any form, such as documents, spreadsheets, and emails. Unstructured data may include audio, video and unstructured text such as the body of an e-mail message or word processor document.

One way to determine whether data is structured is to ask whether it can be sorted in a programmed manner. If the answer is “yes”, the data is structured. Substantial portions of data can also be classified as semi-structured. While word processing documents and presentations are usually considered unstructured, they are actually semi-structured documents. If it is possible to search the data using search engines, it is either structured or semi-structured.
 

What challenges will be faced in management of the unstructured data? How can government minimize data loss?

The general corporate mindset is to keep everything. However, a recent IDC report found that the worldwide volume of digital data grew by 62% between 2008 and 2009 to nearly 800,000 petabytes. The report also said there is a growing gap between the amount of digital data being created and the amount of available storage.

Organizations are saving a lot of information. The same data is also often stored under different names by multiple people, leading to growth in redundant data.
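To illustrate how the same data stored under different names can be spotted, here is a minimal sketch that groups files by a hash of their contents. It assumes simple whole-file comparison on a hypothetical in-memory set of files; it is not a description of how any commercial deduplication product works, and such products typically operate at a finer granularity.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    files: mapping of file name -> file contents as bytes.
    Returns a list of name groups, one group per duplicated content.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical file share: two differently named copies of one document.
sample = {
    "budget_final.doc": b"land records budget 2011",
    "copy of budget.doc": b"land records budget 2011",
    "minutes.txt": b"meeting minutes",
}
print(find_duplicates(sample))
```

Only one copy from each group needs to be stored; the other names can point to it.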

The need of the hour is to manage this data effectively, which has a direct impact on costs.

A clear understanding of the different types of data helps in managing them differently. As a substantial portion of the data in the public domain is unstructured or semi-structured, special analytical tools are needed to analyze and understand it. Nevertheless, semi-structured and unstructured data are both generally treated as unstructured in the IT world at present.

A few challenges in managing unstructured data are:
•    How and where to store large volumes of unstructured data, along with the hardware costs, management overhead and other costs associated with storage
•    How to manage the retention of unstructured data and the related archival policies
•    How to secure unstructured data and ensure consistent security
•    Finally, how to ensure the availability and recoverability of unstructured data

The government sector will witness significant data growth in the near future. The digitizing of government databases due to acts like the Right to Information, the rollout of numerous e-governance projects and initiatives like the UID are expected to create huge amounts of data in the government sector. Data on this scale calls for systematic investment by government departments in both storing and securing it.

What will be the costs involved in the storage of this voluminous data?

A recent IDC report estimates that the total cost to manage the world's installed base of external storage is around 60 per cent of all storage related spending. This includes software, power, cooling, administration, personnel and services and excludes the cost of acquisition of the storage.

Most CIOs and vendors are now opting for technologies like thin provisioning, virtualization, automation, dynamic provisioning and federation of data to reduce total cost of ownership. Managing storage costs and increasing the efficiency of their systems is the top priority for most CIOs today. Data management is becoming more complex, and there are issues with new data types, storage in silos and low utilization, along with power and cooling challenges.

According to IDC, economies of scale have helped reduce power and cooling costs, but other costs related to external storage prove to be more expensive than the storage itself.


What are your solutions in the area of data storage? Which technologies do you use for data storage?

The solutions we offer enable users to optimize their storage costs for unstructured data and make intelligent backup and recovery decisions based on context and frequency of use. By classifying this data and assigning ownership, information loss can be prevented, and storage costs and data usage can be optimized.

Our information management strategy is to enable organizations to protect their information completely, deduplicate everywhere to eliminate redundant data, delete confidently and discover efficiently.

1.   The recently launched Symantec Data Insight for Storage gives organizations new visibility into and control over the ownership and usage of unstructured data, helping them reduce storage costs and align their information assets with business goals. Built on the Data Insight technology, Data Insight for Storage helps organizations promote accountability for storage consumption through a new chargeback process. Additionally, it gives organizations the management tools to improve storage reclamation, archiving and data lifecycle management initiatives and policies. The solution improves the efficiency of storage consumption by letting IT managers see who created, who uses and who is responsible for data, so that IT can hold business units accountable for the storage space they use.

2.   Symantec FileStore is another solution that we offer to address this issue. A clustered Network Attached Storage (NAS) system, FileStore integrates trusted Symantec solutions including Enterprise Vault, NetBackup, Endpoint Protection and SmartTier (dynamic storage tiering) to optimize the management of unstructured data. FileStore leverages Symantec's proven Veritas Cluster File System (CFS), Veritas Cluster Volume Manager, and Veritas Cluster Server to provide an incredibly robust foundation on which to host file services.

3. Backup Exec 2010 enables organizations to protect more data while reducing storage and management costs through integrated deduplication and archiving technology. It delivers centralized management to easily extend backup infrastructure across a distributed environment and remote offices so that server and desktop data protection is easily managed from a central office. Backup Exec 2010 offers a flexible approach to eliminating duplicate data without adding complexity. 

4. NetBackup 7: NetBackup 7 helps enterprise-level organizations protect, store and recover information with greater efficiency and reliability through a single, unified platform. It enables organizations to simplify information management while reducing data stores and network traffic by integrating deduplication everywhere.
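The chargeback idea mentioned in item 1 above can be pictured with a small sketch: total the storage consumed by each business unit and bill it at a flat rate. Everything here is an illustrative assumption, including the record format, the unit names and the rate per gigabyte; it does not represent the interface or pricing of Data Insight or any other product.

```python
from collections import defaultdict

RATE_PER_GB = 0.10  # hypothetical monthly rate, in currency units per GB


def chargeback(records, rate_per_gb=RATE_PER_GB):
    """Bill each business unit for the storage it consumes.

    records: iterable of (business_unit, size_in_bytes) pairs.
    Returns a mapping of business unit -> charge, rounded to two decimals.
    """
    usage = defaultdict(int)
    for unit, size in records:
        usage[unit] += size
    return {unit: round(total / 10**9 * rate_per_gb, 2)
            for unit, total in usage.items()}


# Hypothetical usage records for two departments.
records = [
    ("revenue", 500 * 10**9),
    ("transport", 200 * 10**9),
    ("revenue", 300 * 10**9),
]
print(chargeback(records))
```

The prerequisite for such a calculation is knowing which unit owns each file, which is the ownership problem described earlier in the interview.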

      
What is the mechanism for data protection in your solution?


Symantec’s information management approach enables organizations to protect completely, deduplicate everywhere, delete confidently and discover efficiently. We help organizations get information under control, store it efficiently and manage it for rapid recovery or discovery.

Protecting data completely means rethinking data protection technologies, including backup, archiving and security. Nowadays, more and more organizations are deploying active archiving, but only for one or maybe two applications, typically email and file shares. Protecting completely means eliminating physical tapes by using active archiving for long-term retention of information, and making backup a short-term disaster recovery platform.

Data loss prevention technology adds to this protection, as it can identify and track the most critical information: organizations will know where it is and how it is being used, and can implement policies to retain it and prevent inappropriate use.


What are the key disasters that can threaten these storage facilities, and how will they be addressed?
   
Storage is a key investment area for many organizations. Though expensive, organizations are required to keep data to ensure regulatory compliance or to minimize future litigation risks.

Natural disasters have emerged as one of the biggest threats to storage facilities in the recent past. Floods, earthquakes and tsunamis have increased the vulnerability of organizations to data loss. The last few years have witnessed the loss of millions of records, research papers and documents due to natural disasters.

Other factors that can damage or destroy storage facilities include material instability and an improper storage environment: temperature, humidity, light and dust all need to be controlled.

Companies in India have now started recognizing the importance of having a disaster recovery system in place. The implementation of backup, recovery and archiving technologies that help organizations store, archive, back up and restore data is on the rise. This trend is also reflected in the India findings of the Symantec Disaster Recovery Study. Now in its sixth year, the study finds that 70 percent of organizations surveyed are concerned about data loss as an impact of a disaster.

Indian enterprises are adopting new technologies such as virtualization and the cloud to reduce costs. However, the adoption of such disruptive technologies is adding complexity to their environments and leaving mission-critical applications and data unprotected.

The number of applications and the amount of data in virtual environments are expected to grow significantly in 2011, increasing the need for disaster recovery solutions that protect these applications. The India findings of the 2010 Symantec Disaster Recovery Survey show that while a little more than half of the data within virtual systems is regularly backed up, there is significant room for improvement. Seventy percent of those surveyed were concerned about data loss as an impact of a disaster.

Cloud storage is also expected to go up significantly in 2011. Respondents to the survey reported that their organizations run 29 percent of mission-critical applications in the cloud. While 41 percent of Indian enterprises report that security is their main concern in putting applications in the cloud, 27 percent responded that the ability to back up is the biggest challenge.

Symantec recommends ensuring that mission-critical data and applications are treated the same across environments (virtual, cloud, physical) in terms of DR assessments and planning. Enterprises should use integrated tool sets for managing physical, virtual and cloud environments to save time, training costs and help better automate processes.

In what areas can Symantec add value to the State Data Centres (SDCs) being established in each state?


Symantec enables organizations to have a better control over data growth with a holistic approach that enables enterprises to be truly efficient in the way information is managed and stored.

Organizations can reduce the cost and complexity of growing data volumes and ensure that their infrastructure needs are aligned with their business requirements.

Symantec’s portfolio of data center management software is architecture and infrastructure agnostic, giving maximum agility and flexibility while deriving maximum cost benefits.

Symantec Consulting Services provides a Data Center Transformation (DCT) framework, designed to cost-effectively address the challenges of IT infrastructure complexity while driving IT operational excellence, thereby enabling data center evolution and transformation—from subsidized cost center into strategic business investment.

What is the road ahead for you?
 
Information is the fundamental connecting fabric of businesses today. Businesses rely on information to run their business, operate efficiently and comply with corporate governance practices and industry regulations.
In spite of the importance that information holds, most organizations today are not organized around information. This results in the creation of information islands – an inability to secure and manage information consistently, leading to duplication, inefficiency and the inability to use information for efficiency and competitive advantage. The results of the 2010 Symantec Information Management Health Check Survey highlight the same problem. Eighty-seven percent of respondents believe in the value of a formal information retention plan, but only 46 percent actually have one. The survey also showed that many enterprises save information indefinitely instead of implementing policies that allow them to confidently delete unimportant data or records, and therefore suffer from rampant storage growth, unsustainable backup windows, increased litigation risk and expensive and inefficient discovery processes.

The fact that information is growing at a rate never witnessed before adds further to the issue. Every organization today, in any vertical, is a highly data-intensive, information-driven organization. Companies have to deal with massive amounts of data. Data is more dispersed, and the links between data are more complex, than ever before. According to a recent estimate in the Economist's special report on the “Data Deluge”, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Business-critical unstructured data is growing at an alarming rate and today accounts for 85 percent of all organizational information. In fact, according to IDC, over the next four years the digital universe will expand by 400 percent, while IT budgets will expand just 20 percent and IT staffing just 10 percent.

This information age has made it crucial for vendors to create solutions that integrate across security, compliance, storage, management, backup and recovery, so that customers derive greater value from them.

Symantec’s approach to information management is built on four pillars: protect completely, deduplicate everywhere, delete confidently and discover efficiently.

Symantec enables organizations to get information under control, store it efficiently and manage it for rapid recovery or discovery. We allow organizations to move away from information islands to an information management platform that is architecture and infrastructure agnostic, giving maximum agility and flexibility while at the same time deriving maximum cost benefits.

 
