by Sampriti Sarkar on October 31, 2017
In this age of digital transformation, data is the new currency, and an organization without it is bound to lose competitiveness in the market. While data comes in many forms, the non-traditional, unstructured data known as dark data holds significant potential if explored. Dark analytics is recognized as one of the major disruptive technologies today. This article sheds light on dark data analytics and its applicability to businesses.
What are dark data and dark analytics?
In the context of data, ‘dark’ refers to something that is hidden or yet to be unearthed. Gartner coined the term dark data and described it as data that is collected but not used for anything beyond its intended purpose. Dark data can be seen as a subset of big data, and it accounts for the largest share of it by volume.
Dark analytics focuses mainly on raw, mostly text-based unstructured data that has not been tapped or analyzed before. This data includes text messages, emails, audio, video, images, etc. Dark analytics can also target the deep web, which comprises everything on the internet not indexed by search engines, including a small portion of hard-to-access sites called the dark web. Customer information, log files, former employee information, raw survey data, financial statements, and account information are some examples of data that is often considered dark data.
Why does dark data often remain unexplored?
International Data Corporation (IDC) stated that 90% of unstructured data is never analyzed. Several reasons may explain this. Most dark data is unstructured, which makes it difficult to explore and analyze. Until recently, technology was not sufficiently advanced to harness such a huge quantity of data. Moreover, the resources required and the difficulty of analyzing dark data represent a potential opportunity cost for businesses.
In today’s business operations, data analysis provides the competence a business needs to reach its targets and stay in the competitive race. Insights from dark data can help in decision-making, chart new paths for the future, open up new opportunities, reduce risk, and increase ROI. By exploring dark data and experimenting with dark analytics, businesses can find insights about their operations and consumers that may not be available from the data they currently use. For example, server log files can reveal website visitor behavior.
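As a rough illustration of the server log example, the sketch below counts successful page requests per path and per visitor. It assumes the logs follow the Common Log Format; the function name `summarize_visits` and the sample lines are made up for illustration and are not taken from any particular tool.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def summarize_visits(lines):
    """Count successful (2xx) requests per path and per visitor host."""
    pages, visitors = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match["status"].startswith("2"):
            pages[match["path"]] += 1
            visitors[match["host"]] += 1
    return pages, visitors

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [31/Oct/2017:10:00:01 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '203.0.113.5 - - [31/Oct/2017:10:00:09 +0000] "GET /signup HTTP/1.1" 200 2048',
    '198.51.100.7 - - [31/Oct/2017:10:01:30 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '198.51.100.7 - - [31/Oct/2017:10:01:42 +0000] "GET /missing HTTP/1.1" 404 310',
    'not a log line',
]

pages, visitors = summarize_visits(SAMPLE_LOG)
print(pages.most_common(3))
print(visitors.most_common(3))
```

Even a simple count like this can show which pages draw the most traffic. A real analysis would likely add sessions, referrers, and time-of-day patterns.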
Dark data mainly comprises the following three dimensions:
• Traditional unstructured data: This includes untapped data that organizations already hold but leave idle and unexplored. Unstructured data such as emails, documents, and messages is mostly text-based and remains untouched. It has been estimated that nearly 80 to 90 percent of data in the world is unstructured.
• Non-traditional unstructured data: This dimension comprises categories of unstructured data that cannot be mined and analyzed using traditional analytics techniques. It includes audio files, video files, and still images that could not be explored until now. These may help to gain more insights into customers, employees, markets, and operations.
• Data in the deep web: This dimension covers the largest body of untapped information, including data from academics, government agencies, communities, and other third parties. It is roughly estimated that the deep web is 500 times larger than the surface web that people commonly search.
Risks associated with dark data
• Data source and authenticity: It can be difficult to trust the integrity and authenticity of data obtained through dark analytics. Without accuracy, transparency, and authenticity, companies may be exposed to regulatory, financial, and brand risk.
• Respecting privacy: The spectrum of privacy law is wide, especially for audio and video data sourced externally. It is necessary to stay aware of the variety of global privacy laws. Data that appears harmless can still carry privacy risk.
• Legal and regulatory risk: Confidential information such as credit card details can appear anywhere in dark data collections, which can create legal and financial liability. Dark data may also contain sensitive information that can compromise important business activities and relationships.
• Reputation compromise: A data breach, especially one involving sensitive and confidential data, can damage the company’s reputation. Businesses could end up losing consumer confidence and trust.
• Leveraging deep web signals: The dark web represents only a small part of the larger deep web, but it is often linked to cyber-security issues. From a cyber-risk point of view, tapping it is therefore likely to magnify risk.
• Open-ended exposure: Dark data may contain unknown and untapped sources of intelligence, causing exposure to loss or harm.
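To make the legal and regulatory risk above concrete, here is a minimal sketch of how stray payment card numbers might be flagged in free text. It assumes card numbers are 13 to 19 digits long, optionally separated by spaces or hyphens, and it uses the Luhn checksum to filter out most random digit runs. The names `luhn_valid` and `find_card_like_numbers` are hypothetical, and a production scanner would need far more care.

```python
import re

# Assumption: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

def find_card_like_numbers(text):
    """Return digit strings in text that look like payment card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# 4111 1111 1111 1111 is a widely used test number, not a real card.
sample = "Refund to card 4111 1111 1111 1111, order ref 1234567890123456."
found = find_card_like_numbers(sample)
print(found)
```

The checksum only reduces false positives. A hit still needs human review before any data is reported, masked, or deleted.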
Preventive measures when dealing with dark data
• Regular auditing and trimming of databases.
• Data should be encrypted to ensure data security.
• Data retention and disposal policies should be in place.
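A retention policy of the kind listed above could be checked with a few lines of code. The sketch below flags records whose last-modified date falls outside a retention window. The function name `expired_records`, the 365-day window, and the sample inventory are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def expired_records(records, retention_days, now):
    """Return names of records last modified before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [name for name, modified in records if modified < cutoff]

# Hypothetical inventory of (file name, last-modified date) pairs.
inventory = [
    ("survey_2014_raw.csv", datetime(2014, 3, 1)),
    ("server_2017_10.log", datetime(2017, 10, 1)),
    ("former_staff_export.xlsx", datetime(2015, 6, 15)),
]

to_review = expired_records(inventory, retention_days=365, now=datetime(2017, 10, 31))
print(to_review)
```

The function only lists candidates and does not delete anything, so a person can confirm what is disposed of during a regular audit.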
The field of dark data analysis is relatively new, but the potential it holds for the future is exciting. To date, analytics has been limited mainly to structured data, but that is slowly changing as a wide range of opportunities and insights open up from unearthing and analyzing dark data. Businesses are gearing up to use tools to analyze dark data to avoid losing competitiveness in the market.