Big data is a term often heard. Yet only the tip of the data iceberg is actually used. According to IDC (2017), more than 99% of all available data is not analyzed [1]. What is the reason for this and how could this data be used?
Big Data is a concept that has come up many times at IThappens.nu; for an explanation, see for example this article by Tjerk Timan. But what is Dark Data? Dark Data is all the data that an organization possesses but does not use for purposes other than those for which it was originally collected [2]. A lot of data within an organization can therefore be categorized as Dark Data, because it is stored but not used for anything else [3].
Three types of Dark Data can be distinguished:
- Traditionally unstructured data: This is data that is often hidden in written sources, such as emails, notes and reports. This data is unstructured because it is not organized in a database.
- Non-traditional unstructured data: This is mostly visual and audio material, which can be of value if properly analyzed. Such analyses cannot be done with traditional techniques, but the technologies that make them possible keep improving. Today, there is already a hospital trying to convert X-rays into data and analyze them [4].
- Deep Web: The last category deals with data hidden in the Deep Web. Through the Deep Web, data can be obtained that often cannot be found with standard search engines such as Google and Bing. Examples are medical and banking data. [3][5]
So Dark Data is not used to its full potential, mostly because it is not structured. However, this data can also be analyzed, using Dark Analytics. Dark Analytics is not focused on organizing large amounts of unstructured data, but on analyzing specific data within a specific scope. Before starting an analysis, three things need to be clear: what the problem is, what the desired situation would be if the problem did not exist, and what data is needed to analyze the problem.
Risks
Dark analytics is a relatively new concept which also carries its pitfalls. Some common risks are:
- Certain data protected by laws and regulations, such as personal and transactional data, may appear in Dark Data. This should be handled with care.
- Dark Data may also contain business-sensitive information, about business processes or competitive advantage, for example.
- Dark Data contains a lot of unknown and non-evaluated data. This can cause problems for users if they do not know how to handle it.
- Another major risk that applies to all types of data is reputational risk. When there is a data breach, it can result in great damage to an organization.
- Not investing carries a risk as well: if an organization chooses not to invest in Dark Analytics, a third party may be able to gain a competitive advantage. [5][6]
To minimize these risks, several strategies can be followed.
- Dark Data could be continuously inventoried and assessed. For example, it could be periodically reviewed to see if new technologies have become available that can extract value from it. In this way, it would be possible to see whether the Dark Data of the past contains valuable information for the present or future.
- All data with potential value can be encrypted. In this way it is made very difficult for unauthorized persons to read and use the data.
- It should be determined which stored (Dark) Data should be kept and which should be deleted. Clear agreements must be made about this: what criteria must the (Dark) Data meet in order to be stored, and how should it be secured?
- Many organizations perform periodic security, risk and exposure audits. In these checks (Dark) Data could also be included. [6]
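The first strategy above, continuously inventorying Dark Data, could start with something as simple as listing files that have not been touched for a long time. The sketch below is only an illustration under stated assumptions: the function name `inventory_stale_files`, the one-year threshold, and the use of a file's last-modified time as a proxy for "no longer used" are all assumptions made here, not part of the cited sources.

```python
import time
from pathlib import Path


def inventory_stale_files(root, max_age_days=365, now=None):
    """Return (path, size_bytes, age_days) for files under `root` that have
    not been modified within `max_age_days`, oldest first.

    Assumption: last-modified time is a rough proxy for "stored but unused".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_mtime < cutoff:
            age_days = int((now - info.st_mtime) // 86400)
            stale.append((str(path), info.st_size, age_days))
    # Oldest files first, so the longest-forgotten data is reviewed first
    return sorted(stale, key=lambda item: item[2], reverse=True)
```

Running such an inventory periodically, for example as part of the audits mentioned in the last strategy, would give a list of candidates to assess, encrypt, or delete. Calling `inventory_stale_files("/some/shared/folder")` (a placeholder path) returns the candidates for that folder.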
Opportunities
Despite the risks, Dark Analytics also offers opportunities. For example, Dark Analytics can help predict demand for goods and services by analyzing click behavior. Dark Analytics can also help analyze customer feedback. By analyzing server log files it becomes possible to obtain statistics related to Internet traffic. [5]
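As an illustration of the server log example, the sketch below derives a few simple traffic statistics from web server log lines. It assumes the logs are in the Common Log Format; the function name `traffic_stats` and the sample lines (which use reserved documentation IP addresses) are made up for this example.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def traffic_stats(lines):
    """Count requests per path and per status code, and count unique hosts.
    Lines that do not match the expected format are skipped."""
    paths, statuses = Counter(), Counter()
    hosts = set()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue
        paths[match["path"]] += 1
        statuses[match["status"]] += 1
        hosts.add(match["host"])
    return {
        "requests_per_path": paths,
        "requests_per_status": statuses,
        "unique_hosts": len(hosts),
    }


# Hypothetical sample data, for illustration only
SAMPLE_LOG = [
    '203.0.113.5 - - [11/May/2017:10:00:01 +0200] "GET /products HTTP/1.1" 200 5120',
    '203.0.113.5 - - [11/May/2017:10:00:09 +0200] "GET /products/42 HTTP/1.1" 200 2048',
    '198.51.100.7 - - [11/May/2017:10:01:30 +0200] "GET /products HTTP/1.1" 200 5120',
    '198.51.100.7 - - [11/May/2017:10:02:00 +0200] "GET /missing HTTP/1.1" 404 312',
    'corrupted line',
]

stats = traffic_stats(SAMPLE_LOG)
print(stats["requests_per_path"].most_common(3))
print(stats["requests_per_status"])
print(stats["unique_hosts"])
```

Even such basic counts show which pages attract the most traffic and how often visitors hit errors, which is the kind of statistic the paragraph above refers to.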
Dark data examples
Valuable use of Dark Data can be seen at Copenhagen Airport. This airport collected all kinds of data using its WiFi routers, which made it possible to track passengers in the terminal. This data was used to determine the most visited places, to which, for example, marketing or security actions could be linked. [5]
Another example where Dark Data can be of value is a manufacturer's website. At first it might be thought that only data directly related to marketing and sales is relevant, but the location of visitors can also be interesting, for example when visitors come from a country where the producer does not market its products. In this way, a potential sales market could be tapped. Even if the producer has no ambitions or means to tap a new market, this can still be interesting information for a competitor or partner. [7]
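The visitor location idea can be sketched in a few lines. The example assumes that each website visit has already been resolved to a country code (for instance through an IP geolocation step, which is not shown); the function name `untapped_markets`, the `min_visits` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def untapped_markets(visit_countries, served_markets, min_visits=2):
    """Return (country, visits) pairs for countries that generate at least
    `min_visits` visits but are not among the markets currently served,
    ordered from most to fewest visits."""
    counts = Counter(c for c in visit_countries if c not in served_markets)
    return [(country, n) for country, n in counts.most_common() if n >= min_visits]


# Hypothetical sample: one country code per website visit
visits = ["NL", "NL", "DE", "DE", "DE", "BE", "FR", "NL", "DE"]
served = {"NL", "BE"}

print(untapped_markets(visits, served))
```

With this sample data, Germany stands out as a country with repeated visits but no sales presence, while the single visit from France falls below the threshold.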
Much data that an organization possesses can be of value as long as it is properly analyzed. The data itself may not provide valuable information to the organization, but it can be valuable to other organizations. The data that is not currently being used for other purposes is called Dark Data. Using Dark Analytics, even this data can be analyzed and become valuable.
[1] Reinsel, D., Gantz, J., & Rydning, J. (2017). Data Age 2025. IDC.
[2] Gartner. (2017). Dark Data. Retrieved July 19, 2017, from Gartner: http://www.gartner.com/it-glossary/dark-data
[3] Deloitte. (2017). Tech Trends 2017: Dark analytics: Analyzing unstructured data. Deloitte.
[4] Faggella, D. (2016, August 29). Machine Learning Healthcare Applications – 2016 and Beyond. Retrieved March 3, 2017, from TechEmergence: http://www.techemergence.com/machine-learning-healthcare-applications/
[5] Chowdhury, A. P. (2017, May 11). Shining a light on Dark Analytics in the data-driven age. Retrieved from Analytics India: http://analyticsindiamag.com/shining-light-dark-analytics-data-driven-age/
[6] Tittel, E. (2014, September 24). The Dangers of Dark Data and How to Minimize Your Exposure. Retrieved from CIO: http://www.cio.com/article/2686755/data-analytics/the-dangers-of-dark-data-and-how-to-minimize-your-exposure.html
[7] Spotlessdata. (2017). Are you using your Dark Data effectively? Retrieved July 26, 2017, from Spotlessdata: https://spotlessdata.com/blog/dark-data-data-cleansing