Today’s business leaders have myriad technologies at their disposal to power innovation initiatives, and many of those initiatives are dedicated to transforming data into valuable insights so that organizations become truly data-driven. The goal is to have the right information at the right time, which nowadays usually means in near real-time.
These insights will shape our strategy and decisions around two main areas. On the one hand, we will create our data-backed customer intelligence that will become the foundation for customer experience improvements and will lead the way towards the latest trends: hyper-personalization and micro-segmentation.
On the other hand, we will need precise and timely information to realize new levels of operational efficiency, spanning digital transformation in operations, Industry 4.0 programs, the Internet of Things (IoT), process automation, predictive analytics, and other must-have trends.
Therefore, as data becomes a primary raw material for any innovation project, having the right data is imperative. But the harsh reality is that business leaders are aware of the shortcomings of their current data lakes and data warehouses, the weaknesses of the technologies used to collect data, and the fact that highly valuable data sets are not being leveraged at all.
The term Dark Data has not been around for long and, for some years, was eclipsed by Big Data. Dark Data refers to the information assets that companies generate during their regular activity but do not collect or store, and therefore never use to derive insights or support decision making.
Dark Data has come onto the radar of many organizations because of how pervasive it is and because of the opportunity that leveraging it holds for business growth and competitive advantage.
According to KPMG research, 80% of companies’ data is Dark Data: a tremendous hidden resource that flows untapped through major organizations. The iceberg metaphor has been used extensively to represent the vast amount of hidden (uncollected) data that most companies keep underwater. So Dark Data is not just a small portion of Big Data; it is the most significant slice of the pie, and it holds massive potential for those who want to harness it.
Operational Technology (OT) is the name for the hardware and software that control, automate, and monitor industrial devices and machines.
These technologies were designed and implemented well before modern computer systems appeared and Information Technology (IT) became a discipline in its own right. As a result, OT and IT have evolved in parallel with very limited touchpoints.
The reality, however, is that IT has outpaced OT in terms of cost, flexibility, and available workforce, among other relevant areas. Due to the critical nature of the industrial processes managed by operational technologies, most vendors developed systems that were highly resilient and reliable. These features came at the expense of interoperability, because few standards existed; and even when standards were defined, many vendors preferred to remain on the proprietary side, claiming the benefits of specific add-ons.
This journey is illustrated by the hundreds of different, incompatible industrial automation protocols currently in use. As a result, highly valuable operational data is trapped inside machines, devices, and sensors. Vendor lock-in, proprietary protocols, and lack of interoperability have prevented machine data from being shared and used to govern and unlock efficiencies in modern companies.
The gap between OT and IT is more significant than ever. As an example, Part 14 of the OPC UA standard, known as Publish-Subscribe, was only announced in 2018. Yet the Isis Toolkit, described at the 1987 ACM Symposium, already offered this functionality, and IBM MQSeries has provided it at enterprise scale since 1993.
There are many technologies to extract data from industrial machines and devices, including but not limited to OPC, SCADA, network sniffing, and proprietary monitoring protocols.
As the volume of machine, sensor, and IoT data grows, so does the need to access it as soon as it is available: the value of this type of data degrades with time. Because these flows arrive at high rates, they quickly accumulate into large volumes that become cumbersome to manage and impossible to process through manual processes and traditional hand-coding approaches. Harnessed well, this continuous flow of messages and events can increase the effectiveness, agility, and responsiveness of decision making and operational intelligence.
Wi-Fi, for example, is a ubiquitous technology present in almost every facility, public or private, indoors or outdoors. In practice, Wi-Fi is mostly used to provide networking connectivity to internal systems or the Internet. However, the power of Wi-Fi from a data and analytical perspective goes beyond pure wireless connectivity: each Wi-Fi access point (the device that connects your endpoint to a wired network) already generates a wide variety of data that remains poorly explored.
Each access point holds relevant information about the endpoint devices it sees (identified by MAC address), their signal strength, and other in-depth technical metrics.
These Dark Data can be used to deliver mobility intelligence to companies, such as locating devices and understanding how they move around a facility.
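As a rough illustration, the kind of mobility intelligence described above can be derived from nothing more than per-access-point signal readings. The record layout and names below are hypothetical, a minimal sketch rather than any vendor's API:

```python
from collections import defaultdict

def strongest_ap_per_device(observations):
    """For each client MAC address, pick the access point reporting the
    strongest signal (highest RSSI, i.e. closest to zero) at each
    timestamp, then order the results into a per-device trajectory.
    `observations` is a list of (timestamp, ap_id, mac, rssi_dbm)
    tuples; this record shape is an assumption for illustration."""
    best = {}  # (timestamp, mac) -> (rssi, ap_id)
    for ts, ap, mac, rssi in observations:
        key = (ts, mac)
        if key not in best or rssi > best[key][0]:
            best[key] = (rssi, ap)
    trajectory = defaultdict(list)
    for (ts, mac), (_, ap) in sorted(best.items()):
        trajectory[mac].append((ts, ap))
    return dict(trajectory)

# A device seen moving from the lobby AP toward a meeting-room AP:
obs = [
    (1, "ap-lobby", "aa:bb", -40), (1, "ap-room", "aa:bb", -70),
    (2, "ap-lobby", "aa:bb", -65), (2, "ap-room", "aa:bb", -45),
]
print(strongest_ap_per_device(obs))
# → {'aa:bb': [(1, 'ap-lobby'), (2, 'ap-room')]}
```

Nearest-AP association is the crudest form of indoor positioning; production systems typically triangulate across several access points, but the raw material is the same signal data the AP already emits.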
Networking is the technology behind the Internet and has reshaped information exchange in modern society. Not every byte transmitted over a network (mostly using TCP/IP protocols) is stored, because storing it all is hardly feasible; yet this network traffic hides massive potential in terms of Dark Data.
Tons of in-transit data remain hidden and unleveraged due to the difficulty of collecting and processing these temporary transactions flowing over the network (B2B API, XML integrations, read-only transactions, product searches, quotes, inventory checks, etc.) without the hassle of modifying the backend systems. Using technologies such as network sniffing and deep packet inspection (DPI), the network transit data can be collected and transformed into valuable insights.
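To give a flavor of what packet inspection involves, the sketch below decodes the fixed IPv4 and TCP header fields of a raw packet using only the Python standard library. It is a toy: real DPI engines reassemble streams and parse application payloads, and capturing live traffic requires raw sockets or a capture library such as libpcap.

```python
import struct

def parse_ipv4_tcp(packet: bytes) -> dict:
    """Decode the IPv4 header and, for TCP packets, the source and
    destination ports. Layouts follow RFC 791 and RFC 793."""
    ihl = (packet[0] & 0x0F) * 4            # IPv4 header length in bytes
    proto = packet[9]                        # protocol: 6 means TCP
    src = ".".join(str(b) for b in packet[12:16])
    dst = ".".join(str(b) for b in packet[16:20])
    if proto != 6:
        return {"src": src, "dst": dst, "proto": proto}
    sport, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
    return {"src": src, "dst": dst, "proto": 6,
            "sport": sport, "dport": dport}

# Build a synthetic 10.0.0.1 -> 10.0.0.2 TCP packet to port 443:
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 44, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
packet = header + struct.pack("!HH", 51000, 443)
print(parse_ipv4_tcp(packet))
# → {'src': '10.0.0.1', 'dst': '10.0.0.2', 'proto': 6,
#    'sport': 51000, 'dport': 443}
```

Even this shallow decoding already reveals who is talking to whom and over which service, without touching the backend systems themselves.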
Companies are not always monolithic when considering information systems. Many businesses have a distributed nature, such as retail or manufacturing, in which both brick-and-mortar shops and factories are spread out over many locations.
Those distributed locations also run information systems, both software and hardware, and interoperability between these satellites and central (headquarters) systems has always been an issue, given the range of solutions in place: client-server architectures, batch synchronization, fully independent deployments, and many hybrid approaches in between. Unconsolidated IT landscapes, with distributed locations running legacy systems and/or different technologies, are the main reason why companies still have data silos, and integrating those silos into corporate intelligence is a big challenge.
Data integration technologies, such as file transfer or ETL for database synchronization, and edge computing, which distributes custom logic across many locations, represent the Swiss Army knife for companies looking to consolidate distributed data.
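A common lightweight pattern for this kind of consolidation is watermark-based incremental synchronization: each satellite ships only the records produced since its last successful sync. A minimal sketch, with an assumed (timestamp, record) row shape and an in-memory list standing in for the headquarters store:

```python
def incremental_sync(satellite_rows, hq_store, last_watermark):
    """Ship only rows newer than the last synced watermark (the
    'extract' and 'load' halves of a tiny ETL), and return the new
    watermark to persist for the next run."""
    new_rows = [(ts, rec) for ts, rec in satellite_rows
                if ts > last_watermark]
    hq_store.extend(new_rows)   # load at headquarters
    return max((ts for ts, _ in new_rows), default=last_watermark)

shop_rows = [(1, "sale A"), (2, "sale B"), (3, "sale C")]
hq = [(1, "sale A")]            # row 1 was synced previously
watermark = incremental_sync(shop_rows, hq, last_watermark=1)
print(hq, watermark)
# → [(1, 'sale A'), (2, 'sale B'), (3, 'sale C')] 3
```

Because the watermark only advances on success, re-running a failed sync is safe: rows already shipped are filtered out by the timestamp check.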
Leveraging traditional data requires a traditional data integration process, such as the well-known Extract-Transform-Load (ETL), currently supported by multiple products. For Dark Data, however, we need to tackle more sophisticated and complex data sources. Three steps are needed to leverage Dark Data and make it usable: Capture, Process, and Store.
The capture step is about getting your hands on interesting data sources. Depending on the technique used, capture can be more or less invasive: polling (request and response) techniques can add overhead to the systems being observed, while more innovative approaches such as network sniffing and deep packet inspection gather the data without impacting the related systems.
The process step consists of cooking the raw captured data to make it usable afterwards. Often the captured data will be massive and require some degree of summarization; in other cases it will need enrichment, such as translating codes into understandable labels. Data processing should be strongly optimized to meet the requirements for near real-time insights. Edge computing, for example, helps streamline the processing of Dark Data: by transforming large data sets into pre-processed data (smart data) close to the source and sending only cleansed, purpose-oriented data, it avoids unnecessary network bandwidth consumption and costly massive computing power in cloud servers.
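As a sketch of that edge-side summarization, the function below collapses raw sensor readings into per-window statistics so that only compact smart data needs to travel upstream. The (timestamp, sensor_id, value) record shape is an assumption for illustration:

```python
from collections import defaultdict

def summarize_at_edge(readings, window_s=60):
    """Aggregate raw readings into min/max/mean/count per sensor per
    time window. One summary row replaces many raw rows, which is the
    bandwidth saving edge pre-processing is after."""
    buckets = defaultdict(list)
    for ts, sensor, value in readings:
        buckets[(sensor, ts // window_s)].append(value)
    return {
        key: {"min": min(vals), "max": max(vals),
              "mean": sum(vals) / len(vals), "count": len(vals)}
        for key, vals in buckets.items()
    }

readings = [(0, "temp-1", 10.0), (30, "temp-1", 20.0), (61, "temp-1", 30.0)]
print(summarize_at_edge(readings))
# → {('temp-1', 0): {'min': 10.0, 'max': 20.0, 'mean': 15.0, 'count': 2},
#    ('temp-1', 1): {'min': 30.0, 'max': 30.0, 'mean': 30.0, 'count': 1}}
```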
The final store step closes the ingestion. After tapping the right data source and processing whatever is needed to yield high-quality data, permanent storage is necessary to make the data usable. The location (cloud, on-premises), durability (volatile, permanent), structure (SQL, XML, JSON, etc.), and transport technology (HTTPS, MQTT, file, SQL database) all need to be decided; but at the end of the day, once captured and processed, our Dark Data becomes data ready to be used for any further purpose, like any other data set. At the end of this step, our Dark Data has mutated into “standard” data.
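The three steps can be wired together as a simple pipeline. Everything here (the canned capture source, the code-to-label enrichment, the in-memory JSON sink) is a hypothetical stand-in, not a real Datumize component:

```python
import json

def run_pipeline(capture, process, store):
    """Capture -> Process -> Store: each captured record is enriched
    and then persisted by the sink."""
    for raw in capture():
        store(process(raw))

def capture():
    # Stand-in for a sniffer or poller: a canned stream of raw records.
    yield {"code": "A1", "value": 42}
    yield {"code": "B2", "value": 7}

LABELS = {"A1": "temperature", "B2": "pressure"}
def process(raw):
    # Enrichment: translate opaque codes into understandable labels.
    return {"metric": LABELS.get(raw["code"], "unknown"),
            "value": raw["value"]}

sink = []  # a real sink would be a database, object store, or broker
def store(record):
    sink.append(json.dumps(record))  # structure: JSON

run_pipeline(capture, process, store)
print(sink)
# → ['{"metric": "temperature", "value": 42}',
#    '{"metric": "pressure", "value": 7}']
```

After the store stage, the records are ordinary structured data that any downstream analytics tool can consume.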
Leading organizations are already in the race to adopt innovative technologies that leverage valuable Dark Data.
The ocean of opportunities that this unexplored resource opens for business and operational improvement is hugely appealing to companies around the world. Industries such as Finance, Healthcare, Travel, Hospitality, Logistics, Retail, Manufacturing, Telecommunications, Energy, and Utilities are all undergoing a profound and continuous transformation in which technology is one of the main pillars, and letting go of the potential of more than 80% of their data is an acknowledged pain point for all of them.
As Datumize technology now makes it possible to collect and manage this Dark Data efficiently and affordably, we are continually discovering new use cases and success stories, even in sectors that are less mature in terms of digitalization and data analytics.
(Lightweight, high-performance, streaming enterprise data integration software for Dark Data)
(Cloud management graphical platform that covers the complete lifecycle of Datumize’s products)