The Dark Side of Data: Insights into Analytics and Dark Data

Conrad Rebello
May 31, 2024
9 min read

Updated: Jul 2, 2024

Large volumes of unused (dark) data sit in storage. This data is potentially valuable to product teams, but its volume and complexity make it hard to interpret.
Digging into dark data gives product managers (PMs) an edge: the customer insights it reveals can help them build better products and reduce costs.
Product insights can be unlocked through a structured approach: data discovery, cleaning, integration, analytical tool selection, and data visualization.
Unused data raises ethical concerns about privacy, security, user rights, and energy consumption, which call for responsible management practices.

The relentless march of technology brings not only innovation but also new complexities. For today's product managers, one persistent challenge lies beneath the surface of this data explosion: the underutilization of a vast and valuable resource – dark data.

What is Dark Data? Insights into Hidden Product Analytics

Figure: An iceberg whose visible tip represents structured data, while the much larger submerged portion represents dark data.

Gartner defines dark data as any digital information that is collected and stored during regular business activities but not used. Data has been called the new oil, and it has become a necessity for businesses to function. Advanced technologies make it possible to collect many kinds of data from many sources, but the question remains: is that data actually being used?

Today, the majority of the data stored in data centres is labelled as dark. The proportion of unused data varies from company to company, but it is rarely low. The sheer volume and complexity of big data can overwhelm an organization's processing capabilities, leaving much of the data unused. Organizations may also lack the infrastructure or expertise needed to handle and analyze all of their data effectively.

Moreover, an organisation might not even be aware that data is being collected, so the data keeps accumulating. The ever-growing volume of data and content being collected and stored raises concerns about whether it is necessary and what its privacy implications are. Businesses, however, argue that this data, even if it seems irrelevant now, could hold unforeseen value in the future: by carefully analyzing this vast pool of information, researchers might uncover solutions to problems we have not yet encountered. This raises the question: can product teams use dark data efficiently?

Dark Data as a Strategic Advantage for Product Managers:

The modern product manager relies heavily on data to decide how best to serve customers. Individual pieces of dark data may seem insignificant, but dark data as a whole can be a powerful tool for product managers when it is harnessed effectively.

Figure: Bubbles summarizing the benefits of using dark data, each described in detail below.

1. Data-Driven Insights:

Combining insights from unused data with traditional data provides a more complete picture of the customer journey. Sources such as customer support logs, social media mentions, product reviews, and user-generated content can offer invaluable insights into customer preferences, pain points, and behaviour patterns. This allows product managers to make data-driven decisions about product development, marketing strategies, and resource allocation.

2. Business Intelligence & Other Insights:

Beyond surveys, unused data offers a goldmine for product management teams. By tracking competitors' social media, customer service interactions, and even sensor data, teams can anticipate market trends, identify customer pain points, and optimize pricing strategies. This empowers them to make data-based decisions that keep them ahead of the curve and to focus on building winning products.

3. Data Compliance & Risk Mitigation:

Dark data sources such as internal communications, legal documents, and regulatory filings can reveal potential risks, compliance issues, or legal concerns related to products or services. By using AI and machine learning to scrutinise this data, product managers can proactively mitigate risks and ensure compliance with relevant regulations and industry standards.

4. Cost Optimization:

Dark data might be pivotal in identifying underutilized resources and inefficient processes, leading to significant cost savings. For example, by analyzing server logs, product managers can pinpoint areas where infrastructure usage can be optimized. This detailed analysis reveals opportunities to streamline operations, reduce waste, and allocate resources more effectively, ultimately driving down costs.
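The server-log example can be sketched in a few lines of Python. This is a minimal illustration, not a production tool. It assumes a hypothetical log format (`<timestamp> host=<name> cpu=<percent>`). The function names and the 20% utilization threshold are illustrative choices and do not come from the original text.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical log format: "<timestamp> host=<name> cpu=<percent>"
LINE_RE = re.compile(r"host=(?P<host>\S+)\s+cpu=(?P<cpu>\d+(?:\.\d+)?)")


def average_cpu_by_host(lines):
    """Return {host: mean CPU %} for every line matching the assumed format."""
    samples = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines in other formats are skipped
            samples[match["host"]].append(float(match["cpu"]))
    return {host: mean(values) for host, values in samples.items()}


def underutilized_hosts(lines, threshold=20.0):
    """Hosts whose average CPU use is below `threshold` percent (an assumed cut-off)."""
    averages = average_cpu_by_host(lines)
    return sorted(host for host, avg in averages.items() if avg < threshold)
```

Consider a log where `batch-7` reports CPU readings of 4.0 and 6.0 and `web-1` reports 72.5 and 68.0. Here `underutilized_hosts` returns `['batch-7']`, flagging that host as a candidate for consolidation.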

Strategies for Product Managers to Unlock Product Insights from Dark Data

1. Data Discovery:

Data discovery is the process of identifying and cataloging various data files within an organization, including untapped and unused data. By fostering collaboration and breaking down data silos, you unlock the true potential of dark data. Here are some common sources to consider -

Sensor Data:

Data generated by IoT devices and sensors embedded in products provides real-time insight into product usage, performance, and potential issues.

User Needs via Social Media:

Posts, comments, likes, shares, and other user interactions on social media platforms offer valuable insight into customer sentiment, brand perception, and market trends.

App Usage Logs:

Uncover how users interact with features within your app. Identify underutilized features, discover usage patterns, and inform future development decisions.

Customer Support Interactions:

Discover patterns in transcripts from chat logs, emails, and phone calls to understand customer pain points, identify feature requests, and gauge overall customer sentiment.

After data is discovered, classify the data based on its type (e.g., text, images, audio, video), content, and potential value.
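As a rough first pass at classification, file extensions can be mapped to broad data types with Python's standard `mimetypes` module. This sketch assumes that extensions are trustworthy, which is not always true for dark data. Inspecting the file contents would be more robust. The category names and function names are illustrative.

```python
import mimetypes
from collections import defaultdict

# Broad data types named in the text; other MIME types are grouped as "other".
CATEGORIES = {"text", "image", "audio", "video"}


def classify(path):
    """Map a file path to a broad data type based on its extension."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        return "unknown"  # no extension, or one mimetypes does not recognize
    major_type = mime_type.split("/", 1)[0]
    return major_type if major_type in CATEGORIES else "other"


def build_catalog(paths):
    """Group discovered file paths by broad data type."""
    catalog = defaultdict(list)
    for path in paths:
        catalog[classify(path)].append(path)
    return dict(catalog)
```

Files that land in "unknown" or "other" are the ones most likely to need manual review before their content and potential value can be judged.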

2. Data Cleaning:

The data cleaning step involves identifying and removing or correcting errors, inconsistencies, and inaccuracies present within the raw data. It can be approached in different ways -

Handling Missing Values:

Advanced automation tools can be used to detect and handle missing values, either by removing rows with missing data or imputing values using techniques like mean/median substitution or machine learning algorithms.
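Both strategies can be sketched with the standard library. The records, field names, and helper names below are hypothetical. The sketch uses median substitution because it is less sensitive to outliers than the mean. Machine-learning imputation is out of scope for a short example.

```python
from statistics import median


def drop_incomplete(rows, fields):
    """Strategy 1: drop any record missing one of the required fields."""
    return [row for row in rows if all(row.get(f) is not None for f in fields)]


def impute_median(rows, field):
    """Strategy 2: fill missing numeric values with the median of the observed values."""
    observed = [row[field] for row in rows if row.get(field) is not None]
    if not observed:
        return [dict(row) for row in rows]  # nothing to impute from
    fill = median(observed)
    # Build new dicts so the original records are left untouched.
    return [
        {**row, field: row[field] if row.get(field) is not None else fill}
        for row in rows
    ]
```

Dropping rows is simpler but discards information. Imputation keeps every record but can hide how much data was actually missing, so it is worth logging how many values were filled.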

Removing Duplicates:

Duplicate records can skew analysis. Cleaning tools can identify and remove exact or near-duplicate records based on defined rules or similarity metrics.
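The sketch below shows deduplication that combines a normalization rule with a similarity metric. It assumes the records are short text strings such as support-ticket titles. The similarity metric is `difflib.SequenceMatcher` from Python's standard library. The 0.9 threshold and the helper names are assumptions to tune for each data set.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case and collapse whitespace so trivial differences don't hide duplicates."""
    return " ".join(text.lower().split())


def deduplicate(records, similarity=0.9):
    """Keep the first record of any group of exact or near-duplicate strings.

    The pairwise comparison is O(n^2), so this only suits small batches.
    """
    kept = []
    for record in records:
        candidate = normalize(record)
        is_duplicate = any(
            candidate == existing
            or SequenceMatcher(None, candidate, existing).ratio() >= similarity
            for existing in (normalize(k) for k in kept)
        )
        if not is_duplicate:
            kept.append(record)
    return kept
```

With these settings, "App crashes on login", "app crashes on  login" and "App crashes on login!" collapse into the first entry, while "Export to CSV is slow" is kept.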

Standardizing Inconsistent Formats:

Data from different sources may have inconsistent formats (e.g., dates, currencies, units). Cleaning tools can standardize formats across the dataset for consistency.
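For dates, standardization can be as simple as trying each known input format and emitting ISO 8601. The list of formats below is an assumption. In practice it would come from profiling the actual sources, and ambiguous day/month orders need a rule for each source. Currencies and units would need similar, separate converters.

```python
from datetime import datetime

# Assumed input formats; slash-separated dates are treated as US-style month/day.
KNOWN_DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%b %d, %Y"]


def to_iso_date(raw):
    """Convert a date string in any known format to ISO 8601 (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

Raising an error for unrecognized values, rather than guessing, surfaces new source formats so they can be added to the list deliberately.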

After cleaning, the data may need further processing to prepare it for analysis or modeling. This can be done with the following methods -

For Text:

Tokenization - splits text into smaller units such as words or phrases.

Stemming and lemmatization - reduce variant forms of a word to a common base (e.g., "scanning" becomes "scan").

Normalization - ensures consistency, for example by converting text to lowercase or removing punctuation.
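The three text steps can be chained as in the sketch below. The stemmer here is a deliberately naive suffix stripper written for illustration. It is not the Porter algorithm or a real lemmatizer; libraries such as NLTK provide established implementations of both. The suffix list and function names are assumptions.

```python
import string

# Toy suffix list for illustration only.
SUFFIXES = ("ing", "ed", "s")
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lower-case the text and remove ASCII punctuation."""
    return text.lower().translate(_PUNCTUATION)


def tokenize(text):
    """Split text into word tokens on whitespace."""
    return text.split()


def naive_stem(word):
    """Strip one common suffix and undo a doubled final consonant ("scann" -> "scan")."""
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "class" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


def preprocess(text):
    """Normalize, tokenize, then stem each token."""
    return [naive_stem(token) for token in tokenize(normalize(text))]
```

For example, `preprocess("Scanning the QR codes FAILED, twice!")` returns `['scan', 'the', 'qr', 'code', 'fail', 'twice']`.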

For Images:

Resizing - Scaling images to a consistent size for efficient processing and analysis.

Format conversion - Converting images to the format a tool requires (e.g., PNG to JPEG).
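Image libraries such as Pillow handle resizing (`Image.resize`) and format conversion (saving with a different format) directly. The sketch below shows what resizing does without any dependency. It applies nearest-neighbour scaling to a small grid of grayscale pixel values, and the function name is illustrative.

```python
def resize_nearest(pixels, new_width, new_height):
    """Resize a row-major grid of pixel values using nearest-neighbour sampling.

    Each output coordinate is scaled back to a source coordinate with integer
    (floor) division, and that source pixel's value is copied.
    """
    if not pixels or not pixels[0] or new_width < 1 or new_height < 1:
        raise ValueError("need a non-empty image and a positive target size")
    old_height, old_width = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]
```

Nearest-neighbour scaling is the simplest method and keeps hard edges. Smoother filters such as bilinear interpolation are usually preferred for photos.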

3. Data Integration:

Following data discovery and cleaning, the focus shifts to data integration, the linchpin that unlocks the true value of unused data. Previously isolated data points, such as customer service interactions (unused data) and website behaviour patterns (traditional data), can now be connected, revealing insights that would otherwise have remained hidden. This process involves harmonizing the cleansed dark data with existing, well-structured data sources within the organization by identifying and mapping corresponding fields between disparate data sets. For instance, customer IDs in a CRM system might need to be mapped to user IDs in website analytics data. Data transformation may also be necessary to ensure consistency across formats, such as converting dates to a universal format or standardizing names. Even after cleaning, inconsistencies can surface during integration. This step therefore also involves identifying and resolving any remaining data quality issues that could hinder analysis. Finally, the consolidated data set is loaded into a central repository, such as a data warehouse or a data lake (which can store data of any type), for further analysis. This centralized hub is the foundation for powerful insights.
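The CRM-to-analytics mapping described above can be sketched as a join through an ID mapping table. The record shapes, the `id_map` table, and the function name are hypothetical. A real pipeline would typically perform this join in the data warehouse or with a dataframe library.

```python
def integrate(crm_customers, support_tickets, web_sessions, id_map):
    """Join CRM, support-ticket (dark) and web-analytics (traditional) records per customer.

    `id_map` translates web-analytics user IDs to CRM customer IDs, because the
    two systems identify people differently. Tickets for customers missing from
    the CRM are ignored; sessions whose user ID cannot be mapped are returned
    separately as a data quality issue to resolve.
    """
    unified = {
        cust["customer_id"]: {"name": cust["name"], "tickets": 0, "sessions": 0}
        for cust in crm_customers
    }
    for ticket in support_tickets:
        record = unified.get(ticket["customer_id"])
        if record is not None:
            record["tickets"] += 1
    unmatched = []
    for session in web_sessions:
        customer_id = id_map.get(session["user_id"])
        if customer_id in unified:
            unified[customer_id]["sessions"] += 1
        else:
            unmatched.append(session["user_id"])
    return unified, unmatched
```

Returning the unmatched IDs, rather than silently dropping them, makes the remaining quality issues visible before the data is loaded into the central repository.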

4. Analytical Tool Selection:

When selecting analytical tools, especially for unused data, organizations must consider factors such as the volume and variety of data sources (structured, semi-structured, and unstructured), data integration capabilities (extraction, transformation, loading, cleansing), and data visualization and reporting capabilities.

Data Type:

The data's structure plays a crucial role. Structured data (e.g., customer demographics in a CRM) is readily analyzed by traditional tools. Semi-structured data (e.g., log files) requires tools that can handle varying formats. Completely unstructured data (e.g., customer reviews) necessitates text analytics capabilities.
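Semi-structured logs often use `key=value` pairs whose fields vary from line to line. The sketch below assumes that format and uses Python's standard `shlex` module to convert each line into a dictionary, so the logs can be loaded into structured tools. The function names are illustrative.

```python
import shlex


def parse_log_line(line):
    """Turn a 'key=value' log line with varying fields into a dict.

    shlex keeps quoted values such as msg="export failed" together; tokens
    without '=' are collected under 'extra'. Unbalanced quotes raise ValueError.
    """
    record = {"extra": []}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
        else:
            record["extra"].append(token)
    return record


def to_records(lines):
    """Parse many lines and return (sorted union of column names, records)."""
    records = [parse_log_line(line) for line in lines]
    columns = sorted({key for rec in records for key in rec} - {"extra"})
    return columns, records
```

The union of column names gives a table schema in which each record simply leaves the fields it lacks empty.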

Analysis Objective:

The objective defines the question a product manager is trying to answer with unused data. To understand customer sentiment, text analytics platforms dissect reviews, social media posts, and call transcripts, gauging emotions and brand perception. Machine learning tools, on the other hand, predict future events by interpreting sensor data and user activity logs, enabling preventive maintenance and the anticipation of customer churn.
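Text analytics platforms typically use trained models, but the core idea of scoring sentiment can be shown with a tiny lexicon. The word lists, the single-word negation rule, and the function names below are hypothetical simplifications. They do not reflect how any particular platform works.

```python
import re

# Tiny hand-made lexicon for illustration only.
POSITIVE = {"love", "great", "easy", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "crash", "crashes", "confusing", "broken"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Count positive minus negative words, flipping a word's polarity after a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity  # e.g. "not fast" counts as negative
        score += polarity
    return score


def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Applied to a batch of reviews or call transcripts, the share of each label gives a first, rough signal of customer sentiment.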

Tool Capabilities:

Popular automation tools offer varying combinations of these features, but the optimal choice depends on the organization's specific data landscape, technical expertise, and long-term strategy. Statistical programming languages such as R and Python are well suited to uncovering trends and relationships within structured data.

5. Data Visualization:

Data visualization plays a crucial role in deriving findings from integrated data, including unstructured data. Effective data visualization tools help organizations transform complex and disparate data into clear, compelling, and actionable visualizations.

Support for Various Data Types:

As organizations merge diverse data sources, including structured, semi-structured, and unstructured dark data, visualization tools should support a wide range of data types, such as text, images, audio, video, and time-series data.

Storytelling with Data:

Visualization goes beyond static charts. Interactive dashboards allow users to explore the data themselves, leading to the discovery of new findings. This interactive storytelling with data empowers stakeholders to ask further questions and delve deeper into specific areas of interest.

AI/ML-Driven Visualizations:

Leverage AI and machine learning capabilities within visualization tools to automatically generate insightful visualizations from the integrated data, reducing the need for manual exploration and analysis.

Customization and Branding:

Organizations may require customization options for visualizations, including branding, colour schemes, and layout configurations, to align with their corporate identity and reporting standards.

Security and Data Governance:

Data visualization tools should provide robust security features, such as role-based access control, data encryption, and auditing capabilities, ensuring the protection of sensitive data and compliance with industry regulations.
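Role-based access control can be illustrated in a few lines. Permissions attach to roles, and users receive permissions only through their roles. The roles, permission names, and PII-masking helper below are hypothetical. Visualization tools configure this through their own administration features.

```python
# Hypothetical roles and permissions for a dashboard.
ROLE_PERMISSIONS = {
    "viewer": {"view_dashboard"},
    "analyst": {"view_dashboard", "export_data"},
    "admin": {"view_dashboard", "export_data", "view_pii", "manage_users"},
}


def is_allowed(user_roles, permission):
    """A user may perform an action if any of their roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


def visible_columns(user_roles, columns, pii_columns):
    """Hide columns containing personal data unless a role grants 'view_pii'."""
    if is_allowed(user_roles, "view_pii"):
        return list(columns)
    return [c for c in columns if c not in pii_columns]
```

Unknown roles grant nothing. This default-deny behaviour is the safer choice when access rules are incomplete.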

Ethical Considerations Arising from Dark Data & How to Address Them

1. Dark Data - Sourcing & Usage:

Responsible organisations must conduct thorough due diligence on their data management practices. This involves verifying the provenance of the data and ensuring it was not obtained through unethical or illegal means. Disorganized data often contains personal information, such as customer records or employee data, and any unintended use or breach of it is a direct privacy violation. Organizations should collect and store only the data they need. They must also develop clear data retention policies with secure disposal methods for outdated data. Building trust and safeguarding user privacy require clear internal policies.
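A retention policy can be expressed as a maximum age for each data category. This sketch assumes hypothetical categories and periods: 90 days for app logs and 365 days for support chats. Real periods depend on legal requirements and the organisation's own policy. The code only decides what to dispose of; the secure disposal itself is a separate step.

```python
from datetime import date, timedelta

# Assumed retention periods per data category.
RETENTION = {
    "support_chat": timedelta(days=365),
    "app_log": timedelta(days=90),
}


def apply_retention(records, today):
    """Split records into (keep, dispose, unclassified) according to RETENTION.

    Records whose category has no defined policy are kept but also reported as
    unclassified, since deleting them without a rule would be unsafe.
    """
    keep, dispose, unclassified = [], [], []
    for rec in records:
        period = RETENTION.get(rec["category"])
        if period is None:
            unclassified.append(rec)
            keep.append(rec)
        elif today - rec["created"] > period:
            dispose.append(rec)
        else:
            keep.append(rec)
    return keep, dispose, unclassified
```

The unclassified list is useful in its own right, because it points to data that no policy yet covers.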

2. Security Concerns & Data Governance:

Unmanaged data presents significant risks, including data breaches, regulatory non-compliance, and potential financial penalties for the organisation. The sheer volume of data further complicates the protection of critical information assets. Because the existence and location of dark data are often unknown, it becomes a prime target for malicious actors. To mitigate these risks, organizations need robust data governance and compliance strategies. Data privacy regulations such as the GDPR require organizations to have a clear understanding of the personal data they hold. Regular data audits and advanced security measures are crucial steps in transforming unused data from a potential liability into a valuable asset.
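A basic data audit can start by scanning documents for patterns that look like personal data. The two regular expressions below are simplified assumptions. They will miss some formats and flag some false positives; a date such as 2024-07-02, for instance, also matches the phone pattern. They illustrate the idea rather than replace dedicated data-classification tools. The function name is illustrative.

```python
import re

# Simplified, assumed patterns; expect both misses and false positives.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def audit(documents):
    """Return {document name: sorted PII types found}, listing flagged documents only."""
    findings = {}
    for name, text in documents.items():
        found = sorted(kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text))
        if found:
            findings[name] = found
    return findings
```

The resulting list of flagged documents is a starting point for deciding what to protect, anonymize, or delete.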

3. Right to Erase Data:

When dealing with data, organizations must address the ethical obligation to give individuals the right to access and erase their data. This requires advanced data discovery tools capable of searching disorganized data stores. It also requires robust erasure processes that can securely remove personal data from unstructured data files upon request. Simply "deleting" data from a system might not be enough, because dark data can also lurk in backups, archives, and disaster recovery systems. Solutions to consider include data overwriting, secure data wiping, and ensuring the complete removal of data from backup and archival systems.
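The point that erasure must reach every copy can be sketched as a function that removes a user's records from every store and reports what it removed. The store names and record shapes are hypothetical. Deleting entries from in-memory lists only models where data must be removed; it does not securely wipe storage media.

```python
def erase_user(user_id, stores):
    """Remove every record belonging to `user_id` from each named store.

    `stores` maps a store name (e.g. "primary", "backup", "archive") to a list
    of record dicts, which are modified in place. Returns {store name: number
    of records removed} so the erasure request can be documented.
    """
    report = {}
    for name, records in stores.items():
        before = len(records)
        records[:] = [rec for rec in records if rec.get("user_id") != user_id]
        report[name] = before - len(records)
    return report


def verify_erased(user_id, stores):
    """Confirm that no store still holds a record for `user_id`."""
    return all(
        rec.get("user_id") != user_id for records in stores.values() for rec in records
    )
```

Verifying the erasure as a separate check, instead of trusting the delete step, helps catch copies in stores that were missed.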

4. Energy Consumption:

Storing massive amounts of unanalyzed data requires significant energy for data centres. These facilities constantly run servers, cooling systems, and other equipment, contributing to electricity usage and associated carbon emissions. Redundant copies, outdated information, and data scattered across various systems increase the overall storage footprint, requiring more energy to maintain. Combating the environmental cost of dark data therefore requires a multi-faceted approach. Data centres can significantly reduce their carbon footprint by transitioning to renewable energy sources such as solar or wind power. Power purchase agreements with renewable energy providers and on-site renewable energy generation are further options to explore.
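Redundant copies are one part of the storage footprint that is easy to measure. The sketch below groups byte-identical files by their SHA-256 digest. It then estimates how many bytes could be reclaimed by keeping only one copy from each group. It assumes file contents are already loaded into memory, whereas a real scan would stream files from disk. The function names are illustrative.

```python
import hashlib
from collections import defaultdict


def find_redundant_copies(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a name to its bytes. Grouping by digest compares each file
    once instead of comparing every pair of files.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only groups with more than one member are redundant copies.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def reclaimable_bytes(files):
    """Bytes freed by keeping exactly one copy from each duplicate group."""
    return sum(
        len(files[group[0]]) * (len(group) - 1) for group in find_redundant_copies(files)
    )
```

Which copy to keep, for example the one in the system of record, is a policy decision the code deliberately leaves open.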

Figure: A balance scale alongside the ethical considerations listed above.

In conclusion

As we look to the future, the potential for product managers to leverage dark data becomes increasingly promising. With tools such as Google Analytics and other advanced analytics platforms, product managers can uncover qualitative and quantitative insights that were previously hidden. These insights can drive the development of new features and enhance product portfolios, enabling more personalized user experiences throughout the product lifecycle.

The ongoing digital transformation of business demands that product managers use dark data to make informed decisions based on real-time input, customer behaviour, and iterative feedback. By integrating dark data into agile methodologies, they can iterate rapidly and adapt to fast-paced market changes, keeping products aligned with user needs and business goals.

Moreover, as organizations strive to optimize their budget and resources, the ability to leverage dark data will be crucial in enhancing UX, improving user engagement, and adhering to best practices. The future scope for dark data is vast, promising a new era of innovation and efficiency in meeting dynamic business needs and driving sustained success.