April 22, 2021

Top Challenges with Data Centralizing for Threat Investigations

Threat investigations are among the most important tasks security analysts face today. To quantify their importance and complexity, consider two statistics from IBM’s “Cost of a Data Breach Report 2020.” According to the report, the average time to detect and contain a data breach caused by a malicious actor was 315 days. That’s a long time. We’ve all heard the saying that “time is money,” and the report bears it out: “Organizations that are able to contain a data breach in less than 200 days saved an average of $1.12 million compared to organizations that took more than 200 days to contain a breach.” That is pretty compelling.

Threat investigations have long relied on data centralization, namely the SIEM, which promised a “single pane of glass.” With all data in one location, analysts can detect, investigate, and contain threats faster.

However, organizations have long faced challenges with centralizing data. Some of these challenges have evolved over time, but we’re going to focus on the three main ones faced today.

Challenge 1: Too Many Data Sources

Cloud services are the future of business. However, trying to collect, normalize, and store all of the data that hybrid and multi-cloud infrastructures generate is a beast. Setting up a centralized location that can get you real-time visibility into this fractured architecture means bringing together your:

On-premises IT
Infrastructure-as-a-Service (IaaS)
Platform-as-a-Service (PaaS)
Software-as-a-Service (SaaS)

Each of these must feed into your Security Information and Event Management (SIEM), Log Management System (LMS), or integrate with your Security Orchestration, Automation, and Response (SOAR) tool.

Pulling data from all of the different locations that generate it into a single location can feel overwhelming. In fact, I’ve never seen an enterprise with all of its data in one place. You need to manage collection, data classification, normalization, retention periods, and TTLs on time-sensitive data. It’s like loading the dishwasher: you need to plan how to fit everything in one place and hope it’s all dishwasher safe, so that it is useful the next time you need it.
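To make the retention and TTL bookkeeping concrete, here is a minimal Python sketch. The classification names, retention periods, and the Record type are hypothetical values chosen for illustration; they are not taken from any particular product or policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data classification (illustrative only).
RETENTION = {
    "auth_log": timedelta(days=365),
    "netflow": timedelta(days=30),
    "threat_intel": timedelta(days=7),  # time-sensitive indicators age out fast
}


@dataclass
class Record:
    classification: str
    collected_at: datetime


def is_expired(record: Record, now: datetime) -> bool:
    """Return True when a record has outlived its class's retention period."""
    return now - record.collected_at > RETENTION[record.classification]


now = datetime(2021, 4, 22, tzinfo=timezone.utc)
records = [
    Record("auth_log", now - timedelta(days=200)),
    Record("netflow", now - timedelta(days=45)),
    Record("threat_intel", now - timedelta(days=3)),
]

# Keep only records that are still inside their retention window.
kept = [r for r in records if not is_expired(r, now)]
print([r.classification for r in kept])
```

Even in this toy form, every new data source adds another classification and another retention rule to maintain.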

Challenge 2: Too Many Data Types

Most security professionals will agree that data security acronyms are the alphabet soup of technology. When trying to aggregate data into a single location, you can feel like someone trying to add too many alphabet noodles to the soup. You know that the more data you aggregate, the better the investigation will be. However, each data source uses a different format, making it hard to correlate events across a distributed infrastructure. Ultimately, how the data is presented is as important as the data itself, because presentation helps the analyst understand how events are related and interpret what they mean.

To appropriately correlate data for meaningful searches, you need to be aggregating, at minimum, the following data types (as available in your environment):

Identity and Access Management (IAM)
Vulnerability Assessment (VA)
Endpoint Detection and Response (EDR)
Intrusion Detection Systems (IDS)
Threat Intelligence (TI)
User and Entity Behavior Analytics (UEBA)
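Because each of these sources reports events in its own format, correlation usually starts by mapping every event onto a shared schema. The sketch below assumes two made-up event shapes, one for an IAM source and one for an EDR source. All field names are hypothetical and would differ for real products.

```python
from datetime import datetime, timezone


def normalize_iam(event: dict) -> dict:
    # Hypothetical IAM shape: ISO 8601 timestamp, "principal" holds the user.
    return {
        "timestamp": datetime.fromisoformat(event["time"]),
        "user": event["principal"].lower(),
        "source": "iam",
        "action": event["eventName"],
    }


def normalize_edr(event: dict) -> dict:
    # Hypothetical EDR shape: epoch seconds, "DOMAIN\\user" style account names.
    return {
        "timestamp": datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        "user": event["account"].split("\\")[-1].lower(),
        "source": "edr",
        "action": event["activity"],
    }


iam_event = {"time": "2021-04-22T10:00:00+00:00", "principal": "JDoe", "eventName": "ConsoleLogin"}
edr_event = {"ts": 1619085900, "account": "CORP\\jdoe", "activity": "process_start"}

# Once both events share a schema, they can be ordered into one timeline per user.
timeline = sorted(
    [normalize_edr(edr_event), normalize_iam(iam_event)],
    key=lambda e: e["timestamp"],
)
for entry in timeline:
    print(entry["timestamp"].isoformat(), entry["user"], entry["source"], entry["action"])
```

Two sources need two mapping functions; six data types from several vendors each need many more, and every one has to be kept current as formats change.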

While everyone loves having more alphabet pasta in their soup, no one enjoys trying to collect and aggregate the alphabet of security into a single location.

Threat investigations today are the equivalent of that bowl of soup. Analysts are expected to scan, identify, and collect each letter, then organize them and reassemble them word by word into a complete sentence by hand.

Challenge 3: Too Expensive

Security data centralization is expensive. Consider for a minute the size of just event log information. Per Google, the maximum sizes are:

256 KB: size of a log entry
512 B: length of a log entry label key
64 KB: length of a log entry label value

Even more concerning, VMware’s log ingest sizing guidance lists the following:

Small: 30 GB/day log ingest rate requiring 8 GB memory to manage 2,000 events per second
Medium: 75 GB/day log ingest rate requiring 16 GB memory to manage 5,000 events per second
Large: 225 GB/day log ingest rate requiring 32 GB memory to manage 15,000 events per second
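A quick back-of-the-envelope calculation shows what these tiers imply. The Python sketch below assumes that 1 GB equals 10^9 bytes and that the events-per-second figure is sustained around the clock; both are simplifying assumptions, not statements from the vendor.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# (GB per day, events per second) for each tier listed above.
TIERS = {
    "small": (30, 2_000),
    "medium": (75, 5_000),
    "large": (225, 15_000),
}


def avg_event_bytes(gb_per_day: float, events_per_second: int) -> float:
    """Average bytes per event, assuming the event rate is sustained all day."""
    events_per_day = events_per_second * SECONDS_PER_DAY
    return gb_per_day * 1e9 / events_per_day


def yearly_terabytes(gb_per_day: float) -> float:
    """Raw volume accumulated over 365 days, before any compression."""
    return gb_per_day * 365 / 1000


for name, (gb, eps) in TIERS.items():
    print(f"{name}: ~{avg_event_bytes(gb, eps):.0f} bytes/event, "
          f"{yearly_terabytes(gb):.2f} TB/year")
```

Under these assumptions every tier works out to roughly 174 bytes per event, and the large tier accumulates about 82 TB of raw log data per year.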

Moreover, these figures cover only event log data, even though you might also be collecting information from IAM, IDS, and EDR solutions separately, and they cover only the infrastructure.

Compound this further with the cost of software from data storage vendors and their much-detested ingestion-based pricing models. Ultimately, collecting all of this information and bringing it to a single location becomes cost-prohibitive.

What this means is that organizations are forced to choose what information they will collect and are likely to struggle when an incident arises.

Query.AI: Centralized Insights without Centralizing Data

Query.AI has a new and unique approach to threat investigations, giving analysts the power to access data where it lives instead of centralizing it. Analysts get the access and centralized insights they need, and the three primary problems organizations face when trying to centralize data are alleviated.
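The general idea of querying data in place, rather than copying it first, can be illustrated with a generic fan-out search. This Python sketch shows only the broad federated pattern and does not represent Query.AI’s implementation. The source names, sample rows, and search functions are hypothetical stand-ins for calls to each tool’s own API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data left in its native location; a real search_source would
# call each tool's own API instead of reading from an in-memory dict.
SOURCES = {
    "edr": [
        {"user": "jdoe", "event": "process_start"},
        {"user": "asmith", "event": "file_write"},
    ],
    "iam": [
        {"user": "jdoe", "event": "login"},
    ],
}


def search_source(name: str, user: str) -> list:
    """Search one source in place and tag each hit with where it came from."""
    return [dict(row, source=name) for row in SOURCES[name] if row["user"] == user]


def federated_search(user: str) -> list:
    """Fan the same question out to every source and merge the answers."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, user) for name in SOURCES]
        return [row for future in futures for row in future.result()]


results = federated_search("jdoe")
for row in results:
    print(row)
```

Nothing is copied or stored centrally in this sketch; only the matching rows travel back to the analyst.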

Query.AI provides the market’s only security investigations control plane for modern enterprises. Its browser-based platform delivers real-time access and centralized insights across on-premises, multi-cloud, and SaaS applications, without duplicating data from its native locations. With Query.AI, security teams gain a simple and effective way to meet their security investigation and response goals while simultaneously reducing costs.

Contributed by:

Query