As we discussed in our last blog on SOC evolution, once upon a time things were much simpler than they are today. Data volumes were initially small and largely network based, which enabled organizations to house all data on-prem, and tuck it behind a cozy perimeter where they could stack protective security technologies. Back then, organizations wanted to centralize all their data into one data store so they could manage logging and compliance alongside the detection and response capabilities they needed.
As technology evolved, new capabilities with different data types and data formats required extensibility beyond the typical network log that contained IP addresses, ports, and the number of bytes transferred. The inclusion of new data sources like threat intelligence, file monitoring, users, and applications blew up data volumes and started to erode the pipe dream of a single, centralized data store.
The “Original Gangster” (OG) – Technological Limitations
The proof, as some have said, is in the pudding. Thinking back to my time at ArcSight, I can say that the promise of a single pane of glass from which users could drive their security operations was never realized, and it has remained out of reach ever since.
For example, when ArcSight started out with its Enterprise Security Manager (ESM) product, it used Oracle as the primary relational database for its back end. We learned that Oracle could be configured either for large numbers of small transactions, such as writing event logs at volumes in the thousands per second, or as a data warehouse handling a much smaller number of bulk data imports against which to run queries and reports. Oracle could not handle both well – high volumes of sustained write activity combined with a significant number of large analytical queries against the dataset – which is, in essence, what SIEM required.
This is important because it forced the first of many changes, which led to the decentralization of centralized data way back in 2007.
ArcSight introduced Logger, which had a homegrown database intended to overcome some of the technical challenges associated with the relational databases of the time. The introduction of Logger created a tiered architecture: customers were encouraged to send the bulk of their data to distributed Logger tiers and forward a subset of those events to ESM for targeted analytics and analysis.
In theory this could have worked, but even single vendors with multiple solutions failed to build the integrations customers desired, and therefore never delivered on the promise of a single pane of glass. As a result, analysts were still constantly pivoting from ESM to Loggers to other products, looking at multiple interfaces, asking multiple questions, and having to learn and use multiple query languages. This was not intentional on ArcSight's part, nor is it meant as a criticism of the company. I'm simply trying to convey that gaining insight into decentralized data has been a difficult problem for a long time, and it continues to prove challenging to this day.
The Reality Problem – This Sh$% Is Expensive, Oh, and That Other Stuff
You may be thinking to yourself, “That is so old school. Technology has changed, and scale isn’t an issue today.”
I largely agree; however, technology was only the initial problem. Many companies have successfully cracked the code and built highly scalable, distributed collection, indexing, search, and analytics capabilities, whether with their own intellectual property or by leveraging big data technologies like the Apache stack, so scale is largely a thing of the past. Still, universal data centralization and a single pane of glass remain out of reach. There are three main reasons for this: cost, types of data, and politics and bureaucracy.
- Cost – Technologies that centralize data are expensive, and the shift to ingestion-based pricing has forced organizations to be selective about which data goes to the central repository and which does not. The data most often left out is the very verbose, high-volume sources, such as network and endpoint telemetry, that are extremely valuable in support of security investigations.
- Types of Data – In security investigations, context is king, and contextual data is generally point-in-time, meaning you must get it from the source of truth rather than from an archive. There are a few simple examples of this. One is threat intelligence, which is queried directly from threat intelligence platforms and ages out almost as fast as it is created. Another is Identity and Access Management (IAM). You can't determine from a central repository who a user is, what role they have, what access they currently hold, or whether their account is active, locked, or disabled; you need to go to the IAM system that authenticates the user and grants access in real time. A third is asset management information, which can reside in configuration management databases (CMDB) or vulnerability management systems and provides insight into the current state of an asset: its operating system, its vulnerabilities, what it's used for, and how critical it is. Asset information has always been difficult to pin down, but in a world of here-today-gone-tomorrow cloud systems, the data needed to answer these questions could live anywhere. (A simple sketch of what these point-in-time lookups look like follows this list.)
- Politics and bureaucracy – It may be disheartening to admit, but it's not only our government that is laden with politics and bureaucracy; these two challenges exist in almost every business in the world. Security suffers when teams struggle to gain access to systems and data owned by other teams or departments, and when the priorities of those functions are misaligned or competing. This often creates tension and walls that security professionals cannot work around. I can't tell you how many times in my career I've heard, "I'd like to have that data, but John runs that team or system and he won't play nice."
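To make the point-in-time context argument concrete, here is a minimal sketch of what querying the source of truth at investigation time can look like. The endpoints, field names, and `requests`-based calls are illustrative assumptions, not any particular vendor's API; the idea is simply that identity and asset context get fetched live from the IAM and CMDB systems when an alert is worked, rather than read from a centralized archive that may already be stale.

```python
# Hypothetical sketch: enrich an alert with live context from the systems of
# record instead of a centralized archive. Endpoints and fields are assumptions.
import requests

IAM_API = "https://iam.example.com/api/v1"    # assumed IAM endpoint
CMDB_API = "https://cmdb.example.com/api/v1"  # assumed CMDB endpoint


def enrich_alert(alert: dict) -> dict:
    """Attach current user and asset context to an alert at investigation time."""
    # Ask the IAM system who the user is *right now*: role, access, account status.
    user = requests.get(f"{IAM_API}/users/{alert['username']}", timeout=5).json()

    # Ask the CMDB what the asset is *right now*: OS, criticality, purpose.
    asset = requests.get(f"{CMDB_API}/assets/{alert['hostname']}", timeout=5).json()

    return {
        **alert,
        "user_role": user.get("role"),
        "account_status": user.get("status"),        # active, locked, disabled
        "asset_os": asset.get("operating_system"),
        "asset_criticality": asset.get("criticality"),
    }


if __name__ == "__main__":
    alert = {"username": "jdoe", "hostname": "web-prod-01", "rule": "suspicious-login"}
    print(enrich_alert(alert))
```

The specific calls don't matter; what matters is that the answers come from the systems that authenticate users and track assets today, which is exactly the context a centralized store ages out of.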
In short, I'm saying that universal centralization has always been a lofty goal. It was unattainable in a much simpler time, and in today's world it is honestly impossible. Companies should stop trying to force a square peg into a round hole, leave their decentralized data where it lives, and embrace new approaches to accessing, gaining context from, and acting on that data with modern capabilities.