Cloud Data Invisibility Blog Neal Bridges

April 13, 2023

Overcoming Cloud Data (In)visibility without Centralization

The security search revolution has begun – and I’m 100% on-board.

‘Invisibility’ might be a strong word, but ‘limited visibility’ is an understatement. Cloud is taking over business operations at an aggressive pace, but our access to the mounting data is restricted by the amount of money we are willing to spend on our SIEM or other centralization technology (AWS S3, Snowflake, etc.). This is not an acceptable security practice, but it is the way it is – or at least that’s how the industry has traditionally seen it. Thankfully, a better way is on the rise, and I am so grateful for this team of pioneers!

I remember when I created the SOC at Abbott Laboratories over a decade ago. This was the first time I put the theory I had learned from my experience as an Ethical Hacker into practice building my own successful security operation. The one thing I was certain about on my first day: visibility is key. I needed a SIEM, and because most of the threats hit the user first, I needed visibility into the users. I needed a reference architecture model that consisted of visibility into the endpoints, including email. And I needed log centralization. Voila! Now I can secure the network while I listen to “#SELFIE” by The Chainsmokers.

Fast forward to 2023. Visibility is still key, but the scope of visibility has drastically changed in our post-COVID world.

Visibility isn’t just about the users on the endpoint anymore. Now, we have this sprawling cloud infrastructure with very articulate and well-designed cloud applications like Salesforce or HubSpot, Azure or AWS, and the list goes on. The perimeter has effectively extended well beyond a physical on-prem base to each and every application your employees open from wherever they are in the world. Eureka! Visibility needs to include your cloud infrastructure! Unfortunately, that’s easy to say but hard to implement, because you’ve got hundreds, if not thousands, of microservices to account for. In order to manage full-court visibility, we have several major problems to consider.

Problem 1: Inventory

Inventories are huge. I mentioned this in my last blog: you need a software bill of materials.

On-prem infrastructure is mostly hardware inventories. If you bought SAP or Oracle, you got a license, and you installed it on a server inside of a data center. So you really only kept inventory on the physical data center or physical server inside the data center. Easy peasy.

Now, you don’t usually get the same level of hardware licensing with virtual infrastructure, like Docker, Kubernetes, etc. Much of the software used to build modern-day SaaS applications is derived from the open source community. Unfortunately, it often does not come with a bill of materials. How do you protect it if you don’t know what’s in it?

You might have an elastic container that has a digital copy of a piece of software. Or, you may not be able to keep an inventory of that virtual server, because a truly elastic server infrastructure expands and contracts based on load inside of the cloud. You spin up more of these servers and more of these microservices as you expand your service offering. How are you going to keep track of that virtual infrastructure as it expands and contracts with business demand?

Alas, we have no real visibility into what our risk footprint is as we expand into the cloud.

Problem 2: Vulnerability Management

Many cyber security experts haven’t yet put into perspective the problem with visibility into vulnerabilities in your cloud environment.

In legacy on-prem, you would likely have an internal and external vulnerability scanner like a Nessus or Qualys. You would scan an IP address, receive a set of vulnerabilities, submit those to IT, and IT would be able to patch the vulnerabilities. It was pretty linear. Of course, vulnerabilities don’t exactly work like that inside the cloud.

Let’s say you have a container that is only instanced when a particular microservice is being called for that application. The container may include a piece of code that has a vulnerability in it. The container spins up, and you get a vulnerability warning from something like Amazon Inspector. But then the container shuts down once that microservice has finished running. What’s the actual vulnerability exposure there?

In an on-prem infrastructure, a vulnerability runs 24x7x365. It is always on in your infrastructure. If an attacker were to be in your network, scan your environment, and see that vulnerability, they’d be able to exploit it because it is always there. But in cloud infrastructure, a vulnerability may only exist for 15 minutes before the container spins down. That microservice may then be spun up 1000 times to support your organization. Is that 1000 vulnerabilities, or is it only one vulnerability with one particular piece of software? Do you still track it? Keep a record of it? Do you still try to patch it? What’s the actual risk?
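One way to frame the "1000 vulnerabilities or one?" question is to count both ways and look at the difference. The sketch below is only an illustration under my own assumptions: the Finding record, the image name, and the CVE identifier are made up, and a real scanner will report findings in its own format. It groups per-container alerts by the image and the vulnerability, so you can see how many instances were affected and how many things you would actually have to patch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    container_id: str  # short-lived instance that triggered the alert
    image: str         # image the container was started from
    cve: str           # vulnerability identifier reported by the scanner


def summarize(findings):
    """Collapse per-container findings into one record per (image, CVE) pair.

    The value for each pair is the number of distinct containers affected.
    """
    instances = defaultdict(set)
    for f in findings:
        instances[(f.image, f.cve)].add(f.container_id)
    return {key: len(ids) for key, ids in instances.items()}


# Hypothetical data: one vulnerable image launched 1000 times.
findings = [
    Finding(f"c-{i}", "billing-service:1.4", "CVE-0000-0001")
    for i in range(1000)
]
summary = summarize(findings)

print(len(findings), "raw findings")
print(len(summary), "unique (image, CVE) pairs to patch")
```

Counting by image gives you the patching workload, while the per-instance count is a rough signal of how often the exposure was live. Neither number answers the risk question on its own.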

Vulnerability and risk management best practices are currently under construction, but you’ve still got to protect your organization while we figure it out.

Problem 3: Centralization

Centralization was originally the cure for problems one and two but, as these things go, it is now a problem too. The traditional solution was to take all of your CloudWatch data, CloudTrail data, on-prem data, vulnerability data, etc. and pull it into your SIEM.

I think back on my decision tree from Abbott, and it was clear that I first needed to centralize. I set up a SIEM to have a single pane for all of my data. But, in today’s cloud environment, that is now completely cost prohibitive. Cloud infrastructure has effectively doubled, tripled, or maybe quadrupled the amount of data that needs to be centralized.

So, the current situation is that you know you need to centralize, but only have so much money. Let’s assume you have $10 million to spend on Splunk or Elastic or some other SIEM vendor. You find out what size bucket that $10 million buys for you: one, two, three, four terabytes, whatever the case is. As a security leader, you have to look at the size of the bucket and decide, “What is the most important thing I can put into that bucket?” No leader puts 100% of their infrastructure, or even 100% of the most important things, into that bucket because there just isn’t enough room.
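To put rough numbers on the bucket problem, here is a small sketch. Everything in it is an assumption for illustration: the source names, the daily volumes in GB, and the 1000 GB/day capacity are invented, and real SIEM licensing is priced in many different ways. It fills the bucket in priority order and shows how much of the total data ends up visible.

```python
def fill_bucket(sources, capacity_gb):
    """Pick sources in priority order, skipping any that no longer fit.

    `sources` is a list of (name, gb_per_day) tuples, most important first.
    Returns the chosen source names and the capacity they consume.
    """
    chosen, used = [], 0.0
    for name, gb_per_day in sources:
        if used + gb_per_day <= capacity_gb:
            chosen.append(name)
            used += gb_per_day
    return chosen, used


# Hypothetical daily log volumes, listed in the order a leader might rank them.
sources = [
    ("endpoint/EDR", 400.0),
    ("email", 150.0),
    ("CloudTrail", 600.0),
    ("CloudWatch", 900.0),
    ("SaaS audit logs", 250.0),
]
capacity = 1000.0  # GB/day the budget buys (assumed)

chosen, used = fill_bucket(sources, capacity)
total = sum(gb for _, gb in sources)

print("centralized:", chosen)
print(f"visible share of data: {used / total:.0%}")
```

With these made-up numbers, the two largest cloud sources are exactly the ones that get left out, which is the trade-off described above.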

So, you figure out what percentage of your organization should be visible by centralizing, and everything else relies on the pivot. Analysts have to keep a tab open for GuardDuty, AWS, EDR, or another piece of technology, receiving alerts in each and having to respond across the board.

Modern-day visibility is a fiscal construct based on the size of a bucket, and it’s not practical because of the sprawl of our ever-growing cloud infrastructure.

How to Overcome the Visibility Struggle

We have outgrown the mentality that this bucket is the only data I should care about. Cyber defense is in a losing fight, and to know that we then have to make terrible financial decisions about visibility is a horrible place to be. Each of the problems I listed above exists because of an antiquated mentality.

A group of security practitioners who have spent their lives inside of security organizations, inside of the security investigation space, inside of security operation centers, have said we are DONE with the status quo when it comes to security investigations and searching for your data. We realized a pain that has existed in our industry for decades, and we have built a solution.

Most SOCs today receive an alert regarding malware for analysts to review, then pivot to CrowdStrike or other technology to investigate which command and control (C2) server it is calling out to, the hash of the malware, etc. These data points are still available in the API, but because of modern-day SIEM architectures, they don’t come over with the alert, causing analysts to spend too much time searching for answers. With federated search, you can not only get that alert, but you can see all relevant data without pivoting into other technologies to do it.

SIEM vendors’ answer to gaining visibility was that you have to centralize, and buy more ingestion or “events per second” costs. It worked for a while, but now we know better. We know the data exists on technology. Why don’t we just leave it there and search for it on the technology? Just leave the data where it is and get more visibility.
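To make "leave the data where it is" a little more concrete, here is a minimal sketch of the fan-out pattern behind federated search. It is not how any particular product is implemented. The two connector functions, the event fields, and the in-memory lists standing in for an EDR tool and a cloud audit log are all hypothetical. In practice each connector would call that tool's own API.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for data that stays inside each tool (hypothetical).
EDR_EVENTS = [
    {"host": "laptop-7", "sha256": "abc123", "c2": "203.0.113.10"},
]
CLOUD_EVENTS = [
    {"host": "laptop-7", "event": "ConsoleLogin"},
    {"host": "build-2", "event": "RunInstances"},
]


def search_edr(host):
    """Hypothetical connector: query the EDR tool where its data lives."""
    return [dict(e, source="edr") for e in EDR_EVENTS if e["host"] == host]


def search_cloud(host):
    """Hypothetical connector: query the cloud audit log in place."""
    return [dict(e, source="cloud") for e in CLOUD_EVENTS if e["host"] == host]


def federated_search(host, connectors):
    """Send the same question to every source in parallel and merge the answers."""
    with ThreadPoolExecutor() as pool:
        batches = pool.map(lambda connector: connector(host), connectors)
        return [row for batch in batches for row in batch]


results = federated_search("laptop-7", [search_edr, search_cloud])
for row in results:
    print(row)
```

The analyst asks one question and gets the EDR detail (hash, C2 address) alongside the cloud activity for the same host, without copying either data set into a central store first.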

This is a big problem to solve and a huge opportunity to transform security operations. At Query, we’re working on it full time. Stay tuned for more.

Let the security search revolution begin!

Contributed by:

Neal Bridges

CISO, Query