This blog is part 2 of a four-part series on Measuring and Optimizing Enterprise Security Search Costs. See part 1 here.
Security teams are collecting, centralizing, and storing data in SIEMs, EDRs, enterprise search platforms, big data lakes, and vanilla cloud blob storage. The primary purpose is to store, look up, and investigate activity data for individual cybersecurity “entities” of interest such as Devices, IPs, File Hashes, Users, and Emails.
How the investigation starts
Analysts’ investigations typically originate from an alert or a threat hunt. In either case, the analyst has a starting point — an entity of interest. For example, let’s make the starting point a file hash from a suspicious malware alert on an email attachment. The analyst would initiate their investigation by looking for that file hash in all of their security data platforms. This typically means multiple browser tabs, looking through each console.
Beyond the browser-level tabs, there are also tabs within any given search console. For example, in a SIEM platform, the analyst would use the SIEM console to run multiple searches across different data sources.
There are two levels of searches: let’s say M platform consoles, with an average of N data-source-specific subsearches per console. The total number of searches for the initial entity of interest is then M x N. In our conversations with several analysts, we found that M x N is typically in the high single digits and sometimes the low double digits. (For more on how we interviewed analysts, please see Top Three MDR Investigation Challenges.)
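To put rough numbers on it: an analyst with 3 consoles, each covering 3 data sources, runs 3 x 3 = 9 initial searches for that single file hash; with 4 of each, the count is already 4 x 4 = 16.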
These initial searches establish which results match the file hash across the data sources. In the next step, the analyst performs a series of hops/pivots and comes up with a new set of searches to run.
How the investigation pivots
The analyst would search threat intelligence sources to confirm whether the file is malicious. Next, they would search for which devices have seen that file. They would then run a new set of searches to find which users own those devices. Suspecting that those user accounts are compromised, they would run yet another set of pivoted searches to review those users’ activities. Then the analyst may pivot to search for and investigate the external IPs those devices communicated with after the potential malware execution.
This pivoting across the graph of possibilities, following a chain of interest, goes on and on, and each pivot requires more of the above searching across different data sources. For the estimation developed later in this post, we will use ‘L’ to represent the number of entity pivots that trigger follow-up searches, i.e., the links in the chain.
The question then arises: why isn’t the above investigation fully automated via SOAR?
Indeed, some of the most common paths can be automated. (See the survey results at Top SOAR: Learnings, Successes, and Challenges – Query.) In our survey, we heard that, on average, only the three simplest paths are suitable for automation. For the other paths, analysts have to make human decisions based upon the data, and then pivot accordingly. The analysts bring their own environmental context, efficiencies, and instincts, and decide which paths to follow and which to discard. In the above malware investigation example, they may decide to search for user activity by email only in the data sources they know are relevant to the current investigation. Therefore, for our estimation formula, we will introduce ‘p’ as the factor representing the percentage of possible paths analysts actually follow, since they make data-driven human decisions and discard unnecessary paths.
What is the relationship between these factors, and is there a way to measure and optimize them? Let’s look at that next.
Defining and calculating “Analysts’ Searches per Investigation” (ASPI)
While it is difficult to know every search step in the paths analysts follow, we can make reasonable estimates. Based upon our learnings from the analyst interviews referenced earlier, we define “Analysts’ Searches per Investigation” (ASPI) as below to reflect the number of manual searches during an investigation (a small code sketch follows the definitions):
ASPI = L x M x N x p, where:
- L is the average number of entity pivots
- M is the average number of platform consoles
- N is the average number of data sources searched within a console
- p is the percentage of possible paths that analysts decide to follow
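For concreteness, here is a minimal Python sketch of the calculation. It is just the formula above expressed as a function; plug in your own team’s averages:

```python
def aspi(pivots: float, consoles: float, sources_per_console: float, path_factor: float) -> float:
    """Estimate Analysts' Searches per Investigation (ASPI = L x M x N x p).

    pivots              -- L: average number of entity pivots
    consoles            -- M: average number of platform consoles
    sources_per_console -- N: average number of data sources within a console
    path_factor         -- p: fraction (0 to 1) of possible paths analysts follow
    """
    return pivots * consoles * sources_per_console * path_factor

# Midpoints of the ranges reported in our analyst interviews:
print(aspi(pivots=3, consoles=4, sources_per_console=4, path_factor=0.5))  # 24.0
```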
In our conversations with analysts (see the interview process here), we heard the following average estimates for a typical investigation (such as the suspected malware in an email attachment discussed above):
- L: 3 entity pivots
- M: 3-5 consoles
- N: 3-5 data sources within a console
- p: 25-75% of searches considered relevant and performed
Taking the midpoints of those ranges, the ASPI would be 3 x 4 x 4 x 0.5 = 24 different search operations to complete the investigation. We had not shared, or even defined, this formula at the time of the analyst interviews. We simply asked for the number of search operations per investigation, and the answers ranged from 5 to 50 searches per investigation. Much of that variance can be attributed to the maturity of the organization’s cybersecurity program and the analyst resources available to complete the work thoroughly. We heard of several scenarios where the team was knowingly cutting corners and doing limited investigations for lack of resources.
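As a rough sanity check (our own arithmetic, not a figure from the interviews): plugging in the low end of the ranges above gives 3 x 3 x 3 x 0.25 = about 7 searches, and the high end gives 3 x 5 x 5 x 0.75 = about 56, which brackets the 5 to 50 searches the analysts reported.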
Putting it all together
For now, we believe our ASPI formula is a good way to estimate the number of searches analysts need to complete an investigation in their current infrastructure. (Please contact us at contact@query.ai if you would like to share your opinions, agreements/disagreements, and experiences with the above estimation process. We would love to hear from you.) We will continue validating the formula in our subsequent analyst feedback interviews.
High ASPI increases analysts’ costs and hurts their efficiency. Open Federated Search for Security can reduce ASPI by an order of magnitude, thanks to its ability to run parallel searches across all external platforms and to automatically run follow-up queries for relevant entity lookups.
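As a generic illustration of that idea (a minimal sketch, not Query’s implementation; `search_backend` is a hypothetical stub), here is how fanning searches out in parallel collapses the M x N manual console searches into a single operation:

```python
import asyncio

async def search_backend(console: str, source: str, entity: str) -> list:
    # Hypothetical stub; a real client would call the platform's search API here.
    return []

async def federated_search(consoles: dict[str, list[str]], entity: str) -> list:
    # One search task per (console, data source) pair, all run concurrently,
    # so the M x N manual searches become a single operation for the analyst.
    tasks = [
        search_backend(console, source, entity)
        for console, sources in consoles.items()
        for source in sources
    ]
    results = await asyncio.gather(*tasks)
    return [hit for batch in results for hit in batch]

# Example: 3 consoles x 3 data sources each = 9 searches issued at once.
platforms = {
    "siem": ["firewall_logs", "dns_logs", "proxy_logs"],
    "edr": ["process_events", "file_events", "network_events"],
    "email_gateway": ["attachments", "senders", "urls"],
}
print(asyncio.run(federated_search(platforms, "suspicious_file_hash")))
```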
Next week: Reducing/Optimizing ASPI, a measure of manual human steps.