Databricks

Databricks is a multi-cloud Data Lakehouse platform that supports business intelligence (BI), data streaming, warehousing, data science, and security-relevant use cases. In Databricks’ own words: “the Databricks Data Intelligence Platform is built on lakehouse architecture, which combines the best elements of data lakes and data warehouses to help you reduce costs and deliver on your data and AI initiatives faster. Built on open source and open standards, a lakehouse simplifies your data estate by eliminating the silos that historically complicate data and AI.”

Security and IT teams use Databricks as a direct Security Information & Event Management (SIEM) replacement or as an alternative data store that expands SIEM use cases such as enrichment, big data analytics, machine learning (ML), artificial intelligence (AI), and detections. Like similar warehouses such as Snowflake or Google BigQuery, Databricks has a centralized catalog (the Unity Catalog) that registers all metadata within the platform. The Unity Catalog organizes related datasets into Schemas, which in turn contain Tables (or Views, Federated Views, and/or Materialized Views) holding batch and streaming data from upstream sources such as Configuration Management Databases (CMDBs) and Endpoint, Network, Identity, and Cloud-native data sources.
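
For reference, every table or view registered in the Unity Catalog is addressed through a three-level namespace (catalog.schema.table). The Databricks SQL sketch below illustrates that addressing; the catalog, schema, table, and column names are hypothetical examples of an onboarded identity log table.

```sql
-- Minimal Databricks SQL sketch; security_lakehouse.identity.okta_system_logs
-- and its columns are hypothetical placeholders, not real objects.
SELECT
  event_time,
  user_name,
  src_ip
FROM security_lakehouse.identity.okta_system_logs  -- <catalog>.<schema>.<table>
WHERE event_time >= current_timestamp() - INTERVAL 7 DAYS
LIMIT 100;
```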

Databricks’ Unity Catalog provides more than metadata management: it also enables governance, monitoring, observability, and Role-based Access Control (RBAC) over the data objects in the catalog. It uses the Delta Lake open table format and the Delta Sharing protocol to support external federation use cases, multi-cloud data hosting, and integration with big data frameworks such as Apache Spark and BI tools such as Power BI and Tableau.
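
For example, RBAC on Unity Catalog objects is expressed with standard SQL GRANT statements. The sketch below gives a hypothetical analyst group read-only access to the example table above; the group and object names are assumptions for illustration.

```sql
-- Hypothetical group and object names; the privileges shown are standard
-- Unity Catalog grants for read-only access to one table.
GRANT USE CATALOG ON CATALOG security_lakehouse TO `security-analysts`;
GRANT USE SCHEMA ON SCHEMA security_lakehouse.identity TO `security-analysts`;
GRANT SELECT ON TABLE security_lakehouse.identity.okta_system_logs TO `security-analysts`;

-- Inspect the grants now in effect on the table.
SHOW GRANTS ON TABLE security_lakehouse.identity.okta_system_logs;
```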

Query extends the use cases of Databricks, whether it serves as a direct SIEM replacement or as an augmentation datastore, by supporting incident response, investigations, threat hunting, red team targeting operations, and continuous compliance, audit, and governance tasks through federated search. Query does not retain or duplicate any data and does not rely on per-search or ingestion cost models; users do not need to learn the Unity Catalog or the Delta Lake table format, write optimized SQL queries, manage additional infrastructure, or learn Spark and related concepts.

Using the Query Configure Schema no-code workflow, customers can onboard any table or view from Databricks and fully map the data to a selection of OCSF event classes, including their own Entity mappings. This allows any data ingested into Databricks – normalized and standardized, or not – to be utilized from the Query platform. This includes the Query App for Splunk, which lets engineers and analysts with SPL experience use a SQL-based warehouse like Databricks without any additional training or the cost of ingestion into a Splunk index.
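
The Configure Schema workflow itself is point-and-click, but the mapping it captures is conceptually similar to projecting raw columns onto OCSF attribute names. The SQL view below is only an illustration of that idea, not how Query stores the mapping; the table, columns, and OCSF field choices are hypothetical.

```sql
-- Conceptual illustration only: a view that renames raw columns to roughly
-- match OCSF Authentication-style attributes. Query's Configure Schema
-- workflow records an equivalent mapping without requiring any SQL.
CREATE OR REPLACE VIEW security_lakehouse.identity.auth_events_ocsf AS
SELECT
  event_time AS time,            -- OCSF: time
  outcome    AS status,          -- OCSF: status
  user_name  AS actor_user_name, -- OCSF: actor.user.name
  src_ip     AS src_endpoint_ip  -- OCSF: src_endpoint.ip
FROM security_lakehouse.identity.okta_system_logs;
```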

Some use cases that can be fulfilled using Query with Databricks include:

  • Incident Response: Use Query to correlate an Indicator of Compromise (IOC) from an event in an EDR or CNAPP tool against onboarded network, identity, or cloud-based logs stored in Databricks.
  • Investigations: Use Query to search across HRIS or ERP data and/or IAM logs stored in Databricks tables to aid in insider risk investigations where users are joiners, movers, leavers, or vacationing in high-risk areas.
  • Threat Hunting: Use Query to perform multi-value searches across IOCs and exploitable vulnerability data such as CVE IDs, IP addresses, user agents, subnets, ports, hostnames, and more to find evidence of attempted contact or bilateral communications within your network and application logs (see the sketch after this list).
  • Compliance Management: Search across known resources by their GUIDs and names, or across specific compliance findings from upstream CSPM and CNAPP tools or data pulled from ERM, IRM, and TPRM tooling into Databricks.
  • Internal Audit: Search for any device or user across all logs stored in Databricks to check for logins that don’t use MFA, plaintext passwords assigned to users, or non-challenged authentication attempts in your tech stack.
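
As a concrete illustration of the threat hunting case above, the sketch below shows the kind of SQL a multi-value IOC search could translate into against a hypothetical network log table in Databricks; with Query, such statements are produced by federated search rather than written by the analyst. The table, columns, and IOC values are placeholders.

```sql
-- Hypothetical IOC hunt: look for attempted or bilateral communication with
-- two placeholder IP indicators across the last 30 days of firewall logs.
SELECT
  event_time,
  src_ip,
  dst_ip,
  dst_port,
  action
FROM security_lakehouse.network.firewall_logs
WHERE (src_ip IN ('198.51.100.23', '203.0.113.77')
       OR dst_ip IN ('198.51.100.23', '203.0.113.77'))
  AND event_time >= current_timestamp() - INTERVAL 30 DAYS
ORDER BY event_time DESC;
```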

To integrate with Databricks, see the integration documentation here.

For more information about Query’s Data Model based on OCSF, see here.