
A decade ago, log management was commonly used to capture and retain events for compliance and security use cases. As adversaries and their TTPs (tactics, techniques, and procedures) grew more sophisticated, simple logging evolved into security information and event management (SIEM), and the power of rule-driven correlation made it possible to turn raw event data into potentially valuable intelligence. Although it was challenging to implement and get working properly, the ability to find the so-called “needle in the haystack” and identify attacks in progress was a huge step forward.
Today, SIEMs still exist, and the market is largely led by Splunk and IBM QRadar. Many customers have finally moved to cloud-native deployments and are leveraging machine learning and sophisticated behavioral analytics. However, new enterprise deployments are fewer, costs are greater, and, most importantly, the overall needs of the CISO and the hard-working team in the SOC have changed. These needs have changed because security teams have almost universally recognized that they are losing against the bad guys. The shift away from reliance on the SIEM is well underway, along with many other changes. The SIEM is not going away, but its role is changing rapidly, and it has a new partner in the SOC.
Why the role of the SIEM is rapidly diminishing
Too narrowly focused: The mere collection of security events is no longer sufficient because the aperture on this dataset is too narrow. While there is likely a lot of event data to capture and process in your own environment, you are missing out on vast amounts of additional information such as OSINT (open-source intelligence), consumable external threat feeds, malware and IP reputation databases, and even reports on dark web activity. There are endless sources of intelligence, far too many for the architecture of a SIEM.
Cost (data explosion + hardware + license costs = bad outcome): With so much infrastructure, both physical and virtual, the amount of information being captured has exploded. Machine-generated data has grown 50x, while the average security budget grows at 14% year over year. The cost of storing all of this information makes the SIEM cost-prohibitive. The average cost of a SIEM has skyrocketed to close to $1 million annually, and that covers only license and hardware costs. The economics force teams in the SOC to capture and/or retain less information in an attempt to keep costs in check, which reduces the effectiveness of the SIEM even further. I recently spoke with a SOC team that wanted to query large datasets for evidence of fraud, but doing so in Splunk was cost-prohibitive and a slow, arduous process, which led them to explore new approaches.
The results are terrifying: a recent Ponemon Institute survey of almost 600 IT security leaders found that, despite spending an average of $18.4M annually and using an average of 47 products, a whopping 53% “did not know if their products were even working”. Clearly, a change is in order!
Enter the Security Data Lake
Security-driven data can be dimensional, dynamic, and heterogeneous, so data warehouse solutions are less effective in delivering the agility and performance users need. A data lake is sometimes considered a subset of a data warehouse; in terms of flexibility, however, it is a major evolution. The data lake supports unstructured and semi-structured data in its native format and can include log files, feeds, tables, text files, system logs, and more. You can stream in all of your security data: none is turned away, and everything is retained. All of it can easily be made accessible to a security team at a low cost, for example roughly $0.03 per GB per month in an S3 bucket. This capability makes the data lake the next evolution of the SIEM.
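To put that storage price in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a flat object-storage price of about $0.03 per GB per month; the 500 GB/day ingest rate and the retention periods are made-up numbers for illustration, and the calculation ignores request, retrieval, and compute charges.

```python
def monthly_storage_cost(daily_ingest_gb: float,
                         retention_days: int,
                         price_per_gb_month: float = 0.03) -> float:
    """Steady-state monthly storage bill once the retention window is full.

    price_per_gb_month is an assumed flat rate, not a quoted price.
    """
    retained_gb = daily_ingest_gb * retention_days
    return retained_gb * price_per_gb_month


# Hypothetical SOC ingesting 500 GB of security data per day.
one_year = monthly_storage_cost(500, 365)
ten_years = monthly_storage_cost(500, 3650)

print(f"1 year retained:   ${one_year:,.2f} per month")
print(f"10 years retained: ${ten_years:,.2f} per month")
```

Swap in your own ingest volume and your provider's current pricing. The point is only that the storage line item grows linearly with retention, which is what makes decade-long retention thinkable.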

(Image source: http://www.redkid.net/generator/sign.php)
If you are building a security data lake, you will be able to focus on more strategic activities:
Threat hunting: Sophisticated adversaries know how to hide and evade detection by off-the-shelf security solutions. Highly skilled security teams follow a trigger, which can be a suspicious IP or an event, and go on the offensive to find and remediate the attacker before damage occurs. The experience of the threat hunting team is the most critical element for success. However, the team is highly reliant on vast amounts of threat intelligence so that it can cross-reference what it observes internally against the latest intelligence and detect a real attack.
Data-driven investigations: Whenever suspicious activity is detected, analysts begin an investigation. To be effective, this must be an expeditious process. With an industry average of 47 security products in use in the typical organization, it is difficult to gain access to all of the relevant data. With a security data lake, however, you stream all of your security telemetry into one place and eliminate the time-consuming work of collecting logs. The value of the process lies in comparing newly observed behavior with historical trends, sometimes against datasets spanning 10 years. This would be cost-prohibitive in a traditional SIEM.
The data lake automates the processing of data as it is loaded (known as parsing), making it even easier for the security team to focus on the most critical element of their job: preventing or stopping an attack.
Large volumes of historical data, often going back a decade, make it possible to determine whether a specific pattern is typical or an anomaly.
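The workflow described above (parse data as it lands, cross-reference it against threat intelligence, and compare it with history) can be sketched in a few lines of Python. This is a minimal illustration, not how any particular data lake product works: the log format, the field names, the `bad_ips` set, and the three-standard-deviation threshold are all assumptions of mine.

```python
import re
import statistics

# Hypothetical log format:
#   "2021-01-05T10:00:00Z src=203.0.113.7 action=login_failed"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+src=(?P<src>\S+)\s+action=(?P<action>\S+)$"
)


def parse_line(line):
    """Parse one raw log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def match_threat_intel(events, bad_ips):
    """Return the events whose source IP appears in the threat-intelligence set."""
    return [event for event in events if event["src"] in bad_ips]


def is_anomalous(today_count, history, threshold=3.0):
    """Flag a count more than `threshold` standard deviations above the historical mean.

    `history` needs at least two values for a sample standard deviation.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today_count > mean
    return (today_count - mean) / stdev > threshold


raw_lines = [
    "2021-01-05T10:00:00Z src=203.0.113.7 action=login_failed",
    "2021-01-05T10:00:05Z src=198.51.100.2 action=login_ok",
    "garbage line",
]

# Step 1: parse on load, dropping lines that do not fit the expected format.
events = [e for e in map(parse_line, raw_lines) if e is not None]

# Step 2: cross-reference internal events against an external reputation feed
# (here just a hard-coded set of documentation-range IPs).
hits = match_threat_intel(events, bad_ips={"203.0.113.7"})

# Step 3: compare today's activity with a made-up history of daily failed logins.
daily_failed_logins = [12, 9, 11, 10, 13, 8, 12]
print(hits)
print(is_anomalous(60, daily_failed_logins))
```

In a real deployment the history would come from years of retained data rather than a seven-element list, which is exactly where cheap long-term storage pays off.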
Interesting companies to power your security data lake:
If you are planning to deploy a security data lake, or already have one, here are three cutting-edge companies you should know about. I am not an employee of any of these companies, but I am very familiar with them and believe that each will change our industry in a very meaningful way and can transform your own security data lake initiative.
Team Cymru is the most powerful security company you have yet to hear of. They have assembled a global network of sensors that “listen” to IP-based traffic on the internet as it passes through ISPs and can “see”, and therefore know, more than anyone in a typical SOC. They have built the company by selling this data to large, public security companies such as CrowdStrike, FireEye, Microsoft, and now Palo Alto Networks, through last week’s acquisition of Expanse, which it snapped up for a cool $800M. In addition, cutting-edge SOC teams at JPMC and Walmart are embracing what I espouse in this very article and leverage Cymru’s telemetry data feed. Now you can get access to this same data. You will want their 50+ data types and 10+ years of intelligence inside your data lake to help your team better identify adversaries and bad actors based on traits such as IP addresses or other signatures.
Varada.io: The entire value of a security data lake is easy, rapid, and unfettered access to vast amounts of information. It eliminates the need to move and duplicate data and offers the agility and flexibility users demand. As data lakes grow, queries become slower and require extensive data ops work to meet business requirements. Cloud storage may be cheap, but compute becomes very expensive quickly because query engines are most often based on full scans. Varada solved this problem by indexing and virtualizing all critical data in any dimension. Data is kept closer to the SOC, on SSD volumes, in its granular form so that data consumers have the flexibility to run any query whenever they need. The benefit is a query response time up to 100x faster at a much lower cost, because time-consuming full scans are avoided. This enables workloads such as searching for attack indicators, post-incident investigation, integrity monitoring, and threat hunting. In short, Varada can help your team gain access to the data they need, get consistent and interactive performance, and stop worrying about managing usage costs or dealing with data ops.
Panther: Snowflake is a wildly popular data platform primarily focused on mid-market to enterprise departmental use. It is not a SIEM and has no security analytics capabilities of its own. Engineers from AWS and Airbnb then created Panther, an open-source security platform for threat detection and investigations. The company recently connected Panther with Snowflake and can join data between the two platforms to make Snowflake a “next-generation SIEM” or, perhaps a better positioning, to evolve Snowflake into a security data lake. It is still a very new solution, but it’s a cool idea with a lot of promise for Snowflake customers.
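To illustrate the full-scan problem mentioned in the Varada paragraph above, here is a toy Python comparison of scanning every event versus looking up events through a prebuilt index. It is not Varada's implementation or any product's API; the event layout and function names are hypothetical, and a real engine would index columnar data on disk rather than Python dicts in memory.

```python
from collections import defaultdict


def full_scan(events, ip):
    """Inspect every event; the work grows with the total number of events."""
    return [event for event in events if event["src"] == ip]


def build_index(events):
    """Build a one-time mapping from source IP to the positions of its events."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        index[event["src"]].append(position)
    return index


def indexed_lookup(events, index, ip):
    """Touch only the events the index points to; the work grows with the matches."""
    return [events[position] for position in index.get(ip, [])]


# Synthetic events with made-up private-range source IPs.
events = [
    {"src": f"10.0.{i % 256}.{i % 7}", "action": "conn"}
    for i in range(100_000)
]
index = build_index(events)

target = "10.0.5.5"
scanned = full_scan(events, target)
looked_up = indexed_lookup(events, index, target)

# Both approaches return the same events; only the amount of work differs.
print(len(scanned), len(looked_up))
```

The trade-off is the usual one: the index costs time and space to build and maintain, and it pays for itself only when the same data is queried repeatedly, which is the normal pattern in hunting and investigations.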
In summary, the average security organization spends about $18M annually and has been largely ineffective at preventing breaches, IP theft, and data loss. The fragmented approach has not worked. The security data lake, while not a simple, “off-the-shelf” approach, centralizes all of your critical threat and event data in a large, central repository with simple access. It can still work alongside an existing SIEM, which may use correlation, machine learning algorithms, and even AI to detect fraud by evaluating patterns and then triggering alerts. However it is configured, the security data lake is an exciting step you should be considering, along with the three innovative companies I mentioned in this article.
I would love to hear your thoughts about how a security data lake can help you and your team and what it means for your existing SIEM investment. You can reach out directly at dan@hightide-advisors.com or https://www.linkedin.com/in/schoenbaum.
If you’d like to read more of my articles, which focus on cybersecurity and on advice for investors and executives on how to improve company go-to-market, you can find them here: https://schoenbaum.medium.com/