Tuesday, January 16, 2007

10 Pounds of Packets in a 5 Pound Bag


Richard Bejtlich has been talking a lot about the difference between Network Security Monitoring (NSM) and "alert-centric" technologies like Snort. His basic premise is that "real" NSM requires more than just IDP alerts and packet logs; it requires event notifications, full packet logs of the entire network, and flow data as well. He also quotes me as saying "Richard, I wrote Snort so you don't have to look at packets". This isn't quite right. I think what I actually said was "Did you think about how much data you're going to record if you do that on a high speed network? We wrote IDS so that we wouldn't have to record everything."

I get it, I understand what the NSM guys are saying and I really don't disagree with them at all. The problem I have is that if you try to deploy this concept in a large network environment with lots and lots of sensors, you've got some big problems to overcome. Let's look at the problems.

1) Flow aggregation - As I see from Richard's latest post on Cisco's MARS product, he wants the raw flow data, not just statistical NetFlow rollups. RNA does that already, as can Snort with the right options turned on. This works fine as long as the network environment is relatively small and you don't try to roll up all of the data for post processing and analysis. If you do aggregate it to a central collector, then you've got the multiplication problem on the aggregation link(s): the more traffic the sensors see and the more sensors you have, the more data gets pumped up to the collector and pushed into the database you're using to manage all this information. If you're in an environment where you're aggregating more than a few million flows per hour, that's a ton of data to manage if you figure ~40 bytes per flow record (in binary format). That's 200MB of data just for flow records in an hour, almost 5GB per day and roughly 150GB per month (see the sketch below). Those records also have to get blasted into a database so they can be worked with, so you're going to be translating all that data into SQL insert statements and then pumping it into the local database on your aggregation machine or across the network to a database server (or cluster). That's a lot of processing, a lot of network bandwidth and a lot of disk, not to mention a lot of RAM to maintain the indices in memory for the database. It's not that this isn't doable, but now we're talking about offloading the work across multiple machines at a minimum, and that's going to increase your costs dramatically. Overall this isn't an intractable problem (the NetFlow analysis/NBA guys do it for a living), but it is a big one in any large enterprise; it takes a lot of work to scale the technology to work with all that data effectively.
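
For concreteness, here's the arithmetic behind those numbers as a quick Python sketch. The 5 million flows/hour rate and the 40-byte record size are assumptions chosen to line up with the figures above, and the math ignores SQL and index overhead entirely, so treat it as a floor rather than an estimate.

    # Back-of-the-envelope sizing for raw flow collection. Both constants are
    # assumptions that match the "few million flows per hour" figure above.
    FLOWS_PER_HOUR = 5_000_000   # assumed aggregate rate across all sensors
    BYTES_PER_RECORD = 40        # assumed binary flow record size

    per_hour_mb = FLOWS_PER_HOUR * BYTES_PER_RECORD / 1e6
    per_day_gb = per_hour_mb * 24 / 1e3
    per_month_gb = per_day_gb * 30

    print(f"{per_hour_mb:.0f} MB/hour, {per_day_gb:.1f} GB/day, "
          f"{per_month_gb:.0f} GB/month")
    # -> 200 MB/hour, 4.8 GB/day, 144 GB/month, before any database overhead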

2) Traffic aggregation - If you thought the flow aggregation problem was fun then start logging all the traffic on your network. Let's take a fairly well utilized modern enterprise network backbone running at a sustained 500Mbps; that's 62.5MBps of data to record on a single sensor, 225GB/hour of packet traffic, 5.4TB per day from a single sensor. All that data is going to need to be rolled up too, unless you're going to spool it into a local database and do distributed queries across the network for packet traces. At that kind of data density your NSM sensor is going to need a NAS device someplace nearby so that the data can be stored; it's going to be really hard to do that on a 1U appliance just due to physical drive space limitations. Once you have all that data, you're going to need to be able to work with it, so it's got to be in a database or it has to be indexed on the filesystem in some logical fashion so that smaller chunks of data can be rapidly located, decoded and presented to the user on demand. There are companies that build products to do this; I can't really speak to their effectiveness. I can hook a high-speed collection process like daemonlogger up to a big disk and grab all this data, but once again, how much value are you really getting from recording all that data vs the logistical overhead of trying to maintain all that information in a usable fashion for extended periods of time? What's the time horizon of this data? Do I need to keep a week/month/year of this data live in a database for referential purposes? If there's going to be any expectation of success, the amount of data that's kept "live" is going to need some pragmatic constraints (the sketch below shows how fast even a modest retention window adds up).
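
The retention question is where this really bites, so here's the same kind of back-of-the-envelope sketch for full content capture. The 7-day "live" window is a purely hypothetical knob, and the math ignores pcap header and indexing overhead.

    # Disk burn for full packet capture on the 500 Mbps example above, plus a
    # hypothetical retention window. Real numbers would be somewhat higher
    # once you add pcap headers and whatever indexing you layer on top.
    LINK_MBPS = 500          # sustained utilization of the monitored segment
    RETENTION_DAYS = 7       # assumed "live" window; pick your own horizon

    bytes_per_sec = LINK_MBPS * 1e6 / 8        # 62.5 MB/s
    gb_per_hour = bytes_per_sec * 3600 / 1e9   # ~225 GB/hour
    tb_per_day = gb_per_hour * 24 / 1e3        # ~5.4 TB/day

    print(f"{gb_per_hour:.0f} GB/hour, {tb_per_day:.1f} TB/day, "
          f"~{tb_per_day * RETENTION_DAYS:.0f} TB for a {RETENTION_DAYS}-day window")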

3) Alert aggregation - This is the data that IDP vendors spend their time working on getting to their users. We have pretty well established metrics as to what is acceptable in this realm in terms of sustainable event rates, data overload thresholds for analysts, data density and so on. This is the de facto standard in IDP because this is the thing that people are paying for: we've got to generate the events, and everyone wants to see them since that's what the technology is supposed to be doing. This is a lot of data to deal with too, and this is the raw information that analysts have to work with. We do a lot at Sourcefire to pare down the number of events analysts have to deal with via our Impact Assessment technology that's enabled by RNA (a toy sketch of the general idea follows), so it is possible to do this effectively in large environments even with less than optimal tuning of the sensor infrastructure.
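
To make the impact-assessment idea concrete, here's a toy sketch of the general technique of using host knowledge to pare down alerts. This is an illustration only, not how RNA or the Defense Center are actually implemented, and every name and data structure in it is hypothetical.

    # Toy illustration of impact assessment: downgrade alerts aimed at hosts
    # that don't actually run the targeted service. Hypothetical data model.
    from dataclasses import dataclass

    @dataclass
    class Alert:
        sig: str
        dst_ip: str
        target_service: str   # service the signature's exploit applies to

    # Hypothetical passively-learned host inventory: ip -> services running.
    HOST_SERVICES = {
        "10.0.0.5": {"apache", "openssh"},
        "10.0.0.9": {"iis"},
    }

    def impact(alert: Alert) -> str:
        services = HOST_SERVICES.get(alert.dst_ip)
        if services is None:
            return "unknown target"       # no host data, a human should look
        if alert.target_service in services:
            return "relevant"             # host runs the service being attacked
        return "probably irrelevant"      # attack against a service not present

    for a in (Alert("IIS unicode traversal", "10.0.0.5", "iis"),
              Alert("IIS unicode traversal", "10.0.0.9", "iis")):
        print(a.sig, a.dst_ip, "->", impact(a))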

When I started Sourcefire, one of the things that I decided to do to get people to want to pay for something that was free (i.e. Snort) was to try solving their data management problems. If you look at most of the IDS vendors before Sourcefire was founded, they would sell you IDS sensors and management front-ends but they wouldn't solve the biggest problem that most people would have once they deployed the technology, namely managing the information produced by the sensors. As we all know, IDS can generate immense amounts of data with just alerts, and if you want to be able to work with that data it needs to go into a database that has been optimized for the data set. Prior to Sourcefire, you could buy $250k worth of sensors from vendor X, and when you deployed the sensor grid you'd call vendor X and ask them how you're going to manage all those alerts. Their answer was typically "go call Oracle, they make a really nice database and we'll sell you professional services if you need help setting it up." This greatly increased the cost and complexity of deploying IDS solutions. When Sourcefire started I decided that this was an area where we could add real value, so we built what is now called the Defense Center, giving customers a plug-and-play appliance that solved their data management problems and provided a path to deploy large infrastructures of our gear quickly. As you can see from our S-1 filing, this was probably a Good Idea.

A "real" NSM infrastructure is going to primarily be built around the idea of collecting, moving and storing data and then making it highly available in a variety of presentation formats for users. If you try to do this on a network that's generating lots of traffic across lots of sensors/segments, the likelihood of building a scalable solution that anyone is willing to pay for is vanishingly low. You're going to need hundreds of terabytes of disk, a dedicated out of band management network for moving data, huge database servers AND the management and sensing infrastructure to actually grab the data.

Now we want to scale it. I know from experience that there are large distributed international enterprises out there that have remote offices sitting on the other side of 128kbps (and below) links. They get really irritated when you saturate that link to pump out a continuous stream of security data. These organizations also have core networks with 10Gbps links that can sustain 2+Gbps of internal traffic for hours. That's roughly a terabyte per hour of traffic you want to log, give or take, just in the core. Then you have the rest of the enterprise with 100+ sensors deployed that are seeing varying amounts of traffic, but say none of them go below 10Mbps typically; that's easily another several hundred gigabytes of data every hour you've got to collect and forward to a central aggregation point. Then we throw in the flow data (lots of small records to insert) and the event data (more small records to insert) and you've got a data aggregation nightmare. Concentrating this data on a central collector or a load balanced set of collectors will saturate a gigabit line, so you're going to either have to figure out how to leave it local on the sensors and perform distributed queries against it or you're going to have to deploy a bunch of additional network gear to absorb the load.
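
Rolling those illustrative numbers up (2 Gbps sustained in the core plus a hundred smaller sensors at roughly 10 Mbps apiece; both figures are assumptions, not measurements), here's a sketch of what the central collector would have to absorb:

    # Aggregate ingest at a central collector under the assumptions above.
    def mbps_to_tb_per_hour(mbps: float) -> float:
        return mbps * 1e6 / 8 * 3600 / 1e12

    core_tb_hr = mbps_to_tb_per_hour(2000)        # 2 Gbps sustained in the core
    edge_tb_hr = 100 * mbps_to_tb_per_hour(10)    # 100 sensors at ~10 Mbps each
    total_tb_hr = core_tb_hr + edge_tb_hr

    # The sustained rate the collector-side network has to carry to keep up,
    # before we even add the flow records and alert data.
    required_gbps = total_tb_hr * 1e12 * 8 / 3600 / 1e9
    print(f"core {core_tb_hr:.2f} TB/h + edges {edge_tb_hr:.2f} TB/h = "
          f"{total_tb_hr:.2f} TB/h (~{required_gbps:.1f} Gbps sustained, vs. a "
          f"1 Gbps collector uplink)")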

The cost of deploying a solution like this will make today's IDP deployments look like rounding error and the amount of time required to sell this into an enterprise will make today's sales cycles look like selling fast food.

Then we've got training. I know the binary language of moisture vaporators, Rich knows the binary language of moisture vaporators, lots of Sguil users know it too. The majority of people who deploy these technologies do not. Giving them a complete session log of an FTP transfer is within their conceptual grasp; giving them a fully decoded DCERPC session is probably not. Who is going to make use of this data effectively? My personal feeling is that more of the analysis needs to be automated, but that's another topic.

One of the comments made to one of Rich's posts said:
It seems that a lot of these SIM and IDS/IPS systems are really now being sold to small and medium enterprises without any regard to the amount of additional staff time and expertise that will be required to maintain them. Consequently I find that the ones I've used aren't oriented towards making investigation of an incident easier but are there simply to send out more alerts under the premise that more alerts is surely better because we're detecting and stopping more attacks.
That's incorrect. They're being sold to extremely large enterprises (Fortune 100), and when they're sold into those environments there's an expectation that they will scale. There is more data that we can get to the users of these systems for sure, but giving them everything is an unrealistic expectation given the realities of the large enterprises that these technologies are sold into.

Recording everything doesn't scale today but maybe someday it will. Like after the Singularity.


4 Comments:

At 11:08 AM, Blogger bamm said...

I thought Rich did a good job responding to your post, but I'll throw my two cents in here. Current technology can make implementing NSM in a "large" environment tough, but just because you cannot collect 100% of the raw pcap on a network, or even 100% of the session data, does not mean you shouldn't collect any. I don't think it's feasible to say that Snort won't drop some packets or that RNA won't misidentify some flows; does that mean you shouldn't use either? I actually enjoy helping people implement NSM and Sguil in those large environments, I only wish I had more time and experience doing it.

Do you really believe all those SIM/SEM companies out there are really focusing on Fortune 100 companies? That implies these vendors can only target 100 different companies (give or take a few as the ranks change slightly year to year). I realize the price tag on these products is outrageous, but I doubt they have a business plan that focuses on that small of a niche.

On a side note, anyone (and I am not implying you) who compares NSM to SIM/SEM/MARS/Snort/etc doesn't fully grasp the concept of NSM. NSM is a _process_ that should help you implement a _technology_ solution. SIM/SEM/MARS/Snort/RNA/Sguil/etc are all technologies. Too many people out there skip the process part and simply implement technology without any type of charter. Unfortunately, most vendors out there are pretty good at fanning those misguided flames.

Bammkkkk

 
At 11:17 AM, Blogger Andrew Hay said...

Hey Marty,

I tend to agree with your observations on this topic, but I can also see where Richard is getting his thoughts from. Richard comes from the old school of "packet analysis is king". In my experience it's difficult to sell these people on the benefits of an NBA solution, as it cannot perform the deep packet inspection that they have come to rely on.

That being said, I don't think that any vendor has released a complete solution yet. A solution that gives the organization the high level view of traffic and security state of an NBA/SIM/SEIM solution AND the low level packet analysis and forensic capabilities that the packet heads want.

The company that perfects this will surely be the one that comes out on top in the SEM/SEIM/NBA/NBAD market in my opinion.

 
At 11:44 PM, Blogger Martin Roesch said...

Bamm:

I didn't say that it wasn't fine to do in a targeted manner, I was saying that doing it wholesale is impractical. The implication I got from Rich's post was that it was necessary to collect full traffic if alert data was going to be useful. If we're not recording everything, how are we to set things up to record only the sessions that you care about and discard the others? That implies a whole different level of intelligence.
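
As a thought experiment, that "different level of intelligence" could be as simple as a per-flow ring buffer that only gets committed to disk once a flow trips an alert. The sketch below is a hypothetical illustration of that idea, not a description of anything Snort or RNA actually do.

    # Hypothetical targeted capture: buffer a bounded ring of packets per flow
    # and only keep them (plus the rest of the flow) if an alert fires on it.
    from collections import defaultdict, deque

    RING_DEPTH = 64                                    # packets buffered per flow
    ring = defaultdict(lambda: deque(maxlen=RING_DEPTH))
    saved = {}                                         # flow_key -> packets kept

    def on_packet(flow_key, pkt_bytes):
        ring[flow_key].append(pkt_bytes)               # cheap, bounded memory
        if flow_key in saved:                          # flow already flagged
            saved[flow_key].append(pkt_bytes)

    def on_alert(flow_key):
        # Promote the buffered history; future packets on this flow are kept too.
        saved.setdefault(flow_key, list(ring[flow_key]))

    # Usage: on_packet(("10.1.1.2", 31337, "10.1.1.9", 445), b"raw packet bytes")
    #        on_alert(("10.1.1.2", 31337, "10.1.1.9", 445))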

It's very possible to make Snort record all these types of data and make it highly configurable; we may even choose to do so at some time in the future. It still doesn't make pervasive traffic logging practical in larger environments.

I know that all of the SIM/SEM companies aren't targeting the Fortune 100, but the biggest ones are. If you want to build a large security technology company you're going to have to do business with very large companies and large governmental entities. Pedantry isn't going to make this discussion more productive; feel free to substitute "Fortune 100" with "medium to large enterprises and the government".

I understand what NSM is. If everyone out there was prepared to implement NSM then we'd be selling NSM. They aren't, and since I like to get paid for a living we build what they're asking for and try to advance the state of the art when we can with stuff like RNA. As soon as I start seeing RFPs for NSM issued and paying customers asking for more NSM features, you can bet I'll be implementing more technologies to support it.

 
At 11:51 PM, Blogger Martin Roesch said...

Andrew:

I agree with Rich that NSM has a lot of value, I just wanted to point out that if I tried to do that with our larger users (in terms of deployment size) it'd be a disaster, since there's no way it can scale when performed wholesale.

There are things that can be done to improve the state of the data we get today and I'm thinking about what that would look like. Stay tuned...

 
