Monday, January 29, 2007

Thoughts on Alerts


I've been thinking about my previous post regarding NSM methods and the "log everything" mentality that I believe is unworkable in medium to large environments. Given that I'm a guy who doesn't like to give people "it's impossible" for an answer and I don't like "unsolved" problems, I've been thinking about some of the other things that could be put into events that would make them more useful for NSM-style incident analysis. My thinking on this topic was further bolstered by Bejtlich's recent post on his NSM process.

Given that "alertocentrism" is a Bad Thing, what are some of the other things we could do with an engine like Snort that could add value to the events that it generates? I'm not going to recommend logging everything, although you certainly could do that pretty easily. I noticed from the post referenced above that flow analysis seems to constitute a large portion of the time that is spent performing NSM. Given that Snort 2.x (and 3.x) already have the ability to log flow information (albeit somewhat limited in stream4), what are the things that we could do to improve alerts?

A Snort unified alert typically contains the following information (a rough C sketch follows the list):

  • An Event structure containing
      • generator ID
      • Snort ID
      • Snort ID revision number
      • classification ID
      • priority
      • event ID
      • event reference
      • event reference time
  • Event packet information containing
      • packet timestamp
      • source IP
      • destination IP
      • source port/ICMP code
      • destination port/ICMP type
      • protocol number
      • event flags
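For readers who think in code, here's a rough C sketch of the record described above. The field names and types are illustrative only; they are not the actual Snort unified on-disk definitions.

```c
/* Illustrative layout of a unified-style alert record.  Names and types
 * are mine for clarity, not the real Snort unified structures. */
#include <stdint.h>
#include <sys/time.h>

typedef struct {
    uint32_t       generator_id;      /* which subsystem fired the event */
    uint32_t       snort_id;          /* rule/signature ID               */
    uint32_t       snort_id_rev;      /* rule revision                   */
    uint32_t       classification_id;
    uint32_t       priority;
    uint32_t       event_id;
    uint32_t       event_reference;   /* ties related events together    */
    struct timeval event_ref_time;
} EventSketch;

typedef struct {
    EventSketch    event;
    struct timeval pkt_ts;            /* packet timestamp                */
    uint32_t       src_ip;
    uint32_t       dst_ip;
    uint16_t       sport_icode;       /* source port / ICMP code         */
    uint16_t       dport_itype;       /* dest port / ICMP type           */
    uint8_t        protocol;
    uint32_t       event_flags;
} AlertRecordSketch;
```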
Additionally, flow records from Snort (stream4) look like this (again, a sketch follows the list):
  • start time
  • end time
  • server (responder) IP
  • client (initiator) IP
  • server port
  • client port
  • server packet count
  • client packet count
  • server byte count
  • client byte count
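And the same kind of sketch for the stream4-style flow record; as before, the names are illustrative rather than the actual stream4 structures.

```c
/* Illustrative stream4-style flow record (names are mine). */
#include <stdint.h>
#include <sys/time.h>

typedef struct {
    struct timeval start_time;
    struct timeval end_time;
    uint32_t       server_ip;      /* responder */
    uint32_t       client_ip;      /* initiator */
    uint16_t       server_port;
    uint16_t       client_port;
    uint32_t       server_pkts;    /* server packet count */
    uint32_t       client_pkts;    /* client packet count */
    uint32_t       server_bytes;
    uint32_t       client_bytes;
} FlowRecordSketch;
```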
I've been thinking that one thing that could be done pretty easily and would add some value is attaching "point-in-time" flow summary data to Snort events. The idea is to add a summary of the flow that the event occurred on to the event data itself. Something like this (see the struct sketch after the list):
  • Event structure (as above)
  • Event packet info (as above)
  • "Flow point" information including
      • flow start time
      • last packet time
      • initiator packet count
      • initiator bytes
      • responder packet count
      • responder bytes
      • initiator TCP flag aggregate (if any)
      • responder TCP flag aggregate
      • last packet originator (initiator/responder)
      • alerts on flow (count)
      • flow flags (bitmap)
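Here's the struct sketch mentioned above: one way the "flow point" addendum might look if it were bolted onto the alert record sketched earlier. This is the shape of the idea, not a patch against Snort, and all of the names are hypothetical (it reuses the AlertRecordSketch type from the first sketch).

```c
/* Hypothetical point-in-time flow summary attached to each alert. */
#include <stdint.h>
#include <sys/time.h>

typedef struct {
    struct timeval flow_start;
    struct timeval last_pkt_time;
    uint32_t       initiator_pkts;
    uint32_t       initiator_bytes;
    uint32_t       responder_pkts;
    uint32_t       responder_bytes;
    uint8_t        initiator_tcp_flags;  /* OR of TCP flags seen, if any    */
    uint8_t        responder_tcp_flags;
    uint8_t        last_pkt_from;        /* initiator or responder          */
    uint32_t       alerts_on_flow;       /* alerts seen on this flow so far */
    uint32_t       flow_flags;           /* bitmap                          */
} FlowPointSketch;

typedef struct {
    AlertRecordSketch alert;       /* event structure + packet info, as above */
    FlowPointSketch   flow_point;  /* point-in-time flow summary              */
} FlowAwareAlertSketch;
```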
I think this kind of information could certainly be useful for putting an event into context within its flow: the analyst could see whether there was bidirectional interaction prior to the event, get a sense of the number of alerts on the flow prior to the current event, and so on.

There are some other things that could be done along with this. I think that adding flow point data, along with things like post-event packet logging, would be more useful than what we have today. I know post-event logging is not what you want in a full-blown NSM context, but it helps constrain the data management issue associated with logging every packet, and it's better than nothing. I suppose we could also add persistent logging to the system as an option (thinking more in the Snort 3.0 timeframe) to allow continuous logging of selected packet traffic. Of course, this is a DoS waiting to happen, so it would have to be turned off by default and have some pretty serious constraint logic associated with it (in terms of port/protocol/IP filtering).
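To make the "constraint logic" idea concrete, here's a minimal sketch of what a filter for a persistent-logging option might look like: an explicit host/protocol/port match plus a hard byte budget so the feature can't be turned into a disk-filling DoS. Everything here is hypothetical; nothing like this exists in Snort today.

```c
/* Hypothetical constraint check for continuous ("persistent") packet
 * logging: only log packets that match an explicit host/protocol/port
 * rule, and only until a byte budget is exhausted. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t ip;            /* host of interest (network byte order) */
    uint32_t netmask;
    uint8_t  protocol;      /* e.g. IPPROTO_TCP                      */
    uint16_t port;          /* 0 = any port                          */
    uint64_t byte_budget;   /* hard cap on logged bytes              */
    uint64_t bytes_logged;
} LogConstraint;

static bool should_persist(LogConstraint *c,
                           uint32_t src_ip, uint32_t dst_ip,
                           uint8_t proto, uint16_t sport, uint16_t dport,
                           uint32_t pkt_len)
{
    if (c->bytes_logged + pkt_len > c->byte_budget)
        return false;                                /* budget exhausted      */
    if (proto != c->protocol)
        return false;
    if ((src_ip & c->netmask) != (c->ip & c->netmask) &&
        (dst_ip & c->netmask) != (c->ip & c->netmask))
        return false;                                /* neither endpoint hits */
    if (c->port != 0 && sport != c->port && dport != c->port)
        return false;

    c->bytes_logged += pkt_len;
    return true;
}
```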

I'm going to think about this more. Do any NSM-heads out there have thoughts on the topic?


Tuesday, January 16, 2007

10 Pounds of Packets in a 5 Pound Bag


Richard Bejtlich has been talking a lot about the difference between Network Security Monitoring (NSM) and "alert-centric" technologies like Snort. His basic premise is that "real" NSM requires more than just IDP alerts and packet logs: it requires event notifications, full packet logs of the entire network, and flow data as well. He also quotes me as saying "Richard, I wrote Snort so you don't have to look at packets". This isn't quite right. I think what I actually said was "Did you think about how much data you're going to record if you do that on a high speed network? We wrote IDS so that we wouldn't have to record everything."

I get it; I understand what the NSM guys are saying and I really don't disagree with them at all. The problem I have is that if you try to deploy this concept in a large network environment with lots and lots of sensors, you've got some big obstacles to overcome. Let's look at them.

1) Flow aggregation - As I see from Richard's latest post on Cisco's MARS product, he wants the raw flow data, not just statistical NetFlow rollups. RNA does that already, as can Snort with the right options turned on. This works fine as long as the network environment is relatively small and you don't try to roll up all of the data for post-processing and analysis. If you do aggregate it to a central collector, you've got a multiplication problem on the aggregation link(s): the more traffic the sensors see and the more sensors you have, the more data they pump up to the collector and into the database you're using to manage all this information. If you're in an environment aggregating more than a few million flows per hour at ~40 bytes per flow record (in binary format), that's a ton of data to manage: call it five million flows an hour, and that's 200MB of flow data per hour, almost 5GB per day, roughly 150GB per month. Those records also have to get blasted into a database so they can be worked with, so you're translating all that data into SQL insert statements and pumping it into the local database on your aggregation machine or across the network to a database server (or cluster). That's a lot of processing, a lot of network bandwidth and a lot of disk, not to mention a lot of RAM to keep the database indices in memory. It's not that this isn't doable, but now we're talking about spreading the work across multiple machines at a minimum, and that's going to increase your costs dramatically. This isn't an unsolved problem (the NetFlow analysis/NBA guys do it for a living), but it is a big one in any large enterprise, and it takes a lot of work to scale the technology to handle it effectively.
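A quick back-of-the-envelope for the flow numbers above, assuming roughly five million flows per hour at ~40 bytes per binary flow record (the figures from the text, not measurements):

```c
/* Back-of-the-envelope: storage for aggregated flow records. */
#include <stdio.h>

int main(void)
{
    double flows_per_hour = 5e6;    /* "more than a few million"   */
    double bytes_per_flow = 40.0;   /* binary flow record, roughly */

    double per_hour  = flows_per_hour * bytes_per_flow;   /* bytes */
    double per_day   = per_hour * 24;
    double per_month = per_day * 30;

    printf("flow data: %.0f MB/hour, %.1f GB/day, %.0f GB/month\n",
           per_hour / 1e6, per_day / 1e9, per_month / 1e9);
    /* -> 200 MB/hour, ~4.8 GB/day, ~144 GB/month */
    return 0;
}
```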

2) Traffic aggregation - If you thought the flow aggregation problem was fun, then start logging all the traffic on your network. Take a fairly well-utilized modern enterprise network backbone running at a sustained 500Mbps: that's 62.5MB per second of data to record on a single sensor, 225GB per hour of packet traffic, 5.4TB per day from a single sensor. All that data is going to need to be rolled up too, unless you're going to spool it into a local database and do distributed queries across the network for packet traces. At that kind of data density your NSM sensor is going to need a NAS device someplace nearby so the data can be stored; it's going to be really hard to do that on a 1U appliance just due to physical drive space limitations. Once you have all that data, you're going to need to be able to work with it, so it has to be in a database or indexed on the filesystem in some logical fashion so that smaller chunks of data can be rapidly located, decoded and presented to the user on demand. There are companies that build products to do this; I can't really speak to their effectiveness. I can hook a high-speed collection process like daemonlogger up to a big disk and grab all this data, but once again, how much value are you really getting from recording all that data versus the logistical overhead of trying to maintain all that information in a usable fashion for extended periods of time? What's the time horizon of this data? Do I need to keep a week/month/year of this data live in a database for referential purposes? If there's going to be any expectation of success, the amount of data that's kept "live" is going to have to have some pragmatic constraints.
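The same exercise for full packet capture on a single sensor watching a sustained 500Mbps, using the assumptions from the paragraph above:

```c
/* Back-of-the-envelope: full packet capture on one busy sensor. */
#include <stdio.h>

int main(void)
{
    double link_mbps = 500.0;                   /* sustained utilization */
    double bytes_sec = link_mbps * 1e6 / 8.0;   /* 62.5 MB/s             */

    double per_hour_gb = bytes_sec * 3600.0 / 1e9;
    double per_day_tb  = per_hour_gb * 24.0 / 1e3;

    printf("full capture: %.1f MB/s, %.0f GB/hour, %.1f TB/day per sensor\n",
           bytes_sec / 1e6, per_hour_gb, per_day_tb);
    /* -> 62.5 MB/s, 225 GB/hour, 5.4 TB/day */
    return 0;
}
```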

3) Alert aggregation - This is the data IDP vendors spend their time working on getting to their users. We have pretty well-established metrics for what is acceptable in this realm in terms of sustainable event rates, data overload thresholds for analysts, data density and so on. This is the de facto standard in IDP because it's the thing people are paying for: we've got to generate the events, and everyone wants to see them since that's what the technology is supposed to be doing. This is still a lot of data to deal with, and it's the raw information that analysts have to work with. We do a lot at Sourcefire to pare down the number of events analysts have to deal with via our Impact Assessment technology, which is enabled by RNA, so it is possible to do this effectively in large environments even with less-than-optimal tuning of the sensor infrastructure.

When I started Sourcefire, one of the things I decided to do to get people to want to pay for something that was free (i.e. Snort) was to try solving their data management problems. If you look at most of the IDS vendors before Sourcefire was founded, they would sell you IDS sensors and management front-ends, but they wouldn't solve the biggest problem most people had once they deployed the technology, namely managing the information produced by the sensors. As we all know, IDS can generate immense amounts of data with just alerts, and if you want to be able to work with that data it needs to go into a database that has been optimized for the data set. Prior to Sourcefire, you could buy $250k worth of sensors from vendor X, and when you deployed the sensor grid you'd call vendor X and ask them how you were going to manage all those alerts. Their answer was typically "go call Oracle, they make a really nice database and we'll sell you professional services if you need help setting it up." This greatly increased the cost and complexity of deploying IDS solutions. When Sourcefire started, I decided this was an area where we could add real value, so we built what is now called Defense Center, allowing customers to have a plug-and-play appliance that solved their data management problems and provided a path to deploy large infrastructures of our gear quickly. As you can see from our S-1 filing, this was probably a Good Idea.

A "real" NSM infrastructure is going to primarily be built around the idea of collecting, moving and storing data and then making it highly available in a variety of presentation formats for users. If you try to do this on a network that's generating lots of traffic across lots of sensors/segments, the likelihood of building a scalable solution that anyone is willing to pay for is vanishingly low. You're going to need hundreds of terabytes of disk, a dedicated out of band management network for moving data, huge database servers AND the management and sensing infrastructure to actually grab the data.

Now we want to scale it. I know from experience that there are large distributed international enterprises out there with remote offices sitting on the other side of 128kbps (and slower) links. They get really irritated when you saturate that link to pump out a continuous stream of security data. These organizations also have core networks with 10Gbps links that can sustain 2+Gbps of internal traffic for hours; that's on the order of a terabyte or two of traffic per hour you want to log, give or take, just in the core. Then you have the rest of the enterprise with 100+ sensors deployed that see varying amounts of traffic, but say none of them go below 10Mbps typically; that's another TB of data every hour you've got to collect and forward to a central aggregation point. Then we throw in the flow data (lots of small records to insert) and the event data (more small records to insert) and you've got a data aggregation nightmare. Concentrating this data on a central collector or a load-balanced set of collectors will saturate a gigabit line, so you're either going to have to figure out how to leave it local on the sensors and perform distributed queries against it, or you're going to have to deploy a bunch of additional network gear to absorb the load.
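A rough sanity check on the aggregation-link claim, using the assumed numbers above (2Gbps of core capture plus 100 sensors at 10Mbps each, feeding a collector behind a single gigabit link):

```c
/* Back-of-the-envelope: can a single gigabit aggregation link absorb
 * the capture traffic described above?  (Flow records and alerts not
 * even counted yet.) */
#include <stdio.h>

int main(void)
{
    double core_gbps      = 2.0;    /* sustained core traffic to capture */
    int    sensors        = 100;    /* distributed sensors               */
    double sensor_mbps    = 10.0;   /* minimum traffic per sensor        */
    double collector_gbps = 1.0;    /* gigabit link into the collector   */

    double inbound_gbps = core_gbps + (sensors * sensor_mbps) / 1000.0;

    printf("inbound: ~%.1f Gbps vs %.1f Gbps of collector capacity\n",
           inbound_gbps, collector_gbps);
    /* -> ~3.0 Gbps against a 1.0 Gbps link: it doesn't fit. */
    return 0;
}
```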

The cost of deploying a solution like this will make today's IDP deployments look like rounding error and the amount of time required to sell this into an enterprise will make today's sales cycles look like selling fast food.

Then we've got training. I know the binary language of moisture vaporators, Rich knows the binary language of moisture vaporators, and lots of Sguil users know it too. The majority of people who deploy these technologies do not. Giving them a complete session log of an FTP transfer is within their conceptual grasp; giving them a fully decoded DCERPC session is probably not. Who is going to make use of this data effectively? My personal feeling is that more of the analysis needs to be automated, but that's another topic.

One of the comments on one of Rich's posts said:
It seems that a lot of these SIM and IDS/IPS systems are really now being sold to small and medium enterprises without any regard to the amount of additional staff time and expertise that will be required to maintain them. Consequently I find that the ones I've used aren't oriented towards making investigation of an incident easier but are there simply to send out more alerts under the premise that more alerts is surely better because we're detecting and stopping more attacks.
That's incorrect. They're being sold to extremely large enterprises (Fortune 100), and when they're sold into those environments there's an expectation that they will scale. There is certainly more data we could get to the users of these systems, but "everything" is an unrealistic expectation given the realities of the large enterprises these technologies are sold into.

Recording everything doesn't scale today but maybe someday it will. Like after the Singularity.


Upgrading the Apparatus


I've acquired several new gadgets over the past four months and I thought that people might be interested in my experiences.

1) New phone - HTC TyTN
With Sourcefire expanding like it is, I need a phone that I can use anywhere in the world. I've been using a Sony z600 for a few years now and buying prepaid SIM cards for whatever country I'm heading to. The downside of this is that nobody can really call me when I'm abroad, and if the SIM card runs out of money I've got to jump through hoops to get it working again. I solved that problem by moving to Cingular and getting the TyTN. The TyTN is a quad-band GSM phone with tri-band HSDPA data and 802.11, plus Bluetooth. It runs Windows Mobile 5 but is still pretty useful despite that. Compared to the Treo 650 it's replacing, it's been reliable and pretty straightforward to use. The Windows paradigm is so-so on the mobile platform; it certainly requires a lot of clicking to perform complex tasks compared to Palm. That said, it's a good phone and runs the apps that I need on a mobile device (Mail, IM, SSH and a web browser). Overall I've been about as happy with this phone as I am with any cell phone; the data connectivity is nice and it worked well when I was overseas in December.

2) New laptop - MacBook Pro
I passed on the initial MacBook Pro release and waited very patiently for the Core 2 Duo processors to make their debut in the Apple laptop line and was rewarded with this very nice machine. The Core 2 Duo chip has a few nice new features over the first generation Core Duo chips including the EM64T instruction set, a larger L2 cache, higher performance and lower power consumption. It turns out it also runs cooler than the Core Duo.

Since my laptop is my primary development/presentation/communications/everything machine, I got it maxed out with a 2.33GHz CPU, 3GB of RAM, the 200GB hard drive and the glossy screen option. It's a heck of a lot faster than the PowerBook it's replacing for pretty much everything I do, except maybe MS Office since it runs in emulation via Rosetta. It's also great for running Parallels, the OS X virtualization environment. I've been running XP under Parallels for various esoteric applications, like running my telescope and CCD cameras for astrophotography, and it works like a champ.

This is without a doubt the best laptop I've ever owned, it's fast, stable and like all Macs, it just works. It's a great development platform, a great travel machine and all around nice computer.

3) Novatel Wireless XU870 HSDPA modem

I used to use a Novatel EV-DO card on Sprint for my mobile internet needs, but the MacBook Pro has an ExpressCard/34 slot and there was no EV-DO card available for it. Luckily, since I was switching to Cingular anyway, I found this card. It supports GPRS/EDGE/UMTS/HSDPA data connections up to 3.6 Mbps and uses a standard SIM card for network access. It also has drivers for OS X available, so it's pretty much a winner across the board. I got a second 3G SIM card from Cingular, set it up as a modem in OS X, and it was off to the races! I also got international data roaming turned on for the SIM card, so it even worked overseas. I've been surprised at how often it's been able to find an HSDPA signal to connect to; it's worked really well everywhere I've tried to use it (including France and the UK).

Supposedly there's a firmware upgrade that will be coming in the not too distant future that will allow the modem to handle up to 7.2 Mbps over the air. Once it comes out I'll be sure to post my experiences with it.
