Wall of Shame: Logging

Firstly, I wanted to call this Log Management - but then it occurred to me that the point of this post is that few people actually log, let alone manage their logs at all. So I felt it was more prudent to identify the failure specifically. This is almost worse than patching.
Let's assume for a second that the worst has happened. Your website has been owned and defaced. Your data has been stolen and sold to some state-sponsored entity or criminal syndicate. You are front page news for all the wrong reasons. Your CIO is screaming down the phone and you have to sit in front of a governance board to explain what went wrong within 30 minutes.

However, there is one tiny problem (yes - over and above all of that).

You have no logs. Nothing.

You thought you were logging - I mean, most applications do, right? Only as you investigate, you uncover the following:

the Apache logs were never enabled due to "performance issues" (at least, that's what your Application Support team tells you).
the system logs were suspiciously clean.
even error logs, IDS, network logs (netflow, ACL/firewall logs, etc) all come up null. Again, suspiciously clean.

You have been told to start giving answers, only you have none to give.

This is NOT a position you want to find yourself in!!!

Yet, you'd be amazed how often it happens. Or more specifically, how often businesses (big and small) leave themselves susceptible to this sort of incident. Most people accept a bare minimum in their logging standards and yet don't realise what logs are really for.

Although logs are rooted in providing an audit trail of potential security events, they also help provide audit trails in general. In turn, they help fulfil audit and compliance requirements. Sarbanes-Oxley is built on ensuring these sorts of checks are in place. However, I'm not even talking about the level of compliance often required under Sarbanes-Oxley. I'm talking about basic logging of security events that is either never done or done so poorly as to be laughable.

I often see the centralised log collector used for all sorts of things. Shameful examples I have seen:

central log collectors used as bastion hosts by various admins (very common, mostly due to where these boxes are positioned on the network based on existing network segments);
the log collector used as a miscellaneous dumping ground (e.g. router configs, torrent files, etc);
a large portion of the IT infrastructure teams with full administrator access to the log collector;
directory services that did ZERO logging due to performance impacts and proprietary vendor lock-in forcing the client to purchase a commercial product if they wanted security event logs;
ERP billing systems that did zero auditing (a total breach of Sarbanes-Oxley) due to performance constraints and a lack of vendor know-how on what to implement, let alone how.

This is just off the top of my head.

Everyone, logs are your friends. If you can't be stuffed patching, implementing proper application security, identity management, etc - then for god's sake, do one thing right: get your logs in order.

Some advice, in order of evolving maturity:

1. Any logs are better than none.
2. Build a dedicated log collector, and nail that sucker down.
3. Build dedicated processes around the management and response to those logs.
4. Do you want more logs?
5. Think strategically.

For anyone starting on this journey, I recommend you start with NIST Special Publication 800-92, the Guide to Computer Security Log Management.

1. Any logs are better than none.
Yep. It's lame. But when we're talking about dragging an organisation into the 20th century (yes, 20th) this is often a huge hurdle. Who will view the logs? How? Who will action them? Will the task be rotated to prevent staff boredom and lapses in response?

Yeah it sucks. I know. I've spent 8-hour shifts in the past reviewing logs. Believe me when I say I know the pain. You'd be amazed at how date/time stamps, IP addresses and port numbers blur in that time. I swear you can see ASCII art like Neo. But if you play your cards right, you can get through this phase quickly by jumping to phase 3 fast.

2. Build a dedicated log collector, and nail that sucker down.
You want a dedicated log collector for several reasons:

if the host is compromised, all data retained on it is suspect. However, if logs are sent to a dedicated collector in real time, then the potential for them to be compromised is virtually nil.
in turn, this helps to preserve the body of evidence - particularly if legal proceedings are required.
it enables a more strategic approach in future (will get there later).
it makes security operations easier as there is a single repository for investigations.
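
As a rough illustration of "sent to a dedicated collector in real time", here is a minimal sketch using Python's standard `logging.handlers.SysLogHandler` to forward application events over UDP syslog. The collector address, port and logger name are placeholders (assumptions, not anything from a real environment), and in practice you would more likely point the host's own syslog daemon at the collector rather than do this per application.

```python
import logging
import logging.handlers


def build_forwarding_logger(collector_host, collector_port=514, name="myapp"):
    """Return a logger that forwards every record to a remote syslog collector.

    collector_host, collector_port and name are placeholders for your own setup.
    SysLogHandler defaults to UDP, which is fire-and-forget: there is no
    delivery guarantee, so treat this as a sketch rather than a hardened design.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(collector_host, collector_port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling something like `build_forwarding_logger("logs.example.internal").warning("failed login for admin")` would put the event on the wire the moment it is logged, so a later compromise of the sending host cannot rewrite what has already left it.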

Hardening is absolutely required because you need to ensure that chain of evidence. By that I mean:

The system should be sufficiently hardened in line with best practice (duh!),
It should reside in a fully segmented network zone, behind a firewall, on a dedicated VLAN, etc. This could be a consolidated security operations DMZ.
User access should be restricted to security operations staff only.
Backup/Archiving options should be in alignment with your data retention requirements as provided by legal counsel. I usually say 18 months, as 7 years is generally only required for financial transactions. I keep getting flip-flop advice from security professionals and legal eagles on this one. At the end of the day, I'll defer to the lawyers on this.
If you're worried about the hard drive filling up, just get as big a disk as you can and set the logs to rotate at a certain volume. If you're even more worried about clobbering your logs, see point 1.
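
To make "rotate at a certain volume" concrete, here is a small sketch using Python's standard `logging.handlers.RotatingFileHandler`. The file path, size limit and backup count are illustrative assumptions; on a real collector you would normally configure the equivalent in your syslog daemon or log rotation tool instead.

```python
import logging
import logging.handlers


def build_rotating_logger(path, max_bytes=50 * 1024 * 1024, backups=10, name="collector"):
    """Return a logger that rolls its file over once it nears max_bytes.

    path, max_bytes, backups and name are hypothetical values for illustration.
    Only `backups` old files are kept (path.1, path.2, ...); anything older is
    deleted, which is exactly the "clobbering" trade-off to be aware of.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

The total disk used is capped at roughly `max_bytes * (backups + 1)`, so size those two numbers against the disk you actually have.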

This host is going to be relied upon heavily and built upon over time. You may decide to do all sorts of funky things: data replication, multiple nodes due to log volume, high availability.

3. Build dedicated processes around the management and response to those logs.
Ok, you are now collecting logs and have a central interface to view and search them. What now? It's time to start thinking about what you want to do with them.

What logs are you collecting? What events are you collecting?

In the early days you may lack automation of any kind. You might configure syslog to give you the bare minimum (e.g. su events, SSH logins - success/fail, that sort of thing). Some may disagree but I say that's fine. Anything more at this point and you're going to be overwhelmed and probably do nothing with them. Start light and build up over time.
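
If you want a feel for what that bare minimum looks like in practice, here is a small sketch that picks SSH successes, SSH failures and su sessions out of syslog lines. The regular expressions are assumptions based on common OpenSSH and PAM message wording, which varies between distributions and versions, so check them against real lines from your own hosts before trusting them.

```python
import re

# Patterns are assumptions about typical message wording, not a standard.
PATTERNS = {
    "ssh_fail": re.compile(
        r"sshd\[\d+\]: Failed \S+ for (?:invalid user )?(?P<user>\S+) from (?P<src>\S+)"
    ),
    "ssh_ok": re.compile(
        r"sshd\[\d+\]: Accepted \S+ for (?P<user>\S+) from (?P<src>\S+)"
    ),
    "su": re.compile(
        r"su(?:\[\d+\])?: .*session opened for user (?P<user>\S+)"
    ),
}


def classify(line):
    """Return (event_type, fields) for a line we care about, otherwise None."""
    for event_type, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return event_type, match.groupdict()
    return None
```

Everything that returns `None` is simply ignored for now, which is the whole point of starting light.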

Get the events for Unix hosts sent to the Unix team, and Windows events to the Windows team. Don't know how to get syslog events for Windows hosts? Look at tools like Snare. Not sure what Windows security events are or where to start? Try here.

Getting to phase 3 I think is the "bare minimum" when it comes to logging. And there's no shortage of open source tools to help either. If you do no logging today (or poor logging) this is your target end state for the moment.

This being the early stage, seriously don't worry about logs filling up too much. Try to undertake some capacity planning and forecasting but don't sweat it if you get it wrong. Eventually you'll have the data you need to make more accurate estimates, but not likely at this initial stage. That said, these are early days, and with disk being as cheap as it is (assuming you're not using SAN just yet) you'll be fine. If in doubt, remember rule #1!
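
For a first-cut forecast, the arithmetic is simply event rate times average event size times retention period. The sketch below does exactly that; the input figures in the usage note are invented for illustration, and the result ignores compression, indexing overhead and growth.

```python
SECONDS_PER_DAY = 86_400


def estimate_storage_bytes(events_per_second, avg_event_bytes, retention_days):
    """Raw (uncompressed) bytes needed to keep `retention_days` worth of logs."""
    daily_bytes = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return daily_bytes * retention_days
```

As a worked example with made-up inputs: 50 events per second at 300 bytes each is about 1.3 GB per day, so keeping 18 months (roughly 548 days) works out to `estimate_storage_bytes(50, 300, 548)`, or about 710 GB of raw log data. Swap in your own measured rates once you have them.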

4. Do you want more logs?
Finally you're at a point you can be more bold with your logging. You've covered your critical hosts? Check. What about IDS/IPS? All your hosts (Unix and Windows)? Web server/application logs? Authentication services such as Active Directory, Novell, RADIUS, TACACS+? Remote access services (Citrix or VPNs)?
I haven't even discussed integrity logging yet! If you can look at integrity monitoring/logging, please do so. Prioritise your critical hosts and go from there. This is something that really is neglected in a lot of environments, yet I'm loath to advocate it (despite how important I think it is) simply because of the number of businesses that aren't even here yet (i.e. still at stage 0!).
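
To show the basic idea behind integrity monitoring, here is a minimal sketch: hash a set of files, keep that as a baseline, and report anything added, removed or modified on the next pass. The function names are hypothetical and this is nowhere near a replacement for a proper integrity monitoring tool; in particular, it assumes the baseline is stored somewhere the monitored host cannot alter, such as your log collector.

```python
import hashlib
from pathlib import Path


def hash_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each existing file path to its current hash."""
    return {str(p): hash_file(p) for p in paths if Path(p).is_file()}


def diff_snapshots(baseline, current):
    """List (change, path) tuples describing what differs from the baseline."""
    changes = []
    for path in sorted(set(baseline) | set(current)):
        if path not in current:
            changes.append(("removed", path))
        elif path not in baseline:
            changes.append(("added", path))
        elif baseline[path] != current[path]:
            changes.append(("modified", path))
    return changes
```

Each tuple that comes back from `diff_snapshots` is a candidate event to send to the collector like any other log line.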

If you have finally reached this phase, however, then this is the part where you get to slowly and incrementally build upon your foundation.

Use a risk based approach. What aren't you logging that you really should? Equally important, how much more information can you take in without being overloaded? Are the processes currently working? If not, how can they be changed or otherwise optimised?

This is the point where you start hitting the upper limits of the procedural and technical capabilities of your current environment. If you cannot take on any more events but know you need to, it's time to take the next step.

5. Think strategically
Time to start thinking about how you want to handle security events on a larger scale. This involves a lot of discussions with various stakeholders - senior management, technical leads (network engineers, system admins, etc). How can you take in more events, meet your compliance objectives and stay on top of them as new compliance regimes come to the fore?

Here's some questions for you to consider:

How big is the security team at present? Do they have room to grow? Is there budget to allow it? If not can you build a business case to justify it? (Business cases for security staff are worthy of a post in their own right - but another day perhaps). If not, can other teams handle these events?
What can be done to automate the response to security events? Can suspicious events be automatically responded to - further eliminating the need for manual processes? (e.g. you see 10 failed SSH login attempts to your SAP server at 1:04am and it isn't within a dedicated change window. You could argue it's safe to auto-block and deal with the fallout later rather than having to manually investigate the host in the morning or being paged at ungodly hours).
Is event correlation a functional requirement (e.g. connecting the dots between disparate logs)? Are you willing to invest in a security information and event management (SIEM) appliance to achieve this? This is a BIG question and perhaps starting to drill towards the heart of point 5. If you are thinking about it, you need to be mindful of which key technologies a SIEM appliance must support. Also, are you able and willing to spend the time learning how to use it, configure the appliance, tweak it and support it? This can be a 6 month+ journey for a given organisation. Tuning these things is often the most ghastly part of their implementation (but it does pay off in the end).
If you can't or won't move towards a SIEM, can you manage a reduced set of interfaces at the bare minimum with the logging information you require (e.g. antivirus, syslog events and IDS/IPS represent three consoles that would have to be checked regularly if not constantly)? Try to factor the cost of administering these machines on a daily basis into your justification of whether or not you can afford a SIEM. Reduction in staff overhead, licensing and admin costs is a cost benefit, I don't care what anyone else says!
Once you have implemented the solution of your choice, always review your processes and tech. Is it working? Are you reasonably sure you are capturing events? Are your security incidents showing events that simply are not being tracked by your SIEM (or newly optimised log management) appliance? Is there a new compliance regime with new auditing/logging requirements over and above your existing practices? If so, this is a sign that things need a change. However, if you are at this phase you are well poised to make the adaptations required.
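
The failed SSH login example above can be sketched as a simple decision rule: count recent failures and block only if the threshold is hit outside an approved change window. The function names, the threshold of 10 and the 5-minute lookback are assumptions for illustration (the lookback in particular is my own pick), and the actual blocking action is deliberately left out since it depends entirely on your firewall or host tooling.

```python
from datetime import datetime, timedelta


def in_change_window(moment, change_windows):
    """True if `moment` falls inside any (start, end) change window."""
    return any(start <= moment <= end for start, end in change_windows)


def should_auto_block(failure_times, now, change_windows,
                      threshold=10, lookback=timedelta(minutes=5)):
    """Decide whether recent failed logins justify an automatic block.

    failure_times: datetimes of failed logins from one source to one host.
    threshold and lookback are illustrative defaults, not recommendations.
    """
    if in_change_window(now, change_windows):
        return False  # expected noise during approved work; leave it to humans
    recent = [t for t in failure_times if timedelta(0) <= now - t <= lookback]
    return len(recent) >= threshold
```

Keeping the decision separate from the action makes it easy to run the rule in "alert only" mode first and see how often it would have fired before you let it block anything.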

If you hit step 5, pat yourself on the back. You're now doing better than 99.99% of everyone else!

Apart from NIST 800-92 I also wanted to point people to Dr. Anton Chuvakin. He's done some awesome work on logging and is also a champion of the security metrics space. If you're wanting some additional guidance and thought leadership on this subject, I suggest you check his material out as he's definitely the "go to" man on this subject. Even SANS says so. Also, since we're on the subject of logging, no post would be complete without a reference to the securitymetrics.org site. I haven't been as diligent in following this site as I would like - but I believe there is good work in progress here that we should all be paying mindful attention to.

Cheers

- J.

/dev/null - ramblings of an infosec professional

Monday, March 29, 2010
