🛡️ How to Save Your SOC from Stagnation: Lessons from Carson Zimmerman’s SANS Blue Team Summit 2023 Keynote

Introduction :star2:

Security Operations Centers (SOCs) play a crucial role in safeguarding organizations from cyber threats. However, like most things, SOCs come with their own challenges, stagnation and employee burnout being among the most common. As someone who’s been working in a SOC for over five years, I’m always on the lookout for ways to improve our operations. I recently came across a talk by Carson Zimmerman at the SANS Blue Team Summit that offered valuable insights into avoiding SOC stagnation and fostering a culture of continuous improvement. Watch the full talk here.

The Seven Tools for SOC Empowerment :hammer_and_wrench:

Carson Zimmerman, who has two decades of experience in SOC operations, laid out seven tools that can transform the way your SOC operates. Here’s a brief overview of each:

  • Tool 1: Detection V-Team (Virtual Team) :dart:
    A dynamic, cross-functional team focused on improving threat detection capabilities.
  • Tool 2: Investigation Improvements :man_detective:
    Techniques and workflows to enhance the quality and efficiency of security investigations.
  • Tool 3: Response Automation :robot:
    Leveraging automation to make your SOC more responsive and efficient.
  • Tool 4: Proactive Hunt :lion:
    A disciplined approach to actively searching for new threats that may have evaded traditional detection methods.
  • Tool 5: Standard Operating Procedures (SOPs) :clipboard:
    The playbook that ensures everyone in the SOC knows how to respond to various types of incidents.
  • Tool 6: Post-Incident Review :bar_chart:
    A structured review process to learn from past incidents and improve future responses.
  • Tool 7: Stronger Together :handshake:
    Fostering a culture of collaboration both within the SOC and with external partners.

Tool 1: Detection V-Team (Virtual Team) :dart:

What is a V-Team?

  • Definition: “V-Team” stands for Virtual Team. While it appears to be a concept used at Microsoft (source), the idea is universally applicable. In this context, a V-Team is a dynamic, cross-functional group that includes triage analysts, detection engineers, threat hunters, and data analysts.

Key intake points for the Detection V-Team:

  • Customer Engagement: Tailor detection strategies based on end-user needs and feedback.
  • Proactive Hunt: Don’t just wait for alerts; actively search for new threats.
  • Incidents and Pen Tests: Use past experiences to refine future detection methods.
  • Intel and Partner SOCs: Collaborate for a more comprehensive defence.
  • Vendors and Community: Stay updated with the latest tools and strategies from external sources.

Routine V-Team Activities:

  • Requirements Gathering: What needs to be detected or improved?
  • Ideation: Brainstorm new or improved detection methods.
  • Curation: Select the best ideas to implement.

Operational Best Practices:

  • Meet Routinely: Keep track of progress and make necessary adjustments.
  • Rank/Prioritize Requirements: Focus on high-impact ideas.
  • Appoint a Facilitator: Keep meetings focused and productive.
  • Federate Execution: Assign tasks based on expertise.
  • Agile Methodologies: Consider agile frameworks for efficient project management.
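To make “Rank/Prioritize Requirements” concrete, here is a minimal Python sketch of a V-Team backlog sorted by an impact-over-effort score. The 1-to-5 scales, the scoring formula, and the `DetectionRequirement` and `prioritize` names are illustrative assumptions, not something the talk prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectionRequirement:
    """One item in the V-Team backlog (fields are illustrative assumptions)."""
    title: str
    source: str   # intake point, e.g. "incident", "intel", "customer"
    impact: int   # 1 (low) to 5 (high): how much risk the detection addresses
    effort: int   # 1 (small) to 5 (large): estimated engineering work

    @property
    def score(self) -> float:
        # Simple impact-over-effort ratio; favours high-impact, low-effort work.
        return self.impact / self.effort


def prioritize(backlog: List[DetectionRequirement]) -> List[DetectionRequirement]:
    """Sort by score, highest first; on equal scores, higher impact wins."""
    return sorted(backlog, key=lambda r: (r.score, r.impact), reverse=True)


backlog = [
    DetectionRequirement("Detect OAuth consent phishing", "incident", impact=5, effort=3),
    DetectionRequirement("Tune noisy PowerShell rule", "customer", impact=3, effort=1),
    DetectionRequirement("New C2 domain pattern", "intel", impact=4, effort=2),
]

for req in prioritize(backlog):
    print(f"{req.score:.2f}  {req.title}  (from {req.source})")
# 3.00  Tune noisy PowerShell rule  (from customer)
# 2.00  New C2 domain pattern  (from intel)
# 1.67  Detect OAuth consent phishing  (from incident)
```

A single ratio is deliberately crude; the point is that the ranking is written down and repeatable, so the facilitator can run the same sort at every meeting.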

The Bottom Line:

The V-Team takes on the responsibility of implementing, testing and validating detections. This ensures that those who write the detections are also invested in their effectiveness.

Tool 2: Investigation Improvements :man_detective:

The Concept of “Technical Hotwash”

Zimmerman introduces “Technical Hotwash,” a post-investigation debrief that serves as an informal yet invaluable training ground. This is where team members share their investigative techniques, from the queries they used to the data sources they utilised.

The Workflow:

  • New Case: Start with a new security incident.
  • Investigate: Conduct a thorough investigation.
  • Maybe Respond: Take necessary actions based on the investigation.
  • Technical Hotwash: Conduct a post-investigation debrief.
  • Next Case: Move on to the next security incident.

The Importance of “Show and Tell”

The Technical Hotwash is essentially a “show and tell” session. It’s an opportunity to share queries, data sources, notebooks, hunts, and tools. This is not just a debrief; it’s a learning experience for everyone involved.

What You Gain:

  • Curate Better Data: Improve the quality of data you collect.
  • Enhance Events and Alerts: Make your alerts more informative.
  • Improve Analytic Notebooks: Refine your notebooks for better analytics.
  • Optimize Queries and Code: Learn to write more effective queries and code.
  • Document Data Sources: Keep a record of where your data is coming from.

This is a valuable learning opportunity that benefits multiple team members without any direct financial cost.

Make It a Habit:

Plan capacity for Technical Hotwash. Whether you hold it immediately after an incident, the next week, or even a year later, make it part of your culture. By planning for it, you build continuous improvement into your SOC.

Tool 3: Response Automation :robot:

The Essence of Automation

Automation in a SOC is not just a shortcut to reduce manual tasks. It’s a strategic move to make your operations more responsive and efficient. Zimmerman advises starting small. Focus on tasks that offer immediate benefits but are low in risk.

The Framework for Response

Traditionally, you might focus on machine hostnames and IP addresses when thinking about automation. Zimmerman encourages us to think more broadly. Consider automating actions across different entities and systems, such as:

  • User/Accounts: Concentrate on the identity layer. This could involve actions related to user accounts and authentication mechanisms.
  • Commerce/Billing: Consider integrating with billing and commerce systems for automated checks or alerts.
  • Apps & Services: Think about the security of the applications and services in use. For example, if a malicious adversary creates a harmful application in your cloud environment, automation should help you detect and respond to it.
  • Other Ticketing Systems: Your automation can also extend to existing ticketing systems for a more streamlined response process.

Getting Started? Here’s How:

  • High Benefit, Low Regret Actions: Begin with tasks that offer immediate advantages but carry minimal risks.
  • Speed Up Human-to-Human Communications: Consider automating initial communications between team members through tickets or alerts.
  • Avoid Spamming: Be mindful not to flood team members with automated messages.
  • Tool Agnostic: Remember, you don’t need a specific SOAR (Security Orchestration, Automation, and Response) product to implement automation.
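“Avoid Spamming” and “Speed Up Human-to-Human Communications” pull in opposite directions; one common compromise is to suppress repeat notifications about the same issue for a cooldown period. The sketch below assumes a hypothetical `send` callback standing in for your ticketing or chat integration, and the 15-minute default cooldown is arbitrary. It uses only the Python standard library, in keeping with the point that no specific SOAR product is required.

```python
import time
from typing import Callable, Dict, Hashable


class NotificationThrottle:
    """Suppress repeat notifications about the same issue within a cooldown window.

    `send` is a hypothetical callback standing in for your ticket, chat, or email
    integration; no particular SOAR product is assumed.
    """

    def __init__(self, send: Callable[[str, str], None],
                 cooldown_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_sent: Dict[Hashable, float] = {}

    def notify(self, key: Hashable, recipient: str, message: str) -> bool:
        """Send `message` unless `key` was already notified within the cooldown.

        Returns True if the message was sent, False if it was suppressed.
        """
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._cooldown:
            return False  # this issue was already notified recently
        self._last_sent[key] = now
        self._send(recipient, message)
        return True


# Example with a fake clock so the cooldown is easy to see.
outbox = []
fake_time = [0.0]
throttle = NotificationThrottle(lambda to, msg: outbox.append((to, msg)),
                                cooldown_seconds=600, clock=lambda: fake_time[0])
throttle.notify(("ws-12", "malware"), "it-ops", "Please isolate ws-12")  # sent
throttle.notify(("ws-12", "malware"), "it-ops", "Please isolate ws-12")  # suppressed
fake_time[0] = 700.0
throttle.notify(("ws-12", "malware"), "it-ops", "Please isolate ws-12")  # sent again
print(len(outbox))  # 2
```

Keying on the issue (here a host and alert type) rather than on the recipient lets different problems still reach the same team immediately.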

The Ultimate Goal

Automated response does not always have to end with evicting the threat. Focusing on high-benefit, low-regret actions, like evidence collection, is a great starting point. Even simple steps, like enriching your alerts with additional data, can save time and resources.
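Alert enrichment is a good example of a high-benefit, low-regret action because it only reads data and changes nothing in the environment. A minimal sketch, assuming hypothetical in-memory `ASSET_INVENTORY` and `USER_DIRECTORY` tables that in a real SOC would be lookups against a CMDB or identity provider; the priority-bump rule is also illustrative.

```python
from typing import Any, Dict

# Hypothetical lookup tables; in practice these would come from your CMDB,
# identity provider, or asset inventory.
ASSET_INVENTORY: Dict[str, Dict[str, str]] = {
    "10.0.4.17": {"hostname": "fin-db-01", "owner": "finance-it", "criticality": "high"},
}
USER_DIRECTORY: Dict[str, Dict[str, str]] = {
    "jdoe": {"department": "Finance", "manager": "asmith"},
}


def enrich_alert(alert: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the alert with asset and user context attached.

    Read-only lookups like this save the analyst several manual pivots
    without taking any action in the environment.
    """
    enriched = dict(alert)  # never mutate the original alert
    asset = ASSET_INVENTORY.get(alert.get("src_ip", ""))
    user = USER_DIRECTORY.get(alert.get("user", ""))
    enriched["asset"] = asset or {"note": "not in inventory"}
    enriched["user_context"] = user or {"note": "unknown user"}
    # Illustrative triage hint: raise priority when a high-criticality asset is involved.
    if asset and asset.get("criticality") == "high":
        enriched["priority"] = "P1"
    return enriched


alert = {"id": "A-1001", "rule": "Suspicious PowerShell", "src_ip": "10.0.4.17",
         "user": "jdoe", "priority": "P3"}
print(enrich_alert(alert)["priority"])  # P1
```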

Tool 4: Proactive Hunt :lion:

What is Proactive Hunting?

Proactive hunting is not just aimless data exploration or routine alert investigation. It’s a disciplined, structured approach to finding threats that might evade conventional detection tools. Understanding the business context and the types of adversaries likely to target your organization can make your hunts more effective.

“Cyber threat hunting is a proactive security search through networks, endpoints, and datasets to hunt malicious, suspicious, or risky activities that have evaded detection by existing tools.” — Trellix

The Proactive Hunt Workflow

The process begins with planning, ideation, and hypothesis formation. This is often informed by various sources such as:

  • Threat Intelligence
  • External Tips
  • Case Work
  • Internal Partners

These sources feed into the initial stages where you prepare and gather data for analysis. The workflow can be summarized as:

  1. Plan, Ideate, Hypothesize: Prepare and gather the necessary data.
  2. Query/Analyze: Use this data to validate or invalidate your hypotheses.
  3. Set and Refine ‘Hunt Traps’: These are analytics used during the hunt to surface data that can prove or disprove your hypotheses.

Key Considerations

  • Use Data In Place: Work with the data you have, where it already lives; don’t block your hunt waiting for ‘perfect’ data.
  • Timed Queries: Run your hunt traps as queries on a timer.
  • Transitioning Hunts: Be clear about when you transition a hunt to a routine detection.
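A hunt trap run as a timed query can be sketched as a predicate evaluated over a lookback window of data that already exists. The `HuntTrap` class, the event field names, and the Office-spawns-rundll32 hypothesis below are hypothetical; in practice the query would live in your SIEM or data platform, and a scheduler (cron, or the platform’s own scheduled searches) would call `run()` on a timer. Because prior hits are remembered, each run reports only new matches, which is also a natural point to decide whether the trap should become a routine detection.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class HuntTrap:
    """A hypothetical 'hunt trap': a query re-run on a timer over a lookback window."""
    name: str
    hypothesis: str
    predicate: Callable[[Event], bool]
    lookback: timedelta
    hits: List[Event] = field(default_factory=list)

    def run(self, events: List[Event], now: datetime) -> List[Event]:
        """Evaluate the trap against data in place; record and return only new hits."""
        window_start = now - self.lookback
        new_hits = [e for e in events
                    if e["time"] >= window_start and self.predicate(e) and e not in self.hits]
        self.hits.extend(new_hits)
        return new_hits


trap = HuntTrap(
    name="office-spawns-rundll32",
    hypothesis="An adversary is using Office documents to launch rundll32 payloads",
    predicate=lambda e: e["parent"].lower() in {"winword.exe", "excel.exe"}
    and e["process"].lower() == "rundll32.exe",
    lookback=timedelta(hours=24),
)

now = datetime(2023, 9, 1, 12, 0)
events = [
    {"time": now - timedelta(hours=2), "host": "ws-12", "parent": "WINWORD.EXE", "process": "rundll32.exe"},
    {"time": now - timedelta(hours=3), "host": "ws-07", "parent": "explorer.exe", "process": "rundll32.exe"},
    {"time": now - timedelta(days=3), "host": "ws-01", "parent": "excel.exe", "process": "rundll32.exe"},
]
print([e["host"] for e in trap.run(events, now)])  # ['ws-12']
```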

The Importance of Structure and Discipline

Just like with the Detection V-Team, the goal is to bring together various experts within the SOC to instil structure, order, priority, and consistent execution into the hunting process. This involves gathering data, running queries, and setting up hunt traps.

Maintaining discipline is crucial during a hunt. It’s easy to get sidetracked by intriguing but unrelated findings. Instead of going down these tangential paths, log these incidental findings for future investigation. This keeps the team focused on validating or debunking the current threat hypothesis.

Time-Bounded Hunts

Consider having bounded timelines for your hunts. If you can’t validate or invalidate your hypotheses within a set timeframe, it might be best to move on to other priorities.
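Time-boxing a hunt and parking incidental findings can be tracked with a small record per hunt. A sketch under stated assumptions: the `Hunt` and `Outcome` names, the outcome labels, and the ten-day time box are illustrative choices, not part of the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List


class Outcome(Enum):
    OPEN = "open"
    VALIDATED = "validated"      # evidence supports the hypothesis
    INVALIDATED = "invalidated"  # evidence refutes it
    TIMED_OUT = "timed out"      # time box expired; move on to other priorities


@dataclass
class Hunt:
    hypothesis: str
    started: datetime
    time_box: timedelta
    outcome: Outcome = Outcome.OPEN
    incidental_findings: List[str] = field(default_factory=list)

    def log_incidental(self, note: str) -> None:
        """Park an interesting but off-hypothesis lead instead of chasing it now."""
        self.incidental_findings.append(note)

    def check_deadline(self, now: datetime) -> Outcome:
        """Close the hunt as TIMED_OUT if it is still open past its time box."""
        if self.outcome is Outcome.OPEN and now >= self.started + self.time_box:
            self.outcome = Outcome.TIMED_OUT
        return self.outcome


hunt = Hunt("Service accounts are being used for interactive logons",
            started=datetime(2023, 9, 1), time_box=timedelta(days=10))
hunt.log_incidental("Odd DNS TXT volume from ws-33; unrelated, review later")
print(hunt.check_deadline(datetime(2023, 9, 5)))   # Outcome.OPEN
print(hunt.check_deadline(datetime(2023, 9, 12)))  # Outcome.TIMED_OUT
```

The incidental findings list survives the hunt, so those leads can seed the next round of hypotheses instead of being lost.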

In Summary

Proactive hunting is a critical component of an effective Security Operations Center. It requires a disciplined and structured approach, informed by various data sources and guided by well-defined hypotheses. Hunt traps and analytics are central to a successful hunt and can also inspire future detection work. The success of proactive hunting depends not only on technology but also on the methodology and focus of the team involved.

Tool 5: Standard Operating Procedures (SOPs) :clipboard:

The Importance of SOPs

Standard Operating Procedures (SOPs) serve as the playbook for the Security Operations Center (SOC). They are essential for ensuring that everyone on the team is on the same page and knows how to respond to various types of incidents. SOPs should be concise, up-to-date, and universally understood within the team.

Who Should Write SOPs?

The responsibility for writing and updating SOPs should not fall solely on the SOC manager or lead analysts. Instead, everyone from the most junior to the most senior members should be involved. This collective approach is the best defence against turnover and mistakes.

What Triggers an SOP Update?

Several factors can necessitate the creation of new SOPs or the updating of existing ones. These include:

  • New case types
  • Major incidents
  • Introduction of new or improved tools
  • Hunts
  • Operational mistakes
  • Organizational restructures

These triggers should be built into the SOC’s culture, ensuring that SOPs are continually refined and updated.

The SOP Lifecycle

  1. Create/Update: Draft the SOP focusing on the specifics of the task or incident type.
  2. Review: Have the SOP reviewed by team members for clarity and effectiveness.
  3. Approve: Depending on the level of criticality and risk, the SOP may require formal approval.
  4. Use: Implement the SOP in daily operations.
  5. Iterate: After using the SOP, revisit it for any necessary updates or refinements.
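The lifecycle above, together with the update triggers listed earlier, maps naturally onto a small state machine. This is a minimal sketch: the state names, the `requires_approval` flag used to model “may require formal approval”, and the trigger strings are assumptions made for illustration.

```python
from enum import Enum


class SopState(Enum):
    DRAFT = "create/update"
    IN_REVIEW = "review"
    APPROVED = "approve"
    IN_USE = "use"


# Triggers that reopen an SOP (labels paraphrase the trigger list above).
UPDATE_TRIGGERS = {"new case type", "major incident", "new or improved tool",
                   "hunt", "operational mistake", "organizational restructure"}

# Allowed moves: draft -> review -> approve -> use; review can bounce back to
# draft, and iteration sends an in-use SOP back to drafting.
TRANSITIONS = {
    SopState.DRAFT: {SopState.IN_REVIEW},
    SopState.IN_REVIEW: {SopState.APPROVED, SopState.DRAFT},
    SopState.APPROVED: {SopState.IN_USE},
    SopState.IN_USE: {SopState.DRAFT},
}


class Sop:
    def __init__(self, title: str, requires_approval: bool) -> None:
        self.title = title
        self.requires_approval = requires_approval  # e.g. True for high-risk procedures
        self.state = SopState.DRAFT
        self.history = [SopState.DRAFT]
        self.triggers = []  # why the SOP was reopened, for later review

    def advance(self, target: SopState) -> None:
        allowed = set(TRANSITIONS[self.state])
        # Low-risk SOPs may go straight from review to use without formal approval.
        if self.state is SopState.IN_REVIEW and not self.requires_approval:
            allowed.add(SopState.IN_USE)
        if target not in allowed:
            raise ValueError(f"cannot move from {self.state.value} to {target.value}")
        self.state = target
        self.history.append(target)

    def reopen(self, trigger: str) -> None:
        """Iterate: send the SOP back to drafting because of a known trigger."""
        if trigger not in UPDATE_TRIGGERS:
            raise ValueError(f"unknown trigger: {trigger}")
        self.advance(SopState.DRAFT)
        self.triggers.append(trigger)


sop = Sop("Phishing triage", requires_approval=False)
sop.advance(SopState.IN_REVIEW)
sop.advance(SopState.IN_USE)   # allowed because this SOP does not require approval
sop.reopen("major incident")   # iterate after a trigger
print([s.value for s in sop.history])
# ['create/update', 'review', 'use', 'create/update']
```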

Keeping SOPs Effective

  • Conciseness: SOPs should be short and to the point, ideally no longer than 4–5 pages, and should cover who, what, when, where, and why.
  • Specificity: Each SOP should be scoped to a specific set of conditions and outcomes.
  • Accessibility: Store SOPs in an easily accessible format, whether it’s a Wiki, OneNote, Git, or Confluence.

A Word on Complexity

If anyone claims they have all the SOPs in their head, they’re mistaken. SOC operations are too complex for any one person to grasp in full. SOPs therefore serve as a collective knowledge base whose value no one should underestimate.

In Summary

SOPs are not just documents; they are living, evolving guides that help standardize responses and actions in the SOC. They should be a part of everyone’s role, from junior analysts to senior managers. Regular updates and reviews ensure that they remain relevant and effective, making them a cornerstone of a resilient and efficient SOC.

Tool 6: Post-Incident Review (PIR) :bar_chart:

The Essence of Post-Incident Review

A Post-Incident Review (PIR) is more than just looking back at what went wrong; it’s a chance to get better. Zimmerman suggests using a “slop bucket,” a place where team members can quickly note down problems they run into during an incident. This helps the team figure out what needs the most attention and fixing.

The PIR Workflow

The process can be summarized as follows:

  1. Incident Creation: Identify and log the incident.
  2. Investigation & Response: Conduct a thorough investigation and take necessary actions.
  3. PIR “Slop Bucket”: Throughout the incident, team members can contribute to the slop bucket by noting down issues or challenges they face.

The “Slop Bucket” Explained

The “slop bucket” is an informal yet crucial part of the PIR process. Anyone involved in the SOC can contribute by writing brief notes about problems encountered during incidents. This creates a pool of potential improvements that can be formally reviewed and acted upon later.

Formalizing the PIR

  1. Groom your ‘Slop’: Prioritize the issues noted in the slop bucket and turn them into formal PIRs.
  2. Assign Ownership: Each PIR should be owned, tracked, and reported by a responsible individual.
  3. Drive Accountability: Ensure that each PIR is followed through to closure.
  4. Embrace the Red: Don’t shy away from highlighting issues, no matter how unflattering they may be.
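Grooming the slop bucket is mostly a grouping and ranking exercise. The sketch below assumes each note carries a free-form category label and ranks categories by how many notes mention them; the `SlopNote`, `Pir`, and `groom` names and the `owners` mapping are hypothetical. PIRs left without an owner (`owner is None`) are the ones to chase at the next review.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SlopNote:
    """A quick, informal note dropped in the slop bucket during an incident."""
    incident: str
    author: str
    category: str  # e.g. "missing logs", "tool access" (hypothetical labels)
    text: str


@dataclass
class Pir:
    """A formal post-incident review item with an owner and a status."""
    category: str
    evidence: List[SlopNote]
    owner: Optional[str] = None
    status: str = "open"


def groom(bucket: List[SlopNote], owners: Dict[str, str]) -> List[Pir]:
    """Group notes by category, rank by how often the problem recurs, assign owners."""
    counts = Counter(note.category for note in bucket)
    pirs = []
    for category, _count in counts.most_common():  # most frequent first
        evidence = [n for n in bucket if n.category == category]
        pirs.append(Pir(category, evidence, owner=owners.get(category)))
    return pirs


bucket = [
    SlopNote("IR-7", "ana", "missing logs", "No DNS logs for the DMZ"),
    SlopNote("IR-7", "ben", "tool access", "EDR console locked me out mid-incident"),
    SlopNote("IR-9", "cho", "missing logs", "Proxy logs only kept 7 days"),
]
for pir in groom(bucket, owners={"missing logs": "data-eng lead"}):
    print(pir.category, len(pir.evidence), pir.owner)
# missing logs 2 data-eng lead
# tool access 1 None
```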

The Impact of PIR

The PIR process not only helps in identifying areas for improvement but also provides a measurable way to track changes over time. It helps you understand what’s getting better and what still needs work, thereby driving continuous improvement in your SOC.

Closing Thoughts

The Post-Incident Review isn’t just paperwork; it’s a key way to make your team stronger. By using this process, you’re spotting weak points and making plans to fix them. It’s all about learning from the past to do better in the future.

Tool 7: Stronger Together :handshake:

The Power of Collaboration

Collaboration shouldn’t be confined to the SOC. It also means reaching out to other departments for better data management and partnering with external organizations for valuable threat intelligence. A SOC that collaborates is much more likely to thrive.

Types of Engagement

Collaboration can be both reactive and proactive:

  • Reactive: Involves working together during incidents for investigation and response.
  • Proactive: This includes preventive measures like system hygiene, software patching, creating new custom detections, and engaging in proactive threat hunting across other teams/clients.

Onboarding and Automation

Having a well-documented onboarding process is crucial, especially for teams integrating into the SOC. It shortens the learning curve for new team members, speeds up collaboration, and can be automated to improve operational efficiency. The onboarding process can also automatically enrol teams in additional security capabilities such as vulnerability management scanning, turning it into a convenient self-service feature that strengthens the organisation’s security posture.
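An automated onboarding flow can be sketched as an ordered set of steps run against a team’s request. Everything here is hypothetical: the `TeamRequest` fields and the stub steps stand in for calls to your ticketing system, SIEM, and vulnerability scanner. The design choice worth noting is that a failed step is recorded rather than aborting the whole run, so one broken integration does not block the team’s enrolment in the other capabilities.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TeamRequest:
    """A team's self-service onboarding request (fields are illustrative)."""
    team: str
    contact: str
    log_sources: List[str]
    assets: List[str]


@dataclass
class OnboardingResult:
    team: str
    completed: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


Step = Callable[[TeamRequest], None]


def onboard(request: TeamRequest, steps: Dict[str, Step]) -> OnboardingResult:
    """Run each onboarding step in order; record failures and keep going."""
    result = OnboardingResult(request.team)
    for name, step in steps.items():
        try:
            step(request)
            result.completed.append(name)
        except Exception as exc:  # one broken integration should not block the rest
            result.failed[name] = str(exc)
    return result


# Hypothetical stub steps; real ones would call your ticketing, SIEM, and scanner APIs.
def register_contact(req: TeamRequest) -> None:
    print(f"escalation contact for {req.team}: {req.contact}")


def connect_log_sources(req: TeamRequest) -> None:
    if not req.log_sources:
        raise ValueError("no log sources listed")


def enrol_vuln_scanning(req: TeamRequest) -> None:
    print(f"scheduling vulnerability scans for {len(req.assets)} assets")


req = TeamRequest("payments", "payments-oncall@example.com", [], ["10.1.0.5", "10.1.0.6"])
result = onboard(req, {"contact": register_contact,
                       "logs": connect_log_sources,
                       "vuln_scan": enrol_vuln_scanning})
print(result.completed, result.failed)
# ['contact', 'vuln_scan'] {'logs': 'no log sources listed'}
```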

Data Management

  • Familiar Data Platforms: Make sure everyone knows how to use the data platforms that the SOC relies on.
  • Data Access and Sharing Rules: Establish clear guidelines for who can access what data and how it can be shared.
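Writing access and sharing rules down as data makes them easy to review and to check automatically. A minimal deny-by-default sketch; the data set names, roles, and the `can_access` and `can_share_externally` helpers are illustrative assumptions rather than a recommended policy.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class DataPolicy:
    """Who may query a data set, and whether results may leave the organisation."""
    readers: FrozenSet[str]
    shareable_externally: bool


# Hypothetical policy table; names and roles are illustrative only.
POLICIES: Dict[str, DataPolicy] = {
    "edr_telemetry": DataPolicy(frozenset({"soc", "it-ops"}), shareable_externally=False),
    "proxy_logs": DataPolicy(frozenset({"soc", "network"}), shareable_externally=False),
    "ioc_feed": DataPolicy(frozenset({"soc", "it-ops", "network", "partner-soc"}),
                           shareable_externally=True),
}


def can_access(role: str, dataset: str) -> bool:
    """Deny by default: unknown data sets or roles get no access."""
    policy = POLICIES.get(dataset)
    return policy is not None and role in policy.readers


def can_share_externally(dataset: str) -> bool:
    """Only data sets explicitly marked shareable may go to external partners."""
    policy = POLICIES.get(dataset)
    return policy is not None and policy.shareable_externally


print(can_access("it-ops", "edr_telemetry"))   # True
print(can_access("network", "edr_telemetry"))  # False
print(can_share_externally("ioc_feed"))        # True
```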

Response Guidelines

  • Escalation and Response: Set clear boundaries and procedures for how to escalate issues and respond to incidents.

Closing Thoughts

Collaboration is a necessity for a successful SOC. By fostering a culture of teamwork both within and outside the SOC, you’re setting the stage for more effective and efficient operations. It’s not just about working together; it’s about working smarter.

Making It Happen: Implementing the Seven Tools for SOC Empowerment :hammer_and_wrench::star2:

The Road to Improvement

Implementing these seven tools is not just a one-time effort; it’s a journey towards continuous improvement. Here are some tips to make this journey smoother and more effective:

Tips for Doing Less

  • Identify Labor Drains: Examine what tasks are consuming most of your team’s time and see if they can be automated or optimized.
  • Review Alerts: Remove or improve detections that add noise rather than value.
  • Talk to Compliance: If you find yourself bogged down by box-checking, consult with compliance teams and legal advisors to streamline the process.
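Reviewing alerts is easier with numbers. One rough heuristic is to flag detections that fire often but rarely end in a true positive. The sketch assumes your case tool records a disposition per closed alert; the disposition labels and the thresholds (at least 20 alerts, at most 5% true positives) are arbitrary assumptions to tune for your SOC. Benign positives count against a detection here, even though some of them may still be worth keeping.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each closed alert is (detection_name, disposition); dispositions are assumed to be
# "true_positive", "benign_positive", or "false_positive".
Alert = Tuple[str, str]


def detection_precision(alerts: Iterable[Alert]) -> Dict[str, Tuple[int, float]]:
    """Return {detection: (alert_count, fraction of alerts that were true positives)}."""
    totals: Dict[str, int] = defaultdict(int)
    true_pos: Dict[str, int] = defaultdict(int)
    for detection, disposition in alerts:
        totals[detection] += 1
        if disposition == "true_positive":
            true_pos[detection] += 1
    return {d: (n, true_pos[d] / n) for d, n in totals.items()}


def noisy_detections(alerts: Iterable[Alert], min_alerts: int = 20,
                     max_precision: float = 0.05) -> List[str]:
    """Detections that fire often but rarely find anything: candidates to tune or retire."""
    stats = detection_precision(alerts)
    noisy = [d for d, (n, p) in stats.items() if n >= min_alerts and p <= max_precision]
    # Noisiest first: most analyst time consumed for the least value.
    return sorted(noisy, key=lambda d: stats[d][0], reverse=True)


history = ([("Impossible travel", "false_positive")] * 40
           + [("Impossible travel", "true_positive")] * 1
           + [("LSASS access", "true_positive")] * 6
           + [("LSASS access", "false_positive")] * 24
           + [("Rare parent process", "benign_positive")] * 25)
print(noisy_detections(history))  # ['Impossible travel', 'Rare parent process']
```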

Tips for Doing More

  • Long-Term Planning: Engage in long-term investment planning every 6–12 months to align your SOC’s capabilities with organizational needs.
  • Resource Coordination: Consider lightweight approaches like Agile Scrum for regular resource planning and coordination.
  • Accountability: Hold team members accountable for regular progress on implementing these tools.

The Bottom Line

  • Start Now: You don’t need a large team or a mature SOC to start implementing these tools. They can be adapted to fit SOCs of any size or age.
  • Make Time: If implemented correctly, these tools should become as integral to your team’s routine as handling alerts and cases.
  • Be Prepared: With these tools in place, your analysts will be better equipped to handle the next big incident, making your SOC more resilient and effective.

By incorporating these seven tools into your SOC’s operations, you’re not just improving your current capabilities; you’re building a foundation for future success. So start today, and enhance your SOC’s efficiency, resilience, and ability to continuously improve.

What’s Next?

If you enjoyed this content and want to stay updated on how to defend yourself against breaches, connect with me. I’ll be posting much more in the future! Additionally, I’ve added ‘11 Strategies of a World-Class Cybersecurity Operations Center’, which Carson Zimmerman co-authored, to our Cybersecurity books wiki.

Do you have any questions, or would you like to share your thoughts? Be sure to join and contribute here! :busts_in_silhouette: Join Our Community

:globe_with_meridians: Connect with me:

:earth_africa: Steve @ Crushing Security
:bird: Twitter
:newspaper: Stay Updated with Our Cybersecurity Newsletter
:heart: Support Our Mission